Re: dcollections 1.0 and 2.0a beta released
On 2010-05-28 14:12, Steven Schveighoffer wrote: On Fri, 28 May 2010 06:10:49 -0400, Jacob Carlborg wrote: On 2010-05-27 12:32, Steven Schveighoffer wrote: On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros wrote: On 24/05/2010 16:45, Andrei Alexandrescu wrote: In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to lookup services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would be likely with D collection objects that were not used via interfaces. I see no problem retrofitting a no-interface container into a formal interface if so needed. I don't understand this discussion: isn't the reason above pretty much a dead-on hard requirement for the collections to have interfaces? Something like, for example, an interface version of the range traits? Only if you wish to have binary compatibility with dynamic libs. Such a thing isn't likely today since dynamic libs aren't very well supported in D, and even phobos or dcollections isn't a dynamic lib. I recently got my patch for building Tango as a dynamic library on Mac included in trunk. I also have a patch for druntime and Phobos in Bugzilla just waiting to be included, plus one patch making it easier to create dynamic libraries directly with DMD.
I would say it's a bad idea to still think that dynamic libraries aren't supported; we have to think forward and assume they will be. I remember that, and I'm very encouraged by it. That being said, the ultimate goal is to have dmd be able to build dynamic libraries easily. D has had dynamic library "support" for years, but you have to do all kinds of manual stuff, and the standard library isn't dynamic. Until the standard library is dynamic, and I can build a dynamic library with a -shared equivalent switch, dynamic libs are a laboratory feature, and not many projects will use it. Yes, exactly. I noticed there isn't an easy way to build dynamic libraries; among other things, you have to know the path to the standard library when building manually. Just so you know, I think it's important to support binary compatibility in dcollections, and since std.container has not adopted dcollections, I'm going to keep interfaces. I was just pointing out the position others may have WRT binary support. -Steve -- /Jacob Carlborg
Re: dcollections 1.0 and 2.0a beta released
On Fri, 28 May 2010 06:10:49 -0400, Jacob Carlborg wrote: On 2010-05-27 12:32, Steven Schveighoffer wrote: On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros wrote: On 24/05/2010 16:45, Andrei Alexandrescu wrote: In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to lookup services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would be likely with D collection objects that were not used via interfaces. I see no problem retrofitting a no-interface container into a formal interface if so needed. I don't understand this discussion: isn't the reason above pretty much a dead-on hard requirement for the collections to have interfaces? Something like, for example, an interface version of the range traits? Only if you wish to have binary compatibility with dynamic libs. Such a thing isn't likely today since dynamic libs aren't very well supported in D, and even phobos or dcollections isn't a dynamic lib. I recently got my patch for building Tango as a dynamic library on Mac included in trunk. I also have a patch for druntime and Phobos in Bugzilla just waiting to be included, plus one patch making it easier to create dynamic libraries directly with DMD. I would say it's a bad idea to still think that dynamic libraries aren't supported; we have to think forward and assume they will be.
I remember that, and I'm very encouraged by it. That being said, the ultimate goal is to have dmd be able to build dynamic libraries easily. D has had dynamic library "support" for years, but you have to do all kinds of manual stuff, and the standard library isn't dynamic. Until the standard library is dynamic, and I can build a dynamic library with a -shared equivalent switch, dynamic libs are a laboratory feature, and not many projects will use it. Just so you know, I think it's important to support binary compatibility in dcollections, and since std.container has not adopted dcollections, I'm going to keep interfaces. I was just pointing out the position others may have WRT binary support. -Steve
Re: dcollections 1.0 and 2.0a beta released
On Fri, 28 May 2010 06:24:26 -0400, Bruno Medeiros wrote: On 27/05/2010 11:32, Steven Schveighoffer wrote: On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros wrote: On 24/05/2010 16:45, Andrei Alexandrescu wrote: In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to lookup services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would be likely with D collection objects that were not used via interfaces. I see no problem retrofitting a no-interface container into a formal interface if so needed. I don't understand this discussion: isn't the reason above pretty much a dead-on hard requirement for the collections to have interfaces? Something like, for example, an interface version of the range traits? Only if you wish to have binary compatibility with dynamic libs. Such a thing isn't likely today since dynamic libs aren't very well supported in D, and even phobos or dcollections isn't a dynamic lib. Ah, never mind, my mind slipped and I was thinking of any kind of library, that is, static ones as well. Although even just dynamic library compatibility seems to be a valid enough case that we should consider from the start, even if it's not well supported currently. I agree, that's why dcollections has interfaces. I'm keeping them there since std.container has gone its own direction. -Steve
Re: dcollections 1.0 and 2.0a beta released
On 27/05/2010 11:32, Steven Schveighoffer wrote: On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros wrote: On 24/05/2010 16:45, Andrei Alexandrescu wrote: In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to lookup services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would be likely with D collection objects that were not used via interfaces. I see no problem retrofitting a no-interface container into a formal interface if so needed. I don't understand this discussion: isn't the reason above pretty much a dead-on hard requirement for the collections to have interfaces? Something like, for example, an interface version of the range traits? Only if you wish to have binary compatibility with dynamic libs. Such a thing isn't likely today since dynamic libs aren't very well supported in D, and even phobos or dcollections isn't a dynamic lib. Ah, never mind, my mind slipped and I was thinking of any kind of library, that is, static ones as well. Although even just dynamic library compatibility seems to be a valid enough case that we should consider from the start, even if it's not well supported currently. -- Bruno Medeiros - Software Engineer
Re: dcollections 1.0 and 2.0a beta released
On 2010-05-27 12:32, Steven Schveighoffer wrote: On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros wrote: On 24/05/2010 16:45, Andrei Alexandrescu wrote: In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to lookup services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would be likely with D collection objects that were not used via interfaces. I see no problem retrofitting a no-interface container into a formal interface if so needed. I don't understand this discussion: isn't the reason above pretty much a dead-on hard requirement for the collections to have interfaces? Something like, for example, an interface version of the range traits? Only if you wish to have binary compatibility with dynamic libs. Such a thing isn't likely today since dynamic libs aren't very well supported in D, and even phobos or dcollections isn't a dynamic lib. I recently got my patch for building Tango as a dynamic library on Mac included in trunk. I also have a patch for druntime and Phobos in Bugzilla just waiting to be included, plus one patch making it easier to create dynamic libraries directly with DMD. I would say it's a bad idea to still think that dynamic libraries aren't supported; we have to think forward and assume they will be.
And I have specifically decided not to use interfaces with ranges because that makes them reference types. Ranges work well as value types, but not well as reference types. Therefore, to use dcollections as interfaces, you must not require the range traits. -Steve -- /Jacob Carlborg
Re: dcollections 1.0 and 2.0a beta released
On Wed, 19 May 2010 17:18:04 -0400, Steven Schveighoffer wrote: Andrej Mitrovic Wrote: Well as long as you're here can I submit an error here? :) I get an error while building the D2 branch: Error: ArrayMultiset.obj : No such file or directory Crud, I admit that I assumed anything that built on Linux would build on Windows. I still believe it will, but of course, I need to change the batch file :) It seems the ArrayMultiset.d is not in the dcollections folder for the D2.x branch, although it is in trunk (I guess the trunk is the D1.x one?). Yes, D1 is trunk, D2 is going to eventually be trunk Is that module deprecated for d2.x? (although it's listed in the win32 batch file) Yes, just remove it. I'll fix it when I get a chance. I've fixed the windows build files and recreated the zip/tarball files for 2.0a. Learned some batch-fu online and hopefully the new versions will be future-proof. Please re-download the zipfile to build with windows. Thanks -Steve
Re: dcollections 1.0 and 2.0a beta released
On Wed, 26 May 2010 10:06:32 -0400, Bruno Medeiros wrote: On 24/05/2010 16:45, Andrei Alexandrescu wrote: In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to lookup services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would be likely with D collection objects that were not used via interfaces. I see no problem retrofitting a no-interface container into a formal interface if so needed. I don't understand this discussion: isn't the reason above pretty much a dead-on hard requirement for the collections to have interfaces? Something like, for example, an interface version of the range traits? Only if you wish to have binary compatibility with dynamic libs. Such a thing isn't likely today since dynamic libs aren't very well supported in D, and even phobos or dcollections isn't a dynamic lib. And I have specifically decided not to use interfaces with ranges because that makes them reference types. Ranges work well as value types, but not well as reference types. Therefore, to use dcollections as interfaces, you must not require the range traits. -Steve
Re: dcollections 1.0 and 2.0a beta released
On 24/05/2010 16:45, Andrei Alexandrescu wrote: In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to lookup services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would be likely with D collection objects that were not used via interfaces. I see no problem retrofitting a no-interface container into a formal interface if so needed. I don't understand this discussion: isn't the reason above pretty much a dead-on hard requirement for the collections to have interfaces? Something like, for example, an interface version of the range traits? -- Bruno Medeiros - Software Engineer
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 11:01:06 -0400, Steven Schveighoffer wrote: > It's done all the time in Java and .NET. For example, A GUI listbox > widget exposes its elements as an array of elements, which implement the > List interface. You don't ever see the implementation or need it. > Granted Java and .NET have less problems than C++ and D with binary > compatibility, since the function tables are dynamic, but the potential is > there for D to make binary compatibility possible with interfaces. > Cocoa (NeXT's/Apple's framework for Objective-C) uses a very successful and well-thought-out delegation pattern, whereby GUI elements representing large amounts of data, like table views, have delegates (not in the D sense of the word) that provide them with the actual contents. Granted, Objective-C's runtime is much more dynamic than D, but a simplified version of such a pattern could still work in D. After all, user interfacing is typically where dynamism is more important than speed.
Re: dcollections 1.0 and 2.0a beta released
Steven Schveighoffer wrote: On the flip side, if containers did not implement interfaces, having to do this: class WrappedSet(alias Impl, V) : Set!V { private Impl!V impl; int functionToSatisfySet() { return impl.functionToSatisfySet(); } ... } seems to me like a lot more crufty and bloated than simply adding : Set!V on the end of the class declarations. This would not be necessary. We can get the function names off of the interface, so we could have a template do this for us. -- Simen
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 17:47:18 -0400, Andrei Alexandrescu wrote: On 05/24/2010 04:38 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 17:35:11 -0400, Andrei Alexandrescu wrote: On 05/24/2010 04:08 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu wrote: Sorry. Yes, by-key iteration should be possible. OK, so we should be able to iterate keys. And the keys are not stored in the trie structure itself. So how does one iterate the keys of the container without reconstructing them from the trie nodes using the heap? You can't. At some point you need to copy tree traces into T[] arrays. If the trie stores parent nodes, you don't need to do that as you can expose a trace as a double-ended range containing the key. There's a kind of trie that was defined to avoid such issues; it stores the keys in clear, in the leaves, at the expense of duplication. I don't remember the name offhand. Does anyone? OK, so the keys function of Map should work just fine for a Trie implementation. Good to know. This wins a little battle but not the war. Long term you're facing the Sisyphean job of either knocking new containers into the existing interfaces, or extending the interfaces to accommodate new containers. It doesn't look to me like a winning proposition. It's always easy to say that there may come a day when the interfaces don't work -- none of us can see the future. When that day comes, we can find a solution to it. The worst case scenario is that you simply don't implement any interfaces. It does not detract from the interfaces that exist. It would seem odd that some of the collections do not implement the interfaces while others do. At the very least though, all containers should be iterable, meaning they will implement the Iterator!V interface. That at least allows it to interact with the other container types. 
On the flip side, if containers did not implement interfaces, having to do this: class WrappedSet(alias Impl, V) : Set!V { private Impl!V impl; int functionToSatisfySet() { return impl.functionToSatisfySet(); } ... } seems to me like a lot more crufty and bloated than simply adding : Set!V on the end of the class declarations. But I'm willing to drop the interfaces for now since interfaces obviously are an unpopular choice. -Steve
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 04:38 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 17:35:11 -0400, Andrei Alexandrescu wrote: On 05/24/2010 04:08 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu wrote: Sorry. Yes, by-key iteration should be possible. OK, so we should be able to iterate keys. And the keys are not stored in the trie structure itself. So how does one iterate the keys of the container without reconstructing them from the trie nodes using the heap? You can't. At some point you need to copy tree traces into T[] arrays. If the trie stores parent nodes, you don't need to do that as you can expose a trace as a double-ended range containing the key. There's a kind of trie that was defined to avoid such issues; it stores the keys in clear, in the leaves, at the expense of duplication. I don't remember the name offhand. Does anyone? OK, so the keys function of Map should work just fine for a Trie implementation. Good to know. This wins a little battle but not the war. Long term you're facing the Sisyphean job of either knocking new containers into the existing interfaces, or extending the interfaces to accommodate new containers. It doesn't look to me like a winning proposition. The FIC (Federation of Independent Containers) does not have that problem. It does have its specifications and guidelines but whichever container doesn't support some or even all of the required methods can simply define its own under other names. Then users can play with the containers and with the ranges they define as they see fit. Andrei
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 17:35:11 -0400, Andrei Alexandrescu wrote: On 05/24/2010 04:08 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu wrote: Sorry. Yes, by-key iteration should be possible. OK, so we should be able to iterate keys. And the keys are not stored in the trie structure itself. So how does one iterate the keys of the container without reconstructing them from the trie nodes using the heap? You can't. At some point you need to copy tree traces into T[] arrays. If the trie stores parent nodes, you don't need to do that as you can expose a trace as a double-ended range containing the key. There's a kind of trie that was defined to avoid such issues; it stores the keys in clear, in the leaves, at the expense of duplication. I don't remember the name offhand. Does anyone? OK, so the keys function of Map should work just fine for a Trie implementation. Good to know. -Steve
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 04:08 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu wrote: Sorry. Yes, by-key iteration should be possible. OK, so we should be able to iterate keys. And the keys are not stored in the trie structure itself. So how does one iterate the keys of the container without reconstructing them from the trie nodes using the heap? You can't. At some point you need to copy tree traces into T[] arrays. If the trie stores parent nodes, you don't need to do that as you can expose a trace as a double-ended range containing the key. There's a kind of trie that was defined to avoid such issues; it stores the keys in clear, in the leaves, at the expense of duplication. I don't remember the name offhand. Does anyone? Andrei
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 16:27:46 -0400, Andrei Alexandrescu wrote: Sorry. Yes, by-key iteration should be possible. OK, so we should be able to iterate keys. And the keys are not stored in the trie structure itself. So how does one iterate the keys of the container without reconstructing them from the trie nodes using the heap? Forget about interfaces, pretend you were writing it separate from dcollections. -Steve
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 16:07:22 -0400, Andrei Alexandrescu wrote: On 05/24/2010 01:49 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu wrote: A pointer to the end element would be required for both appending and back(). This further erodes my confidence. Size needs to be maintained _and_ a pointer to the last element must be maintained. For many uses, all of this stuff is unnecessary overhead. You make it sound like incrementing a counter and changing a pointer are incredibly slow operations :) They are. Ask Walter for a good example. Actually, I realize that pointing to the end node isn't good enough, because once you remove that node, there is no way to get to the previous node without traversing the list again. So slist cannot implement the List interface, you were right. The overhead of storage is minimal compared to the elements of the list, which do not contain said overhead. I'm talking containers of lists here. I was more talking about the ratio of elements to lists. For example, the Hash implementation uses the same link list nodes as elements in its table, and generally a hash has very few elements per list, so overhead on the list head would be too much. On top of that, a LinkList has its own allocator, meaning each one consumes a large chunk of data assuming you will be adding lots of elements to it. So maybe we need to refine the implementation building blocks that back the dcollections classes so they are useable in special situations where performance is critical. It actually would be pretty trivial to define SLink as a specialization of dcollections' Link struct, and that would essentially give you a lot of functionality in terms of your ideal list type. Which may be acceptable to you if you want the bleeding fastest speed available. There will always be tailored code crafted to be as fast as possible for some specific design and dcollections and your slist just won't fit the bill. 
Except that the set is considerably smaller for a well-designed slist. The set of problems that an slist solves is also considerably smaller. Having that backwards capability enables more algorithms. -Steve
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 16:35:03 -0400, bearophile wrote: Andrei Alexandrescu: When was the last time you measured? I thought the speed has largely improved since Steve integrated his work. I have timed it after that integration. I have seen a performance improvement, but it's small. I can perform some syntactic benchmarks. Performance was vastly improved for situations where one was appending to multiple arrays at once. For appending to a single array in a loop, it is improved, but not that much. But the main improvement was for safety. The append operation on D's dynamic arrays is a tradeoff between footprint, speed, and safety. It also does not and should not compromise performance on operations besides appending. You will always be able to outperform the D append operation with more focus on the append operation, but D's array appending is fine for most situations. -Steve
Re: dcollections 1.0 and 2.0a beta released
Andrei Alexandrescu: > When was the last time you measured? I thought the speed has largely > improved since Steve integrated his work. I have timed it after that integration. I have seen a performance improvement, but it's small. I can perform some syntactic benchmarks. Later, bearophile
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 03:18 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 15:58:46 -0400, Andrei Alexandrescu wrote: On 05/24/2010 01:12 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu wrote: On 05/24/2010 12:11 PM, bearophile wrote: Steven Schveighoffer: Is it unreasonable to expect to be able to iterate the keys in a trie? (I don't really know, I've never worked with them) On tries you can iterate on all keys or on a sorted range of the keys. There's one more way of iteration that's unique to tries - by key element (e.g. if key is string, you iterate by character). That would be an exploratory iteration. At each step in the iteration you pass a character, so popFront() takes an argument, it's popFront(char). trie.popFront('a') takes you to all elements in the trie that start with 'a'. Then you can continue exploring by e.g. trie.popFront('x') which takes you to all elements that start with "ax". At any point the range may become dry (nothing with this prefix). If the range is not empty, front() gives you information about all strings that share the particular prefix you have spanned (count and breadth comes to mind). That could be supported. To paraphrase a commercial, "there's an interface for that" :o). No there isn't. I wouldn't make that an interface, as it's not foreachable. I'd build a special range for it. But iterating over the keys, is that not advisable? Would your trie iterator be the only way to iterate over the elements? It seems rather non-useful. I think a trie, like many other nontrivial containers, supports several means of iteration. Including keys?!!! Still not a straight answer. If you should be able to iterate keys, then say you should be able to iterate keys. If you shouldn't, then say that. Sorry. Yes, by-key iteration should be possible. Andrei
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 15:58:46 -0400, Andrei Alexandrescu wrote: On 05/24/2010 01:12 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu wrote: On 05/24/2010 12:11 PM, bearophile wrote: Steven Schveighoffer: Is it unreasonable to expect to be able to iterate the keys in a trie? (I don't really know, I've never worked with them) On tries you can iterate on all keys or on a sorted range of the keys. There's one more way of iteration that's unique to tries - by key element (e.g. if key is string, you iterate by character). That would be an exploratory iteration. At each step in the iteration you pass a character, so popFront() takes an argument, it's popFront(char). trie.popFront('a') takes you to all elements in the trie that start with 'a'. Then you can continue exploring by e.g. trie.popFront('x') which takes you to all elements that start with "ax". At any point the range may become dry (nothing with this prefix). If the range is not empty, front() gives you information about all strings that share the particular prefix you have spanned (count and breadth comes to mind). That could be supported. To paraphrase a commercial, "there's an interface for that" :o). No there isn't. I wouldn't make that an interface, as it's not foreachable. I'd build a special range for it. But iterating over the keys, is that not advisable? Would your trie iterator be the only way to iterate over the elements? It seems rather non-useful. I think a trie, like many other nontrivial containers, supports several means of iteration. Including keys?!!! Still not a straight answer. If you should be able to iterate keys, then say you should be able to iterate keys. If you shouldn't, then say that. Or is there another way to iterating the keys that would be more efficient than building an array to contain the 'key' for each node? I don't know. I'm just glad I can explore that without having to think in interfaces. I don't really know what this means. 
The interfaces are a way to plug containers in at runtime. They are not meant to be the entire API for the collection. You can absolutely add whatever functionality that does not fit in the interface as an additional feature, and users will have to use the exact type in order to use that feature. -Steve
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 02:29 PM, bearophile wrote: Steven Schveighoffer: Look at D's arrays. Is appending with D's arrays the fastest it possibly could be? Hell no, but it's good enough for most situations, and safe. Append in D dynamic arrays is awful, it's slow and requires complex code (and currently there is no explanation of how it works in the D docs, see http://d.puremagic.com/issues/show_bug.cgi?id=4091 ), so it's not a good example to be copied :-) It's not fast enough for most situations, that's why there is still the need for an Appender. When was the last time you measured? I thought the speed has largely improved since Steve integrated his work. Andrei
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 02:01 PM, Pelle wrote: On 05/24/2010 06:27 PM, Andrei Alexandrescu wrote: struct List { int v; List * next; } While I do agree with that design for a list, that is no reference type. List* is. My point was that the pressure for a really simple hand-rolled SLL is huge. A good abstraction should convince the Walters of this world that the abstraction is actually better than the hand-rolled SLL. Andrei
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 01:49 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu wrote: A pointer to the end element would be required for both appending and back(). This further erodes my confidence. Size needs to be maintained _and_ a pointer to the last element must be maintained. For many uses, all of this stuff is unnecessary overhead. You make it sound like incrementing a counter and changing a pointer are incredibly slow operations :) They are. Ask Walter for a good example. The overhead of storage is minimal compared to the elements of the list, which do not contain said overhead. I'm talking containers of lists here. I think we need to build the shared vision that in Phobos we're competing against hand-written, highly-optimized code. This is the fight STL took and largely won. We can't afford to lower our standards for one tiny bit. I was once talking over a beer about the advantages of container abstractions. Walter slapped my face by defining the fastest SLL in the world on a napkin. It looked like this: struct List { int v; List * next; } He took so little time to define that, and the primitives he needed were so simple and fast, that it was an absolute overkill to replace that with a generic whiz-bang container. If I'd mentioned there'd be any extra data involved, that would mean the end of negotiation. "Why do you need that?" "For length()!" "But I don't need length()!" "Well you have to." "That's Communism in design!" And I agree. But something like the above is prone to all sorts of nasty things, like inadvertent cycles in your list. Yeah, and I'd have my list of arguments to add too. Which may be acceptable to you if you want the bleeding fastest speed available. There will always be tailored code crafted to be as fast as possible for some specific design and dcollections and your slist just won't fit the bill. Except that the set is considerably smaller for a well-designed slist. 
And dcollections' linked-list node is exactly what you wrote there (with a prev pointer of course). The only difference is, I don't define an element as being the same as a list. The whole list has its own data, including a pointer to the first element in the list. Look at D's arrays. Is appending with D's arrays the fastest it possibly could be? Hell no, but it's good enough for most situations, and safe. If I could make them faster without adversely affecting the performance of other operations, I would. There are considerable constraints at play in the array design for historical and other reasons. Andrei
Re: dcollections 1.0 and 2.0a beta released
Steven Schveighoffer: > complex code? > a ~= b; > Seems pretty not complex. I meant on the implementation side. Arrays are used all the time, and they are basic building blocks used everywhere, so a programmer probably needs to understand how they work inside. > That part is still pretty new. What do you find is messy and complex > about it? The internal semantics of the current design of dynamic arrays is not transparent enough, which means I often don't know what it is doing. This can be acceptable for a more complex data structure such as associative arrays. And it's not easy to know exactly where to use assumeSafeAppend. Despite their complexity, D dynamic arrays are not safe enough; see the small thread about passing D dynamic arrays by reference by default. > When I said "good enough for most situations" I didn't mean "most > situations where appending speed is crucial to the success of the > application" :P I have written a few hundred small D programs, and I have seen that appending often happens (more often than I had originally thought) in points where performance is important enough. Using the Appender of my dlibs was usually able to solve the problem. I don't know of better solutions, so maybe I'm just wasting your time; I am sorry. Before seeing D I had never thought that designing good dynamic arrays could be so hard :-) Bye, bearophile
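For readers unfamiliar with the Appender bearophile mentions, here is a minimal sketch of how std.array.Appender is typically used; the loop body is purely illustrative:

```d
import std.array : appender;

void main() {
    // Appender keeps its own capacity bookkeeping, so repeated puts
    // avoid the stomping checks that plain slice appending performs.
    auto app = appender!(int[])();
    foreach (i; 0 .. 1000)
        app.put(i);
    int[] result = app.data; // hand back an ordinary dynamic array
    assert(result.length == 1000);
}
```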
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 01:12 PM, Steven Schveighoffer wrote: On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu wrote: On 05/24/2010 12:11 PM, bearophile wrote: Steven Schveighoffer: Is it unreasonable to expect to be able to iterate the keys in a trie? (I don't really know, I've never worked with them) On tries you can iterate on all keys or on a sorted range of the keys. There's one more way of iteration that's unique to tries - by key element (e.g. if key is string, you iterate by character). That would be an exploratory iteration. At each step in the iteration you pass a character, so popFront() takes an argument, it's popFront(char). trie.popFront('a') takes you to all elements in the trie that start with 'a'. Then you can continue exploring by e.g. trie.popFront('x') which takes you to all elements that start with "ax". At any point the range may become dry (nothing with this prefix). If the range is not empty, front() gives you information about all strings that share the particular prefix you have spanned (count and breadth come to mind). That could be supported. To paraphrase a commercial, "there's an interface for that" :o). But iterating over the keys, is that not advisable? Would your trie iterator be the only way to iterate over the elements? It seems rather non-useful. I think a trie, like many other nontrivial containers, supports several means of iteration. Or is there another way of iterating the keys that would be more efficient than building an array to contain the 'key' for each node? I don't know. I'm just glad I can explore that without having to think in interfaces. Andrei
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 15:29:23 -0400, bearophile wrote: Steven Schveighoffer: Look at D's arrays. Is appending with D's arrays the fastest it possibly could be? Hell no, but it's good enough for most situations, and safe. Append in D dynamic arrays is awful, it's slow and requires complex code. complex code? a ~= b; Seems pretty not complex. -Steve
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 15:32:01 -0400, bearophile wrote: And the appending is hard to use too, see the ridiculously complex-to-use & messy capacity, reserve, and assumeSafeAppend. So overall it's a good example of what I will never want to copy. That part is still pretty new. What do you find is messy and complex about it? When I said "good enough for most situations" I didn't mean "most situations where appending speed is crucial to the success of the application" :P Most situations means you need to do appending once in a while. A perfect example is something like toStringz. And most situations do not require the three above-mentioned functions. -Steve
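For context, a sketch of how the three primitives under discussion are meant to be used on a plain dynamic array (illustrative only, not dcollections code):

```d
void main() {
    int[] a;
    a.reserve(1000);          // pre-allocate; later appends won't reallocate
    assert(a.capacity >= 1000);
    foreach (i; 0 .. 1000)
        a ~= i;               // appends grow into the reserved block

    a.length = 0;             // logically empty, but the block is retained
    a.assumeSafeAppend();     // promise: no other slice still references the
                              // data, so appending may overwrite it in place
    a ~= 42;
    assert(a == [42]);
}
```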
Re: dcollections 1.0 and 2.0a beta released
And the appending is hard to use too, see the ridiculously complex-to-use & messy capacity, reserve, and assumeSafeAppend. So overall it's a good example of what I will never want to copy. Bye, bearophile
Re: dcollections 1.0 and 2.0a beta released
Steven Schveighoffer: > Look at D's arrays. Is appending with D's arrays the fastest it possibly > could be? Hell no, but it's good enough for most situations, and safe. Append in D dynamic arrays is awful: it's slow and requires complex code (and currently there are no explanations of how it works in the D docs, see http://d.puremagic.com/issues/show_bug.cgi?id=4091 ), so it's not a good example to copy :-) It's not fast enough for most situations; that's why an Appender is still needed. Bye, bearophile
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 06:27 PM, Andrei Alexandrescu wrote: struct List { int v; List * next; } While I do agree with that design for a list, that is no reference type. I thought we wanted reference types.
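Pelle's point can be made concrete: List itself is a value, and List* is the reference type you actually pass around. A minimal usage sketch (the helper name is hypothetical):

```d
struct List {
    int v;
    List* next;
}

// Prepend is O(1) with no bookkeeping: the new node becomes the head.
List* prepend(List* head, int value) {
    return new List(value, head);
}

void main() {
    List* head = null;        // the empty list is just a null pointer
    foreach (i; 0 .. 3)
        head = prepend(head, i);
    // Walk the list: visits 2, 1, 0. No length, no end pointer,
    // no container object - exactly the napkin design.
    for (auto p = head; p !is null; p = p.next) {}
}
```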
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 12:27:10 -0400, Andrei Alexandrescu wrote: On 05/24/2010 11:14 AM, Steven Schveighoffer wrote: On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu wrote: On 05/24/2010 10:23 AM, Steven Schveighoffer wrote: On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu wrote: On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't possible, Do you agree that's an awkwardness? Yes, at that point it's an optimization. The only place where I assume length could be invalid is when working with the Iterator!V type, which might not be a dcollections object. That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have interfaces, the door is ajar forever. Client code can't write this: auto total = cont1.length + cont2.length; because that is incorrect code (that compiles, runs, and produces some odd result). So don't take it lightly. NO_LENGTH_SUPPORT means no length support. Then pretending it's supported will only harm everyone. I get what you are saying. What I meant was it's moot if you are not using interfaces. If you don't know what the underlying type is, then you have to do a runtime check. I agree it's awkward, and I could have not included length as a member of Iterator, in which case it would be on all the container types and NO_LENGTH_SUPPORT would not exist. All containers are meant to have a valid length, it is only with non-container Iterators that length can be NO_LENGTH_SUPPORT. However, supporting non-length containers via generic programming vs. interfaces would be much smoother. In that case, you can do coll.begin.empty to determine if the collection has any elements. Then why not make length optional and define empty? From the above it looks like an obviously better design. But all dcollections are bi-directional anyways, there is no singly linked list. That was a decision I made early on, because it allows more assumptions about dcollections' containers. 
I think length-less singly linked lists would be a good addition to Phobos outside the collection hierarchy; they are almost on par with builtin arrays in being so simple. Do you see dcollections' inability to express slists as a problem? I don't see them as unable to express slists. They would break the design guidelines that I set for collections being iterable in both directions, but an slist fits fine within the List interface. What's the required complexity of concat? O(n) with n being the number of elements added Is add expected to put things in a specific place? How does slist implement back()? slist does not fit the List interface. A pointer to the end element would be required for both appending and back(). This further erodes my confidence. Size needs to be maintained _and_ a pointer to the last element must be maintained. For many uses, all of this stuff is unnecessary overhead. You make it sound like incrementing a counter and changing a pointer are incredibly slow operations :) The overhead of storage is minimal compared to the elements of the list, which do not contain said overhead. I think we need to build the shared vision that in Phobos we're competing against hand-written, highly-optimized code. This is the fight STL took and largely won. We can't afford to lower our standards for one tiny bit. I was once talking over a beer about the advantages of container abstractions. Walter slapped my face by defining the fastest SLL in the world on a napkin. It looked like this: struct List { int v; List * next; } He took so little time to define that, and the primitives he needed were so simple and fast, that it was an absolute overkill to replace that with a generic whiz-bang container. If I'd mentioned there'd be any extra data involved, that would mean the end of negotiation. "Why do you need that?" "For length()!" "But I don't need length()!" "Well you have to." "That's Communism in design!" And I agree. 
But something like the above is prone to all sorts of nasty things, like inadvertent cycles in your list. Which may be acceptable to you if you want the bleeding fastest speed available. There will always be tailored code crafted to be as fast as possible for some specific design and dcollections and your slist just won't fit the bill. And dcollections' linked-list node is exactly what you wrote there (with a prev pointer of course). The only difference is, I don't define an element as being the same as a list. The whole list has its own data, including a pointer to the first element in the list. Look at D's arrays. Is appending with D's arrays the fastest it possibly could be? Hell no, but it's good enough for most situations, and safe. -Steve
Re: dcollections 1.0 and 2.0a beta released
On Thu, 20 May 2010 02:11:58 -0400, Bernard Helyer wrote: On 20/05/10 13:39, Bernard Helyer wrote: Oooohhh goody goody goody! n_n I'm in the process of making a little toy project ATM. I shall integrate dcollections 2.0a into it ASAP. ArrayList doesn't compile with warnings as it overrides opEquals, but doesn't mark it as such. http://www.dsource.org/projects/dcollections/ticket/5 For any other problems, please create a ticket (including your Thing one, but I'm not sure what you were doing there :) -Steve
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 13:36:01 -0400, Andrei Alexandrescu wrote: On 05/24/2010 12:11 PM, bearophile wrote: Steven Schveighoffer: Is it unreasonable to expect to be able to iterate the keys in a trie? (I don't really know, I've never worked with them) On tries you can iterate on all keys or on a sorted range of the keys. There's one more way of iteration that's unique to tries - by key element (e.g. if key is string, you iterate by character). That would be an exploratory iteration. At each step in the iteration you pass a character, so popFront() takes an argument, it's popFront(char). trie.popFront('a') takes you to all elements in the trie that start with 'a'. Then you can continue exploring by e.g. trie.popFront('x') which takes you to all elements that start with "ax". At any point the range may become dry (nothing with this prefix). If the range is not empty, front() gives you information about all strings that share the particular prefix you have spanned (count and breadth come to mind). That could be supported. But iterating over the keys, is that not advisable? Would your trie iterator be the only way to iterate over the elements? It seems rather non-useful. Or is there another way of iterating the keys that would be more efficient than building an array to contain the 'key' for each node? -Steve
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 12:11 PM, bearophile wrote: Steven Schveighoffer: Is it unreasonable to expect to be able to iterate the keys in a trie? (I don't really know, I've never worked with them) On tries you can iterate on all keys or on a sorted range of the keys. There's one more way of iteration that's unique to tries - by key element (e.g. if key is string, you iterate by character). That would be an exploratory iteration. At each step in the iteration you pass a character, so popFront() takes an argument, it's popFront(char). trie.popFront('a') takes you to all elements in the trie that start with 'a'. Then you can continue exploring by e.g. trie.popFront('x') which takes you to all elements that start with "ax". At any point the range may become dry (nothing with this prefix). If the range is not empty, front() gives you information about all strings that share the particular prefix you have spanned (count and breadth come to mind). Andrei
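Andrei's exploratory iteration might be sketched as follows; the node layout and names here are hypothetical illustrations, not a proposed Phobos API:

```d
// Hypothetical node-based trie; each node knows how many keys
// live in its subtree.
struct TrieNode {
    TrieNode*[char] children;
    bool terminal;              // true if a key ends at this node
    size_t count;               // number of keys in this subtree
}

// An "exploratory range": popFront takes the next key element.
struct TrieExplorer {
    TrieNode* node;

    @property bool empty() const { return node is null; }

    // front reports on all keys sharing the prefix spanned so far
    // (here just the count; breadth could be reported similarly).
    @property size_t front() const { return node.count; }

    // Descend one character deeper; the range goes dry if no key
    // continues with c.
    void popFront(char c) {
        auto p = c in node.children;
        node = p ? *p : null;
    }
}
```

Note that because popFront takes an argument, this is not a foreachable range in the usual sense, which matches Steven's remark that it would need to be a special range rather than an interface.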
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 13:11:27 -0400, Simen kjaeraas wrote: Steven Schveighoffer wrote: And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_ to use interfaces, and your code depends on X and Y, and interfaces aren't defined by dcollections, where are you then? You're at the point where the language allows you to create a class following an interface, whose methods all redirect to those of the struct. This could even be done automagically. A situation Robert has stated he would like to avoid. -Steve
Re: dcollections 1.0 and 2.0a beta released
Steven Schveighoffer wrote: And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_ to use interfaces, and your code depends on X and Y, and interfaces aren't defined by dcollections, where are you then? You're at the point where the language allows you to create a class following an interface, whose methods all redirect to those of the struct. This could even be done automagically. -- Simen
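Simen's wrapper idea, hand-written for a hypothetical stack interface (the interface and struct names are made up for illustration; a compile-time reflection version could generate the forwarding methods automatically):

```d
interface IStack {
    void push(int v);
    int pop();
}

// A struct container with no interface of its own.
struct ArrayStack {
    int[] data;
    void push(int v) { data ~= v; }
    int pop() { auto v = data[$ - 1]; data.length -= 1; return v; }
}

// A class shim: every interface method forwards to the struct,
// giving interface users a view onto the value-type container.
class StackShim : IStack {
    ArrayStack impl;
    void push(int v) { impl.push(v); }
    int pop() { return impl.pop(); }
}
```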
Re: dcollections 1.0 and 2.0a beta released
Steven Schveighoffer: > Is it unreasonable to expect to be able > to iterate the keys in a trie? (I don't really know, I've never worked > with them) On tries you can iterate on all keys or on a sorted range of the keys. Bye, bearophile
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 13:04:31 -0400, bearophile wrote: Steven Schveighoffer: Scope class members should solve this. It's been thrown around forever -- essentially, you should be able to define that a class member's storage is contained within the owner object. I too have proposed this idea, but in my opinion you can't design a big part of a standard lib on the base of a syntax/language optimization that doesn't exist yet. Yeah, but essentially, it's an optimization. Whether the storage is stored in the same heap block or a different one really doesn't matter in terms of functionality. Allocating one heap block vs. two is faster, that's what the OP was saying. In other words, scope class members aren't essential to using dcollections. -Steve
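Until scope class members exist, the single-allocation effect can be approximated by embedding the member's storage inside the owner and constructing into it with std.conv.emplace. A rough sketch only; the class names are hypothetical, and alignment and destruction details are glossed over:

```d
import std.conv : emplace;

class Node {
    int v;
    this(int v) { this.v = v; }
}

class Owner {
    // Storage for the member lives inside the Owner instance itself,
    // so constructing an Owner costs one heap allocation, not two.
    ubyte[__traits(classInstanceSize, Node)] buf;
    Node node;

    this() {
        node = emplace!Node(cast(void[]) buf[], 7);
    }
}
```

This is exactly the optimization Steve describes: functionally identical to a separate heap block, just cheaper to allocate.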
Re: dcollections 1.0 and 2.0a beta released
Steven Schveighoffer: > Scope class members should solve this. It's been thrown around forever -- > essentially, you should be able to define that a class member's storage is > contained within the owner object. I too have proposed this idea, but in my opinion you can't design a big part of a standard lib on the base of a syntax/language optimization that doesn't exist yet. Bye, bearophile
Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)
On Mon, 24 May 2010 12:18:02 -0400, Andrei Alexandrescu wrote: On 05/24/2010 10:58 AM, Steven Schveighoffer wrote: On Sun, 23 May 2010 17:36:52 -0400, Andrei Alexandrescu wrote: I've thought for a very long time about the class vs. struct choice in a container, and I came to a startling conclusion: it (almost) doesn't matter. Could be either, and the tradeoffs involved are nonessential. Here they are: 1. Using a class makes implementing members easier because there's no need to do work through an additional member. With a struct, you need a pimpl approach. For example: struct Array { struct Impl { ... } Impl * impl; ... @property length() const { return impl.length; } ... } 2. struct gives you more power in managing the collection's own memory, as long as the collection doesn't escape item addresses or other addresses of internal handles. So all other things being equal, struct has a net advantage. Yes, but most containers are node-based, so as long as you use structs for nodes, you are generally in control of the bulk of allocation (dcollections does this). I'm saying, the net advantage of struct is that it can safely and deterministically release memory in its destructor. OK. 3. The creation syntaxes are different. For Phobos, I suggest adding a simple function make() to std.algorithm. make!T(a, b, c) returns a newly constructed object T by invoking the constructor with arguments a, b, and c. That way we can make client code virtually agnostic of the class/struct choice for a container. Classes have builtin object monitors. But you could handle that via a struct as well. The language support wouldn't be there though. Oh, and the monitor isn't needed so that's an unused word there. I'd forgotten about it. Why isn't the monitor needed? Will we no longer be able to use a globally shared container object? That's it. Otherwise, one could use either to build a container. 
Let me note that I have reached the conclusion that containers should be at best reference types, with a meta-constructor Value!(C) that takes a container C and makes it into a value type. The thing that classes really give you is the language support. Structs need to be hand-crafted to support the same syntax. Classes are enforced as always being reference types and always being on the heap. I'd agree that classes are more convenient because they make the implementation straightforward. The Value!(C) is questionable because creating the head of a container on the stack leads to easily escaped stack references. I don't understand this. Could you give an example? For example, dcollections' TreeMap has an 'end' node that defines the end of the container. It actually is the root of the RB tree and all the data is on the left child of the 'end' node. This makes it always the last node iterated. The end node is actually a member of the class; it's not allocated separately on the heap for efficiency and a less complex constructor. So let's say TreeMap could be on the stack; here's a situation where an escape would occur: auto foo() { TreeMap!(string, uint) tmap; // allocated on stack ... // fill up tmap; return tmap["hi"..tmap.end]; // oops! end is part of the range, and it was allocated on the stack. } In dcollections, all the classes have some sort of members that are associated with the entire container, not the elements. These would disappear once a stack-based container exited scope, but there may be dangling references returned. To get around this, you could ensure that all data was allocated on the heap, but then wouldn't Value!(C) be almost useless? What data would be allocated on the stack that is part of the container that would never be referenced by the heap-based parts? -Steve
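The make() function Andrei proposes for std.algorithm could be as small as this sketch, letting client code stay agnostic of the class/struct choice (the container type in the usage comment is hypothetical):

```d
// Construct a T whether it is a class or a struct (proposed helper; sketch).
T make(T, Args...)(Args args) {
    static if (is(T == class))
        return new T(args);   // classes are heap-allocated with new
    else
        return T(args);       // structs are constructed in place
}

// Usage is identical either way:
//   auto c = make!(SomeContainer!int)();   // hypothetical container type
```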
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 12:06:18 -0400, Robert Jacques wrote: On Mon, 24 May 2010 08:06:29 -0400, Steven Schveighoffer wrote: On Thu, 20 May 2010 12:46:59 -0400, Robert Jacques wrote: On Thu, 20 May 2010 06:34:42 -0400, Steven Schveighoffer wrote: [snip] I understand these points, but I'm already using interfaces to copy data between containers. I don't have to; I could have used generic code, but this way, only one function is instantiated to copy data from all the other containers. The problem with using generic code is that the compiler will needlessly duplicate functions that are identical. This sounds like a failure of design. Why aren't you using ranges to do this? Why are ranges necessarily better? I'm using the container's opApply, which I'd probably still use even if there were no interfaces. opApply allows many more possibilities in traversal than ranges, which cannot use stack recursion without heap activity. Ranges are not necessarily better and may have some minor amount of overhead over a well optimized opApply. Then again, the opposite may be true. The point is that if the difference between opApply and a range is more than trivial, you've probably had a failure of design occur. The difference is trivial. I support both ranges and opApply. The main benefit of using opApply is that only one function is compiled by the compiler. Using interfaces is not as viral as you think. My interfaces can be used in generic code, as long as the generic code uses functions in the interfaces. If a library returns an interface, the author is saying "I don't want you using any functions outside this interface," so why is that a bad thing? Well, needlessly duplicated functions for one. :) More importantly, the example I gave was about third party libraries which I have no control over. So this solution explicitly doesn't work. And even if everyone used templates everywhere in order to be compatible with both interfaces and classes, isn't that a viral effect of having both? 
If a 3rd party library uses interfaces, it's probably for good reason. They most likely want to remain binary compatible with other libs, and/or want to abstract the implementation of some custom container type. If you don't like their requirements, don't use the library. -Steve No, if a 3rd party library _needs_ to use interfaces it's probably for a good reason. The problem is, if they exist, people are going to use them even if they don't need them. And therein lies the problem. And if 3rd party X _needs_ to use interfaces, and 3rd party Y _needs_ to use interfaces, and your code depends on X and Y, and interfaces aren't defined by dcollections, where are you then? I don't think there's any way you can define a generic standard library such that people won't create bad designs, that's a lost cause. I think part of the problem with all this is that interfaces aren't likely to be needed any time soon (D's dynamic lib support is almost non-existent), so I'm probably going to drop the interfaces for now in the interest of progress. -Steve
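Steven's stack-recursion point about opApply can be illustrated with a hypothetical binary tree: the traversal state lives on the call stack, while an equivalent range would need explicit, and typically heap-allocated, parent tracking:

```d
// Hypothetical binary tree, illustrative only.
struct TreeNode {
    int v;
    TreeNode* left, right;
}

struct Tree {
    TreeNode* root;

    // In-order traversal; recursion state lives on the call stack,
    // so no heap allocation is needed for the walk itself.
    int opApply(scope int delegate(ref int) dg) {
        int walk(TreeNode* n) {
            if (n is null) return 0;
            if (auto r = walk(n.left)) return r;
            if (auto r = dg(n.v)) return r;
            return walk(n.right);
        }
        return walk(root);
    }
}
```

A forward range over the same tree must suspend and resume the traversal between popFront calls, which forces it to carry an explicit stack of not-yet-visited nodes.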
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 11:45:51 -0400, Andrei Alexandrescu wrote: On 05/24/2010 10:01 AM, Steven Schveighoffer wrote: On Fri, 21 May 2010 13:42:14 -0400, Andrei Alexandrescu wrote: On 05/19/2010 08:42 PM, Steven Schveighoffer wrote: Andrei Alexandrescu Wrote: I.e. there aren't many kinds of HashMaps that derive from each other. But the interfaces are not detrimental to your ideas. The only thing interfaces require is that the entities implementing them are classes and not structs. As long as you agree that classes are the right call, then interfaces can co-exist with your other suggestions without interference. This brings back a discussion I had with Walter a while ago, with echoes in the newsgroup. Basically the conclusion was as follows: if a container never escapes the addresses of its elements, it can manage its own storage. That forces, however, the container to be a struct because copying references to a class container would break that encapsulation. I called those "perfectly encapsulated containers" and I think they are good candidates for manual memory management because they tend to deal in relatively large chunks. I noticed that your collections return things by value, so they are good candidates for perfect encapsulation. Then you need reference counting; I don't think that is a good design. Why? I think refcounting for containers is perfect. Refcounting small objects is bad because the counter overhead is large. Also, small objects tend to be created and passed around a lot, so the time overhead is significant. In contrast, refcounting and containers are a perfect marriage. The container is a relatively large conglomerate of objects so refcounting that is bound to yield good benefits in terms of memory reclamation. I meant refcounting elements. If you are simply refcounting the container, what do you do when someone wants to remove a node? Not allow it if more than one reference to the container exists? 
Java has some warts as you have rightfully pointed out in the past (i.e. O(n) lookup), but I have attempted to remove all those warts. Dcollections I would hope does not suffer from the problems that Java's collections suffer from. That's great. But let me quote what you said: "I myself don't really use the interface aspect of the classes, it is mostly a carryover from the Java/Tango inspirations." I took that to mean you don't have a strong justification for structuring dcollections as a hierarchy, and furthermore that makes me hope it's possible you'd be willing to yank that dinosaur brain. I did not have a strong justification for using interfaces originally, and my first interface design shows that. But I think with careful design, the interfaces in the D2 version are pretty useful in some situations, all of which could be essential to a project's design. In other words, I don't see the fact that Java or Tango's original collection package had interfaces as a "wart". The problem I see right now is, a lot of that utility is moot since D is not very good at dynamic linking. Since you necessarily have to statically link Phobos and dcollections, it makes no sense to keep binary compatibility, or hide implementation. Looks like we're heading straight to stalemate once again. In the interest of time, it would be better to get to stalemate (or, hopefully, agreement) so we know whether dcollections will be integrated within Phobos or whether I should spend this and next weeks' evenings coding. To that end, please let me know whether it's worth that I spend time on putting together a list of proposed changes. I think if we can keep dcollections as classes, then the opportunity to have interfaces still exists for the future. If that's the case, we can ditch the interfaces for now, and revisit it when D's dynamic lib support gets better. So let's move on to other issues. I haven't changed my mind on interface utility, but there seems to be not much support for the idea. 
-Steve
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 11:14 AM, Steven Schveighoffer wrote: On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu wrote: On 05/24/2010 10:23 AM, Steven Schveighoffer wrote: On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu wrote: On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't possible, Do you agree that's an awkwardness? Yes, at that point it's an optimization. The only place where I assume length could be invalid is when working with the Iterator!V type, which might not be a dcollections object. That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have interfaces, the door is ajar forever. Client code can't write this: auto total = cont1.length + cont2.length; because that is incorrect code (that compiles, runs, and produces some odd result). So don't take it lightly. NO_LENGTH_SUPPORT means no length support. Then pretending it's supported will only harm everyone. I get what you are saying. What I meant was it's moot if you are not using interfaces. If you don't know what the underlying type is, then you have to do a runtime check. I agree it's awkward, and I could have not included length as a member of Iterator, in which case it would be on all the container types and NO_LENGTH_SUPPORT would not exist. All containers are meant to have a valid length, it is only with non-container Iterators that length can be NO_LENGTH_SUPPORT. However, supporting non-length containers via generic programming vs. interfaces would be much smoother. In that case, you can do coll.begin.empty to determine if the collection has any elements. Then why not make length optional and define empty? From the above it looks like an obviously better design. But all dcollections are bi-directional anyways, there is no singly linked list. That was a decision I made early on, because it allows more assumptions about dcollections' containers. 
I think length-less singly linked lists would be a good addition to Phobos outside the collection hierarchy; they are almost on par with builtin arrays in being so simple. Do you see dcollections' inability to express slists as a problem? I don't see them as unable to express slists. They would break the design guidelines that I set for collections being iterable in both directions, but an slist fits fine within the List interface. What's the required complexity of concat? O(n) with n being the number of elements added Is add expected to put things in a specific place? How does slist implement back()? slist does not fit the List interface. A pointer to the end element would be required for both appending and back(). This further erodes my confidence. Size needs to be maintained _and_ a pointer to the last element must be maintained. For many uses, all of this stuff is unnecessary overhead. I think we need to build the shared vision that in Phobos we're competing against hand-written, highly-optimized code. This is the fight STL took and largely won. We can't afford to lower our standards for one tiny bit. I was once talking over a beer about the advantages of container abstractions. Walter slapped my face by defining the fastest SLL in the world on a napkin. It looked like this: struct List { int v; List * next; } He took so little time to define that, and the primitives he needed were so simple and fast, that it was an absolute overkill to replace that with a generic whiz-bang container. If I'd mentioned there'd be any extra data involved, that would mean the end of negotiation. "Why do you need that?" "For length()!" "But I don't need length()!" "Well you have to." "That's Communism in design!" And I agree. Andrei
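With generic code rather than interfaces, "does this container support O(1) length?" becomes a compile-time question, so no NO_LENGTH_SUPPORT sentinel can leak into arithmetic like cont1.length + cont2.length. A sketch with hypothetical helper names:

```d
// True if C exposes a length convertible to size_t (hypothetical trait).
enum hasLength(C) = is(typeof(C.init.length) : size_t);

size_t countElements(C)(C c) {
    static if (hasLength!C)
        return c.length;        // O(1) when the container supports it
    else {
        size_t n;
        foreach (e; c) ++n;     // otherwise walk the elements, O(n)
        return n;
    }
}
```

Code that requires O(1) length simply fails to compile for a length-less list, instead of silently producing odd results at runtime.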
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 11:51:39 -0400, Andrei Alexandrescu wrote: On 05/24/2010 10:39 AM, Steven Schveighoffer wrote: On Mon, 24 May 2010 11:21:20 -0400, Andrei Alexandrescu wrote: On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: I am not familiar with tries, Missed that upon the first read. I suggest you look at tries and the following other structures as good examples that it's futile to fit collections into hierarchies. http://en.wikipedia.org/wiki/Trie http://linux.thai.net/~thep/datrie/datrie.html http://en.wikipedia.org/wiki/Suffix_tree http://en.wikipedia.org/wiki/Kd-tree We'd want to implement in time those and many more in Phobos without worrying that some of their primitives won't fit the existing interfaces, and also without the unjustified effort of formalizing interfaces for each of them in thinking that another very, very similar container will come along. From a cursory look, I don't see why tries would not be possible in dcollections. I'd probably start with something like this: class TrieMap(K, V) : Map!(K[], V) The array brackets enforce the ability to use prefixes on the keys. The key point of tries is that they have distributed storage. Thus, shoehorning them into the interface function Iterator!(K[]) keys(); forces allocation and copying. I wasn't thinking of that, I was thinking the key point of tries being more efficient at lookup for strings of things. One solution is that Trie could be a separate interface (i.e. a sibling of Map). One thing to point out is that D's arrays are adept at appending and prefixing. A K[] key would not necessarily be reallocating for each node traversal, but it certainly would be copying data. How would you define a way to get all the keys of a Trie? Or would you simply leave that function off? Is it unreasonable to expect to be able to iterate the keys in a trie? (I don't really know, I've never worked with them) -Steve
Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)
On 05/24/2010 10:58 AM, Steven Schveighoffer wrote: On Sun, 23 May 2010 17:36:52 -0400, Andrei Alexandrescu wrote: I've thought for a very long time about the class vs. struct choice in a container, and I came to a startling conclusion: it (almost) doesn't matter. Could be either, and the tradeoffs involved are nonessential. Here they are: 1. Using a class makes implementing members easier because there's no need to do work through an additional member. With a struct, you need a pimpl approach. For example: struct Array { struct Impl { ... } Impl * impl; ... @property length() const { return impl->length; } ... } 2. struct gives you more power in managing collection's own memory, as long as the collection doesn't escape item addresses or other addresses of internal handles. So all other things being equal, struct has a net advantage. Yes, but most containers are node-based, so as long as you use structs for nodes, you are generally in control of the bulk of allocation (dcollections does this). I'm saying, the net advantage of struct is that it can safely and deterministically release memory in its destructor. 3. The creation syntaxes are different. For Phobos, I suggest adding a simple function make() to std.algorithm. make!T(a, b, c) returns a newly constructed object T by invoking the constructor with arguments a, b, and c. That way we can make client code virtually agnostic of the class/struct choice for a container. Classes have builtin object monitors. But you could handle that via a struct as well. The language support wouldn't be there though. Oh, and the monitor isn't needed so that's an unused word there. I'd forgotten about it. That's it. Otherwise, one could use either to build a container. Let me note that I have reached the conclusion that containers should be at best reference types, with a meta-constructor Value!(C) that takes a container C and makes it into a value type. The thing that classes really give you is the language support. 
Structs need to be hand-crafted to support the same syntax. Classes are enforced as always being reference types and always being on the heap. I'd agreed classes are more convenient because they make the implementation straightforward. The Value!(C) is questionable because creating the head of a container on the stack leads to easily escaped stack references. I don't understand this. Could you give an example? But yeah, a struct as a 'smart pointer' could work, as long as you don't 'auto-create' like AA's do. It seems Steve took the weekend off. My son's 2nd birthday party was Saturday, and I had visitors, sorry :) I did some responses, but at some point if you want to remain married, you have to pay attention to your family. Congrats. And yes, wise words :o). Andrei
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 11:45:44 -0400, Andrei Alexandrescu wrote: On 05/24/2010 10:23 AM, Steven Schveighoffer wrote: On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu wrote: On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't possible, Do you agree that's an awkwardness? Yes, at that point it's an optimization. The only place where I assume length could be invalid is when working with the Iterator!V type, which might not be a dcollections object. That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have interfaces, the door is ajar forever. Client code can't write this: auto total = cont1.length + cont2.length; because that is incorrect code (that compiles, runs, and produces some odd result). So don't take it lightly. NO_LENGTH_SUPPORT means no length support. Then pretending it's supported will only harm everyone. I get what you are saying. What I meant was it's moot if you are not using interfaces. If you don't know what the underlying type is, then you have to do a runtime check. I agree it's awkward, and I could have not included length as a member of Iterator, in which case it would be on all the container types and NO_LENGTH_SUPPORT would not exist. All containers are meant to have a valid length, it is only with non-container Iterators that length can be NO_LENGTH_SUPPORT. However, supporting non-length containers via generic programming vs. interfaces would be much smoother. In that case, you can do coll.begin.empty to determine if the collection has any elements. Then why not make length optional and define empty? From the above it looks like an obviously better design. But all dcollections are bi-directional anyways, there is no singly linked list. That was a decision I made early on, because it allows more assumptions about dcollections' containers. 
I think length-less singly linked lists would be a good addition to phobos that are not part of the collection hierarchy, they are almost on par with builtin arrays as being so simple. Do you see dcollections' inability to express slists as a problem? I don't see them as unable to express slists. They would break the design guidelines that I set for collections being iterable in both directions, but an slist fits fine within the List interface. What's the required complexity of concat? O(n) with n being the number of elements added Is add expected to put things in a specific place? How does slist implement back()? slist does not fit the List interface. A pointer to the end element would be required for both appending and back(). -Steve
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 08:06:29 -0400, Steven Schveighoffer wrote: On Thu, 20 May 2010 12:46:59 -0400, Robert Jacques wrote: On Thu, 20 May 2010 06:34:42 -0400, Steven Schveighoffer wrote: [snip] I understand these points, but I'm already using interfaces to copy data between containers. I don't have to, I could have used generic code, but this way, only one function is instantiated to copy data from all the other containers. The problem with using generic code is that the compiler will needlessly duplicate functions that are identical. This sounds like a failure of design. Why aren't you using ranges to do this? Why are ranges necessarily better? I'm using the container's opApply, which I'd probably still use even if there were no interfaces. opApply allows many more possibilities in traversal than ranges, which cannot use stack recursion without heap activity. Ranges are not necessarily better and may have some minor amount of overhead over a well optimized opApply. Then again, the opposite may be true. The point is that if the difference between opApply and a range is more than trivial, you've probably had a failure of design occur. Honestly, I'm having trouble thinking of a container which allows stack recursion for traversal and doesn't have an efficient range/loop variant. For that matter, a range using heap activity is also a clear indication of a design failure. Using interfaces is not as viral as you think. My interfaces can be used in generic code, as long as the generic code uses functions in the interfaces. If a library returns an interface, the author is saying "I don't want you using any functions outside this interface," so why is that a bad thing? Well, needlessly duplicated functions for one. :) More importantly, the example I gave was about third party libraries which I have no control over. So this solution explicitly doesn't work. 
And even if everyone used templates everywhere in order to be compatible with both interfaces and classes, isn't that a viral effect of having both? If a 3rd party library uses interfaces, it's probably for good reason. They most likely want to remain binary compatible with other libs, and/or want to abstract the implementation of some custom container type. If you don't like their requirements, don't use the library. -Steve No, if a 3rd party library _needs_ to use interfaces it's probably for a good reason. The problem is, if they exist, people are going to use them even if they don't need them. And therein lies the problem.
Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)
On Sun, 23 May 2010 17:36:52 -0400, Andrei Alexandrescu wrote: I've thought for a very long time about the class vs. struct choice in a container, and I came to a startling conclusion: it (almost) doesn't matter. Could be either, and the tradeoffs involved are nonessential. Here they are: 1. Using a class makes implementing members easier because there's no need to do work through an additional member. With a struct, you need a pimpl approach. For example: struct Array { struct Impl { ... } Impl * impl; ... @property length() const { return impl->length; } ... } 2. struct gives you more power in managing collection's own memory, as long as the collection doesn't escape item addresses or other addresses of internal handles. So all other things being equal, struct has a net advantage. Yes, but most containers are node-based, so as long as you use structs for nodes, you are generally in control of the bulk of allocation (dcollections does this). 3. The creation syntaxes are different. For Phobos, I suggest adding a simple function make() to std.algorithm. make!T(a, b, c) returns a newly constructed object T by invoking the constructor with arguments a, b, and c. That way we can make client code virtually agnostic of the class/struct choice for a container. Classes have builtin object monitors. But you could handle that via a struct as well. The language support wouldn't be there though. That's it. Otherwise, one could use either to build a container. Let me note that I have reached the conclusion that containers should be at best reference types, with a meta-constructor Value!(C) that takes a container C and makes it into a value type. The thing that classes really give you is the language support. Structs need to be hand-crafted to support the same syntax. Classes are enforced as always being reference types and always being on the heap. The Value!(C) is questionable because creating the head of a container on the stack leads to easily escaped stack references. 
But yeah, a struct as a 'smart pointer' could work, as long as you don't 'auto-create' like AA's do. It seems Steve took the weekend off. My son's 2nd birthday party was Saturday, and I had visitors, sorry :) I did some responses, but at some point if you want to remain married, you have to pay attention to your family. -Steve
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 10:39 AM, Steven Schveighoffer wrote: On Mon, 24 May 2010 11:21:20 -0400, Andrei Alexandrescu wrote: On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: I am not familiar with tries, Missed that upon the first read. I suggest you look at tries and the following other structures as good examples that it's futile to fit collections into hierarchies. http://en.wikipedia.org/wiki/Trie http://linux.thai.net/~thep/datrie/datrie.html http://en.wikipedia.org/wiki/Suffix_tree http://en.wikipedia.org/wiki/Kd-tree We'd want to implement in time those and many more in Phobos without worrying that some of their primitives won't fit the existing interfaces, and also without the unjustified effort of formalizing interfaces for each of them in thinking that another very, very similar container will come along. From a cursory look, I don't see why tries would not be possible in dcollections. I'd probably start with something like this: class TrieMap(K, V) : Map!(K[], V) The array brackets enforce the ability to use prefixes on the keys. The key point of tries is that they have distributed storage. Thus, shoehorning them into the interface function Iterator!(K[]) keys(); forces allocation and copying. Andrei
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 10:01 AM, Steven Schveighoffer wrote: On Fri, 21 May 2010 13:42:14 -0400, Andrei Alexandrescu wrote: On 05/19/2010 08:42 PM, Steven Schveighoffer wrote: Andrei Alexandrescu Wrote: I.e. there aren't many kinds of HashMaps that derive from each other. But the interfaces are not detrimental to your ideas. The only thing interfaces require is that the entities implementing them are classes and not structs. As long as you agree that classes are the right call, then interfaces can co-exist with your other suggestions without interference. This brings back a discussion I had with Walter a while ago, with echoes in the newsgroup. Basically the conclusion was as follows: if a container never escapes the addresses of its elements, it can manage its own storage. That forces, however, the container to be a struct because copying references to a class container would break that encapsulation. I called those "perfectly encapsulated containers" and I think they are good candidates for manual memory management because they tend to deal in relatively large chunks. I noticed that your collections return things by value, so they are good candidates for perfect encapsulation. Then you need reference counting, I don't think that is a good design. Why? I think refcounting for containers is perfect. Refcounting small objects is bad because the counter overhead is large. Also, small objects tend to be created and passed around a lot, so the time overhead is significant. In contrast, refcounting and containers are a perfect marriage. The container is a relatively large conglomerate of objects so refcounting that is bound to yield good benefits in terms of memory reclamation. Yes, if you want to define "this function needs something that is both addable and purgeable," I don't have an interface for that. But a concept can certainly define that generically (which is what you want anyways), or you could just say "I need a List" and get those functions also. 
It also does not force entities other than dcollections objects to be classes, they could be structs and implement the correct concepts. I myself don't really use the interface aspect of the classes, it is mostly a carryover from the Java/Tango inspirations. I don't know Tango, but Java's containers are a terrible example to follow. Java's container library is an ill-advised design on top of an underpowered language, patched later with some half-understood seeming of genericity. I think Java containers are a huge disservice to the programming community because they foster bad design. Java has some warts as you have rightfully pointed out in the past (i.e. O(n) lookup), but I have attempted to remove all those warts. Dcollections I would hope does not suffer from the problems that Java's collections suffer from. That's great. But let me quote what you said: "I myself don't really use the interface aspect of the classes, it is mostly a carryover from the Java/Tango inspirations." I took that to mean you don't have a strong justification for structuring dcollections as a hierarchy, and furthermore that makes me hope it's possible you'd be willing to yank that dinosaur brain. But I can see one good reason to keep them -- binary interoperability. For example, it might be the case some day when D has good support for dynamic libraries that a library exposes some piece of itself as a Map or List interface. I need to disagree with that. I've done and I do a ton of binary interoperability stuff. You never expose a generic container interface! Interoperable objects always embody high-level logic that is specific to the application. They might use containers inside, but they invariably expose high-level, application-specific functionality. It's done all the time in Java and .NET. For example, a GUI listbox widget exposes its elements as an array of elements, which implement the List interface. You don't ever see the implementation or need it. 
Granted Java and .NET have fewer problems than C++ and D with binary compatibility, since the function tables are dynamic, but the potential is there for D to make binary compatibility possible with interfaces. I see. In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to look up services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 10:23 AM, Steven Schveighoffer wrote: On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu wrote: On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't possible, Do you agree that's an awkwardness? Yes, at that point it's an optimization. The only place where I assume length could be invalid is when working with the Iterator!V type, which might not be a dcollections object. That doesn't matter. Since you define NO_LENGTH_SUPPORT and you have interfaces, the door is ajar forever. Client code can't write this: auto total = cont1.length + cont2.length; because that is incorrect code (that compiles, runs, and produces some odd result). So don't take it lightly. NO_LENGTH_SUPPORT means no length support. Then pretending it's supported will only harm everyone. but all dcollections define length. I was hoping we're on the verge of agreeing to yank all interfaces and let the collection roam free. If that's a possibility, then your design isn't constrained to define length anymore unless it can. I agree, removing all the interfaces would get rid of the need for NO_LENGTH_SUPPORT. But all dcollections define a valid length, so that point is kinda moot :) Not moot because you have interfaces. However, supporting non-length containers via generic programming vs. interfaces would be much smoother. In that case, you can do coll.begin.empty to determine if the collection has any elements. Then why not make length optional and define empty? From the above it looks like an obviously better design. But all dcollections are bi-directional anyways, there is no singly linked list. That was a decision I made early on, because it allows more assumptions about dcollections' containers. I think length-less singly linked lists would be a good addition to phobos that are not part of the collection hierarchy, they are almost on par with builtin arrays as being so simple. 
Do you see dcollections' inability to express slists as a problem? I don't see them as unable to express slists. They would break the design guidelines that I set for collections being iterable in both directions, but an slist fits fine within the List interface. What's the required complexity of concat? Is add expected to put things in a specific place? How does slist implement back()? slist does not fit the List interface. And singly linked vs. doubly linked does not make any difference whether O(1) length is possible or not. As you say, it's O(1) splicing or O(1) length, regardless of single or double links. I disagree with your assessment that length is a less used operation than splicing. I think I have never used splicing with std::list. C++'s default is O(1) length, and I followed that design. I didn't assess that. I felt like you did. Your statement was: "It works against splicing (an important list primitive) and most of the time you don't need it so why waste time updating it." While you didn't specifically say it was used more, you mentioned the importance of splicing, and the non-importance of length. I guess I should have said, I disagree that splicing is more important than caching length. My main point was that if one defines an slist, in many cases one defines it to be very small, simple, and cheap. Maintaining size will come on the radar and would discourage the use of the abstraction in favor of a hand-rolled implementation. This is failure to abstract. All dcollections own their elements, it is not like an array, where there can be multiple references to subsets of the data. An slist would simply be an slist as you describe with a slightly bigger head. In other words, the head node has the length cache and allocator, and object monitor, and whatever else you need for the whole list. The nodes themselves would be a simple element and pointer. But the elements are never exposed except via ranges and cursors. 
The ranges and cursors cannot affect the topology of the list, you need the head "owner" node to do that. I understand. The problem is when you have many such lists or when you manipulate a few intensively. I look at it this way, dcollections are "good enough" for most uses, if you need highly tailored data structures with specific uses in mind, you can make those on your own. For example, dcollections' LinkList can easily take the place of a simple slist. It may not be as fast with your specific requirements in mind, but it works. The benefit is, it works like all the other collection types and it's easy to learn all of them together because they all have certain properties. That's what STL said about slists. Next thing you know... forward_list is being standardized. Andrei
Re: dcollections 1.0 and 2.0a beta released
On Fri, 21 May 2010 22:56:54 -0400, Vladimir Panteleev wrote: On Thu, 20 May 2010 04:42:35 +0300, Steven Schveighoffer wrote: interfaces Does that imply that the most important methods are virtual? If so, say good-bye to inlining, and hello to an additional level of dereferencing. Without meaning any disrespect to all the work you did, allow me to say that I won't use a library that could be faster (without making usage too clumsy), but isn't. (I prefer my D programs to be as fast as reasonably possible - if I didn't care about speed, I'd use another language.) For the same reasons, I'd be disappointed if the library was admitted as-is into Phobos, since it doesn't live up to my personal ideal of what D should be. With the future update making all the classes final, the methods are not virtual as long as you don't use the interfaces. -Steve
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 11:21:20 -0400, Andrei Alexandrescu wrote: On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: I am not familiar with tries, Missed that upon the first read. I suggest you look at tries and the following other structures as good examples that it's futile to fit collections into hierarchies. http://en.wikipedia.org/wiki/Trie http://linux.thai.net/~thep/datrie/datrie.html http://en.wikipedia.org/wiki/Suffix_tree http://en.wikipedia.org/wiki/Kd-tree We'd want to implement in time those and many more in Phobos without worrying that some of their primitives won't fit the existing interfaces, and also without the unjustified effort of formalizing interfaces for each of them in thinking that another very, very similar container will come along. From a cursory look, I don't see why tries would not be possible in dcollections. I'd probably start with something like this: class TrieMap(K, V) : Map!(K[], V) The array brackets enforce the ability to use prefixes on the keys. -Steve
Re: dcollections 1.0 and 2.0a beta released
On Mon, 24 May 2010 10:10:51 -0400, Andrei Alexandrescu wrote: On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't possible, Do you agree that's an awkwardness? Yes, at that point it's an optimization. The only place where I assume length could be invalid is when working with the Iterator!V type, which might not be a dcollections object. but all dcollections define length. I was hoping we're on the verge of agreeing to yank all interfaces and let the collection roam free. If that's a possibility, then your design isn't constrained to define length anymore unless it can. I agree, removing all the interfaces would get rid of the need for NO_LENGTH_SUPPORT. But all dcollections define a valid length, so that point is kinda moot :) However, supporting non-length containers via generic programming vs. interfaces would be much smoother. In that case, you can do coll.begin.empty to determine if the collection has any elements. Then why not make length optional and define empty? From the above it looks like an obviously better design. But all dcollections are bi-directional anyways, there is no singly linked list. That was a decision I made early on, because it allows more assumptions about dcollections' containers. I think length-less singly linked lists would be a good addition to phobos that are not part of the collection hierarchy, they are almost on par with builtin arrays as being so simple. Do you see dcollections' inability to express slists as a problem? I don't see them as unable to express slists. They would break the design guidelines that I set for collections being iterable in both directions, but an slist fits fine within the List interface. And singly linked vs. doubly linked does not make any difference whether O(1) length is possible or not. As you say, it's O(1) splicing or O(1) length, regardless of single or double links. 
I disagree with your assessment that length is a less used operation than splicing. I think I have never used splicing with std::list. C++'s default is O(1) length, and I followed that design. I didn't assess that. I felt like you did. Your statement was: "It works against splicing (an important list primitive) and most of the time you don't need it so why waste time updating it." While you didn't specifically say it was used more, you mentioned the importance of splicing, and the non-importance of length. I guess I should have said, I disagree that splicing is more important than caching length. My main point was that if one defines an slist, in many cases one defines it to be very small, simple, and cheap. Maintaining size will come on the radar and would discourage the use of the abstraction in favor of a hand-rolled implementation. This is failure to abstract. All dcollections own their elements, it is not like an array, where there can be multiple references to subsets of the data. An slist would simply be an slist as you describe with a slightly bigger head. In other words, the head node has the length cache and allocator, and object monitor, and whatever else you need for the whole list. The nodes themselves would be a simple element and pointer. But the elements are never exposed except via ranges and cursors. The ranges and cursors cannot affect the topology of the list, you need the head "owner" node to do that. I look at it this way, dcollections are "good enough" for most uses, if you need highly tailored data structures with specific uses in mind, you can make those on your own. For example, dcollections' LinkList can easily take the place of a simple slist. It may not be as fast with your specific requirements in mind, but it works. The benefit is, it works like all the other collection types and it's easy to learn all of them together because they all have certain properties. 
I don't know of a word for "hierarchy with interfaces as root and inner nodes and classes as leaf nodes" so I just call that class hierarchy. I view them as classes with interfaces tacked on because the implementations are similar :) -Steve
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: I am not familiar with tries, Missed that upon the first read. I suggest you look at tries and the following other structures as good examples that it's futile to fit collections into hierarchies. http://en.wikipedia.org/wiki/Trie http://linux.thai.net/~thep/datrie/datrie.html http://en.wikipedia.org/wiki/Suffix_tree http://en.wikipedia.org/wiki/Kd-tree We'd want to implement in time those and many more in Phobos without worrying that some of their primitives won't fit the existing interfaces, and also without the unjustified effort of formalizing interfaces for each of them in thinking that another very, very similar container will come along. Andrei
Re: dcollections 1.0 and 2.0a beta released
On Fri, 21 May 2010 13:42:14 -0400, Andrei Alexandrescu wrote: On 05/19/2010 08:42 PM, Steven Schveighoffer wrote: Andrei Alexandrescu Wrote: I.e. there aren't many kinds of HashMaps that derive from each other. But the interfaces are not detrimental to your ideas. The only thing interfaces require is that the entities implementing them are classes and not structs. As long as you agree that classes are the right call, then interfaces can co-exist with your other suggestions without interference. This brings back a discussion I had with Walter a while ago, with echoes in the newsgroup. Basically the conclusion was as follows: if a container never escapes the addresses of its elements, it can manage its own storage. That forces, however, the container to be a struct because copying references to a class container would break that encapsulation. I called those "perfectly encapsulated containers" and I think they are good candidates for manual memory management because they tend to deal in relatively large chunks. I noticed that your collections return things by value, so they are good candidates for perfect encapsulation. Then you need reference counting, I don't think that is a good design. Yes, if you want to define "this function needs something that is both addable and purgeable," I don't have an interface for that. But a concept can certainly define that generically (which is what you want anyways), or you could just say "I need a List" and get those functions also. It also does not force entities other than dcollections objects to be classes, they could be structs and implement the correct concepts. I myself don't really use the interface aspect of the classes, it is mostly a carryover from the Java/Tango inspirations. I don't know Tango, but Java's containers are a terrible example to follow. Java's container library is an ill-advised design on top of an underpowered language, patched later with some half-understood seeming of genericity. 
I think Java containers are a huge disservice to the programming community because they foster bad design. Java has some warts as you have rightfully pointed out in the past (e.g. O(n) lookup), but I have attempted to remove all those warts. Dcollections, I would hope, does not suffer from the problems that Java's collections suffer from. But I can see one good reason to keep them -- binary interoperability. For example, it might be the case some day, when D has good support for dynamic libraries, that a library exposes some piece of itself as a Map or List interface. I need to disagree with that. I've done and I do a ton of binary interoperability stuff. You never expose a generic container interface! Interoperable objects always embody high-level logic that is specific to the application. They might use containers inside, but they invariably expose high-level, application-specific functionality. It's done all the time in Java and .NET. For example, a GUI listbox widget exposes its elements as an array of elements, which implement the List interface. You don't ever see the implementation or need it. Granted, Java and .NET have fewer problems than C++ and D with binary compatibility, since the function tables are dynamic, but the potential is there for D to make binary compatibility possible with interfaces. In the past I have built a C++ library that abstracted features of the OS. My goal was to make it possible to dynamically load a module that abstracted things like setting the IP address of a network interface. My modules used std::string instead of char * to look up services to get objects that implement the interface. Big mistake. On a later version of the standard C++ runtime, the private implementation of std::string changed, so the dynamically loaded libraries crashed horribly. No change in string's interface, just the private stuff changed, but because it's a template, the code that uses it necessarily has to be aware of it. 
We ended up ditching the standard C++ library's version of string, and used STLPort so we could control the library. I envision this same sort of problem would be likely with D collection objects that were not used via interfaces. So my answer is -- go ahead and define these concepts and required names, and you can ignore the interfaces if they don't interest you. They do not subtract from the possibilities, and others may find good use for them. Does that make sense? I understand I could ignore the interfaces and call it a day, but it seems that at this point we are both convinced they are not quite good at anything: you only put them in because you suffered the Stockholm syndrome with Java, and I hate them with a passion. The reason I put them in is because they existed before, but thinking about what they would be useful for, D doesn't really support dynamic libraries all that well (recent advances have
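The binary-interoperability argument above (a dynamic library exposing a piece of itself only through a container interface) can be sketched in D. This is a minimal illustration, not dcollections' actual API: `List`, `LinkList`, and `fill` are hypothetical names, and the point is only that client code compiled against the interface never depends on the implementation's layout.

```d
// Hypothetical sketch: a library exposes only this interface, so the
// container implementation behind it can change (or live in a dynamic
// library) without recompiling callers.
interface List(V)
{
    bool add(V v);    // returns true if the element was added
    size_t length();  // number of elements
}

// One possible implementation; clients never see this type.
class LinkList(V) : List!V
{
    private V[] store;  // toy backing store for the sketch
    bool add(V v) { store ~= v; return true; }
    size_t length() { return store.length; }
}

// Client code compiled against the interface only.
size_t fill(List!int l)
{
    foreach (i; 0 .. 5)
        l.add(i);
    return l.length();
}

void main()
{
    List!int l = new LinkList!int;
    assert(fill(l) == 5);
}
```

Unlike the std::string case described above, nothing about `LinkList`'s private fields leaks into `fill`; calls go through the interface's vtable.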
Re: dcollections 1.0 and 2.0a beta released
On 05/24/2010 06:54 AM, Steven Schveighoffer wrote: length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't possible, Do you agree that's an awkwardness? but all dcollections define length. I was hoping we're on the verge of agreeing to yank all interfaces and let the collection roam free. If that's a possibility, then your design isn't constrained to define length anymore unless it can. In that case, you can do coll.begin.empty to determine if the collection has any elements. Then why not make length optional and define empty? From the above it looks like an obviously better design. But all dcollections are bi-directional anyways, there is no singly linked list. That was a decision I made early on, because it allows more assumptions about dcollections' containers. I think length-less singly linked lists would be a good addition to phobos that are not part of the collection hierarchy, they are almost on par with builtin arrays as being so simple. Do you see dcollections' inability to express slists as a problem? And singly linked vs. doubly linked does not make any difference whether O(1) length is possible or not. As you say, it's O(1) splicing or O(1) length, regardless of single or double links. I disagree with your assessment that length is a less used operation than splicing. I think I have never used splicing with std::list. C++'s default is O(1) length, and I followed that design. I didn't assess that. My main point was that if one defines an slist, in many cases one defines it to be very small, simple, and cheap. Maintaining size will come on the radar and would discourage the use of the abstraction in favor of a hand-rolled implementation. This is a failure to abstract. OTish: What are your thoughts on a bimap implementation and a child/sibling or general tree implementation as part of dcollections? I haven't the slightest what a bimap is :) I am not an expert in collections or data structures, I just reimplement things I have understood. 
My implementations are basically copied from my algorithm book, and refined as much as I can do. That being said, if you have any implementation of a tree or hash, it should be easy to insert into dcollections. If you have ideas for other collection types (i.e. other than Map, Set, Multiset or List), then I can look into that if you point me at an implementation or have one of your own. I purposefully left out multi-map because I've never had a huge use for it, and it seemed like an awkward thing to create an interface for... Tries of various kinds come up again and again. I don't think dcollections' current abstractions capture them, which further supports my point that containers have too much personality to be caught in the straitjacket of class hierarchies. I am not familiar with tries, and dcollections has no class hierarchy. I don't know of a word for "hierarchy with interfaces as root and inner nodes and classes as leaf nodes" so I just call that a class hierarchy. Andrei
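The NO_LENGTH_SUPPORT convention and the O(1) emptiness check discussed above can be sketched like this. The names (`NO_LENGTH_SUPPORT`, `SList`, `insertFront`) are illustrative, not dcollections' real identifiers; the point is that a list without a cached count can still answer "am I empty?" in O(1):

```d
// Sketch of the convention under discussion: length may report
// "not supported", while emptiness is always answerable in O(1).
enum size_t NO_LENGTH_SUPPORT = size_t.max;

struct SList(T)
{
    private static struct Node { T payload; Node* next; }
    private Node* head;

    // No cached count: length would be O(n), so report no support.
    size_t length() const { return NO_LENGTH_SUPPORT; }

    // Emptiness is still O(1), with or without a count.
    bool empty() const { return head is null; }

    void insertFront(T v) { head = new Node(v, head); }
}

void main()
{
    SList!int s;
    assert(s.empty);
    s.insertFront(1);
    assert(!s.empty);
    assert(s.length == NO_LENGTH_SUPPORT);
}
```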
Re: dcollections 1.0 and 2.0a beta released
On Sat, 22 May 2010 07:56:31 -0400, Michel Fortin wrote: On 2010-05-21 22:55:16 -0400, Walter Bright said: Walter Bright wrote: If we can get anywhere close to that level of success with ranges and containers, we should all be well pleased. Mike Taylor has a phrase for that I think is well-coined: "impedance matching", defined as the work necessary to get one library module to work with another library module. This makes me think about something. In principle, I like the idea of containers being reference types. It works well when passing a container to functions. But at the same time, I despise it. By-reference containers force you to have extra indirections even when you don't need them, and you have to worry about null. Sometimes a value type would be better, when creating more complex data structures for instance: class Channel { private { Array!Message inbound; Array!Message outbound; } ... } What's the point of having extra indirection here? Scope class members should solve this. It's been thrown around forever -- essentially, you should be able to define that a class member's storage is contained within the owner object. Value-type node-based containers just don't work well -- it's too easy to escape references, and too easy to accidentally duplicate the entire node set when passing to functions. It works fine for arrays because the array has a very simple structure -- data and length. And the length and data are somewhat orthogonal. -Steve
Re: dcollections 1.0 and 2.0a beta released
On Fri, 21 May 2010 14:00:42 -0400, Andrei Alexandrescu wrote: On 05/20/2010 05:34 AM, Steven Schveighoffer wrote: I understand these points, but I'm already using interfaces to copy data between containers. I don't have to, I could have used generic code, but this way, only one function is instantiated to copy data from all the other containers. The problem with using generic code is that the compiler will needlessly duplicate functions that are identical. There is a copy() function that copies any range to any other range. It might need a revisit, but I think the way to go about copying is generic. This implies the space for the elements already exists in the target range. Copying data from a list to a set, for instance, can't just allocate space in the set and then use some generic copy function to copy the data in. If generic copying is to be used, it would be a copy function defined by the collection, not a standard one. Using interfaces is not as viral as you think. My interfaces can be used in generic code, as long as the generic code uses functions in the interfaces. If a library returns an interface, the author is saying "I don't want you using any functions outside this interface," so why is that a bad thing? Forcing people to *not* use interfaces has its drawbacks too. Dcollections gives the most flexible design I could muster, while still being useful. I'm not saying I'm against removing the interfaces until some later date, but I don't see any convincing arguments yet, especially since I've already seen benefits from having them. What are the benefits that you have noticed? I think you'd need to back up the copying argument with some data if you want to frame it as a benefit. When I say I see benefits, the benefits are not data-related but design-related -- more designs are possible with interfaces and generic programming combined than with generic programming alone. -Steve
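The "only one function is instantiated" point can be sketched as follows. These interfaces (`Addable`, `Iterable`) and the `copyAll` function are hypothetical stand-ins, not dcollections' real signatures; the idea is that one non-template body per element type serves every pair of container types:

```d
// With interfaces, a single function (per element type V) can copy
// between any two containers, instead of one template instantiation
// per (source type, destination type) pair.
interface Addable(V)  { void add(V v); }
interface Iterable(V) { int opApply(int delegate(ref V) dg); }

void copyAll(V)(Iterable!V src, Addable!V dst)
{
    foreach (ref v; src)
        dst.add(v);
}

// A toy container implementing both sides.
class Bag(V) : Addable!V, Iterable!V
{
    V[] items;
    void add(V v) { items ~= v; }
    int opApply(int delegate(ref V) dg)
    {
        foreach (ref v; items)
            if (auto r = dg(v)) return r;
        return 0;
    }
}

void main()
{
    auto a = new Bag!int, b = new Bag!int;
    a.add(1); a.add(2);
    copyAll!int(a, b);
    assert(b.items == [1, 2]);
}
```

A fully generic `copyAll!(SrcType, DstType)` would instead be instantiated once for every container pairing, even when the generated code is identical.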
Re: dcollections 1.0 and 2.0a beta released
On Thu, 20 May 2010 12:46:59 -0400, Robert Jacques wrote: On Thu, 20 May 2010 06:34:42 -0400, Steven Schveighoffer wrote: Robert Jacques Wrote: On Wed, 19 May 2010 21:42:35 -0400, Steven Schveighoffer > > Does that make sense? > > -Steve Yes and No. I understand where you're coming from, but I think it's a bad idea. First, I think it needlessly expands the radius of comprehension needed to understand and use the library. (See Tangled up in tools http://www.pragprog.com/magazines/2010-04/tangled-up-in-tools) Second, I think designing a library to be flexible enough to meet some future, anticipated need (e.g. dlls) is a good idea, but actually implementing vaporous future needs is fraught with peril; it's too easy to guess wrong. Third, interface-based design is viral; if library X uses interfaces then I have to use interfaces to interface with it. And if another library Y uses classes, then I'm going to have to write a (needless) wrapper around one of them. I understand these points, but I'm already using interfaces to copy data between containers. I don't have to, I could have used generic code, but this way, only one function is instantiated to copy data from all the other containers. The problem with using generic code is that the compiler will needlessly duplicate functions that are identical. This sounds like a failure of design. Why aren't you using ranges to do this? Why are ranges necessarily better? I'm using the container's opApply, which I'd probably still use even if there were no interfaces. opApply allows many more possibilities in traversal than ranges, which cannot use stack recursion without heap activity. Using interfaces is not as viral as you think. My interfaces can be used in generic code, as long as the generic code uses functions in the interfaces. If a library returns an interface, the author is saying "I don't want you using any functions outside this interface," so why is that a bad thing? Well, needlessly duplicated functions for one. 
:) More importantly, the example I gave was about third party libraries which I have no control over. So this solution explicitly doesn't work. And even if everyone used templates everywhere in order to be compatible with both interfaces and classes, isn't that a viral effect of having both? If a 3rd party library uses interfaces, it's probably for good reason. They most likely want to remain binary compatible with other libs, and/or want to abstract the implementation of some custom container type. If you don't like their requirements, don't use the library. -Steve
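The opApply-versus-ranges point above can be made concrete with a tree. A recursive `opApply` keeps its traversal state on the call stack, whereas a tree *range* must carry an explicit stack of its position, typically heap-allocated. The `TreeNode` type below is an illustrative sketch, not a dcollections container:

```d
// In-order tree traversal via opApply: the recursion state lives on
// the call stack, with no auxiliary heap allocation.
class TreeNode(T)
{
    T value;
    TreeNode left, right;

    this(T v, TreeNode l = null, TreeNode r = null)
    {
        value = v; left = l; right = r;
    }

    int opApply(int delegate(ref T) dg)
    {
        if (left)
            if (auto r = left.opApply(dg)) return r;
        if (auto r = dg(value)) return r;
        if (right)
            if (auto r = right.opApply(dg)) return r;
        return 0;
    }
}

void main()
{
    auto root = new TreeNode!int(2,
        new TreeNode!int(1), new TreeNode!int(3));
    int[] seen;
    foreach (v; root)   // foreach lowers to opApply
        seen ~= v;
    assert(seen == [1, 2, 3]);
}
```

A forward range over the same tree would need a `TreeNode[]` (or similar) stack inside the range struct to remember where `popFront` left off.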
Re: dcollections 1.0 and 2.0a beta released
On Sat, 22 May 2010 16:58:12 -0400, Andrei Alexandrescu wrote: On 05/19/2010 03:07 PM, Steven Schveighoffer wrote: Ellery Newcomer Wrote: Are the collections supposed to not have isEmpty members? No. Use length == 0. O(1) length is always supported for all collections. One thing before I forget: I think any good collection abstraction must be concretized back to the classic collection instances. Singly-linked lists definitely can't be left off the list! It would be an epic failure. Imagine the papers! New York Times: "D has containers, but no singly-linked lists". The New Yorker: "D's abstractions are too abstract". The National Enquirer: "The Bat Boy stole D's singly-linked lists". Pyongyang Times: "Another failure of the so-called Western Democracy -- yet Juche doesn't need singly-linked lists anyway." Keeping the length cached in a singly-linked list is a costly mistake. It works against splicing (an important list primitive) and most of the time you don't need it so why waste time updating it. Adding any cruft beyond { T payload; List* next; } is very strong incentive for the coder to write their own. Why do you think they're using an SLL in the first place? Because it's simple and has efficient primitives! length is allowed to return NO_LENGTH_SUPPORT if O(1) length isn't possible, but all dcollections define length. In that case, you can do coll.begin.empty to determine if the collection has any elements. But all dcollections are bi-directional anyways, there is no singly linked list. That was a decision I made early on, because it allows more assumptions about dcollections' containers. I think length-less singly linked lists would be a good addition to phobos that are not part of the collection hierarchy, they are almost on par with builtin arrays as being so simple. And singly linked vs. doubly linked does not make any difference whether O(1) length is possible or not. As you say, it's O(1) splicing or O(1) length, regardless of single or double links. 
I disagree with your assessment that length is a less used operation than splicing. I think I have never used splicing with std::list. C++'s default is O(1) length, and I followed that design. OTish: What are your thoughts on a bimap implementation and a child/sibling or general tree implementation as part of dcollections? I haven't the slightest what a bimap is :) I am not an expert in collections or data structures, I just reimplement things I have understood. My implementations are basically copied from my algorithm book, and refined as much as I can do. That being said, if you have any implementation of a tree or hash, it should be easy to insert into dcollections. If you have ideas for other collection types (i.e. other than Map, Set, Multiset or List), then I can look into that if you point me at an implementation or have one of your own. I purposefully left out multi-map because I've never had a huge use for it, and it seemed like an awkward thing to create an interface for... Tries of various kinds come up again and again. I don't think dcollections' current abstractions capture them, which further supports my point that containers have too much personality to be caught in the straitjacket of class hierarchies. I am not familiar with tries, and dcollections has no class hierarchy. -Steve
Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)
On 2010-05-23 23.36, Andrei Alexandrescu wrote: I've thought for a very long time about the class vs. struct choice in a container, and I came to a startling conclusion: it (almost) doesn't matter. Could be either, and the tradeoffs involved are nonessential. Here they are: 1. Using a class makes implementing members easier because there's no need to do work through an additional member. With a struct, you need a pimpl approach. For example: struct Array { struct Impl { ... } Impl* impl; ... @property length() const { return impl.length; } ... } Isn't that just what "alias this" and "opDispatch" are for? 2. struct gives you more power in managing the collection's own memory, as long as the collection doesn't escape item addresses or other addresses of internal handles. So all other things being equal, struct has a net advantage. 3. The creation syntaxes are different. For Phobos, I suggest adding a simple function make() to std.algorithm. make!T(a, b, c) returns a newly constructed object T by invoking the constructor with arguments a, b, and c. That way we can make client code virtually agnostic of the class/struct choice for a container. That's it. Otherwise, one could use either to build a container. Let me note that I have reached the conclusion that containers should be at best reference types, with a meta-constructor Value!(C) that takes a container C and makes it into a value type. This result is the end of a long journey. I am quite sure I have it right. If anything, I am amazed at how much dogma I had to unlearn before reaching it. And I'm glad I did - this is very D-like: I wasn't looking for a class-based solution, and I wasn't looking for a struct-based solution. I was only looking for the truth - and surprise, this slightly asymmetric duality came forth. It seems Steve took the weekend off. At this point I'm poised to prepare a list of items that would condition putting dcollections in phobos. 
But I'd rather have Steve (and everyone) acquire their own motivation from my arguments, instead of operating changes as a concession. Andrei -- /Jacob Carlborg
Re: dcollections 1.0 and 2.0a beta released
Fantastic work Steve, Pretty good design IMHO, not sure why... but somehow the collection battle reminds me of Steve Vai vs Ry Cooder Bjoern
Re: struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)
On 2010-05-23 17:36:52 -0400, Andrei Alexandrescu said: I've thought for a very long time about the class vs. struct choice in a container, and I came to a startling conclusion: it (almost) doesn't matter. Could be either, and the tradeoffs involved are nonessential. I'm starting to wonder whether this discussion really belongs in D.announce. Perhaps you should repost this on the general newsgroup and we shall discuss it there? This newsgroup was obviously the right place for the initial dcollection announcement, but we've sidetracked quite a lot since then. -- Michel Fortin michel.for...@michelf.com http://michelf.com/
struct vs. class containers (was: Re: dcollections 1.0 and 2.0a beta released)
I've thought for a very long time about the class vs. struct choice in a container, and I came to a startling conclusion: it (almost) doesn't matter. Could be either, and the tradeoffs involved are nonessential. Here they are: 1. Using a class makes implementing members easier because there's no need to do work through an additional member. With a struct, you need a pimpl approach. For example: struct Array { struct Impl { ... } Impl* impl; ... @property length() const { return impl.length; } ... } 2. struct gives you more power in managing the collection's own memory, as long as the collection doesn't escape item addresses or other addresses of internal handles. So all other things being equal, struct has a net advantage. 3. The creation syntaxes are different. For Phobos, I suggest adding a simple function make() to std.algorithm. make!T(a, b, c) returns a newly constructed object T by invoking the constructor with arguments a, b, and c. That way we can make client code virtually agnostic of the class/struct choice for a container. That's it. Otherwise, one could use either to build a container. Let me note that I have reached the conclusion that containers should be at best reference types, with a meta-constructor Value!(C) that takes a container C and makes it into a value type. This result is the end of a long journey. I am quite sure I have it right. If anything, I am amazed at how much dogma I had to unlearn before reaching it. And I'm glad I did - this is very D-like: I wasn't looking for a class-based solution, and I wasn't looking for a struct-based solution. I was only looking for the truth - and surprise, this slightly asymmetric duality came forth. It seems Steve took the weekend off. At this point I'm poised to prepare a list of items that would condition putting dcollections in phobos. But I'd rather have Steve (and everyone) acquire their own motivation from my arguments, instead of operating changes as a concession. Andrei
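The proposed make!T can be sketched in a few lines. This is a hypothetical rendering of the idea described above (make() was not in std.algorithm at the time); `SArr` and `CArr` are toy types for demonstration:

```d
// A minimal make!T: construct a class or a struct uniformly, so
// client code needn't care which one a container is.
T make(T, Args...)(Args args)
{
    static if (is(T == class))
        return new T(args);  // classes need a new expression
    else
        return T(args);      // structs use literal construction
}

struct SArr { int n; }
class  CArr { int n; this(int n) { this.n = n; } }

void main()
{
    auto s = make!SArr(3);  // a struct value
    auto c = make!CArr(3);  // a class reference
    assert(s.n == 3 && c.n == 3);
}
```

With this, switching a container between class and struct changes nothing at creation sites, which is exactly the agnosticism the proposal is after.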
Re: dcollections 1.0 and 2.0a beta released
On 2010-05-22 16:01:37 -0400, Walter Bright said: Michel Fortin wrote: What's the point of having extra indirection here? Good question. I think the answer is: 1. When do you ever want to copy a collection? I almost never do, because copying one is an inherently expensive operation. Whenever you need to preserve the previous state of something before applying some transformation on it. But I agree that the copy should be explicit because it is O(n), hence my suggestion of disabling implicit copying for containers. Since we're at it, a reference-type container sometimes makes it too easy to just create a new reference to the same container when what you really want is to make a copy. I happen to have a bug of this sort to fix in my Objective-C program right now, where a reference to a container leaked where it should have been a copy, causing unwanted mutations to the original. 2. When you copy a collection, do you copy the container or the elements in the container or both? One would have to carefully read the documentation to see which it is. That increases substantially the "cognitive load" of using them. I don't see the extra cognitive load. Assuming you disable implicit copying of the container, you'll have to use ".dup", which will work exactly as for an array. The way items are copied is exactly the same as if the container were a reference type (you call a "dup" or equivalent function and things get copied). 3. Going to lengths to make them value types, but then disabling copying them because you want people to use them as reference types, seems like a failure of design somewhere. I agree in a way. But at the same time, forcing everyone to use a reference type when sometimes a value type would be more adequate also looks like a failure of design to me. To me, the best tradeoff seems to be a value type, because it's quite trivial to create a reference type from a value type when you need it; the reverse is awkward. 4. 
That silly extra level of indirection actually isn't there. Consider that even value locals are accessed via indirection: offset[ESP]. For a reference collection we have: offset[EBX]. No difference (yes, EBX has to be loaded, but if it is done more than once it gets cached in a register by the compiler). Have you ever worked with containers of containers? Surely yes, since D associative arrays are one of them. So assume we want to implement our associative arrays like this: class HashTable(Key, Value) { Array!(Tuple!(Hash!Key, TreeSet!(Tuple!(Key, Value)))) buckets; } Do you find it reasonable that the TreeSet be a reference type? Reference-type containers would mean one indirection and one extra allocated block for each bucket. Then add that 'Value' could itself be a struct or class containing its own container, and you're stuck with a third unnecessary level of indirection and extra calls to the GC to allocate containers and/or check for null. Sounds quite wasteful to me. In addition, those extra allocations add more logic to our hash table and thus more chances for bugs. Here I'm using a hash table as an example; the same problem applies to many other data structures, whether they are generic or specific to a particular problem. Containers should be efficient and easy to use when composed one inside another. That's one of the greatest strengths of C++ value-type containers in my opinion. 5. Just making them all reference types means the documentation and use become a lot simpler. Simpler to explain, maybe. Simpler to use, I have my doubts. You're just moving the burden to somewhere else. A reference-type container requires a "new Container()" somewhere, and some protection logic against null. In exchange, you don't need to write 'ref' in functions taking containers, and can easily copy references to the container everywhere (sometimes too easily). 
But the reference-type benefits aren't entirely lost with a value type, because it's trivial to turn a value type into a reference type. -- Michel Fortin michel.for...@michelf.com http://michelf.com/
Re: dcollections 1.0 and 2.0a beta released
On 05/22/2010 09:07 PM, Sean Kelly wrote: Andrei Alexandrescu Wrote: One thing before I forget: I think any good collection abstraction must be concretized back to the classic collection instances. Singly-linked lists definitely can't be left off the list! It would be an epic failure. Imagine the papers! New York Times: "D has containers, but no singly-linked lists". We could always say we were following C++'s lead :-) C++(0|1)x has forward_list. Needless to say, supporting size() at constant complexity was a matter of huge debate. The proposal that I like is this: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2543.htm Search the page for "size()". I don't know how the proposal was amended. Andrei
Re: dcollections 1.0 and 2.0a beta released
Andrei Alexandrescu Wrote: > > One thing before I forget: I think any good collection abstraction must > be concretized back to the classic collection instances. Singly-linked > lists definitely can't be left off the list! It would be an epic > failure. Imagine the papers! New York Times: "D has containers, but no > singly-linked lists". We could always say we were following C++'s lead :-)
Re: dcollections 1.0 and 2.0a beta released
Andrei Alexandrescu: > Keeping the length cached in a singly-linked list is a costly mistake. > It works against splicing (an important list primitive) and most of the > time you don't need it so why waste time updating it. ... struct LinkedList(T, bool WithLength=true) { static if (WithLength) { // defines a length property here, etc. } } Inside methods like add there is another static if (WithLength) that enables/disables the code that updates the length. This allows keeping the length by default, but makes it easy to remove it when desired. Bye, bearophile
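A runnable version of this static-if idea might look as follows (a sketch, with `add` as an illustrative method name): when `WithLength` is false, the counter field and the length property simply don't exist, so the no-length list carries zero bookkeeping cost.

```d
// length is compiled in only when the WithLength flag is set.
struct LinkedList(T, bool WithLength = true)
{
    private static struct Node { T payload; Node* next; }
    private Node* head;

    static if (WithLength)
    {
        private size_t count;
        @property size_t length() const { return count; }
    }

    void add(T v)
    {
        head = new Node(v, head);
        static if (WithLength)
            ++count;  // bookkeeping exists only in the counted variant
    }
}

void main()
{
    LinkedList!int a;           // counted by default
    a.add(1); a.add(2);
    assert(a.length == 2);

    LinkedList!(int, false) b;  // no length member at all
    b.add(1);
    static assert(!__traits(hasMember, typeof(b), "length"));
}
```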
Re: dcollections 1.0 and 2.0a beta released
On 05/19/2010 03:07 PM, Steven Schveighoffer wrote: Ellery Newcomer Wrote: Are the collections supposed to not have isEmpty members? No. Use length == 0. O(1) length is always supported for all collections. One thing before I forget: I think any good collection abstraction must be concretized back to the classic collection instances. Singly-linked lists definitely can't be left off the list! It would be an epic failure. Imagine the papers! New York Times: "D has containers, but no singly-linked lists". The New Yorker: "D's abstractions are too abstract". The National Enquirer: "The Bat Boy stole D's singly-linked lists". Pyongyang Times: "Another failure of the so-called Western Democracy -- yet Juche doesn't need singly-linked lists anyway." Keeping the length cached in a singly-linked list is a costly mistake. It works against splicing (an important list primitive) and most of the time you don't need it, so why waste time updating it? Adding any cruft beyond { T payload; List* next; } is very strong incentive for the coder to write their own. Why do you think they're using an SLL in the first place? Because it's simple and has efficient primitives! OTish: What are your thoughts on a bimap implementation and a child/sibling or general tree implementation as part of dcollections? I haven't the slightest what a bimap is :) I am not an expert in collections or data structures, I just reimplement things I have understood. My implementations are basically copied from my algorithm book, and refined as much as I can do. That being said, if you have any implementation of a tree or hash, it should be easy to insert into dcollections. If you have ideas for other collection types (i.e. other than Map, Set, Multiset or List), then I can look into that if you point me at an implementation or have one of your own. I purposefully left out multi-map because I've never had a huge use for it, and it seemed like an awkward thing to create an interface for... 
Tries of various kinds come up again and again. I don't think dcollections' current abstractions capture them, which further supports my point that containers have too much personality to be caught in the straitjacket of class hierarchies. Andrei
Re: dcollections 1.0 and 2.0a beta released
Michel Fortin wrote: On 2010-05-21 22:55:16 -0400, Walter Bright said: Walter Bright wrote: If we can get anywhere close to that level of success with ranges and containers, we should all be well pleased. Mike Taylor has a phrase for that I think is well-coined: "impedance matching", defined as the work necessary to get one library module to work with another library module. This makes me think about something. In principle, I like the idea of containers being reference types. It works well when passing a container to functions. But at the same time, I despise it. By-reference containers force you to have extra indirections even when you don't need them, and you have to worry about null. Sometimes a value type would be better, when creating more complex data structures for instance: class Channel { private { Array!Message inbound; Array!Message outbound; } ... } What's the point of having extra indirection here? Good question. I think the answer is: 1. When do you ever want to copy a collection? I almost never do, because copying one is an inherently expensive operation. 2. When you copy a collection, do you copy the container or the elements in the container or both? One would have to carefully read the documentation to see which it is. That increases substantially the "cognitive load" of using them. 3. Going to lengths to make them value types, but then disabling copying them because you want people to use them as reference types, seems like a failure of design somewhere. 4. That silly extra level of indirection actually isn't there. Consider that even value locals are accessed via indirection: offset[ESP]. For a reference collection we have: offset[EBX]. No difference (yes, EBX has to be loaded, but if it is done more than once it gets cached in a register by the compiler). 5. Just making them all reference types means the documentation and use become a lot simpler.
Re: dcollections 1.0 and 2.0a beta released
On 05/22/2010 06:28 AM, bearophile wrote: Pelle: Yes, it works, but it doesn't gain anything from it, which is what I said. :) Then what you have said was useless. No it isn't. The point Pelle was making is that there are three use cases for parameter passed arrays: 1. read-only (corresponds to const) 2. read/write + resizing/reallocation (corresponds to ref) 3. read/write, no resizing (corresponds to no modifier) With (1) and (2), if I see those modifiers in the signature, I can deduce the use of the array. With (3), I can't really tell from the signature if the programmer meant that use case or forgot something, but the case is valuable enough that it should be enforceable. With associative arrays, case (3) is rare (is it even possible?) to the point that it can drop out of consideration.
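The three cases above can be spelled out for D's dynamic arrays. This is a sketch; `readOnly`, `resizing`, and `inPlace` are illustrative names:

```d
void readOnly(const(int)[] a) {}        // (1) cannot modify anything

void resizing(ref int[] a) { a ~= 0; }  // (2) caller sees the append

void inPlace(int[] a)                   // (3) elements are writable,
{                                       //     length changes stay local
    foreach (ref x; a)
        ++x;
    a ~= 99;  // invisible to the caller (may reallocate)
}

void main()
{
    int[] v = [1, 2];
    inPlace(v);
    assert(v == [2, 3]);    // elements changed; the append wasn't seen
    resizing(v);
    assert(v.length == 3);  // the ref version's append is seen
}
```

Case (3) is exactly the one the signature alone can't distinguish from an oversight, which is the enforceability gap being described.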
Re: dcollections 1.0 and 2.0a beta released
On 05/22/2010 01:28 PM, bearophile wrote: Pelle: Yes, it works, but it doesn't gain anything from it, which is what I said. :) Then what you have said was useless. Bye, bearophile How so? Passing by reference is slower, and sometimes completely meaningless. Therefore, having a rule that requires passing by ref is not a good idea. Also: int[] get_xs() { return new int[](100); } void main() { get_xs.inc_all; get_xs.inc_all_by_ref; // Error: get_xs() is not an lvalue }
Re: dcollections 1.0 and 2.0a beta released
BCS wrote: Cool. Now how do I write code so that it will always iterate the collection with the bigger O() lookup time (O(n) before O(log2(n)) before O(log16(n)) before O(1))? :D Add a function. auto foo( R1, R2 )( R1 r1, R2 r2 ) if ( R1.complexity( 10_000 ) > R2.complexity( 10_000 ) ) { ... } auto foo( R1, R2 )( R2 r1, R1 r2 ) if ( R1.complexity( 10_000 ) < R2.complexity( 10_000 ) ) { ... } -- Simen
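One way to make that sketch concrete: each container type advertises its lookup cost for n elements via a static `complexity` function, and a template constraint orders the pair. This whole convention (`complexity`, the toy set types, `order`) is hypothetical, not an existing dcollections API:

```d
// Toy containers advertising O(n) and O(1) lookup respectively.
struct LinearSet { static size_t complexity(size_t n) { return n; } }
struct HashSet   { static size_t complexity(size_t n) { return 1; } }

// Put the costlier-to-look-up container first.
string order(R1, R2)(R1, R2)
    if (R1.complexity(10_000) >= R2.complexity(10_000))
{
    return R1.stringof ~ " first";
}

string order(R1, R2)(R1 a, R2 b)
    if (R1.complexity(10_000) < R2.complexity(10_000))
{
    return order(b, a);  // swap so the costly one leads
}

void main()
{
    assert(order(LinearSet(), HashSet()) == "LinearSet first");
    assert(order(HashSet(), LinearSet()) == "LinearSet first");
}
```

The constraints are mutually exclusive, so exactly one overload matches for any pair, which avoids the ambiguity the two-overload sketch above would otherwise hit.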
Re: dcollections 1.0 and 2.0a beta released
On 2010-05-22 08:16:27 -0400, bearophile said: Michel Fortin: Couldn't we just make a struct that cannot be implicitly copied? The @disable attribute was invented for this purpose too: struct Foo { int x; @disable this(this) {} } void main() { Foo f1; Foo f2 = f1; } That prints: test.d(7): Error: struct test9.Foo is not copyable because it is annotated with @disable Indeed, thanks. I figured it out by myself while you were writing this. -- Michel Fortin michel.for...@michelf.com http://michelf.com/
Re: dcollections 1.0 and 2.0a beta released
Michel Fortin: > and you have to worry about null. Nonnull references of course solve this problem. > Couldn't we just make a struct that cannot be implicitly copied? The @disable attribute was invented for this purpose too: struct Foo { int x; @disable this(this) {} } void main() { Foo f1; Foo f2 = f1; } That prints: test.d(7): Error: struct test9.Foo is not copyable because it is annotated with @disable Bye, bearophile
Re: dcollections 1.0 and 2.0a beta released
On 2010-05-22 07:56:31 -0400, Michel Fortin said: @explicitdup struct Array { } void testVal(Array array); void testRef(ref Array array); unittest { Array array; testVal(array); // error, cannot copy array implicitly testVal(array.dup); // ok, array is copied testRef(array); // ok, array is passed by reference } If there's already a way to achieve this, I couldn't find it. Apparently it's already achievable this way: struct Array { @disable this(this); Array dup() { return Array(...); } ... } It also blocks simple assignments: Array array2 = array; // error, array is not copyable Array array3 = array.dup; // ok With this, I don't think we need containers to be reference types anymore. The naive error of copying containers everywhere without you knowing about it (as it occurs in C++) is simply no longer possible. -- Michel Fortin michel.for...@michelf.com http://michelf.com/
Re: dcollections 1.0 and 2.0a beta released
On 2010-05-21 22:55:16 -0400, Walter Bright said: Walter Bright wrote: If we can get anywhere close to that level of success with ranges and containers, we should all be well pleased. Mike Taylor has a phrase for that I think is well-coined: "impedance matching", defined as the work necessary to get one library module to work with another library module. This makes me think about something. In principle, I like the idea of containers being reference types. It works well when passing a container to functions. But at the same time, I despise it. By-reference containers force you to have extra indirections even when you don't need them, and you have to worry about null. Sometimes a value type would be better, when creating more complex data structures for instance: class Channel { private { Array!Message inbound; Array!Message outbound; } ... } What's the point of having extra indirection here? I've been thinking about having both by-value and by-reference containers. The first-class name would be given to the by-reference container to give it more visibility, but that by-reference container would be a simple wrapper for a by-value container "part" implemented in a struct: struct ArrayPart(T) { ... } // by value array container. class Array(T) { ArrayPart!T part; alias part this; } So now, if you want to reuse a container "part" to build some kind of more complex data structure, you can do it like this: class Channel { private { ArrayPart!Message inbound; ArrayPart!Message outbound; } ... } No silly extra indirection. That said, all this gives me a feeling of an overcomplicated design. After all, the problem is that you want to pass the container by reference in function arguments, but it's __too easy to forget the ref__. Perhaps that's the problem that should be fixed. Couldn't we just make a struct that cannot be implicitly copied?
Perhaps something like this: @explicitdup struct Array { } void testVal(Array array); void testRef(ref Array array); unittest { Array array; testVal(array); // error, cannot copy array implicitly testVal(array.dup); // ok, array is copied testRef(array); // ok, array is passed by reference } If there's already a way to achieve this, I couldn't find it. -- Michel Fortin michel.for...@michelf.com http://michelf.com/
Re: dcollections 1.0 and 2.0a beta released
Pelle: > Yes, it works, but it doesn't gain anything from it, which is what I > said. :) Then what you have said was useless. Bye, bearophile
Re: dcollections 1.0 and 2.0a beta released
On 05/22/2010 01:00 PM, bearophile wrote: Pelle: Wouldn't gain anything from ref, and wouldn't work with const. You are wrong, it works correctly with ref: Yes, it works, but it doesn't gain anything from it, which is what I said. :)
Re: dcollections 1.0 and 2.0a beta released
Pelle: > It's not a very good rule anyway: > > void inc_all(int[] xs) { > foreach (ref x; xs) { > x += 1; > } > } > > Wouldn't gain anything from ref, and wouldn't work with const. You are wrong, it works correctly with ref: import std.stdio: writeln; void inc_all(ref int[] array) { foreach (ref x; array) x++; } void main() { auto a = new int[10]; a.inc_all(); writeln(a); } It prints 1 1 1 1 1 1 1 1 1 1 Bye, bearophile
Re: dcollections 1.0 and 2.0a beta released
On 05/22/2010 12:20 PM, bearophile wrote: BCS: Maybe the style rule should be: dynamic arrays and AA's should be passed as const or ref. Something like that must be enforced by the compiler, or the design of arrays/AAs must be changed. Bye, bearophile It's not a very good rule anyway: void inc_all(int[] xs) { foreach (ref x; xs) { x += 1; } } Wouldn't gain anything from ref, and wouldn't work with const.
Re: dcollections 1.0 and 2.0a beta released
BCS: > Maybe the style rule should be: dynamic arrays and AA's should be passed > as const or ref. Something like that must be enforced by the compiler, or the design of arrays/AAs must be changed. Bye, bearophile
Re: dcollections 1.0 and 2.0a beta released
Walter Bright wrote: If we can get anywhere close to that level of success with ranges and containers, we should all be well pleased. Mike Taylor has a phrase for that I think is well-coined: "impedance matching", defined as the work necessary to get one library module to work with another library module. One example of bad impedance matching is C++ iostreams' attempt to make a memory buffer look like a file. Phobos propagated that mistake in its own streams. A better way is to use ranges, where the file is accessed via a range and a memory buffer is accessed via a range. Then anything that operates on data is written to the range, not a fake file interface.
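A sketch of the range-based alternative (the generic function and names here are invented for illustration): one algorithm written once against the range interface serves a memory buffer directly, and a file is fed to it through a range adapter rather than through a fake file interface:

```d
import std.algorithm : count;

// One generic algorithm, written once against the range interface.
// Works for anything that is an input range of characters.
size_t countNewlines(R)(R r)
{
    return count(r, '\n');
}

void main()
{
    // A memory buffer is already a range...
    auto buffer = "one\ntwo\nthree\n";
    assert(countNewlines(buffer) == 3);

    // ...and a file would be accessed via a range too, e.g. (sketch only):
    // import std.stdio : File;
    // import std.range : joiner;
    // auto n = countNewlines(File("data.txt").byChunk(4096).joiner);
}
```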
Re: dcollections 1.0 and 2.0a beta released
Andrei Alexandrescu Wrote: > > I don't know Tango, but Java's containers are a terrible example to > follow. Java's container library is an ill-advised design on top of an > underpowered language, patched later with some half-understood seeming > of genericity. I think Java containers are a huge disservice to the > programming community because they foster bad design. Tango's container library is or was a port of Doug Lea's containers for Java. While Doug Lea is an absolute master with concurrency, I never liked his container library very much. As you've said, the common abstractions it draws are weird and not terribly useful. > I need to disagree with that. I've done and I do a ton of binary > interoperability stuff. You never expose a generic container interface! > Interoperable objects always embody high-level logic that is specific to > the application. They might use containers inside, but they invariably > expose high-level, application-specific functionality. This. Iterators (or ranges) are passed all over the place, but when operating directly on a container I always want to know what kind of container I'm dealing with. Cost of operations is an issue: I may need to manually sort at some point or be aware that iterators will be invalidated, etc. Steve has already said he doesn't use the interfaces, and I'm a huge fan of not doing speculative design. It's invariably wrong and then you get stuck supporting it. I'd vote to drop the interfaces. They can always be added back later anyway.
Re: dcollections 1.0 and 2.0a beta released
Hello Andrei, On 05/20/2010 08:22 AM, Steven Schveighoffer wrote: Michel Fortin Wrote: On 2010-05-20 06:34:42 -0400, Steven Schveighoffer said: I understand these points, but I'm already using interfaces to copy data between containers. I don't have to, I could have used generic code, but this way, only one function is instantiated to copy data from all the other containers. The problem with using generic code is that the compiler will needlessly duplicate functions that are identical. One question. Have you calculated the speed difference between using an interface and using generic code? Surely going through all those virtual calls slows things down a lot. I do like interfaces in principle, but I fear it'll make things much slower when people implement things in terms of interfaces. That's why I'm not sure it's a good idea to offer container interfaces in the standard library. It's not that much slower. You get a much higher speedup from inlining than from the difference between virtual and non-virtual calls. However, I should probably make all the functions in the concrete implementations final. I made several of them final, but I should do it across the board. Yup. Even Java does that. I forgot whether it was a recanting of a former stance (as was the case with synchronized) or things were like that from the get-go. One thing I just thought of -- in dcollections, similar types can be compared to one another. For example, you can check to see if a HashSet is equal to a TreeSet. But that would not be possible without interfaces. Of course it would be possible. You write a generic function that takes two generic containers and constrains the inputs such that at least one has element lookup capability. (Complexity-oriented design for the win!) Andrei Cool. Now how do I write code so that it will always iterate the collection with the bigger O() lookup time (O(n) before O(log2(n)) before O(log16(n)) before O(1))? :D
Re: dcollections 1.0 and 2.0a beta released
Hello Vladimir, On Thu, 20 May 2010 04:42:35 +0300, Steven Schveighoffer wrote: interfaces Does that imply that the most important methods are virtual? If so, say good-bye to inlining, and hello to an additional level of dereferencing. From a technical standpoint there is no reason that a method needs to be called virtually from a class reference just because the same method gets called virtually from an interface reference.
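A minimal sketch of that idea (the interface and class names here are invented, not dcollections API): marking the methods final in the concrete class means calls through the class reference need no vtable lookup and can be inlined, while calls through the interface stay virtual:

```d
interface List(T)
{
    void add(T item);
    size_t length();
}

class ArrayList(T) : List!T
{
    private T[] data;

    // final: a call through an ArrayList reference is dispatched
    // directly (and is an inlining candidate); a call through a
    // List!T reference remains a virtual call.
    final void add(T item) { data ~= item; }
    final size_t length() { return data.length; }
}

void main()
{
    auto al = new ArrayList!int;
    al.add(1);          // direct, inlinable call
    List!int l = al;
    l.add(2);           // virtual call through the interface
    assert(al.length == 2);
}
```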
Re: dcollections 1.0 and 2.0a beta released
Hello BLS, ok Some collection types belong to the same family. So they should be loosely coupled. How to do that without having Interfaces ? How about this: Make a flat family of collections. Name the methods so that across the lib, the same name does the same kind of thing and different names do different kinds of things. Then define a flat family of whatever interfaces are useful and have each class derive from all the interfaces they happen to implement (but don't go out of your way to make anything fit).
Re: dcollections 1.0 and 2.0a beta released
Hello Bill, On Fri, May 21, 2010 at 9:43 AM, Steven Schveighoffer wrote: void foo(int[int] x) { x[5] = 5; } void main() { int[int] x; foo(x); assert(x[5] == 5); // fails } And with arrays at least it's even more insidious, because sometimes it will seem to work, and sometimes it won't. void foo(int[] x) { x ~= 10; } Caller's .length will never get updated by that, but it won't crash so it may take a while to find the bug. Maybe the style rule should be: dynamic arrays and AA's should be passed as const or ref.
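The ref half of that style rule, sketched out (assuming the AA semantics discussed above): passing the associative array by ref makes the first insertion, which may allocate the underlying table, visible to the caller, so the failing assert now passes.

```d
// With ref, the allocation of the AA's underlying table on first
// insertion propagates back to the caller's variable.
void foo(ref int[int] x)
{
    x[5] = 5;
}

void main()
{
    int[int] x;     // null AA: no table allocated yet
    foo(x);
    assert(x[5] == 5);  // passes, unlike the non-ref version
}
```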
Re: dcollections 1.0 and 2.0a beta released
Hello superdan, dun tell me it dun work. i dun explain shit again. it works coz a struct cant be null. but a struct can be a ref if it only haz one pointer inside. methinks the builtin hash iz dat way. void foo(container!shit poo) { if(!poo) poo = new container!shit; // fuck dat shit poo.addElement(Shit(diarrhea)); } dat sucks bull ballz. That code is broken: if poo is of a class type, that just new's a container, adds an element, and then drops the reference to get GC'ed. To make it do anything generally useful, it would need to have another level of indirection.
Re: dcollections 1.0 and 2.0a beta released
On Thu, 20 May 2010 04:42:35 +0300, Steven Schveighoffer wrote: interfaces Does that imply that the most important methods are virtual? If so, say good-bye to inlining, and hello to an additional level of dereferencing. Without meaning any disrespect to all the work you did, allow me to say that I won't use a library that could be faster (without making usage too clumsy), but isn't. (I prefer my D programs to be as fast as reasonably possible - if I didn't care about speed, I'd use another language.) For the same reasons, I'd be disappointed if the library was admitted as-is into Phobos, since it doesn't live up to my personal ideal of what D should be. -- Best regards, Vladimir mailto:vladi...@thecybershadow.net
Re: dcollections 1.0 and 2.0a beta released
Walter Bright wrote: If we can get anywhere close to that level of success with ranges and containers, we should all be well pleased. Mike Taylor has a phrase for that I think is well-coined: "impedance matching", defined as the work necessary to get one library module to work with another library module. One example of bad impedance matching is C++ iostreams' attempt to make a memory buffer look like a file. Phobos propagated that mistake in its own streams.
Re: dcollections 1.0 and 2.0a beta released
Andrei Alexandrescu wrote: On 05/19/2010 09:59 PM, Robert Jacques wrote: Yes and No. I understand where you're coming from, but I think it's a bad idea. First, I think it needlessly expands the radius of comprehension needed to understand and use the library. (See Tangled up in tools http://www.pragprog.com/magazines/2010-04/tangled-up-in-tools) For the record, I strongly agree with this. I do too, but that's the easy part. Living up to those ideals is extremely hard, usually because most designers think they have designed simple interfaces when everyone else thinks they didn't. In this whole container discussion, I'd like to point out something Andrei pointed out to me. The one extremely successful example of pluggable components is the unix filter model. Essentially, one program pipes its output to the next, which pipes its output to the next, etc. The unix console is designed around that paradigm. If we can get anywhere close to that level of success with ranges and containers, we should all be well pleased.
Re: dcollections 1.0 and 2.0a beta released
On 21/05/2010 19:15, Andrei Alexandrescu wrote: I'll now slowly answer the great replies I got in this thread, in chronological order. Since I read them all before replying, I might sometimes refer to future posts. Hopefully that won't be too confusing. Andrei Following the dcollections thread, it seems that your argument is that interfaces in a non-hierarchical collection library are useless. Or: hierarchical collection libs are nonsense, and ergo interfaces are obsolete. ok Some collection types belong to the same family. So they should be loosely coupled. How to do that without having interfaces? Next, e.g. forward ranges belong to a certain family of collections, and ranges are (or should be) interfaces. I'm very confused. Bjoern given struct node class col : IFWRange node Node; // No ?
Re: dcollections 1.0 and 2.0a beta released
Steven Schveighoffer wrote: I myself don't really use the interface aspect of the classes, it is mostly a carryover from the Java/Tango inspirations. But I can see one good reason to keep them -- binary interoperability. For example, it might be the case some day when D has good support with dynamic libraries that a library exposes some piece of itself as a Map or List interface. Hi, Steven. You made a good point on interoperability. Strict, precise, readable interfaces, that's what I would like in a really good standard library for D. But is D mature enough for this kind of possibility, considering, at least, known bugs involving interfaces? I don't even feel free to use interfaces in my code because of the undefined behavior they may cause. Shared libraries... Is this going to happen on Linux? When? -- Alex Makhotin, the founder of BITPROX, http://bitprox.com
Re: dcollections 1.0 and 2.0a beta released
Andrei Alexandrescu wrote: See the -mergefunc compiler switch of LDC, to merge identical functions (like ones produced by template instantiations). This feature is not very powerful yet, but it's present and I have seen it work. Indeed. I'm no expert in linkers, but in my opinion this is one of the most basic optimizations a linker should perform. And C++ has pushed linkers to do that for years now, so I'd expect most linkers to do that already... The problem with templates is more the multiple slightly different instantiations. In general this is good for performance, but it's only needed for the code paths that need to be fast. I think generic containers should be fast. There's been talk that Walter's slow porting of the linker to C and then D will put him in the position to introduce such an optimization. Unfortunately, it's an agonizingly slow process. It also does nothing for Linux or OSX.
Re: dcollections 1.0 and 2.0a beta released
On 05/21/2010 01:34 PM, Walter Bright wrote: Andrei Alexandrescu wrote: I wrote a solution to the problem in native D. It goes like this: alias Container!(int, addable | purgeable) Messerschmidt; void messWith(Messerschmidt i) { ... use i's capabilities to add and purge ... } I agree with Michel Fortin that the | is questionable. I'd like to suggest instead that it should instead be a variadic list of names, like: alias Container!(int, addable, purgeable) Messerschmidt; Perhaps the names should follow a naming convention, alias Container!(int, ContainerAddable, ContainerPurgeable) Messerschmidt; The problem with using scoped names, like Container.Addable, is scoped names cannot be added to. Well the problem stays: compound interfaces grow combinatorially with the number of components, because an interface X!(A, B, C) inherits at the same time X!(A, B), X!(A, C), and X!(B, C). Check the epic inheritance diagram here: http://www.artima.com/cppsource/codefeatures2.html and the equally epic table here: http://www.artima.com/cppsource/codefeatures3.html Andrei
Re: dcollections 1.0 and 2.0a beta released
Andrei Alexandrescu wrote: I wrote a solution to the problem in native D. It goes like this: alias Container!(int, addable | purgeable) Messerschmidt; void messWith(Messerschmidt i) { ... use i's capabilities to add and purge ... } I agree with Michel Fortin that the | is questionable. I'd like to suggest instead that it should instead be a variadic list of names, like: alias Container!(int, addable, purgeable) Messerschmidt; Perhaps the names should follow a naming convention, alias Container!(int, ContainerAddable, ContainerPurgeable) Messerschmidt; The problem with using scoped names, like Container.Addable, is scoped names cannot be added to.
Re: dcollections 1.0 and 2.0a beta released
On Fri, May 21, 2010 at 10:50 AM, superdan wrote: >> void foo(int[int] x) >> { >> x[5] = 5; >> } >> void main() >> { >> int[int] x; >> foo(x); >> assert(x[5] == 5); // fails >> } >> -Steve > > wrote a long post but it got lost. shit. bottom line dats a bug in dmd or > phobos. Unfortunately it works exactly as designed. --bb
Re: dcollections 1.0 and 2.0a beta released
On 05/20/2010 02:47 PM, bearophile wrote: Michel Fortin: Devirtualization is only possible in certain cases: when the function knows exactly which type it'll get.< You are wrong, in most cases there are ways to de-virtualize, even when the runtime type isn't exactly known, but sometimes to do it you have to work too much. This is probably why C# dotnet doesn't perform this optimization. If we get deeper into this branch, we'll forget where we started from (are interfaces sensible for this design?) and we'll reframe the interfaces vs. no interfaces decision into a speed loss vs. no speed loss decision. There are other arguments to look at besides performance. Andrei