Re: Names with trailing question mark
On 2011-10-04 03:13, Nick Sabalausky wrote: "bearophile" wrote in message news:j6dgvp$1rot$1...@digitalmars.com... Predicates are quite common. In D I presume the standard way to write them is with a name like "isFoo". In other languages they are written in other ways: if (foo.isEven) {} if (foo.isEven()) {} filter!isEven(data) if (foo.evenQ) {} if (foo.evenQ()) {} filter!evenQ(data) if (foo.even?) {} if (foo.even?()) {} filter!even?(data) Other usages: contains? areInside ==> inside? It's nice, and it's one of the things in Ruby that I like, but I'd be surprised if it didn't cause parsing (or even lexing) ambiguities with ?:. If that could be reasonably solved, I'd be in favor of it, but I'm not sure it can. I would choose to require a space before the question mark in a ternary operator. -- /Jacob Carlborg
Re: Names with trailing question mark
On 2011-10-04 01:37, bearophile wrote: Predicates are quite common. In D I presume the standard way to write them is with a name like "isFoo". In other languages they are written in other ways: if (foo.isEven) {} if (foo.isEven()) {} filter!isEven(data) if (foo.evenQ) {} if (foo.evenQ()) {} filter!evenQ(data) if (foo.even?) {} if (foo.even?()) {} filter!even?(data) Other usages: contains? areInside ==> inside? I don't remember serious recent discussions here about that last form. Allowing a single trailing question mark in D names has some advantages: 1) It gives a standard way to denote a predicate, so it becomes very easy to tell apart a predicate from other non-predicate things. 2) It allows for short names. 3) A single question mark is easy and quick to write on most keyboards. 4) It is used in other languages, such as Ruby, and people seem to appreciate it. 5) Its meaning is probably easy enough to understand and remember. Some disadvantages: 1) "?" makes code less easy to read aloud. 2) It is less good to use the trailing "?" in functions that aren't properties, the syntax becomes less nice looking: if (foo.even()?) Why would you put the question mark after the parentheses? At least in Ruby the question mark is part of the method name. This looks better: if (foo.even?()) 3) The usage of the trailing "?" in names forbids later different usages for it, like denoting nullable types. 4) In expressions that use the ternary operator and a property predicate (without leading ()) it becomes a bit less easy to read (example from http://stackoverflow.com): string customerName = user.loggedIn? ? user.name : "Who are you?"; I think this too gets accepted: string customerName = user.loggedIn??user.name:"Who are you?"; This is not accepted in Ruby. You need a space after the second question mark. If this were implemented I would vote for requiring a space after the first question mark as well. I vaguely like the idea of using a trailing question mark in predicate names.
Bye, bearophile What about allowing method names containing arbitrary symbols, like Scala does? void ::: (int a) {} I assume that would complicate the language quite a lot. -- /Jacob Carlborg
Re: User attributes (was Proposal on improvement to deprecated)
On 2011-10-03 21:54, bearophile wrote: Jacob Carlborg: I would like to have something similar as this possible as well: class Foo { @get_set int x; } Would insert a getter and setter for "x" like this: class Foo { int x_; int x () { return x_; } int x (int x) { return x_ = x; } } I think a template mixin is already able to do this. Bye, bearophile Template mixins can do (most of) the other things suggested in this thread as well, except for your suggestion. But it's not pretty. -- /Jacob Carlborg
Re: std.getopt suggestion
"Nick Sabalausky" wrote in message news:j6e836$2gj$1...@digitalmars.com... > "Walter Bright" wrote in message > news:j6e1k9$2p9u$1...@digitalmars.com... >> >> Steve Jobs is famously successful for paring down feature sets to the >> bare minimum that works for 90% of the users, and then doing those >> features very well. >> > > Steve Jobs is famous for handling the bare minimum that works for 90% of > *average Joe* users and saying "Fuck off" to everyone and everything else. > That's why all his products are shit. > He's also famous for being a pretentious fucking tool.
Re: std.getopt suggestion
"Walter Bright" wrote in message news:j6e1k9$2p9u$1...@digitalmars.com... > > Steve Jobs is famously successful for paring down feature sets to the bare > minimum that works for 90% of the users, and then doing those features > very well. > Steve Jobs is famous for handling the bare minimum that works for 90% of *average Joe* users and saying "Fuck off" to everyone and everything else. That's why all his products are shit.
The problem with std.conv.emplace
I'm currently trying to fix the problem I have with std.conv.emplace to fully replace the deprecated new/delete operator overloading with a template. However I discovered that this is currently not possible, at least not to my knowledge. The problem is as follows: class Foo { } class Test { private: Foo m_foo; public: this(Foo foo){ m_foo = foo; } void SetFoo(Foo foo){ m_foo = foo; } } void main(){ void[__traits(classInstanceSize,Test)] mem = typeid(Test).init[]; emplace!Test(mem,null); //do not know how to create an object of type Test with args (void*) } If I however do the following it works (ignore the missing init part here): void main(){ void[__traits(classInstanceSize,Test)] mem = typeid(Test).init[]; Test t = cast(Test)mem.ptr; t.__ctor(null); } The same problem occurs when you do: void CallSetFoo(T)(Test c,T arg){ c.SetFoo(arg); } CallSetFoo(new Test(),null); It seems that null actually has its own type, but that type is not exposed in the language and is lost as soon as null is passed to a template. To a template, null just looks like a void* and therefore cannot be used correctly any more. Now one could write some code that queries all possible constructors, finds a matching one, and manually casts void* to the correct reference type, but this would require a runtime check for null to avoid casting an arbitrary void* to a reference, and I would highly prefer a solution that does not require runtime checks to be safe. My suggestion would be to actually give null its own type and expose it in the language so it can be used correctly with templates. Any other ideas how to fix this? Maybe there is a way with the current type system I don't know of. -- Kind Regards Benjamin Thaut
Re: Thread local and memory allocation
On Mon, 03 Oct 2011 15:48:57 -0400, deadalnix wrote: [snip] What I suggest in add a flag SHARED in BlkAttr and store it as an attribute of the block. Later modification could be made according to this flag. This attribute shouldn't be modifiable later on. What do you think ? Is it something it worth working on ? If it is, how can I help ? I've been a proponent of thread-local garbage collection, so naturally I think it's a good idea :) There are some GCs specifically tailored for immutable data, so I'd probably wish to add separate SHARED and IMMUTABLE flags. On the con side, the major issue with thread-local GCs is that currently we don't have good ways of building shared and immutable data. This leads to people building data with mutable structures and casting at the end. Now the issue with shared is mostly a quality of implementation issue. However, building immutable data structures efficiently requires a unique (aka. mobile) storage type, which we'll probably get at the same time D gets an ownership type system. That is to say, no time in the foreseeable future. That said, there are mitigating factors. First, by far the most common example of the build & cast pattern involves string/array building; a task that appender addresses in spades. Second, std.allocators could be used to determine which heap to allocate from. Third, we could opt to be able to switch the GC from thread-local to shared mode and vice versa; the idea being that inside an object-building routine, all allocations would be cast to immutable/shared and thus the local heap should be bypassed. As for how you can help, I'd suggest building a thread-local GC, following the design of the recent discussion on std.allocators, if you're up to it.
Re: Request for pre-review: std.serialization/orange
On Mon, 03 Oct 2011 14:10:52 -0400, Jacob Carlborg wrote: On 2011-10-03 15:57, Robert Jacques wrote: So, in essence, you are saying that by the time archiving occurs, isSliceOf will always return false? Then why is it part of the public API? No, I'm not saying that. Example: struct Array { void* ptr; size_t length; size_t elementSize; bool isSliceOf (Array b) { return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length * b.elementSize; } } void main () { auto a = [0, 1, 2, 3, 4]; auto b = a[2 .. 4]; auto aArr = Array(a.ptr, a.length, 4); auto bArr = Array(b.ptr, b.length, 4); assert(bArr.isSliceOf(aArr)); assert(!aArr.isSliceOf(bArr)); } Both the asserts in the above code pass as expected. See, no serialization or archiving in sight. Is there something I'm missing? That putting isSliceOf in the public API implies its usage by the archiver. Actually it does not need to be part of the public API when I think about it. I can move it into Serializer. Array would still need to be public since both Serializer and Archive need access to it and the package attribute doesn't work very well. Regarding design, I agree, although I'd go one further and define Array as a public type inside the Serializer class. However, this concept of an 'Array' is fundamentally flawed. Consider: auto c = a[1..3]; auto cArr = Array(c.ptr,c.length,4); assert(!cArr.isSliceOf(bArr)); assert(!bArr.isSliceOf(cArr)); // and b ~= 5; bArr = Array(b.ptr, b.length, 4); assert(!bArr.isSliceOf(aArr)); In short, a serializer must be capable of handling overlapping arrays, not just strict slices. The array representation therefore needs to parameterize both the underlying array common to all slices and the actual slice that was serialized for that key. The solution, in my mind, is to think in terms of memory blocks/chunks.
Every reference can be thought of as pointing to a memory chunk defined by two pointers and a flag: { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there is more than one reference to this chunk } For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. Which should give you O(N log(N)) performance. I'm not sure I understand. That would require that the arrays are stored in a contiguous block of memory? Won't "head" and "tail" always point to the start of the array and the end of the array? Most of the time yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= of the start/end of an array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm. Now I think I start to understand. I'm assuming you currently separate object and array alias resolution and don't handle pointers at all. Yes. Pointers are handled as well. They're handled similarly to arrays/slices. First it always serializes what the pointer points to. Then in a post-process step, after all serialization is done, it replaces all serialized pointers that point to a value that has been serialized with a reference. If a given pointer doesn't point to a value that has been serialized, it's left as is. Can a pointer point to the interior of an object? To an element of an array? I can see if this memory chunk approach can be used instead. How will this be used with the balanced tree, could you give a simple example? Well, balanced trees need a comparison function, so: struct Node { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there is more than one reference to this chunk //... Other meta-data, i.e. ID int opCmp(const ref Node b) { if(tail < b.head) return -1; if(b.tail < head) return 1; return 0; } } On equality / assignment, one just has to combine the heads and tails with min/max, respectively, and update hasAliases, etc. The difficulty is when a new node 'bridges the gap' between two existing nodes. This has to be handled explicitly as part of the tree re-balancing, but you may want to consider making the merging of nodes part of the comparison operator for simplicity/efficiency: head = min(head,b.head); tail = max(tail,b.tail); hasAliases = true; After pruning, updating meta-data, etc, the aliased memory chunk for any given pointer can be found using a separate comparison operator: int opCmp(const ref void* b) { if(tail < b) return -1; if(b < head) return 1; return 0; } // Which you'd probably use like: if( auto node = arr.ptr in setOfAliases ) { auto offset = arr.ptr - node.head; //... } else {//.
Re: std.getopt suggestion
On Monday, October 03, 2011 21:20:48 Walter Bright wrote: > On 9/29/2011 11:54 AM, Jonathan M Davis wrote: > > And yes, that's an argument by ad populum > > (or whatever the exact name is), but what's considered "clutter" is > > subjective. Yes, the improvement would be relatively minor, but so's the > > cost of the change, and while it doesn't necessarily show that you're > > wrong when no one seems to agree with you, it does at least say > > something when no one agrees with you. > > I've been only a casual user of std.getopt, barely scratching the surface of > what it can do. But I do have a few general thoughts on this. > > One of the very hardest things in design is knowing when to say "no" to a > new feature. The feature is desired by some subset of the users, it can be > ignored by those who have no use for it, so it seems like an unequivocal > win, right? > > But: > > 1. It adds to the "cognitive load" of the product. The cognitive load is how > thick the manual is. The bigger it is, the more intimidating it is, and the > fewer will dare to even open it. There is immense attraction in simple to > understand products. std.getopt is supposed to make life easier for > programmers - pages and pages and pages of documentation, options, complex > examples, etc., just lead one to say "screw it, I'll roll my own" and it > has failed. > > Steve Jobs is famously successful for paring down feature sets to the bare > minimum that works for 90% of the users, and then doing those features very > well. > > 2. Once a feature is there, it stays forever. It's very hard to judge how > many people rely on a feature that turns out in hindsight to be baggage. > Removing it arbitrarily will break existing code and tick off people. C++ > has a number of hare-brained features (like trigraphs) that everyone hates > but prove impossible to remove, despite it mucking up progress with > language. > > 3. 
Increasing the complexity means more maintenance, cognitive load for the > maintenance programmer, and bugs, bugs, bugs. > > 4. If a user really needs a special case not supported by std.getopt, it is > straightforward to roll his own. > > 5. Supporting (well) only a reduced feature set means that apps will tend to > have command line behavior that is more consistent and predictable, which > is a good thing. > > > It's why I have rather bull-headedly resisted adding feature after feature > to D's unittest facility. The unittest feature has been a home run for D, > and I suspect a lot of its success has been its no-brainer simplicity and > focus on doing one thing well. All good thoughts, but in this case, it's an argument over rearranging how some of the options are set (options which are probably more or less _never_ set and arguably should never have been configurable in the first place for essentially the reasons that you're giving). So, we wouldn't be looking at complicating anything or increasing the cognitive load in using it. All it would do would be to make some mutable global variables into a struct which would be passed in were the programmer to actually want to set them (which probably never happens). - Jonathan M Davis
Re: std.getopt suggestion
On 9/29/2011 11:54 AM, Jonathan M Davis wrote: And yes, that's an argument by ad populum (or whatever the exact name is), but what's considered "clutter" is subjective. Yes, the improvement would be relatively minor, but so's the cost of the change, and while it doesn't necessarily show that you're wrong when no one seems to agree with you, it does at least say something when no one agrees with you. I've been only a casual user of std.getopt, barely scratching the surface of what it can do. But I do have a few general thoughts on this. One of the very hardest things in design is knowing when to say "no" to a new feature. The feature is desired by some subset of the users, it can be ignored by those who have no use for it, so it seems like an unequivocal win, right? But: 1. It adds to the "cognitive load" of the product. The cognitive load is how thick the manual is. The bigger it is, the more intimidating it is, and the fewer will dare to even open it. There is immense attraction in simple to understand products. std.getopt is supposed to make life easier for programmers - pages and pages and pages of documentation, options, complex examples, etc., just lead one to say "screw it, I'll roll my own" and it has failed. Steve Jobs is famously successful for paring down feature sets to the bare minimum that works for 90% of the users, and then doing those features very well. 2. Once a feature is there, it stays forever. It's very hard to judge how many people rely on a feature that turns out in hindsight to be baggage. Removing it arbitrarily will break existing code and tick off people. C++ has a number of hare-brained features (like trigraphs) that everyone hates but prove impossible to remove, despite them mucking up progress with the language. 3. Increasing the complexity means more maintenance, cognitive load for the maintenance programmer, and bugs, bugs, bugs. 4. If a user really needs a special case not supported by std.getopt, it is straightforward to roll his own.
5. Supporting (well) only a reduced feature set means that apps will tend to have command line behavior that is more consistent and predictable, which is a good thing. It's why I have rather bull-headedly resisted adding feature after feature to D's unittest facility. The unittest feature has been a home run for D, and I suspect a lot of its success has been its no-brainer simplicity and focus on doing one thing well.
Re: Thread local and memory allocation
Sean Kelly Wrote: > On Oct 3, 2011, at 3:27 PM, Jason House wrote: > > > Sean Kelly Wrote: > >> There's another important issue that hasn't yet been addressed, which is > >> that when the GC collects memory, the thread that finalizes non-shared > >> data should be the one that created it. So that SHARED flag should really > >> be a thread-id of some sort. Alternately, each thread could allocate from > >> its own pool, with shared allocations coming from a common pool. This > >> would allow the lock granularity to be reduced and in some cases > >> eliminated. > > > > > > Why not run the collection for a single thread in the thread being > > collected? It's a simple way to force where the finalizer runs. It's a big > > step up from stop-the world collections, but still requires pauses. > The world can't be stopped when finalizers run or the app can deadlock. So the only correct behavior is to have the creator of a TLS block be the one to finalize it. If only one thread is stopped, how will a deadlock occur? If it's a deadlock due to new allocations, doesn't the current GC already handle that?
Re: Names with trailing question mark
"bearophile" wrote in message news:j6dgvp$1rot$1...@digitalmars.com... > Predicates are quite common. In D I presume the standard way to write them > is with a name like "isFoo". In other languages they are written in other > ways: > > > if (foo.isEven) {} > if (foo.isEven()) {} > filter!isEven(data) > > if (foo.evenQ) {} > if (foo.evenQ()) {} > filter!evenQ(data) > > if (foo.even?) {} > if (foo.even?()) {} > filter!even?(data) > > Other usages: > > contains? > areInside ==> inside? > > It's nice, and it's one of the things in Ruby that I like, but I'd be surprised if it didn't cause parsing (or even lexing) ambiguities with ?:. If that could be reasonably solved, I'd be in favor of it, but I'm not sure it can.
Re: Thread local and memory allocation
On 10/3/2011 4:20 PM, Sean Kelly wrote: Immutable data would have to be allocated on the shared heap as well, which means the contention for the shared heap may actually be fairly significant. But the alternatives are all too complex (migrating immutable data from local pools to a common pool when a thread terminates, etc). There's also the problem of transferring knowledge of whether something is immutable into the allocation routine. As things stand, I don't believe that type info is available. Right. The current language allows no way to determine in advance if an allocation will eventually be made immutable (or shared) or not. However, if the GC used thread-local pools for allocation (not for collection), it would go faster, because it wouldn't need locking to allocate from those pools. This change can happen without any language or compiler changes.
Names with trailing question mark
Predicates are quite common. In D I presume the standard way to write them is with a name like "isFoo". In other languages they are written in other ways: if (foo.isEven) {} if (foo.isEven()) {} filter!isEven(data) if (foo.evenQ) {} if (foo.evenQ()) {} filter!evenQ(data) if (foo.even?) {} if (foo.even?()) {} filter!even?(data) Other usages: contains? areInside ==> inside? I don't remember serious recent discussions here about that last form. Allowing a single trailing question mark in D names has some advantages: 1) It gives a standard way to denote a predicate, so it becomes very easy to tell apart a predicate from other non-predicate things. 2) It allows for short names. 3) A single question mark is easy and quick to write on most keyboards. 4) It is used in other languages, such as Ruby, and people seem to appreciate it. 5) Its meaning is probably easy enough to understand and remember. Some disadvantages: 1) "?" makes code less easy to read aloud. 2) It is less good to use the trailing "?" in functions that aren't properties, the syntax becomes less nice looking: if (foo.even()?) 3) The usage of the trailing "?" in names forbids later different usages for it, like denoting nullable types. 4) In expressions that use the ternary operator and a property predicate (without leading ()) it becomes a bit less easy to read (example from http://stackoverflow.com): string customerName = user.loggedIn? ? user.name : "Who are you?"; I think this too gets accepted: string customerName = user.loggedIn??user.name:"Who are you?"; I vaguely like the idea of using a trailing question mark in predicate names. Bye, bearophile
Re: Thread local and memory allocation
On 10/3/2011 12:48 PM, deadalnix wrote: What do you think ? Is it something it worth working on ? If it is, how can I help ? It is a great idea, and it has been discussed before. The difficulties are when thread local allocated data gets shared with other threads, like for instance immutable data that is implicitly shareable.
Re: Thread local and memory allocation
On Oct 3, 2011, at 3:55 PM, Walter Bright wrote: > On 10/3/2011 12:48 PM, deadalnix wrote: >> What do you think ? Is it something it worth working on ? If it is, how can I >> help ? > > It is a great idea, and it has been discussed before. The difficulties are > when thread local allocated data gets shared with other threads, like for > instance immutable data that is implicitly shareable. Immutable data would have to be allocated on the shared heap as well, which means the contention for the shared heap may actually be fairly significant. But the alternatives are all too complex (migrating immutable data from local pools to a common pool when a thread terminates, etc). There's also the problem of transferring knowledge of whether something is immutable into the allocation routine. As things stand, I don't believe that type info is available.
Re: Thread local and memory allocation
On Oct 3, 2011, at 3:27 PM, Jason House wrote: > Sean Kelly Wrote: >> There's another important issue that hasn't yet been addressed, which is >> that when the GC collects memory, the thread that finalizes non-shared data >> should be the one that created it. So that SHARED flag should really be a >> thread-id of some sort. Alternately, each thread could allocate from its >> own pool, with shared allocations coming from a common pool. This would >> allow the lock granularity to be reduced and in some cases eliminated. > > > Why not run the collection for a single thread in the thread being collected? > It's a simple way to force where the finalizer runs. It's a big step up from > stop-the world collections, but still requires pauses. The world can't be stopped when finalizers run or the app can deadlock. So the only correct behavior is to have the creator of a TLS block be the one to finalize it.
Re: Thread local and memory allocation
Sean Kelly Wrote: > There's another important issue that hasn't yet been addressed, which is that > when the GC collects memory, the thread that finalizes non-shared data should > be the one that created it. So that SHARED flag should really be a thread-id > of some sort. Alternately, each thread could allocate from its own pool, > with shared allocations coming from a common pool. This would allow the lock > granularity to be reduced and in some cases eliminated. Why not run the collection for a single thread in the thread being collected? It's a simple way to force where the finalizer runs. It's a big step up from stop-the-world collections, but still requires pauses.
Re: Thread local and memory allocation
Yes, I was thinking of such a thing. Each thread has a local heap and you have a common shared heap too, with shared data in it. In such a case, the flag is sufficient, because the GC could then handle that and trigger thread-local heap allocation instead of shared allocation. This is consistent with the swap friendliness I was talking about and can reduce the need for synchronization when allocating memory (a lock will only occur if the GC doesn't have any memory left in its pool for the given thread). And it solves the finalization-thread issue, yes. Le 03/10/2011 22:54, Sean Kelly a écrit : On Oct 3, 2011, at 12:48 PM, deadalnix wrote: There's another important issue that hasn't yet been addressed, which is that when the GC collects memory, the thread that finalizes non-shared data should be the one that created it. So that SHARED flag should really be a thread-id of some sort. Alternately, each thread could allocate from its own pool, with shared allocations coming from a common pool. This would allow the lock granularity to be reduced and in some cases eliminated. I'd like to move to CDGC as an intermediate step, and that will need some testing and polish. That would allow for precise collections if the compiler support is added. Then the thread-local finalization has to be tackled one way or another. I'd favor per-thread heaps but am open to suggestions and/or help.
Re: Next in the Review Queue?
Jonathan M Davis Wrote: > There isn't a formal list for the review queue anywhere. It's managed > entirely > by posts in the newsgroup. It's being managed fairly informally at this > point, > since we don't always review the item that's been in the queue the longest > (depending on the current circumstances), and unless the person who's > submitting a module for review is active on the newsgroup and ready to be > active in the review of their code, there's no point in reviewing their code > anyway. And as long as that works, we're going to continue to do it that way. > > - Jonathan M Davis In fact, I'm thinking that these threads should come up periodically but not include a list. Instead just ask what is ready for review. The reasons are twofold. 1. As you say, the author must be active, and while they may usually be active, you want them active now. 2. It forces the author to confirm that they are active and not on vacation. I wasn't sure if it made sense to reply, but it does. I suppose it could be a little discouraging, "they keep forgetting about me! I've been ready the last 3..." kind of response. But I don't think anyone here would take it that way, and considering the Boost review criteria, once the code is in, it is "your" responsibility, including passing the baton. So sticking around for review isn't any different from having it in.
Re: Thread local and memory allocation
Sean Kelly: > I'd favor per-thread heaps but am open to suggestions and/or help. (I am still ignorant about such issues, so I usually keep quiet about them. Please forgive me if I am saying stupid things.) The memory today is organized like a tree; the larger memories are slower, and the farther away the memory is, the more costly it is to move data across, and to keep coherence between two pieces of data that are supposed to be the "same" data. If I have a CPU with 4 cores, each core has hyper-threading, and each pair of cores has its own L2 cache, then I think it is better for your code to use 2 heaps (one heap for each L2 cache). If future CPUs have even more cores, with totally independent local memory, then this memory has to correspond to a different heap. Bye, bearophile
Re: Next in the Review Queue?
On Monday, October 03, 2011 13:30 simendsjo wrote: > Isn't there more in the queue? std.process and std.socket rewrite? IIUC, the new std.process requires a fix to dmc's C runtime on Windows, so until that's sorted out, the new std.process is not ready for review. As for std.socket, https://github.com/D-Programming-Language/phobos/pull/260 does do a fair bit of work on it, but as I understand it, it doesn't change enough of the API to merit a formal review (I haven't looked over that particular pull request in detail though - just prior versions of it - so I'm not all that clear on its current state). And I'm not aware of any major redesigns of std.socket being in the works - though there has been some discussion of possibly redesigning its current API. Regardless, no major revisions to std.socket have been submitted as ready for the review queue. > Couldn't find a list in the wiki though. There isn't a formal list for the review queue anywhere. It's managed entirely by posts in the newsgroup. It's being managed fairly informally at this point, since we don't always review the item that's been in the queue the longest (depending on the current circumstances), and unless the person who's submitting a module for review is active on the newsgroup and ready to be active in the review of their code, there's no point in reviewing their code anyway. And as long as that works, we're going to continue to do it that way. - Jonathan M Davis
Re: Thread local and memory allocation
On Oct 3, 2011, at 12:48 PM, deadalnix wrote: > D uses thread-local storage for most of its data. And it's a good thing. > > However, the allocation mechanism isn't aware of it. In addition, it has no > way to handle it in the future as things are specified. > > As long as you don't have any pointer in shared memory to thread-local data > (thanks to the type system), this is something the GC could use to its own advantage. > > As long as good practice is to minimize the usage of shared data as much as > possible, this design choice makes things worse for good design, which is, > IMO, not D's philosophy. > > The advantages of handling this at the memory management level are the > following : > - Swap friendliness. Data of a given thread can be located in blocks, so an > idle thread can be swapped easily without a huge penalty on performance. > Anyone who has used chrome and firefox with a lot of tabs on a machine with > limited memory knows what I'm talking about : firefox uses less memory than > chrome, but performance is terrible anyway, because chrome's memory layout is > more cache friendly (tabs' memory isn't mixed with each other's). > - Efficiency in heavily multithreaded applications like servers : the more > threads run in the program, the more costly a stop-the-world GC is. As long as > good design implies separating data between threads as much as possible, a > thread-local collection can be triggered at any time without stopping other > threads. > > Even if those improvements are not implemented yet or anytime soon, it's > kinda sad that the current interface doesn't allow for them. > > What I suggest is to add a flag SHARED in BlkAttr and store it as an attribute > of the block. Later modifications could be made according to this flag. This > attribute shouldn't be modifiable later on. > > What do you think ? Is it something worth working on ? If it is, how can I > help ?
There's another important issue that hasn't yet been addressed, which is that when the GC collects memory, the thread that finalizes non-shared data should be the one that created it. So that SHARED flag should really be a thread-id of some sort. Alternately, each thread could allocate from its own pool, with shared allocations coming from a common pool. This would allow the lock granularity to be reduced and in some cases eliminated. I'd like to move to CDGC as an intermediate step, and that will need some testing and polish. That would allow for precise collections if the compiler support is added. Then the thread-local finalization has to be tackled one way or another. I'd favor per-thread heaps but am open to suggestions and/or help.
Re: Next in the Review Queue?
On 03.10.2011 22:14, Dmitry Olshansky wrote: On 03.10.2011 21:01, Jesse Phillips wrote: Jonathan M Davis Wrote: The review for the region allocator has completed, so we need to choose something else to review now. I believe that the current items in the review queue which are ready for review are - std.log - a CSV parsing module by Jesse Phillips - std.benchmark - GSoC changes to std.regex There are a couple of other items which are in a "pre-review" state (e.g. orange) as well as a few others that are supposed to be close to being ready for review (e.g. changes to std.variant), but I'm not aware of anything else which is actually ready for review at the moment. So, if I missed something, please bring it up. The item which has been in the queue the longest is std.log, but given that std.benchmark could affect further reviews, we may want to review that first. The GSoC changes to std.regex are also probably of fairly high priority. Any thoughts on which item should be reviewed first? - Jonathan M Davis While I would love to have mine reviewed, I agree with your selection of priority. Same thoughts here. As for std.regex being high priority, well, others have been waiting for quite some time, so let's respect the order unless someone wants to skip a round. Isn't there more in the queue? std.process and std.socket rewrite? Couldn't find a list in the wiki though.
Re: Next in the Review Queue?
On 03.10.2011 21:01, Jesse Phillips wrote: Jonathan M Davis Wrote: The review for the region allocator has completed, so we need to choose something else to review now. I believe that the current items in the review queue which are ready for review are - std.log - a CSV parsing module by Jesse Phillips - std.benchmark - GSoC changes to std.regex There are a couple of other items which are in a "pre-review" state (e.g. orange) as well as a few others that are supposed to be close to being ready for review (e.g. changes to std.variant), but I'm not aware of anything else which is actually ready for review at the moment. So, if I missed something, please bring it up. The item which has been in the queue the longest is std.log, but given that std.benchmark could affect further reviews, we may want to review that first. The GSoC changes to std.regex are also probably of fairly high priority. Any thoughts on which item should be reviewed first? - Jonathan M Davis While I would love to have mine reviewed, I agree with your selection of priority. Same thoughts here. As for std.regex being high priority, well, others have been waiting for quite some time, so let's respect the order unless someone wants to skip a round. -- Dmitry Olshansky
Re: User attributes (was Proposal on improvement to deprecated)
Jacob Carlborg: > I would like to have something similar as this possible as well: > > class Foo > { > @get_set int x; > } > > Would insert a getter and setter for "x" like this: > > class Foo > { > int x_; > > int x () { return x_; } > int x (int x) { return x_ = x; } > } I think a template mixin is already able to do this. Bye, bearophile
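For what it's worth, here is a sketch of the kind of mixin bearophile alludes to. The mixin name GetSet and the string-mixin approach are my own; the generated code matches Jacob's getter/setter example:

```d
// A template mixin that generates the backing field plus getter/setter,
// i.e. what the hypothetical @get_set attribute would expand to.
mixin template GetSet(T, string name)
{
    mixin("private " ~ T.stringof ~ " " ~ name ~ "_; "
        ~ T.stringof ~ " " ~ name ~ "() { return " ~ name ~ "_; } "
        ~ T.stringof ~ " " ~ name ~ "(" ~ T.stringof ~ " v) { return " ~ name ~ "_ = v; }");
}

class Foo
{
    mixin GetSet!(int, "x");
}

void main()
{
    auto f = new Foo;
    f.x = 3;          // calls the generated setter
    assert(f.x == 3); // calls the generated getter
}
```

The attribute syntax would still be nicer at the use site, but the mixin shows the generation itself needs no new language feature.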
Thread local and memory allocation
D uses thread-local storage for most of its data. And it's a good thing. However, the allocation mechanism isn't aware of it. In addition, as things are currently specified, it has no way to handle it in the future. Since the type system guarantees there are no pointers in shared memory to thread-local data, this is something the GC could use to its advantage. Since good practice is to minimize the use of shared data as much as possible, this design choice makes things worse for good design, which is, IMO, not D's philosophy. The advantages of handling this at the memory-management level are the following: - Swap friendliness. The data of a given thread can be located in contiguous blocks, so an idle thread can be swapped out easily without a huge performance penalty. Anyone who has used Chrome and Firefox with a lot of tabs on a machine with limited memory knows what I'm talking about: Firefox uses less memory than Chrome, but performance is terrible anyway, because Chrome's memory layout is more cache friendly (the tabs' memory isn't mixed together). - Efficiency in heavily multithreaded applications like servers: the more threads run in the program, the more costly a stop-the-world GC is. Since good design implies separating data between threads as much as possible, a thread-local collection could be triggered at any time without stopping other threads. Even if those improvements are not implemented now or anytime soon, it's kind of sad that the current interface doesn't allow for them. What I suggest is adding a SHARED flag to BlkAttr and storing it as an attribute of the block. Later modifications could be made according to this flag. This attribute shouldn't be modifiable later on. What do you think? Is it something worth working on? If it is, how can I help?
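A sketch of what the proposal might look like against today's API. GC.malloc and GC.BlkAttr.NO_SCAN are real; the SHARED bit value and the allocShared helper are hypothetical, invented for illustration:

```d
import core.memory;

// Hypothetical sketch of the proposal: a SHARED bit in BlkAttr, set at
// allocation time and stored as a block attribute. The bit value below
// is made up; it does not exist in druntime.
enum uint SHARED = 1u << 16;

void* allocShared(size_t size)
{
    // A shared allocation would carry the flag, so a future thread-local
    // collector could skip such blocks when scanning only one thread's heap.
    return GC.malloc(size, SHARED | GC.BlkAttr.NO_SCAN);
}
```

The point of storing it per block is that the collector can later partition the heap by the flag without any cooperation from user code.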
Re: User attributes (was Proposal on improvement to deprecated)
On 2011-10-03 20:22, Piotr Szturmaj wrote: Aleksandar Ružičić wrote: What about using an attribute for that? Looks like a good use case for user defined attributes (I need to write a DIP on that). There are of course many other possible use cases like Object/Relational Mapping, RPC/Remoting, debugger visualization overrides, hints for the GC, etc. User defined attributes provide a powerful way to extend the language without constantly upgrading the compiler (and avoiding "too many features" in it). I would love to hear any opinions about the general idea of user defined attributes in D, not necessarily this particular syntax, as it is only me thinking out loud, though it should show what the whole thing is about. I would love to see this as well. I think it needs to be accessible at runtime too. It should be possible to get the attributes of a class through a base class reference, and it should return the attributes of the runtime type. I would like to have something similar to this possible as well: class Foo { @get_set int x; } Would insert a getter and setter for "x" like this: class Foo { int x_; int x () { return x_; } int x (int x) { return x_ = x; } } An attribute should be able to insert something else instead of the symbol it applies to, or nothing at all. -- /Jacob Carlborg
Re: User attributes (was Proposal on improvement to deprecated)
Piotr Szturmaj: > for user defined attributes (I need to write a DIP on that). Please write that DIP :-) Even if it is refused, it will be a starting point on which to build successively better ideas for user defined attributes. > There are of course many other possible use cases like Object/Relational > Mapping, RPC/Remoting, debugger visualization overrides, hints for the > GC, etc. I see user defined attributes also as ways to extend the type system in library/user code, a bit like dehydra/treehydra (but with no need to use another language to specify them). But I think more static introspection will be needed for that. I mean things like a __trait that, given a function name, returns an array of the names of all its local variables, or a __trait that tells whether a variable used in a function is locally defined, globally defined, static, etc. Some of this info is already present in the JSON documentation about modules, so a way to perform searches on such JSON at compile time would be useful. Eventually maybe even linear types or uniqueness types become implementable with annotations. Bye, bearophile
Re: std.benchmark is in reviewable state
On 2011-10-03 20:04, Andrei Alexandrescu wrote: On 10/3/11 12:49 PM, Jacob Carlborg wrote: On 2011-10-03 17:44, Andrei Alexandrescu wrote: On 10/3/11 8:59 AM, Jens Mueller wrote: 4. Test results should be exposed to the caller. Usually, it is enough to write the results to the console. But you may want to post process the results. Being it either graphically or to output XML for whatever reason. I think it would be beneficial to add this flexibility and separate the benchmarking logic from its generating the output. It might be useful to add the used CPU using std.cpuid to the output. We should decouple collection from formatting. However, this complicates matters because we'd need to expose the benchmark results structure. I'll think of it. Not necessarily. The user could provide a delegate as a callback, something like this: struct Result { string func; double relative; double calls; double tCall; double callsPerSecond; } alias void delegate (Result result) Formatter; That's what I meant when I said "we'd need to expose the benchmark results structure" :o). Hehe, Ok. I don't think it gets THAT much more complicated, just call the delegate with the results instead of writef(ln). There's probably no way out of this anyway, particularly if we provide richer information via times(), scalability, etc. Andrei -- /Jacob Carlborg
User attributes (was Proposal on improvement to deprecated)
Aleksandar Ružičić wrote: What about using an attribute for that? Looks like a good use case for user defined attributes (I need to write a DIP on that). struct Deprecated { string message; // instantiated on every symbol marked with this attribute template applyAttribute(Symbol) { static assert(0, Symbol.stringof ~ " is deprecated: " ~ message); } } @std.attributes.Deprecated("test"); class A { } This is just enough for deprecation. For other cases __traits should be extended with hasAttribute and getAttributes: static assert(__traits(hasAttribute, A, Deprecated)); static assert(is(__traits(getAttributes, A) == TypeTuple!(Deprecated))); // struct Deprecated available at compile time static assert(__traits(getAttribute, A, 0).message != ""); Serialization example: @std.xml.XmlElement("address") struct Address { string street; string city; string country; } @std.xml.XmlElement("person") struct Person { @XmlAttribute("age") int ageInYears; string fullName; @XmlElement("addr") // override element name Address address; } // with Uniform Function Call Syntax: void save(T)(T serializable, string fileName) if (__traits(hasAttribute, T, std.xml.XmlElement)) { auto elemAttr = __traits(getAttribute, T, std.xml.XmlElement); // serialization code } auto person = Person(30, "John Doe"); person.address = Address("Some street address", "London", "UK"); person.save("person.xml"); // should produce content like: <person age="30"><fullName>John Doe</fullName><addr><street>Some street address</street><city>London</city><country>UK</country></addr></person> Other example with UFCS (flag handling): struct Flags { template applyAttribute(Symbol) { // 1. test if Symbol is an enum // 2. test if there is only one @Flags attribute for this enum // 3. 
test if flags don't overlap } } @Flags enum A { Flag1 = 1, Flag2 = 2, Flag3 = 4 } bool hasFlag(T, V)(T t, V v) if (__traits(hasAttribute, T, Flags)) { return (t & v) == v; } void f(A a) { if (a.hasFlag(A.Flag1)) { // } } There are of course many other possible use cases like Object/Relational Mapping, RPC/Remoting, debugger visualization overrides, hints for the GC, etc. User defined attributes provide a powerful way to extend the language without constantly upgrading the compiler (and avoiding "too many features" in it). I would love to hear any opinions about the general idea of user defined attributes in D, not necessarily this particular syntax, as it is only me thinking out loud, though it should show what the whole thing is about.
Re: Request for pre-review: std.serialization/orange
On 2011-10-03 15:57, Robert Jacques wrote: So, in essence, you are saying that by the time archiving occurs, isSliceOf will always return false? Then why is it part of the public API? No, I'm not saying that. Example: struct Array { void* ptr; size_t length; size_t elementSize; bool isSliceOf (Array b) { return ptr >= b.ptr && ptr + length * elementSize <= b.ptr + b.length * b.elementSize; } } void main () { auto a = [0, 1, 2, 3, 4]; auto b = a[2 .. 4]; auto aArr = Array(a.ptr, a.length, 4); auto bArr = Array(b.ptr, b.length, 4); assert(bArr.isSliceOf(aArr)); assert(!aArr.isSliceOf(bArr)); } Both asserts in the above code pass as expected. See, no serialization or archiving in sight. Is there something I'm missing? Actually, it does not need to be part of the public API when I think about it. I can move it into Serializer. Array would still need to be public since both Serializer and Archive need access to it and the package attribute doesn't work very well. The solution, in my mind, is to think in terms of memory blocks/chunks. Every reference can be thought of as pointing to a memory chunk defined by two pointers and a flag: { void* head; // Start of the memory chunk void* tail; // End of the memory chunk bool hasAliases; // True if there is more than one reference to this chunk } For alias detection / resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate. That should give you O(N log(N)) performance. I'm not sure I understand. Wouldn't that require that the arrays are stored in a contiguous block of memory? Won't "head" and "tail" always point to the start and the end of the array? Most of the time, yes. But not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and are <= or >= the start/end of an array, respectively. 
More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm. Now I think I start to understand. I'm assuming you currently separate object and array alias resolution and don't handle pointers at all. Yes. Pointers are handled as well. They're handled similarly to arrays/slices. First it always serializes what the pointer points to. Then, in a post-processing step after all serialization is done, it replaces all serialized pointers that point to a value that has been serialized with a reference. If a given pointer doesn't point to a value that has been serialized, it's left as is. I can see if this memory chunk approach can be used instead. How would this be used with the balanced tree, could you give a simple example? -- /Jacob Carlborg
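As a rough illustration of the memory-chunk idea Robert describes (a sketch of my own, not Orange's actual code; a linear scan over a sorted array stands in for the balanced tree he mentions):

```d
import std.algorithm : min, max, sort;

// Every reference registers the [head, tail) region it points into.
// Overlapping regions are merged (widened) and flagged as aliased.
struct Chunk
{
    void* head;      // start of the memory chunk
    void* tail;      // end of the memory chunk
    bool hasAliases; // true once a second reference overlaps this chunk
}

Chunk[] chunks; // kept sorted by head; a balanced tree would make lookup O(log N)

void register(void* head, void* tail)
{
    foreach (ref c; chunks)
    {
        if (head < c.tail && c.head < tail) // the regions overlap
        {
            c.head = min(c.head, head);     // widen the chunk
            c.tail = max(c.tail, tail);
            c.hasAliases = true;            // more than one reference now
            return;
        }
    }
    chunks ~= Chunk(head, tail, false);
    sort!((a, b) => a.head < b.head)(chunks);
}
```

With arrays, objects, and pointers all funneled through register(), one pass over the chunks tells the serializer exactly which regions need reference resolution.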
Re: std.benchmark is in reviewable state
On 10/3/11 12:49 PM, Jacob Carlborg wrote: On 2011-10-03 17:44, Andrei Alexandrescu wrote: On 10/3/11 8:59 AM, Jens Mueller wrote: 4. Test results should be exposed to the caller. Usually, it is enough to write the results to the console. But you may want to post process the results. Being it either graphically or to output XML for whatever reason. I think it would be beneficial to add this flexibility and separate the benchmarking logic from its generating the output. It might be useful to add the used CPU using std.cpuid to the output. We should decouple collection from formatting. However, this complicates matters because we'd need to expose the benchmark results structure. I'll think of it. Not necessarily. The user could provide a delegate as a callback, something like this: struct Result { string func; double relative; double calls; double tCall; double callsPerSecond; } alias void delegate (Result result) Formatter; That's what I meant when I said "we'd need to expose the benchmark results structure" :o). There's probably no way out of this anyway, particularly if we provide richer information via times(), scalability, etc. Andrei
Re: std.benchmark is in reviewable state
On 2011-10-03 17:44, Andrei Alexandrescu wrote: On 10/3/11 8:59 AM, Jens Mueller wrote: 4. Test results should be exposed to the caller. Usually, it is enough to write the results to the console. But you may want to post process the results. Being it either graphically or to output XML for whatever reason. I think it would be beneficial to add this flexibility and separate the benchmarking logic from its generating the output. It might be useful to add the used CPU using std.cpuid to the output. We should decouple collection from formatting. However, this complicates matters because we'd need to expose the benchmark results structure. I'll think of it. Not necessarily. The user could provide a delegate as a callback, something like this: struct Result { string func; double relative; double calls; double tCall; double callsPerSecond; } alias void delegate (Result result) Formatter; Doesn't seem complicated to me. Then the module could provide a default formatter that does what the module currently does. -- /Jacob Carlborg
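To make the decoupling concrete: the Result struct and the Formatter alias are from the post, while the report() driver and the console formatter below are invented for illustration:

```d
import std.stdio;

// Collected data per benchmarked function (layout from Jacob's post).
struct Result
{
    string func;
    double relative;
    double calls;
    double tCall;
    double callsPerSecond;
}

alias void delegate(Result result) Formatter;

// Collection stays in the library; presentation is the caller's choice.
void report(Result[] results, Formatter fmt)
{
    foreach (r; results)
        fmt(r);
}

void main()
{
    // A default formatter would print a table row, as std.benchmark does now;
    // a user could just as easily emit XML or feed a plotting tool instead.
    Formatter console = (Result r) {
        writefln("%-16s %10.2f calls/s", r.func, r.callsPerSecond);
    };
    report([Result("append", 1.0, 1e6, 83e-9, 11.96e6)], console);
}
```
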
Re: Next in the Review Queue?
Jonathan M Davis Wrote: > The review for the region allocator has completed, so we need to choose > something else to review now. I believe that the current items in the review > queue which are ready for review are > > - std.log > - a CSV parsing module by Jesse Phillips > - std.benchmark > - GSoC changes to std.regex > > There are a couple of other items which are in a "pre-review" state (e.g. > orange) as well as a few others that are supposed to be close to being ready > for review (e.g. changes to std.variant), but I'm not aware of anything else > which is actually ready for review at the moment. So, if I missed something, > please bring it up. > > The item which has been in the queue the longest is std.log, but given that > std.benchmark could affect further reviews, we may want to review that first. > The GSoC changes to std.regex are also probably of fairly high priority. > > Any thoughts on which item should be reviewed first? > > - Jonathan M Davis While I would love to have mine reviewed, I agree with your selection of priority.
Re: Request for pre-review: std.serialization/orange
On 2011-10-03 15:39, Robert Jacques wrote: On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg wrote: On 2011-10-03 05:50, Robert Jacques wrote: On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg wrote: On 2011-10-01 05:00, Robert Jacques wrote: I agree, which is why I suggested lookup should have some granuality. i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions. Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static. How about overrideSerializer or overloadSerializer? registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method? Yes. Sorry for being unclear. The concept being that at the instance level, you are overriding default behavior. Yes, thanks, that makes sense. -- /Jacob Carlborg
Re: std.benchmark is in reviewable state
With this new module we might as well implement __MODULE__ or something similar to avoid hardcoding the module name or using the .stringof[7..$] trick.
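For context, the trick being referred to, as I understand it: a module symbol's .stringof yields the string "module <name>", so slicing off the 7-character "module " prefix recovers the name at compile time:

```d
module foo.bar;

// foo.bar.stringof yields "module foo.bar"; drop the "module " prefix.
enum moduleName = foo.bar.stringof[7 .. $];
static assert(moduleName == "foo.bar");

// With a __MODULE__ keyword (since added to D), the trick is unnecessary:
// static assert(__MODULE__ == "foo.bar");
```

The downside the post alludes to is that the trick still hardcodes the module symbol once; __MODULE__ would be copy-paste safe.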
Re: Proposal on improvement to deprecated
What about using an attribute for that? deprecated("foo() is deprecated, use bar()") void foo() {...} @schedule deprecated("foo() is going to be deprecated as of YYYY-MM-DD, use bar() instead") void foo() {...} The first form will behave just like it does now, with an optional message argument. The second form would produce an informational message instead of a compile error (with the line number where the symbol is accessed from). Both forms would print nothing when using the -d switch. In case "unavailable" is required, another attribute could be used for that: @removed deprecated("foo() was removed as of YYYY-MM-DD, use bar() instead.") void foo(); When a @removed deprecated symbol is used, the compiler will abort with a detailed error message (including the line number where the symbol is accessed from) even if the -d switch is used. This would be more convenient than just bailing out with the message "undefined symbol foo()", because the message parameter of deprecated can direct the user to the replacement. Of course, @schedule and @removed are just example names. "Jonathan M Davis" wrote in message news:mailman.29.1317612078.22016.digitalmar...@puremagic.com... On Sunday, October 02, 2011 07:06:36 Michel Fortin wrote: On 2011-10-02 05:24:36 +, Jonathan M Davis said: > deprecated("message", soft) void func1() {} > deprecated("message", hard) void func2() {} This soft/hard thing is a little obscure; no one will understand what it means by looking at it. Why not have separate 'deprecated' and 'unavailable' states? We could just do deprecated("message"); deprecated("message", full); "Full" deprecation should be clear enough, I would think. And for simplicity, you just don't have any argument for partial deprecation. - Jonathan M Davis
Re: std.benchmark is in reviewable state
Andrei Alexandrescu wrote: > On 10/3/11 8:59 AM, Jens Mueller wrote: > [snip] > > Thanks for this rich feedback. I'll act on most or all points. A few > notes follow. > > >1. I'm not that convinced that prepending benchmark_ to a function is > >that nice even though it's simple. But following this idea I could > >remove all my unittest blocks and add functions starting with test_ and > >write a module std.test in the same fashion. If this approach is considered > >the way to go we can remove unittest altogether and solve testing > >entirely in a library. After all it looks less elegant to me but I admit > >that I see no other way and it is the easiest solution after all. I just > >want to stress that when seeing the code I questioned myself whether > >having unittest in the language is nice to have. > > std.benchmark uses relatively new reflection features that weren't > available until recently. Adding unittest now would be difficult to > justify, but at the time unittest was a good feature. I see. This is also my feeling. > >6. Wrong numbers in documentation > >The examples in the documentation have wrong numbers. In the end one > >should pay special attention that the documentation is consistent. I > >think one cannot automate this. > > Not sure how this can be addressed at all (or maybe I don't > understand). The examples show real numbers collected from real runs > on my laptop. I don't think one may expect to reproduce them with > reasonable accuracy because they depend on a lot of imponderables. I just mean that the relative speed-ups printed in the table do not match the ones given in the surrounding text. > >There are also rounding errors. That means when I try to compute these > >numbers I get differences in the first digit after the decimal point. I > >wonder why this is. Even though the measurement may be noisy the > >computed numbers should be more precise I think. > > Where are calculations mistaken? Indeed I see e.g. 
that append > built-in has 83ns/call and 11.96M calls/s. Reverting the first with > a "calculator" program yields 12.04 Mcalls/s, and reverting the > second yields 83.61ns/call. I'll see what can be done. Yes. This is what I meant here. > >7. Changing units in the output. > >One benchmark report may use different units in its output. The one in > >the example uses microseconds and nanoseconds. This makes comparison > >more difficult and should be avoided (at least for groups). It is > >error-prone when analyzing the numbers. > > Problem is space and wildly varying numbers. I initially had > exponential notation for everything, but it was super ugly and > actually more difficult to navigate. That's why I added the compact > notation. I think we'll need to put up with it. If formatting and collecting are separated it's fine, I guess. It's okay for the default output. > >10. The file reading/writing benchmark should be improved, since the > >file should be deleted. The file writing benchmark should be more like: > >void benchmark_fileWrite() > >{ > > benchmarkPause(); > > auto f = File.tmpfile(); > > benchmarkResume(); > > > > f.writeln("hello, world!"); > >} > > > >But this also measures the deletion of the file. > > If you don't want to measure std.file.write but instead only the > time spent in writeln, this should work and measure only the time > spent in writing to the file: > > void benchmark_fileWrite() > { > benchmarkPause(); > auto f = File.tmpfile(); > benchmarkResume(); > f.writeln("hello, world!"); > benchmarkPause(); > } > > I haven't yet tested the case when a function leaves the benchmark paused. This looks good. > >I have no idea how to fix the benchmark for reading a file. But I > >believe it involves using mkstemp on POSIX and similar on Windows. > >mkstemp should be added to std.file in the long run. > > Why is there a need to fix it? The benchmark measures the > performance of std.file.read soup to nuts. Yes. 
But the benchmark should clean up after itself if possible. So it should remove the file. > In related news, I got word from a colleague that I should also use > times() (http://linux.die.net/man/2/times) to distinguish time spent > in user space vs. system calls. This is because some functions must > issue certain system calls in any case and the user is interested in > how much they can improve the situation by optimizing their end of > the deal. This would apply e.g. to file I/O where the overhead of > formatting etc. would become easier to distinguish. Good hint from your colleague. The more data one can get when measuring code the better. You could even be interested in cache misses when benchmarking. But CPU time is usually what it boils down to. I fear you may lose simplicity. I'm looking forward to your changes. Jens
Re: std.benchmark is in reviewable state
On 10/3/11 8:59 AM, Jens Mueller wrote: [snip] Thanks for this rich feedback. I'll act on most or all points. A few notes follow. 1. I'm not that convinced that prepending benchmark_ to a function is that nice even though it's simple. But following this idea I could remove all my unittest blocks and add functions starting with test_ and write a module std.test in the same fashion. If this approach is considered the way to go we can remove unittest altogether and solve testing entirely in a library. After all it looks less elegant to me but I admit that I see no other way and it is the easiest solution after all. I just want to stress that when seeing the code I questioned myself whether having unittest in the language is nice to have. std.benchmark uses relatively new reflection features that weren't available until recently. Adding unittest now would be difficult to justify, but at the time unittest was a good feature. 2. When benchmarking code you often investigate the algorithm's properties regarding its scalability. Yah, that definitely needs to be looked at. I think I need to add some support beyond mere documentation. 3. (r[i] / 10_000_000.).to!("nsecs", int) should be written as (r[i] / 10_000_000).nsecs Same for msecs and similar. Noted. 4. Test results should be exposed to the caller. Usually, it is enough to write the results to the console. But you may want to post process the results, be it graphically or as XML output for whatever reason. I think it would be beneficial to add this flexibility and separate the benchmarking logic from its generating the output. It might be useful to add the used CPU using std.cpuid to the output. We should decouple collection from formatting. However, this complicates matters because we'd need to expose the benchmark results structure. I'll think of it. 5. In the first example I completely missed that it is not "relative calls". I thought this was one term even though I didn't know what it should mean. 
Later I figured that there is an empty column called relative. There are two spaces there, I should add three. 6. Wrong numbers in documentation The examples in the documentation have wrong numbers. In the end one should pay special attention that the documentation is consistent. I think one cannot automate this. Not sure how this can be addressed at all (or maybe I don't understand). The examples show real numbers collected from real runs on my laptop. I don't think one may expect to reproduce them with reasonable accuracy because they depend on a lot of imponderables. There are also rounding errors. That means when I try to compute these numbers I get differences in the first digit after the decimal point. I wonder why this is. Even though the measurement may be noisy the computed numbers should be more precise I think. Where are calculations mistaken? Indeed I see e.g. that append built-in has 83ns/call and 11.96M calls/s. Reverting the first with a "calculator" program yields 12.04 Mcalls/s, and reverting the second yields 83.61ns/call. I'll see what can be done. 7. Changing units in the output. One benchmark report may use different units in its output. The one in the example uses microseconds and nanoseconds. This makes comparison more difficult and should be avoided (at least for groups). It is error-prone when analyzing the numbers. Problem is space and wildly varying numbers. I initially had exponential notation for everything, but it was super ugly and actually more difficult to navigate. That's why I added the compact notation. I think we'll need to put up with it. 8. I prefer writing std.benchmark.pause() over benchmarkPause() and std.benchmark.resume() over benchmarkResume(). OK. 10. The file reading/writing benchmark should be improved, since the file should be deleted. 
The file writing benchmark should be more like: void benchmark_fileWrite() { benchmarkPause(); auto f = File.tmpfile(); benchmarkResume(); f.writeln("hello, world!"); } But this also measures the deletion of the file. If you don't want to measure std.file.write but instead only the time spent in writeln, this should work and measure only the time spent in writing to the file: void benchmark_fileWrite() { benchmarkPause(); auto f = File.tmpfile(); benchmarkResume(); f.writeln("hello, world!"); benchmarkPause(); } I haven't yet tested the case when a function leaves the benchmark paused. I have no idea how to fix the benchmark for reading a file. But I believe it involves using mkstemp on POSIX and similar on Windows. mkstemp should be added to std.file in the long run. Why is there a need to fix it? The benchmark measures the performance of std.file.read soup to nuts. In related news, I got word from a colleague that I should also use times() (http://linux.die.net/man/2/times) to distinguish time spent in user space vs. system calls. This is because some functions must issue certain system calls in any case and the user is interested in how much they can improve the situation by optimizing their end of the deal. This would apply e.g. to file I/O where the overhead of formatting etc. would become easier to distinguish.
Re: Next in the Review Queue?
On 10/3/11 1:27 AM, Jonathan M Davis wrote: The review for the region allocator has completed, so we need to choose something else to review now. I believe that the current items in the review queue which are ready for review are - std.log - a CSV parsing module by Jesse Phillips - std.benchmark - GSoC changes to std.regex There are a couple of other items which are in a "pre-review" state (e.g. orange) as well as a few others that are supposed to be close to being ready for review (e.g. changes to std.variant), but I'm not aware of anything else which is actually ready for review at the moment. So, if I missed something, please bring it up. The item which has been in the queue the longest is std.log, but given that std.benchmark could affect further reviews, we may want to review that first. The GSoC changes to std.regex are also probably of fairly high priority. Following feedback I received for std.benchmark I'd like to take a little more time to improve it. Should be ready for formal review next Monday. Thanks, Andrei
Re: Proposal on improvement to deprecated
On Monday, October 03, 2011 15:01:13 Regan Heath wrote:
> We've already got informational warnings, warnings and errors.. why not
> just expand pragma to allow programmers to emit them on a case by case
> basis, i.e.
>
> pragma(info, "foo is scheduled for deprecation")
> pragma(warning, "foo will be deprecated on dd/mm/yy, replace with bar")
> pragma(error, "foo has been deprecated, replace with bar")

A key problem with using pragmas for deprecation messages is the fact that pragma is run once when that code is compiled. So, if you want it to affect only people who use the deprecated function, the function has to be templated. So, anything which isn't templated can't have a message. Also, it'll print once per template instantiation, not once per use, and it won't give any line numbers (if it did, it would give the line number of the pragma, not where the function is called). The pragma messages are somewhat useful, but they're too general to work very well as deprecation messages. - Jonathan M Davis
Re: Recording object states
On Sat, 01 Oct 2011 17:47:10 +0200, Andrej Mitrovic wrote: I just thought it's really cool how D makes this possible and very easy to do. Check it out: http://codepad.org/yA8ju9u0 I was thinking of using something like this in code samples for a library (e.g. CairoD), where a Recorder object like this would capture the states of some shape and then generate a series of images showing how that shape changes over several function calls. And this could easily be injected into HTML documentation. Thanks to opDispatch I would only have to rewrite a small portion of my sample code, replacing "Shape" ctor calls with Recorder(Shape()) calls. You might want to consider a template constraint on opDispatch:

auto opDispatch(string method, Args...)(Args args)
    if (is(typeof(mixin("t." ~ method ~ "(args)"))))

-- Simen
Re: Proposal on improvement to deprecated
We've already got informational warnings, warnings and errors.. why not just expand pragma to allow programmers to emit them on a case by case basis, i.e.

pragma(info, "foo is scheduled for deprecation")
pragma(warning, "foo will be deprecated on dd/mm/yy, replace with bar")
pragma(error, "foo has been deprecated, replace with bar")

R
Re: std.benchmark is in reviewable state
Andrei Alexandrescu wrote:
> On 9/25/11 6:08 PM, Andrei Alexandrescu wrote:
> > I've had a good time with std.benchmark today and made it ready for
> > submission to Phobos.
> [snip]
>
> You destroyed, and I listened. I updated my benchmark branch to
> accept a configurable time budget for measurements, and a number of
> trials that take the minimum time.
>
> Still need to do the thread affinity stuff.
>
> Code:
>
> https://github.com/andralex/phobos/blob/benchmark/std/benchmark.d
> https://github.com/andralex/phobos/blob/benchmark/std/format.d
>
> Dox:
>
> http://erdani.com/d/web/phobos-prerelease/std_benchmark.html
> http://erdani.com/d/web/phobos-prerelease/std_format.html
>
> Destroy once again.
>
> Andrei

Overall I like this module very much.

1. I'm not that convinced that prepending benchmark_ to a function is that nice, even though it's simple. But following this idea I could remove all my unittest blocks, add functions starting with test_, and write a module std.test in the same fashion. If this approach is considered the way to go we can remove unittest altogether and solve testing entirely in a library. It looks less elegant to me, but I admit that I see no other way and it is the easiest solution after all. I just want to stress that when seeing the code I questioned myself whether having unittest in the language is nice to have.

2. When benchmarking code you often investigate the algorithm's properties regarding its scalability. That's why I think it's useful to give an example in the documentation. Something like

void benchmark_insert_array(size_t length)(uint n)
{
    benchmarkPause();
    Array!bool array;
    array.length = length;
    benchmarkResume();

    foreach (i; 0 .. n)
        array.insertAfter(array[0 .. uniform(0, length)], true);
}

void benchmark_insert_slist(size_t length)(uint n)
{
    benchmarkPause();
    SList!bool list;
    foreach (i; 0 .. length)
        list.insertFront(false);
    benchmarkResume();

    foreach (i; 0 .. n)
    {
        auto r = take(list[], uniform(0, length));
        list.insertAfter(r, true);
    }
}

alias benchmark_insert_slist!(10) benchmark_insert_slist_10;
alias benchmark_insert_array!(10) benchmark_insert_array_10;
alias benchmark_insert_slist!(100) benchmark_insert_slist_100;
alias benchmark_insert_array!(100) benchmark_insert_array_100;

This example needs more polishing. Input size is the most important dimension that CPU time depends on. The example should clarify how to benchmark depending on the input size.

3. (r[i] / 10_000_000.).to!("nsecs", int) should be written as (r[i] / 10_000_000).nsecs. The same goes for msecs and similar.

4. Test results should be exposed to the caller. Usually it is enough to write the results to the console, but you may want to post-process the results, be it graphically or as XML output for whatever reason. I think it would be beneficial to add this flexibility and separate the benchmarking logic from the output generation. It might be useful to add the CPU used (via std.cpuid) to the output.

5. In the first example I completely missed that it is not "relative calls". I thought this was one term, even though I didn't know what it should mean. Later I figured out that there is an empty column called relative.

6. Wrong numbers in documentation. The examples in the documentation have wrong numbers. In the end one should pay special attention that the documentation is consistent; I think one cannot automate this. There are also rounding errors: when I try to compute these numbers myself I get differences in the first digit after the decimal point. I wonder why this is. Even though the measurement may be noisy, the computed numbers should be more precise, I think.

7. Changing units in the output. One benchmark report may use different units in its output; the one in the example uses microseconds and nanoseconds. This makes comparison more difficult and should be avoided (at least for groups). It is error-prone when analyzing the numbers.

8.
I prefer writing std.benchmark.pause() over benchmarkPause() and std.benchmark.resume() over benchmarkResume(). Or, with import std.benchmark : benchmark; try benchmark.pause(). Supporting benchmark.start() would improve the example, since you would just need to specify one line that indicates where the benchmarking should start. But this seems difficult to get to work with the current design and I don't consider it important; it's a very minor thing.

10. The file reading/writing benchmark should be improved, since the file should be deleted. The file writing benchmark should be more like:

void benchmark_fileWrite()
{
    benchmarkPause();
    auto f = File.tmpfile();
    benchmarkResume();
    f.writeln("hello, world!");
}

But this also measures the deletion of the file. I have no idea how to fix the benchmark for reading a file, but I believe it involves using mkstemp on POSIX and similar
Re: Request for pre-review: std.serialization/orange
On Mon, 03 Oct 2011 02:38:22 -0400, Jacob Carlborg wrote: On 2011-10-02 00:52, Robert Jacques wrote: On Sat, 01 Oct 2011 07:18:59 -0400, Jacob Carlborg wrote: [snip] Also by the time archiving is called, isSliceOf should always return false. Why is that? If isSliceOf can return true, then that means that the archive is responsible for alias detection, management, etc. No, who says that? You can take this struct and use it outside of this library; it knows nothing about archiving or serialization. If the isSliceOf method should return false when archiving has been called I would need to add logic to detect when serialization and archiving have begun and ended. So, in essence, you are saying that by the time archiving occurs, isSliceOf will always return false? Then why is it part of the public API? That means that every single archive format must implement an alias resolution algorithm. No, the serializer performs this task. Okay. [snip] I do. How would I otherwise discover if an array is a slice of another array or not? Okay, first some rationale. Consider:

assert(!a.isSliceOf(b));
assert(!b.isSliceOf(a));
assert( c.isSliceOf(a));
assert( c.isSliceOf(b));

and

class Foo
{
    float x;
    float[3] point;
}

void main()
{
    auto foo = new Foo;
    auto ptr = &foo.x;
    auto slice = foo.point[0..2];
}

In the first case, a, b and c are all slices of a common root array, but the root array may not be serialized. In the second case, first you have a pointer to the inside of an object and second you have a slice of a static array inside an object, all three of which may be serialized together. My impression from your API (so this might not be correct) is that currently you can't handle the above use cases. Even if you can, an O(N^2) algorithm is rather inefficient. This is how it works: As the first step all arrays are serialized as regular arrays and not slices. After all serialization is done I loop over all arrays and check if they are a slice of some other array.
If I find a match I replace the serialized array with a slice instead. These arrays are stored as an associative array with the type Array[Id]. I don't know if there's a better data structure for this. I presented one below. The solution, in my mind, is to think in terms of memory blocks/chunks. Every reference can be thought of as pointing to a memory chunk defined by two pointers and a flag:

{
    void* head;      // Start of the memory chunk
    void* tail;      // End of the memory chunk
    bool hasAliases; // True if there is more than one reference to this chunk
}

For alias detection/resolution, you build a balanced tree of memory chunks, widening the chunk and flagging hasAliases as appropriate, which should give you O(N log(N)) performance. I'm not sure I understand. Would that require that the arrays are stored in a contiguous block of memory? Won't "head" and "tail" always point to the start and the end of the array? Most of the time, yes, but not all of the time. It would be fair to say that 'head' and 'tail' would be inside the GC memory region of the array and <= or >= of the start/end of the array, respectively. More importantly, both objects and pointers would resolve to memory chunks as well and be included in the alias resolution algorithm. I'm assuming you currently separate object and array alias resolution and don't handle pointers at all.
Re: Request for pre-review: std.serialization/orange
On Mon, 03 Oct 2011 03:06:36 -0400, Jacob Carlborg wrote: On 2011-10-03 05:50, Robert Jacques wrote: On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg wrote: On 2011-10-01 05:00, Robert Jacques wrote: I agree, which is why I suggested lookup should have some granuality. i.e. that there is both a global store of serialization methods and a per instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions. Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static. How about overrideSerializer or overloadSerializer? registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method? Yes. Sorry for being unclear. The concept being that at the instance level, you are overriding default behavior.
Re: Proposal on improvement to deprecated
On 10/3/2011 12:21 AM, Jacob Carlborg wrote: Since we already are using pragma(msg, "scheduled for deprecation") where possible, i.e. in templates, why can't we have a proper language construct for doing it?

deprecated("message") {}
deprecated {}

Behave as it currently does, except that the first form will print a message as well. That's a good idea.

deprecated("message", scheduled) {}

Will behave just as pragma(msg, "") does when a symbol is used that is wrapped in the "deprecated" block. Too many features.
Re: Proposal on improvement to deprecated
On 2011-10-03 07:59, Walter Bright wrote: On 10/2/2011 10:13 PM, Jonathan M Davis wrote: I really think that making it so that deprecated doesn't actually stop compilation (but rather just prints the message) would improve things. The user has two choices: 1. get deprecation messages and fix the code, or 2. add -d and ignore it. I don't see any point in printing messages and ignoring them. I don't believe it improves the user experience. The user is going to have to eventually fix the code, and that WILL BE ANNOYING to him. Guaranteed. There's no way to deprecate things and not annoy people. Since we already are using pragma(msg, "scheduled for deprecation") where possible, i.e. in templates, why can't we have a proper language construct for doing it?

deprecated("message") {}
deprecated {}

These behave as they currently do, except that the first form will print a message as well.

deprecated("message", scheduled) {}

This will behave just as pragma(msg, "") does when a symbol is used that is wrapped in the "deprecated" block. -- /Jacob Carlborg
Re: Proposal on improvement to deprecated
On 2011-10-03 09:07, Jonathan M Davis wrote: On Monday, October 03, 2011 08:54:24 Jacob Carlborg wrote: I was thinking that the levels match different types of compiler messages. Above one level you get a note, above the next level a warning and above yet another level you get an error. Then it's up to the users to decide what a given level means. I'm sure that someone would find the additional control that your suggestion gives useful, but it really sounds overly complicated to me. Deprecation needs to work smoothly, but it's also a feature that shouldn't be needed very often, so it's not something that should be particularly complicated. And Walter doesn't seem to like the idea of complicating it more than it currently is _at all_ beyond adding the ability to give a custom message to deprecated. So, if any changes beyond that are going to be made, it looks like they're _really_ going to need to pull their weight and have a very solid argument for their inclusion. - Jonathan M Davis I guess you're right, I just threw out a suggestion. -- /Jacob Carlborg
Re: Request for pre-review: std.serialization/orange
On 2011-10-03 05:50, Robert Jacques wrote: On Sat, 01 Oct 2011 06:50:52 -0400, Jacob Carlborg wrote: On 2011-10-01 05:00, Robert Jacques wrote: I agree, which is why I suggested lookup should have some granularity, i.e. that there is both a global store of serialization methods and a per-instance store of serialization methods. Lookup would first look in the local store before defaulting to the global store. But this should be a separate pair of functions. Aah, now I get it. That's a good idea. The question is what to name the two functions. Yet another use case for overloading methods on static. How about overrideSerializer or overloadSerializer? registerSerializer for the static method and overloadSerializer/overrideSerializer for the instance method? Umm... example code for the deserialize method should contain 'deserialize' somewhere inside it. You are completely right.

a = serializer!(int)("a");

should be

a = deserialize!(int)("a");

My bad. -- /Jacob Carlborg
Re: Proposal on improvement to deprecated
On Monday, October 03, 2011 08:54:24 Jacob Carlborg wrote:
> I was thinking that the levels match to different types of compiler
> messages. Above one level you get a note, above the next level a warning
> and above yet another level you get an error. Then it's up to the users
> to decide what a given level means.

I'm sure that someone would find the additional control that your suggestion gives useful, but it really sounds overly complicated to me. Deprecation needs to work smoothly, but it's also a feature that shouldn't be needed very often, so it's not something that should be particularly complicated. And Walter doesn't seem to like the idea of complicating it more than it currently is _at all_ beyond adding the ability to give a custom message to deprecated. So, if any changes beyond that are going to be made, it looks like they're _really_ going to need to pull their weight and have a very solid argument for their inclusion. - Jonathan M Davis
Re: Proposal on improvement to deprecated
On 2011-10-02 22:16, Jonathan M Davis wrote: On Sunday, October 02, 2011 10:48:53 Jacob Carlborg wrote: On 2011-10-02 07:24, Jonathan M Davis wrote: There has been a fair bit of discussion about improving deprecated in this pull request: https://github.com/D-Programming-Language/dmd/pull/345 I have not read the discussion on github, but an idea would be to have several levels of deprecation. Instead of "soft" and "hard" there could be numbers. Say level 1 could be used for "scheduled for deprecation", and depending on what flags are used when compiling, it could print some kind of message. Level 2 would be soft deprecation and level 3 would be hard deprecation. There could also be more levels in between these if we want. The idea is that the user can use a compiler flag to get messages about symbols that are scheduled for deprecation. Don't know if this is too complicated. And what would each level do? How would it interact with the compiler? The feature needs to be simple. Deprecation is not something that is going to need to be used frequently, so making it particularly complicated is undesirable. It needs to be effective but easy to use. What we have is better than nothing, but it isn't quite good enough. So, it needs to be improved. But we don't need it to be vastly more powerful than it currently is, just iron it out a bit. Your suggestion sounds like it would be getting a bit complicated. - Jonathan M Davis If levels like these are used then there could be a compiler flag that says: "Print out all levels above X as a warning" or "All levels above Y should be a compile error". -- /Jacob Carlborg
Re: std.getopt suggestion
On Monday, October 03, 2011 08:43:37 Jacob Carlborg wrote:
> How about starting to review modules that existed before we started with
> the review process? But I guess many will complain that the time can be
> better spent elsewhere.

I think that for the most part where there's really an issue with a module, we've already discussed revamping it. The new design is then reviewed as part of the formal review process. So, on the whole, I think that the issue is covered. And as much as I would like Phobos' lingering issues to be resolved, I don't think that going over all of the older code in formal reviews is really going to improve things much. For the most part, stuff which isn't being fixed by the wholesale overhaul of an older module can be fixed through pull requests and the like. - Jonathan M Davis