Re: A monitor for every object
Robert Jacques wrote: On Fri, 04 Feb 2011 17:23:35 -0500, Jérôme M. Berger jeber...@free.fr wrote: Steven Schveighoffer wrote: D's monitors are lazily created, so there should be no issue with resource allocation. What happens if two threads attempt to create a monitor for the same object at the same time? Is there a global lock to avoid race conditions in this case? Jerome Only the reference to the mutex is shared, so all you need is an atomic op. This requires an atomic if (a is null) a = b;. I did not know that such a beast existed. Jerome -- mailto:jeber...@free.fr http://jeberger.free.fr Jabber: jeber...@jabber.fr
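To make the race Jérôme asks about concrete, here is a minimal D sketch of the atomic "if (a is null) a = b" idiom using core.atomic.cas. The Monitor struct, the slot variable, and getMonitor are purely illustrative names - actual druntime monitor internals differ:

```d
import core.atomic : atomicLoad, cas;

// Illustrative stand-in for the hidden per-object monitor state;
// the real druntime layout differs.
struct Monitor { int dummy; }

shared Monitor* slot;  // the lazily-created monitor reference

shared(Monitor)* getMonitor()
{
    auto m = atomicLoad(slot);
    if (m is null)
    {
        auto fresh = new shared Monitor;
        // The atomic "if (a is null) a = b": exactly one thread's CAS
        // succeeds; a loser discards its allocation and reuses the winner's.
        if (cas(&slot, cast(shared(Monitor)*) null, fresh))
            m = fresh;
        else
            m = atomicLoad(slot);
    }
    return m;
}
```

No global lock is needed: a thread that loses the race merely garbage-collects its speculative allocation and uses the published monitor.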
Re: buffered input
On Friday 04 February 2011 21:46:40 Andrei Alexandrescu wrote: I've had the opportunity today to put some solid hours of thinking into the relationship (better said the relatedness) of what would be called buffered streams and ranges. They have some commonalities and some differences, but it's been difficult to capture them. I think now I have a clear view, prompted by a few recent discussions. One was the CSV reader discussed on the Phobos list; another was the discussion on defining the right std.xml. First, let's start with the humblest abstraction of all - an input range, which only defines the troika empty/front/popFront with the known semantics. An input range consumes input destructively and has a one-element horizon. It may as well be considered a buffered stream with a buffer length of exactly one. Going from there, we may say that certain streaming can be done by using an input range of ubyte (or dchar for text). That would be the UTF-powered equivalent of getchar(). The readf function operates that way - it only needs to look one character ahead. Incidentally, the CSV format also requires lookahead of 1, so it too can operate on a range of dchar. At this point we need to ask ourselves an essential question. Since we have this input range abstraction for a 1-element buffer, what would its n-element buffer representation look like? How do we go from an input range of T (which really is an unbuffered input range of T) to a buffered input range of T? Honestly, the answer was extremely unclear to me for the longest time. I thought that such a range would be an extension of the unbuffered one, e.g. a range that still offers T from front() but also offers some additional functions - e.g. a lookahead in the form of a random-access operator. I still think something can be defined along those lines, but today I came up with a design that is considerably simpler both for the client and the designer of the range. 
I hereby suggest we define a buffered input range of T as any range R that satisfies the following conditions: 1. R is an input range of T[] 2. R defines a primitive shiftFront(size_t n). The semantics of the primitive is that, if r.front.length >= n, then shiftFront(n) discards the first n elements in r.front. Subsequently r.front will return a slice of the remaining elements. 3. R defines a primitive appendToFront(size_t n). Semantics: adds at most n more elements from the underlying stream and makes them available in addition to whatever was in front. For example if r.front.length was 1024, after the call r.appendToFront(512) r.front will have length 1536, of which the first 1024 elements will be the old front and the rest will be newly-read elements (assuming that the stream had enough data). If n == 0, this instructs the stream to add any number of elements at its own discretion. This is it. I like many things about this design, although I still fear some fatal flaw may be found with it. With these primitives a lot of good operations on buffered streams can be written efficiently. The range is allowed to reuse data in its buffers (unless that would contradict language invariants, e.g. if T is invariant), so if client code wants to stash away parts of the input, it needs to make a copy. One great thing is that buffered ranges as defined above play very well with both ranges and built-in arrays - two quintessential parts of D. I look at this and say, this all makes sense. For example the design could be generalized to operate on some random-access range other than the built-in array, but then I'm thinking, unless some advantage comes about, why not give T[] a little special status? Probably everyone thinks of contiguous memory when thinking buffers, so here generalization may be excessive (albeit meaningful). Finally, this design is very easy to experiment with and causes no disruption to ranges. 
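A minimal in-memory model may help pin down the three conditions above. The following toy range is only a sketch of the proposed protocol - an array stands in for the underlying stream, and ToyBuffered is not an actual Phobos type:

```d
// Toy model of the proposed buffered input range: front is a T[] window,
// shiftFront(n) discards the first n elements of the window, and
// appendToFront(n) pulls up to n more elements in behind it.
struct ToyBuffered
{
    ubyte[] source;  // stands in for the underlying stream
    size_t lo, hi;   // current window is source[lo .. hi]

    @property ubyte[] front() { return source[lo .. hi]; }

    void shiftFront(size_t n)
    {
        assert(front.length >= n);
        lo += n;  // an index bump, not a copy
    }

    void appendToFront(size_t n)
    {
        // n == 0 means "read a discretionary amount"; this toy picks 4
        immutable want = n ? n : 4;
        hi = hi + want > source.length ? source.length : hi + want;
    }
}
```

For example, starting from ToyBuffered(cast(ubyte[])[1, 2, 3, 4, 5, 6]), the sequence appendToFront(4), shiftFront(2), appendToFront(0) leaves front as [3, 4, 5, 6]: the old unconsumed tail followed by newly-read elements, exactly as the proposal describes.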
I can readily add the primitives to byLine and byChunk so we can see what streaming we can do that way. What do you all think? Hmm. I think that I'd have to have an actual implementation to mess around with to say much. My general take on buffered input is that I don't want to worry about it. I want it to be buffered so that it's more efficient, but I don't want to have to care about it in how I use it. I would have expected a buffered input range to be exactly the same as an input range except that it doesn't really just pull in one character behind the scenes. It pulls in 1024 or whatever when popFront() would result in the end of the buffer being reached, and you just get the first one with front. The API doesn't reflect the fact that it's buffered at all except perhaps in how you initialize it (by telling how big the buffer is, though generally I don't want to have to care about that either). Now, there may be some sort of use case where you actually need to care about the buffering, so using buffered
Re: A better assert() [was: Re: std.unittests [updated] for review]
On 02/05/2011 08:29 AM, Jonathan M Davis wrote: On Friday 04 February 2011 13:29:38 bearophile wrote: Jonathan M Davis: assert(0) has the advantage of being a normal assertion in non-release mode. What is this useful for? To me this looks like a significant disadvantage. If I want a HALT (to tell the compiler a point can't be reached, etc) I want it in every kind of compilation of the program. It also makes it clear that that code path should _never_ be reached. The replacement for assert(0) is meant to be more clear in its purpose compared to assert(0). It may be named thisCantHappen(), or assertZero(), etc. assert(0) will actually give a stack trace with a file and line number. It will also give you a message if you include one with it. HALT just kills the program. I _much_ prefer that assert(0) be a normal assert in non-release mode. Leaving it in as a HALT has the advantage that the program will just die if it reaches that point in release mode rather than trying to continue with the assert gone, but I very much want a normal assert in non-release mode. It's much more useful. The real question though is whether you can convince Walter (which I doubt, but I don't know). This topic was already discussed, and I think the result of the discussion was that this change of assert(false) is not worth it. But if asserts get improved for other purposes, then this is a chance to work on improving assert(0) too. Still, making such a change _would_ contradict TDPL, which is supposed to be a major no-no at this point. I like TDPL, I respect Andrei and you, I agree that TDPL is a kind of reference for D2, but please stop using TDPL as The Bible in many of your posts. Not even Andrei himself looks so religiously attached as you to the contents of TDPL :-) A little flexibility is acceptable. I believe that Walter and Andrei have made it fairly clear that if we do anything that contradicts TDPL, it needs to have a very good reason to be done. 
TDPL is _supposed_ to have been the final word. Unfortunately, the implementation is behind, so _it_ is possible that we're going to have to make changes which contradict it. However, if we do, those changes have to be needed or at least really merit the cost of contradicting TDPL. Something as small as changing assert(0) is unlikely to do that.

    struct Foo {
        int x;
        invariant() { assert(x == 1); }
    }

    void main() {
        Foo f;
        assert(f);
    }

DMD 2.051: test.d(7): Error: expression f of type Foo does not have a boolean value

Actually, the more I think about it, the less I see assert(class_instance) to be a problem. Normally it would check that the reference was non-null. Assuming that the feature isn't buggy, it'll still do that, but it'll check the invariant in addition. And since the invariant is always supposed to be true, that shouldn't be a problem. I really don't think that assert needs to be fundamentally changed with regards to assert(0) or assert(class_instance). - Jonathan M Davis All right, I guess I get your point. I still think that checking a class's invariant should be explicit. assert(whatever) means for me "check whatever is not (equivalent to) false", not "check whatever is not (equivalent to) false and whatever's invariant condition, if any, is fulfilled". Note that an object's invariant is *not* part of its regular truth value (the 1st assertion in the unittest below indeed passes). Thus, it is clearly incorrect that assert(x) checks its invariant implicitly:

    assert ( x );
    assert ( cast(bool)x == true );

should always have the same outcome. Thus, the idiom to check an object's invariant should be different, at least in a visible detail. A side-issue is the following code:

    class C {
        int i;
        invariant () { assert(i == 1); }
    }

    unittest {
        auto c = new C();
        assert ( cast(bool)c == true );
        assert ( c );
    }

throws a plain Assertion error with no additional comment. No mention of invariant checking, thus no hint to debug. 
I guess the problem would be more acceptable if asserts in invariants had at least slightly different error message forms, if only "Invariant assertion error". (Yes, one can customize error messages, sure; but defaults should be as helpful as possible.) Denis -- _ vita es estrany spir.wikidot.com
Re: A monitor for every object
Steven Schveighoffer: D's monitors are lazily created, so there should be no issue with resource allocation. If you don't ever lock an object instance, it's not going to consume any resources. For the non-sorcerers following the thread, would someone explain in a few words what it actually means, conceptually and concretely, for an object to be its own monitor. (searches online have brought me nothing relevant) Denis -- _ vita es estrany spir.wikidot.com
Re: std.xml should just go
Andrej Mitrovic wrote: On 2/4/11, spir denis.s...@gmail.com wrote: About that, I would love a tutorial about eponymous templates starting with their /purpose/ (why does this feature even exist? what does it /mean/? what does it compare/oppose to? why is one supposed to need/enjoy it? how is it supposed to help make code better mirror the model?) Same for alias template params. Same for a rather long list of features, probably. But both of these are already explained in the manual: http://www.digitalmars.com/d/2.0/template.html (search for "Implicit Template Properties") http://www.digitalmars.com/d/2.0/template.html (search for "Template Alias Parameters") Granted, eponymous templates aren't explained in much detail on that page. As for explaining how they work together, I did write that short template tutorial (http://prowiki.org/wiki4d/wiki.cgi?D__Tutorial/D2Templates), but you've already seen that. :) However, I do not think we should write tutorials on single features alone. I've read a bunch of books that explain the language on a feature-by-feature basis, but neglect to tie everything together. For example, Learning Python is this 1200-page book about Python 3, explaining the language feature by feature but never really discussing the language as a whole. It's only good as a reference, which ironically defeats the book's title. OTOH Dive into Python 3 gradually introduces you to more features of the language, but always has code examples where you can see multiple features of the language being used. (IIRC there were string processing examples which used regex, multiple modules, and unittests all at once). Having a perspective on how all features tie together is crucial to understanding the purpose of individual features themselves. In my opinion! I agree, most of the 'dive into' books are excellent and complementary to reference materials. TDPL also has great little examples that illustrate the why of things, without ever becoming a mindless tutorial. 
It's hard to write such things however (witness the abundant amount of horrible technical writing), I truly admire those who can.
Re: A better assert() [was: Re: std.unittests [updated] for review]
On Saturday 05 February 2011 02:43:56 spir wrote: [...] I guess the problem would be more acceptable if asserts in invariants had at least slightly different error message forms, if only "Invariant assertion error". (Yes, one can customize error messages, sure; but defaults should be as helpful as possible.) The AssertError should give the file and line number, which should then tell you exactly which assert failed. Also, if stack traces work (which they do on Linux, but I don't think that they do on Windows yet), then you get a stack trace. It might be nice if the AssertError also said that it was in an invariant, but I don't think that it much matters where the assert was when an assert fails - be it in a function, or in an invariant, or in an in block, or wherever. What matters is knowing which assert failed so that
Re: buffered input
On 02/05/2011 10:36 AM, Nick Sabalausky wrote: On a separate note, I think a good way of testing the design (and end up getting something useful anyway) would be to try to use it to create a range that's automatically-buffered in a more traditional way. Ie, Given any input range 'myRange', buffered(myRange, 2048) (or something like that) would wrap it in a new input range that automatically buffers the next 2048 elements (asynchronously?) whenever its internal buffer is exhausted. Or something like that. It's late and I'm tired and I can't think anymore ;) That's exactly what I'm expecting. Funnily enough, I was about to start a thread on the topic after reading related posts. My point was: I'm not a specialist in efficiency (rather the opposite), I just know there is --theoretically-- relevant performance loss to expect from unbuffered input in various cases. Could we define a generic input-buffering primitive allowing people to benefit from others' competence? Just like Appender. Denis -- _ vita es estrany spir.wikidot.com
Re: buffered input
On 02/05/2011 08:22 AM, Ellery Newcomer wrote: 2. R defines a primitive shiftFront(size_t n). The semantics of the primitive is that, if r.front.length >= n, then shiftFront(n) discards the first n elements in r.front. Subsequently r.front will return a slice of the remaining elements. Does shiftFront literally move element n to index 0 and so on? It seems to me that if you do, it's going to have horrid performance, and if you don't, then you will eventually run into situations where appendToFront will require a wrap around, which loses you your contiguity, or a reallocation of the buffer. Is this really what it means? I naively understood discards as meaning buf = buf[n..$]; or similar. Denis -- _ vita es estrany spir.wikidot.com
Re: buffered input
Andrei: I've had the opportunity today to put some solid hours of thinking into the relationship (better said the relatedness) of what would be called buffered streams and ranges. This is an important part of the range design. This range is useful for other things too, like: - increasing efficiency of some lazy operations, as already done in Clojure. A buffer is meant to be CPU cache friendly, increasing performance of numeric code too. - Buffered I/O - The chunked lazy parallel map dsimcha is working on - Creating a chunked interface in Phobos for DBMSs See some of my posts about it: http://www.digitalmars.com/d/archives/digitalmars/D/Vectorized_Laziness_100525.html http://www.digitalmars.com/d/archives/digitalmars/D/Re_Vectorized_Laziness_100676.html http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=103882 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=125876 Bye, bearophile
Re: buffered input
On 02/05/2011 11:09 AM, Jonathan M Davis wrote: Hmm. I think that I'd have to have an actual implementation to mess around with to say much. My general take on buffered input is that I don't want to worry about it. I want it to be buffered so that it's more efficient, but I don't want to have to care about it in how I use it. I would have expected a buffered input range to be exactly the same as an input range except that it doesn't really just pull in one character behind the scenes. It pulls in 1024 or whatever when popFront() would result in the end of the buffer being reached, and you just get the first one with front. The API doesn't reflect the fact that it's buffered at all except perhaps in how you initialize it (by telling how big the buffer is, though generally I don't want to have to care about that either). [...] Regardless, a more normal range could be built on top of what you're suggesting, and it could do essentially what I was thinking buffered ranges would do. So, perhaps doing what you're suggesting and building what I was thinking of on top of that would be the way to go. That way, if you actually care about messing with the buffer, you can, but if not, you just use it normally and the buffering is dealt with underneath. Exactly. I would love something like: auto bufInputRange (R) (R inputRange, size_t capacity=0) if (...) Meaning one can specify (max) buffering capacity; else there is a standard (re)sizing scheme. Just like dyn array (re)sizing. Side-question to specialists: What should actual buf capacity depend on? Denis -- _ vita es estrany spir.wikidot.com
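A sketch of the wrapper spir asks for, under the assumption that plain element-wise iteration is all the caller wants. Buffered and buffered are hypothetical names, not Phobos API; the internals use today's std.range.primitives:

```d
import std.range.primitives;

// Hypothetical wrapper: it reads up to `capacity` elements at a time from
// any input range, but exposes only the ordinary empty/front/popFront
// interface, so callers need not care about the buffering underneath.
struct Buffered(R) if (isInputRange!R)
{
    private R inner;
    private ElementType!R[] buf;
    private size_t pos;
    private size_t capacity;

    this(R r, size_t capacity)
    {
        inner = r;
        this.capacity = capacity;
        refill();
    }

    private void refill()
    {
        buf.length = 0;
        pos = 0;
        while (buf.length < capacity && !inner.empty)
        {
            buf ~= inner.front;  // collect one chunk of up to `capacity`
            inner.popFront();
        }
    }

    @property bool empty() const { return pos == buf.length; }
    @property auto front() { return buf[pos]; }

    void popFront()
    {
        ++pos;
        if (pos == buf.length)
            refill();  // transparently pull in the next chunk
    }
}

auto buffered(R)(R r, size_t capacity = 1024)
{
    return Buffered!R(r, capacity);
}
```

Usage would be e.g. foreach (x; buffered(someRange, 2048)) { ... }, matching Nick's buffered(myRange, 2048) idea from the sibling post. As for what the default capacity should depend on: a common heuristic is a cache- or page-friendly byte size (a few KiB) divided by the element size, but that is a tuning question the wrapper deliberately hides.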
Re: buffered input
On 02/05/2011 08:45 AM, Michel Fortin wrote: One thing I'm wondering is whether it'd be more efficient if we could provide our own buffer to be filled. In cases where you want to preserve the data, this could let you avoid double-copying: first copy in the temporary buffer and then at the permanent storage location. If you need the data only temporarily however, providing your buffer to be filled might be less efficient for a range that can't avoid copying to the temporary buffer for some reason. Does this also make sense when one needs to iterate over a whole set of source data via buffered input ranges? I mean the same buffer can be reused, avoiding repeated allocation (or is this wrong or irrelevant?). Denis -- _ vita es estrany spir.wikidot.com
Re: std.xml should just go
The case is different --I mean the comparison does not hold IIUC. Virtual methods are /intended/ to be overridden, this is precisely part of their semantics. While the whole point of const-the-D-way is to ensure actual constness as marked in a given function's signature, whatever this function itself calls. The contract is such that the reader does not even need to watch further. Again, IIUC (please correct if I'm wrong on this). Denis Well you are thinking with the current usage.

---
const int i;
const A a;
---

Think about these two lines. If this is C++ code, you can't say much about their constness. But if this is D code, you can say many things about each line and go even further and say their constness is exactly the same! What I am getting at is that if we have this affinity between types and constness always a first-class attribute, why don't we go even further, drop the signatures altogether, and make constness accessible to every single piece of D code written. Walter and Steve are talking about the contract feature we give to const signatures; I am not saying it is wrong or it has lesser importance than they claim. I am just questioning if this is what CS should be.
Link the source
The great Raymond Hettinger suggests to put links to the (github) source code inside the docs, this was done by Tango docs, but not enough by Phobos docs: http://rhettinger.wordpress.com/2011/01/28/open-your-source-more/ https://groups.google.com/group/comp.lang.python/browse_thread/thread/0ead7571edfdc6d7 Bye, bearophile
Re: buffered input
Does shiftFront literally move element n to index 0 and so on? It seems to me that if you do, it's going to have horrid performance, and if you don't, then you will eventually run into situations where appendToFront will require a wrap around, which loses you your contiguity, or a reallocation of the buffer. I think it is basically popFrontN(), and appendToFront() is just an append.
Re: buffered input
Andrei Alexandrescu wrote: I hereby suggest we define a buffered input range of T as any range R that satisfies the following conditions: 1. R is an input range of T[] 2. R defines a primitive shiftFront(size_t n). The semantics of the primitive is that, if r.front.length >= n, then shiftFront(n) discards the first n elements in r.front. Subsequently r.front will return a slice of the remaining elements. 3. R defines a primitive appendToFront(size_t n). Semantics: adds at most n more elements from the underlying stream and makes them available in addition to whatever was in front. For example if r.front.length was 1024, after the call r.appendToFront(512) r.front will have length 1536, of which the first 1024 elements will be the old front and the rest will be newly-read elements (assuming that the stream had enough data). If n == 0, this instructs the stream to add any number of elements at its own discretion. I don't see a clear need for the two to be separate. Could they fold into popFront(n, m) meaning shiftFront(n); appendToFront(m)? Nullary popFront() discards all and loads any number it pleases. This is it. I like many things about this design, although I still fear some fatal flaw may be found with it. With these primitives a lot of good operations on buffered streams can be written efficiently. The range is allowed to reuse data in its buffers (unless that would contradict language invariants, e.g. if T is invariant), so if client code wants to stash away parts of the input, it needs to make a copy. Some users would benefit if they could just pass in a buffer and say fill 'er up. One great thing is that buffered ranges as defined above play very well with both ranges and built-in arrays - two quintessential parts of D. I look at this and say, this all makes sense. 
For example the design could be generalized to operate on some random-access range other than the built-in array, but then I'm thinking, unless some advantage comes about, why not give T[] a little special status? Probably everyone thinks of contiguous memory when thinking buffers, so here generalization may be excessive (albeit meaningful). Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length. -- Tomek
Re: buffered input
Tomek Sowiński wrote: Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length. I meant: when n + front.length > buf.length. -- Tomek
Re: buffered input
Tomek Sowiński wrote: Andrei Alexandrescu wrote: [...] Probably everyone thinks of contiguous memory when thinking buffers, so here generalization may be excessive (albeit meaningful). Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length. I find this discussion interesting. There's one idea for an application I'd like to try at some point. Basically a facebook chat thingie, but with richer gaming features. The expected audience will be 10 - 100K simultaneous clients connecting to a single server. Not sure if DOM or SAX will be better. After seeing Tango's XML benchmarks I was convinced that the implementation platform would be D1/Tango, but now it looks like Phobos is also getting there, probably even outperforming Tango by a clear margin. Since even looking at Tango's documentation has intellectual property problems and likely causes taint, I could make an independent benchmark comparing the two and their interfaces later. But I probably need to avoid going into too much detail, otherwise the Phobos developers wouldn't be able to read it without changing their license. From what I've read so far, the proposed design looks very much like what Tango has now in their I/O framework. But probably Phobos's TLS default and immutable strings improve multithreaded performance even more.
Re: buffered input
On 2011-02-05 07:01:24 -0500, spir denis.s...@gmail.com said: On 02/05/2011 08:45 AM, Michel Fortin wrote: One thing I'm wondering is whether it'd be more efficient if we could provide our own buffer to be filled. In cases where you want to preserve the data, this could let you avoid double-copying: first copy in the temporary buffer and then at the permanent storage location. If you need the data only temporarily however, providing your buffer to be filled might be less efficient for a range that can't avoid copying to the temporary buffer for some reason. Does this also make sense when one needs to iterate over a whole set of source data via buffered input ranges? I mean the same buffer can be reused, avoiding repeated allocation (or is this wrong or irrelevant?). As I said in my post, whether a temporary buffer or a user-supplied buffer is better depends on whether you plan to store the data beyond the temporary buffer's lifetime or not. If you just iterate to calculate the SHA1 hash, the temporary buffer is fine (and possibly better depending on the range's implementation). If you iterate to calculate the SHA1 hash *and* also want to store the file in memory, then it's better if you can provide your own buffer which can point directly to the permanent storage location and bypass copying to the temporary buffer (if the range's implementation allows it). -- Michel Fortin michel.for...@michelf.com http://michelf.com/
Re: A better assert() [was: Re: std.unittests [updated] for review]
Adam D. Ruppe: LOL! I like a bit of humour to keep myself serious/sane despite the strange discussions we have around here now and then :-) But, assert(0) does exactly what it says - assert this situation is invariably invalid. Something like class_instance.invariant() is better because: - It's explicit and readable. So is assert(obj); An assert(something) doesn't say in any explicit way that it will call the invariant, it's an information present only inside your head, that knows D mysteries. A syntax like class_name.invariant() or something similar is instead explicit in its purpose to call the invariant. This isn't really special. You're asserting the object is valid, which includes the invariant. If you want to only assert it is not null, you should write assert(obj !is null); The semantics a non-D-expert expects from assert(something) is to test that something is true. For a class reference this means the pointer is not null. If you want assert() to do more for classes/structs/enums, then you are adding a special case to assert(). Also, if you change to obj.invariant(), it will probably never be used. assert() is your one-stop shop for sanity tests. :-) IMO that's the bug. It'd make a lot more sense to fix it so assert(struct) checks the invariant than to break assert(class) so it doesn't. Removing a special case from assert() doesn't mean breaking it. One more interesting example:

    struct Foo {
        int x;
        invariant() { assert(x == 0); }
        T opCast(T:bool)() { return false; }
    }

    void main() {
        Foo f;
        assert(f); // line 8
    }

It generates: core.exception.AssertError@test(8): Assertion failure

Here the assert(f) is calling opCast; this according to DbC laws makes it call the invariant first, which passes here, but then opCast returns false, and the final assert fails. Bye, bearophile
Re: buffered input
On 2/5/11 2:22 AM, Ellery Newcomer wrote: Does shiftFront literally move element n to index 0 and so on? It seems to me that if you do, it's going to have horrid performance, and if you don't, then you will eventually run into situations where appendToFront will require a wrap-around, which loses you your contiguity, or a reallocation of the buffer. No, it's a mere internal operation, bufpos += n or so. Andrei
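A minimal sketch of what such "mere internal operation" bookkeeping could look like, assuming an internal window delimited by two indices (all names here are illustrative, not from the proposal):

```d
// Sketch only: a buffered range keeping its front() window as a pair of
// indices into one internal buffer. shiftFront is O(1) bookkeeping.
struct BufferedWindow(T)
{
    private T[] buf;        // the whole internal buffer
    private size_t lo, hi;  // live window: buf[lo .. hi]

    @property T[] front() { return buf[lo .. hi]; }

    void shiftFront(size_t n)
    {
        assert(n <= hi - lo);
        lo += n;  // just move the window start; nothing is copied
    }
}
```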
Re: buffered input
On 2/5/11 2:45 AM, Michel Fortin wrote: One thing I'm wondering is whether it'd be more efficient if we could provide our own buffer to be filled. In cases where you want to preserve the data, this could let you avoid double-copying: first copy into the temporary buffer and then to the permanent storage location. If you need the data only temporarily however, providing your buffer to be filled might be less efficient for a range that can't avoid copying to the temporary buffer for some reason. Generally when one says "I want the stream to copy data straight into my buffers" this is the same as "I want the stream to be unbuffered". So if you want to provide your own buffers to be filled, we need to discuss refining the design of unbuffered input - for example by adding an optional routine for bulk transfer to input ranges. Andrei
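One hedged guess at the shape such an optional bulk-transfer routine could take. The name readInto and its signature are assumptions for illustration, not an existing Phobos API; a real stream type would overload it with block reads instead of the element-wise fallback shown:

```d
// Hypothetical bulk-transfer primitive for unbuffered input ranges:
// the caller supplies the destination, so data can go straight into
// permanent storage with no intermediate buffer.
size_t readInto(R, T)(ref R r, T[] dest)
{
    size_t filled;
    while (filled < dest.length && !r.empty)
    {
        dest[filled++] = r.front;  // generic element-wise fallback
        r.popFront();
    }
    return filled;  // number of elements actually written
}
```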
Re: buffered input
On 2011-02-05 10:02:47 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org said: Generally when one says "I want the stream to copy data straight into my buffers" this is the same as "I want the stream to be unbuffered". So if you want to provide your own buffers to be filled, we need to discuss refining the design of unbuffered input - for example by adding an optional routine for bulk transfer to input ranges. You're right, this is a different thing. My major gripe with ranges at this time is that it's almost impossible to design an algorithm that can take slices *or* make copies depending on whether the range supports slicing or not, and whether the slices are stable (not going to be mutated when popping elements from the range). At least not without writing two implementations of it. I reread your initial post to get a clearer idea of what it meant. It seems to me that your buffered range design could be made to fix that hole. If the data you want to parse is all in memory, the buffered range could simply use the original array as its buffer; shiftFront would simply slice the whole array to remove the first n elements, while appendToFront would do nothing (as the buffer already contains all of the content). And if the data is immutable, then it's safe to just take a slice of it to preserve it instead of doing a copy. So you can't really be more efficient than that; it's just great. As for getting the data in bulk directly so you can avoid needless copies... I think the same optimization is possible with a buffered range. All you need is a buffered range that doesn't reuse the buffer, presumably one of immutable(T)[]. With it, you can slice at will without fear of the data being overwritten at a later time. So my rereading of your proposal convinced me. Go ahead, I can't wait to use it. :-) -- Michel Fortin michel.for...@michelf.com http://michelf.com/
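A sketch of the adapter Michel describes, assuming immutable element data so that front() slices stay valid forever (the struct and its name are illustrative only, not part of the proposal):

```d
// Over an in-memory immutable array the buffer *is* the data:
// appendToFront has nothing to fetch, and slices of front() are stable,
// so clients may keep them without copying.
struct SliceBuffered(T)
{
    immutable(T)[] data;

    @property bool empty() { return data.length == 0; }
    @property immutable(T)[] front() { return data; }

    void popFront() { data = data[$ .. $]; }          // discard everything
    void shiftFront(size_t n) { data = data[n .. $]; } // drop first n
    void appendToFront(size_t n) { }                   // already all in front
}
```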
Re: A better assert() [was: Re: std.unittests [updated] for review]
bearophile: The semantics a non-D-expert expects from assert(something) is to test that something is true. For a class reference this means the pointer is not null. This is the core of our disagreement: I think an object is not true if its invariant fails. It means the object is completely invalid, no different than if it were a null pointer. It's unusable. The sooner this check is done, the better, so we can figure out where it went wrong. One more interesting example: That makes sense, it's still analogous to if() like you'd expect. Actually, I wonder if doing if(obj) should check its invariant too with classes, both for consistency with assert and getting that check done even more often.
Re: Link the source
I added automatic source linking to my improveddoc program. http://arsdnet.net/d-web-site/std_stdio.html (see the link on the right) If this reaches the point where it is good enough for the official site, we'll have it there too. It needs to know the commit ID of the release to link to the file. This way, you are looking at the same source you are reading the documentation for, to avoid version confusion.
Re: buffered input
Andrei Alexandrescu wrote: Transparent buffering sounds sensible but in fact it robs you of important capabilities. It essentially forces you to use grammars with lookahead 1 for all input operations. Being able to peek forward into the stream without committing to read from it allows you to e.g. do operations like does this stream start with a specific word etc. As soon Broken sentence?
Re: Calling method by name.
On 2011-02-04 05:07, Jonathan M Davis wrote: On Thursday 03 February 2011 19:29:15 Robert Jacques wrote: On Thu, 03 Feb 2011 08:49:54 -0500, Jacob Carlborgd...@me.com wrote: On 2011-02-03 05:52, Robert Jacques wrote: On Wed, 02 Feb 2011 12:55:37 -0500, %uf...@jhgjhb.com wrote: I know it is possible to create an object from its name. Is it possible to call a method from that object if the name is only known at runtime? Would something like the following be possible?

string classname, methodname; // Ask the user for class and method.
auto obj = Object.factory(classname);
invoke(methodname, obj, param1, param2);

Thanks I've been working on an update to std.variant, which includes a compile-time reflection to runtime-reflection system. (See https://jshare.johnshopkins.edu/rjacque2/public_html/) From the docs: Manually registers a class with Variant's runtime-reflection system. Note that Variant automatically registers any types it is exposed to. Note how in the example below, only Student is manually registered; Grade is automatically registered by Variant via compile-time reflection of Student.

module example;
class Grade { real mark; }
class Student { Grade grade; }
void main(string[] args) {
    Variant.__register!Student;
    Variant grade = Object.factory("example.Grade");
    grade.mark(96.6);
    assert(grade.mark == 96.6);
}

And dynamic method/field calls are handled via the __reflect(string name, Variant[] args...) method like so:

grade.__reflect("mark", Variant(96.6));
assert(grade.__reflect("mark") == 96.6);

Why would you need to pass in Variants in __reflect? Why not just make it a variadic method and automatically convert to Variant? Well, opDispatch does exactly that. __reflect, on the other hand, was designed as a quasi-backend function primarily for a) internal use (hence the double underscore), b) scripting language interfacing/implementing and c) user extension. So efficiency was of key importance.
And the reflection system is extensible, as Variant knows to call __reflect on user-defined types. This makes things like prototype-style objects possible. (There's even a beta implementation of a prototype object in the library.) But this requires that user __reflect methods not be templated. I'm not well versed in dynamic reflection and its use cases, so when I considered the combination of a runtime method name and compile-time argument type information, I classed it as 'rare in practice'. But if that's not the case, I'd like to know and would greatly appreciate a use case/unit test. Most of the good examples of runtime reflection that I'm aware of require user-defined attributes. But there are libraries in Java (and presumably C#) that do stuff like allow you to mark your classes with certain attributes indicating what type of XML elements they should be, and then another library which knows _nothing_ about your classes is able to serialize them to and from XML. Another example would be Hibernate, which does the same sort of stuff, only it deals with talking to databases. Full-on runtime reflection combined with user-defined attributes can do some powerful stuff. However, I do think that runtime reflection without user-defined attributes doesn't tend to be anywhere near as useful. To really get that sort of stuff working, we'd need D to properly support both user-defined attributes and runtime reflection. Both are future possibilities but obviously aren't happening any time soon. - Jonathan M Davis Ruby seems to get along without any kind of attributes/annotations. But on the other hand you can call a method in a class declaration and this will behave much the same as an attribute. ActiveRecord in Rails is a good example of runtime reflection. Also the Ruby XML library Builder is a good example of runtime reflection. Maybe not actually runtime reflection, but it uses the method_missing method, equivalent to the opDispatch method in D, heavily.
http://builder.rubyforge.org/ -- /Jacob Carlborg
Re: buffered input
Andrei Alexandrescu wrote: I don't see a clear need for the two to be separate. Could they fold into popFront(n, m) meaning shiftFront(n); appendToFront(m)? Nullary popFront() discards all and loads any number it pleases. I think combining the two into one hurts usability, as often you want to do one without the other. OK, but if you go this way, what would popFront() do? Some users would benefit if they could just pass in a buffer and say fill'er up. Correct. That observation applies to unbuffered input as well. Right. Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length. I think circularity is an implementation detail that is poor as a client-side abstraction. I fear efficiency will get abstracted out. Say this is my internal buffer (pipes indicate the front() slice): [ooo|oo|oo] Now I do appendToFront(3) -- how do you expose the expected front() without moving data? -- Tomek
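Tomek's question can be made concrete: if the three appended elements wrap past the physical end of a circular buffer, the live window becomes two disjoint chunks, and front() can only remain a single contiguous slice by compacting first. A sketch of that unavoidable move (names hypothetical, window again tracked as two indices):

```d
import core.stdc.string : memmove;

// With a circular layout, appendToFront may leave the live window split
// across the wrap point. Exposing front() as one T[] forces this move:
void compact(T)(T[] buf, ref size_t lo, ref size_t hi)
{
    // shift the live window buf[lo .. hi] down to the physical start
    memmove(buf.ptr, buf.ptr + lo, (hi - lo) * T.sizeof);
    hi -= lo;
    lo = 0;  // new elements can now be appended contiguously at hi
}
```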
Re: std.unittests/exception Update and Vote
Jonathan M Davis wrote: Okay, the latest code and documentation is here: http://is.gd/HZQwNz I've also made the changes in my github fork of Phobos here: https://github.com/jmdavis/phobos . So, if this passes the vote, it's just a pull request away from being in Phobos. assertPred, assertThrown, assertNotThrown, and collectExceptionMsg have been merged into std.exception and their documentation updated again. There are fewer examples now, and hopefully it's more to everyone's liking, though the changes aren't drastic. Hopefully, I didn't miss anything that needed changing. Also, since collectExceptionMsg was catching Throwable and collectException was catching Exception, I made them both templated on the type to catch, with Exception as the default. So, you can use both to catch any Throwable, but they default to Exception, so no code should break as a result. I kept Exception in their names with the idea that you really shouldn't be catching Throwable or Errors except in exceptional circumstances, so collectException is more correct for general use and its name doesn't encourage people to catch the wrong thing (it also avoids having to create an alias for backwards compatibility). We're coming up on the time when the proposal has to be voted in or out (Feb 7th). It looks like Don and/or Walter _may_ make it so that assert is improved such that it does some of what assertPred does, printing much better error messages, and if that's the case, assertPred will need to be reworked or tossed entirely (but whether that happens depends on what they decide between now and Feb 7th). So, unless Andrei has a problem with it, I'd ask that you vote for assertPred separately from assertThrown, assertNotThrown, and collectExceptionMsg. So, if it's decided that assert is going to be improved and assertPred won't work as is, at least we can get assertThrown, assertNotThrown, and collectExceptionMsg into Phobos (assuming that they pass the vote).
If assert is improved and assertPred doesn't make it in, and some portion of assertPred's other capabilities should still be in Phobos, they can be reworked in a future proposal (though if assert doesn't get improved, then they're in assertPred as it is). So, please make any last comments or suggestions on this proposal, and vote on whether you think that assertPred should be in Phobos (assuming that assert isn't going to be improved such that assertPred isn't necessary) and whether you think that assertThrown, assertNotThrown, and collectExceptionMsg should get in regardless of whether assertPred does. - Jonathan M Davis Yes for assertThrown and assertNotThrown. Ambivalent about collectExceptionMsg. I strongly believe that assertPred should not be included. But I think it can be done properly. Please make a bugzilla request for the improvements to assert().
Re: buffered input
Jean Crystof wrote: I find this discussion interesting. There's one idea for an application I'd like to try at some point. Basically a Facebook chat thingie, but with richer gaming features. The expected audience will be 10 - 100K simultaneous clients connecting to a single server. Not sure if DOM or SAX will be better. After seeing Tango's XML benchmarks I was convinced that the implementation platform would be D1/Tango, but now it looks like Phobos is also getting there, probably even outperforming Tango by a clear margin. Thanks for having faith ;-) Since even looking at Tango's documentation has intellectual property problems and likely causes taint, I could make an independent benchmark comparing the two and their interfaces later. But I probably need to avoid going into too much detail, otherwise the Phobos developers wouldn't be able to read it without changing their license. That would be helpful. From what I've read so far, the proposed design looks very much like what Tango has now in their I/O framework. But probably Phobos's TLS default and immutable strings improve multithreaded performance even more. Well, immutability doesn't help much because a buffer must be written to. Speaking of multithreading, I was thinking of an implementation where an internal thread is doing I/O. It loads data in front of the current front() slice, as much as the internal buffer can hold. The motivation is to overlap content processing and I/O operations so that less time is spent in total. Although there is some interaction overhead: locking, syncing caches so that cores see the same buffer. -- Tomek
Re: Having fun making tutorials
I've added a section on calling conventions and compiling/using interface files with static libraries. I should also add a section on using htod and discuss the issue with omf vs coff. There should be another page for actually using C libraries, although I think there are some already (there's one in the manual for sure). After that I'll hopefully write a few things about DLLs, which are often problematic for newbies.
Re: std.xml should just go
On 2011-02-04 08:34, Jonathan M Davis wrote: Slices: just one more reason why D's arrays kick the pants of other languages' arrays... - Jonathan M Davis Ruby has array slices as well. A slice of an array refers to the original data just like in D. But on the other hand a new instance is created when making a slice (I assume, since everything is an object in Ruby). -- /Jacob Carlborg
Re: std.unittests/exception Update and Vote
On 2/4/11 4:05 PM, Jonathan M Davis wrote: So, please make any last comments or suggestions on this proposal, and vote on whether you think that assertPred should be in Phobos (assuming that assert isn't going to be improved such that assertPred isn't necessary) and whether you think that assertThrown, assertNotThrown, and collectExceptionMsg should get in regardless of whether assertPred does. I think the topic has already been discussed quite exhaustively, so I'll just list my votes: Strong yes for assertThrown (I have something like that in all of my D1/D2 projects), rather ambivalent regarding assertNotThrown and collectExceptionMsg, and a definite no to assertPred – we can do better than that! David
Re: buffered input
Dang, you beat me to my post on what I have run into trying to provide a slice-able, assignable, buffered forward range. I was doing some work on a CSV parser. It is rather simple to build a proper parser from an input range. But providing the ability to use custom separators which could be of any length did not work well with a forward range. It was no longer a look-ahead of one. So I started examining how Splitter[1] works with slice-able ranges. Ok, enough of the background. So basically I tried to make a range that would provide everything I needed for the new CSV design[2], and the result[3] didn't work. It actually works better with my CSV parser than it does with splitter. The main issue I was having is: if you save the range and move forward, how do you keep the buffers of all instances in sync? Can we turn an input range into a forward range? If not, how would you get splitter working on an input range? (I probably need to file a bug, but my InputFileRange[3, bottom] didn't work with splitter either.) The next issue is with slicing. If we can't get an input range to become a forward range then we can't have slicing either. A slice of [0..$] should give me a copy of the range. But even if you could do this, how would you know that the slice should be made of the entire range, or of just what is available in the buffer? So I guess the question is, with the proposal: can a hasSlicing!R be created from an InputRange!R such that

auto range = "Hello, World";
auto len = countUntil(range, ",");
assert(range[0..len] == "Hello");

where range is replaced by a buffered input range. And as an added bonus:

range = range[len..$];
assert(range == ",World");

You can of course use the range for equality, instead of strings like "Hello". 1. https://github.com/D-Programming-Language/phobos/blob/master/std/algorithm.d#L1317 2. https://github.com/he-the-great/JPDLibs/blob/csvoptimize/csv/csv.d 3. https://gist.github.com/812681
Re: buffered input
On Saturday 05 February 2011 07:16:45 Andrei Alexandrescu wrote: On 2/5/11 5:09 AM, Jonathan M Davis wrote: Hmm. I think that I'd have to have an actual implementation to mess around with to say much. My general take on buffered input is that I don't want to worry about it. I want it to be buffered so that it's more efficient, but I don't want to have to care about it in how I use it. I would have expected a buffered input range to be exactly the same as an input range except that it doesn't really just pull in one character behind the scenes. It pulls in 1024 or whatever when popFront() would result in the end of the buffer being reached, and you just get the first one with front. The API doesn't reflect the fact that it's buffered at all except perhaps in how you initialize it (by telling how big the buffer is, though generally I don't want to have to care about that either). Transparent buffering sounds sensible but in fact it robs you of important capabilities. It essentially forces you to use grammars with lookahead 1 for all input operations. Being able to peek forward into the stream without committing to read from it allows you to e.g. do operations like does this stream start with a specific word etc. As soon The thing is though that if I want to be iterating over a string which is buffered (from a file or stream or whatever), I want front to be immutable(char) or char, not immutable(char)[] or char[]. I can see how having an interface which allows startsWith to efficiently check whether the buffered string starts with a particular string makes good sense, but generally, as far as I'm concerned, that's startsWith's problem. How would I even begin to use a buffered range of string[] as a string? Normally, when I've used buffered anything, it's been purely for efficiency reasons. All I've cared about is having a stream or file or whatever. 
The fact that reading it from the file (or wherever it came from) in a buffered manner is more efficient means that I want it buffered, but that hasn't had any effect on how I've used it. If I want x characters from the file, I ask for x characters. It's the buffered object's problem how many reads that does or doesn't do. You must be thinking of a use case which I don't normally think of or am not aware of. In my experience, buffering has always been an implementation detail that you use because it's more efficient, but you don't worry about it beyond creating a buffered stream rather than an unbuffered one. - Jonathan M Davis
Re: std.unittests/exception Update and Vote
Jonathan M Davis jmdavisp...@gmx.com wrote in message news:mailman.1276.1296831944.4748.digitalmar...@puremagic.com... So, please make any last comments or suggestions on this proposal, and vote on whether you think that assertPred should be in Phobos (assuming that assert isn't going to be improved such that assertPred isn't necessary) and whether you think that assertThrown, assertNotThrown, and collectExceptionMsg should get in regardless of whether assertPred does. I'm in favor of all of them. Sure, maybe we could do better than assertPred, but when's that actually going to happen? In the meantime, assertPred is a hell of an improvement over the current state and it exists *now*. I don't want to needlessly hijack the present just for the sake of some still-hypothetical future.
Re: buffered input
Nice! And evenin'! Layman's view: - - - - - - - - - - - (I'm serious, please don't take my post too seriously. I'm not a heavy user of D and I don't want to pollute. I know in NGs exposure means influence and I babble a lot. Still, my thoughts, or rather reactions, could be of interest, I assume, or I wouldn't be writing this : ) I'm not sure how these buffered input ranges are supposed to be used (some mockup sample code would be very cool!), but it seems to me, and please correct me if I'm wrong, that it's very desirable for these ranges to be interchangeable? As in, you've built some library that passes around ranges, but then some part of it is slow and needs buffered ones. Isn't the point that you can then swap out your ranges for buffered ones here and there, but continue to have a functioning library? Reusability, generics, bend the spoon neo and all that? If not, then ok. But if yes, then I think these buffered ranges look very troublesome! Naughty even! * * * Then there's the sneaky break of consistency of the D semantics. Even if these ranges are not intended to be interchangeable, still, changing the (human language) semantics that the input ranges already define is not good! This makes D a difficult language to get an intuitive feel for, I think. By the definition of input ranges, the word front symbolizes the first _element_ in a forwards facing queue of elements.

| 1:st |  <-- front()
| 2:nd |
v-- hidden --v
| 3:rd |
|  .   |
| n:th |  <-- back

..as front() returns a T. Then follows that popFront() means discard the first _element_, so that the element that was second now becomes first. And if we can agree to that, then shiftFront() is possibly very confusing, and so is appendToFront(). They sound like they operate on the first element! So it seems these buffered ranges have redefined the semantics for the word front, as meaning the view window into the front part of the queue. Sneaky!
I mean, imagine being new with D and skimming through the API docs for ranges, and picking up these function names at a glance. You'd be setting yourself up for one of those aaahaa-now-I-get-why-I-didn't-get-it moments for sure. Hmmm. Still, front() could very well refer to the front part, to a list of elements (or the view window), and first() could refer to the first element. Actually, that would make the most sense! Then an input range would be first()/popFirst()/empty(), and a buffered one would have all those, but also amend something like front(n)/widenFront(n)/popFront(n), but yeah, erhm. I call for stricter and more consistent semantics! Decide what front means when talking about ranges, and stick to it! (And I'm talking about human language semantics, not what a function (or primitive?) does.) Erh, I tried to sound resolute there. Not my thing really. * * * Besides that, shiftFront got me thinking about sliding windows, and that would actually be cool! As in

| 1st  | \   -- first()
| 2nd  |  |-- front() // view window
| 3rd  | /
| 4th  |
v-- hidden --v
| 5th  |
| ..   |
| n:th |

and then calling shiftFront(2) would shift the view window 2 elements forward (thus fetching 2 and discarding 2). Seems like a useful feature when parsing some encoding with variable point width and known distance to the event horizon, no? As in

code.viewDistance = 8;
do {
    auto p = code.front();
    if (isLongPoint(p)) {
        processLong(p);
        code.shiftFront(8);
    } else if (isPoint(p)) {
        process(p);
        code.shiftFront(4);
    } else break;
} while (p);

or something like that. But the semantic that shiftFront would mean the same as popFront(), but on a list of elements? Confusing! Surely, at least popFront(n)... Hm, yeah Ok I'm all out of coffee!!! Thanks for your time! BR /HF
Re: buffered input
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message news:iijq99$1a5o$1...@digitalmars.com... On 2/5/11 6:46 AM, spir wrote: On 02/05/2011 10:36 AM, Nick Sabalausky wrote: On a separate note, I think a good way of testing the design (and end up getting something useful anyway) would be to try to use it to create a range that's automatically-buffered in a more traditional way. Ie, Given any input range 'myRange', buffered(myRange, 2048) (or something like that) would wrap it in a new input range that automatically buffers the next 2048 elements (asynchronously?) whenever its internal buffer is exhausted. Or something like that. It's late and I'm tired and I can't think anymore ;) That's exactly what I'm expecting. Funnily enough, I was about to start a thread on the topic after reading related posts. My point was: I'm not a specialist in efficiency (rather the opposite), I just know there is --theoretically-- relevant performance loss to expect from unbuffered input in various cases. Could we define a generic input-buffering primitive allowing people to benefit from others' competence? Just like Appender. Denis The buffered range interface as I defined it supports infinite lookahead. The interface mentioned by Nick has lookahead between 1 and 2048. So I don't think my interface is appropriate for that. Infinite lookahead is a wonderful thing. Consider reading lines from a file. Essentially what you need to do is to keep on reading blocks of data until you see \n (possibly followed by some more stuff). Then you offer the client the line up to the \n. When the client wants a new line, you combine the leftover data you already have with new stuff you read. On occasion you need to move over leftovers, but if your internal buffers are large enough that is negligible (I actually tested this recently). Another example: consider dealing with line continuations in reading CSV files. 
Under certain conditions, you need to read one more line and stitch it with the existing one. This is easy with infinite lookahead, but quite elaborate with lookahead 1. I think I can see how it might be worthwhile to discourage the traditional buffer interface I described in favor of the above. It wouldn't be as trivial to use as what people are used to, but I can see that it could avoid a lot of unnecessary copying, especially with other people's suggestion of allowing the user to provide their own buffer to be filled (and it seems easy enough to learn). But what about when you want a circular buffer? Ie, when you know a certain maximum lookahead is fine and you want to minimize memory usage and buffer appends. Circular buffers don't do infinite lookahead, so the interface maybe doesn't work as well. Plus you probably wouldn't want to provide an interface for slicing into the buffer, since the slice could straddle the wrap-around point, which would require a new allocation (ie return buffer[indexOfFront+sliceStart..$] ~ buffer[0..sliceLength-($-(frontIndex+sliceStart))]). I guess maybe that would just call for another type of range.
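Andrei's line-reading description above can be rendered as a rough sketch over the proposed primitives. This is illustrative only: the helper name nextLine and the growth policy are assumptions, and R stands for any buffered input range of char in the proposed sense:

```d
import std.string : indexOf;

// Keep appending until the window contains '\n', then hand out the line
// and discard it. Infinite lookahead falls out of appendToFront.
const(char)[] nextLine(R)(ref R r)
{
    for (;;)
    {
        auto i = r.front.indexOf('\n');
        if (i >= 0)
        {
            auto line = r.front[0 .. i]; // copy it if you must keep it:
            r.shiftFront(i + 1);         // the buffer may be reused later
            return line;
        }
        immutable before = r.front.length;
        r.appendToFront(0);              // read more at the range's discretion
        if (r.front.length == before)    // stream exhausted:
        {
            auto line = r.front;         // last, unterminated line
            r.shiftFront(line.length);
            return line;
        }
    }
}
```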
Re: buffered input
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message news:iijpp7$197f$1...@digitalmars.com... On 2/5/11 5:09 AM, Jonathan M Davis wrote: Hmm. I think that I'd have to have an actual implementation to mess around with to say much. My general take on buffered input is that I don't want to worry about it. I want it to be buffered so that it's more efficient, but I don't want to have to care about it in how I use it. I would have expected a buffered input range to be exactly the same as an input range except that it doesn't really just pull in one character behind the scenes. It pulls in 1024 or whatever when popFront() would result in the end of the buffer being reached, and you just get the first one with front. The API doesn't reflect the fact that it's buffered at all except perhaps in how you initialize it (by telling how big the buffer is, though generally I don't want to have to care about that either). Transparent buffering sounds sensible but in fact it robs you of important capabilities. It essentially forces you to use grammars with lookahead 1 for all input operations. Being able to peek forward into the stream without committing to read from it allows you to e.g. do operations like does this stream start with a specific word etc. As soon That shouldn't be a problem for the cases where a lookahead of 1 is all that's needed. So both types can exist (with the traditional/automatic type most likely built on top of Andrei's type). Thus, I think the only question is Are the appropriate use-cases for the traditional/automatic type minor enough and infrequent enough to actively discourage it by not providing it? That I can't answer.
Re: Calling method by name.
On 2011-02-04 17:17, Adam Ruppe wrote: Jacob Carlborg wrote: The class Post maps to the database table posts, no configuration is necessary. Then you can use the column names in the table as fields to set and get data, like this:

post = Post.new
post.title = "some title"
post.body = "the body"
post.save # will update the database

Note that you can do this kind of thing with D's compile time reflection and CTFE as well. That particular example works in my own DataObject class (except I called it commitChanges instead of save). Yeah, I tried to do the same. But in this case the static type system was kind of in the way of what I wanted to do. This is so much easier with a dynamic type system. -- /Jacob Carlborg
Re: buffered input
Jean Crystof a@a.a wrote in message news:iijl2t$10np$1...@digitalmars.com... Tomek Sowiński wrote: Andrei Alexandrescu wrote: I hereby suggest we define buffered input range of T any range R that satisfies the following conditions: 1. R is an input range of T[] 2. R defines a primitive shiftFront(size_t n). The semantics of the primitive is that, if r.front.length >= n, then shiftFront(n) discards the first n elements in r.front. Subsequently r.front will return a slice of the remaining elements. 3. R defines a primitive appendToFront(size_t n). Semantics: adds at most n more elements from the underlying stream and makes them available in addition to whatever was in front. For example if r.front.length was 1024, after the call r.appendToFront(512) will have r.front have length 1536, of which the first 1024 will be the old front and the rest will be newly-read elements (assuming that the stream had enough data). If n = 0, this instructs the stream to add any number of elements at its own discretion. I don't see a clear need for the two to be separate. Could they fold into popFront(n, m) meaning shiftFront(n); appendToFront(m)? Nullary popFront() discards all and loads any number it pleases. This is it. I like many things about this design, although I still fear some fatal flaw may be found with it. With these primitives a lot of good code operating on buffered streams can be written efficiently. The range is allowed to reuse data in its buffers (unless that would contradict language invariants, e.g. if T is invariant), so if client code wants to stash away parts of the input, it needs to make a copy. Some users would benefit if they could just pass in a buffer and say fill'er up. One great thing is that buffered ranges as defined above play very well with both ranges and built-in arrays - two quintessential parts of D. I look at this and say, this all makes sense.
For example the design could be generalized to operate on some random-access range other than the built-in array, but then I'm thinking, unless some advantage comes about, why not give T[] a little special status? Probably everyone thinks of contiguous memory when thinking buffers, so here generalization may be excessive (albeit meaningful). Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length. I find this discussion interesting. There's one idea for an application I'd like to try at some point. Basically a facebook chat thingie, but with richer gaming features. The expected audience will be 10 - 100K simultaneous clients connecting to a single server. Not sure if DOM or SAX will be better. After seeing Tango's XML benchmarks I was convinced that the implementation platform will be D1/Tango, but now it looks like Phobos is also getting there, probably even outperforming Tango by a clear margin. I don't mean to derail the topic, but if I were aiming for that many simultaneous users I wouldn't even consider using XML at all. Despite MS's, Java's and AJAX's infatuation with it, XML is really only appropriate in two situations: 1. When memory/bandwidth/speed/etc don't matter and 2. When you don't have a choice in the matter.
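The interface sketched above can be made concrete with a toy implementation over an in-memory source. This is a sketch under assumptions: the struct name `BufferedSource` and the `chunkSize` parameter are made up for illustration; the proposal defines only the semantics of front/popFront/shiftFront/appendToFront.

```d
import std.algorithm : min;

// Toy buffered range over in-memory ubyte data, following the proposed
// primitives. Hypothetical names; a real one would wrap a file or socket.
struct BufferedSource
{
    private ubyte[] source;    // remaining unread data
    private ubyte[] buffer;    // data currently exposed via front
    private size_t chunkSize;

    this(ubyte[] src, size_t chunkSize = 4)
    {
        this.source = src;
        this.chunkSize = chunkSize;
        appendToFront(chunkSize); // prime the buffer
    }

    @property bool empty() { return buffer.length == 0 && source.length == 0; }
    @property ubyte[] front() { return buffer; }

    // Discard the whole front and load a fresh chunk.
    void popFront()
    {
        buffer = null;
        appendToFront(chunkSize);
    }

    // Discard the first n elements of front.
    void shiftFront(size_t n)
    {
        assert(n <= buffer.length);
        buffer = buffer[n .. $];
    }

    // Make up to n more elements available in front (n == 0 lets the
    // range pick the amount, here simply one chunk).
    void appendToFront(size_t n)
    {
        if (n == 0) n = chunkSize;
        auto take = min(n, source.length);
        buffer ~= source[0 .. take];
        source = source[take .. $];
    }
}
```

With this, `front` starts as the first 4 bytes, `appendToFront(3)` widens the window to 7 bytes, and `shiftFront(4)` drops the consumed prefix without touching the rest, which is exactly the infinite-lookahead behavior discussed below.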
Re: buffered input
Heywood Floyd soul...@gmail.com wrote in message news:mailman.1318.1296941395.4748.digitalmar...@puremagic.com... As in, you've built some library that passes around ranges, but then some part of it is slow and needs buffered ones. Isn't the point that you can then swap out your ranges for buffered ones here and there, but continue to have a functioning library? The problem with that is that in many many cases it forces unnecessary copying. We can get much better performance with this slightly more hands-on version. But that said, if the traditional hands-free automatic buffering really is all you need, then such a thing [should] be easy to construct out of Andrei's style of buffered range. Then follows that popFront() means discard the first _element_, so that the element that was second now becomes first. And if we can agree to that, then shiftFront() is possibly very confusing, and so is appendToFront(). They sound like they operate on the first element! I completely agree. The names of those functions confused the hell out of me until I read Andrei's descriptions of them. Now I understand what they do...but I still don't understand their names at all.
Re: buffered input
2011/2/5 Andrei Alexandrescu seewebsiteforem...@erdani.org: I hereby suggest we define "buffered input range of T" any range R that satisfies the following conditions: 1. R is an input range of T[] 2. R defines a primitive shiftFront(size_t n). The semantics of the primitive is that, if r.front.length >= n, then shiftFront(n) discards the first n elements in r.front. Subsequently r.front will return a slice of the remaining elements. 3. R defines a primitive appendToFront(size_t n). Semantics: adds at most n more elements from the underlying stream and makes them available in addition to whatever was in front. For example if r.front.length was 1024, after the call r.appendToFront(512) will have r.front have length 1536 of which the first 1024 will be the old front and the rest will be newly-read elements (assuming that the stream had enough data). If n == 0, this instructs the stream to add any number of elements at its own discretion. This is really cool. I realise now that appendToFront fills the gap in the design providing only shiftFront/advance. I also thought their names were well-chosen. Torarin
Re: std.xml should just go
On 04/02/2011 04:20, Andrei Alexandrescu wrote: Cool. Is Michael Rynn willing to make a submission? He announced it a while ago in d.announce as a std.xml2 candidate. A few weeks earlier (if I am not completely wrong) he offered his implementation for Phobos. Regarding ranges: - Ranges of ranges, which are IMHO needed for any kind of non-linear data structure (say: everything which contains a node), seem to be very difficult to handle. - I could further argue that Steven is still using cursors in dcollections, but well, you already know that :) Finally, don't get me wrong, I still like D. Bjoern
Re: buffered input
On 02/05/2011 10:44 PM, Nick Sabalausky wrote: Andrei Alexandrescuseewebsiteforem...@erdani.org wrote in message news:iijq99$1a5o$1...@digitalmars.com... On 2/5/11 6:46 AM, spir wrote: On 02/05/2011 10:36 AM, Nick Sabalausky wrote: On a separate note, I think a good way of testing the design (and end up getting something useful anyway) would be to try to use it to create a range that's automatically-buffered in a more traditional way. Ie, Given any input range 'myRange', buffered(myRange, 2048) (or something like that) would wrap it in a new input range that automatically buffers the next 2048 elements (asynchronously?) whenever its internal buffer is exhausted. Or something like that. It's late and I'm tired and I can't think anymore ;) That's exactly what I'm expecting. Funnily enough, I was about to start a thread on the topic after reading related posts. My point was: I'm not a specialist in efficiency (rather the opposite), I just know there is --theoretically-- relevant performance loss to expect from unbuffered input in various cases. Could we define a generic input-buffering primitive allowing people to benefit from others' competence? Just like Appender. Denis The buffered range interface as I defined it supports infinite lookahead. The interface mentioned by Nick has lookahead between 1 and 2048. So I don't think my interface is appropriate for that. Infinite lookahead is a wonderful thing. Consider reading lines from a file. Essentially what you need to do is to keep on reading blocks of data until you see \n (possibly followed by some more stuff). Then you offer the client the line up to the \n. When the client wants a new line, you combine the leftover data you already have with new stuff you read. On occasion you need to move over leftovers, but if your internal buffers are large enough that is negligible (I actually tested this recently). Another example: consider dealing with line continuations in reading CSV files. 
Under certain conditions, you need to read one more line and stitch it with the existing one. This is easy with infinite lookahead, but quite elaborate with lookahead 1. I think I can see how it might be worthwhile to discourage the traditional buffer interface I described in favor of the above. It wouldn't be as trivial to use as what people are used to, but I can see that it could avoid a lot of unnecessary copying, especially with other people's suggestion of allowing the user to provide their own buffer to be filled (and it seems easy enough to learn). But what about when you want a circular buffer? Ie, When you know a certain maximum lookahead is fine and you want to minimize memory usage and buffer-appends. Circular buffers don't do infinite lookahead so the interface maybe doesn't work as well. Plus you probably wouldn't want to provide an interface for slicing into the buffer, since the slice could straddle the wrap-around point which would require a new allocation (ie return buffer[indexOfFront+sliceStart..$] ~ buffer[0..sliceLength-($-(frontIndex+sliceStart))]). I guess maybe that would just call for another type of range. Becomes too complicated, doesn't it? Denis -- _ vita es estrany spir.wikidot.com
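Andrei's byLine description above (keep reading blocks until a '\n' appears, hand the client the line, keep the leftovers) can be sketched against the proposed primitives. Everything here is an assumption for illustration: `CharSource` is a minimal in-memory stand-in for a buffered range of char[], and `nextLine` is a hypothetical helper, not Phobos API.

```d
import std.algorithm : countUntil, min;

// Minimal in-memory stand-in for a buffered range of char[].
// A real one would read from a file in chunks.
struct CharSource
{
    char[] data;    // the whole underlying stream, for illustration
    size_t lo, hi;  // current front is data[lo .. hi]
    this(char[] d, size_t chunk) { data = d; hi = min(chunk, d.length); }
    @property char[] front() { return data[lo .. hi]; }
    void shiftFront(size_t n) { lo += n; }
    void appendToFront(size_t n)
    {
        hi = (n == 0) ? data.length : min(hi + n, data.length);
    }
}

// One line at a time: grow the window only while '\n' is not yet visible.
char[] nextLine(ref CharSource r)
{
    for (;;)
    {
        auto idx = r.front.countUntil('\n');
        if (idx >= 0)
        {
            auto line = r.front[0 .. idx];
            r.shiftFront(idx + 1);    // discard the line and its '\n'
            return line;
        }
        auto before = r.front.length;
        r.appendToFront(0);           // ask for more: infinite lookahead
        if (r.front.length == before) // stream exhausted: last line
        {
            auto line = r.front;
            r.shiftFront(line.length);
            return line;
        }
    }
}
```

Note how the lookahead is unbounded but demand-driven: the window widens only when a line straddles the current buffer edge, which is the case lookahead-1 interfaces handle poorly.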
Re: buffered input
On 02/05/2011 11:00 PM, Nick Sabalausky wrote: Transparent buffering sounds sensible but in fact it robs you of important capabilities. It essentially forces you to use grammars with lookahead 1 for all input operations. Being able to peek forward into the stream without committing to read from it allows you to e.g. do operations like does this stream start with a specific word etc. As soon That shouldn't be a problem for the cases where a lookahead of 1 is all that's needed. So both types can exist (with the traditional/automatic type most likely built on top of Andrei's type). Thus, I think the only question is Are the appropriate use-cases for the traditional/automatic type minor enough and infrequent enough to actively discourage it by not providing it? That I can't answer. And what about backtracking (eg for parsing the source)? Denis -- _ vita es estrany spir.wikidot.com
Re: buffered input
On 2/5/11 12:59 PM, Tomek Sowiński wrote: Andrei Alexandrescu napisał: Transparent buffering sounds sensible but in fact it robs you of important capabilities. It essentially forces you to use grammars with lookahead 1 for all input operations. Being able to peek forward into the stream without committing to read from it allows you to e.g. do operations like does this stream start with a specific word etc. As soon Broken sentence? Sorry. Well it was nothing interesting anyway. Andrei
Re: buffered input
spir wrote: On 02/05/2011 10:44 PM, Nick Sabalausky wrote: Andrei Alexandrescuseewebsiteforem...@erdani.org wrote in message news:iijq99$1a5o$1...@digitalmars.com... On 2/5/11 6:46 AM, spir wrote: On 02/05/2011 10:36 AM, Nick Sabalausky wrote: On a separate note, I think a good way of testing the design (and end up getting something useful anyway) would be to try to use it to create a range that's automatically-buffered in a more traditional way. Ie, Given any input range 'myRange', buffered(myRange, 2048) (or something like that) would wrap it in a new input range that automatically buffers the next 2048 elements (asynchronously?) whenever its internal buffer is exhausted. Or something like that. It's late and I'm tired and I can't think anymore ;) That's exactly what I'm expecting. Funnily enough, I was about to start a thread on the topic after reading related posts. My point was: I'm not a specialist in efficiency (rather the opposite), I just know there is --theoretically-- relevant performance loss to expect from unbuffered input in various cases. Could we define a generic input-buffering primitive allowing people to benefit from others' competence? Just like Appender. Denis The buffered range interface as I defined it supports infinite lookahead. The interface mentioned by Nick has lookahead between 1 and 2048. So I don't think my interface is appropriate for that. Infinite lookahead is a wonderful thing. Consider reading lines from a file. Essentially what you need to do is to keep on reading blocks of data until you see \n (possibly followed by some more stuff). Then you offer the client the line up to the \n. When the client wants a new line, you combine the leftover data you already have with new stuff you read. On occasion you need to move over leftovers, but if your internal buffers are large enough that is negligible (I actually tested this recently). Another example: consider dealing with line continuations in reading CSV files. 
Under certain conditions, you need to read one more line and stitch it with the existing one. This is easy with infinite lookahead, but quite elaborate with lookahead 1. I think I can see how it might be worthwhile to discourage the traditional buffer interface I described in favor of the above. It wouldn't be as trivial to use as what people are used to, but I can see that it could avoid a lot of unnecessary copying, especially with other people's suggestion of allowing the user to provide their own buffer to be filled (and it seems easy enough to learn). But what about when you want a circular buffer? Ie, When you know a certain maximum lookahead is fine and you want to minimize memory usage and buffer-appends. Circular buffers don't do infinite lookahead so the interface maybe doesn't work as well. Plus you probably wouldn't want to provide an interface for slicing into the buffer, since the slice could straddle the wrap-around point which would require a new allocation (ie return buffer[indexOfFront+sliceStart..$] ~ buffer[0..sliceLength-($-(frontIndex+sliceStart))]). I guess maybe that would just call for another type of range. Becomes too complicated, doesn't it? Denis Circular buffers don't seem like an 'optional' use case to me. Most real I/O works that way. I think if the abstraction can't handle it, the abstraction is a failure.
Re: buffered input
On 2/5/11 1:18 PM, Tomek Sowiński wrote: Andrei Alexandrescu napisał: I don't see a clear need for the two to be separate. Could they fold into popFront(n, m) meaning shiftFront(n); appendToFront(m)? Nullary popFront() discards all and loads any number it pleases. I think combining the two into one hurts usability as often you want to do one without the other. OK, but if you go this way, what would popFront() do? Discard everything in the current buffer and fill a new buffer. The new size depends on the stream; for byLine a new line would be read, for byChunk(4096) 4096 more bytes would be read. Some users would benefit if they could just pass in a buffer and say fill'er up. Correct. That observation applies to unbuffered input as well. Right. Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length. I think circularity is an implementation detail that is poor as a client-side abstraction. I fear efficiency will get abstracted out. Say this is my internal buffer (pipes indicate front() slice): [ooo|oo|oo] Now I do appendToFront(3) -- how do you expose the expected front() without moving data? You do end up moving data, but proportionally little if the buffer is large enough. Andrei
Re: buffered input
On 02/05/2011 11:22 PM, Nick Sabalausky wrote: Heywood Floyd soul...@gmail.com wrote in message news:mailman.1318.1296941395.4748.digitalmar...@puremagic.com... As in, you've built some library that passes around ranges, but then some part of it is slow and needs buffered ones. Isn't the point that you can then swap out your ranges for buffered ones here and there, but continue to have a functioning library? The problem with that is that in many many cases it forces unnecessary copying. We can get much better performance with this slightly more hands-on version. But that said, if the traditional hands-free automatic buffering really is all you need, then such a thing [should] be easy to construct out of Andrei's style of buffered range. Then follows that popFront() means discard the first _element_, so that the element that was second now becomes first. And if we can agree to that, then shiftFront() is possibly very confusing, and so is appendToFront(). They sound like they operate on the first element! I completely agree. The names of those functions confused the hell out of me until I read Andrei's descriptions of them. Now I understand what they do...but I still don't understand their names at all. Same here; thought: maybe he meant shiftBuf() appendToBuf(), or such? (Then, as nobody reacted about that point, thought: You're the stupid one; shut your mouth!) I also agree with Heywood about first() / popFirst(). Then, shiftFront() / appendToFront() would be less confusing --but still hard to guess (for me). I wonder if his view window is the whole or part of the buffer. Well... (Else, I actually share most of Heywood's views, I guess, at least at first read.) Denis -- _ vita es estrany spir.wikidot.com
Re: buffered input
On Sat, 05 Feb 2011 17:22:01 -0500, Nick Sabalausky a@a.a wrote: Heywood Floyd soul...@gmail.com wrote in message news:mailman.1318.1296941395.4748.digitalmar...@puremagic.com... As in, you've built some library that passes around ranges, but then some part of it is slow and needs buffered ones. Isn't the point that you can then swap out your ranges for buffered ones here and there, but continue to have a functioning library? The problem with that is that in many many cases it forces unnecessary copying. We can get much better performance with this slightly more hands-on version. But that said, if the traditional hands-free automatic buffering really is all you need, then such a thing [should] be easy to construct out of Andrei's style of buffered range. Then follows that popFront() means discard the first _element_, so that the element that was second now becomes first. And if we can agree to that, then shiftFront() is possibly very confusing, and so is appendToFront(). They sound like they operate on the first element! I completely agree. The names of those functions confused the hell out of me until I read Andrei's descriptions of them. Now I understand what they do...but I still don't understand their names at all. See point 1 of Andrei's post: R is an input range of T[]. Which means that front returns an array, not a single element. So they sound like they operate on the first element, because that's exactly what they do. Conceptually, you need to think of buffered inputs as a range of ranges, not a range of elements.
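Steven's range-of-ranges point can be shown with plain range code: once front yields a slice, ordinary foreach traversal already works chunk by chunk. Here Phobos's `std.range.chunks` stands in for a buffered stream (an assumption for illustration; a real buffered range would pull chunks from a file or socket).

```d
// A buffered range is first of all an input range whose elements are
// slices, so generic range code consumes it chunk by chunk unchanged.
import std.range : chunks;

void main()
{
    auto data = [1, 2, 3, 4, 5, 6, 7];
    size_t total;
    // front is an int[] (a whole chunk), not a single int.
    foreach (chunk; data.chunks(3))
        total += chunk.length;
    assert(total == 7); // chunks of length 3, 3, 1
}
```

Under this reading, shiftFront and appendToFront do operate on "the first element": they shrink or grow the slice that front currently returns.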
Re: buffered input
On 02/05/2011 06:42 PM, Heywood Floyd wrote: Nice! And evenin'! Layman's view: - - - - - - - - - - - (I'm serious, please don't take my post too seriously. I'm not a heavy user of D and I don't want to pollute. I know in NGs exposure means influence and I babble a lot. Still, my thoughts, or rather reactions, could be of interest, I assume, or I wouldn't be writing this : ) I'm not sure how these buffered input ranges are supposed to be used (some mockup sample code would be very cool!), but it seems to me, and please correct me if I'm wrong, that it's very desirable for these ranges to be interchangeable? As in, you've built some library that passes around ranges, but then some part of it is slow and needs buffered ones. Isn't the point that you can then swap out your ranges for buffered ones here and there, but continue to have a functioning library? Reusability, generics, bend the spoon, Neo, and all that? If not, then ok. But if yes, then I think these buffered ranges look very troublesome! Naughty even! * * * Then there's the sneaky break of consistency of the D semantics. Even if these ranges are not intended to be interchangeable, still, changing the (human language) semantics that the input ranges already define is not good! This makes D a difficult language to get an intuitive feel for, I think. By the definition of input ranges, the word front symbolizes the first _element_ in a forwards facing queue of elements. | 1:st |-- front() | 2:nd | v-- hidden --v | 3:rd | | . | | n:th |-- back ...as front() returns a T. Then follows that popFront() means discard the first _element_, so that the element that was second now becomes first. And if we can agree to that, then shiftFront() is possibly very confusing, and so is appendToFront(). They sound like they operate on the first element! So it seems these buffered ranges have redefined the semantics for the word front, as meaning the view window into the front part of the queue. Sneaky! 
I mean, imagine being new with D and skimming through the API docs for ranges, and picking up these function names at a glance. You'd be setting yourself up for one of those aaahaa-now-I-get-why-I-didn't-get-it-moments for sure. Hmmm. Still, front() could very well refer to the front part—to a list of elements (or the view window), and first() could refer to the first element. Actually, that would make the most sense! Then an input range would be first()/popFirst()/empty(), and a buffered one would have all those, but also amend something like front(n)/widenFront(n)/popFront(n), but yeah, erhm. +++ (everything above) I call for stricter and more consistent semantics! Decide what front means when talking about ranges, and stick to it! (And I'm talking about human language semantics, not what a function (or primitive?) does.) Erh, I tried to sound resolute there. Not my thing really. pleased to see there is at least one other programmer still considering that semantics applies to human thoughts, rather than machine process... * * * Besides that, shiftFront got me thinking about sliding windows, and that would actually be cool! As in | 1st | '\-- first() | 2nd | |-- front() // view window | 3rd | ./ | 4th | v-- hidden --v | 5th | | .. | | n:th | There is an off-by-one error between 1st and first, I guess ;-) What's your view window? Is it the buffer, or the needed amount of lookahead, or what else? How would you draw the buffer, on the first picture or the one above? Sliding window is, for me, the mental picture my brain intuitively forms when thinking of buffered input. But the sliding move may not be smooth (element per element), instead could happen as is most practical or efficient; as long as it remains a point not exposed on the interface (or only on request by client code). Meaning there would be an independent index pointing to current/first/front element, in the buffer or the window, automagically maintained when sliding happens (index -= offset). 
and then calling shiftFront(2) would shift the view window 2 elements forward (thus fetching 2 and discarding 2). Seems like a useful feature when parsing some encoding with variable point width and known distance to the event horizon, no? As in code.viewDistance = 8; do{ auto p = code.front() if(isLongPoint(p)){ processLong(p) code.shiftFront(8); }else if(isPoint(p)){ process(p) code.shiftFront(4); }else break; }while(p); or something like that. But the semantic that shiftFront would mean the same as popFront(), but on a list of elements? Confusing! Surely, at least popFront(n)... Hm, yeah Ok I'm all out of coffee!!! Thanks for your time! BR /HF Veeery interesting message, thank you. I share your care for correct naming. And the rest, actually. Wish you would post regularly. Denis -- _ vita es estrany
Re: buffered input
On 02/06/2011 01:28 AM, Don wrote: spir wrote: On 02/05/2011 10:44 PM, Nick Sabalausky wrote: Andrei Alexandrescuseewebsiteforem...@erdani.org wrote in message news:iijq99$1a5o$1...@digitalmars.com... On 2/5/11 6:46 AM, spir wrote: On 02/05/2011 10:36 AM, Nick Sabalausky wrote: On a separate note, I think a good way of testing the design (and end up getting something useful anyway) would be to try to use it to create a range that's automatically-buffered in a more traditional way. Ie, Given any input range 'myRange', buffered(myRange, 2048) (or something like that) would wrap it in a new input range that automatically buffers the next 2048 elements (asynchronously?) whenever its internal buffer is exhausted. Or something like that. It's late and I'm tired and I can't think anymore ;) That's exactly what I'm expecting. Funnily enough, I was about to start a thread on the topic after reading related posts. My point was: I'm not a specialist in efficiency (rather the opposite), I just know there is --theoretically-- relevant performance loss to expect from unbuffered input in various cases. Could we define a generic input-buffering primitive allowing people to benefit from others' competence? Just like Appender. Denis The buffered range interface as I defined it supports infinite lookahead. The interface mentioned by Nick has lookahead between 1 and 2048. So I don't think my interface is appropriate for that. Infinite lookahead is a wonderful thing. Consider reading lines from a file. Essentially what you need to do is to keep on reading blocks of data until you see \n (possibly followed by some more stuff). Then you offer the client the line up to the \n. When the client wants a new line, you combine the leftover data you already have with new stuff you read. On occasion you need to move over leftovers, but if your internal buffers are large enough that is negligible (I actually tested this recently). 
Another example: consider dealing with line continuations in reading CSV files. Under certain conditions, you need to read one more line and stitch it with the existing one. This is easy with infinite lookahead, but quite elaborate with lookahead 1. I think I can see how it might be worthwhile to discourage the traditional buffer interface I described in favor of the above. It wouldn't be as trivial to use as what people are used to, but I can see that it could avoid a lot of unnecessary copying, especially with other people's suggestion of allowing the user to provide their own buffer to be filled (and it seems easy enough to learn). But what about when you want a circular buffer? Ie, When you know a certain maximum lookahead is fine and you want to minimize memory usage and buffer-appends. Circular buffers don't do infinite lookahead so the interface maybe doesn't work as well. Plus you probably wouldn't want to provide an interface for slicing into the buffer, since the slice could straddle the wrap-around point which would require a new allocation (ie return buffer[indexOfFront+sliceStart..$] ~ buffer[0..sliceLength-($-(frontIndex+sliceStart))]). I guess maybe that would just call for another type of range. Becomes too complicated, doesn't it? Denis Circular buffers don't seem like an 'optional' use case to me. Most real I/O works that way. I think if the abstraction can't handle it, the abstraction is a failure. Sorry, I meant the way we start to draw the picture; not circular buffers, can see the point about them. Think Heywood's view window is a helpful image and a good modelling starting point. (maybe it's only me) Denis -- _ vita es estrany spir.wikidot.com
Re: buffered input
Andrei Alexandrescu napisał: I fear efficiency will get abstracted out. Say this is my internal buffer (pipes indicate front() slice): [ooo|oo|oo] Now I do appendToFront(3) -- how do you expose the expected front() without moving data? You do end up moving data, but proportionally little if the buffer is large enough. It still matters for frequent big munches. I'd like a minimum memory option if that's necessary. -- Tomek
Re: buffered input
spir denis.s...@gmail.com wrote in message news:mailman.1321.1296950957.4748.digitalmar...@puremagic.com... On 02/05/2011 11:00 PM, Nick Sabalausky wrote: Transparent buffering sounds sensible but in fact it robs you of important capabilities. It essentially forces you to use grammars with lookahead 1 for all input operations. Being able to peek forward into the stream without committing to read from it allows you to e.g. do operations like does this stream start with a specific word etc. As soon That shouldn't be a problem for the cases where a lookahead of 1 is all that's needed. So both types can exist (with the traditional/automatic type most likely built on top of Andrei's type). Thus, I think the only question is Are the appropriate use-cases for the traditional/automatic type minor enough and infrequent enough to actively discourage it by not providing it? That I can't answer. And what about backtracking (eg for parsing the source)? Like I said, there are certainly cases where a lookahead of 1 isn't sufficient, and for those, something more like Andrei's proposal can be used. (FWIW, LR doesn't usually need backtracking. That's more typically an LL thing. Not that LL is any less important, though. Of course, if the lexical grammar supports non-consuming lookahead, then you'd still need lookahead > 1 no matter what parsing algorithm is used.)
Re: buffered input
On 2/5/11 5:22 PM, Nick Sabalausky wrote: Heywood Floyd soul...@gmail.com wrote in message news:mailman.1318.1296941395.4748.digitalmar...@puremagic.com... As in, you've built some library that passes around ranges, but then some part of it is slow and needs buffered ones. Isn't the point that you can then swap out your ranges for buffered ones here and there, but continue to have a functioning library? The problem with that is that in many many cases it forces unnecessary copying. We can get much better performance with this slightly more hands-on version. But that said, if the traditional hands-free automatic buffering really is all you need, then such a thing [should] be easy to construct out of Andrei's style of buffered range. Then follows that popFront() means discard the first _element_, so that the element that was second now becomes first. And if we can agree to that, then shiftFront() is possibly very confusing, and so is appendToFront(). They sound like they operate on the first element! I completely agree. The names of those functions confused the hell out of me until I read Andrei's descriptions of them. Now I understand what they do...but I still don't understand their names at all. Better names are always welcome! Andrei
Re: buffered input
This sounds similar to how my network code works. I called the functions fetchMore() to append to the buffer and eat(int n) to advance the front position.
Re: A better assert() [was: Re: std.unittests [updated] for review]
bearophile bearophileh...@lycos.com wrote in message news:iihr42$rqp$1...@digitalmars.com... The replacement for assert(0) is meant to be more clear in its purpose compared to assert(0). It may be named thisCantHappen(), or assertZero(), etc. I vote for fubar(); :) ...or fellOffTheEdgeOfTheWorld(); ...fubar is shorter though.
Re: buffered input
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message news:iil3lv$1bb1$1...@digitalmars.com... On 2/5/11 5:22 PM, Nick Sabalausky wrote: Heywood Floyd soul...@gmail.com wrote in message news:mailman.1318.1296941395.4748.digitalmar...@puremagic.com... As in, you've built some library that passes around ranges, but then some part of it is slow and needs buffered ones. Isn't the point that you can then swap out your ranges for buffered ones here and there, but continue to have a functioning library? The problem with that is that in many many cases it forces unnecessary copying. We can get much better performance with this slightly more hands-on version. But that said, if the traditional hands-free automatic buffering really is all you need, then such a thing [should] be easy to construct out of Andrei's style of buffered range. Then follows that popFront() means discard the first _element_, so that the element that was second now becomes first. And if we can agree to that, then shiftFront() is possibly very confusing, and so is appendToFront(). They sound like they operate on the first element! I completely agree. The names of those functions confused the hell out of me until I read Andrei's descriptions of them. Now I understand what they do...but I still don't understand their names at all. Better names are always welcome! Borrowing slightly from Adam: discard and fetch?
Re: buffered input
On 2/5/11 7:28 PM, Don wrote: spir wrote: On 02/05/2011 10:44 PM, Nick Sabalausky wrote: Andrei Alexandrescuseewebsiteforem...@erdani.org wrote in message news:iijq99$1a5o$1...@digitalmars.com... On 2/5/11 6:46 AM, spir wrote: On 02/05/2011 10:36 AM, Nick Sabalausky wrote: On a separate note, I think a good way of testing the design (and end up getting something useful anyway) would be to try to use it to create a range that's automatically-buffered in a more traditional way. Ie, Given any input range 'myRange', buffered(myRange, 2048) (or something like that) would wrap it in a new input range that automatically buffers the next 2048 elements (asynchronously?) whenever its internal buffer is exhausted. Or something like that. It's late and I'm tired and I can't think anymore ;) That's exactly what I'm expecting. Funnily enough, I was about to start a thread on the topic after reading related posts. My point was: I'm not a specialist in efficiency (rather the opposite), I just know there is --theoretically-- relevant performance loss to expect from unbuffered input in various cases. Could we define a generic input-buffering primitive allowing people to benefit from others' competence? Just like Appender. Denis The buffered range interface as I defined it supports infinite lookahead. The interface mentioned by Nick has lookahead between 1 and 2048. So I don't think my interface is appropriate for that. Infinite lookahead is a wonderful thing. Consider reading lines from a file. Essentially what you need to do is to keep on reading blocks of data until you see \n (possibly followed by some more stuff). Then you offer the client the line up to the \n. When the client wants a new line, you combine the leftover data you already have with new stuff you read. On occasion you need to move over leftovers, but if your internal buffers are large enough that is negligible (I actually tested this recently). 
Another example: consider dealing with line continuations in reading CSV files. Under certain conditions, you need to read one more line and stitch it with the existing one. This is easy with infinite lookahead, but quite elaborate with lookahead 1. I think I can see how it might be worthwhile to discourage the traditional buffer interface I described in favor of the above. It wouldn't be as trivial to use as what people are used to, but I can see that it could avoid a lot of unnecessary copying, especially with other people's suggestion of allowing the user to provide their own buffer to be filled (and it seems easy enough to learn). But what about when you want a circular buffer? Ie, when you know a certain maximum lookahead is fine and you want to minimize memory usage and buffer-appends. Circular buffers don't do infinite lookahead, so the interface maybe doesn't work as well. Plus you probably wouldn't want to provide an interface for slicing into the buffer, since the slice could straddle the wrap-around point, which would require a new allocation (ie return buffer[indexOfFront+sliceStart..$] ~ buffer[0..sliceLength-($-(frontIndex+sliceStart))]). I guess maybe that would just call for another type of range. Becomes too complicated, doesn't it? Denis Circular buffers don't seem like an 'optional' use case to me. Most real I/O works that way. I think if the abstraction can't handle it, the abstraction is a failure. The abstraction does handle it implicitly, except that it doesn't fix the buffer size. If you ask for appendToFront() with large numbers without calling shiftFront() too, the size of the buffer will ultimately increase to accommodate the entire input. That's the infinite in infinite lookahead. Most uses, however, stick with front/popFront (i.e. let the range choose the buffer size and handle circularity transparently) and on occasion call appendToFront()/shiftFront().
Whenever a part of the buffer has been released by calling shiftFront(), the implementation may use it as a circular buffer. Circularity is all transparent, as I think it should be. But the power is there. Consider FILE* for contrast. The C API provides setbuf and setvbuf, but no way to otherwise take advantage of buffering - at all. This really hurts; you can only call ungetc() once and be guaranteed success - i.e. all FILE* offer a lookahead of exactly 1 no matter how big a buffer you set! Andrei
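The one-character pushback limit is easy to demonstrate through D's C bindings. A minimal sketch (file name is hypothetical; the C standard only guarantees that a single consecutive ungetc succeeds):

```d
import core.stdc.stdio;

void main()
{
    FILE* f = fopen("pushback_demo.txt", "w+");
    fputs("abc", f);
    rewind(f);

    int c = fgetc(f);   // reads 'a'
    ungetc(c, f);       // one pushback: guaranteed to succeed
    // A second ungetc in a row may return EOF: the C standard
    // guarantees only one character of pushback per stream,
    // regardless of the buffer installed via setvbuf.
    assert(fgetc(f) == 'a');
    fclose(f);
}
```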
Re: buffered input
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message news:iil64l$1f6s$1...@digitalmars.com... On 2/5/11 7:28 PM, Don wrote: Circular buffers don't seem like an 'optional' use case to me. Most real I/O works that way. I think if the abstraction can't handle it, the abstraction is a failure. The abstraction does handle it implicitly, except that it doesn't fix the buffer size. If you ask for appendToFront() with large numbers without calling shiftFront() too, the size of the buffer will ultimately increase to accommodate the entire input. That's the infinite in infinite lookahead. But what about when the window straddles the border? Ex: The circular buffer's internal size is 1000, the current starting point is 900 and the window (ie, front()) is 200. I guess that could work fine if front() is a random-access range, but if it's an array (which I think is what you proposed unless I misunderstood), then front() would have to return a new allocation: buf[900..$]~buf[0..100].
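The stitching Nick describes can be written down directly. A minimal sketch of a window into a circular buffer (function and parameter names are hypothetical):

```d
// Returns `n` elements starting at logical position `start` in a
// circular buffer. A contiguous window is a free slice; a window that
// straddles the wrap-around point forces a fresh allocation.
ubyte[] window(ubyte[] buf, size_t start, size_t n)
{
    assert(n <= buf.length);
    if (start + n <= buf.length)
        return buf[start .. start + n];           // no copy
    auto tail = buf.length - start;               // elements before the wrap
    return buf[start .. $] ~ buf[0 .. n - tail];  // allocates and copies
}
```

For the numbers in the example above (buffer length 1000, start 900, window 200), the second branch is taken, which is exactly the buf[900..$] ~ buf[0..100] allocation being discussed.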
New to D: parse a binary file
Hi, I am new to D. I am trying to write a binary file parser for a project of mine and I thought it would be fun to try and learn a new language at the same time. So I chose D! :D I have been struggling, however, and have not been able to find very many good examples, so I am posting this message. I think I'm supposed to be using std.stdio, but I'm not 100% sure. Could somebody post a short example of how to parse a couple of characters and ints or whatever from a file? Or how to read, say, the next however many bytes into a struct? Also, looking at the documentation, I am confused by this method signature: T[] rawRead(T)(T[] buffer); I understand that T is a generic type, but I am not sure of the meaning of the (T) after the method name. Thanks,
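For what it's worth, the (T) after the name is the template parameter list - rawRead is a function template, and T is normally inferred from the buffer you pass. A minimal sketch of reading raw bytes and a struct with std.stdio (the file name and struct layout are hypothetical):

```d
import std.stdio;

struct Header
{
    align(1):        // avoid padding so the layout matches the file bytes
    char[4] magic;
    uint    length;
}

void main()
{
    auto f = File("data.bin", "rb");

    ubyte[16] bytes;
    f.rawRead(bytes[]);    // T inferred as ubyte: reads 16 raw bytes

    Header[1] h;
    f.rawRead(h[]);        // reads Header.sizeof bytes straight into the struct
    writeln(h[0].magic);
}
```

rawRead returns the slice of the buffer actually filled, so you can check its length against what you asked for to detect a short read.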
Re: Asynchronous concurrency with reference types
On 5/02/11 12:11 AM, Sean Kelly wrote: Peter Alexander Wrote: Things might be easier if the error messages associated with D's concurrent features weren't especially unhelpful (for example, trying to spawn a thread with reference type parameters just gives you a 'no match for spawn template' error). It's nice that it stops you from doing such things, but it would be nice if it told me why it's not going to let me do them. Could you provide an example? When passing reference data, the error you should see is: Aliases to mutable thread-local data not allowed. It's a static assert inside send(). Now that I've investigated a bit more, it appears to be unrelated to reference types, and instead was an error about using a nested function: import std.concurrency; void main() { void foo() {} spawn(foo); } --- test.d(5): Error: template std.concurrency.spawn(T...) does not match any function template declaration test.d(5): Error: template std.concurrency.spawn(T...) cannot deduce template function from argument types !()(void delegate()) --- Why does it think that the function is a delegate?
Re: Asynchronous concurrency with reference types
On 4/02/11 11:44 PM, Sean Kelly wrote: Peter Alexander Wrote: How would you do it with message passing though? As I understand, all of the std.concurrency message passing routines are blocking, and I need this to be asynchronous. What do you mean by blocking? The receive call will block until a message matching one of the supplied types arrives, but if you don't like this you can always use receiveTimeout. send() doesn't deep copy objects, so the only reference types send() will currently accept are those labeled as shared or immutable (Unique!T will probably be added at some point, which is more appropriate for your situation). So to use send() with known-unique reference data you'll have to cast to/from shared or immutable. Nasty, but it'll work. That's what I meant by blocking (receive). Is using receiveTimeout with a timeout of 0 seconds the D-way of asynchronous message passing? (seems a bit hacky to me). Thanks for your reply.
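A zero timeout does turn receiveTimeout into a poll. A minimal sketch, assuming the Duration-based signature of std.concurrency's receiveTimeout:

```d
import std.concurrency;
import core.time : dur;

void worker(Tid owner)
{
    send(owner, 42);   // int is a value type, so send accepts it freely
}

void main()
{
    spawn(&worker, thisTid);

    // Poll: returns immediately with false if no matching message is
    // queued, true if one was received and handled.
    bool handled = receiveTimeout(dur!"msecs"(0),
                                  (int x) { /* use x */ });
}
```

Whether a zero-duration poll is idiomatic is debatable, but it is the supported way to check the mailbox without blocking.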
Re: Asynchronous concurrency with reference types
On Sat, 05 Feb 2011 20:42:53 +0300, Peter Alexander peter.alexander...@gmail.com wrote: On 5/02/11 12:11 AM, Sean Kelly wrote: Peter Alexander Wrote: Things might be easier if the error messages associated with D's concurrent features weren't especially unhelpful (for example, trying to spawn a thread with reference type parameters just gives you a 'no match for spawn template' error). It's nice that it stops you from doing such things, but it would be nice if it told me why it's not going to let me do them. Could you provide an example? When passing reference data, the error you should see is: Aliases to mutable thread-local data not allowed. It's a static assert inside send(). Now that I've investigated a bit more, it appears to be unrelated to reference types, and instead was an error about using a nested function: import std.concurrency; void main() { void foo() {} spawn(foo); } --- test.d(5): Error: template std.concurrency.spawn(T...) does not match any function template declaration test.d(5): Error: template std.concurrency.spawn(T...) cannot deduce template function from argument types !()(void delegate()) --- Why does it think that the function is a delegate? Because even though foo doesn't use any of the local variables (nor does main declare any), it still has a frame pointer as if it were using some: void main() { int x = 42; void foo() { printf("%d", x); } spawn(foo); }
Re: Asynchronous concurrency with reference types
On 02/05/2011 06:42 PM, Peter Alexander wrote: On 5/02/11 12:11 AM, Sean Kelly wrote: Peter Alexander Wrote: Things might be easier if the error messages associated with D's concurrent features weren't especially unhelpful (for example, trying to spawn a thread with reference type parameters just gives you a 'no match for spawn template' error). It's nice that it stops you from doing such things, but it would be nice if it told me why it's not going to let me do them. Could you provide an example? When passing reference data, the error you should see is: Aliases to mutable thread-local data not allowed. It's a static assert inside send(). Now that I've investigated a bit more, it appears to be unrelated to reference types, and instead was an error about using a nested function: import std.concurrency; void main() { void foo() {} spawn(foo); } --- test.d(5): Error: template std.concurrency.spawn(T...) does not match any function template declaration test.d(5): Error: template std.concurrency.spawn(T...) cannot deduce template function from argument types !()(void delegate()) --- Why does it think that the function is a delegate? In complement to what Denis Koroskin answered: when a func is defined inside another one, taking what looks like a func pointer to it automatically turns the result into a delegate. I also find this annoying, all the more so because there is no automatic func* -> delegate conversion. What I would like is: * No func* / delegate distinction on the user side (it can be an implementation optimisation if significant). * Function auto-de/referencing, meaning in your code: spawn(foo). Denis -- _ vita es estrany spir.wikidot.com
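One workaround worth noting: marking the nested function static removes the frame pointer, so taking its address yields a plain function pointer, which spawn accepts:

```d
import std.concurrency;

void main()
{
    static void foo() {}  // static: no frame pointer, a true function
    spawn(&foo);          // matches spawn's function-pointer parameter
}
```

This also documents intent: a static nested function cannot touch the enclosing locals, which is exactly the guarantee spawn needs.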
Re: New to D: parse a binary file
On 02/05/2011 06:26 PM, scottrick wrote: Hi, I am new to D. I am trying to write a binary file parser for a project of mine and I thought it would be fun to try and learn a new language at the same time. So I chose D! :D I have been struggling however and have not been able to find very many good examples, so I am posting this message. I think I'm supposed to be using std.stdio, but I'm not 100% sure. Could somebody post a short example of how to parse a couple of characters and ints or whatever from a file? Or how to read, say, the next however many bytes into a struct? Also, looking at the documentation, I am confused by this method signature: T[] rawRead(T)(T[] buffer); I understand that T is generic type, but I am not sure of the meaning of the (T) after the method name. Thanks, Below a pair of examples that should make all this clearer: a templated hand-written naive map func, and a template struct type (would be nearly the same for a class). Just run it. Additional explanations on demand. 
import File = std.file;
import std.stdio;
import std.string;

Out[] map (In, Out) (In[] source, Out delegate (In) f) { // (0)
    Out[] target;
    foreach (element; source)
        target ~= f(element);
    return target;
}

struct StoreStack (T) {
    T[] items;
    string logFileName;

    this (string logFileName, T[] items = []) {
        this.items = items;
        this.logFileName = logFileName;
        // create/reset log file
        File.write(logFileName, "");
    }
    string toString () {
        static string form = "StoreStack(\"%s\", %s)";
        return format(form, this.logFileName, this.items);
    }
    void put (T item) {
        this.items ~= item;
        string message = format("put item: %s\n", item);
        File.append(logFileName, message);
    }
    T take () {
        T item = this.items[$-1];
        this.items = this.items[0 .. $-1];
        string message = format("took item: %s\n", item);
        File.append(logFileName, message);
        return item;
    }
}

unittest {
    // map
    string hex (uint i) { return format("0x%03X", i); }
    uint[] decs = [1, 3, 9, 27, 81, 243, 729];
    auto hexes = map!(uint, string)(decs, hex);
    // auto hexes = map(decs, hex); // (1)
    writefln("decs: %s\n--\nhexes: %s", decs, hexes);
    writeln();
    // StoreStack
    auto store = StoreStack!(int)("test_log");
    // auto store = StoreStack!int("test_log"); // (2)
    store.put(3);
    store.put(2);
    store.put(3);
    auto i = store.take();
    writefln("store: %s", store);
    writefln("log:\n%s", File.readText("test_log"));
}

void main() {}

(0) The func must be declared as a delegate (instead of a simple func pointer) because the actual func hex, being defined inside a block, is turned by the compiler into a delegate. Detail. (1) Here, the compiler is able to infer the template parameters (types): no need to specify them. (2) When there is a single template parameter, the syntax allows omitting () around it. Denis -- _ vita es estrany spir.wikidot.com
Re: New to D: parse a binary file
spir: Out[] map (In, Out) (In[] source, Out delegate (In) f) { // (0) ... string hex (uint i) { return format("0x%03X", i); } uint[] decs = [1, 3, 9, 27, 81, 243, 729]; auto hexes = map!(uint,string)(decs, hex); ... (0) The func must be declared as a delegate (instead of a simple func pointer) because the actual func hex, being defined in a block, is turned by the compiler into a delegate. Detail. See also: void foo(In, Out)(Out function(In) f) {} void main() { static int bar(int i) { return 0; } foo(bar); } Bye, bearophile
Setting thread priority
How do you set the priority of a thread, or otherwise control how much CPU time it gets? It appears that std.thread had an answer for this, but it has been removed from Phobos by the looks of things. On a side note, why is std.thread still in the online documentation if it was removed long ago? What's the point of having a tool to automatically generate documentation if we're going to have out of date docs anyway?
Re: Setting thread priority
Peter Alexander Wrote: How do you set the priority of a thread, or otherwise control how much CPU time it gets? Use core.thread. And I believe the method name is setPriority.
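In current core.thread it is actually a priority property rather than a setPriority method. A minimal sketch (PRIORITY_MIN and PRIORITY_MAX are static properties that map to OS-level bounds):

```d
import core.thread;

void main()
{
    auto t = new Thread({
        // worker code goes here
    });
    t.start();
    t.priority = Thread.PRIORITY_MAX;  // raise the running thread's priority
    t.join();
}
```

How much extra CPU time this buys is entirely up to the OS scheduler; D only forwards the value to the underlying platform API.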
[Issue 5528] New: Some integer interval analysis to avoid some casts
http://d.puremagic.com/issues/show_bug.cgi?id=5528 Summary: Some integer interval analysis to avoid some casts Product: D Version: D2 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: DMD AssignedTo: nob...@puremagic.com ReportedBy: bearophile_h...@eml.cc --- Comment #0 from bearophile_h...@eml.cc 2011-02-05 06:07:05 PST --- This little D2 program shows code that's correct: void main() { uint i = 10; ubyte x1 = i % ubyte.max; ulong l = 10; uint x2 = l % uint.max; } But dmd 2.051 asks for casts: test.d(3): Error: cannot implicitly convert expression (i % 255u) of type uint to ubyte test.d(5): Error: cannot implicitly convert expression (l % 4294967295LU) of type ulong to uint I think those casts are not necessary, so I'd like dmd to avoid asking for casts in such situations. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5527] Bug in http://www.digitalmars.com/d/2.0/ctod.html#closures ?
http://d.puremagic.com/issues/show_bug.cgi?id=5527 --- Comment #1 from Dr. Christian Maurer christ...@dr-maurer.eu 2011-02-05 12:20:56 PST --- Dear Community, forgot to explicitly remark, that my question was put with regard to the first example, not to the one with function literals (although from the context with the definition of the line number, that should be obvious). Regards, Christian -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5530] New: std.algorithm.len()
http://d.puremagic.com/issues/show_bug.cgi?id=5530 Summary: std.algorithm.len() Product: D Version: D2 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Phobos AssignedTo: nob...@puremagic.com ReportedBy: bearophile_h...@eml.cc --- Comment #0 from bearophile_h...@eml.cc 2011-02-05 17:22:30 PST --- A simple task asks to sort an array according to the length of its items. This is a D2 solution: import std.stdio, std.algorithm; void main() { auto l = [['a','b','c'],['d','e'],['f','g','h'],['i','j','k','l'],['m','n'],['o']]; schwartzSort!((s){return s.length; })(l); writeln(l); } It's supposed to print: [['o'], ['d', 'e'], ['m', 'n'], ['a', 'b', 'c'], ['f', 'g', 'h'], ['i', 'j', 'k', 'l']] I suggest to add a simple len() function to std.algorithm, that allows to shorten that very common code (mapping lengths is a very common operation): import std.stdio, std.algorithm; size_t len(Range)(Range r) if (is(typeof(r.length))) { return r.length; } void main() { auto l = [['a','b','c'],['d','e'],['f','g','h'],['i','j','k','l'],['m','n'],['o']]; schwartzSort!len(l); writeln(l); } In Python the len() function is a free function to allow it to be used as mapping function, sorting function for a Schwartz sort, etc. In Ruby there is a size standard attribute, and blocks are more used. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---