Re: XMLWriter
Tomek Sowiński wrote: Documentation: http://www.keepandshare.com/doc/2863798/std-xml-html-june-11-2011-2-43-am-93k?da=y#XMLWriter I just noticed it requires everyone to sign in :-( Please use this link: http://pastehtml.com/view/awrj8r4zg.html#XMLWriter -- Tomek
XMLWriter
I've pilfered some time to wrap up and discuss the proposal for an easy-to-use and efficient XML writer. Documentation: http://www.keepandshare.com/doc/2863798/std-xml-html-june-11-2011-2-43-am-93k?da=y#XMLWriter Code: https://github.com/tomeksowi/phobos/commit/9f8bb890af7e85d5c4a38409ac13a73585bba643 I've been circling the design lately, questioning the existence of each feature and snipping off unnecessary parts. During the process I removed about 60% of the code and have come to a stage where I'm not sure further deletions wouldn't cut into the healthy flesh of the project. These doubts are expressed as questions in the documentation; I'd like them to guide the discussion. Oh, and please comment on the XMLWriter part only, the rest is old stuff. -- Tomek
Re: string[] enumerations
Nrgyzer wrote: I need enumerations with string[] as base type. What for? -- Tomek
Re: GC for pure functions -- implementation ideas
Don wrote: LEAKY FUNCTIONS Define a 'leaky' pure function as a pure function which can return heap-allocated memory to the caller, i.e., where the return value or a parameter passed by reference has at least one pointer or reference type. This can be determined simply by inspecting the signature. (Note that the function does not need to be immutably pure.) The interesting thing is that heap allocation inside non-leaky pure functions behaves like stack allocation. When you return from that function, *all* those variables are unreachable, and can be discarded en masse. Here's an idea of how to exploit this. THE PURE HEAP [snip] I'm far from being a GC expert, but I think Java, having identified such cases with escape analysis, just puts locally allocated objects on the stack. Couldn't we do the same? Your mark-release pure heap scheme looks all right, but this seems simpler. The notion of non-leaky functions can be useful either way. -- Tomek
Re: GSoC XML library proposal
Andrei Alexandrescu wrote: We have an XML library proposal. I know Tomek Sowinski was working on such. What is the status? The writer is close to being ready for discussion; I've been working on the documentation lately. As for the parser, I have a pretty good idea how to go about it, but the code is still in the woods. At work I've changed teams, which entails lots of reading to get up to speed in the new area, travelling to an office abroad, and working till late with code I don't know yet. Now, that's not an excuse, just an honest answer to what's taking so long. I'm still willing to pull this module through. The work frenzy is clearing out and most probably I'll have the time to do some solid work this month and onwards. Does Tomek or someone else want to apply as a mentor for this project? Perhaps let's do it this way: I'll finish the writer and get it through community scrutiny myself, outside of GSoC. The GSoC contribution to std.xml will be limited in scope to parsing. I will serve as a light bulb with everything I've read up on so far, the lessons learned from my attempts, and several years of experience with D. This way the density of reviews, as well as the odds of bringing the module home, will be higher. If that sounds good, let me know how to apply as a mentor. -- Tomek
Re: Has the ban on returning function nested structs been lifted?
Andrei Alexandrescu wrote: Auto returns + local types = just awesome. Why is it awesome? -- Tomek
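For readers unfamiliar with the feature being discussed, here is a minimal sketch of what "auto returns + local types" refers to: a function returns an instance of a struct declared inside its own body, and the `auto` return type lets callers use the value without being able to name its type. `counter` and `Counter` are illustrative names only.

```d
// Returns an instance of a struct type that exists only inside counter().
// Callers can use the value (and query typeof(result)), but cannot spell the type.
auto counter(int start, int step)
{
    // `static` means Counter needs no pointer to counter()'s stack frame;
    // everything it uses is stored in its own fields.
    static struct Counter
    {
        int n, step;

        int next()
        {
            immutable cur = n;
            n += step;
            return cur;
        }
    }

    return Counter(start, step);
}
```

The type stays nameless to client code, yet `typeof` still works, so callers can declare matching variables or constrain templates on it.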
Re: Library Development: What to finish/flesh out?
dsimcha wrote: I've accumulated a bunch of little libraries via various evening and weekend hacking projects over the past year or so, in various states of completion. Most are things I'm at least half-considering for Phobos, though some belong as third-party libs. I definitely don't have time to finish/flesh out all of them anytime soon, so I've decided to ask the community what to prioritize. Below is a summary of everything I've been working on, with its current level of completion. Please let me know the following: 1. A relative ordering of how useful you think these libraries would be to the community. 2. In absolute terms, would you find this useful? 3. For the Phobos candidates, whether they're general enough to belong in the **standard** library. List in order from most to least finished: 1. Rational: A library for handling rational numbers exactly. Templated on integer type, can use BigInts for guaranteed accuracy, or fixed-width integers for more speed where the denominator and numerator will be small. Completion state: Mostly finished. Just need to fix a little bit rot and submit for review. (Phobos candidate) I'd find it useful. As for its presence in Phobos, I'm uncertain whether it's in enough demand. 2. RandAA: A hash table implementation with deterministic memory management, based on randomized probing. Main advantage over builtin AAs is that it plays much nicer with the GC and multithreaded programs. Lookup times are also expected O(1) no matter how many collisions exist in modulus hash space, as long as there are few collisions in full 32- or 64-bit hash space. Completion state: Mostly finished. Just needs a little doc improvement, a few benchmarks and submission for review. (Phobos candidate) Useful for me and in Phobos. 3. TempAlloc: A memory allocator based on a thread-local segmented stack, useful for allocating large temporary buffers in things like numerics code. 
Also comes with a hash table, hash set and AVL tree optimized for this allocation scheme. The advantages over plain old stack allocation are that it's independent of function calls (meaning you can return pointers to TempAlloc-allocated memory from a function, etc.) and it's segmented, meaning you can allocate huge buffers w/o risking stack overflow. Its main weakness is that this stack is not scanned by the GC, meaning that you can't store the only reference to a GC-allocated piece of memory here. However, in practice large arrays of primitives are an extremely common case in performance-critical code. I find this module immensely useful in dstats and Lars Kyllingstad uses it in SciD. Getting it into Phobos would make it easy for other scientific/numerics code to use it. Completion state: Working and used. Needs a little cleanup and documentation. (Phobos candidate) Useful for me, don't know if for everyone else. 4. Streaming CSV Parser: Parses CSV files as they're read in, a few convenience functions for extracting columns into structs. If Phobos ever gets SQLite support I'll probably add sugar for turning a CSV file into an SQLite database, too. Completion state: Prototype working, needs testing, cleanup and documentation. (Phobos candidate) You mean a lazy slurp? It'd be useful for everyone. 5. Matrix operations: SciD improvements that allow you to write matrix operations that look like normal math/MATLAB and optimize them via expression templates so that a minimal number of temporary matrices are created. Uses/will use BLAS for multiplication. Completion state: Addition implemented. Multiplication not. It is worth considering standardizing at least matrix expressions in Phobos. The motivation is analogous to ranges: to run an algorithm from lib A on a matrix container from lib B. C++ would be green with envy. I'd be glad to be part of the effort once I'm done with xml. 6. 
Machine learning: Decision trees, KNN, Random Forest, Logistic Regression, SVM, Naive Bayes, etc. This would be a dstats module. Completion state: Decision trees prototyped, logistic regression working. I'd find it useful; I think anyone who's into this would too. 7. std.mixins: Mixins for commonly needed boilerplate code. I stopped working on this when Andrei suggested that making a collection of mixins into a module is a bad idea. I've thought about it some more and I respectfully disagree. std.mixins would be a one-stop shop for pretty much any boilerplate you need to inject, and most of this code doesn't fit in any other obvious place. Completion state: A few things (struct comparison, simple class constructors, Singleton pattern) prototyped. (Phobos candidate) I'm afraid I also think functionality should be categorized by the purpose it serves rather than implementation technique. 8. GZip support in std.file: I'll leave the stream stuff for someone else, but just simple stuff like read(), write(),
Dream package management system (Was: a cabal for D ?)
Jason E. Aten wrote: Please correct me if I'm wrong, but I observe that there doesn't appear to be a package management system / standard repository for D libraries. Or is there? No, there isn't. I'm talking about something as easy to use as R's CRAN, install.packages("rforest") or cpan for perl, ctan for latex, dpkg/apt for debian, cabal for Haskell/Hackage, etc. If there's not a commonly utilized one currently, perhaps we could borrow cabal, with a trivial port. cabal is Haskell's package manager. Not only does having a standard package install system facilitate adoption, it greatly facilitates code sharing and library maturation. Yes, we need it badly. I think it's a good moment to start a discussion. First off, what exactly do we want from a package management system? -- Tomek
Re: Code Sandwiches
bearophile wrote: One of the things the paper says about D scope guards is: Scope guards do not provide encapsulation. Yep, they don't. So? -- Tomek
Re: full ident name without mangle/demange?
Nick Sabalausky wrote: Is there a way to get the fully-qualified name of an identifier without doing demangle(mangledName!(foo))? Heh, looks like there isn't. It may be worth filing an enhancement request for __traits(fullyQualifiedName, foo). BTW, what do you need it for? -- Tomek
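Phobos later gained `std.traits.fullyQualifiedName`, which yields this name at compile time without a mangle/demangle round trip. A small sketch; `Outer` and `Inner` are illustrative, and because the result starts with the module name (which depends on the file), the check looks only at the suffix.

```d
import std.traits : fullyQualifiedName;
import std.algorithm.searching : endsWith;

struct Outer
{
    struct Inner {}
}

// Evaluated at compile time; the value looks like "<module>.Outer.Inner".
enum innerName = fullyQualifiedName!(Outer.Inner);
```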
Re: Google Summer of Code 2011 application
Andrei Alexandrescu wrote: I just submitted an application for GSoC 2011 on behalf of Digital Mars. Please review and contribute to the project ideas page: http://prowiki.org/wiki4d/wiki.cgi?GSOC_2011_Ideas Please throw in database interfacing. Does putting up XML mean I should stop working on it? -- Tomek
Re: Haskell infix syntax
Jonathan M Davis wrote: As a feature of its own, it's just sugar. But if introducing infix operators were contingent on banishing classic operator overloading, then it is worthwhile. LOL. And _what_ benefit would banishing classic operator overloading have? I've worked on a financial system written in Java which used BigDecimal extensively. And, of course, I LOLed at that. But after having spent time with the code, a few benefits surfaced. It was clear which function was user-implemented. Displaying the docs by mousing over was nice too (outside the IDE, grepping 'add' is easier than '+'). And above all, no abuse whatsoever. All this didn't outweigh the loss in terseness of syntax, but it did make up for some of it. I'm bringing up this case because it is extremely favourable to operator overloading. Java is not big on number crunching and BigDecimal is one of the few spots on the vast programming landscape where overloaded operators make sense. And yet, the final verdict was: it doesn't suck. A function named add could be abused in _exactly_ the same ways that + can be. There's far less incentive for abuse as there's no illusory mathematical elegance to pursue. The main benefit that infix syntax would provide would be if you had a variety of mathematical functions beyond what the built-in operators give you, and you want to be able to treat them the same way. Whether classic operator overloading exists or not is irrelevant. That's mixing vect1 + vect2 with vect1 `dot` vect2. I'd rather see them treated the same way. Regardless, I don't think that adding infix syntax to the language is worth it. D is already pretty complicated and _definitely_ more complicated than most languages out there. One of the major complaints about C++ is how complicated it is. We don't want to be adding extra complexity to the language without the benefit outweighing that complexity, and I don't think that it's at all clear that it does in this case. I agree. 
Hence the idea of trading operator overloading for infixing. The added complexity is zero, if not less. And as KennyTM~ pointed out, if UFCS is ever implemented, it gives you most of the benefit of this anyway, and there are already a lot of people around here interested in UFCS. So, I find it _far_ more likely that UFCS gets implemented than an infix function call syntax. I also think it is more probable. -- Tomek
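UFCS (uniform function call syntax) was later implemented in D: `a.f(b)` is rewritten to `f(a, b)` when `a` has no member named `f`. A sketch of how that covers bearophile's `sum` chain without any new operator syntax; the wrapper functions are illustrative.

```d
int sum(int x, int y) { return x + y; }

// Nested call form: the parentheses have to be matched inside out.
int nestedForm()
{
    return sum(1, sum(5, sum(6, sum(10, 30))));
}

// UFCS form: 1.sum(5) means sum(1, 5), so the chain reads left to right.
// The grouping is left-nested rather than right-nested, which makes no
// difference for an associative operation such as addition.
int chainedForm()
{
    return 1.sum(5).sum(6).sum(10).sum(30);
}
```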
Re: Haskell infix syntax
Caligo wrote: With C++, for example, Eigen uses expression templates. How does one do expression templates in D? Could someone rewrite this http://en.wikipedia.org/wiki/Expression_templates in D? You may look at my approach for QuantLibD. http://dsource.org/projects/quantlibd/browser/ql/math/matrix.d Mind you, the project is suspended. -- Tomek
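A minimal expression-template sketch in D, independent of the QuantLibD code linked above. `a + b` returns a lightweight node instead of a new vector, and the elements are computed only when `evaluate` walks the tree, so `a + b + c` runs as a single loop with no temporary vectors. `Vec`, `Add`, and `evaluate` are hypothetical names; a real library would add more operators, scalar operands, and length checks.

```d
// Node for the element-wise sum of two operands. Building it computes
// nothing; opIndex computes a single element on demand.
struct Add(L, R)
{
    L lhs;
    R rhs;

    double opIndex(size_t i) const { return lhs[i] + rhs[i]; }
    size_t length() const { return lhs.length; }

    // Chaining: (a + b) + c builds Add!(Add!(Vec, Vec), Vec).
    auto opBinary(string op : "+", T)(T other)
    {
        return Add!(typeof(this), T)(this, other);
    }
}

// Leaf type that owns the actual storage.
struct Vec
{
    double[] data;

    double opIndex(size_t i) const { return data[i]; }
    size_t length() const { return data.length; }

    auto opBinary(string op : "+", T)(T other)
    {
        return Add!(Vec, T)(this, other);
    }
}

// Materialize any expression tree into a Vec with one loop.
Vec evaluate(E)(E expr)
{
    auto result = new double[expr.length];
    foreach (i; 0 .. expr.length)
        result[i] = expr[i];
    return Vec(result);
}
```

The key design choice is that operators return types encoding the expression's shape, so the compiler can inline the whole tree into the loop in `evaluate`.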
LIFO refrigerators
Daniel Gibson wrote: You'd need a fridge with two doors: one in the front, one in the back. Insert new food in the front, get food to eat from the back (or the other way round). But reinsert opened food in the back (or, in the alternative case, in the front). Or a cylinder-shaped refrigerator with rotating food shelves. Put new stuff in the front and turn the shelf slightly clockwise to expose the oldest food for eating. Ain't circular buffers yummy? -- Tomek (the patent holder ;-)
Re: Haskell infix syntax
bearophile wrote: Haskell is full of function calls, so the Haskell designers have used/invented several different ways to avoid some parentheses in the code. From what I've seen, if you remove some parentheses well, in the right places, the resulting code is less noisy, more readable, and less likely to contain a bug (because syntax noise is a good place for bugs to hide). One of the ways used to remove some parentheses is a standard syntax that's optionally usable on any dyadic function (function with two arguments): sum a b = a + b sum 1 5 == 1 `sum` 5 The `name` syntax is just a different way to call a regular function with two arguments. In Haskell there is also a way to assign an arbitrary precedence and associativity to such infix operators, but some Haskell programmers argue that too much syntax sugar causes trouble ( http://www.haskell.org/haskellwiki/Use_of_infix_operators ). In D the back tick has a different meaning, and even if in D you use a different syntax, like just a $ prefix, I don't know how good this syntax is for D: int sum(int x, int y) { return x + y; } int s = sum(1, sum(5, sum(6, sum(10, 30)))); Equals to (associativity of $ is fixed like this): int s = 1 $sum 5 $sum 6 $sum 10 $sum 30; So I think it's not worth adding to D. I vaguely recall someone mentioning infixability by naming convention. int _add_(int x, int y); int s = 1 _add_ 5 _add_ 10; As a feature of its own, it's just sugar. But if introducing infix operators were contingent on banishing classic operator overloading, then it is worthwhile. -- Tomek
Re: uniqueness propagation
Robert Jacques wrote: On Fri, 25 Feb 2011 02:48:01 -0500, Kevin Bealer kevindangerbea...@removedanger.gmail.com wrote: I think immutable could benefit from a Value Range Propagation-like uniqueness 'unique' has been proposed and heavily discussed before in the newsgroup. There is even std.typecons.Unique. Unfortunately, Walter has stated that there are issues/difficulties in adding 'unique' to the language. What were those difficulties? -- Tomek
Re: Should conversion of mutable return value to immutable allowed?
Ali Çehreli wrote: Implicit conversions to immutable in the following two functions feel harmless. Has this been discussed before? string foo() { char[] s; return s; // Error: cannot implicitly convert expression //(s) of type char[] to string } string bar() { char[] s; return s ~ s; // Error: cannot implicitly convert expression //(s ~ s) of type char[] to string } Is there a reason why that's not possible? I am sure there must be other cases that at least I would find harmless. :) Indeed. The returned object can be safely set in stone when its only aliases to the outside world point to immutable data. Such a guarantee is expressed in today's language by marking the function pure and all its arguments immutable. The conversion is currently not allowed as the above virtue of immutably pure functions was discovered not too long ago. If you want it, vote up: http://d.puremagic.com/issues/show_bug.cgi?id=5081 -- Tomek
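Later D releases accept this conversion at the call site: the result of a call to a pure function whose arguments carry no mutable indirections cannot alias any pre-existing data, so it may implicitly convert to immutable. A sketch with a hypothetical helper `filled`; note that the check happens where the pure function is called, not inside a non-pure function like `foo` above.

```d
// Strongly pure: the parameters are plain values, so the returned array
// cannot alias any data that existed before the call.
char[] filled(size_t n, char c) pure
{
    auto s = new char[n];
    s[] = c;
    return s;
}

string makeXs()
{
    // Accepted because the result of the pure call is known to be unique.
    string s = filled(3, 'x');
    return s;
}
```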
Re: Do findSplit, findSplitBefore, and findSplitAfter make until unnecessary?
Jonathan M Davis wrote: Does anyone have a good reason why the findSplit* functions don't make until obsolete and unnecessary? Until is lazy, findSplit* are not. -- Tomek
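A small sketch of the difference using Phobos' `until` and `findSplit` (both in `std.algorithm.searching` in current Phobos): `until` hands back a range that stops at the sentinel while it is being iterated, whereas `findSplit` searches immediately and returns three slices. The wrapper names `lazyHead` and `eagerParts` are illustrative.

```d
import std.algorithm.searching : findSplit, until;
import std.algorithm.comparison : equal;

// Lazy: returns a range; characters are examined only while iterating it.
auto lazyHead(string s)
{
    return s.until(' ');
}

// Eager: searches right away and returns (before, separator, after).
auto eagerParts(string s)
{
    return s.findSplit(" ");
}
```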
Re: 'live' testing style
spir wrote: * Why isn't testList a unittest block? Using named funcs, I can switch specific test suites on/off by (un)commenting their call from the main and unique unittest block. Otherwise, either they all run, or none. During development, I only keep active the test func(s) related to the feature I'm currently working on. Remedy: named unittests. The interesting thing about named unit tests is that their names aren't interesting at all. They are usually dull and forced; the test of filterFoo will be called testFilterFoo, etc. Their only purpose is to suppress running of unrelated tests. Now, there is a seemingly unrelated proposal to include every ddoc'ed unit test in the documentation of the preceding declaration as an example. This is great because it implies ownership: a unit test is 'owned' by the symbol above. Going further, it can also be named after its owner. module ooh; void foo(); unittest { test foo... } Compiling with --unittest=ooh.foo runs this unittest only. Nested control as a bonus: compiling with --unittest=ooh runs only the tests in module ooh. So there you go, named unit tests without naming. -- Tomek
Re: assert(expression, error)
spir wrote: Is there a way to specify what error to throw using (a variant of) assert: assert(n > 0, new ValueError(...)); (Sure, one can write: if (n <= 0) throw new ValueError(...); but the same remark applies to plain assert: the whole point of assert is to have it as a builtin feature with a clear field of application and well-known semantics, shared by the community of D programmers.) With built-in assert, no. But std.exception can do it. enforce(n > 0, new ValueError(...)); -- Tomek
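A runnable version of the `enforce` call above. `ValueError` is the hypothetical exception type from the question and `checkedPositive` is an illustrative wrapper. One practical difference from `assert` is that `enforce` is not compiled out in `-release` builds.

```d
import std.exception : enforce, assertThrown;

// Hypothetical exception type from the question.
class ValueError : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

int checkedPositive(int n)
{
    // Throws the given exception object when the condition is false.
    enforce(n > 0, new ValueError("n must be positive"));
    return n;
}
```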
Re: 0nnn octal notation considered harmful
spir wrote: Just had a strange bug (in a test func!) caused by this notation. In my case it is due to the practice (common, I guess) of pretty-printing int numbers using the %0nd or %0ns format to get a nice alignment. Then, if one feeds the results back into D code, they are interpreted as octal... Now I know: I'll pad with spaces instead ;-) Copying a stringified integer is not the only reason this notation is bug-prone: prefixing a number with '0' should not change its value (!). Several programming languages switched to another notation, like 0onnn, which is consistent with the common hex and bin notations and cannot lead to misinterpretation. Such a change would be, I guess, backward compatible, and would not be misleading for C coders. This has been discussed before. There's octal!123 in Phobos if you don't like these confusing literals, but they stay because Walter likes them. -- Tomek
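A sketch of `std.conv.octal`, the Phobos alternative mentioned above. It accepts either a string or an integer literal whose decimal digits are read as octal, and the result is a compile-time constant. The enum names are illustrative.

```d
import std.conv : octal;

// All three are evaluated at compile time.
enum fromString  = octal!"123";  // 1*64 + 2*8 + 3 == 83
enum fromLiteral = octal!123;    // same value, digits taken from a literal
enum fileMode    = octal!"755";  // 7*64 + 5*8 + 5 == 493
```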
Re: Assert compilation failure with certain message
Andrej Mitrovic wrote: I've managed to screw up the colon placement though, here's a quick fix: import std.stdio; import std.conv; void staticAssert(alias exp, string message, string file = __FILE__, int line = __LINE__)() { static if (!exp) { pragma(msg, file ~ "(" ~ to!string(line) ~ "): staticAssert: " ~ to!string(message)); assert(0); } } void main() { enum x = false; staticAssert!(x, "Oh no we failed!"); int y; } How does it help to find out that compilation tripped on a specific static assertion? -- Tomek
Re: std.concurrency immutable classes...
Steven Schveighoffer wrote: It would be much easier if he provided the specific case(s) he broke his teeth on. Then we'd all know where the problem is. If it's soluble, it'll open the door to tail type modifiers in general, not just in classes. It's a burning issue e.g. with ranges (mostly struct). http://d.puremagic.com/issues/show_bug.cgi?id=5377 Look at the attachment to get a feel of what hoops we'll have to jump through to side-step lack of tail X. I've worked through this very same problem (a few months back), thinking that we need a general solution to tail-const. The large issue with tail-const for structs in the general case is that you cannot control the type of 'this'. It's always ref. This might seem like a very inconsequential detail, but I realized that a ref to X does not implicitly convert to a ref to a tail-const X. This violates a rule of two indirections, in which case you are not able to implicitly convert the indirect type, even if the indirect type would implicitly convert outside the reference. A simple example: you cannot convert an int** to a const(int)**. Reason being, then you could change the indirect pointer to point to something that's immutable, and the original int** now points to immutable data. I tried to understand this on an example and now I'm even more confused. :) int* p; int** pp = &p; const(int)** cpp = pp; // compiles fine immutable int i = 7; *cpp = &i; **pp = 5; // mutate the immutable writeln(cpp, ' ', pp); writeln(*cpp, ' ', *pp, ' ', &i); writeln(**cpp, ' ', **pp, ' ', i); The output is interesting: 12FE08 12FE08 12FE14 12FE14 12FE14 5 5 7 So even though they all point to i at the end, it remains unchanged. What gives? Register caching? It doesn't matter as the int** to a const(int)** conversion should fail in the first place, but I'm curious... The same is for tail-const structs, because you go through one ref via 'this' and the other ref via the referring member. What does this all mean? 
It basically means that you have to define *separate* functions for tail-const and const, and separate functions for tail-immutable and immutable. This is untenable. I, from the very first discussions, assumed tail-const functions are inevitable. You define empty() as const but popFront() as tail-const. Feels natural. You might ask, "why doesn't this problem occur with tail-const arrays?" Well, because you *don't pass them by ref*. With structs we have no choice. I think what we need is a way to define two different structs, one being the tail-const version of the other, with some compiler help, and then we do not need to define a new flavor of const functions. We still need to define these tail-const functions, but it comes in a more understandable form. But importantly, the implicit cast makes a *temporary* copy of the struct, allowing the cast to work. I'd like to understand it better. How would you define with this scheme, say, a range on a const collection, to which ranges on an (im)mutable collection are implicitly convertible? -- Tomek
Re: std.concurrency immutable classes...
Michel Fortin wrote: Thanks for doing this. Is it approved by Walter? Depends on what you mean by approved. He commented once on the newsgroup after I posted an earlier version of the patch, saying I should add tests for type deduction and some other stuff. This change is something he attempted to do in the past and failed, so I expect him to be skeptical. It would be much easier if he provided the specific case(s) he broke his teeth on. Then we'd all know where the problem is. If it's soluble, it'll open the door to tail type modifiers in general, not just in classes. It's a burning issue e.g. with ranges (mostly struct). http://d.puremagic.com/issues/show_bug.cgi?id=5377 Look at the attachment to get a feel of what hoops we'll have to jump through to side-step lack of tail X. I guess he'll review it when he has the time and I hope he'll merge these changes in the mainline. He'll probably want to take his time however, since it can break existing code in some cases; it's basically a change to the language. If you want to show your support, I guess you can vote up the enhancement request in the bugzilla. http://d.puremagic.com/issues/show_bug.cgi?id=5325 Also feel free to compile it, test it, and share your experience. The more tested it is, the more used and appreciated it is, the more exposure it gets, the sooner it gets approved, or so I guess. I'd love to, but I'm putting shreds of my spare time into xml. -- Tomek
Assert compilation failure with certain message
Is there a way to statically assert compilation of an expression failed *with a certain message*? I want to check my static asserts trip when they should. -- Tomek
Re: Assert compilation failure with certain message
bearophile wrote: Is there a way to statically assert compilation of an expression failed *with a certain message*? I want to check my static asserts trip when they should. I asked something like this a long time ago, but I don't know a way to do it. You are able to statically assert that some code doesn't compile, but I don't know how to assert that a certain message gets produced. You are asking for a specific static catch :-) Static catch, yeah. But I'd be content with __traits(fails, expr, msg), which seems tractable. -- Tomek
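Short of a message-aware trait, the existing `__traits(compiles, ...)` can at least verify that an instantiation is rejected. A sketch with a hypothetical `half` template guarded by a `static assert`; its return type is `auto`, so the body (and with it the `static assert`) is analyzed as part of the check. It still cannot tell *which* message fired, which is the gap discussed above.

```d
// Hypothetical template that should only accept integral types.
auto half(T)(T x)
{
    static assert(__traits(isIntegral, T), "half() needs an integral type");
    return x / 2;
}

// The failing static assert makes the speculative instantiation fail,
// so __traits(compiles) yields false instead of stopping compilation.
enum okInt    = __traits(compiles, half(4));
enum okDouble = __traits(compiles, half(4.0));
```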
Re: High performance XML parser
Steven Schveighoffer wrote: OK, so you mean a buffer other than the I/O buffer. This means double buffering data. I was thinking of a solution that allows simply using the I/O buffer for parsing. I think this is one of the keys to Tango's xml performance. I'd be glad to hear what your idea is. I think they are convergent. In mine, the I/O could be asked to dump data to the iterator's buffer at a given position (right after the previous nodes), then the iterator forms a node out of raw data. Some moving would be done, but all within the cached buffer, so it should be quick. I guess that's as far as I can predict performance in a newsgroup post. ;-) Gotta write some code and whip out the stopwatch, then we'll see. -- Tomek
Re: Efficient outputting of to-string conversions
Andrei Alexandrescu wrote: I know about Steven's proposal but it applies only to user types, not primitives. Either way std.conv.to would need a buffered output range as integers are written from the right. Any chance for an abstraction analogous to buffered input ranges discussed recently? Generally I found it more difficult to define a solid output buffer abstraction. This is a great motivating example though. To my surprise, an API of the same form seems to be what the doctor prescribed. Here's a semi-formal definition: A buffered output range R is defined as such: R.front returns the currently uncommitted buffer of type T[]; R.moreFront(n) makes n more elements available for writing; R.commitFront(n) writes the first n elements in front(); R.flushFront() writes the buffer currently held in front() and makes another buffer available (initially empty). I was thinking along the same lines. There's one missing: R.skipFront(n) skips the first n elements without outputting them. Why? Look at integral conversions in std.conv.to. It first calculates the maximum string size, then writes the digits to the char array back to front, then returns result[$ - ndigits .. $], where ndigits is how long the string turned out. Returning to Steven's DIP, I think writeTo should take the above rather than void delegate(char[]). With the latter you still have to allocate the pieces. Our buffered output range is friends with polymorphism too. If you set T=char, its API is devoid of generics. Such an interface can be placed in object.d with an official blessing. -- Tomek
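To make the proposed API concrete, here is a minimal sketch of a buffered output range with `front`, `commitFront`, `flushFront`, and the extra `skipFront`, plus an integer writer that fills the buffer back to front as described for `std.conv.to`. Everything here (`BufferedSink`, `writeUnsigned`, the fixed 64-char buffer, collecting committed text into a string) is a hypothetical illustration, not a Phobos API; `moreFront` is left out for brevity.

```d
// Hypothetical buffered output range following the API sketched above.
struct BufferedSink
{
    char[64] buf;   // staging area handed out through front()
    size_t pos;     // start of the uncommitted region of buf
    string output;  // committed text; stands in for a file or socket

    // The currently uncommitted part of the buffer.
    char[] front() return { return buf[pos .. $]; }

    // Emit the first n elements of front() and move past them.
    void commitFront(size_t n)
    {
        output ~= buf[pos .. pos + n].idup;
        pos += n;
    }

    // Move past the first n elements of front() without emitting them.
    void skipFront(size_t n) { pos += n; }

    // Everything before pos is already emitted, so the buffer can be reused.
    void flushFront() { pos = 0; }
}

// Writes value in decimal, filling front() from the right; the digits are
// produced directly in the sink's buffer.
void writeUnsigned(ref BufferedSink sink, ulong value)
{
    enum maxDigits = 20; // ulong.max has 20 decimal digits
    if (sink.front.length < maxDigits)
        sink.flushFront();

    char[] space = sink.front;
    size_t i = space.length;
    do
    {
        space[--i] = cast(char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    sink.skipFront(i);                  // unused slots before the first digit
    sink.commitFront(space.length - i); // exactly the digits
}
```

`skipFront` is what lets the writer reserve the worst-case width up front and then discard the unused prefix, which is the case argued for above.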
Re: Efficient outputting of to-string conversions
Andrei Alexandrescu wrote: For the latter, Tomek's idea of passing an output range as an optional second parameter seems appropriate. Please file as an enhancement to bugzilla. If anyone has time to work on this, please do. If not, I'll work on it as my schedule allows. http://d.puremagic.com/issues/show_bug.cgi?id=5548 -- Tomek
Re: Please reply to this to vote to collectExceptionMsg in std.unittests
Andrei Alexandrescu wrote: Reply here to vote ONLY for the function collectExceptionMsg in Jonathan M Davis's std.unittests. Vote closes on Tue Feb 15. I'm in two minds. Since Jonathan has improved collectException, the proposed function is just a shorthand for: auto e = collectException!MyException(expression); assert (e); assert (e.msg == ...); or: assert (collectException!MyException(expression) == new MyException(msg)); I would use these because of the possibility to test properties other than .msg. Also, there's an ambiguity: ex.msg is null vs. didn't throw. But perhaps msg is important enough to deserve a dedicated wrapper. I'll vote in favour, given that the docs are shrunk to something like: Convenience function for extracting the exception's message. Equivalent of: --- auto e = collectException(mayThrow); string msg = e ? e.msg : null; --- and that they link to collectException. -- Tomek
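`collectExceptionMsg` did end up in `std.exception`. A sketch comparing it with the `collectException` spelling from the post; `MyException` and `mayThrow` are hypothetical. When nothing is thrown, `collectExceptionMsg` returns null.

```d
import std.exception : collectException, collectExceptionMsg;

// Hypothetical exception type and function under test.
class MyException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

void mayThrow(bool fail)
{
    if (fail)
        throw new MyException("boom");
}
```

The long form keeps the exception object around for checking properties other than `.msg`; the short form returns only the message.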
Re: High performance XML parser
Steven Schveighoffer wrote: The design I'm thinking is that the node iterator will own a buffer. One consequence is that the fields of the current node will point to the buffer akin to foreach(line; File.byLine), so in order to lift the input the user will have to dup (or process the node in-place). As new nodes will be overwritten on the same piece of memory, an important trait of the design emerges: cache intensity. Because of XML namespaces I think it is necessary for the buffer to contain the current node plus all its parents. That might not scale well. For instance, if you are accessing the 1500th child element of a parent, doesn't that mean that the buffer must contain the full text for the previous 1499 elements in order to also contain the parent? Maybe I'm misunderstanding what you mean. Let's talk through an example: <a name="value"> <b> Some Text 1 <c2> <!-- HERE --> Some text 2 </c2> Some Text 3 </b> </a> The buffer of the iterator positioned HERE would be: [Node a | Node b | Node c2] Node c2 and all its parents are available for inspection. Node a's attribute is stored in the buffer, but not b's "Some text 1", as it is c2's sibling; "Some text 1" was available in the previous iteration, now it's overwritten by c2. To get to "Some text 2" let's advance the iterator in depth to get: [Node a | Node b | Node c2 | Text node "Some text 2"] Advancing it once more we get to: [Node a | Node b | Text node "Some text 3"] So "Some text 3" is written where c2 and text node 2 used to be. The element type of the range would always be the child, with parents available through pointers: foreach (node; xmlRange) { doStuff(node); if (Node* parent = node.parent) doOtherStuff(parent); } Having no access to siblings is quite limiting, but the iterator can form an efficient (zero-allocation) basis on which more convenient schemes are built. It's still just brainstorming, though. I fear there's something that'll make the whole idea crash and burn. 
I would start out with a non-compliant parser, but one that allocates nothing beyond the I/O buffer, one that simply parses lazily and can be used as well as a SAX parser. Then see how many extra allocations we need to get it to be compliant. Then, one can choose the compliance level based on what performance penalties one is willing to incur. Yeah, 100% compliance is a long way off. -- Tomek
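A toy model of the buffer layout described above, assuming the buffer is managed like a stack: descending into a child pushes it, moving to the next sibling overwrites the top slot, and finishing a node pops it. `NodeStack` and its methods are hypothetical, and plain strings stand in for raw node data. Under this model, memory use tracks nesting depth rather than the number of preceding siblings.

```d
// Toy model: nodes[0 .. $-1] are the ancestors, nodes[$-1] is the current node.
struct NodeStack
{
    string[] nodes;

    // Descend into the first child of the current node.
    void enter(string child) { nodes ~= child; }

    // Move to the next sibling: it reuses the current node's slot.
    void next(string sibling) { nodes[$ - 1] = sibling; }

    // Finish the current node; its parent becomes current again.
    void leave() { nodes = nodes[0 .. $ - 1]; }

    // Parent of the current node, or null at the root.
    string parent() const { return nodes.length > 1 ? nodes[$ - 2] : null; }
}
```

Replaying the example: positioned at HERE the stack is `["a", "b", "c2"]`, and after leaving `c2` the text node "Some Text 3" takes its slot.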
Efficient outputting of to-string conversions
Looks like std.conv.to always allocates behind the scenes. It's a shame as the returned string is immediately processed and discarded in my XML writer. Are there plans to include a custom output variant, e.g. to!string(7, outputRange)? -- Tomek
Re: Efficient outputting of to-string conversions
Jonathan M Davis wrote: On Monday 07 February 2011 13:10:09 Tomek Sowiński wrote: Looks like std.conv.to always allocates behind the scenes. It's a shame as the returned string is immediately processed and discarded in my XML writer. Are there plans to include a custom output variant, e.g. to!string(7, outputRange)? http://prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs/DIP9 I know about Steven's proposal but it applies only to user types, not primitives. Either way std.conv.to would need a buffered output range as integers are written from the right. Any chance for an abstraction analogous to buffered input ranges discussed recently? -- Tomek
Re: std.concurrency immutable classes...
Michel Fortin wrote:
> I just made this pull request today:
> https://github.com/D-Programming-Language/dmd/pull/
>
> If you want to test it, you're very welcome. Here is my development branch for this feature:
> https://github.com/michelf/dmd/tree/const-object-ref

Thanks for doing this. Has Walter approved it?

-- Tomek
Re: buffered input
Nick Sabalausky wrote:
> discard and fetch?

I like that.
Writing XML
While I'm circling the problem of parsing, I took a quick look at writing so that I don't get stuck in analysis paralysis. Writing XML is fairly independent of parsing and an order of magnitude easier to solve. It was a perfect way to get myself coding. These are the guidelines I followed:

* Memory minimalism: don't force allocating an intermediate node structure just to push a few tags down the wire.
* Composability: operate on an arbitrary string output range.
* Robustness: tags should not be left open, even if the routine producing a tag's interior throws.
* Simplicity of syntax: resemble real XML where possible.
* Space efficiency / readability: write tightly (without indents and newlines) for faster network transfer, and provide an easy way to switch temporarily to tight writing for better readability.
* Ease of use:
  - automatic to!string of non-string values,
  - automatic string escaping according to the XML standard,
  - handle nulls: close the tags short (<tag/>), and don't write attributes with null values at all.
* Anything else?

The new writer meets pretty much all of the above. Here's an example to get a feel for it:

auto books = [
    Book([Name("Grębosz", "Jerzy")], "Pasja C++", 1999),
    Book([Name("Navin", "Robert", "N.")], "Mathematics of Derivatives", 2007),
    Book([Name("Tokarczuk", "Olga")], "Podróż ludzi Księgi", 1996),
    Book([Name("Graham", "Ronald", "L."), Name("Knuth", "Donald", "E."), Name("Patashnik", "Oren")],
         "Matematyka Konkretna", 2008)
];

auto outputRange = ... ;
auto xml = xmlWriter(outputRange);

xml.comment(books.length, "favorite books of mine.");
foreach (book; books) {
    xml.book("year", book.year, {
        foreach (author; book.authors) {
            xml.tight.authorName({
                xml.first(author.first);
                xml.middle(author.middle);
                xml.last(author.last);
            });
        }
        xml.tight.title(book.title);
    });
}

- program output -

<!-- 4 favorite books of mine. -->
<book year="1999">
    <authorName><first>Jerzy</first><middle/><last>Grębosz</last></authorName>
    <title>Pasja C++</title>
</book>
<book year="2007">
    <authorName><first>Robert</first><middle>N.</middle><last>Navin</last></authorName>
    <title>Mathematics of Derivatives</title>
</book>
<book year="1996">
    <authorName><first>Olga</first><middle/><last>Tokarczuk</last></authorName>
    <title>Podróż ludzi Księgi</title>
</book>
<book year="2008">
    <authorName><first>Ronald</first><middle>L.</middle><last>Graham</last></authorName>
    <authorName><first>Donald</first><middle>E.</middle><last>Knuth</last></authorName>
    <authorName><first>Oren</first><middle/><last>Patashnik</last></authorName>
    <title>Matematyka Konkretna</title>
</book>

Questions and comments?

-- Tomek
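A much smaller writer than the one proposed can illustrate the robustness and ease-of-use guidelines. The sketch below is hypothetical: MiniXmlWriter, leaf and tag are made-up names, and it has no tight mode, indentation or attributes. It shows three things only: how scope(exit) keeps tags balanced when content code throws, how non-string values go through to!string, and how a null string becomes a short tag.

```d
import std.array : appender, Appender;
import std.conv : to;

/// Hypothetical, stripped-down writer (not the proposed xmlWriter API).
struct MiniXmlWriter
{
    Appender!string sink;

    // Escape the five characters XML reserves.
    private void writeEscaped(string s)
    {
        foreach (char c; s)
        {
            switch (c)
            {
                case '<':  sink.put("&lt;");   break;
                case '>':  sink.put("&gt;");   break;
                case '&':  sink.put("&amp;");  break;
                case '"':  sink.put("&quot;"); break;
                case '\'': sink.put("&apos;"); break;
                default:   sink.put(c);
            }
        }
    }

    /// <name>value</name>, or the short form <name/> when value is a null string.
    void leaf(T)(string name, T value)
    {
        static if (is(T == string))
            string s = value;
        else
            string s = to!string(value);   // automatic conversion of non-strings

        sink.put('<');
        sink.put(name);
        if (s is null)
        {
            sink.put("/>");
            return;
        }
        sink.put('>');
        writeEscaped(s);
        sink.put("</");
        sink.put(name);
        sink.put('>');
    }

    /// <name>...</name>; scope(exit) emits the closing tag on normal exit and on throw.
    void tag(string name, scope void delegate() content)
    {
        sink.put('<');
        sink.put(name);
        sink.put('>');
        scope (exit)
        {
            sink.put("</");
            sink.put(name);
            sink.put('>');
        }
        content();
    }
}

void main()
{
    auto w = MiniXmlWriter(appender!string());
    string none = null;
    w.tag("book", {
        w.leaf("title", "Pasja C++ & more");
        w.leaf("middle", none);
        w.leaf("year", 1999);
    });
    assert(w.sink.data ==
        "<book><title>Pasja C++ &amp; more</title><middle/><year>1999</year></book>");
}
```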
Re: buffered input
Andrei Alexandrescu wrote:
>> Also: could a (truly) circular buffer help solve the above copy problem, concretely?
>
> Not if you want infinite lookahead, which I think is what any modern buffering system should offer.

Truly circular, probably not. A wrap-around slice, though (a circular view of length at most underlying.length), does offer that and solves the copy problem in style.

-- Tomek
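Here is a rough sketch of what such a wrap-around slice could look like over a fixed backing array. WrapSlice is a hypothetical name. Indexing is taken modulo the buffer length, so the logical front can straddle the physical end of the buffer without any data being moved.

```d
import std.algorithm.comparison : equal;

/// Hypothetical circular view: `length` elements starting at `start`,
/// wrapping past the end of `buf` back to index 0. No data is moved.
struct WrapSlice(T)
{
    T[] buf;
    size_t start;
    size_t length;   // at most buf.length

    ref T opIndex(size_t i)
    {
        assert(i < length);
        return buf[(start + i) % buf.length];
    }

    // Input-range primitives so std.algorithm can consume the view.
    bool empty() const { return length == 0; }
    ref T front() { return this[0]; }
    void popFront()
    {
        start = (start + 1) % buf.length;
        --length;
    }
}

void main()
{
    // Physical layout: the logical sequence 1..6 starts at index 3 and wraps.
    int[] buf = [5, 6, 0, 1, 2, 3, 4];
    auto view = WrapSlice!int(buf, 3, 6);
    assert(view[0] == 1 && view[5] == 6);
    assert(view.equal([1, 2, 3, 4, 5, 6]));
}
```

The trade-off is that such a view is not a built-in T[]. It therefore cannot be handed to code that expects one contiguous slice.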
Re: Writing XML
Rainer Schuetze wrote:
> This looks nice and compact. Using opDispatch to specify the tag (I guess that is what you are using to create a tag "book" by calling xml.book()) feels like misusing opDispatch, though. Does it add readability in contrast to passing the tag as a string to some function? How do you write a tag named "tight"? Or a tag calculated at runtime?

xml.tag("tight", attributes..., { make content });

That's the base implementation; opDispatch is just syntactic sugar over it.

> Something more conventional would be
> xml.tag("book", attr("year", book.year), { ...
> but I'm not sure that pairing the attribute name and value adds readability or is mere noise.

Passing the name and value without a wrapper tuple is just sugar. Some sort of structure representing an attribute becomes inevitable once we get to namespaces. In the end it should accept any range of (namespace-)name-value tuples as attributes.

-- Tomek
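The following hypothetical sketch makes the "opDispatch is just syntactic sugar" point concrete. TagWriter and its members are made-up names, and escaping is omitted. The base `tag` takes the name as an ordinary runtime string, so names that clash with members or are computed at run time still work. opDispatch merely forwards xml.book(...) to tag("book", ...).

```d
import std.array : appender, Appender;
import std.conv : to;

/// Hypothetical writer: `tag` is the real implementation, opDispatch is sugar.
struct TagWriter
{
    Appender!string sink;

    /// Base form: the tag name is a runtime string.
    void tag(F)(string name, F content)
    {
        sink.put("<"); sink.put(name); sink.put(">");
        scope (exit) { sink.put("</"); sink.put(name); sink.put(">"); }
        content();
    }

    /// Same, with one attribute whose value goes through to!string.
    void tag(V, F)(string name, string attr, V value, F content)
    {
        sink.put("<"); sink.put(name); sink.put(" "); sink.put(attr);
        sink.put("=\""); sink.put(to!string(value)); sink.put("\">");
        scope (exit) { sink.put("</"); sink.put(name); sink.put(">"); }
        content();
    }

    /// xml.book(args...) is rewritten by the compiler to xml.opDispatch!"book"(args...).
    void opDispatch(string name, Args...)(Args args)
    {
        tag(name, args);
    }
}

void main()
{
    auto xml = TagWriter(appender!string());
    xml.book("year", 1999, {
        xml.title({ xml.sink.put("Pasja C++"); });
    });
    string computed = "tight";   // e.g. a name only known at run time
    xml.tag(computed, {});
    assert(xml.sink.data ==
        `<book year="1999"><title>Pasja C++</title></book><tight></tight>`);
}
```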
std.concurrency immutable classes...
... doesn't work.

class C {}
thisTid.send(new immutable(C)());
receive((immutable C) { writeln("got it!"); });

This throws:
core.exception.AssertError@/usr/include/d/dmd/phobos/std/variant.d(285): immutable(C)

And when I go for Rebindable, I get "Aliases to mutable thread-local data not allowed.". Is there anything I can do?

Overall, I think this is another reason D badly needs native tail const. Polymorphic classes become close to second-class citizens as soon as const enters the picture. :(

-- Tomek
Re: buffered input
Andrei Alexandrescu wrote:
> I hereby suggest we define "buffered input range of T" as any range R that satisfies the following conditions:
>
> 1. R is an input range of T[]
>
> 2. R defines a primitive shiftFront(size_t n). The semantics of the primitive is that, if r.front.length >= n, then shiftFront(n) discards the first n elements in r.front. Subsequently r.front will return a slice of the remaining elements.
>
> 3. R defines a primitive appendToFront(size_t n). Semantics: adds at most n more elements from the underlying stream and makes them available in addition to whatever was in front. For example if r.front.length was 1024, after the call r.appendToFront(512) will have r.front have length 1536 of which the first 1024 will be the old front and the rest will be newly-read elements (assuming that the stream had enough data). If n = 0, this instructs the stream to add any number of elements at its own discretion.

I don't see a clear need for the two to be separate. Could they fold into popFront(n, m), meaning shiftFront(n); appendToFront(m)? Nullary popFront() would discard everything and load any number of elements it pleases.

> This is it. I like many things about this design, although I still fear some fatal flaw may be found with it.
>
> With these primitives a lot of good code operating on buffered streams can be written efficiently. The range is allowed to reuse data in its buffers (unless that would contradict language invariants, e.g. if T is invariant), so if client code wants to stash away parts of the input, it needs to make a copy.

Some users would benefit if they could just pass in a buffer and say "fill 'er up".

> One great thing is that buffered ranges as defined above play very well with both ranges and built-in arrays - two quintessential parts of D. I look at this and say, "this all makes sense." For example the design could be generalized to operate on some random-access range other than the built-in array, but then I'm thinking, unless some advantage comes about, why not give T[] a little special status? Probably everyone thinks of "contiguous memory" when thinking "buffers", so here generalization may be excessive (albeit meaningful).

Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length.

-- Tomek
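A toy implementation may help in judging the proposal. The sketch below is hypothetical. BufferedInput is a made-up name, and the "stream" is just an in-memory array. The compaction strategy, sliding the live front to the start of the buffer, is one possible choice among several. It implements the three primitives as quoted:

* front returns a slice into an internal buffer that is reused;
* shiftFront(n) drops elements;
* appendToFront(n) reads at most n more, growing the buffer only when the retained front plus the request does not fit.

```d
import std.algorithm.comparison : min;

/// Hypothetical buffered input range of T with the three proposed primitives.
/// The "stream" is an in-memory array; a real one would read a file or socket.
struct BufferedInput(T)
{
    private T[] source;    // unread part of the underlying stream
    private T[] buf;       // internal storage, reused between calls
    private size_t lo, hi; // front is buf[lo .. hi]

    this(T[] source, size_t capacity)
    {
        this.source = source;
        buf = new T[capacity];
        appendToFront(0);
    }

    // 1. An input range of T[]; front aliases the internal buffer, so callers
    //    must copy anything they want to keep.
    T[] front() { return buf[lo .. hi]; }
    bool empty() const { return lo == hi && source.length == 0; }
    void popFront() { lo = hi; appendToFront(0); }

    // 2. Discard the first n elements of front (requires front.length >= n).
    void shiftFront(size_t n)
    {
        assert(n <= hi - lo);
        lo += n;
    }

    // 3. Read at most n more elements and append them to front;
    //    n == 0 means "as many as fit in the current buffer".
    void appendToFront(size_t n)
    {
        if (n == 0)
            n = buf.length - (hi - lo);
        if (hi + n > buf.length)
        {
            // Slide the live elements to the start (a forward copy is safe
            // because the destination precedes the source).
            foreach (i; 0 .. hi - lo)
                buf[i] = buf[lo + i];
            hi -= lo;
            lo = 0;
        }
        if (hi + n > buf.length)
            buf.length = hi + n; // still too small: grow (may reallocate)
        immutable k = min(n, source.length);
        buf[hi .. hi + k] = source[0 .. k];
        source = source[k .. $];
        hi += k;
    }
}

void main()
{
    auto r = BufferedInput!char("hello, buffered world".dup, 8);
    assert(r.front == "hello, b");
    r.shiftFront(7);          // drop "hello, "
    assert(r.front == "b");
    r.appendToFront(7);       // compacts, then reads 7 more
    assert(r.front == "buffered");
    r.popFront();             // discard all, refill
    assert(r.front == " world");
    r.popFront();
    assert(r.empty);
}
```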
Re: buffered input
Tomek Sowiński wrote:
> Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length.

I meant: when n + front.length > buf.length.

-- Tomek
Re: buffered input
Andrei Alexandrescu wrote:
> Transparent buffering sounds sensible but in fact it robs you of important capabilities. It essentially forces you to use grammars with lookahead 1 for all input operations. Being able to peek forward into the stream without committing to read from it allows you to e.g. do operations like "does this stream start with a specific word" etc. As soon

Broken sentence?
Re: buffered input
Andrei Alexandrescu wrote:
>> I don't see a clear need for the two to be separate. Could they fold into popFront(n, m), meaning shiftFront(n); appendToFront(m)? Nullary popFront() discards all and loads any number it pleases.
>
> I think combining the two into one hurts usability, as often you want to do one without the other.

OK, but if you go this way, what would popFront() do?

>> Some users would benefit if they could just pass in a buffer and say "fill 'er up".
>
> Correct. That observation applies to unbuffered input as well.

Right.

>> Contiguous, yes. But I'd rather see front() exposing, say, a circular buffer so that appendToFront(n) reallocates only when n > buf.length.
>
> I think circularity is an implementation detail that is poor as a client-side abstraction.

I fear efficiency will get abstracted away. Say this is my internal buffer (the pipes mark the front() slice):

[ooo|oo|oo]

Now I call appendToFront(3). How do you expose the expected front() without moving data?

-- Tomek
Re: buffered input
Jean Crystof wrote:
> I find this discussion interesting. There's one idea for an application I'd like to try at some point. Basically a Facebook chat thingie, but with richer gaming features. The expected audience will be 10 - 100K simultaneous clients connecting to a single server. Not sure if DOM or SAX will be better. After seeing Tango's XML benchmarks I was convinced that the implementation platform will be D1/Tango, but now it looks like Phobos is also getting there, probably even outperforming Tango by a clear margin.

Thanks for having faith ;-)

> Since even looking at Tango's documentation has intellectual property problems and likely causes taint, I could make an independent benchmark comparing the two and their interfaces later. But I probably need to avoid going into too much detail, otherwise the Phobos developers wouldn't be able to read it without changing their license.

That would be helpful.

> From what I've read so far, the proposed design looks very much like what Tango has now in their I/O framework. But probably Phobos's TLS default and immutable strings improve multithreaded performance even more.

Well, immutability doesn't help much, because a buffer must be written to.

Speaking of multithreading, I was thinking of an implementation where an internal thread does the I/O. It loads data ahead of the current front() slice, as much as the internal buffer can hold. The motivation is to overlap content processing with I/O so that less time is spent in total. There is some interaction overhead, though: locking, and syncing caches so that all cores see the same buffer.

-- Tomek
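std.concurrency makes it cheap to try out the overlapping idea. This is a simplified, hypothetical sketch and not the shared-buffer design described in the post. Instead of loading data ahead of a shared front() slice, a reader thread hands finished chunks to the owner as messages. That sidesteps the locking and cache-sync questions at the cost of one allocation per chunk. The `reader` function stands in for real file or socket I/O.

```d
import std.concurrency : ownerTid, receive, send, spawn;
import std.exception : assumeUnique;

// Stand-in for the internal I/O thread; a real one would read a file or socket.
void reader(int chunks)
{
    foreach (i; 0 .. chunks)
    {
        auto chunk = new ubyte[](4);
        chunk[] = cast(ubyte) i;
        // Ownership moves to the consumer; this thread never touches chunk again.
        ownerTid.send(assumeUnique(chunk));
    }
    ownerTid.send(true); // end-of-stream marker
}

void main()
{
    spawn(&reader, 3);

    // While the owner processes one chunk, the reader can already be producing the next.
    size_t total;
    bool done;
    while (!done)
        receive(
            (immutable(ubyte)[] chunk) { total += chunk.length; },
            (bool eof) { done = eof; }
        );
    assert(total == 12);
}
```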
Re: buffered input
Andrei Alexandrescu wrote:
>> I fear efficiency will get abstracted away. Say this is my internal buffer (the pipes mark the front() slice):
>> [ooo|oo|oo]
>> Now I call appendToFront(3). How do you expose the expected front() without moving data?
>
> You do end up moving data, but proportionally little if the buffer is large enough.

It still matters for frequent big munches. I'd like a minimum-memory option, if that's necessary.

-- Tomek
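Here is a rough way to quantify "proportionally little". It assumes the buffer is compacted by sliding the f retained elements of front to the start and is then refilled up to its capacity B:

```latex
\text{elements copied per element read} \;=\; \frac{f}{B - f},
\qquad
B = 65536,\; f = 1024 \;\Rightarrow\; \frac{1024}{64512} = \frac{1}{63} \approx 1.6\%
```

The ratio stays small only while f is a small fraction of B. As f approaches B, which is what frequent big munches relative to the buffer size amount to, the ratio grows without bound.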
High performance XML parser
I am now intensively gathering information on how to go about creating a high-performance parser, as it quickly became clear that my old one won't deliver. And if anything is clear, it is that memory is the key.

One way is the slicing approach mentioned on this NG, notably used by RapidXML. I already contacted Marcin (the author) to make sure that using solutions inspired by his lib is OK with him; it is. But I don't think I'll go this way. One reason is, surprisingly, performance. RapidXML cannot start parsing until the entire document is loaded and ready as a random-access string. Then it's blazingly fast, but the time for I/O has already elapsed. Besides, as Marcin himself said, we need a 100% W3C-compliant implementation, and RapidXML isn't one.

I think a much more fertile approach is to operate on a forward range, perhaps assuming buffered input. That way I can start parsing as soon as the first buffer gets filled. Not to mention that the end result will use much less memory. Much of an XML data stream is indents, spaces, and markup; there's no reason to copy all of it into memory.

To sum up, I believe memory usage and overlapping I/O latency with parsing are pivotal. Please comment on this.

-- Tomek
Re: David Simcha's std.parallelism
dsimcha wrote:
> I could move it over to github, though I'll wait to do that until I get a little more comfortable with Git. I had never used Git before Phobos switched to it. In the meantime, to remind, the code is at:
> http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/std_parallelism.d
> The docs are at:
> http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html

Please run the docs through a spell-checker; there are a few typos:
asyncBuf() - "for ecample"
stop() - "waitied"
lazyMap() - "Parameters;"

But I think it's good overall. These primitives are in demand.

-- Tomek
Re: High performance XML parser
Michel Fortin wrote:
> I agree it's important, especially when receiving XML over the network, but I also think it's important to be able to support slicing. Imagine all the memory you could save by just making slices of a memory-mapped file.
>
> The difficulty is to support both models: the input range model which requires copying the strings and the slicing model where you're just taking slices of a string.

These are valid concerns. Yet the overwhelming majority of XML documents come from the hard drive or the network, and those are the places we need to drill. I fear that trying to cover every remote use case would render the library incomprehensible.

-- Tomek
Re: High performance XML parser
Steven Schveighoffer wrote:
> Here is how I would approach it (without doing any research).
>
> First, we need a buffered I/O system where you can easily access and manipulate the buffer. I have proposed one a few months ago in this NG.
>
> Second, I'd implement the XML lib as a range where front() gives you an XMLNode. If the XMLNode is an element, it will have eager access to the element tag, and lazy access to the attributes and the sub-nodes. Each XMLNode will provide a forward range for the child nodes.
>
> Thus you can skip whole elements in the stream by popFront'ing a range, and dive deeper via accessing the nodes of the range.
>
> I'm unsure how well this will work, or if you can accomplish all of it without reallocation (in particular, you may need to store the element information, maybe via a specialized member function?).

Heh, yesterday when I couldn't sleep I was sketching the design. I converged on pretty much the same concept, so your comment is reassuring :).

The design I'm thinking of is that the node iterator will own a buffer. One consequence is that the fields of the current node will point into the buffer, akin to foreach(line; File.byLine), so in order to keep the input the user will have to dup it (or process the node in-place). Because new nodes will be written over the same piece of memory, an important trait of the design emerges: cache intensity.

Because of XML namespaces, I think the buffer must contain the current node plus all its parents. Namespaces are the technical reason, but having access to the path all the way to the root node is valuable regardless. This suggests mark-release memory management. The buffer will have to be long enough to fit the deepest tag sequence: theoretically unbounded, but not that much in practice. As I said, the buffer will be owned by the iterator, so deterministic deallocation is probably possible when the processing is done.

One drawback is that you won't know you're dealing with a well-formed DOM until the closing tag comes. If it doesn't come, the parser will of course throw, but the malformed DOM may already have been digested. So providing some rollback mechanism is up to the user.

-- Tomek
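Here is a minimal sketch of the mark-release idea. It assumes node data can be reduced to element names; attributes and namespace declarations would be stored the same way. NodeStack and its members are hypothetical. Entering an element records a mark and appends. Leaving it truncates back to the mark and calls druntime's assumeSafeAppend, so the next sibling reuses the same memory. Memory use is therefore bounded by the depth of the deepest tag sequence rather than by document size.

```d
/// Hypothetical node buffer: the current element and all its ancestors,
/// stored back to back in one array used as a stack.
struct NodeStack
{
    private char[] text;     // names of all open elements, concatenated
    private size_t[] marks;  // text.length at the moment each element was entered

    void enter(const(char)[] name)
    {
        marks ~= text.length;  // mark
        text ~= name;
    }

    void leave()
    {
        text = text[0 .. marks[$ - 1]];  // release everything since the mark
        marks = marks[0 .. $ - 1];
        // Tell the runtime the cut-off tails are dead, so later appends
        // overwrite them in place instead of reallocating.
        text.assumeSafeAppend();
        marks.assumeSafeAppend();
    }

    size_t depth() const { return marks.length; }

    /// Name of the i-th open element (0 is the root).
    const(char)[] name(size_t i) const
    {
        immutable end = i + 1 < marks.length ? marks[i + 1] : text.length;
        return text[marks[i] .. end];
    }
}

void main()
{
    NodeStack s;
    s.enter("a"); s.enter("b"); s.enter("c2");  // positioned inside <c2>
    assert(s.depth == 3 && s.name(0) == "a" && s.name(2) == "c2");

    auto p = s.text.ptr;
    s.leave();          // </c2>
    s.enter("c3");      // a later sibling lands in the same bytes
    assert(s.depth == 3 && s.name(1) == "b" && s.name(2) == "c3");
    assert(s.text.ptr == p);
}
```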
Re: High performance XML parser
Steven Schveighoffer wrote:
> [quoted text identical to the previous post; snipped]

Oh, and the direction of iteration (deeper/farther) will of course be controllable in the fashion you presented.

-- Tomek
Re: A monitor for every object
Steven Schveighoffer wrote:
> D's monitors are lazily created, so there should be no issue with resource allocation. If you don't ever lock an object instance, it's not going to consume any resources. Most of the time the extra word isn't noticed because the memory size of a class is usually not exactly a power of 2.

Except when you put 'em in an array. Could happen.

> D also allows you to replace its monitor with a custom monitor object (i.e. core.sync.Mutex) so you can have more control over the mutex, assign the same mutex to multiple objects, use conditions, etc. It's much more flexible than Java or C# IMO.

I didn't know that, thanks. Where is it documented?

-- Tomek
Re: A monitor for every object
Tomek Sowiński wrote:
>> D's monitors are lazily created, so there should be no issue with resource allocation. If you don't ever lock an object instance, it's not going to consume any resources. Most of the time the extra word isn't noticed because the memory size of a class is usually not exactly a power of 2.
>
> Except when you put 'em in an array. Could happen.

Sorry, for some reason I thought the mutex was on the stack.

-- Tomek
Re: std.xml should just go
Andrei Alexandrescu wrote:
>> Is anyone tasked with a replacement yet? I had to write an XML parser at some point. It's plenty of work bringing it up to industrial quality, so I'd have to know that before I dive in.
>
> Nobody that I know of. If you want to discuss design here while working on it, that would be great.

Alright, I'm game. I'll assemble something discussable.

> I could think of a few high-level requirements:

My requirements are similar (if I don't comment below, then I agree).

> * works with input ranges so we can plug it in with any source
> * works with all UTF widths (statically selectable)
> * avoids where possible memory allocation (perhaps by offering incremental access a la joiner())

What do you mean by incremental access? A lazy range? That's obvious for the lexer, but on a higher level? I'm not sure I can start traversing the DOM before the closing tag comes (if at all)... A lazy range of tags defined in the global scope seems possible, though.

> * avoids often-called delegates in favor of alias functions

Which use of delegates are you talking about?

> * is familiar in concept to people who've used today's successful XML libraries

-- Tomek
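For readers wondering what "alias functions" buy over delegates, here is a minimal comparison. applyDelegate and applyAlias are made-up names. With a delegate, each call typically goes through a context pointer and an indirect call. With an alias template parameter, the function is bound at compile time, so each instantiation calls it directly and the compiler is free to inline it. std.algorithm uses the same mechanism for the functions passed to map and filter.

```d
// Delegate version: the callee is only known at run time.
int applyDelegate(const(int)[] xs, scope int delegate(int) f)
{
    int acc;
    foreach (x; xs)
        acc += f(x);
    return acc;
}

// Alias version: `f` is a compile-time parameter, so every instantiation
// calls it directly and it can be inlined.
int applyAlias(alias f)(const(int)[] xs)
{
    int acc;
    foreach (x; xs)
        acc += f(x);
    return acc;
}

void main()
{
    int[] xs = [1, 2, 3];
    int k = 10;  // both versions can still use local state
    assert(applyDelegate(xs, (int x) => x * k) == 60);
    assert(applyAlias!(x => x * k)(xs) == 60);
}
```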
Re: std.xml should just go
Jonathan M Davis wrote:
> I think that at least a couple of people have said that they have the beginnings of a replacement, but I don't believe that anyone has stepped up to say that they'll actually complete and propose a module for inclusion in Phobos.

Wimps ;-)

> So, std.xml is still very much up in the air, and Tango has set a very high bar with regards to speed. And while we may not be able to match Tango for speed - especially at first - we'd definitely like to have an xml solution that's close. And that's not necessarily going to be easy - especially since we're inevitably going to want a range-based solution. And while ranges can be quite efficient, it can also be easy to make them inefficient if you're not careful.

Speaking of Tango, may I look at it? I remember that beef over the first datetime, and it still gives me shivers...

-- Tomek
Re: std.xml should just go
Daniel Gibson wrote:
> They can claim whatever they want... if Tomek says he only looked at the documentation (for an idea of how a good interface for an XML lib may look) they can hardly prove anything.

One remark: I haven't even looked at the docs. That's why I was asking "may I look".

-- Tomek
Re: std.xml should just go
spir spir wrote:
>> You probably shouldn't look at the source. I dunno about the interface (documentation) - it's certainly not illegal to take inspiration from it, but maybe then people will again claim that source was stolen... but when you claim that you haven't looked at the source it may be ok...
>> Maybe a clean-room approach is possible: Somebody else looks at the source and documents what it does and how it does that (without copying anything) and you could use that documentation for your own code. If you don't want to clone it but have questions about how they did something specific you could just ask here and (hopefully) someone looks it up and explains it to you.
>
> Mamma mia! In what world are we supposed to live!?

My thoughts exactly. As soon as Jonathan mentioned Tango's XML, I got knee-jerk paranoid and asked whether it is even legal to read about it, just to stay clear. I only hope that having heard about it is legal.

-- Tomek
Max length of a LOC: poll results (Was: On 80 columns...)
Tomek Sowiński wrote:
> Actually that's a splendid idea. Let's take it easy. Regardless of that silly beef I'm really curious what distribution will emerge.
>
> What is your preferred *maximum* length for a line of D code? (please reply with a number only)

Alright, I'm wrapping up this toy study. Two things before the numbers come:
- A few respondents gave 2 numbers, one reasonable, the other "if I really have to". I took the latter (larger) number, because I was after the maximum length, something usable as a setting for a repository hook.
- 2 respondents said "no limit". I excluded them from the computations, even though it's a valid answer. 1 respondent answered "1 mole", which I also excluded as a 22-order-of-magnitude outlier.

lengths = c(80, 80, 110, 120, 80, 80, 100, 100, 120, 110, 90)
summary(lengths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  80.00   80.00  100.00   97.27  110.00  120.00
sd(lengths) # standard deviation
[1] 16.18080
quantile(lengths, c(.1, .25, .5, .75, .9))
10% 25% 50% 75% 90%
 80  80 100 110 120
library(moments)
skewness(lengths) # take with a grain of salt, little data
[1] 0.1645005
length(lengths) # count
[1] 11

-- Tomek
Re: Max length of a LOC: poll results (Was: On 80 columns...)
Tomek Sowiński wrote:
> Alright, I'm wrapping up this toy study. Two things before the numbers come:
> - A few respondents gave 2 numbers, one reasonable, the other "if I really have to". I took the latter (larger) number, because I was after the maximum length, something usable as a setting for a repository hook.
> - 2 respondents said "no limit". I excluded them from the computations, even though it's a valid answer. 1 respondent answered "1 mole", which I also excluded as a 22-order-of-magnitude outlier.

Steven came in late with his data point, so once again:

lengths = c(80, 80, 110, 120, 80, 80, 100, 100, 120, 110, 90, 80)
summary(lengths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  80.00   80.00   95.00   95.83  110.00  120.00
sd(lengths) # standard deviation
[1] 16.21354
quantile(lengths, c(.1, .25, .5, .75, .9))
10% 25% 50% 75% 90%
 80  80  95 110 119
skewness(lengths) # take with a grain of salt, little data
[1] 0.3121957
length(lengths) # count
[1] 12

-- Tomek
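For anyone who wants to re-derive the numbers without R, here is a small D sketch. For the final 12 data points it recomputes the mean, R's default type-7 quantile, and the sample standard deviation (dividing by n - 1, as R's sd does). mean, sampleSd and quantile7 are hypothetical helper names, and skewness is not covered.

```d
import std.algorithm.iteration : sum;
import std.algorithm.sorting : sort;
import std.math : abs, floor, sqrt;

double mean(const(double)[] xs) { return xs.sum / xs.length; }

// Sample standard deviation (divides by n - 1, like R's sd()).
double sampleSd(const(double)[] xs)
{
    immutable m = mean(xs);
    double acc = 0;
    foreach (x; xs)
        acc += (x - m) * (x - m);
    return sqrt(acc / (xs.length - 1));
}

// R's default (type 7) quantile: linear interpolation between order statistics.
double quantile7(const(double)[] sorted, double p)
{
    immutable h = (sorted.length - 1) * p;
    immutable lo = cast(size_t) floor(h);
    immutable hi = lo + 1 < sorted.length ? lo + 1 : lo;
    return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

void main()
{
    double[] lengths = [80, 80, 110, 120, 80, 80, 100, 100, 120, 110, 90, 80];
    auto s = lengths.dup;
    sort(s);

    assert(abs(mean(lengths) - 95.8333) < 1e-3);
    assert(quantile7(s, 0.50) == 95);
    assert(quantile7(s, 0.25) == 80 && quantile7(s, 0.75) == 110);
    assert(abs(quantile7(s, 0.90) - 119) < 1e-9);
    assert(abs(sampleSd(lengths) - 16.21354) < 1e-4);
}
```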
Re: Image Resizing by Seam Carving (Was: On 80 columns should (not) be enough foreveryone)
Nick Sabalausky wrote: Now, what we need is the audio-equivalent of this: http://www.youtube.com/watch?v=6NcIJXTlugc That's really cool, and seems so obvious in retrospect. There's a D implementation: http://dsource.org/projects/seamzgood but it's abandoned. -- Tomek
Re: d-programming-language.org
Andrei Alexandrescu wrote:
> In agreement with Walter, I removed the Digitalmars reference. The message is simple - D has long become an entity independent from the company that created it. (However, this makes the page header look different and probably less visually appealing.)

The "D" in the header should be red. It has become a bit of a community crest, and it fits the color scheme like a glove.

-- Tomek
(Was: On 80 columns should (not) be enough for everyone)
Andrej Mitrovic wrote:
> If you really want to set up a column limit that *everyone* has to abide by, then make a poll to see what everyone can agree on.

Actually that's a splendid idea. Let's take it easy. Regardless of that silly beef, I'm really curious what distribution will emerge.

What is your preferred *maximum* length for a line of D code? (please reply with a number only)

-- Tomek
Re: (Was: On 80 columns should (not) be enough for everyone)
Tomek Sowiński wrote:
> What is your preferred *maximum* length for a line of D code? (please reply with a number only)

120.

-- Tomek
Re: On 80 columns should (not) be enough for everyone
Sean Kelly wrote:
> Print text doesn't have indentation levels though. Assuming a 4 character indent, the smallest indentation level for code in a D member function is 8 characters. Add a nested conditional and code is starting 16 characters in, which when wrapped at 80 characters begins to look like a newspaper column. I wrap all my comments at 79 characters, but allow code to spill as far as 110 (which is the number of columns on an 8.5x11 piece of paper in landscape mode).

Yeah. Counted without indents, 90 characters would probably suffice, but with them it takes at least 120 so that nested code doesn't get stifled. And I program with a proportional font, which is far more readable than a monospace one.

-- Tomek
Re: (Was: On 80 columns should (not) be enough for everyone)
Walter Bright wrote:
>> What is your preferred *maximum* length for a line of D code? (please reply with a number only)
>
> 6.022e+23

That's a whole mole of code! ;-)

-- Tomek
Re: General unicode category
spir spir wrote:
> DUnicode has such functionality:
> https://bitbucket.org/stephan/dunicode/src
> Look inside unicodedata.d, search for "general category".

Thanks. Any word on moving some of it into Phobos? It's jarring to see a Unicode-compliant language offer so few tools for working with the standard.

-- Tomek
Re: Decision on container design
Michel Fortin wrote:
>> Is there anything implementation-specific in the outer struct that provides ref semantics to Impl? If not, Container could be generic, parametrized by Impl type.
>
> You could provide an implementation-specific version of some functions as an optimization. For instance there is no need to create the Impl when asking for the length; if the pointer is null, length is zero. Typically, const functions can be implemented in the outward container with a shortcut checking for null.

I think the reference struct can still be orthogonal to the container.

import std.range : hasLength;

struct Ref(Impl) {
    private Impl* _impl;

    // Allocate the implementation lazily, on first use.
    ref Impl impl() @property {
        if (!_impl) _impl = new Impl;
        return *_impl;
    }

    // Shortcut: no allocation just to learn that an empty container is empty.
    static if (hasLength!Impl) {
        auto length() @property {
            return _impl ? _impl.length : 0;
        }
    }

    alias impl this;
}

Reusability lightens the burden on the container's author (less fuss for user implementations). It also somewhat standardizes containers, because they must all exhibit a certain API with certain semantics to fit into Ref. The downside is that the syntax for the most common case (ref semantics) is a little noisier than for value-like behavior.

-- Tomek
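Here is a small self-contained sketch of the wrapper in action. IntBag is a hypothetical value-semantics container made up for the example, and Ref repeats the struct from the post with a const length added. It shows two things: asking an empty Ref for its length allocates nothing, and copies made after the first allocation share one Impl.

```d
import std.range.primitives : hasLength;

struct Ref(Impl)
{
    private Impl* _impl;

    // Allocate the implementation lazily, on first use.
    ref Impl impl() @property
    {
        if (!_impl) _impl = new Impl;
        return *_impl;
    }

    // Answer length without allocating when nothing has been created yet.
    static if (hasLength!Impl)
    {
        size_t length() @property const { return _impl ? _impl.length : 0; }
    }

    alias impl this;
}

// Hypothetical value-semantics container used only for illustration.
struct IntBag
{
    int[] items;
    size_t length() @property const { return items.length; }
    void add(int x) { items ~= x; }
}

void main()
{
    Ref!IntBag a;
    assert(a.length == 0 && a._impl is null);  // no allocation yet

    a.add(1);           // forwarded through alias this; allocates IntBag
    auto b = a;         // copies the pointer, so b and a share state
    b.add(2);
    assert(a.length == 2);
}
```

Lazy allocation has one caveat. A copy taken before the first allocation holds its own null pointer, so it will not share state with the original.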
Re: Decision on container design
Michel Fortin wrote:
> As for the case of Appender... personally in the case above I'd be tempted to use Appender.Impl directly (value semantics) and make fill take a 'ref'. There's no point in having an extra heap allocation, especially if you're calling test() in a loop or if there's a good chance fill() has nothing to append to it.

Or take an output range.

-- Tomek
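Here is a minimal sketch of the output-range variant, with `fill` as a hypothetical stand-in for the function under discussion. Taking any output range lets the caller decide where the data goes. It can be an Appender when growth is needed, or a preallocated slice when no heap allocation is wanted at all.

```d
import std.array : appender;
import std.range.primitives : isOutputRange, put;

// Hypothetical `fill`: writes into whatever output range the caller provides.
void fill(R)(ref R sink)
    if (isOutputRange!(R, int))
{
    foreach (i; 0 .. 3)
        put(sink, i);
}

void main()
{
    // Growable destination.
    auto app = appender!(int[])();
    fill(app);
    assert(app.data == [0, 1, 2]);

    // Fixed destination on the stack: put() overwrites and advances the slice.
    int[3] buf;
    int[] window = buf[];
    fill(window);
    assert(buf == [0, 1, 2]);
    assert(window.length == 0);
}
```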
Re: Decision on container design
bearophile wrote:
> This page:
> http://www.jroller.com/scolebourne/entry/the_next_big_jvm_language1
> A quotation:
>> 3) Everything is a monitor. In Java and the JVM, every object is a monitor, meaning that you can synchronize on any object. This is incredibly wasteful at the JVM level. Senior JVM guys have indicated large percentage improvements in JVM space and performance if we removed the requirement that every object can be synchronized on. (Instead, you would have specific classes like Java 5 Lock)
> I have read similar comments in various other places. What about creating a @nomonitor annotation, for D2 classes to not create a monitor for specific classes annotated with it? This may reduce some class overhead.

Better to just remove it; it's not used often. Besides, there are different kinds of locks, and one size doesn't fit all.

-- Tomek
Re: Suggestion: New D front page
Christopher Bergqvist wrote:
> Hi!
> I have been putting some free time into creating a design skeleton for a new http://d-programming-language.org front page:
> http://digitalpoetry.se/D%20website/D%20overview%20design.png
> My main concern is presenting newcomers with an inspiring and relevant first impression of D. I think there is lots to gain by having a more alive front page not based on Ddoc (the rest of the site could still be based on it).
> I have not attempted adding any visual style to the design myself since it's not one of my strengths. It should be made to fit better with the overall theme of d-programming-language.org (although IMO it's currently a bit too dark and foreboding).
> I must confess to being heavily inspired by http://ooc-lang.org and http://cobra-language.com.
> As creating this would take a significant time investment, I suggest that some more complex sections of the page could be released after the initial version. I have some background in web development but have been almost exclusively doing professional C++ games development during the last 4 years. I would not mind putting some more work into this but am also hopeful that some others in the D community desire to contribute.
> Constructive feedback with a minimum of bikeshedding is welcome. (Please avoid discussions about specific textual content for now, it's just placeholders).

Believe it or not, there was a time when the D page welcomed users with beautiful example code. As time went by, it got pushed out by quotes, current status, news, etc. Looking back, that code may have been why I didn't say "oh... um... NEXT!" and stayed with D :) I think we need to go back to the roots.

-- Tomek
Re: assert(object) fails to adhere to the principle of least surprise
Bernard Helyer wrote:
> If I do
>
>     if (object) { ... }
>
> What happens is fairly obvious, and is equivalent to
>
>     if (object !is null) { }
>
> However, if I do
>
>     auto object = new Object();
>     assert(object);
>
> What I expect to happen is
>
>     assert(object !is null);
>
> Just as in the above example. What happens however is the program seg faults. Why? Because it turns out what DMD turns it (silently) into is
>
>     object.checkInvariants(); // Whatever it's called.
>
> This is bad enough, however it gets pants-on-head stupid as *object is not checked for null*. I think the silent rewrite is bad design, but not checking for null is so stupid, so obvious to anyone who actually uses the language, I can't believe it's existed for so long. The fact that
>
>     assert(object);
>
> and
>
>     import std.exception;
>     enforce(object);
>
> do different things boggles my mind. One must write assert(object !is null); or assert(!!object); and every day it's like a giant stabbing pain. A stupid wrong-headed design that makes my experience with D _worse_. Just expose a method for checking the invariant explicitly, and don't do this silent rewrite bullshit. Any chance of getting a change of behaviour?
>
> FWIW, GDC doesn't do the rewrite, and SDC (the compiler I'm working on, github.com/bhelyer/sdc) won't either.

http://d.puremagic.com/issues/show_bug.cgi?id=796
Vote it up ;)

-- Tomek
Re: structs vs classes
Jim wrote: I'm only discussing the heap/stack difference. Classes with value semantics would be prone to the slicing problem. -- Tomek
Re: structs vs classes
Matthias Walter wrote: That is of course a difference, but not an argument. The reason is that you can decide whether you want to allocate a class on the stack: http://www.digitalmars.com/d/2.0/memory.html#stackclass AFAIR scope classes are to be banished from the language. There's emplace instead. http://digitalmars.com/d/2.0/phobos/std_conv.html#emplace -- Tomek
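A minimal sketch of the emplace route, assuming a current Phobos where std.conv.emplace has an overload that constructs a class object inside a caller-supplied void[] chunk. Point and sumOnStack are made-up names, and using a ulong buffer to get suitable alignment is an assumption of this sketch rather than anything from the thread.

```d
import std.conv : emplace;

// Hypothetical class used only for illustration.
class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

// Constructs a Point in stack memory instead of on the GC heap.
int sumOnStack(int x, int y)
{
    // Enough ulong words to hold one Point instance (assumed to give
    // sufficient alignment for the object's fields and vtable pointer).
    enum words = (__traits(classInstanceSize, Point) + ulong.sizeof - 1) / ulong.sizeof;
    ulong[words] buf;                      // raw storage on the stack
    Point p = emplace!Point(buf[], x, y);  // run the constructor in buf
    scope (exit) destroy(p);               // finalize before buf goes out of scope
    return p.x + p.y;
}

unittest
{
    assert(sumOnStack(3, 4) == 7);
}
```

The object must not outlive the function, since its memory is the local buffer; that is the same restriction scope classes had.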
Re: Suggestion: New D front page
Russel Winder wrote: I think the current page style looks fine, actually I like it and do not consider it dark and foreboding (*). This is not though a vote for not changing if there is something that is going to be more appealing to a wider range of programmers. (*) Or maybe I am just depressed and it fits with the sense of doom and despondency ;-) You're not depressed, just subconsciously keen on prolonging your eyesight ;-) Let's blend Chris' dynamic layout with David's toned color scheme, shall we? -- Tomek
Re: Nested function declarations
Tomek Sowiński wrote: What is the purpose of nested function declarations in D? Is it a good idea to just disallow them? 1. Helper functions don't clutter the namespace. 2. Nested functions can access the outer function's stack frame. OK, I just noticed you asked about declarations, not nested functions in general. They're useful for testing: unittest { import std.traits : ReturnType; int foo(); static assert (is(ReturnType!foo == int)); } -- Tomek
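To illustrate point 2 above with ordinary nested functions (not bodiless declarations), here is a small sketch; runningTotals and add are made-up names. The helper is invisible outside the function and updates the enclosing function's locals directly.

```d
// Returns the running sums of `values`, e.g. [1, 2, 3] -> [1, 3, 6].
int[] runningTotals(const int[] values)
{
    int total = 0;
    int[] result;

    // Nested helper: not visible at module scope, and it reads and
    // writes `total` and `result` from the enclosing stack frame.
    void add(int v)
    {
        total += v;
        result ~= total;
    }

    foreach (v; values)
        add(v);
    return result;
}

unittest
{
    assert(runningTotals([1, 2, 3]) == [1, 3, 6]);
}
```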
Re: How can you read and understand the source of *naryFun in functional.d?
Tom wrote: I have been learning D for some time. I come from a background of C, C# and Python. When I saw the ways to use std.algorithm's functions, I noticed that the input lambdas can be written as strings, somewhat like the Pythonic exec. I went to the source of this feature in functional.d (https://github.com/D-Programming-Language/phobos/blob/master/std/functional.d), the functions unaryFun and binaryFun. Is there a way I can read them and understand them easily? Or maybe I missed something? The standard library implementation must cater for a lot of corner cases. But the essence is this: template binaryFun(string expr) { auto binaryFun(T, U)(T a, U b) { return mixin(expr); } } unittest { assert (binaryFun!"a+b"(1,2) == 3); assert (binaryFun!"a-b"(1,2) == -1); } The magic happens at the mixin line. It takes any expression in string form and compiles it in the context of the function. Unlike the Pythonic exec, the string must be known at compile time. -- Tomek
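The same trick works for one parameter. A hedged sketch with made-up names (myUnaryFun, myBinaryFun) so it does not clash with the real std.functional templates, which handle many more cases:

```d
import std.algorithm : map;
import std.array : array;

// One parameter: `a` in the string refers to the function's parameter.
template myUnaryFun(string expr)
{
    auto myUnaryFun(T)(T a)
    {
        return mixin(expr);   // expr is compiled as an expression right here
    }
}

// Two parameters, as in the post above.
template myBinaryFun(string expr)
{
    auto myBinaryFun(T, U)(T a, U b)
    {
        return mixin(expr);
    }
}

unittest
{
    assert(myUnaryFun!"a * a"(7) == 49);
    assert(myBinaryFun!"a < b"(1, 2));
    // Phobos algorithms accept the same kind of string:
    assert([1, 2, 3].map!"a * 10".array == [10, 20, 30]);
}
```

Because the string is a template argument, a typo in it is reported as a compile error, not a run-time failure.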
General unicode category
How can I get the general unicode category (Lu, Nd, Pc, etc.) of a dchar? std.uni contains barely anything useful. -- Tomek
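For readers on a later Phobos, one possible answer as a hedged sketch: std.uni's unicode(name) looks up a code point set by name at run time, and a general-category abbreviation such as "Lu" is assumed to be an accepted name. generalCategory is a made-up helper that checks only five categories; a complete version would have to test all of them.

```d
import std.uni : unicode;

// Returns the two-letter general category of c, or "??" if c is in none
// of the few categories checked here. Not an exhaustive classifier.
string generalCategory(dchar c)
{
    static immutable names = ["Lu", "Ll", "Nd", "Pc", "Zs"];
    foreach (name; names)
    {
        auto set = unicode(name);   // run-time lookup of the code point set
        if (set[c])                 // membership test
            return name;
    }
    return "??";
}

unittest
{
    assert(generalCategory('A') == "Lu");
    assert(generalCategory('5') == "Nd");
}
```

Looking the sets up on every call is slow; a real implementation would cache them.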
Re: Decision on container design
Michel Fortin wrote: We already argued this over and over in the past. First, I totally acknowledge that C++ style containers have a problem: they make it easier to copy the content than to pass it by reference. On the other side of the spectrum, I think that class semantics makes it too easy to have null dereferences; it's easy to get lost when you have a container of containers. I have some experience with containers having class-style semantics: in Objective-C, I ended up creating a set of macro-like functions which I use to initialize containers whenever I use them, in case they are null. And I had to write more of these utility functions to handle a particular data structure of mine which is a dictionary of arrays of objects. In C++, I'd have declared this as a map<string, vector<Object> > and be done with it; no need for special care initializing each vector, so much easier than in Objective-C. I agree that defining structs to have reference semantics as you have done is complicated. But I like the lazy initialization, and we have a precedent for that with AAs (ideally, AAs would be a compatible container too). Can't we just use the GC instead of reference counting? It'd make things much easier. Here is an implementation: struct Container { struct Impl { ... } private Impl* _impl; ref Impl impl() @property { if (!_impl) _impl = new Impl; return *_impl; } alias impl this; } I also believe reference semantics are not to be used everywhere, even though they're good most of the time. I'd like to have a way to bypass them and get a value-semantic container. With the above, it's easy as long as you keep Container.Impl public: void main() { Container lazyHeapAllocatedContainer; Container.Impl stackAllocatedContainer; } class MyObject { Container.Impl listOfObjects; } Is there anything implementation-specific in the outer struct that provides ref semantics to Impl? If not, Container could be generic, parametrized by the Impl type.
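A minimal sketch of that generic wrapper, assuming the outer struct really needs nothing but a lazily allocated pointer. LazyRef and IntList are made-up names; the wrapper is the Container above with Impl turned into a template parameter.

```d
// Reference-semantic wrapper around any value-semantic Impl.
struct LazyRef(Impl)
{
    private Impl* _impl;

    // Allocates the Impl on first access, then hands out a reference to it.
    @property ref Impl impl()
    {
        if (_impl is null)
            _impl = new Impl;
        return *_impl;
    }

    alias impl this;   // forward member access to the Impl
}

// A trivial value-semantic "container" for demonstration.
struct IntList
{
    int[] items;
    void add(int x) { items ~= x; }
}

unittest
{
    LazyRef!IntList a;
    a.add(1);          // first use allocates
    auto b = a;        // copies now share one Impl
    b.add(2);
    assert(a.items == [1, 2]);

    IntList onStack;   // value semantics remain available directly
    onStack.add(3);
    assert(onStack.items == [3]);
}
```

The AA caveat carries over: copying a LazyRef before its first use copies a null pointer, so the two copies later allocate separate Impls.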
Overall, I think a value-like implementation in a reference-like wrapper is a clear-cut idiom, bringing order to otherwise messy struct-implemented ref semantics. Do you know of an existing collection library that exploits this idea? -- Tomek
Re: dlist for phobos
Andrei Alexandrescu wrote: ref returns should be guaranteed to never escape. "Should" meaning they're not guaranteed now? I'm curious in what scenarios they escape. -- Tomek
Re: dlist for phobos
Andrei Alexandrescu wrote: On 1/27/11 4:48 PM, Tomek Sowiński wrote: Andrei Alexandrescu wrote: ref returns should be guaranteed to never escape. "Should" meaning they're not guaranteed now? I'm curious in what scenarios they escape. Any function can take the address of a reference (either a ref parameter or the result of another function) and squirrel it away. Jeez.. I must've had some brain-warp that I didn't think of r.front :-) But is just banning taking addresses of ref parameters and return values going to solve the problem? Sounds deceptively simple... -- Tomek
Re: immutable
Trass3r wrote: But thank you for the answer, I have filed the bug. Rats, I've filed one too ;) http://d.puremagic.com/issues/show_bug.cgi?id=5492 We found it over a year ago :) http://d.puremagic.com/issues/show_bug.cgi?id=3534 -- Tomek
Re: Is D still alive?
Walter Bright wrote: bearophile wrote: Walter: The reason that took so long was that few people were using DbC effectively, so it was a low priority. I originally had high hopes that DbC would produce dramatic improvements in code quality, but the real world results were disappointing. After many years and many failed hopes, I think there is no silver bullet in programming, so maybe nothing is able to produce dramatic improvements in code quality. But even if this is true, some things are able to improve coding a bit, like unit testing, a semantically well-defined language, syntax coloring, quick compile-run cycles, OOP for certain kinds of programs, DbC, and so on. Each of these things improves the situation only a little, but such improvements pile up, and most programmers who have tried them don't want to go back to living without them. Unit testing has produced a dramatic improvement in coding. Yes, it's big. Funny that it's not really a technical change but a cultural one -- D just leaves even the most stone-age programmers no excuse not to test their code. -- Tomek
Re: Is D still alive?
Steven Schveighoffer wrote: Adam Ruppe and Piotr Szturmaj have recently been working on some database stuff. See the recent thread "Can your programming language do this?" I have ignored that thread (I sometimes just ignore threads because they start out uninteresting, or become uninteresting, and then I miss out on some good stuff!) I'll have to take a look, D2 really does need a DB interface -- badly. That and networking. I can help with the latter as I've done a bit of network development, but I don't know the current state of affairs (is somebody working on it already?) or whether Phobos needs another soul on board. I would say it is not ready for prime time yet. It has a way to go, but some have managed to build pretty impressive applications from it. So it would depend on your application. Personally, I think that even though D still has some things to be worked out, it's *still* far better than any of the other more mature languages. It all seems really good until you hit an issue that cannot be worked around -- like a compiler error or a misdesigned feature. I call these 'mercy' problems, because you are then at the complete mercy of someone else. If you have a deadline, or have a complete stoppage in work, you really have little choice but to move on to another language or abandon the project. Dcollections sat idle for about a year because of a problem like this. Yeah, ditto for QuantLibD. I just spent too much time on a test project trying to isolate dmd and phobos bugs to submit something meaningful to bugzilla and too little time coding. Not to mention that sometimes it was really hard to know what the language *should* do because of outdated documentation. But maybe the storm has passed and I should try serious work in D again? [snip] BTW, I plan to write a semi-professional project in D2 in the near future, but I'm 1) willing to take the risks, 2) not working to a deadline, and 3) not depending on this project for a living.
Sheer curiosity: what will the project be about? -- Tomek
Re: Showing unittest in documentation (Was Re: std.unittests [updated] for review)
Steven Schveighoffer wrote: BTW I consider this a very important topic. We have _plenty_ of examples that don't work and are not mechanically verifiable. The reasons range from minor typos to language changes to implementation limitations. Generally this is what they call documentation rot. This is terrible PR for the language. Changing ddoc to recognize documentation unittests would fix this matter once and forever. Last but not least, the separators for code samples are awful because no editor recognizes them for anything - they confuse the hell out of Emacs for one thing. This only makes sense if: 1. The unit test immediately follows the item being documented 2. The unit test *only* tests that item. The second one could be pretty annoying. Consider cases where several functions interact (I've seen this many times in Microsoft's documentation), and it makes sense to make one example that covers all of them. Having them 'testable' means creating several identical unit tests. One way to easily fix this is to allow an additional parameter to the comment: /** Example(Foo.foo(int), Foo.bar(int)): */ unittest { auto foo = new Foo; foo.foo(5); foo.bar(6); assert(foo.toString() == "bazunga!"); } The above means: copy the example to both Foo.foo(int) and Foo.bar(int). An alternative that is more verbose, but probably more understandable: /** Example: Covers Foo.foo(int) Covers Foo.bar(int) */ Of course, a lack of target just means it applies to the item just documented. Although it comes from good intentions, it's just... too much. The original idea is very compelling without add-ons. Often the interacting functions are members of the same class or at least the same module, so it's enough to place the unittest appropriately. To cover the remaining cases, an artificial declaration may be introduced. /// Uses of Foo.foo(int) and Foo.bar(int) struct foo_and_bar_examples; /// Example: unittest { ... } Both functions would simply link to the artificial symbol in their ddocs.
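For reference, later D releases adopted essentially the original idea: a unittest carrying a doc comment (here an empty ///) placed directly after a documented declaration is emitted by ddoc as that declaration's example, and it still runs under -unittest. The add function is a made-up example.

```d
/// Returns the sum of a and b.
int add(int a, int b)
{
    return a + b;
}

///
unittest
{
    // Because this test follows add() and is marked with ///, ddoc shows
    // its body as add()'s example; the compiler also runs it as a test.
    assert(add(2, 3) == 5);
    assert(add(-1, 1) == 0);
}
```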
One other thing: using writefln is considered bad form in unit tests (you want *no* output if the unit test works). But many examples might want to demonstrate how e.g. an object interacts with writefln. Any suggestions? The assert line above is not very pretty, for example... I was thinking of mockFile.writefln(obj) but I'm not sure if std.stdio can handle it. -- Tomek
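One way to get the "mock file" effect without touching std.stdio, sketched under the assumption that showing the same text writefln("%s", value) would print is enough: format into an in-memory Appender and assert on the result. render and Pair are made-up names.

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical type for illustration.
struct Pair
{
    int x, y;
}

// Writes value with "%s" into an in-memory sink instead of stdout,
// so a unittest can check the text without producing any output.
string render(T)(T value)
{
    auto sink = appender!string();
    sink.formattedWrite("%s", value);
    return sink.data;
}

unittest
{
    assert(render(Pair(1, 2)) == "Pair(1, 2)");
    assert(render(42) == "42");
}
```

Any output range can stand in for the sink, so a test can pass the same appender to code that expects a writer.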
Re: std.unittests [updated] for review
On 2011-01-24, at 06:34:49, Jonathan M Davis jmdavisp...@gmx.com wrote: In case you didn't know, I have a set of unit test helper functions which have been under review for possible inclusion in Phobos. Here's an update. Most recent code: http://is.gd/F1OHat Okay. I took the previous suggestions into consideration and adjusted the code a bit more. However, most of the changes are to the documentation (though there are some changes to the code). Some of the code duplication was removed, and the way that some of the assertPred functions' errors are formatted has been altered so that values line up vertically, making them easier to compare. That's a solid improvement, thanks. The big change is the docs though. There's now a fake version of assertPred at the top with an overall description for assertPred, followed by the individual versions with as little documentation as seemed appropriate while still getting all of the necessary information across. A couple of the functions still have irritatingly long example sections, but anything less wouldn't get the functionality across. I'm not sure... Examples: assertPred!"+"(7, 5, 12); assertPred!"-"(7, 5, 2); assertPred!"*"(7, 5, 35); assertPred!"/"(7, 5, 1); assertPred!"%"(7, 5, 2); assertPred!"^^"(7, 5, 16_807); assertPred!"&"(7, 5, 5); assertPred!"|"(7, 5, 7); assertPred!"^"(7, 5, 2); assertPred!"<<"(7, 1, 14); assertPred!">>"(7, 1, 3); assertPred!">>>"(-7, 1, 2_147_483_644); assertPred!"~"("hello ", "world", "hello world"); assert(collectExceptionMsg(assertPred!"+"(7, 5, 11)) == "assertPred!\"+\" failed: [7] + [5]:\n" ~ "[12] (actual)\n" ~ "[11] (expected)."); assert(collectExceptionMsg(assertPred!"/"(11, 2, 6, "It failed!")) == "assertPred!\"/\" failed: [11] / [2]:\n" ~ "[5] (actual)\n" ~ "[6] (expected): It failed!"); Picking only one or two from the above would be enough to get it. It's the description that ought to explain the function's behavior in all cases; examples are for jump-starting the user to action. Oh, one more thing.
Previously you asked me why a generic collectThrown is useful and I forgot to answer. One use is the same as collectExceptionMsg(), but without being tied to the msg property. auto e = collectThrown!MyException(expr); assert(e); assert(e.errorCode == expectedCode); assert(cast(MyCauseException) e.next); I'm not proposing to yank collectExceptionMsg or assertThrown in favor of collectThrown; they're useful idioms. But also having collectThrown (a generic replacement for the existing collectException) would definitely be of value. In any case, here's the updated code. Review away. Andrei set the vote deadline for February 7th, at which point, if it passes majority vote, it will go into Phobos. The number of functions is small enough now (thanks to having consolidated most of them into the fantastically versatile assertPred) that it looks like it will likely go in std.exception if the vote passes rather than becoming a new module. So, the std.unittests title has now become a bit of a misnomer, but that's what I've been calling it, so it seemed appropriate to continue to label it that way in the thread's title. Good luck! -- Tomek
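A minimal sketch of the generic collectThrown described above: run the expression and return the caught Throwable of type T, or null if nothing of that type was thrown. MyException, its errorCode field and failWith are hypothetical, as in the snippet above.

```d
// Returns the first T thrown by `expression`, or null if none was thrown.
T collectThrown(T : Throwable = Exception, E)(lazy E expression)
{
    try
    {
        expression();
    }
    catch (T t)
    {
        return t;
    }
    return null;
}

// Hypothetical exception type carrying an error code and a cause.
class MyException : Exception
{
    int errorCode;
    this(int errorCode, Throwable next = null)
    {
        super("MyException", next);
        this.errorCode = errorCode;
    }
}

void failWith(int code)
{
    throw new MyException(code, new Exception("root cause"));
}

unittest
{
    auto e = collectThrown!MyException(failWith(7));
    assert(e !is null && e.errorCode == 7);
    assert(e.next !is null);   // the chained cause is still reachable
}
```

Unlike collectExceptionMsg, the caller gets the whole object, so any field or the exception chain can be checked.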
Re: Ad hoc ranges
Andrei Alexandrescu wrote: On 1/21/11 7:35 PM, Tomek Sowiński wrote: Andrei Alexandrescu wrote: Like I said, anything that doesn't bother to expose range-interfaced iterators and is not performance-critical is considered a target for ad hoc ranges. Working with non-D libraries, or libraries ported to D but preserving mother-language idioms. Tasks like traversing a tree of GUI widgets, or business-specific objects where defining proper ranges rarely happens and is use-case driven in practice. I expect they could be of some use in unittesting as mock input. Vaguely related: educational -- ad hoc ranges read almost like a for loop, so the learning curve for ranges in general is eased off. Adding them to Phobos is an interesting idea. We need to evaluate their worth, though. Everybody: if you could write up a one-liner like range(empty, popFront, front), what would you use it for? How about a singleton range - a range with exactly one element. It could be done with repeat(x, 1) but let's try it with your function as a warm-up exercise. If x is nullable, range(x, x=null, x); it destroys x, though. Otherwise the state must be held separately on the stack. bool empty; auto r = range(empty, empty=true, x); So repeat(x, 1) wins this one. I think such nuggets can better be expressed as a degenerate case of existing facilities. I envision ad hoc ranges at places where no iteration is defined and a one-off range struct doesn't pay. Like database-backed entities which don't conform to any clear-cut data structure, but if you squint you see it's sort of a tree, and you may just be able to e.g. walk through children recursively fetching only active ones from the DB, traverse columns of interest, and dump their content to a grid component which takes an arbitrary range of values. And all this can be wrapped in std.parallelism to overlap DB round trips. I think the challenge here is to figure out where to store the state.
The idiom makes it difficult for the delegates to communicate state to one another. On the stack, for loops have been doing it for years. -- Tomek
Re: Python's partition
Andrei Alexandrescu wrote: Looking through Python's string functions (http://docs.python.org/release/2.5.2/lib/string-methods.html) I noticed partition(): partition(sep) Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings. New in version 2.5. Right now we have find and findSkip; partition would be a great complement, and can be implemented for all forward ranges. One question is naming - partition() is not good for us because std.algorithm.partition implements Hoare's in-place partition algorithm. What should we call the function? Instead of a one-shot function, would a lazy range of pre-hit-post troikas be possible? That'd rhyme nicely with RegexMatch. In fact, the match(string, string) overload comes for free... -- Tomek
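A minimal eager sketch of partition() built on find, using the same slicing trick Nick shows in the next message. It is restricted to strings for brevity, and trisectStr is a made-up name. For what it's worth, later Phobos releases provide this operation as std.algorithm's findSplit.

```d
import std.algorithm : find;
import std.typecons : Tuple, tuple;

// Python-style partition: (before, sep, after) for the first occurrence of
// sep, or (hay, "", "") if sep does not occur. sep is assumed non-empty.
Tuple!(string, string, string) trisectStr(string hay, string sep)
{
    auto rest = hay.find(sep);            // suffix starting at the first match
    if (rest.length == 0)
        return tuple(hay, "", "");        // not found
    auto before = hay[0 .. $ - rest.length];
    return tuple(before, rest[0 .. sep.length], rest[sep.length .. $]);
}

unittest
{
    assert(trisectStr("key=value", "=") == tuple("key", "=", "value"));
    assert(trisectStr("novalue", "=") == tuple("novalue", "", ""));
}
```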
Re: replaceFirst, findPieces, and takeExactly
Andrei Alexandrescu wrote: On 1/22/11 5:14 PM, Nick Sabalausky wrote: Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message news:ihfm34$jvb$1...@digitalmars.com... On 1/22/11 4:16 PM, bearophile wrote: Andrei: Back then people said that STL's find() is better than D's find() because the former returns an iterator that can be combined with either the first iterator to get the portion before the match, or with the last iterator to get the portion starting at the match. D's find() only gives you the portion after the match. There's a HUGE problem here. This equivalence is sometimes true, but surely not always true: more powerful != better That function allows you to pick a determined number of elements from a range, assuming the range is never shorter than that. That sounds a bit obscure, but plays a pivotal role in findParts() (which is the name I settled on for the equivalent of Python's partition()): trisect is way better than findParts :-) And it's a single word with no uppercase letters in the middle. There is still time until the next release. Votes for trisect? vote-- findParts is the sort of thing that once you read what it does just *once*, it immediately becomes both obvious and easy to remember. But trisect is 1. scary, 2. I'd never remember it, and 3. whenever I'd come across it, I'd never remember what it meant. Those are particularly bad since I know right now I'm going to find it an incredibly useful function: there have already been too many times I've written this mess and felt dirty about it: auto result = find(str, delim); auto firstPart = str[0..$-result.length]; So I'm thrilled to see this function being added. Yes, I'm absolutely in agreement with the naming (and thrilled too). I imagine a putative user looking through std.algorithm (let's see... what find functions are out there?). That makes findPieces easy to get to, whereas trisect would be oddly situated in the alphabetic list and oddly named enough to be virtually undiscoverable.
Me a tad less, but not because of the name. I'd still rather see a lazy range of pre-hit-post tuples. Am I the only one to see findParts as a no-patterns variation of RegexMatch accepting all element types, not just char? Then even the name comes naturally -- match. -- Tomek
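To make the lazy alternative concrete, a hedged sketch: a range that yields one (pre, hit, post) tuple per non-overlapping occurrence of a plain substring, in the spirit of RegexMatch. PieceRange and matchPieces are made-up names, and only strings are handled.

```d
import std.algorithm : find;
import std.typecons : Tuple, tuple;

// Lazy range of (pre, hit, post) triples, one per occurrence of sep.
struct PieceRange
{
    private string rest, sep;
    private Tuple!(string, string, string) current;
    private bool done;

    this(string hay, string sep)
    {
        assert(sep.length > 0, "an empty separator would never advance");
        this.rest = hay;
        this.sep = sep;
        advance();
    }

    // Finds the next occurrence in `rest`; marks the range empty if none.
    private void advance()
    {
        auto hit = rest.find(sep);
        if (hit.length == 0) { done = true; return; }
        current = tuple(rest[0 .. $ - hit.length], hit[0 .. sep.length], hit[sep.length .. $]);
        rest = current[2];   // continue searching after this hit
    }

    @property bool empty() const { return done; }
    @property Tuple!(string, string, string) front() { return current; }
    void popFront() { advance(); }
}

PieceRange matchPieces(string hay, string sep)
{
    return PieceRange(hay, sep);
}

unittest
{
    auto m = matchPieces("a,b", ",");
    assert(m.front == tuple("a", ",", "b"));
    m.popFront();
    assert(m.empty);
}
```

Taking only the first element gives back the one-shot partition.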
Re: Ad hoc ranges
Jonathan M Davis wrote: I don't know a terser way to get a full-fledged range. It comes at a cost, though. Lazy parameters are just sugar over delegates, so it's not exactly Usain Bolt**... And you can't return it because, by bug or by design, lazy parameters (unlike vanilla delegates) don't work like closures. Still, even with the overhead and limitations the idiom is remarkably useful, especially in the face of range-unfriendly libraries from outside the D realm. Enjoy. What types of stuff do you need ad-hoc ranges for? What's the use case? I've never actually needed such a thing. I'm curious. If it's really something that's likely to be generally useful, then a function similar to what you're suggesting probably should be added to std.range. Like I said, anything that doesn't bother to expose range-interfaced iterators and is not performance-critical is considered a target for ad hoc ranges. Working with non-D libraries, or libraries ported to D but preserving mother-language idioms. Tasks like traversing a tree of GUI widgets, or business-specific objects where defining proper ranges rarely happens and is use-case driven in practice. I expect they could be of some use in unittesting as mock input. Vaguely related: educational -- ad hoc ranges read almost like a for loop, so the learning curve for ranges in general is eased off. Adding them to Phobos is an interesting idea. We need to evaluate their worth, though. Everybody: if you could write up a one-liner like range(empty, popFront, front), what would you use it for? -- Tomek
Re: renamepalooza time
Jonathan M Davis wrote: These should be expanded a bit and camelCased: LS: lineSep, lineSeparator PS: paragraphSep, paragraphSeparator Isn't there a rule that constants are all fully uppercase? That would be typical in C++ or Java, but that's not the case in D. Phobos certainly doesn't work that way in general, and Andrei doesn't want it to. The reasoning is that constants are so common in D (likely due to CTFE) that you'd have variables all over the place which were in all caps, and it would get really annoying. Right on. So, no. There is no rule in D that constants should be fully uppercase. So if not uppercase, what is the convention for constants then? And, to split hairs further, what is a constant to begin with? Would e.g. a big immutable configuration tree structure fall into that bucket? Or a logger object? -- Tomek
Re: Ad hoc ranges
Andrei Alexandrescu wrote: Like I said, anything that doesn't bother to expose range-interfaced iterators and is not performance-critical is considered a target for ad hoc ranges. Working with non-D libraries, or libraries ported to D but preserving mother-language idioms. Tasks like traversing a tree of GUI widgets, or business-specific objects where defining proper ranges rarely happens and is use-case driven in practice. I expect they could be of some use in unittesting as mock input. Vaguely related: educational -- ad hoc ranges read almost like a for loop, so the learning curve for ranges in general is eased off. Adding them to Phobos is an interesting idea. We need to evaluate their worth, though. Everybody: if you could write up a one-liner like range(empty, popFront, front), what would you use it for? How about a singleton range - a range with exactly one element. It could be done with repeat(x, 1) but let's try it with your function as a warm-up exercise. If x is nullable, range(x, x=null, x); it destroys x, though. Otherwise the state must be held separately on the stack. bool empty; auto r = range(empty, empty=true, x); So repeat(x, 1) wins this one. I think such nuggets can better be expressed as a degenerate case of existing facilities. I envision ad hoc ranges at places where no iteration is defined and a one-off range struct doesn't pay. Like database-backed entities which don't conform to any clear-cut data structure, but if you squint you see it's sort of a tree, and you may just be able to e.g. walk through children recursively fetching only active ones from the DB, traverse columns of interest, and dump their content to a grid component which takes an arbitrary range of values. And all this can be wrapped in std.parallelism to overlap DB round trips. -- Tomek
Ad hoc ranges
Going about my own business, I often found myself needing to write up a range just to, e.g., feed it into an algorithm. The problem is, defining even the simplest range -- one-pass forward -- is verbose enough to render this (correct) approach unprofitable. This is how I went about the problem: auto range(T, Whatever)(lazy bool _empty, lazy Whatever _popFront, lazy T _front) { struct AdHocRange { @property bool empty() { return _empty(); } void popFront() { _popFront(); } @property T front() { return _front(); } } return AdHocRange(); } --- example --- try { ... } catch(Throwable t) { auto r = range(t is null, t = t.next, t); // process exception chain... } I don't know a terser way to get a full-fledged range. It comes at a cost, though. Lazy parameters are just sugar over delegates, so it's not exactly Usain Bolt**... And you can't return it because, by bug or by design, lazy parameters (unlike vanilla delegates) don't work like closures. Still, even with the overhead and limitations the idiom is remarkably useful, especially in the face of range-unfriendly libraries from outside the D realm. Enjoy. -- Tomek ** Of course, there exists a somewhat more verbose compile-time variant of the idiom I presented.
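Since vanilla delegates do work as closures, a delegate-backed variant can be returned from a function, at the price of spelling out the lambdas. A hedged sketch; DelegateRange and countdown are made-up names and this is not the compile-time variant mentioned in the footnote.

```d
import std.range : isInputRange;

// Input range whose primitives forward to three delegates.
struct DelegateRange(T)
{
    bool delegate() emptyDg;
    void delegate() popFrontDg;
    T delegate() frontDg;

    @property bool empty() { return emptyDg(); }
    void popFront() { popFrontDg(); }
    @property T front() { return frontDg(); }
}

static assert(isInputRange!(DelegateRange!int));

// Yields n, n-1, ..., 1. The counter `i` is captured by the delegates,
// so the compiler moves it into a heap-allocated closure and the range
// stays valid after countdown returns.
DelegateRange!int countdown(int n)
{
    int i = n;
    return DelegateRange!int(() => i == 0, () { --i; }, () => i);
}

unittest
{
    int[] seen;
    foreach (x; countdown(3))
        seen ~= x;
    assert(seen == [3, 2, 1]);
}
```

Copies of the range share the captured state, so it behaves as a one-pass input range only.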
Re: Ad hoc ranges
bearophile wrote: I am not sure, but I think Andrei has deprecated the lazy attribute. Yes, but AFAIR in favor of implicit conversions of expressions to parameterless delegates, which strengthens my little idiom. -- Tomek
Re: Implicit delegate conversions
Steven Schveighoffer wrote: I think this is one place where D can improve by vast amounts without a lot of effort (no change in code generation, just in implicit casting). Yeah, my thoughts exactly. And bumping into a signature mismatch has become really likely. I've brought this up, and contributed to one bugzilla report requesting contravariant delegates (which was denied by Walter). Why was it denied? (Or just point me to the bug, please.) -- Tomek
Re: repeat
Andrei Alexandrescu wrote: std.range has a function repeat that repeats one value forever. For example, repeat(42) is an infinite range containing 42, 42, 42,... The same module also has a function replicate that repeats one value a specific number of times. In fact, replicate can be expressed as an overload of repeat, so that's what I just did (not committed yet): repeat(42, 100) repeats 42 one hundred times, repeat(42) repeats 42 forever. I'll put replicate on the deprecation chute. So far so good. Now, string has its own repeat. repeat("abc", 2) returns the string "abcabc". I want to generalize the functionality in string's repeat and move it outside std.string. There is an obvious semantic clash here. If you say repeat("abc", 3) did you mean one string "abcabcabc" or three strings "abc", "abc", and "abc"? So we need distinct names for the functions. One repeats one value, the other repeats a range. Moreover, I'm thinking sometimes you want to repeat a range lazily, i.e. instead of producing "abcabc" just return a range that looks like it. Ideas for a good naming scheme are welcome. Overload cycle and call it a day? -- Tomek
Re: repeat
Andrei Alexandrescu wrote: On 1/17/11 1:53 PM, Tomek Sowiński wrote: Andrei Alexandrescu wrote: std.range has a function repeat that repeats one value forever. For example, repeat(42) is an infinite range containing 42, 42, 42,... The same module also has a function replicate that repeats one value a specific number of times. In fact, replicate can be expressed as an overload of repeat, so that's what I just did (not committed yet): repeat(42, 100) repeats 42 one hundred times, repeat(42) repeats 42 forever. I'll put replicate on the deprecation chute. So far so good. Now, string has its own repeat. repeat("abc", 2) returns the string "abcabc". I want to generalize the functionality in string's repeat and move it outside std.string. There is an obvious semantic clash here. If you say repeat("abc", 3) did you mean one string "abcabcabc" or three strings "abc", "abc", and "abc"? So we need distinct names for the functions. One repeats one value, the other repeats a range. Moreover, I'm thinking sometimes you want to repeat a range lazily, i.e. instead of producing "abcabc" just return a range that looks like it. Ideas for a good naming scheme are welcome. Overload cycle and call it a day? cycle(r, n) already has a meaning: cycle r for a maximum total of n elements. Now I'm confused. The docs say it's an initial index... -- Tomek
Re: repeat
Andrei Alexandrescu wrote: On 1/17/11 2:14 PM, Tomek Sowiński wrote: Andrei Alexandrescu wrote: On 1/17/11 1:53 PM, Tomek Sowiński wrote: Andrei Alexandrescu wrote: std.range has a function repeat that repeats one value forever. For example, repeat(42) is an infinite range containing 42, 42, 42,... The same module also has a function replicate that repeats one value a specific number of times. In fact, replicate can be expressed as an overload of repeat, so that's what I just did (not committed yet): repeat(42, 100) repeats 42 one hundred times, repeat(42) repeats 42 forever. I'll put replicate on the deprecation chute. So far so good. Now, string has its own repeat. repeat("abc", 2) returns the string "abcabc". I want to generalize the functionality in string's repeat and move it outside std.string. There is an obvious semantic clash here. If you say repeat("abc", 3) did you mean one string "abcabcabc" or three strings "abc", "abc", and "abc"? So we need distinct names for the functions. One repeats one value, the other repeats a range. Moreover, I'm thinking sometimes you want to repeat a range lazily, i.e. instead of producing "abcabc" just return a range that looks like it. Ideas for a good naming scheme are welcome. Overload cycle and call it a day? cycle(r, n) already has a meaning: cycle r for a maximum total of n elements. Now I'm confused. The docs say it's an initial index... Sorry, my bad. You're right. Still, cycle(r, n) has a meaning distinct from what we might need. I don't think the initial index is really useful (even the authors confirm this by not bothering to unittest it :-)). My idea is to dump it in favor of popFrontN (provide a method on Cycle, let the stand-alone popFrontN statically recognize that). Bounding an infinite range is much more frequent. Or, if you're really not keen on the idea, introduce cycleN. Essentially I'm looking for a name for the function array(take(cycle(range), n * range.length)). That's what std.string.repeat does currently.
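For comparison, a hedged sketch with the std.range building blocks as they exist in later Phobos: cycle plus take gives the lazy "repeat a whole range n times", while repeat(value, n) repeats a single value. repeatRange is a made-up name, and the sketch assumes a finite range with length.

```d
import std.array : array, join;
import std.range : cycle, repeat, take;

// Lazily repeats a finite range `times` times: the
// array(take(cycle(range), n * range.length)) expression without array().
auto repeatRange(R)(R r, size_t times)
{
    return r.cycle.take(r.length * times);
}

unittest
{
    // whole range repeated, evaluated lazily until .array
    assert(repeatRange([1, 2], 3).array == [1, 2, 1, 2, 1, 2]);
    // the other meaning: one value repeated n times
    assert(repeat("abc", 3).array == ["abc", "abc", "abc"]);
    // eager "abcabc" by joining the repeated values
    assert(repeat("abc", 2).join == "abcabc");
}
```

With narrow strings, range.length counts code units while cycle iterates code points, so the int array above sidesteps that mismatch.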
With the above cycleN(range, n * range.length).array() doesn't look that bad. What are the use-cases that you want a separate name? -- Tomek
Implicit delegate conversions
The profusion of D's attributes has made delegate signature mismatches all too likely, so one must resort to casts too often, e.g. with callbacks. const(short)[] delegate(immutable(int)*) dg1; immutable(short)[] delegate(const(int)*) pure nothrow @safe dg2; dg1 = dg2; // fails (if *any* of the storage classes or types don't match) This problem is nothing new. It has been popping up in discussions and bugzilla but was never addressed entirely. The sketch of the conversion rules: dg2 is implicitly convertible to dg1 if - dg2 could override dg1 if they were class methods, bar polymorphic return type covariance; OR - each of dg2's arguments is implicitly convertible from and binary equivalent to dg1's respective argument, and dg2's return type is implicitly convertible to and binary equivalent to dg1's return type. The overarching thought is that the signature types of both delegates should be indistinguishable in compiled binaries, to rule out polymorphism** as it involves vtable pointer shifting. In the type system, however, the assigned delegate may have looser but compatible argument types (note: overloading problems don't apply to delegates), a tighter return type, or covariant attributes. The "if they were class methods" contortion is my attempt to ease the implementation -- some compiler code may be reused (I may be wrong). Please find holes. -- Tomek ** It works with C# delegates, though. Does anyone know how they do it?
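Until such rules exist, one cast-free workaround is to wrap dg2 in a thunk whose signature matches dg1 exactly; inside the thunk, both the argument and the return value go through ordinary implicit conversions. The cost is one extra indirect call plus a closure allocation. A hedged sketch; adapt and table are made-up names.

```d
alias Dg1 = const(short)[] delegate(immutable(int)*);
alias Dg2 = immutable(short)[] delegate(const(int)*) pure nothrow @safe;

// Wraps dg2 so it can be stored where a Dg1 is expected.
Dg1 adapt(Dg2 dg2)
{
    // immutable(int)* -> const(int)* on the way in and
    // immutable(short)[] -> const(short)[] on the way out are implicit.
    const(short)[] thunk(immutable(int)* p)
    {
        return dg2(p);
    }
    return &thunk;   // dg2 escapes, so the compiler allocates a closure
}

// Hypothetical data used only to exercise the adapter.
immutable short[] table = [10, 20, 30];

unittest
{
    Dg2 dg2 = (const(int)* p) => table[0 .. *p];
    Dg1 dg1 = adapt(dg2);
    immutable int n = 2;
    assert(dg1(&n).length == 2);
}
```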
Re: std.unittests for (final?) review [Update]
Jonathan M Davis wrote: On Monday, January 10, 2011 13:48:50 Tomek Sowiński wrote: Jonathan M Davis wrote: I followed Andrei's suggestion and merged most of the functions into a highly flexible assertPred. I also renamed the functions as suggested and attempted to fully document everything with fully functional examples instead of examples using types or functions which don't actually exist. Did you zip the right file? I still see things like nameFunc and assertPlease. ??? Those are supposed to be there. All examples are tested in the unit tests exactly as they are. I just thought "instead of examples using types or functions which don't actually exist" meant that well-known Phobos functions would be used. On the whole the examples are too long. It's daunting that I can't see the docs for *one* function without scrolling. Please give them a solid haircut: max 10 lines, with a median of 5. The descriptions are also watered down by over-explanatory writing. Perhaps. If I cut down on the examples, though, the usage wouldn't be as clear. The idea was to be thorough. Andrei wanted better examples, so I gave better examples. Not sure if longer means better. However, it is a bit of a balancing act, and I may have put too many in. It's debatable. Nick's suggestion of a main description before each individual overload would help with that. I agree. Perhaps a synopsis for the whole module, like in std.variant, would help too. So, now there's just assertThrown, assertNotThrown, collectExceptionMsg, and assertPred (though there are eight different overloads of assertPred). So, review away. Some suggestions: assertPred: Try putting expected in front; uniform call syntax can then set it apart from the operands:

    assertPred!"%"(7, 5, 2);  // old
    2.assertPred!"%"(7, 5);   // new

I really don't see any value to this. 1. You can't do that with assert, and assertPred is essentially supposed to be a fancy assert. 2. A number of assertPred overloads don't even have an expected, so it would be inconsistent. 3. People are already annoyed enough that the operator doesn't end up between the arguments. Putting the result on the left-hand side of the operator like that would make it that much more confusing. OK, I understand. assertNotThrown: chain the original exception with AssertError as its cause? Oh, this one badly needs a real-life example. I suppose that chaining it would be a good idea. I didn't think of that. But if you want examples, it's used in the unit tests in this very module, and I used it heavily in std.datetime. I meant a real-life example in the documentation. People may often ask themselves "how is it different from !assertThrown()?". assertThrown: I'd rather see a generified collectException (call it collectThrown?). assertThrown may stay as a convenience wrapper, though. ??? I don't get what you're trying for here. assertThrown isn't trying to collect exceptions at all. It's testing whether the given exception was thrown like it's supposed to be for the given function call. If it was, then the assertion succeeded. If it wasn't, then an AssertError is thrown. Just like assert. I mean that currently collectException doesn't have a parameterized catch block like assertThrown does. If it did, the latter could come down to:

    void assertThrown(T : Throwable = Exception, F)
                     (lazy F funcToCall, string msg = null, string file = __FILE__, size_t line = __LINE__)
    {
        T e = collectThrown!T(funcToCall);
        if (e is null)
            throw new AssertError(...);
    }

Shortening assertThrown's implementation is a bonus; the main gain is a better collectThrown(). [there's more down] Looking at the code, I'm seeing the same cancerous coding style std.datetime suffered from (to a lesser extent, I admit).
For instance, this routine: if(result != expected) { if(msg.empty) { throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s].`, op, lhs, op, rhs, result, expected), file, line); } else { throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s]: %s`, op, lhs, op, rhs, result, expected, msg), file, line
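A minimal sketch of the collectThrown idea, assuming only druntime's core.exception.AssertError. Here collectThrown is the hypothetical generalized collectException from the suggestion, and assertThrown/assertNotThrown are standalone illustrations, not the versions under review. assertNotThrown also passes the caught exception as the AssertError's next link, which is one way to read "chain the original exception as its cause".

```d
import core.exception : AssertError;

/// Hypothetical: evaluates expr and returns the caught T, or null if no T was thrown.
T collectThrown(T : Throwable = Exception, E)(lazy E expr)
{
    try
    {
        cast(void) expr();
    }
    catch (T e)
    {
        return e;
    }
    return null;
}

/// Sketch: assertThrown reduced to a thin wrapper over collectThrown.
void assertThrown(T : Throwable = Exception, E)(lazy E expr, string msg = null,
                                                string file = __FILE__, size_t line = __LINE__)
{
    if (collectThrown!T(expr) is null)
        throw new AssertError("assertThrown failed: no " ~ T.stringof ~ " was thrown"
                              ~ (msg.length ? ": " ~ msg : "."), file, line);
}

/// Sketch: assertNotThrown chaining the original exception as `next`.
void assertNotThrown(T : Throwable = Exception, E)(lazy E expr, string msg = null,
                                                   string file = __FILE__, size_t line = __LINE__)
{
    try
    {
        cast(void) expr();
    }
    catch (T e)
    {
        throw new AssertError("assertNotThrown failed: " ~ T.stringof ~ " was thrown"
                              ~ (msg.length ? ": " ~ msg : "."), file, line, e);
    }
}

unittest
{
    void boom() { throw new Exception("boom"); }
    void quiet() {}

    assert(collectThrown(boom()) !is null);
    assert(collectThrown(quiet()) is null);
    assertThrown(boom());
    assertNotThrown(quiet());

    // The chained original exception survives inside the AssertError.
    auto ae = collectThrown!AssertError(assertNotThrown(boom()));
    assert(ae !is null && ae.next !is null && ae.next.msg == "boom");
}
```

With the catch type as a template parameter, "did X throw?" and "what did X throw?" share one primitive, and the assert* functions become short wrappers over it.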
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
Andrei Alexandrescu wrote: I've been thinking about how to better deal with Unicode strings. Currently strings are formally bidirectional ranges with a surreptitious random access interface. The random access interface accesses the support of the string, which is understood to hold data in a variable-encoded format. For as long as the programmer understands this relationship, code for string manipulation can be written with relative ease. However, there is still room for writing wrong code that looks legit. Sometimes the best way to tackle a hairy reality is to invite it to the negotiation table and offer it promotion to first-class abstraction status. Along that vein I was thinking of defining a new range: VLERange, i.e. Variable Length Encoding Range. Such a range would have the power somewhere in between bidirectional and random access. The primitives offered would include empty, access to front and back, popFront and popBack (just like BidirectionalRange), and in addition properties typical of random access ranges: indexing, slicing, and length. For some compression schemes, implementing the *back primitives (back, popBack) is troublesome if not impossible... Note that the result of the indexing operator is not the same as the element type of the range, as it only represents the unit of encoding. It's worth mentioning explicitly: a VLERange is dually typed. That's important for searching: statically check whether the original and encoded types match, and if so, perform a fast search directly on the encoded elements. I think an important feature of a VLERange should be dropping itself down to an encoded-typed range, so that front and back return raw data. Dual typing will also affect foreach: in the general case you'd want to choose whether to decode or not by typing the element.
I can't stop thinking that VLERange is a two-piece bikini that makes a bare random-access range safe to look at, and that you can take off once the partners trust each other, not a limited random-access probing facility to span the void between front and back. In addition to these (and connecting the two), a VLERange would offer two additional primitives: 1. size_t stepSize(size_t offset) gives the length of the step needed to skip to the next element. 2. size_t backstepSize(size_t offset) gives the size of the _backward_ step that goes to the previous element. In both cases, offset is assumed to be at the beginning of a logical element of the range. So when I move the spinner on an iPod, I get catapulted into position with the raw-data opIndex, and from there I try to work my way to the next frame to start playback. Sounds promising. I suspect that a lot of functions in std.string can be written without Unicode-specific knowledge just by relying on such an interface. Moreover, algorithms can be generalized to other structures that use variable-length encoding, such as those used in data compression. (In that case, the support would be a bit array and the encoded type would be ubyte.) I agree; acknowledging encoding/compression as a general direction will bring substantial benefits. Writing to such ranges is not addressed by this design. Ideas are welcome. Yeah, we can address output later, that's fair. Adding VLERange would legitimize strings and would clarify their handling, at the cost of adding one additional concept that needs to be minded. Is the trade-off worthwhile? Well, the only way to find out is to try it. My advice: VLERanges originated as a solution to the string problem, so start with a non-string incarnation. Having at least two plugs (one, we know, is string) that fit the same socket will spur confidence in the abstraction. -- Tomek
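Following the advice to try a non-string incarnation first, here is a minimal sketch of a VLERange over LEB128-style variable-length unsigned integers. The type name VarintRange and the choice of encoding are assumptions made for illustration; only the primitive names stepSize and backstepSize come from the proposal. Note how opIndex and length work in encoding units (ubyte) while front and back yield decoded elements (ulong), i.e. the range is dually typed.

```d
// Hypothetical sketch, not a Phobos type: a VLERange over unsigned integers
// stored as LEB128-style varints. Each element is split into 7-bit groups,
// least significant first; every byte except the last of an element has its
// high bit (0x80) set. Assumes well-formed input.
struct VarintRange
{
    const(ubyte)[] data;   // the support: raw encoded bytes

    /// Bytes to skip from `offset` (start of an element) to the next element.
    size_t stepSize(size_t offset) const
    {
        size_t i = offset;
        while (data[i] & 0x80)
            ++i;                      // continuation byte
        return i - offset + 1;
    }

    /// Bytes to step back from `offset` (start of an element, or data.length)
    /// to the start of the previous element.
    size_t backstepSize(size_t offset) const
    {
        size_t i = offset - 1;        // last byte of the previous element
        while (i > 0 && (data[i - 1] & 0x80))
            --i;
        return offset - i;
    }

    // Bidirectional primitives, built on the two steps above.
    bool empty() const { return data.length == 0; }
    ulong front() const { return decode(0, stepSize(0)); }
    void popFront() { data = data[stepSize(0) .. $]; }

    ulong back() const
    {
        immutable n = backstepSize(data.length);
        return decode(data.length - n, n);
    }

    void popBack() { data = data[0 .. $ - backstepSize(data.length)]; }

    VarintRange save() const { return VarintRange(data); }

    // Random-access-style primitives on the encoding unit (ubyte),
    // not on the element type (ulong).
    ubyte opIndex(size_t i) const { return data[i]; }
    size_t length() const { return data.length; }

    private ulong decode(size_t start, size_t n) const
    {
        ulong value = 0;
        foreach (k; 0 .. n)
            value |= cast(ulong)(data[start + k] & 0x7F) << (7 * k);
        return value;
    }
}

unittest
{
    // Elements 1, 300, 5; 300 encodes as [0xAC, 0x02].
    immutable ubyte[] bytes = [0x01, 0xAC, 0x02, 0x05];
    auto r = VarintRange(bytes);
    assert(r.length == 4 && r[1] == 0xAC);   // encoding units
    assert(r.front == 1 && r.back == 5);     // decoded elements
    r.popFront();
    r.popBack();
    assert(r.front == 300 && r.back == 300);
    r.popFront();
    assert(r.empty);
}
```

Because length here counts bytes rather than elements, generic algorithms that treat length as an element count would be misled. This is one concrete reason the proposal keeps the encoded and decoded views distinct.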
Re: std.unittests for (final?) review [Update]
Jonathan M Davis wrote: I followed Andrei's suggestion and merged most of the functions into a highly flexible assertPred. I also renamed the functions as suggested and attempted to fully document everything with fully functional examples instead of examples using types or functions which don't actually exist. Did you zip the right file? I still see things like nameFunc and assertPlease. On the whole the examples are too long. It's daunting that I can't see the docs for *one* function without scrolling. Please give them a solid haircut: max 10 lines, with a median of 5. The descriptions are also watered down by over-explanatory writing. So, now there's just assertThrown, assertNotThrown, collectExceptionMsg, and assertPred (though there are eight different overloads of assertPred). So, review away. Some suggestions:

assertPred: Try putting expected in front; uniform call syntax can then set it apart from the operands:

    assertPred!"%"(7, 5, 2);  // old
    2.assertPred!"%"(7, 5);   // new

assertNotThrown: chain the original exception with AssertError as its cause? Oh, this one badly needs a real-life example.

assertThrown: I'd rather see a generified collectException (call it collectThrown?). assertThrown may stay as a convenience wrapper, though.

Looking at the code, I'm seeing the same cancerous coding style std.datetime suffered from (to a lesser extent, I admit). For instance, this routine:

    if(result != expected)
    {
        if(msg.empty)
        {
            throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s].`, op, lhs, op, rhs, result, expected), file, line);
        }
        else
        {
            throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s]: %s`, op, lhs, op, rhs, result, expected, msg), file, line);
        }
    }

can be easily compressed to:

    enforce(result == expected, new AssertError(
        format("[%s] %s [%s] failed: actual [%s], expected [%s]%s",
               lhs, op, rhs, result, expected, msg.empty ? "." : ": " ~ msg),
        file, line));

BTW, actual and expected should be on new lines directly under each other for eye-diffing (it does wonders for long input):

    format("[%s] %s [%s] failed:\n[%s] - actual\n[%s] - expected" ~ ...

Another example:

    {
        bool thrown = false;
        try
            assertNotThrown!AssertError(throwEx(new AssertError("It's an AssertError", __FILE__, __LINE__)), "It's a message");
        catch(AssertError)
            thrown = true;
        assert(thrown);
    }

can be:

    try
    {
        assertNotThrown!AssertError(throwEx(new AssertError("It's an AssertError", __FILE__, __LINE__)), "It's a message");
        assert(false);
    }
    catch(AssertError) { /*OK*/ }

and you don't have to introduce a new scope every time. Not to mention that such routines recur in your code with small discrepancies, so abstracting out private helpers may pay off. Fixing such readability bugs is essential for a standard library module. On the bright side, I do appreciate the thoroughness and extent of the unittests in this module. Is coverage 100%? From the sounds of it, if this code gets voted in, it'll be going into std.exception. Please don't rush the adoption. This module, albeit useful, still needs work. -- Tomek
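A minimal sketch of the compressed failure path, assuming Phobos's std.exception.enforce(value, lazy Throwable) and std.format.format. The check is reduced to a single binary operator under the hypothetical name assertOpResult, so it is not the proposed assertPred API. It only illustrates one enforce call replacing two near-identical throw branches, with actual and expected stacked for eye-diffing.

```d
import core.exception : AssertError;
import std.exception : enforce;
import std.format : format;

// Hypothetical check (not the proposed assertPred): asserts that
// `lhs op rhs` equals `expected`, using one enforce call.
void assertOpResult(string op, L, R, E)(L lhs, R rhs, E expected, string msg = null,
                                        string file = __FILE__, size_t line = __LINE__)
{
    auto result = mixin("lhs " ~ op ~ " rhs");
    // The AssertError is built lazily, only on failure; actual and expected
    // go on separate lines, directly under each other.
    enforce(result == expected, new AssertError(
        format("[%s] %s [%s] failed:\n[%s] - actual\n[%s] - expected%s",
               lhs, op, rhs, result, expected, msg.length ? "\n" ~ msg : ""),
        file, line));
}

unittest
{
    assertOpResult!"%"(7, 5, 2);   // 7 % 5 == 2, so nothing is thrown

    bool caught = false;
    try
    {
        assertOpResult!"+"(1, 1, 3, "math is hard");
    }
    catch (AssertError e)
    {
        caught = true;
        assert(e.msg == "[1] + [1] failed:\n[2] - actual\n[3] - expected\nmath is hard");
    }
    assert(caught);
}
```

The unittest keeps an explicit flag instead of an assert(false) inside the try. When the expected exception is itself AssertError, an assert(false) placed there would throw its own AssertError (in non-release builds) and be swallowed by the same catch(AssertError), so the check could never fail.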