Re: D Ranges in C#
You shouldn't be using 32-bit indices on x64; that defeats the whole point of x64. As of .NET 4.5, 64-bit array indexes are supported as well. http://msdn.microsoft.com/en-us/library/hh285054.aspx

Don't forget that we're talking about a *hashtable* here. If a .NET hashtable used 64-bit indexes (or pointers), it would require 8-12 bytes more memory per entry: 32 bytes total, including overhead, if the key and value are 4 bytes each. An in-memory hashtable that actually needs 64-bit indexes rather than 32-bit ones would have to contain over 4 billion entries, which would take at least 128 GB of RAM, assuming 8 bytes for each key-value pair!!! In fact it's worse than that, as the dictionary grows by size-doubling and contains a certain number of unused entries at the end. No thanks, I'd rather save those 8 bytes and accept the 4 billion limit, if you don't mind.
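The per-entry arithmetic can be checked with a small C++ sketch. It assumes an entry layout modeled on .NET's Dictionary entry (hashCode, next, key, value) plus one bucket slot per entry, and a typical x64 ABI where a 64-bit field forces 8-byte alignment; the struct names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Entry layout modeled on .NET's Dictionary entry, with 4-byte keys and
// values. "next" is the index of the next entry in the same bucket chain.
struct Entry32 {
    std::int32_t hashCode;
    std::int32_t next;   // 32-bit index
    std::int32_t key;
    std::int32_t value;
};

// Hypothetical variant with a 64-bit index. On a typical x64 ABI the
// int64 field forces 8-byte alignment, so padding is added; reordering
// the fields still rounds the struct up to 24 bytes.
struct Entry64 {
    std::int32_t hashCode;
    std::int64_t next;   // 64-bit index
    std::int32_t key;
    std::int32_t value;
};

// Bytes per entry = entry + one bucket slot (an index into the entry
// array). This ignores the slack left over by size-doubling growth.
constexpr std::size_t bytesPerEntry32 = sizeof(Entry32) + sizeof(std::int32_t);
constexpr std::size_t bytesPerEntry64 = sizeof(Entry64) + sizeof(std::int64_t);

// 2^32 entries is where 64-bit indexes would start to matter.
constexpr std::uint64_t entriesAtLimit = 1ULL << 32;
constexpr std::uint64_t totalBytes64 = entriesAtLimit * bytesPerEntry64;
```

Under those assumptions the entry grows from 20 to 32 bytes (12 bytes more), and 2^32 entries of 32 bytes come to 128 GiB.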
Re: D Ranges in C#
David Piepgrass: In fact, most STL algorithms require exactly two iterators--a range--and none require only a single iterator

I think there are some C++ data structures that store many single iterators. If you instead store ranges you double the data amount. Hashmaps would be the most common example, usually implemented as a linked list of key-value pairs along with a vector of list iterators.

In theory. But the .NET hashtables are implemented with an *array* of key-value pairs and an array of *indexes*. The former forms a virtual linked list that is more efficient than a traditional linked list, and the latter is more efficient than a vector of iterators (especially on x64, as the indexes can be 32-bit).

Iterators are also useful for constructing sub-ranges, which proves useful in the implementation of some algorithms. Writing std::next_permutation in D with ranges is quite frustrating compared to C++.
https://github.com/D-Programming-Language/phobos/blob/master/std/algorithm.d#L10901
http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01045_source.html#l03619
Notice that in D you need to maintain a count of the elements popped off the back and then use takeExactly to retrieve that portion again. In C++ you just move the iterators around and create "ranges" from pairs of iterators as you need them. When I implemented nextPermutation in D, I constantly felt as if I was fighting with ranges instead of working with them - I knew exactly what I wanted to do, but ranges only provide a roundabout way of doing it.

Hmm, very interesting. I suppose the problem with C++ iterators is that they are useless without external context: you can't increment or decrement one without also comparing it to begin() or end() in its container, which implies that the caller must manually keep track of which container it came from. Thus, an iterator is hardly an improvement over a plain-old integer index, the only advantages being:
1. You can dereference it (*if* you can be sure it doesn't point to end())
2. Unlike an index, it's compatible with non-random-access data structures

But perhaps the iterator concept could be improved by being made self-aware, so that the iterator "knew" and could tell you when it was at the beginning or end of its container. This would increase the storage requirement in some circumstances but not others. For example, an iterator to a doubly-linked list node can tell when it is at the beginning or end, but an iterator to a singly-linked list node can only tell when it is at the end. A pointer inside an array may or may not be able to tell if it is at the end depending on how arrays work, e.g. perhaps the way D heap arrays work would allow an array iterator, implemented as a simple pointer, to still know when it is safe to increment and decrement. The simplest possible .NET array iterator is an array reference plus an index, and again the iterator can know when it is at the beginning or end of the array--except that if the iterator came from inside a range, it would be unaware of the range's boundaries, which may be smaller than the array's boundaries.
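A minimal C++ sketch of the "array reference plus an index" iterator described above. AwareIterator is a hypothetical name; the point is only that it can answer atBegin()/atEnd() itself and refuse to step out of bounds, while still knowing nothing about any narrower sub-range it may have come from.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical "self-aware" iterator: a reference to the container plus
// an index, so it knows whether it is at the beginning or end without the
// caller tracking begin()/end() separately.
template <typename T>
class AwareIterator {
public:
    AwareIterator(const std::vector<T>& v, std::size_t i) : vec_(&v), index_(i) {}

    bool atBegin() const { return index_ == 0; }
    bool atEnd() const { return index_ == vec_->size(); }

    const T& operator*() const {
        if (atEnd()) throw std::out_of_range("dereferenced end");
        return (*vec_)[index_];
    }
    AwareIterator& operator++() {
        if (atEnd()) throw std::out_of_range("incremented past end");
        ++index_;
        return *this;
    }
    AwareIterator& operator--() {
        if (atBegin()) throw std::out_of_range("decremented before begin");
        --index_;
        return *this;
    }

private:
    const std::vector<T>* vec_;  // whole array; sub-range bounds are not known
    std::size_t index_;
};
```

The cost is one extra pointer per iterator compared with a bare index, which matches the post's observation that self-awareness increases storage in some cases.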
D Ranges in C#
I'm adding D-style ranges to my new C# collections library. In case anyone would like to comment, please see here: http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html
Re: Java binaries
So how are C++ and C# pointers done in IL?

There are two kinds of pointers in C#: managed and unmanaged. Wrapped in a fixed statement (which tells the garbage collector not to move the referenced object), C# pointers will behave like any native language pointer. This is not the first topic where I've read the misconception that slices are a problem for IL. Since .NET 2.0 (9 years ago) there has been the ArraySegment type, doing exactly what D slices do. Also, in C# arrays are implicitly convertible to pointers.

IIRC, the biggest incompatibility between D and .NET is that D pointers can point to the stack, to unmanaged (non-GC) memory or to managed (GC) memory, while simultaneously having unlimited lifetime. In .NET, arguments that are passed by reference can point to GC or non-GC memory, but pointers inside objects (classes or boxed structs) can only point to (1) non-GC non-stack memory OR (2) the beginning of a GC object. The key problem: a single pointer cannot be used for both purposes! D pointers and ranges can point to stack, GC or non-GC memory, regardless of the location of the pointer or range itself. Also, D pointers can point to the interior of an object and not just the beginning, while .NET pointers, in general, cannot.

This doesn't make a D implementation for .NET impossible, but if you want to run arbitrary D code on .NET, I think it would have to run inefficiently because it would constantly have to work around limitations of .NET. I think doing D in .NET efficiently would require a specialized version of D, or something. Note that plain old C++ can be used in .NET because C++ pointers can't point to GC memory without special .NET-specific types. Thus, old-fashioned C++ avoids problems related to the .NET garbage collector.

Anyway, I don't see any use for a D IL compiler, since probably the language syntax will look 90% like C#.

How is "looking" like C# relevant? D looks 90% like C++ too, and D is still better. Certainly D is more powerful than C# on the whole.
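To make the interior-pointer distinction concrete, here is a C++ analogy (not .NET code): a D-style slice stores an interior pointer plus a length, while an ArraySegment-style view stores a reference to the start of the owning array plus an offset and count. The shared_ptr merely stands in for a GC reference, and both struct names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// D-style slice: an interior pointer plus a length. The pointer may point
// into the middle of an array, wherever that array lives.
struct Slice {
    int* ptr;
    std::size_t length;
    int& operator[](std::size_t i) const { return ptr[i]; }
};

// ArraySegment-style view: a reference to the *whole* owning array plus
// an offset and count, so no interior pointer is ever stored.
// (shared_ptr is only a stand-in for a GC reference here.)
struct Segment {
    std::shared_ptr<std::vector<int>> array;
    std::size_t offset;
    std::size_t count;
    int& operator[](std::size_t i) const { return (*array)[offset + i]; }
};
```

Both views see the same elements; the segment pays for an extra offset addition on each access but only ever holds a pointer to the beginning of the object, which is the kind of reference .NET allows inside objects.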
Re: small idea
On Wednesday, 9 January 2013 at 15:10:47 UTC, bearophile wrote:
eles: void make_a_equal_to_b(??a,!!b); //I give you "b", give me "a"

A saner syntax is just to use the same in/out/inout keywords at the call point. This is essentially what C# does (with one exception for COM):
    make_a_equal_to_b(ref a, in b)
This feature was discussed several times in the past for D. The advantage is more readable code and fewer surprises. The disadvantages are more typing and some breakage of existing D2 code (because even if at the beginning it's a warning, and later a deprecation, you will eventually need to enforce it with an error).

The same thing happens every time this is discussed: some people insist "ref" and "out" should be REQUIRED or else it should not be ALLOWED. Others don't want to break backward compatibility so they insist it can't be required. There is no common ground so it never gets to the "allowed" stage.

In C++ I actually use call-site annotations for & params even with no checking at all:
    #define IN
    #define OUT
In fact these are defined in a Microsoft header. I find them useful for documentation.

Again, my proposal is that the compiler should allow ref/out and not warn when it is missing; if users want a warning/error for missing ref/out, they can ask for it per-module with a pragma (or something).

One more small D-specific problem is what to do if the first argument is a ref and you want to use UFCS. In the past I suggested allowing implicit ref for structs (but probably not for classes) with UFCS, because the "this" parameter of a member function of a struct is passed by ref already.
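A minimal C++ sketch of the documentation-only IN/OUT convention mentioned above. The macros expand to nothing, so the compiler does not check them; the #ifndef guards are there because, as the post notes, a Microsoft header may already define them. The demo function is hypothetical.

```cpp
#include <cassert>

// Call-site annotations that expand to nothing; they only document
// intent. Guarded in case a platform header already defines them.
#ifndef IN
#define IN
#endif
#ifndef OUT
#define OUT
#endif

// "b" is read, "a" is written.
void make_a_equal_to_b(int& a, const int& b) { a = b; }

// Hypothetical call site: the reader can see which argument is written,
// but nothing stops a caller from writing IN where OUT was meant.
int demo() {
    int a = 0;
    const int b = 7;
    make_a_equal_to_b(OUT a, IN b);
    return a;
}
```

This is exactly the "allowed but not required" behavior the post proposes for D, minus any compiler verification.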
Re: the Disruptor framework vs The Complexities of Concurrency
Maybe, but I'm still not clear on what the differences are between a normal ring buffer (not a new concept) and this "disruptor" pattern...

Key differences with a typical lock-free queue:
- Lightning fast when used correctly. It observes that not only is locking expensive, even CAS (compare-and-swap) is not cheap, so it avoids CAS in favor of memory barriers (unless multiple writers are required). Memory allocation is avoided too, by preallocating everything.
- Multicast and multisource: multiple readers can view the same entries.
- Separation of concerns: disruptors are a whole library instead of a single class, so disruptors support several configurations of producers and consumers, as opposed to a normal queue that is limited to one or two arrangements.

To me, one particularly interesting feature is that a reader can modify an entry and then another reader can flag itself as "dependent" on the output of the first reader. So really it supports not just readers and writers but "annotators" that both read and write. And the set of readers and writers can be arranged as a graph. See also http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work
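The "barriers instead of CAS, preallocate everything" point can be sketched in C++ as a single-producer/single-consumer ring buffer. This is not the Disruptor's API and omits its multicast readers, dependency graph and cache-line padding; it only shows preallocated slots coordinated by acquire/release sequence counters with no locks and no compare-and-swap.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-producer/single-consumer ring buffer. Slots are preallocated and
// the two sides coordinate only through acquire/release loads and stores
// of monotonically increasing sequence counters.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Called only by the producer thread.
    bool tryPush(const T& item) {
        const std::uint64_t w = write_.load(std::memory_order_relaxed);
        if (w - read_.load(std::memory_order_acquire) == N) return false;  // full
        slots_[w & (N - 1)] = item;
        write_.store(w + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called only by the consumer thread.
    bool tryPop(T& out) {
        const std::uint64_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[r & (N - 1)];
        read_.store(r + 1, std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::uint64_t> write_{0};
    std::atomic<std::uint64_t> read_{0};
};
```

Each counter has exactly one writer, which is why plain release stores suffice; with multiple producers the write counter would need an atomic read-modify-write, matching the post's "unless multiple writers are required".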
Re: OT (partially): about promotion of integers
On Wednesday, 12 December 2012 at 06:19:14 UTC, Walter Bright wrote:
You're not going to get performance with overflow checking even with the best compiler support. For example, much arithmetic code is generated for the x86 using addressing mode instructions, like:
    LEA EAX,16[8*EBX][ECX]
for 16+8*b+c. The LEA instruction does no overflow checking. If you wanted it, the best code would be:
    MOV EAX,16
    IMUL EBX,8
    JO overflow
    ADD EAX,EBX
    JO overflow
    ADD EAX,ECX
    JO overflow
Which is considerably less efficient. (The LEA is designed to run in one cycle.) Plus, often more registers are modified, which impedes good register allocation.

Thanks for the tip. Of course, I don't need and wouldn't use overflow checking all the time--in fact, since I've written a big system in a language that can't do overflow checking, you might say I "never need" overflow checking, in the same way that C programmers "never need" constructors, destructors, generics or exceptions, as demonstrated by the fact that they can and do build large systems without them. Still, the cost of overflow checking is a lot bigger, and requires a lot more cleverness, without compiler support. Hence I work harder to avoid the need for it.

If you desire overflows to be programming errors, then you want an abort, not a thrown exception. I am perplexed by your desire to continue execution when overflows happen regularly.

I explicitly say I want to handle overflows quickly, and you conclude that I want an unrecoverable abort? WTF! No, I think overflows should be handled efficiently, and should be nonfatal.

Maybe it would be better to think in terms of the carry flag: it seems to me that a developer needs access to the carry flag in order to do 128-bit (and wider) arithmetic efficiently. I have written code to "make do" without the carry flag; it's just more efficient if it can be used. So imagine an intrinsic that gets the value of the carry flag*--obviously it wouldn't throw an exception. I just think overflow should be handled the same way. If the developer wants to react to overflow with an exception/abort, fine, but it should not be mandatory as it is in .NET.

Usually when there is an overflow I just want to discard one data point and move on, or set the result to the maximum/minimum integer, possibly make a note in a log, but only occasionally do I want the debugger to break.

* Yes, I know you'd usually just ADC instead of retrieving the actual value of the flag, but sometimes you do want to just get the flag.
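Both ideas above can be sketched portably in C++: recovering the carry without the flag when doing multi-word addition, and a non-fatal saturating add that clamps to the maximum/minimum instead of throwing. This is a generic illustration, not the author's code; U128, add128 and saturatingAdd are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct U128 {
    std::uint64_t lo, hi;
};

// 128-bit addition without access to the CPU carry flag: for unsigned
// words, the low word wrapped around exactly when the sum is smaller than
// one of the operands.
U128 add128(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;
    const std::uint64_t carry = (r.lo < a.lo) ? 1 : 0;
    r.hi = a.hi + b.hi + carry;
    return r;
}

// Non-fatal overflow handling: clamp to the representable range instead
// of throwing or aborting. Widening to 64 bits makes the check exact.
std::int32_t saturatingAdd(std::int32_t a, std::int32_t b) {
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (wide > std::numeric_limits<std::int32_t>::max())
        return std::numeric_limits<std::int32_t>::max();
    if (wide < std::numeric_limits<std::int32_t>::min())
        return std::numeric_limits<std::int32_t>::min();
    return static_cast<std::int32_t>(wide);
}
```

The extra compare in add128 is the "make do" cost the post mentions; a compiler intrinsic (GCC and Clang offer __builtin_add_overflow, for instance) can let the backend use the flag directly.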
Re: OT (partially): about promotion of integers
The problem, as I see it, is nobody actually cares about this. Why would I say something so provocative? Because I've seen D programmers go to herculean lengths to get around problems they are having in the language. These efforts make a strong case that they need better language support (UDAs are a primo example of this). I see nobody bothering to write a CheckedInt type and seeing how far they can push it, even though writing such a type is not a significant challenge; it's a bread-and-butter job.

I disagree with the analysis. I do want overflow detection, yet I would not use a CheckedInt in D for the same reason I do not usually use one in C++: without compiler support, it is too expensive to detect overflow. In my C++ I have a lot of math to do, and I'm using C++ because it's faster than C#, which I would otherwise prefer. Constantly checking for overflow without hardware support would kill most of the performance advantage, so I don't do it. I do use "clipped conversion" though: e.g. a ClippedConvert to a 16-bit short turns any value above 32767 into 32767. I can afford the overhead in this case because I don't do type conversions as often as addition, bit shifts, etc.

The C# solution is not good enough either. C# throws exceptions on overflow (in a checked context), which is convenient but is bad for performance if it happens regularly; it can also make a debugger almost unusable. Some sort of mechanism that works like an exception, but faster, would probably be better. Consider:
    result = a * b + c * d;
If a * b overflows, there is probably no point in executing c * d, so it may as well jump straight to a handler; on the other hand, the exception mechanism is costly, especially if the debugger is hooked in and causes a context switch every single time it happens. So... I dunno. What's the best semantic for an overflow detector?
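The post does not show its ClippedConvert, so the following is only a guess at what such a clipped conversion could look like in C++, restricted to signed integer types to keep the comparisons simple.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical clipped conversion between signed integer types: values
// outside the target range are clamped to its minimum or maximum instead
// of wrapping around.
template <typename To, typename From>
To ClippedConvert(From x) {
    static_assert(std::is_integral<To>::value && std::is_signed<To>::value,
                  "sketch handles signed integer targets only");
    static_assert(std::is_integral<From>::value && std::is_signed<From>::value,
                  "sketch handles signed integer sources only");
    const std::intmax_t v = x;  // widen so both limits compare exactly
    if (v > static_cast<std::intmax_t>(std::numeric_limits<To>::max()))
        return std::numeric_limits<To>::max();
    if (v < static_cast<std::intmax_t>(std::numeric_limits<To>::min()))
        return std::numeric_limits<To>::min();
    return static_cast<To>(x);
}
```

For example, ClippedConvert<short>(40000) yields 32767. It costs two comparisons per conversion, which is affordable when conversions are much rarer than arithmetic.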
Re: References in D
    void main() {
        void* x = a(b());
        c();
        while(goobledegook) {
            x = p();
            d(x);
        }
        e(x); /+ Crash! x is null. +/
    }
Where did x's null value come from? Not a. Not p; the while loop happened to be never executed. To say "b" would be closer, but still imprecise. Actually it was created in the q() function that was called by u() that was called by b() which then created a class that held the null value and was passed to a() that then dereferenced the class and returned the value stored in the class that happened to be null. Nulls create very non-local bugs, and that's why they frustrate me to no end sometimes.

Since this thread's attracted lots of commotion I thought I'd just drop by and +1 for non-nullable types, and +1 for your arguments. I keep wondering, though, if it is 'enough' to solve the null problem, or if it would be possible to devise a more general mechanism for solving other problems too, like, say, the fact that certain integers always have to be positive, or, if you want to go more general, that a certain relationship must hold between two structures...

Not having used D's invariants so far (well, I haven't used D itself for a real project actually)... what's stopping D's invariant mechanism from handling all this? http://dlang.org/class.html#invariants (as is typical of D documentation, this says nothing about invariants on structs, but the page about structs indicates, with an X in its comparison table, that they support invariants.) I mean, problems are detected at runtime this way, and slightly too late, but still, it would be better than most popular languages that can't do anything about nulls at all. Since D's devs don't even seem to have enough time to implement D as described in TDPL (published more than two years ago), I wouldn't expect to see this feature in the D language in the near future.
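One way to make null bugs local again is a wrapper whose invariant is checked when the value is created rather than when it is finally dereferenced. The C++ sketch below uses a hypothetical NotNull type; the same pattern works for other invariants such as "this integer must be positive".

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical non-nullable pointer wrapper: the null check runs where the
// value is constructed, so a bad value is reported near its origin rather
// than several calls later where it is dereferenced.
template <typename T>
class NotNull {
public:
    explicit NotNull(T* p) : p_(p) {
        if (p_ == nullptr) throw std::invalid_argument("null passed to NotNull");
    }
    T& operator*() const { return *p_; }
    T* operator->() const { return p_; }
    T* get() const { return p_; }

private:
    T* p_;  // invariant: never null after construction
};
```

Like D's invariants, this is a runtime check, but it fires at the point where the null enters the system, which is where the post's example loses track of it.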
Re: Feature request: extending comma operator's functionality
Because it's the only way to guarantee that x exists when you reach the end of the loop.
    do {
        if(true) continue; //Yawn... skip.
        const x = ... ;
    } while (predicate(x)); //What's x?

But the compiler could tell that there is a 'continue' before x was declared, and issue an error when x is used in while(...).
Re: Implicit instantiation of parameterless templates
On Friday, 5 October 2012 at 12:24:12 UTC, Paulo Pinto wrote:
On Friday, 5 October 2012 at 12:01:30 UTC, Piotr Szturmaj wrote:
Java and C# with their generics can do the following:
    class List { }
    class List<T> { }
    List list = new List();
    List<int> intList = new List<int>();
In D similar code can't work because we can't have both a type and a template with the same name. So this code must be rewritten to: ...

Why do you need this? Java and C# only allow this type of code due to backwards compatibility, because their first version did not allow for generics, and their creators did not want to force everyone to recode their code bases.

The Java and C# situations are very different. In Java, generics are "erased" at runtime, so a List<T> is the same thing as a plain List. In C#/.NET, however, List and List<T> are 'unrelated' types, which is what Piotr was talking about. .NET allows "overloading" types based on the number of generic parameters, so for example Tuple<T1> is a different type than Tuple<T1,T2> (the runtime names are Tuple`1 and Tuple`2). Since C# has no "default arguments" for generics or "tuple template parameters", it is trivial to allow different types that have the same name but a different number of generic parameters. In D, however, the situation is a bit more complicated.
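For comparison, C++ is closer to D here: it cannot declare two class templates named Tuple that differ only in parameter count. A common workaround, sketched below with hypothetical members, is a variadic primary template plus partial specializations selected by arity, which gives roughly the effect of .NET's Tuple`1 and Tuple`2.

```cpp
#include <cassert>
#include <cstddef>

// Primary template: declared but never defined, so unsupported arities
// fail to compile.
template <typename... Ts>
struct Tuple;

// One-parameter "overload".
template <typename A>
struct Tuple<A> {
    static constexpr std::size_t arity = 1;
    A item1;
};

// Two-parameter "overload".
template <typename A, typename B>
struct Tuple<A, B> {
    static constexpr std::size_t arity = 2;
    A item1;
    B item2;
};
```

The catch, which D shares, is that a variadic ("tuple") parameter makes the single name cover every arity, so the language cannot simply treat differently-sized names as unrelated types the way .NET does.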
Re: Getting started with D - Phobos documentation sucks
I think documentation is really important, and something has to be done about it. How can a newcomer get started with D when he doesn't have readable documentation of Phobos?

A couple of random things I'd like to see:
1. Improve index.html. It's the first thing new users are likely to see about Phobos and it appears to contain an overview of the modules, but in fact it only lists half the modules of Phobos and the description of most modules is too short to be useful. There should also be a getting-started guide that lists the most common data types and functions and which module contains them (to!T, Tuple, writeln, ...), and it should also discuss the 'built-in' types for completeness, like slices, hashes and strings, since in other languages these are standard library components.
2. To make the documentation easier to Google, put the keyword "D2" on every page of the Phobos documentation, e.g. the heading could change from "std.file" to "std.file (D2)". Nowadays when I search for something about "D Language", I often find a page about D1 instead of D2.

The "articles" should be reviewed too. For example the page on tuples http://dlang.org/tuple.html makes it sound like you're supposed to define your own Tuple type instead of using the one in std.typecons; in fact it suggests
    template Tuple(E...) { alias E Tuple; }
which is really a TypeTuple, isn't it?
Re: I have a feature request: "Named enum scope inference"
I have a feature request: "Named enum scope inference"
The idea is that whenever a named enum value is expected, you don't need to explicitly specify the scope of the enum value. This would reduce redundancy in typing, just like automatic type inference does. Examples:
    enum MyDirection { forward, reverse }
    struct MyIterator(MyDirection dir) { ... }
    int forward = 42; // Doesn't interfere with the next line...
    auto itr = MyIterator!forward(); // Infers MyDirection.forward

I like the spirit of this feature, but as Alex pointed out, ambiguity is possible (which could theoretically cause errors in existing code), and while I'm not familiar with how the compiler is implemented, my spidey-sense tells me that what you're asking for could be tricky to implement (in a language that already has a very large number of rules and features).

Plus, I don't like the fact that when you see something like "MyIterator!forward" by itself in code, there is no obvious clue that forward is an enum value and not a class name or a variable. So there is a sort of decrease in clarity of the entire language by increasing the total number of possible meanings that an identifier can have. So I think this feature would need a clearer syntax, something to indicate that the value is an enum value. I don't currently have a really good counterproposal, though.
Re: DIP19: Remove comma operator from D and provision better syntactic support for tuples
The built-in tuple is also quite useful when defining templates. In essence, we have two kinds of tuples: the built-in language tuple is the "unpacked" tuple while Phobos hosts the "packed" one. They each have their own use case and they can coexist peacefully. But the language itself needs to standardize on one or the other.

+1, and it should standardize on "packed" (non-expanded) tuples because "unpacked" ones have very unusual behavior, and because it's impractical to eliminate "packed" tuples but practical to eliminate "unpacked" ones. "Unpacked" tuples should only exist as an intermediate result (the result of .expand).

If the language made T... a packed tuple instead, then we could use the packed tuple everywhere and unpack it where necessary, and something like this could be used to make a packed tuple:
    T getThings(T...)(T.expand t) { return T(t); }
    T t1;
    T t2 = getThings!(T)(t1.expand);

"T.expand" naturally has the connotation "unpacked" to me, whereas what you really want to do is indicate that "t" is packed, right? Clearly, the syntax for a varargs template like this would have to change to indicate that T is non-expanded; unfortunately, I don't have a really compelling syntax to suggest.

P.S. If non-expanded tuples were the default, they should probably have a quicker syntax than "t.expand" to expand them. I suggest overloading unary * as in "*t"; this is known as the "explode" operator in boo.
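C++ happens to follow the "packed by default" model argued for above: a parameter pack only exists unpacked inside a template, is packed into a std::tuple to be stored or returned, and is expanded again explicitly at a call site. The sketch uses std::apply (C++17) as the rough counterpart of ".expand" or the proposed "*t"; pack, sum3 and sumExpanded are hypothetical names.

```cpp
#include <cassert>
#include <tuple>

// The unpacked pack "xs" only exists inside this function; to return it,
// it is packed into a std::tuple.
template <typename... Ts>
std::tuple<Ts...> pack(Ts... xs) {
    return std::tuple<Ts...>(xs...);
}

int sum3(int a, int b, int c) { return a + b + c; }

// Explicitly expanding a packed tuple into separate arguments (C++17),
// roughly what ".expand" or an "explode" operator would do.
template <typename Tuple>
int sumExpanded(const Tuple& t) {
    return std::apply(sum3, t);
}
```

Because expansion is always explicit, a copied tuple stays a single value; nothing like D's "auto b = [a, a]" silently flattening into an int[] can happen.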
Re: Neat: UFCS for integer dot operator suffix
I used this in a small unit library (partially accessible on github), to obtain code like:
    auto distance = 100.km;
    auto speed = 130.km/h; // division works, too.
    auto timeToDestination = (distance/speed).hour; // distance/speed gives seconds => transformed in hours.
It was a nice exercise in using UFCS and mixins to create your own unit library (not only SI, but any kind of unit library). And, you know what? I *never* used it after coding it. These examples are cute, they make for nice blog posts for F#, but the real-world usage is dubious to me (I know there were space-program crashes). I quite like the implicit message in units: use the type system to help you catch errors at compile-time. Add to that a nice syntax and a showcase for D's generative capabilities and it's quite nice. But, to my eyes, it's but a toy.

I wouldn't read too much into it. You're a library author, not (I assume) a scientific computing guy. So beyond playing with a few examples, your work on this library is done - you wouldn't be a client of it for the simple reason you don't intensively work with kilometers, speeds, dollars, and such. It's possible that a good and usable library of units could add value to a category of users.

IMO, you don't need to be a scientific computing guy to find unit checking useful, since almost any number conceptually has a unit on it. I would ask any programmer: how often do you accidentally use a measurement of 'bytes' where 'dwords' were expected, or use a variable as an array index when it was actually something totally different?

However, unit checking cannot be done satisfactorily in a library; it has two main problems when provided that way:
1. It's too bulky (too much syntax required, as units have to be spelled out constantly).
2. Values with traditionally-typed units don't interoperate with existing libraries, including very simple functions such as
    int abs(int x) { return x > 0 ? x : -x; }
    int square(int x) { return x*x; }
You can define an implicit conversion from e.g. 'Unit!"pixels"' to 'int', but then you'll need to manually cast it back, and the compiler can't check your cast to make sure it's correct.

IMO, solving these two problems requires a parallel type system to infer unit relationships automatically, either with direct language support, or a separate analysis tool that uses the compiler as a service (currently not possible with D).
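Both problems can be seen in a minimal library-style unit type, sketched here in C++ rather than D. Quantity, Pixels and the _px literal are hypothetical; the tag type makes mixing units a compile error, but calling an existing int function means unwrapping .value by hand, and the result comes back unit-less.

```cpp
#include <cassert>

// A value tagged with a unit at compile time.
template <typename Unit>
struct Quantity {
    int value;
};

// Only same-unit values can be added; e.g. Quantity<Pixels>{1} +
// Quantity<Bytes>{1} would not compile.
template <typename Unit>
Quantity<Unit> operator+(Quantity<Unit> a, Quantity<Unit> b) {
    return Quantity<Unit>{a.value + b.value};
}

struct Pixels {};  // unit tag
struct Bytes {};   // unit tag

// User-defined literal so that 3_px reads a bit like D's 100.km.
constexpr Quantity<Pixels> operator""_px(unsigned long long v) {
    return Quantity<Pixels>{static_cast<int>(v)};
}

// An existing, unit-unaware library function.
int square(int x) { return x * x; }
```

With w = 3_px + 4_px, square(w.value) compiles, but it returns a plain int: the fact that the result is in square pixels is lost, and re-wrapping it is an unchecked manual step, which is problem 2 above.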
Re: DIP19: Remove comma operator from D and provision better syntactic support for tuples
The analysis in there fails to construct a case even half strong that deprecating the comma operator could significantly help tuples.

That is because it does not base the discussion on the right limitations of built-in tuples:
    auto (a,b) = (1,"3");
    (auto a, string b) = (1, "3");

Agreed, this is the key thing missing from D. There is also no consideration in the DIP of what I consider one of D's most confusing "features": "pre-expanded tuples" or, in other words, type tuples. These beasts can be very confusing when first encountered, and they do not behave like any data type in any other language I know of:
    import std.stdio;
    import std.typecons; // Contains Tuple!(...), which reminds me,
                         // how do I know which module contains a given feature?
                         // http://dlang.org/phobos/index.html doesn't mention it.
    void call() { humm(1, 2); }
    void humm(T...)(T x) // x, a pre-expanded tuple
    {
        //auto c = [x.expand]; // ERROR, expand undefined
                               // (it's already expanded!)
        auto a = x;            // a is also pre-expanded
        auto b = [ a, a ];     // int[], not Tuple!(int,int)[]
        //int d = derr(x);     // ERROR, have to un-expand it
        writeln(a);            // "12"
        writeln(b);            // "[1, 2, 1, 2]"
    }
    int derr(Tuple!(int,int) a) { return a[0] + a[1]; }

I know you guys are all used to this behavior, but I'm telling you, pre-expanding is very weird. It would be nice if type tuples could somehow be unified with library tuples and behave like the latter.
Re: Review of Andrei's std.benchmark
- It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done. I suppose the obvious interpretation is that it only does one trial, but then we wouldn't be having this discussion about averages and minimums, right? Øivind says tests are run 1000 times... but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds; and allow a multiplier when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to only run half as many trials as usual). Also, it is not clear from the documentation what the single parameter to each benchmark is (define "iterations count").

I don't think it's a good idea because the "for 1000 ms" doesn't say anything except how good the clock resolution was on the system. I'm as strongly convinced we shouldn't print useless information as I am we should print useful information.

I am puzzled about what you think my suggestion meant. I am suggesting allowing the user to configure how long benchmarking takes. Some users might want to run their benchmark for an hour to get stable and reliable numbers; others don't want to wait and want to see results ASAP. Perhaps the *same* user will want to run benchmarks quickly while developing them and then do a "final run" with more trials once their benchmark suite is complete. Also, some individual benchmark functions will take microseconds to complete; others may take seconds to complete. All I'm suggesting are simple ways to avoid wasting users' time, without making std.benchmark overly complicated.
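The "_for1000ms plus a suite-wide multiplier" idea amounts to a time-budgeted loop. Here is a C++ sketch with a hypothetical runForAtLeast helper; it is not std.benchmark's API, just a way to show how a budget and multiplier would control how many trials run.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: run f repeatedly until at least budget * multiplier
// of wall-clock time has elapsed, and report how many trials ran. The
// function always runs at least once.
template <typename F>
std::size_t runForAtLeast(std::chrono::milliseconds budget, double multiplier, F f) {
    using Clock = std::chrono::steady_clock;
    const std::chrono::duration<double, std::milli> scaled(budget.count() * multiplier);
    const auto start = Clock::now();
    std::size_t trials = 0;
    do {
        f();
        ++trials;
    } while (Clock::now() - start < scaled);
    return trials;
}
```

A multiplier of 0.5 halves the time spent, so a quick pass while developing and a longer "final run" can use the same benchmark definitions; fast functions get many trials and slow ones few, without per-function tuning.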
Re: Review of Andrei's std.benchmark
Some random comments about std.benchmark based on its documentation:
- It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done.

Because all of those are irrelevant and confusing.

Huh? It's not nearly as confusing as reading the documentation and not having the faintest idea what it will do. The way the benchmarker works is somehow 'irrelevant'? The documentation doesn't even indicate that the functions are to be run more than once!!

I don't think that's a good idea.

I have never seen you make such vague arguments, Andrei.
Re: Review of Andrei's std.benchmark
After extensive tests with a variety of aggregate functions, I can say firmly that taking the minimum time is by far the best when it comes to assessing the speed of a function.

Like others, I must also disagree in principle. The minimum sounds like a useful metric for functions that (1) do the same amount of work in every test and (2) are microbenchmarks, i.e. they measure a small and simple task. If the benchmark being measured either (1) varies the amount of work each time (e.g. according to some approximation of real-world input, which obviously may vary)* or (2) measures a large system, then the average and standard deviation and even a histogram may be useful (or perhaps some indicator of whether the runtimes are consistent with a normal distribution or not). If the running time is long then the max might be useful (because things like task-switching overhead probably do not contribute that much to the total).

* I anticipate that you might respond "so, only test a single input per benchmark", but if I've got 1000 inputs that I want to try, I really don't want to write 1000 functions, nor do I want 1000 lines of output from the benchmark. An average, standard deviation, min and max may be all I need, and if I need more detail, then I might break it up into 10 groups of 100 inputs. In any case, the minimum runtime is not the desired output when the input varies.

It's a little surprising to hear "The purpose of std.benchmark is not to estimate real-world time. (That is the purpose of profiling)"... Firstly, of COURSE I would want to estimate real-world time with some of my benchmarks. For some benchmarks I just want to know which of two or three approaches is faster, or to get a coarse ball-park sense of performance, but for others I really want to know the wall-clock time used for realistic inputs. Secondly, what D profiler actually helps you answer the question "where does the time go in the real world?"? The D -profile switch creates an instrumented executable, which in my experience (admittedly not experience with DMD) severely distorts running times. I usually prefer sampling-based profiling, where the executable is left unchanged and a sampling program interrupts the program at random and grabs the call stack, to avoid the distortion effect of instrumentation. Of course, instrumentation is useful to find out what functions are called the most and whether call frequencies are in line with expectations, but I wouldn't trust the time measurements that much. As far as I know, D doesn't offer a sampling profiler, so one might indeed use a benchmarking library as a (poor) substitute. So I'd want to be able to set up some benchmarks that operate on realistic data, with perhaps different data in different runs, in order to learn about how the speed varies with different inputs (if it varies a lot then I might create more benchmarks to investigate which inputs are processed quickly, and which slowly).

Some random comments about std.benchmark based on its documentation:
- It is very strange that the documentation of printBenchmarks uses neither of the words "average" or "minimum", and doesn't say how many trials are done. I suppose the obvious interpretation is that it only does one trial, but then we wouldn't be having this discussion about averages and minimums, right? Øivind says tests are run 1000 times... but it needs to be configurable per-test (my idea: support a _x1000 suffix in function names, or _for1000ms to run the test for at least 1000 milliseconds; and allow a multiplier when running a group of benchmarks, e.g. a multiplier argument of 0.5 means to only run half as many trials as usual). Also, it is not clear from the documentation what the single parameter to each benchmark is (define "iterations count").
- The "benchmark_relative_" feature looks quite useful. I'm also happy to see benchmarkSuspend() and benchmarkResume(), though benchmarkSuspend() seems redundant in most cases: I'd like to just call one function, say, benchmarkStart() to indicate "setup complete, please start measuring time now."
- I'm glad that StopWatch can auto-start, but the documentation should be clearer: does reset() stop the timer or just reset the time to zero? Does stop() followed by start() start from zero or does it keep the time on the clock? I also think there should be a method that returns the value of peek() and restarts the timer at the same time (perhaps stop() and reset() should just return peek()?).
- After reading the documentation of comparingBenchmark and measureTime, I have almost no idea what they do.
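The summary statistics argued for above (min, max, average and standard deviation over varying inputs) are cheap to compute from per-trial timings. A C++ sketch with hypothetical names, assuming at least one sample and using the population standard deviation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct TimingSummary {
    double min, max, mean, stddev;
};

// Summarize per-trial timings (any time unit). Assumes a non-empty input
// and reports the population standard deviation.
TimingSummary summarize(const std::vector<double>& samples) {
    assert(!samples.empty());
    TimingSummary s{};
    s.min = *std::min_element(samples.begin(), samples.end());
    s.max = *std::max_element(samples.begin(), samples.end());
    s.mean = std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    double squares = 0.0;
    for (double x : samples) squares += (x - s.mean) * (x - s.mean);
    s.stddev = std::sqrt(squares / samples.size());
    return s;
}
```

For a microbenchmark on fixed input the min is often the most stable of these numbers; for 1000 varying inputs one summary line like this replaces 1000 lines of output.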
Re: Extending unittests [proposal] [Proof Of Concept]
However, what's truly insane IMHO is continuing to run a unittest block after it's already had a failure in it. Unless you have exceedingly simplistic unit tests, the failures after the first one mean pretty much _nothing_ and simply clutter the results.

I disagree. Not only are my unit tests independent (so of course the test runner should keep running tests after one fails), but often I do want to keep running after a failure. I like the Boost unit test library's approach, which has two types of "assert": BOOST_CHECK and BOOST_REQUIRE. After a BOOST_CHECK fails, the test keeps running, but BOOST_REQUIRE throws an exception to stop the test. When testing a series of inputs in a loop, it is useful (for debugging) to see the complete set of which ones succeed and which ones fail. For this feature (continuation) to be really useful, though, it needs to be able to output context information on failure (e.g. "during iteration 13 of input group B").
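To make the CHECK/REQUIRE distinction concrete, here is a small Python sketch. It is not Boost.Test itself; Checker, check, require, context and run_test are hypothetical names. check() records a failure and keeps going, require() records it and aborts the current test, and a stack of context labels is attached to every failure message.

```python
from contextlib import contextmanager


class RequireFailure(Exception):
    """Raised by require() to stop the current test immediately."""


class Checker:
    """Sketch of Boost-style CHECK/REQUIRE semantics with failure context."""

    def __init__(self):
        self.failures = []   # failure messages, in the order they occurred
        self._context = []   # stack of labels such as "iteration 13 of input group B"

    @contextmanager
    def context(self, label):
        self._context.append(label)
        try:
            yield
        finally:
            self._context.pop()

    def _record(self, message):
        where = ", ".join(self._context)
        full = f"{message} (during {where})" if where else message
        self.failures.append(full)
        return full

    def check(self, condition, message):
        """Record a failure but let the test keep running."""
        if not condition:
            self._record(message)
        return condition

    def require(self, condition, message):
        """Record a failure and abort the rest of this test."""
        if not condition:
            raise RequireFailure(self._record(message))


def run_test(test):
    """Run one test; a failed require() ends that test only, not the run."""
    checker = Checker()
    try:
        test(checker)
    except RequireFailure:
        pass
    return checker.failures
```

A test looping over inputs would wrap each iteration in `with c.context(f"iteration {i} of input group B"):` so every failed check reports where it happened, while the loop still visits all inputs.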
Re: SpanMode uses incorrect terminology (breadth)
Breadth-first (probably never required):

a/b
a/c
a/1.txt
a/2.txt
a/b/1.txt
a/b/2.txt
a/c/z
a/c/1.txt
a/c/z/1.txt

Defining property: the number of /'s increases monotonically. Note how the deeper you go, the more spread out the children become. It's ALL children, then ALL grandchildren, then ALL great-grandchildren, etc. I wouldn't bother implementing breadth-first. It's doubtful that anyone would want it, surely...?

Actually, I prefer breadth-first search when searching the file system. When I search an entire volume, the (depth-first) search inevitably gets stuck in a few giant, deep directories like the Mono source code or some other cave of source code, you know, something 12 directories deep with 100,000 files in it. A breadth-first search would be more likely to find the thing I'm looking for BEFORE going spelunking in these 12-deep caves.
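A breadth-first walk is only a few lines with a FIFO queue of directories. The Python sketch below (walk_breadth_first is a hypothetical helper, not a Phobos or standard-library function) yields every entry at depth 1, then depth 2, and so on. Entries within one directory are sorted by name only so the output is deterministic, and symlinked directories are not followed.

```python
import os
from collections import deque


def walk_breadth_first(root):
    """Yield '/'-separated paths relative to root in breadth-first order:
    all children, then all grandchildren, then all great-grandchildren..."""
    queue = deque([""])                       # directories still to be listed
    while queue:
        rel_dir = queue.popleft()             # FIFO: shallower directories first
        with os.scandir(os.path.join(root, rel_dir)) as it:
            entries = sorted(it, key=lambda e: e.name)
        for entry in entries:
            rel = f"{rel_dir}/{entry.name}" if rel_dir else entry.name
            yield rel
            if entry.is_dir(follow_symlinks=False):
                queue.append(rel)             # list its contents one level later
```

Because it is a generator, a search can stop as soon as it finds a match, before it ever descends into a 12-deep cave.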
Re: Would like to see ref and out required for function calls
I really think that optionally allowing ref and out at the call site is more damaging than beneficial. _Requiring_ it could be beneficial, since then you know that the arguments are being taken by ref, but if it's optional, it gives you a false sense of security and can be misleading.

It gives *who* a false sense of security? If it's optional, then I *know* that a lack of ref/out doesn't imply the parameter won't change. Only people who don't know the rules would have this false sense of security. I think it would be nice to have it required, but it's very bad to break everyone's code. It could only be reasonably enforced with a compiler switch--or, wait, come to think of it, a pragma would probably be a better way to introduce language changes like this:

module foo;
pragma(callSiteRef);
// and would it make sense to offer an alternative to -property too?
pragma(property);

Now you can tell whether a program uses ref/out religiously or not.
Re: Would like to see ref and out required for function calls
On Thursday, 13 September 2012 at 15:01:28 UTC, Andrei Alexandrescu wrote:
On 9/13/12 10:53 AM, David Piepgrass wrote:
Walter and I have discussed this for quite a while. We have recently decided to disallow, at least in SafeD, escaping the address of a ref parameter. In the beginning we'll be overly conservative by disallowing taking the address of a ref altogether. I'll write a DIP on that soon.

Err, wouldn't that break a lot of stuff, a lot of which is actually safe code?

void a(ref int x) { b(&x); }
void b(int* x) { if(x != null) (*x)++; }

Yes. Disallowing taking the address of a local is conservative and would disallow a number of valid programs. Arguably, such programs are in poor style anyway. A good program takes pointers only if it needs to keep them around; if all that's needed is to use the parameter transitorily or pass it down, ref is best.

Another common reason to use a pointer (instead of ref) is that the parameter is optional (nullable). If the parameter is ref, then the caller must go to the trouble of creating a variable. However, this could be solved with a feature like the following:

int* find(string searchString, out int index) { ... }
// _ means "don't care", assuming no variable "_" is defined
void caller() { find("foo", out _); }

In fact this is arguably better for 'out' variables, since the callee (find) no longer has to check whether 'index' is null before assigning it. However, this doesn't totally solve the problem for 'ref' parameters, since such parameters are both output and input parameters and the programmer may want 'null' to have some special meaning as an input.

Escaping the addresses of stack variables, not just ref parameters, is a general problem in "safe" D. Do you have any ideas about that? Btw, just a simple illustrative example:

int* unsafe1() { int x = 1; return unsafe2(&x); }
int* unsafe2(int* x) { return x; }
int unsafe3() { int y = 7; *unsafe1() = 8; return y; }
enum gaff = unsafe3(); // ICE, no line number given

Same thing.
By and large safe programs will need to make more use of the garbage collector than others. It's the way things work; stack allocation can be made safer if we add typed regions, but that's a very significant escalation of complication. There is no simple solution to this today. Same thing meaning that you'd propose disallowing taking the address of a stack variable in SafeD? (I guess this would include escaping 'this' within a struct.)
Re: Would like to see ref and out required for function calls
I don't think there would be problems with allowing ref/out optionally at the call site. The thing is, however, that in this matter reasonable people may disagree.

I'd be unable to identify any pattern in engineers choosing one preference over the other. Maybe C++ fans prefer pointers or implicit ref, C# fans prefer call-site ref?

Now that the subject has been broached, we do have good evidence of a pattern that generates significant and difficult bugs: escaping the address of a reference. In C++:

struct A {
    A(int& host) : host_(host) {}
private:
    int& host_;
};

In D:

class A { // or struct
    this(ref int host) { _host = &host; }
private:
    int* _host;
}

A solution we use for C++ is to require escaped addresses to be always passed as pointers or smart pointers.

Walter and I have discussed this for quite a while. We have recently decided to disallow, at least in SafeD, escaping the address of a ref parameter. In the beginning we'll be overly conservative by disallowing taking the address of a ref altogether. I'll write a DIP on that soon.

Err, wouldn't that break a lot of stuff, a lot of which is actually safe code?

void a(ref int x) { b(&x); }
void b(int* x) { if(x != null) (*x)++; }

Escaping the addresses of stack variables, not just ref parameters, is a general problem in "safe" D. Do you have any ideas about that?
Re: Would like to see ref and out required for function calls
void func (ref int[], int)

If ref/out were required at the call site, this destroys UFCS.

int[] array;
array.func(0); // error, ref not specified by caller

For UFCS, ref should be implied.

+1

Why? UFCS means uniform function call syntax. It is already understood that the thing left of '.' may be passed by reference:

struct Foo {
    int x = 0;
    void f() { x++; }
}
void obvious() {
    Foo foo;
    foo.f(); // foo (and hence foo.x) is passed to f() by reference
}

Perhaps your argument makes sense for classes, but not for structs. In any case the syntax (ref foo).f() would require extra work for Walter, so I would not propose it. What I might propose instead is that, if the user requests (via a command-line argument such as '-callSiteRef') that a warning be issued for arguments passed without 'ref' at the call site, then a situation like this should prompt a warning:

class Bar { int b; }
void changeBar(ref Bar b) { b = new Bar(); }
void warning() {
    Bar bar = new Bar();
    bar.b = 10;
    bar.changeBar(); // Warning: 'bar' is implicitly passed by reference. To eliminate this warning, use 'changeBar(ref bar)' instead or do not compile with '-callSiteRef'
}

Again, this problem only applies to classes, since it is already understood that a struct on the left of '.' is normally passed by reference (as 'this' in a member function).

Also for 'const ref' parameters, callsite ref should not be necessary.

The callee might escape a pointer to the argument. Which is 'non-obvious' as well when there is no callsite ref.

If you're referring to the fact that it's easy to have a D pointer to a stack variable outlive the variable... I don't think that this 'flaw' (I think of it as a flaw, others may think of it as a feature) is a good enough reason to say 'call site ref should be required for const ref parameters'.

for value types, it is arguably important.

This is not necessarily a valid conclusion. Popularity does not imply importance. I think 'ref' is a popular idea because people have used it in C# and liked it.
I didn't start putting 'IN OUT' and 'OUT' in my C++ code until C# taught me the value of documenting it at the call site.

Generally speaking, if a parameter being ref/out is surprising, there is something wrong with the design. (There are times it is non-obvious in otherwise good code, but this seems uncommon.)

I often want to 'scan' code to see what it does. Especially for debugging, I want to see where the side effects are QUICKLY. Guessing which parameters are 'ref' requires me to analyze the code in my head. Even if I wrote the code myself, it can be time-consuming. That's why I would prefer to explicitly mark possible side effects with 'ref'. (Of course, when a class reference is passed by value, the callee may still modify the class's members. But it is far easier to keep track of which classes are mutable than to keep track of which parameters of which functions are 'ref', because functions far outnumber classes.)

IMHO it is better left to the future D editor.

That's probably a long way off.
Re: Would like to see ref and out required for function calls
Actually the darndest thing is that C# has retired the syntax in 5.0 (it used to be required up until 4.0). Apparently users complained it was too unsightly.

Andrei

Wh-huh?? Reference please. I have sought out info about C# 5 multiple times and I never heard that.

Anyway, I don't mind if ref is not required, but it ticks me off that it is not *allowed*. Even in C++ I can use "OUT" and "IN OUT" at both the definition and call sites (I may as well, since Windows header files #define them already). The compiler doesn't verify it, but I find it useful to make the code self-documenting. Some have said "well, if the compiler doesn't enforce it then it's pointless; you won't be able to tell whether a call site without 'ref' is passed by ref". But no, it's not pointless, because (1) if you see a call site WITH 'ref' then clearly it is passed by reference, (2) I would use 'ref' consistently in my own code, so that when I look back at my code a year later, the absence of 'ref' is a clear indication that it is an input parameter, and (3) if the compiler offered the option to issue a warning when 'ref' is absent, statement (2) would be true 100% of the time in my code, instead of just 98%. Most of the code I look at is my own, so that's my primary motive for wanting 'ref'. Yes, if 'ref' were allowed, some people would not use it; so when looking at a new code base I'd have no guarantee that a parameter NOT marked ref is passed by value. But at least (1) still applies.
Re: handful and interval
if (a.among("struct", "class", "union")) { ... }
if (b.between(1, 100)) { ... }

Is between inclusive or not of the endpoints?

After quite a bit of thought, I think inclusive is the right way.

Then there's no way to specify an empty interval. I suppose with "between" that would not be relevant; perhaps b.between(1, 0) would always return false. However, I'd use different names: among=>isOneOf, between=>isInRange. I would also define another function, inRange, that ensures, rather than tests, that a value is in range:

string userInput = "-7";
int cleanInput = inRange(parse!int(userInput), 1, 100);
Re: handful and interval
However I'd use different names: among=>isOneOf, between=>isInRange. I forgot to state the reason, namely, I think boolean functions should be named so that you can tell they return bool, as "between" could easily be a function that places a value into a range rather than tests whether it is in range.
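A Python sketch of the three proposed functions, assuming inclusive endpoints. is_one_of, is_in_range and in_range are illustrative stand-ins for isOneOf, isInRange and inRange, not existing library functions. With inclusive bounds an inverted pair like (1, 0) naturally describes an empty range, so the test simply returns False for it, while the clamping function has no sensible answer for an empty range and raises instead.

```python
def is_one_of(value, *candidates):
    """Boolean test: does value equal any candidate? (the proposed isOneOf)"""
    return value in candidates


def is_in_range(value, lo, hi):
    """Boolean test with inclusive endpoints: lo <= value <= hi.
    An inverted range such as (1, 0) contains nothing, so this returns False."""
    return lo <= value <= hi


def in_range(value, lo, hi):
    """Ensure rather than test: clamp value into [lo, hi]."""
    if lo > hi:
        raise ValueError(f"empty range [{lo}, {hi}]")
    return min(max(value, lo), hi)
```

Mirroring the D snippet, in_range(int("-7"), 1, 100) yields 1: the user's input is forced into range instead of being rejected.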
Re: Consistency, Templates, Constructors, and D3
And a postblit would end up being...? The extra 'this' makes it look like an obvious typo or a minor headache.

this this(this){} //postblit?

This is not an appropriate syntax, not just because it looks silly, but because a postblit constructor is not really a constructor; it's a post-processing function that is called after an already-constructed value is copied. So I don't think there's any fundamental need for postblit constructors to look like normal constructors.

I'm sure this case has an easy solution. How about:

struct Foo {
    this new() { ... } // constructor
    this() { ... } // postblit
}

But now you're breaking consistency by not including a return type. Maybe 'this this()', but that looks like a mistake or typo.

I don't see how "this this()" is any worse than "this(this)"; IMO neither name really expresses the idea "function that is called on a struct after its value is copied". But a postblit constructor doesn't work like normal constructors, so keeping the "this(this)" syntax makes sense to me even though it is not consistent with normal constructors. "this()" has the virtue of simplicity, but it's even less googlable than "this(this)".

And for overload distinction (new vs load), which is an issue beyond Memory Pools and effects and even larger codebase. There needs to be a consistent way to distinguish (by name) a constructor that loads from a file, and one that creates the object "manually".

Isn't that more an API issue?

Sorry, I don't follow. If we take your approach and suggestion, which one should the compiler assume?

Something globalSomething;
class Something {
    this defaultConstructor();
    this duplicate(); //or clone
    this copyGlobalSomething();
    this constructorWithDefault(int x = 100);
}

By signature alone... Which one? They are all legal, they are uniquely named, and they are all equal candidates. Order of functions is irrelevant.

It could work identically to how D functions today.
A 'new()' constructor would be part of the root Object that classes are derived from, and structs would have an implicit 'new()' constructor.

But new wouldn't be a constructor then, would it? It would still be based on allocating memory that's optionally different. Construction and allocation are two different steps, and for it to go seamlessly from one to the other means defaulting to a set default constructor. Let's assume...

class Object {
    this new() {
        //allocate
        return defaultConstructor();
    }
    this defaultConstructor() {}
}

Now in order to make a constructor (and then destructor) you can either:
A) overload or use 'defaultConstructor', which would be publicly known
B) overload new to do allocation the same way and call a different constructor, and specifically add a destructor to make sure it follows the same lines
C) overload new to call the default allocator and then call a different constructor

Now, assuming you can make a different constructor by name, you then have to be able to specify a destructor the same way for consistency.

class CustomType {
    this MyAwesomeConstuctor();
    void MyAwesomeDestructor();
}

Same problem: how do you tell it ahead of time without completely rewriting the rules? Leaving it as 'this' and '~this' is simple to remember and work with, and factory functions should be used to do the bulk of the work when you don't want the basic/bare minimum.

Sorry, I don't understand what you're getting at. I suspect that you're interpreting his proposal in a completely different way than I am, and then trying to expose the flaws in your interpretation of the proposal, and then I can't follow it because my interpretation doesn't have those flaws :)
Re: Consistency, Templates, Constructors, and D3
On Monday, 27 August 2012 at 20:22:47 UTC, Era Scarecrow wrote:
On Monday, 27 August 2012 at 14:53:57 UTC, F i L wrote:
in C#, you use 'new Type()' for both classes and structs, and it works fine. In fact, it has some benefit with generic programming. Plus, it's impossible to completely get away from having to understand the type, even in C++/D today, because we can always make factory functions:

I'm sure in C# that all structs and classes are heap allocated (It takes after C++ very likely) that's the simplest way to do it. You can do that in C++ as well, but other than having to declare it a pointer first. In C++ they made structs 'classes that are public by default' by its definition I believe. Considering how C++ is set up that makes perfect sense.

You're mistaken, as FiL pointed out. "new" is simply not a heap allocation operator in C#; it is a creation operator. Structs in C# are allocated on the stack or embedded in another object (on the stack or on the heap). "new X()" creates a new value of type X, which could be a struct on the stack or a class on the heap. I like the way C# works in this regard because the way X is allocated is an implementation detail that is hidden from clients. If the type X is immutable, then I can freely change it from struct to class or vice versa without affecting clients that use X. (Mind you, if X is mutable, the difference is visible to clients, since for a struct x1 = x2 copies X itself, not a reference to X.) Plus, as mentioned, generic code can use "new T()" (given a 'where T : new()' constraint) without caring what kind of type T is.
Re: Consistency, Templates, Constructors, and D3
Interestingly, the discussion so far has been all about syntax, not any significant new features. I'm thinking ... coercion of a class to any compatible interface (as in Go)?

We already have:

import std.range;
auto range = ...;
auto obj = inputRangeObject(range);
alias ElementType!(typeof(range)) E;
InputRange!E iface = obj;
writeln(iface.front);

So maybe we can do:

auto implementObject(Interface, T)(T t){...}
auto obj = implementObject!(InputRange!E)(range);

Well, my D-fu is too weak to tell whether it's doable. When it comes to ranges, the standard library already knows what it's looking for, so I expect the wrapping to be straightforward. Even if a template-based solution could work at compile time, run time (when you want to cast some unknown object to a known interface) may be a different story.

I am sometimes amazed by the things the Boost people come up with that support C++03, things that I was "sure" C++03 couldn't do, such as lambda/inner functions (see "Boost Lambda Library", "Boost.LocalFunction" and "Phoenix"), scope(exit) (BOOST_SCOPE_EXIT), and a "typeof" operator (Boost.Typeof). If there were 1/4 as many D programmers as C++ ones, I might be amazed on a regular basis.

Also, it might be nice to have 'canImplement' for template constraints:

auto foo(T)(T v) if (canImplement!(T, SomeInterface)){...}

or 'couldImplement', assuming T doesn't officially declare that it implements the interface...
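The run-time half of the idea can be sketched in Python with abc, purely as an analogy: InputRange, can_implement and implement_object here are hypothetical and unrelated to std.range. can_implement checks structurally that a type has every abstract method of an interface, and implement_object generates a subclass of the interface that forwards each method to an object that never declared it.

```python
from abc import ABC, abstractmethod


class InputRange(ABC):
    """Hypothetical interface, loosely modeled on an input range."""

    @abstractmethod
    def empty(self): ...

    @abstractmethod
    def front(self): ...

    @abstractmethod
    def pop_front(self): ...


def can_implement(cls, interface):
    """Structural test: does cls provide every abstract method of interface?"""
    return all(callable(getattr(cls, name, None))
               for name in interface.__abstractmethods__)


def implement_object(interface, obj):
    """Return an instance of interface whose methods all forward to obj."""
    if not can_implement(type(obj), interface):
        raise TypeError(f"{type(obj).__name__} cannot implement {interface.__name__}")

    def forwarder(name):
        def forward(self, *args, **kwargs):
            return getattr(self._target, name)(*args, **kwargs)
        return forward

    methods = {name: forwarder(name) for name in interface.__abstractmethods__}
    # Build the adapter with the interface's own metaclass (ABCMeta here).
    adapter_cls = type(interface)(f"{interface.__name__}Adapter", (interface,), methods)
    adapter = adapter_cls()
    adapter._target = obj
    return adapter
```

The adapter is a genuine InputRange instance, so code typed against the interface can accept it even though the wrapped class never mentions InputRange.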
Re: Consistency, Templates, Constructors, and D3
I'm inclined to think that constructors should use "init", in keeping with tradition.

Wow, what the hell am I saying. Scratch that sentence; I often wish I could edit stuff after posting.
Re: Consistency, Templates, Constructors, and D3
I've had a couple of ideas recently about the importance of consistency in a language design, and how a few languages I highly respect (D, C#, and Nimrod) approach these issues. This post is mostly me wanting to reach out to a community that enjoys discussing such issues, in an effort to correct any misconceptions I might hold, and to spread potentially good ideas to the community in hopes that my favorite language will benefit from our discussion.

The points you raise are good and I generally like your ideas, although it feels a little early to talk about D3 when D2 is still far from a comprehensive solution. It's amazing that bug 1528 is still open, for example: http://stackoverflow.com/questions/10970143/wheres-the-conflict-here

Regarding your idea for merging compile-time and run-time arguments together, it sounds good at first, but I wonder if it would be difficult to handle in the parser, because at the call site the parser does not know whether a particular argument should be a type or an expression. Still, no insurmountable difficulties come to mind.

I certainly like the idea of introducing a more regular syntax for object construction (as I have proposed before, see http://d.puremagic.com/issues/show_bug.cgi?id=8381#c1), but you didn't say whether it would be allowed to declare a static method called "new". I'd be adamant that it should be allowed: the caller should not need to know whether they are calling a constructor or not. Also, I'm inclined to think that constructors should use "init", in keeping with tradition.

A couple of inconsistencies that come immediately to my mind about D2 are:

1. Function calling is foo!(x, y)(z) but declaration is foo(x, y)(int z). And the compiler doesn't always offer a good error message. I'm seeing "function declaration without return type. (Note that constructors are always named 'this')" and "no identifier for declarator myFunction!(Range)(Range r)".
2. Ref parameters are declared as (ref int x) but are not allowed to be called as (ref x). Then again, maybe it's not a real inconsistency, but I'm annoyed: it prevents my code from self-documenting properly.

Obviously, D is easy compared to C++, but no language should be judged by such a low standard of learnability. So I am also bothered by various things about D that feel unintuitive:

1. Enums. Since most enums are just a single value, they are named incorrectly.
2. immutable int[] func()... does not return an immutable array of int[]?
3. 0..10 in a "foreach" loop is not a range. It took me a while to find the equivalent range function, whose name is quite baffling: "iota(10)".
4. Eponymous templates aren't distinct enough. Their syntax is the same as a normal template, except that the outer and inner members just happen to have the same name. This confused me the other day when I was trying to understand some code by Nick, which called a method inside an eponymous template via another magic syntax, UFCS. (I like UFCS, but I might be a little happier if free functions had to request participation in it.)
5. The meaning is non-obvious when using "static import" and advanced imports like "import a = b : c, d" or "import a : b = c, d = e" or "import a = b : c = d".
6. The syntax of is(...)! It looks like a function or operator with an expression inside, when in fact the whole thing is one big operator. It's especially not obvious that "is(typeof(foo + bar))" means "test whether foo+bar is a valid and meaningful expression".

Making matters worse, the language itself and most of its constructs are non-Googlable. For example, if you don't remember how to declare the forwarding operator (alias this), what do you search for? If you see "alias _suchAndSuch this" and don't know what it means, what do you search for? (One might not think of removing the middle word and searching for that.) I even have trouble finding stuff in the TDPL e-book.
The place where templates are discussed is odd: section 7.5 in chapter 7, "user-defined types", even though the template statement doesn't actually define a type. I know, I should just read the book again... say, where's the second edition? I got so disappointed when I reached the end of chapter 13 and it was followed by an index. No UFCS or traits or ranges mentioned in there anywhere... compile-time function evaluation is mentioned, but the actual acronym CTFE is not. I also hope something will be changed about contracts. I am unlikely to ever use them if there's no option to keep SOME of them in release builds. (I need them to work at all boundaries between different parties' code, e.g. official API boundaries, and it is preferable to keep them in all cases where they don't hurt performance; finally, we should consider that the class that contains the contracts may not know its own role in the program, so it may not know whether assert or enforce is best.) Plus, the syntax is too verbose. Instead of in {
Re: Fragile ABI
I think the only reason we still use COM today is that, sadly, there is no other OO standard interoperable with all languages. C++ vtables are the closest competitor; I guess their fatal flaw is that there is no standard for memory management across C++ DLLs.

Even .NET, with its goal of supporting multiple languages, has the CLS as the common set of datatypes and OO concepts to support across .NET languages. Given that OO has so many possible implementations, it is hard to implement an ABI that works across multiple languages.

Sure, but .NET apps are not limited to the CLS. Two different .NET languages can easily interoperate outside the rules of the CLS (as long as it is still within the rules of .NET), whereas operating beyond the limits of COM is much harder. Besides that, the CLS itself is far more expansive than COM, allowing function overloading, inheritance, constructor arguments, etc. It's unfortunate that .NET has limitations that make it hard for languages with novel features, like D, to fit in. (D could target .NET, of course, but there would be a significant cost in terms of performance, interoperability with other .NET code, and/or limitations on what D code can do.)

Let's see how the improved COM (WinRT) turns out.

Sadly, WinRT is again intended to be Windows-only, so developers like me who hate lock-in will avoid it in favor of .NET (hi Mono!) and yucky old C.
Re: Fragile ABI
On Monday, 20 August 2012 at 18:37:00 UTC, R Grocott wrote:
On Monday, 20 August 2012 at 15:26:48 UTC, Kagamin wrote:
What you ask for sounds quite similar to COM composition with delegation.

Would anybody mind linking to resources which describe COM composition with delegation? It's been suggested twice in this thread as an alternative way to develop a non-fragile API, but anything related to COM is almost invisible to search engines (even more so than D itself).

There's nothing novel about COM except aggregation, and aggregation is just an implementation detail where a class pretends that it implements an interface but the calls to that interface go to another object. Conceptually it's like "alias this", except that a dynamic cast (i.e. QueryInterface) is required to reach the second object: http://msdn.microsoft.com/en-us/library/ms686558(v=vs.85)

For the most part COM sucks really bad: it is a very ordinary object-oriented ABI, but without numerous features that we otherwise take for granted:

- In COM, you can't define static methods
- In COM, you can't overload functions
- In COM, constructors can't have arguments
- In COM, there are no fields, only properties
- In COM, class inheritance is not allowed (an interface IB can inherit from IA, but if you implement a class A that implements IA, you can't write a class B that derives from A and implements IB. In C++/ATL a template-based workaround is possible if A and B are in the same DLL.)

Moreover, COM ABIs are fragile, in that there is almost zero support for adding or removing methods without either breaking everything or creating a new, independent, incompatible version (the only exception: you can safely add a method at the end of an interface, if you can be certain that no other interface inherits from it). Finally, it's Windows-only (although it has been reimplemented on Linux, e.g. for WINE) and modules must be registered in the Windows Registry.
I think the only reason we still use COM today is that, sadly, there is no other OO standard interoperable with all languages. C++ vtables are the closest competitor; I guess their fatal flaw is that there is no standard for memory management across C++ DLLs.
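A toy Python model of aggregation, ignoring reference counting and using strings as interface IDs (all names are hypothetical, not real COM APIs). The outer object implements 'IDocument' itself and answers queries for 'ISpell' by handing out the inner object; the inner object's own query_interface delegates back to the outer "controlling unknown", so a client can move between the two interfaces in either direction as if they belonged to one object.

```python
class Inner:
    """Aggregated object that really implements 'ISpell'."""

    def __init__(self, outer):
        self._outer = outer                  # the controlling unknown

    def nondelegating_query(self, iid):
        """Used only by the outer object: answers for Inner's own interfaces."""
        return self if iid == "ISpell" else None

    def query_interface(self, iid):
        """What a client reaches through ISpell: always ask the outer object."""
        return self._outer.query_interface(iid)

    def check_spelling(self, word):
        return word.isalpha()


class Outer:
    """Aggregating object: implements 'IDocument', exposes Inner's 'ISpell'."""

    def __init__(self):
        self._inner = Inner(self)

    def query_interface(self, iid):
        if iid == "IDocument":
            return self
        return self._inner.nondelegating_query(iid)   # None if nobody has it

    def title(self):
        return "untitled"
```

As in the "alias this" comparison above, the client never sees Inner as a separate object; it only ever gets there through a query_interface call.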
Re: Example of Rust code
I'd say we're doing all right.

Are you serious?

Yes. What's wrong with my D version? It's short and to the point, works, and produces optimal code.

Your version is basically a very long-winded way to say "auto x = 5 - (3 + 1);", so it really has nothing to do with the example. The point of the example was to represent a simple AST and store it on the stack, not to represent + and - operators as plus() and minus() functions.

(I must say, though, that while ADTs are useful for simple ASTs, I am not convinced that they scale to big and complex ASTs, let alone extensible ASTs, which I care about more. Nevertheless, ADTs are at least useful for rapid prototyping, and pattern matching is really nice too. I'm sure somebody could at least write a D mixin for ADTs, if not pattern matching.)

1. If you write FORTRAN code in D, it will not work as well as writing FORTRAN in FORTRAN.
2. If you write C code in D, it will not work as well as writing C in C.

Really? And here I genuinely thought D was good enough for all the things C and FORTRAN are used for.

3. If you write Rust code in D, it will not work as well as writing Rust in Rust.

I hope someday to have a programming system whose features are not limited to whatever features the language designers saw fit to include: a language where users can add their own features, all the while maintaining "native efficiency" like D. That language would potentially allow Rust-like code, D-like code, Ruby-like code and even ugly C-like code.

I guess you don't want to be the one to kickstart that PL.

I've been planning to do it myself, but so far the task seems just too big for one person.
Re: Functional programming in D and some reflexion on the () optionality.
The problem isn't about following Haskell precisely or not (I think we shouldn't). The problem is wanting to have everything, and ending up with nothing. Let's take Haskell as an example. Functions are all pure, so it doesn't matter when a function gets executed, and as a result Haskell doesn't need an explicit function call syntax like () in D. Some people find that great and want it to be the case in D, so D drops () usage. Now, as D doesn't enforce purity, when a function gets executed is important. As a result, complicated schemes are implemented to know when the function gets executed and when it doesn't (you'll notice *4* families of schemes for that in D). As a result, the design is overly complex and defined nowhere, just to have that Haskell feature, which works well in Haskell because of other properties of the language that D doesn't have.

What are the 4 "families of schemes to know when the function gets executed"?
Re: Functional programming in D and some reflexion on the () optionality.
class A { void B() {} }
auto a = new A().B();
// ^ semicolon expected following auto declaration, not '.'

Obviously. No clue what this snippet is trying to do.

Well, I meant "int B() { return 0; }" of course. I think you're deliberately missing the point.
Re: Functional programming in D and some reflexion on the () optionality.
To me, the first big failure of D to implement functional style is to not have first-class functions. You get a function using the & operator. But does it really make sense? See code below:

void foo(){}
void bar(void function() buzz) {}
void main() { bar(foo); } // This will execute foo, and so fail. Functions are not first-class objects.

void main() {
    auto bar = &foo;
    foo(); // Do something.
    bar(); // Do the same thing.
    auto buzz = &bar;
    (*buzz)(); // Do the same thing.
}

Functions don't behave the same way if they are variables or declared in the source code. Worse, foo was a function call before; now it isn't anymore. foo, as an expression, has a different meaning depending on what is done with it. It would become very confusing if foo returned a reference, so that it is an lvalue and & is a valid operation on the function call. As D doesn't enforce purity like functional programming does, it can't be up to the compiler to decide when the function gets executed.

Then comes UFCS. UFCS allows for function calls with parameters. It is still inconsistent:

void foo(T)(T t) {}
a.foo; // foo is called with a as argument.
&a.foo; // error : not an lvalue

Now imagine that foo is a member function of a: &a.foo becomes a delegate, while a.foo is still a function call. This is still quite inconsistent. Implementing all this is almost impossible when you add @property into the already messy situation.

Additionally, the current implementation fails to provide the basics of functional programming, and breaks several abstractions provided by other language features. C++ has proven that a bad combination of good language features leads to serious problems. This needs to be formalized in some way, not based on dmd's implementation. Inevitably, the process will lead to code breakage (adding or removing some ()/&). Reading the @property thread, it seems that most people want to keep dmd's current behavior, because code relies on it.
This makes sense, but if dmd's implementation is the only reference, other compilers can only reverse-engineer dmd's behavior and are guaranteed to lag behind. Considering this, I seriously wonder whether it makes sense to even try to follow dmd's behavior, or whether a new D compiler should just do whatever seems right; either way, the result will be a community split or no alternative compiler for D. I also have some proposals to fix things, including some that would keep a.map!(...).array() available. But inevitably, some other construct will break. At this point, what matters isn't really which solution is adopted, but whether we still want to depend on dmd's implementation to define D's features.

I'm not sure I understand your point perfectly, but I definitely feel that the way D handles optional parens is awful. The other day I noticed that the following is a syntax error (DMD 2.059):

class A { void B() {} }
auto a = new A().B();
//              ^ semicolon expected following auto declaration, not '.'

Even without silly errors like this, optional parentheses create ambiguities, and ambiguities are bad. Maybe there is a sane way for parentheses to be optional, but the way I've seen D behave is *bizarre*. The compiler should *expect* parentheses, and only assume that they are missing if that's the only way to compile without an immediate error. So, for example:

- if foo is a non-@property function that returns another function, foo() must invoke foo itself and never the function that foo returns.
- if I say "&foo" where foo is a non-@property function, it should always take the address of the function, never the address of the return value.
- the rules shouldn't change if you replace "foo" with a complex expression like "x.y[z]" or "new Module.ClassName".
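For comparison, C++ already behaves the way the rules proposed in the list above describe. The sketch below is C++, not D, and the names `foo`, `bar`, `getFoo` and `demo` are purely illustrative: naming a function without parentheses never calls it (it decays to a function pointer), `&foo` always means the function's address, and calling a function that returns a function takes a second pair of parentheses.

```cpp
#include <cassert>

// Counts how many times foo() has actually run.
int fooCalls = 0;

void foo() { ++fooCalls; }

// bar() takes a function pointer. Naming foo without parentheses
// passes the function itself; it does not call it.
void bar(void (*buzz)()) { buzz(); }

// getFoo() returns another function (a pointer to foo).
void (*getFoo())() { return &foo; }

// Walks through the cases from the list above and returns how often foo ran.
int demo() {
    fooCalls = 0;
    bar(foo);               // foo decays to a pointer; bar calls it once
    bar(&foo);              // &foo always means "address of foo"
    void (*p)() = getFoo(); // getFoo() calls getFoo itself, never foo
    assert(fooCalls == 2);  // getFoo() did not invoke foo
    p();                    // only now is the returned function called
    getFoo()();             // call getFoo, then call its result
    return fooCalls;
}
```

The only way to reach the returned function is to call the result explicitly, so `getFoo()` and `getFoo()()` can never be confused.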
Re: D language and .NET platform
On Sunday, 29 July 2012 at 16:32:10 UTC, Alex Rønne Petersen wrote: On 29-07-2012 17:36, bearophile wrote: .NET is too limited to represent the language, Can you tell us why? Array slices. The .NET type system has no way to represent them because it's designed for precise GC, and array slices allow interior pointers in the heap (as opposed to the stack when passing a field of an object by reference to a function, or whatever). D is theoretically designed for precise GC, too. But in .NET you can only have a reference to an array as a whole, so a slice must be represented as an array, an offset and a length. The real problem I see is that in D you can have a slice that does *not* refer to an array on the GC heap, such as a slice on a non-GC heap, or on the stack (in fact, D currently makes it easy to create pointers and slices to stack data that outlive the stack frame, which the 'safe' .NET type system inherently prevents). .NET allows one to break the type system using pointers (in functions marked 'unsafe'), so as far as I can tell D for .NET could theoretically do everything that native D does, but with some annoying caveats mainly related to garbage collection. For instance, in a slice, I believe you can't use the same memory word to refer either to an array on the GC heap OR to an array that is not on the GC heap (unless you want to pin all your arrays, and you really don't). IIUC, doing so can crash the garbage collector. I'm thinking that a .NET D slice would be implemented as a reference to a GC array and two integers (start and length). If the slice refers to a non-GC array, it would be stored in the same space, as a null reference, a pointer cast to IntPtr, and a length. However, this would make the code for accessing a slice rather clumsy and/or inefficient. .NET has other limitations too, but again I expect there would be workarounds.
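A minimal C++ sketch of the two-representation slice described above, assuming `std::shared_ptr<std::vector<int>>` as a stand-in for a .NET GC array reference and a plain `int*` in place of a pointer cast to IntPtr. The type `Slice` and its members are hypothetical; the point is that every element access has to branch on which representation is in use, which is where the clumsiness comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference to a GC-managed array.
using GcArray = std::shared_ptr<std::vector<int>>;

// Either (GC array, start, length) or (null GC reference, raw pointer, length).
struct Slice {
    GcArray array;          // non-null when the slice points into a GC array
    std::size_t start = 0;  // offset into `array` (GC case only)
    int* raw = nullptr;     // used only when `array` is null (non-GC memory)
    std::size_t length = 0;

    static Slice ofGc(GcArray a, std::size_t offset, std::size_t len) {
        return Slice{std::move(a), offset, nullptr, len};
    }
    static Slice ofRaw(int* p, std::size_t len) {
        return Slice{nullptr, 0, p, len};
    }

    // Every access must test which representation is in use.
    int& operator[](std::size_t i) {
        assert(i < length);
        return array ? (*array)[start + i] : raw[i];
    }

    // Sub-slicing keeps the representation and adjusts offset/length.
    Slice slice(std::size_t from, std::size_t to) const {
        assert(from <= to && to <= length);
        if (array) return ofGc(array, start + from, to - from);
        return ofRaw(raw + from, to - from);
    }
};
```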
Re: @trusted considered harmful
On Saturday, July 28, 2012 22:08:42 David Nadlinger wrote: On Saturday, 28 July 2012 at 02:33:54 UTC, Jonathan M Davis But unfortunately wrong – you call S.save in the @trusted block… ;) Yeah. I screwed that up. I was obviously in too much of a hurry when I wrote it. And actually, in this particular case, since the part that can't be @trusted is in the middle of an expression doing @system stuff, simply using an @trusted block wouldn't do the trick. Have you guys thought about the possibility that the language could simply not trust any calls that were resolved using a template argument? I'm a bit tired so I may be missing something, but it seems to me that (in a @trusted template) if the compiler uses an instantiated template parameter (e.g. actual type Foo standing in for template parameter T) to choose a function to call, the compiler should require that the function be @safe, based on the principle that a template cannot vouch for what it can't control. IOW, since a template can't predict what function actually gets called, the compiler should require whatever function gets called to be @safe. If the programmer actually does want his template function to be able to call _unpredictable_ @system functions, he should mark his template as @system instead of @trusted.
Re: Impressed
I'd say this argument on which is "better", yield or ranges, is an ill-posed problem. Yeah, since yielding is just a convenient way to implement an input range, asking which is better is like asking "Which is better, pick-up trucks or vehicles?" "yield" adds real, nontrivial value, and is not entirely implementable as a library. Walter and I saw some uses of it in C# at Lang.NEXT that were quite impressive. On the other hand, yield's charter is limited when compared to that of ranges. Yield goes with the very simple "go through everything once" functionality, which is essentially what input ranges do, only a tiny part of what ranges can do. "yield" adds real, nontrivial value, and is not entirely implementable as a library. Walter and I saw some uses of it in C# at Lang.NEXT that were quite impressive. Agreed. However, I have been looking at D's Fibers and I wonder if an optimized implementation of them could provide the same functionality reasonably well: https://www.semitwist.com/articles/article/view/combine-coroutines-and-input-ranges-for-dead-simple-d-iteration The only problem is performance (and perhaps memory usage, but there are ways to reduce that). Someone reported that a trivial fiber-based forward range had 26x the overhead of opApply for iteration (70s vs 2.7s for 1 billion iterations). I wonder if the fiber switching could be optimized? But I looked at core/thread.d and, unless I'm missing something, the fiber switch does not appear to do much work: it calls Thread.getThis() twice per switch (= 4 times per iteration), getStackTop() (= rt_stackTop) once, and a naked asm routine with 21 asm instructions. The entire yield() process contains no branches; call() additionally calls setThis() twice and checks if the Fiber threw an exception. What's the easiest way to time something in D? I'm curious whether Thread.getThis() (= TlsGetValue()) is the bottleneck.
Anyway, as far as I can tell, stack switching lets you do not only the same things as C# 2's "yield return" but also everything that C# 5's "async/await" can do, and more: http://qscribble.blogspot.ca/2012/07/asyncawait-vs-stack-switching.html That is, stack switching can accomplish tasks that async/await cannot, while I don't know of any cases of the reverse. Async is more limited because every function involved in an async task must be explicitly marked and transformed by the compiler, whereas stack switching works no matter what code is involved; even C code can be called on an asynchronous fiber task.
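To make the stack-switching idea concrete, here is a C++ sketch of a fiber-based generator using the POSIX ucontext functions (`getcontext`, `makecontext`, `swapcontext`), which glibc still provides even though POSIX marks them obsolescent. It is not D's core.thread.Fiber, and `Generator` is a hypothetical name. Note that `body()` is an ordinary loop that the compiler does not rewrite in any way, unlike an async method or a C# iterator; it simply keeps its state on its own stack between switches.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

// Yields i*i for i in [0, limit) by switching between two stacks.
class Generator {
public:
    explicit Generator(int limit) : limit_(limit), stack_(64 * 1024) {
        getcontext(&fiber_);
        fiber_.uc_stack.ss_sp = stack_.data();
        fiber_.uc_stack.ss_size = stack_.size();
        fiber_.uc_link = &caller_;  // resume the caller when entry() returns
        makecontext(&fiber_, &Generator::entry, 0);
    }
    Generator(const Generator&) = delete;             // contexts hold pointers
    Generator& operator=(const Generator&) = delete;  // into this object

    // Runs the fiber until it yields or finishes; false once it has finished.
    bool next(int& out) {
        if (done_) return false;
        starting_ = this;  // makecontext cannot portably pass `this`
        swapcontext(&caller_, &fiber_);
        if (done_) return false;
        out = value_;
        return true;
    }

private:
    static void entry() {
        Generator* self = starting_;
        self->body();
        self->done_ = true;  // falling off the end resumes uc_link
    }

    // Plain loop code: its locals live on the fiber's own stack.
    void body() {
        for (int i = 0; i < limit_; ++i) yield(i * i);
    }

    // Saves the fiber's registers and switches back to the caller.
    void yield(int v) {
        value_ = v;
        swapcontext(&fiber_, &caller_);
    }

    static Generator* starting_;
    int limit_;
    int value_ = 0;
    bool done_ = false;
    std::vector<char> stack_;
    ucontext_t fiber_{};
    ucontext_t caller_{};
};

Generator* Generator::starting_ = nullptr;
```

Each call to `next()` costs two context switches, which is the same basic price a D Fiber pays per iteration.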
Re: @trusted considered harmful
I don't see a flaw with 1. However, 2 doesn't sound right.

@trusted {
    // Do something dirty.
}

You aren't supposed to do dirty things in @trusted code. You're supposed to safely wrap a system function to be usable by a safe function. The system function is supposed to be short and to get its hands dirty. True, but since the proposal is that all functions should be either @safe or @system, a @trusted block is necessary in a @safe function in order to call @system functions. Perhaps you would suggest that a @trusted block should be able to _call_ @system code but not actually do anything unsafe directly? That sounds interesting, but it's not how @trusted currently works.
Re: Impressed
On Friday, 27 July 2012 at 01:56:33 UTC, Stuart wrote: On Friday, 27 July 2012 at 00:10:31 UTC, Brad Anderson wrote: D uses ranges instead of iterators. You can read more about them here: http://ddili.org/ders/d.en/ranges.html I find ranges to be a vast improvement over iterators personally (I use iterators extensively in C++ for my job and lament not having ranges regularly). On Friday, 27 July 2012 at 00:17:21 UTC, H. S. Teoh wrote: D has something far superior: ranges. http://www.informit.com/articles/printerfriendly.aspx?p=1407357&rll=1 Even better, they are completely implemented in the library. No unnecessary language bloat just to support them. I'm not very well up on ranges. I understand the general [1 ... 6] type of ranges, but I really don't see how custom range functions could be as useful as the Yield support in VB.NET. Yes, I think H. S. Teoh wrote that without knowing what C#/VB iterators actually are. .NET has a concept of "enumerators" which are basically equivalent to D's "input ranges". Both enumerators and input ranges are easier to use and safer than C++ iterators. Neither enumerators nor input ranges require any language support to use, but both C# and D have syntactic sugar for them in the form of the foreach statement. Both C# enumerators and D input ranges can be infinite. C#/VB "iterators", however, are additional syntactic sugar that transforms a function into a state machine that provides an enumerator (or "enumerable"). These are indeed very useful, and missing from D.
Here is an example of an iterator that I updated today:

public IEnumerable Overlays()
{
    foreach (var ps in _patterns) {
        yield return ps.RouteLine;
        yield return ps.PermShapes;
        if (ps.Selected)
            yield return ps.SelShapes;
    }
}

It does not work like opApply; the compiler creates a heap object that implements IEnumerable or IEnumerator (depending on the return type you declare). It is actually IEnumerator that works like a D input range, but foreach only accepts IEnumerable, which is a factory for IEnumerators. In D you could use opApply to do roughly the same thing, but in that case the caller cannot treat the opApply provider like an ordinary collection (e.g. IIUC, the caller cannot use map or filter on the results).
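For readers who haven't seen what the C# compiler does with `yield return`, here is a hand-written C++ approximation of the kind of state machine it generates for Overlays(). It is only a sketch: `PatternShapes` and `OverlaysEnumerator` are hypothetical names, the elements are plain strings, and the real generated class differs in detail. Each `yield return` becomes a numbered resume point, and the foreach loop's position is hoisted into a field.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the elements of _patterns.
struct PatternShapes {
    std::string routeLine, permShapes, selShapes;
    bool selected = false;
};

// MoveNext()/Current() mirror IEnumerator; state_ records which
// `yield return` to resume after, and index_ is the hoisted loop position.
class OverlaysEnumerator {
public:
    explicit OverlaysEnumerator(const std::vector<PatternShapes>& patterns)
        : patterns_(patterns) {}

    bool MoveNext() {
        while (index_ < patterns_.size()) {
            const PatternShapes& ps = patterns_[index_];
            switch (state_) {
            case 0: state_ = 1; current_ = &ps.routeLine; return true;
            case 1: state_ = 2; current_ = &ps.permShapes; return true;
            case 2:
                state_ = 3;
                if (ps.selected) { current_ = &ps.selShapes; return true; }
                // not selected: fall through to the end of the loop body
            default:
                state_ = 0;  // finished this element; move to the next one
                ++index_;
            }
        }
        return false;
    }

    const std::string& Current() const { return *current_; }

private:
    const std::vector<PatternShapes>& patterns_;
    std::size_t index_ = 0;
    int state_ = 0;
    const std::string* current_ = nullptr;
};
```

Writing this by hand is exactly the bookkeeping that `yield return` saves you from.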
Re: Can you do this in D?
3. Is there any way of executing code or programs during compile time? I've seen an example of CTFE (Compile Time Function Evaluation), although I'm unsure if this works for stuff like classes. However, I am considering more advanced execution (not constants) such as printing to a file during compiling for stuff like how long compiling a certain function/template takes. You can call any safe and pure D code at compile time (none of the code has to be marked pure explicitly, but it cannot access any static or global variables, call C code, access files, etc.) This is called CTFE (Compile-Time Function Evaluation). The "pure" limitation isn't a huge restriction, since you can still edit member variables (fields), and the compiler can memoize the results of CTFE... although I don't know whether it memoizes automatically, or whether you have to use a template to accomplish it. For example, if I write

enum twoPi = computePi() + computePi();

I don't know if the compiler computes PI once or twice. Does someone know? But if I define this template:

@property auto memoize(T, T code)() { return code; }
enum twoPi = memoize!(double, computePi()) + memoize!(double, computePi());

then computePi is surely called only once, and thus you can cache the result of any computation for repeated use. (I don't know how to get the type 'double' to be inferred automatically, though.) You can also, of course, use enums for this purpose:

enum pi = computePi(); // computed only once
enum twoPi = pi + pi;

I don't think you can run "programs" at compile time, but since you can call ordinary functions and use arbitrarily large structs, you can accomplish a lot. I believe the current released build, 2.059, can't use classes at compile time, but bearophile just implied that 2.060 can.

5. Why not support other operators like $, #, and @? This is more of a rhetorical... as I know the language doesn't need them, nor would I know if they would be binary/unary prefix/etc or the precedence... although they would be nice to have. Specifically I'd like $prefix to be stringification. Just to clarify, because other people are making it sound like D could do this... no, D does not offer user-defined operators, only overloading of predefined operators. User-defined ops would certainly be a nice feature that I would like to have, but the D developers have too much to do already. Personally I think the D syntax and rules feel too ad hoc and unintuitive right now; they should be simplified slightly, formalized more clearly, and debugged further before yet more features are piled on.
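Going back to the CTFE caching question, the closest C++ analogue of `enum pi = computePi();` is a `constexpr` variable: its initializer is evaluated once at compile time, and later uses just read the constant. This is a C++14 sketch, not D; `computePi` here is a hypothetical implementation using Machin's formula, pi = 16 atan(1/5) - 4 atan(1/239), with a truncated Taylor series for atan.

```cpp
#include <cassert>

// Taylor series atan(x) = x - x^3/3 + x^5/5 - ..., accurate for small |x|.
constexpr double atanSeries(double x) {
    double term = x, sum = 0.0;
    for (int n = 0; n < 40; ++n) {
        sum += (n % 2 == 0 ? term : -term) / (2 * n + 1);
        term *= x * x;  // next odd power of x
    }
    return sum;
}

// Machin's formula: pi = 16 atan(1/5) - 4 atan(1/239).
constexpr double computePi() {
    return 16 * atanSeries(1.0 / 5) - 4 * atanSeries(1.0 / 239);
}

// Like `enum pi = computePi();` in D: evaluated at compile time, once.
constexpr double pi = computePi();
constexpr double twoPi = pi + pi;

// Proves the value really is available at compile time.
static_assert(twoPi > 6.2831 && twoPi < 6.2832, "computed at compile time");
```

Whether a compiler caches repeated identical calls such as `computePi() + computePi()` is an implementation detail; naming the result, as in the enum version above, is what guarantees a single evaluation.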
Re: DCT use cases - draft
On Wednesday, 23 May 2012 at 15:36:59 UTC, Roman D. Boiko wrote: On Tuesday, 22 May 2012 at 18:33:38 UTC, Roman D. Boiko wrote: I'm reviewing text right now Posted an updated version, but it is still a draft: http://d-coding.com/2012/05/23/dct-use-cases-revised.html BTW, have you seen the video by Bret Victor entitled "Inventing on Principle"? This should be a use case for DCT: http://vimeo.com/36579366 The most important part for the average (nongraphical) developer is his demo of writing a binary search algorithm. It may be difficult to use an ordinary debugger to debug CTFE, template overload resolution and "static if" statements, but something like Bret's demo, or what the Light Table IDE is supposed to do... http://www.kickstarter.com/projects/ibdknox/light-table ...would be perfect for compile-time debugging, and not only that, it would also help people write their code in the first place, including (obviously) code intended for run-time. P.S. oh how nice it would be if we could convince anyone to pay us to develop these compiler tools... just minimum wage would be so nice.
Re: DCT use cases - draft
On Wednesday, 23 May 2012 at 15:36:59 UTC, Roman D. Boiko wrote: On Tuesday, 22 May 2012 at 18:33:38 UTC, Roman D. Boiko wrote: I'm reviewing text right now Posted an updated version, but it is still a draft: http://d-coding.com/2012/05/23/dct-use-cases-revised.html I think one of the key challenges will be incremental updates. You could perhaps afford to reparse entire source files on each keystroke, assuming DCT runs on a PC*, but you don't want to repeat the whole semantic analysis of several modules on every keystroke. (*although, in all seriousness, I hope someday to browse/write code in a smartphone/tablet IDE, without killing battery life) D in particular makes standard IDE features difficult if the code uses a lot of CTFE just to decide the meaning of the code, e.g. "static if" computes 1_000_000 digits of PI and decides whether to declare method "foo" or method "bar" based on whether the last digit is odd or even. Of course, code does not normally waste the compiler's time deliberately, but these sorts of things can easily crop up accidentally. So DCT could profile its own operation and report to the user which analyses and functions are taking the longest to run. Ideally, somebody would design an algorithm that, given a location where the syntax tree has changed, figures out what parts of the code are impacted by that change and only re-runs semantic analysis on the code whose meaning has potentially changed. But maybe that is just too hard. A simple approach would be to just re-analyze the whole damn program, but prioritize analysis so that whatever code the user is looking at is re-analyzed first. This could be enhanced by a simple-minded dependency tree, so that changing module X does not trigger reinterpretation of module Y if Y does not directly or indirectly use X at all. By using multiple threads to analyze, any long computations wouldn't prevent analysis of the "easy parts"; but several threads could get stuck waiting on the same thing.
For example, it would seem to me that if a module X contains a slow "static if" at module scope, ANY other module that imports X cannot resolve ANY unqualified function calls until that "static if" is done processing, because the contents of the "static if" MIGHT create new overloads that have to be considered*. So, when a thread gets stuck, it needs to be able to look for other work to do instead. In any case, since D is Turing-complete and CTFE may enter infinite loops (or just very long loops), an IDE will occasionally need to terminate threads and restart analysis, so the analysis threads must be killable, but hopefully it could be designed so that analysis doesn't have to restart from scratch. I guess immutable data structures will therefore be quite important in the design, which you seem to be aware of already.
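The "simple-minded dependency tree" idea can be sketched in a few lines of C++: invert the import graph, then walk it from the edited module to collect every module that imports it directly or indirectly. Everything outside that set can keep its previous analysis results. `ImportGraph` and `modulesToReanalyze` are hypothetical names, and this deliberately ignores the harder problem of finer-grained, per-declaration invalidation.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// imports[m] lists the modules that module m imports.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Returns the edited module plus every module that (transitively) imports it.
std::set<std::string> modulesToReanalyze(const ImportGraph& imports,
                                         const std::string& changed) {
    // Invert the edges: importedBy[x] = modules that import x.
    std::map<std::string, std::vector<std::string>> importedBy;
    for (const auto& entry : imports)
        for (const auto& dep : entry.second)
            importedBy[dep].push_back(entry.first);

    // Breadth-first walk; the visited set also handles circular imports.
    std::set<std::string> dirty{changed};
    std::queue<std::string> work;
    work.push(changed);
    while (!work.empty()) {
        std::string m = work.front();
        work.pop();
        auto it = importedBy.find(m);
        if (it == importedBy.end()) continue;
        for (const auto& user : it->second)
            if (dirty.insert(user).second) work.push(user);  // first visit only
    }
    return dirty;
}
```

The analyzer could then schedule the modules in this set, starting with whatever the user is looking at.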
Re: What is the compilation model of D?
I hope someone can give more details about this. TDPL chapter 11 "Scaling Up". That's where I was looking. As I said already, TDPL does not explain how compilation works, and in particular says nothing about the low-level semantic analysis, which is what I'm most curious about.
Re: What is the compilation model of D?
If you use rdmd to compile (instead of dmd), you *just* give it your *one* main source file (typically the one with your "main()" function). This file must be the *last* parameter passed to rdmd:

$rdmd --build-only (any other flags) main.d

Then, RDMD will figure out *all* of the source files needed (using the full compiler's frontend, so it never gets fooled into missing anything), and if any of them have been changed, it will automatically pass them *all* into DMD for you. This way, you don't have to manually keep track of all your files and pass them all into DMD yourself. Just give RDMD your main file and that's it, you're golden. I meant to ask, why would it recompile *all* of the source files if only one changed? It seems like it should only recompile the changed ones (but still compile them together as a unit.) Is it because of bugs (e.g. the template problem you mentioned)?
Re: What is the compilation model of D?
Thanks for the very good description, Nick! So if I understand correctly, if

1. I use an "auto" return value or suchlike in a module Y.d,
2. module X.d calls this function, and
3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps,

then the compiler will have to fully parse Y twice and fully analyze the Y function twice, although it generates object code for the function only once. Right? I wonder how smart it is about not analyzing things it does not need to analyze (e.g. when Y is a big module but X only calls one function from it, the compiler has to parse Y fully but it should avoid most of the semantic analysis.) What about templates? In C++ it is a problem that the compiler will instantiate templates repeatedly: say if I use vector in 20 source files, the compiler will generate and store 20 copies of vector (plus 20 copies of basic_string, too) in object files.

1. So in D, if I compile the 20 sources separately, does the same thing happen (the same collection template instantiated 20 times, with all 20 copies stored)?
2. If I compile the 20 sources all together, I guess the template would be instantiated just once, but then which .obj file does the instantiated template go in?

$rdmd --build-only (any other flags) main.d Then, RDMD will figure out *all* of the source files needed (using the full compiler's frontend, so it never gets fooled into missing anything), and if any of them have been changed, it will automatically pass them *all* into DMD for you. This way, you don't have to manually keep track of all your files and pass them all into DMD yourself. Just give RDMD your main file and that's it, you're golden. Side note: Another little trick with RDMD: Omit the --build-only and it will compile AND then run your program: Yes. (Unless you never import anything from phobos...I think.) But it's very, very fast to parse. Lightning-speed if you compare it to C++.
I don't even want to legitimize C++ compiler speed by comparing it to any other language ;) - Is there any concept of an incremental build? Yes, but there are a few "gotcha"s: 1. D compiles so damn fast that it's not nearly as much of an issue as it is with C++ (which is notoriously ultra-slow compared to...everything, hence the monumental importance of C++'s incremental builds). I figure that as CTFE is used more, especially when it is used to decide which template overloads are valid or how a mixin will behave, this will slow down the compiler more and more, thus making incremental builds more important. A typical example would be a compile-time parser-generator, or compiled regexes. Plus, I've heard some people complaining that the compiler uses over 1 GB RAM, and splitting up compilation into parts might help with that. BTW, I think I heard the compiler uses multithreading to speed up the build, is that right? It keeps diving deeper and deeper to find anything it can "start" with. Once it finds that, it'll just build everything back up in whatever order is necessary. I hope someone can give more details about this. - In light of the above (that the meaning of D code can be interdependent with other D code, plus the presence of mixins and all that), what are the limitations of __traits(allMembers...) and other compile-time reflection operations, and what kind of problems might a user expect to encounter? Shouldn't really be an issue. Such things won't get evaluated until the types/identifiers involved are *fully* analyzed (or at least to the extent that they need to be analyzed). So the results of things like __traits(allMembers...) should *never* change during compilation, or when changing the order of files or imports (unless there's some compiler bug). Any situation that *would* result in any such ambiguity will get flagged as an error in your code. Hmm. Well, I couldn't find an obvious example...
For example, you are right that this doesn't work, although the compiler annoyingly doesn't give a reason:

struct OhCrap {
    void a() {}
    // main.d(72): Error: error evaluating static if expression
    // (what error? syntax error? type error? c'mon...)
    static if ([ __traits(allMembers, OhCrap) ].length > 1) {
        auto b() { return 2; }
    }
    void c() {}
}

But won't this be a problem when it comes time to produce run-time reflection information? I mean, when module A asks to create run-time reflection information for all the functions and types in module A... er, I naively thought the information would be created as a set of types and functions *in module A*, which would then change the set of allMembers of A. But maybe it makes more sense to create that stuff in a different module (which A could then import??) Anyway, I can't even figure out how to enumerate the members of a module A; __traits(allMembers, A) causes "Error: import Y has no members". Aside: I first wrote the above code as follows
Re: What is the compilation model of D?
I find it shocking that anyone would consider 15 seconds slow to compile for a large program. Yes, D's builds are lightning fast in general, and 15 seconds is probably a longer build, but calling 15 seconds "slow-to-compile" just about blows my mind. 15 seconds for a large program is _fast_. If anyone complains about a large program taking 15 seconds to build, then they're just plain spoiled or naive. I've dealt with _Java_ apps which took in the realm of 10 minutes to compile, let alone C++ apps which take _hours_ to compile. 15 seconds is a godsend. I agree with Andrej: 15 seconds *is* slow for an edit-compile-run cycle, although it might be understandable when editing code that uses a lot of CTFE and static foreach and reinstantiates templates with a crapton of different arguments. I am neither spoiled nor naive in thinking it can be done in under 15 seconds. Fully rebuilding all my C# code takes less than 10 seconds (okay, not a big program, but several smaller programs). Plus, it isn't just build times that concern me. In C# I'm used to having an IDE that immediately understands what I have typed, giving me error messages and keeping metadata about the program up-to-date within 2 seconds. I can edit a class definition in file A and get code completion for it in file B, 2 seconds later. I don't expect the IDE can ever do that if the compiler can't do a debug build in a similar timeframe.
Re: Computed gotos on Reddit
OK I've taken your comments into account. Now I think I finally got it right:

mov ecx, [ebx]   ; ecx = code[pc]
inc ebx          ; pc++
jmp ecx          ; goto code[pc], as ecx is already a pointer

Nope, ecx is an opcode, not a pointer. You need another indirection. Man, this has been frustrating to read. I understood what Dmitry was talking about at least a dozen posts ago, and that's without actually reading the article about interpreters. (I did write a SNES emulator once, but it didn't use this cool technique. I did, however, have to write it in assembly because the C version was dog-slow, e.g. because I couldn't capture the overflow/negative/zero flags in C.)
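To illustrate the missing indirection: if the code stream holds opcodes, each dispatch must map the opcode through a table of handler addresses (in x86 terms, something like `jmp [table + ecx*4]` after loading the opcode, rather than `jmp ecx`). The C++ sketch below uses the "labels as values" extension supported by GCC and Clang (`&&label`, `goto *ptr`), which is not standard C++; the opcodes and `run` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcodes for a tiny stack machine.
enum Op : unsigned char { PUSH1, ADD, DUP, HALT };

int run(const std::vector<unsigned char>& code) {
    // Handler addresses, indexed by opcode (GCC/Clang extension).
    static void* const table[] = {&&op_push1, &&op_add, &&op_dup, &&op_halt};
    std::vector<int> stack;
    std::size_t pc = 0;

// The extra indirection: code[pc] is an opcode, table[opcode] is the target.
#define DISPATCH() goto *table[code[pc++]]
    DISPATCH();
op_push1:
    stack.push_back(1);
    DISPATCH();
op_add: {
    int b = stack.back();
    stack.pop_back();
    stack.back() += b;
    DISPATCH();
}
op_dup:
    stack.push_back(stack.back());
    DISPATCH();
op_halt:
    return stack.empty() ? 0 : stack.back();
}
#undef DISPATCH
```

If the code stream stored handler addresses instead of opcodes (direct threading), the loaded value would already be a pointer and a single `jmp ecx` would be enough.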
What is the compilation model of D?
(Maybe this should be in D.learn but it's a somewhat advanced topic) I would really like to understand how D compiles a program or library. I looked through TDPL and it doesn't seem to say anything about how compilation works.

- Does it compile all source files in a project at once?
- Does the compiler have to re-parse all Phobos templates (in modules used by the program) whenever it starts?
- Is there any concept of an incremental build?
- Obviously, one can set up circular dependencies in which the compile-time meaning of some code in module A depends on the meaning of some code in module B, which in turn depends on the meaning of some other code in module A. Sometimes the D compiler can resolve the ultimate meaning, other times it cannot. I was pleased that the compiler successfully understood this:

    // Y.d
    import X;
    struct StructY {
        int a = StructX().c;
        auto b() { return StructX().d(); }
    }

    // X.d
    import Y;
    struct StructX {
        int c = 3;
        auto d() {
            static if (StructY().a == 3 && StructY().a.sizeof == 3)
                return 3;
            else
                return "C";
        }
    }

  But what procedure does the compiler use to resolve the semantics of the code? Is there a specification anywhere? Does it have some limitations, such that there is code with an unambiguous meaning that a human could resolve but the compiler cannot?
- In light of the above (that the meaning of D code can be interdependent with other D code, plus the presence of mixins and all that), what are the limitations of __traits(allMembers...) and other compile-time reflection operations, and what kind of problems might a user expect to encounter?
Re: Just where has this language gone wrong?
I suspect that you have a C++ background. If this is not accurate, ignore the rest. But if it is accurate, my plea to you is: Learn other languages. C++ has next to no innovative language features (even C++11's take on lambdas is an abomination) and encourages defensive programming to the point where it's ridiculous (I mean, no default initialization of variables? In 2012?). Actually, C# has no default initialization* of local variables, and I love it. Instead, it is a compile-time error to read a variable if the compiler cannot guarantee that you have initialized it. IMO this is much better than D's "let's initialize doubles to NaN so that something fishy will happen at runtime if you forget to initialize it" :) * technically the compiler asks the runtime to bitwise 0-fill everything, but that's just an implementation detail required for the .NET verifier, and the optimizer can ignore the request to preinitialize.
Re: Need runtime reflection?
I want to imitate golang's interface in D, to study D's templates. I wrote some code: https://gist.github.com/3123593 Now we can write code like golang:
--
interface IFoo {
    void foo(int a, string b, float c);
}
struct Foo {
    void foo(int a, string b, float c) {
        writeln("Foo.foo: ", a, ", ", b, ", ", c);
    }
}
struct FooFoo {
    void foo(int a, string b, float c) {
        writeln("FooFoo.foo: ", a, ", ", b, ", ", c);
    }
}
GoInterface!(IFoo) f = new Foo;
f.foo(3, "abc", 2.2);
f = new FooFoo;
f.foo(5, "def", 7.7);
--
It is also very naive and does not support some features, like out/ref parameters, free functions *[1]* and so on. The biggest problem is that downcasts are not supported. In golang, we can write code like *[2]*:
--
var p IWriter = NewB(10)
p2, ok := p.(IReadWriter)
--
It seems [p.(IReadWriter)] dynamically builds a virtual table *[3]*: because the type of "p" is IWriter, which is *smaller* than IReadWriter, the cast operation must search methods and build the vtbl at run time. In D, GoInterface(T).opAssign!(V)(V v) can build rich runtime information for *V* if we need it. But if *V* is an interface or base class, the type information is not complete. So it seems like I need runtime reflection? And how can I do this in D? I did not find any useful information in the TypeInfo*.
--
[1] free functions support, e.g.
--
interface IFoo {
    void foo(int a, string b, float c);
}
void foo(int self, int a, string b, float c) {
    writefln("...");
}
GoInterface!(int) p = 1;
p.foo(4, "ccc", 6.6);
--
In theory no problem.

I, too, was enamored with Go interfaces and implemented them for .NET: http://www.codeproject.com/Articles/87991/Dynamic-interfaces-in-any-NET-language And I wasn't the only one; later, someone else published another library for .NET with the exact same goal.
This is definitely a feature I would want to see in D, preferably as a first-class feature, although sadly that would break any code that relies on ISomething being pointer-sized; Go uses fat pointers, and we use a thin-pointer implementation in .NET but it's inefficient (as every cast creates a heap-allocated wrapper, and double indirection is needed to reach the real method.) Anyway, they say it's possible to build runtime reflection in D but I've no idea how... has it never been done before? Of course, runtime template instantiation won't be possible. Therefore, run-time casting will have to be more limited than compile-time casting. Reflection to free functions would be really nice, but it might be less capable at run-time. Consider a class A in third-party module MA that you want to cast to interface I, but class A is missing a function F() from I. So in your module (module MB) you define a free function F(A) and now you can do the cast. I guess realistically this can only happen at compile-time, since a run-time cast would naturally only look in module MA, not MB, for functions it could use to perform the cast. Presumably, it also requires that MA requested a run-time reflection table to be built, and is it possible to build a reflection table for a module over which you have no control?
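A rough C++ sketch of the fat-pointer approach: an interface value is two words, a pointer to the object plus a pointer to a per-type method table, and the table is built at compile time for whatever type is assigned, with no inheritance required. This corresponds to the compile-time cast discussed above; a run-time cast such as p.(IReadWriter) would additionally need a lookup from type to method table, which is not shown. `Describer`, `Dog` and `Counter` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// A structural "interface": any type with describe(int) can be assigned.
class Describer {
public:
    template <typename T>
    Describer(T* obj) : obj_(obj), vtbl_(tableFor<T>()) {}

    std::string describe(int n) const { return vtbl_->describe(obj_, n); }

private:
    struct VTable {
        std::string (*describe)(void* self, int n);
    };

    // Thunk that restores the static type and calls the real method.
    template <typename T>
    static std::string callDescribe(void* self, int n) {
        return static_cast<T*>(self)->describe(n);
    }

    // One method table per concrete type, shared by all its interface values.
    template <typename T>
    static const VTable* tableFor() {
        static const VTable table = {&callDescribe<T>};
        return &table;
    }

    void* obj_;           // the object (first word of the fat pointer)
    const VTable* vtbl_;  // its method table (second word)
};

// Two unrelated types with no common base class.
struct Dog {
    std::string describe(int n) const { return "dog x" + std::to_string(n); }
};
struct Counter {
    int base = 10;
    std::string describe(int n) { return std::to_string(base + n); }
};
```

Unlike the thin-pointer .NET approach, assigning to the interface allocates nothing, and a call goes through one table lookup.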
Re: D front-end in D for D
On Saturday, 14 July 2012 at 10:48:56 UTC, Gor Gyolchanyan wrote: I just got an amazing thought. If we end up getting a D front-end in D, I think it would be possible to make the back-end in the same space as the code being compiled. This means, having the back-end as a library solution. This would automatically provide 100% compile-time code introspection. This is just a thought. Not a proposal or anything. What do you guys think? Compile-time code introspection is a job for the front-end. It's not very good to have code introspect itself at compile-time using a library... that would mean the library loads, parses and analyzes the very same code that the compiler has already loaded, parsed and analyzed. That sounds quite inefficient; besides, is it even legal to read files at compile time, and how would you know what paths to read? Having the front+back-end as a library would, of course, be handy for run-time code generation, which is definitely useful too. In C# there's a handy library called RunSharp for that, and I miss it in C++. It would, however, mean bundling a complete compiler with your application, so the solution feels very heavy (as compared to the .NET framework, where developers can take for granted that the user's machine already has the libraries.) I think, for multiple reasons including this use case, D should have a "lightweight subset" with a smaller standard library and a somewhat simpler language definition (that retains most of D's power), which could shrink the size of a program that uses runtime codegen. For simplicity, the D front-end written in D could use the same backend for CTFE as for its output. And one hopes that generated code could be garbage-collected. However, presumably you'd have to include LLVM which I believe is around 1MB for a bare-minimum build (with no optimization passes included.)
Re: just an idea (!! operator)
On Friday, 13 July 2012 at 09:49:22 UTC, monarch_dodra wrote: I don't know much about C#, but in C#, isn't EVERYTHING a reference type? Meaning it always makes sense to check if "myobject is null". No, C# has value types (enums, primitives, and user-defined structs) which are not nullable. The null coalescing operator (and null?.dot, if it existed) is still useful for nullable types of course; plus, any value type has a nullable counterpart (e.g. int? = nullable int).
Re: just an idea (!! operator)
Yeah, I've been planning to try and get this into D one day. Probably something like: (a ?: b) -> (auto __tmp = a, __tmp ? __tmp : b) gcc used to have that extension and they dropped it... But GCC can't control the C++ language spec. Naturally there is a reluctance to add nonstandard features. It's a successful feature in C#, however, and a lot of people (including me) have also been pestering the C# crew for "null dot" (for safely calling methods on object references that might be null.) I don't see why you would use ?: instead of ??, though.
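To make the ?: versus ?? distinction concrete, here is a rough Python sketch (the helper names `elvis` and `coalesce` are made up for illustration). Both evaluate the left operand exactly once, like the `__tmp` in the proposed lowering. They differ only in what counts as "missing": ?: falls back on any false value, while ?? falls back only on null. The right operand is passed as a function so that, as with the real operators, it is only evaluated when needed.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def elvis(a: T, b: Callable[[], T]) -> T:
    """Sketch of (a ?: b): keep a if it is truthy, otherwise evaluate b."""
    tmp = a  # evaluated once, like __tmp in (auto __tmp = a, __tmp ? __tmp : b)
    return tmp if tmp else b()


def coalesce(a: Optional[T], b: Callable[[], T]) -> T:
    """Sketch of C#'s (a ?? b): fall back to b only when a is null (None)."""
    tmp = a
    return tmp if tmp is not None else b()


# 0 is false but not null, so the two operators disagree on it.
print(elvis(0, lambda: 5))        # 5
print(coalesce(0, lambda: 5))     # 0
print(coalesce(None, lambda: 5))  # 5
```

In Python itself, `a or b` already behaves like ?:, which is part of why ?? is the more interesting operator for nullable types.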
Re: Counterproposal for extending static members and constructors
On Thursday, 12 July 2012 at 17:35:51 UTC, H. S. Teoh wrote: On Thu, Jul 12, 2012 at 06:25:03PM +0200, David Piepgrass wrote: I'm putting this in a separate thread from http://forum.dlang.org/thread/uufohvapbyceuaylo...@forum.dlang.org because my counterproposal brings up a new issue, which could be summarized as "Constructors Considered Harmful": http://d.puremagic.com/issues/show_bug.cgi?id=8381 So, if I understand your proposal correctly, you're essentially saying that the ctor of a given class C may return a derived class of C instead of just C itself? No, it can also return a different class with the same name. Isn't this just the "object factory" pattern in disguise? It is a unification of syntax, just as UFCS is a unification of syntax. It solves multiple problems, including information hiding, and extending classes written by other parties.
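As a loose illustration of a constructor that hands back something other than the class it names, Python's `__new__` happens to allow this. In the sketch below (all class names are hypothetical), callers write `Shape(...)` and the class picks an implementation. That is the factory pattern hidden behind constructor syntax, which is the kind of unification being proposed, although the D proposal's module-level details are not modelled here.

```python
import math


class Shape:
    """Callers construct Shape(...) and never name the concrete class."""

    def __new__(cls, *dims):
        if cls is Shape:
            # Choose an implementation, as a factory would. Python then runs
            # the chosen class's __init__ with the same arguments.
            impl = _Circle if len(dims) == 1 else _Rect
            return object.__new__(impl)
        return object.__new__(cls)


class _Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class _Rect(Shape):
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height


print(type(Shape(2.0)).__name__)  # _Circle
print(Shape(3, 4).area())         # 12
```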
Re: All right, all right! Interim decision regarding qualified Object methods
we can't just cast to IObject. Oops, I meant IComparable
Re: All right, all right! Interim decision regarding qualified Object methods
On Thursday, 12 July 2012 at 17:51:32 UTC, Andrei Alexandrescu wrote: On 7/12/12 1:40 PM, David Piepgrass wrote: 1. Most importantly, the C++ template approach is a big pain for large-scale systems, because in such systems you want to create DLLs/SOs and C++ has neither a standard ABI nor a safe way to pass around template instantiations between DLLs (in the presence of changes to internal implementation details). Similar problems exist for D, yes? It's a lot easier to define a standard ABI for classes than to solve the cross-DLL template problem. The thing is, that can be done in an opt-in manner. People who want methods in the root of the hierarchy can define a root that defines them. But there's no way to opt out of inheriting Object. Basically it's nice to not force people to buy into a constrained environment without necessity. But is the constrained environment we're talking about really all that constrained? - 'const' is not overly harsh if the user has mechanisms to make that mean 'logical const'. - regarding the 5 vtable entries (destructor, toString, toHash, opEquals, opCmp), well, that's only 20/40 bytes per class, and maybe we don't need opCmp that much. Although having these in Object seems constraining in one way, removing them is constraining in a different way: you can no longer provide collection classes for "any" object without involving templates. Wait a minute, though. Keeping in mind the problem of DLL interoperability, and the constraints on using templates + many DLLs together, what if D introduced the feature that Go and Rust have, the ability to adapt any object to a compatible interface? interface IComparable { bool opEquals(IComparable rhs); int opCmp(IComparable rhs); } class Foo { /* could contain anything */ } So let's say we remove all the methods from Object, but we still want people to be able to make a collection of "any object", such as Foo, and pass this collection between DLLs safely.
Moreover, we want only a single instance of the collection class, defined in a single DLL (so this collection cannot be a template class). Since a class Foo does not declare that it implements IComparable, and it might not even contain opCmp() and opEquals(), we can't just cast to IObject. Not in the current D, anyway. But now add interface adaptation from Go/Rust. Foo might not define opEquals and opCmp itself, but any client can add those via UFCS, and the standard library would probably define opEquals via UFCS as reference equality already. Thus it is possible for any client to pretend that any class implements IComparable, by adding the missing pieces (if any) and casting to IComparable.
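Here is a minimal Python sketch of that idea. It assumes structural interfaces: `typing.Protocol` stands in for IComparable, `Foo` declares nothing, and the client supplies the missing pieces as free functions (standing in for UFCS). Python has no implicit adaptation, so an explicit wrapper performs the "cast". Each wrapper is a separate heap object, which is the thin-pointer cost mentioned for .NET and the thing Go's fat pointers avoid. `Adapted`, `foo_equals`, `foo_cmp` and `sort_any` are hypothetical names.

```python
import functools
from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class Comparable(Protocol):
    def op_equals(self, rhs: "Comparable") -> bool: ...
    def op_cmp(self, rhs: "Comparable") -> int: ...


class Foo:
    """Third-party class: implements no interface and has no comparison methods."""

    def __init__(self, key):
        self.key = key


# Free functions supplied by the client to fill in the missing pieces.
def foo_equals(a: Foo, b: Foo) -> bool:
    return a.key == b.key


def foo_cmp(a: Foo, b: Foo) -> int:
    return (a.key > b.key) - (a.key < b.key)


class Adapted:
    """Pairs an object with the functions that make it satisfy Comparable."""

    __slots__ = ("obj", "_eq", "_cmp")

    def __init__(self, obj: Any, eq: Callable, cmp: Callable):
        self.obj, self._eq, self._cmp = obj, eq, cmp

    def op_equals(self, rhs: "Adapted") -> bool:
        return self._eq(self.obj, rhs.obj)

    def op_cmp(self, rhs: "Adapted") -> int:
        return self._cmp(self.obj, rhs.obj)


def sort_any(items: list) -> list:
    """One non-generic routine that sorts anything adapted to Comparable."""
    return sorted(items, key=functools.cmp_to_key(lambda a, b: a.op_cmp(b)))
```

For example, `sort_any([Adapted(Foo(k), foo_equals, foo_cmp) for k in (3, 1, 2)])` returns the wrappers in key order 1, 2, 3, while `isinstance(Foo(1), Comparable)` is False until the object is wrapped.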
Re: All right, all right! Interim decision regarding qualified Object methods
On Thursday, 12 July 2012 at 04:15:48 UTC, Andrei Alexandrescu wrote: Required reading prior to this: http://goo.gl/eXpuX You destroyed, we listened. I think Christophe makes a great point. We've been all thinking inside the box but we should question the very existence of the box. Once the necessity of opCmp, opEquals, toHash, toString is being debated, we get to some interesting points: Well, I'm not convinced it is a good idea to eliminate the stuff from Object, nor to remove const (I think RawObject as a base class of Object has merit, but to remove the Object functions for everyone? I'm very suspicious.) Some problems I would point out with the idea of "eliminate the stuff from Object and use more templates instead": 1. Most importantly, the C++ template approach is a big pain for large-scale systems, because in such systems you want to create DLLs/SOs and C++ has neither a standard ABI nor a safe way to pass around template instantiations between DLLs (in the presence of changes to internal implementation details). Similar problems exist for D, yes? It's a lot easier to define a standard ABI for classes than to solve the cross-DLL template problem. 2. Although templates are used a lot in C++, in D programs they are used even more and this proposal would increase template usage, so I'd expect the bloat problem to increase. However, merging identical functions (identical machine code) might be a sufficient solution. 3. The need for more templates slows down compilation. We know this is a huge problem in C++. 4. Template bloat is no big deal on desktops but it becomes a bigger problem as the target system gets smaller. Maybe some compromise should be made to ensure D remains powerful and capable on small targets. There were two proposals yesterday that I liked. Taken together, they address all the problems that were raised with const functions in Object: 1. 
Provide a 'safe workaround' for const, for caching and lazy evaluation (implement it carefully to avoid breaking the guarantees of immutable) 2. Provide a class modifier that makes immutable(_) illegal, so the class uses "logical const" instead of "physical const".
Re: Inherited const when you need to mutate
Except that I don't see why Cached!(...) needs to physically separate the mutable state from the rest of the object. I mean, I see that Cached!(...) would have to cast away immutable (break the type system) in order to put mutable state in an immutable object, but if we set aside the current type system for a moment, *in principle* what's the big deal if the mutable state is physically located within the object? In many cases you can save significant time and memory by avoiding all that hashtable management, and performance Nazis like me will want that speed (when it comes to standard libraries, I demand satisfaction). Now, I recognize and respect the benefits of transitive immutability: 1. safe multithreading 2. allowing compiler optimizations that are not possible in C++ 3. ability to store compile-time immutable literals in ROM (3) does indeed require mutable state to be stored separately, but it doesn't seem like a common use case (and there is a workaround), and I don't see how (1) and (2) are necessarily broken. I must be tired. Regarding (1), right after posting this I remembered the difference between caching to a "global" hashtable and storing the cached value directly within the object: the hashtable is thread-local, but the object itself may be shared between threads. So that's a pretty fundamental difference. Even so, if Cached!(...) puts mutable state directly in the object, fast synchronization mechanisms could be used to ensure that two threads don't step on each other, if they both compute the cached value at the same time. If the cached value is something simple like a hashcode, an atomic write should suffice. And both threads should compute the same result so it doesn't matter who wins.
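Here is a small Python sketch of the benign race described above. It assumes a value whose hash is a pure function of otherwise-immutable fields (`Key` and `hash_from_threads` are illustrative names). The cached hash lives inside the object. If several threads miss the cache at once, they all compute the same number, so it does not matter whose store lands last. In CPython a single attribute assignment does not tear. In D or C++ the equivalent would be an atomic store to the cached field.

```python
import threading


class Key:
    """Logically immutable value whose hash is computed lazily and cached in place."""

    __slots__ = ("_parts", "_hash")

    def __init__(self, *parts):
        self._parts = tuple(parts)
        self._hash = None  # the only mutable field: a cache of a pure function of _parts

    def __hash__(self):
        h = self._hash
        if h is None:
            h = hash(self._parts)  # pure, so every racing thread gets the same value
            self._hash = h         # last writer wins, and all writers agree
        return h

    def __eq__(self, other):
        return isinstance(other, Key) and self._parts == other._parts


def hash_from_threads(key, n=8):
    """Hash the same key from n threads at once and return every result."""
    results = [None] * n

    def work(i):
        results[i] = hash(key)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```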
Re: Inherited const when you need to mutate
Suppose we had a caching solution (you could think of it as @cached, but it could be done in a library). The user would need to provide a const, pure function which returns the same value that is stored in the cache. This is enforceable. The only way to write to the cache, is by calling the function. How far would that take us? I don't think there are many use cases for logically pure, apart from caching, but I have very little idea about logical const. I think a caching solution would cover most valid needs and indeed would be checkable. We can even try its usability with a library-only solution. The idea is to plant a mixin inside the object that defines a static hashtable mapping addresses of objects to cached values of the desired types. The destructor of the object removes the address of the current object from the hash (if there). Given that the hashtable is global, it doesn't obey the regular rules for immutability, so essentially each object has access to a private stash of unbounded size. The cost of getting to the stash is proportional to the number of objects within the thread that make use of that stash. Uh, it better not be proportional. Hashtable gives us O(1), one hopes. Sample usage: class Circle { private double radius; private double circumferenceImpl() const { return radius * 2 * pi; } mixin Cached!(double, "circumference", circumferenceImpl); ... } auto c = new const(Circle); Aside: what's the difference between this and new immutable(Circle)? double len1 = c.circumference; double len2 = c.circumference; Upon the first use of property c.circumference, Lazy computes the value by calling this.circumferenceImpl() and stashes it in the hash. The second call just does a hash lookup. In this example searching the hash may actually take longer than computing the thing, but I'm just proving the concept. 
If this is a useful artifact, Walter had an idea a while ago that we can have the compiler help by using the per-object monitor pointer instead of the static hashtable. Right now the pointer points to a monitor object, but it could point to a little struct containing e.g. a Monitor and a void*, which opens the way to O(1) access to unbounded cached data. The compiler would then "understand" to not treat accesses to that data as regular field accesses, and not make assumptions about them being immutable. Any takers for Cached? It would be good to assess its level of usefulness first. I like this idea, and I suspect it could be used to implement not just caching but lazy immutable data structures. Except that I don't see why Cached!(...) needs to physically separate the mutable state from the rest of the object. I mean, I see that Cached!(...) would have to cast away immutable (break the type system) in order to put mutable state in an immutable object, but if we set aside the current type system for a moment, *in principle* what's the big deal if the mutable state is physically located within the object? In many cases you can save significant time and memory by avoiding all that hashtable management, and performance Nazis like me will want that speed (when it comes to standard libraries, I demand satisfaction). Now, I recognize and respect the benefits of transitive immutability: 1. safe multithreading 2. allowing compiler optimizations that are not possible in C++ 3. ability to store compile-time immutable literals in ROM (3) does indeed require mutable state to be stored separately, but it doesn't seem like a common use case (and there is a workaround), and I don't see how (1) and (2) are necessarily broken. As a separate question, do you think it is possible to implement Cached!(...) to access an immutable field by casting away immutable, without screwing up (1) and (2)?
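To compare the two designs, here is a rough Python analogue of the proposed `Cached!(...)` mixin. The `Cached` descriptor is a hypothetical stand-in, not a D library. It keeps computed values in a table outside the object, keyed by object identity. A `weakref.WeakKeyDictionary` drops an entry when its object dies, which mirrors the destructor removing the object's address from the static hashtable.

```python
import math
import weakref


class Cached:
    """Descriptor that stores a computed value in a side table, not in the object."""

    def __init__(self, compute):
        self.compute = compute
        self.table = weakref.WeakKeyDictionary()  # object -> cached value
        self.misses = 0  # how many times compute() actually ran

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class: expose the descriptor itself
        try:
            return self.table[obj]
        except KeyError:
            self.misses += 1
            value = self.table[obj] = self.compute(obj)
            return value


class Circle:
    def __init__(self, radius):
        self._radius = radius

    @Cached
    def circumference(self):
        return self._radius * 2 * math.pi
```

The first `c.circumference` computes and stores the value, and the second is a table lookup. For the in-object alternative argued for above, Python's standard `functools.cached_property` stores the result in the instance's own `__dict__` instead of a side table.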
Re: Congratulations to the D Team!
On Wednesday, 11 July 2012 at 18:21:24 UTC, Steven Schveighoffer wrote: On Wed, 11 Jul 2012 14:01:44 -0400, deadalnix wrote: On 11/07/2012 19:49, Andrei Alexandrescu wrote: On 7/11/12 1:40 PM, Jakob Ovrum wrote: Some classes don't lend themselves to immutability. Let's take something obvious like a class object representing a dataset in a database. How is an immutable instance of such a class useful? This is a good point. It seems we're subjecting all classes to certain limitations for the benefit of a subset of those classes. Andrei Did you see the proposal of feep/tgehr on #d? It basically states that you can overload a const method with a non-const one if: - You don't mutate any data that belongs to the parent. - You are prevented from creating any immutable instance of that class or any subclass. I don't like this idea. It means you could not use pure functions to implicitly convert mutable class instances to immutable (something that should be possible today). I do like the idea. Please explain by example why a pure function could no longer convert mutable class instances to immutable? The proposal to restrict the use of immutable is only supposed to affect classes that specifically request it. It also seems to allow abuses. For example: class A { private int _x; public @property x() const { return _x; } } class B : A { private int _x2; public @property x() { return _x2++; } } I think you would have to mark B somehow to indicate that immutable(B) is now illegal, e.g. @mutating class B : A { private int _x2; public @property override x() { return _x2++; } } Now I've completely changed the logistics of the x property so that it's essentially become mutable. This kind of perversion is already possible when x() is const. x() is allowed to mutate and return a static or global variable.
Re: Let's stop parser Hell
On Tuesday, 10 July 2012 at 23:49:58 UTC, Timon Gehr wrote: On 07/11/2012 01:16 AM, deadalnix wrote: On 09/07/2012 10:14, Christophe Travert wrote: deadalnix wrote in message (digitalmars.D:171330): D isn't 100% CFG. But it is close. What makes D fail to be a CFG? type[something] <= something can be a type or an expression. typeid(something) <= same here identifier!(something) <= again 'something' is context-free: something ::= type | expression. I don't see how "type | expression" is context free. The input "Foo" could be a type or expression, you can't tell which without looking at the context.
Re: Rust updates
bool test(int x) { return x & 2 > 0; } gives: foo.d(1): Error: 2 > 0 must be parenthesized when next to operator & That reminds me, I was so happy the first two times I got an undefined symbol error in D. The compiler said: "Did you mean ''?" LOL, don't tell me how it works... it's magic, right? I love a good error message.
Re: Rust updates
Oh, I can't tell you what a pet peeve PITA the C precedence is. Ugh! I know it's against D philosophy to change the precedence w.r.t. C, but how about a compromise: give a warning or error for "x&2 > 0", with error message: "add parentheses around x&2 to clarify your intention." bool test(int x) { return x & 2 > 0; } gives: foo.d(1): Error: 2 > 0 must be parenthesized when next to operator & Doh! You read my mind before I thought it :) I hadn't got around to bit fiddling in D yet.
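A quick way to see why the error message is worth having, sketched in Python. Python's grammar binds `&` tighter than comparison operators, as Rust's does. The second function spells out what the C grammar makes of the same text: `2 > 0` is evaluated first and yields 1, so the test silently becomes `x & 1`. The function names are illustrative.

```python
def bit1_set(x: int) -> bool:
    """Python parses `x & 2 > 0` as `(x & 2) > 0`: is bit 1 set?"""
    return x & 2 > 0


def bit1_set_c_grammar(x: int) -> bool:
    """What C/C++ make of `x & 2 > 0`: `x & (2 > 0)`, i.e. `x & 1`."""
    return bool(x & int(2 > 0))


for x in (1, 2, 3):
    print(x, bit1_set(x), bit1_set_c_grammar(x))
# 1 False True
# 2 True False
# 3 True True
```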
Re: Rust updates
On Wednesday, 11 July 2012 at 18:31:23 UTC, David Piepgrass wrote: The trouble with segmented stacks are: 1. they have a significant runtime penalty Why? Extra instructions generated for each function. Every function? Why? Looks like I misunderstood what "Segmented stacks" are. From an LLVM page: Segmented stack allows stack space to be allocated incrementally than as a monolithic chunk (of some worst case size) at thread initialization. This is done by allocating stack blocks (henceforth called stacklets) and linking them into a doubly linked list. The function prologue is responsible for checking if the current stacklet has enough space for the function to execute; and if not, call into the libgcc runtime to allocate more stack space. Support for segmented stacks on x86 / Linux is currently being worked on. I envision a rather different implementation for 32-bit code. 1. Reserve a normal stack with one 4K page committed + some known minimum amount of uncommitted memory, e.g. another 8 KB uncommitted with a guard page that the program can trap via OS facilities (signals, etc.) 2. When the stack overflows, move the stack to a new, much larger region of Virtual Memory. Much like languages that support compacting garbage collectors, the language / runtime environment must be designed to support this. 3. If one needs to call C code, one preallocates the maximum expected virtual memory needed, e.g. 32 MB.
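To make the LLVM description concrete, here is a toy Python model of a segmented stack. The class names are hypothetical, and real implementations do this in machine code in the function prologue, not in a library. Frames are pushed into fixed-size stacklets chained in a doubly linked list. `push` performs the prologue-style check and links in a new stacklet instead of overflowing. This sketch also keeps an emptied stacklet linked for reuse, so that a call loop sitting exactly at a boundary does not allocate a fresh stacklet on every call.

```python
class _Stacklet:
    """One fixed-size chunk of stack; a real one would be raw memory."""

    __slots__ = ("frames", "prev", "next")

    def __init__(self, prev):
        self.frames = []
        self.prev = prev
        self.next = None


class SegmentedStack:
    def __init__(self, stacklet_size=4):
        self.size = stacklet_size
        self.current = _Stacklet(None)
        self.stacklets_allocated = 1

    def push(self, frame):
        # "Prologue" check: is there room in the current stacklet?
        if len(self.current.frames) == self.size:
            nxt = self.current.next
            if nxt is None:  # no spare stacklet cached: allocate and link one
                nxt = _Stacklet(self.current)
                self.current.next = nxt
                self.stacklets_allocated += 1
            self.current = nxt
        self.current.frames.append(frame)

    def pop(self):
        # Step back to the previous stacklet once this one is empty,
        # keeping the empty one linked for reuse.
        if not self.current.frames and self.current.prev is not None:
            self.current = self.current.prev
        return self.current.frames.pop()
```

Pushing ten frames with a stacklet size of 4 links three stacklets, and pushing them again after popping everything reuses those three instead of allocating more.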
Re: Rust updates
The trouble with segmented stacks are: 1. they have a significant runtime penalty Why? Extra instructions generated for each function. Every function? Why? 2. interfacing to C code becomes problematic Isn't it possible to auto-commit new pages when C code needs it? ... There's no way to predict how much stack arbitrary C code will use. Presumably one does not call arbitrary C code. Usually one knows what one might call in advance and can plan accordingly (and even if it is arbitrary, one at least knows *that* one is going to call C code and plan accordingly. Most C code doesn't allocate more than a few megabytes on the stack).
Re: just an idea (!! operator)
it is just an idea, i do not have any specific use in mind. ... But we can't base the decision solely on this fact. Then we could add a million operators to the language just because they seem neat. Actually, we could! Great idea, nimrod! (inside joke)
Re: Rust updates
On Wednesday, 11 July 2012 at 17:09:27 UTC, Timon Gehr wrote: On 07/11/2012 06:45 PM, David Piepgrass wrote: ... These benefits (except 3) all exist for "function" as well as "fn", but while many languages use "fun", requiring "function" for all functions is almost unheard of (at least I haven't heard of it), why? It's too damn long! We write functions constantly, we don't want to type "function" constantly. You could have a look at JavaScript. Ack! You got me. Dynamic languages aren't my thing. But JS being dynamically typed, it's not as bad since you don't have to specify the return type in addition.
Re: Rust updates
Rust has type classes from Haskell (with some simplifications for higher kinds), uniqueness typing, and typestates. As nice as kinds, typestates, typeclasses and several pointer types may be, I was on the Rust mailing list and felt unable to participate because they kept using terminology that only PhDs in type systems understand. And googling for "kind" doesn't tell me a darn thing ;) That's why I have gravitated to D, it's so much more familiar (sometimes too much so, e.g. I still need to 'break' in 'switch'? how many meanings for 'static'?) as well as very powerful. I would still like to learn about the mumbo-jumbo, though, and I know how nice pattern-matching can be from one Haskell-based course in university :) This seems a bit overkill to me: This is very strict, maybe too much strict: Agreed about the int suffixes, but I wonder what Marco meant about "mass-casts" in D. The safe pointer types are @T for shared, reference-counted boxes, and ~T, for uniquely-owned pointers. I wonder how well these could be simulated in D. It seems to me Rust is carefully designed for performance, or at least real-time performance by avoiding garbage collection in favor of safely tracking ownership. That's good, but only now are they developing things like OOP support that I take for granted. ++ and -- are missing. Rust, like Go, seems very focused on making a "simple" language. Another reason that I prefer D. the logical bitwise operators have higher precedence. In C, x & 2 > 0 comes out as x & (2 > 0), in Rust, it means (x & 2) > 0, which is more likely to be what you expect (unless you are a C veteran). Oh, I can't tell you what a pet peeve PITA the C precedence is. Ugh! I know it's against D philosophy to change the precedence w.r.t. C, but how about a compromise: give a warning or error for "x&2 > 0", with error message: "add parentheses around x&2 to clarify your intention." Enums are datatypes that have several different representations.
For example, the type shown earlier: enum shape { circle(point, float), rectangle(point, point) } fn angle(vec: (float, float)) -> float { alt vec { (0f, y) if y < 0f { 1.5 * float::consts::pi } (0f, y) { 0.5 * float::consts::pi } (x, y) { float::atan(y / x) } } } alt mypoint { {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ } {x, y} { /* Simply bind the fields */ } } let (a, b) = get_tuple_of_two_ints(); Records, tuples, and destructuring go so well together. I would love to have this. I am particularly a fan of structural typing. I don't know if Rust uses it but Opa and other functional languages often do. You see, there's a problem that pops up in .NET all the time, and probably the same problem exists in D. Any time two libraries want to use the same concept, but the concept is not in the standard library, they need to define it. For instance if there is no "Point" type in the standard library, but two unrelated libraries need points, they will both define their own (amazingly, Points are poorly thought out in .NET and tightly bound to GUI libraries, so people define their own in some cases): // JoesLibrary struct Point(T) { T x, y; /* followed by some manipulation functions */ } // FunkyLibrary struct Point(T) { T x, y; /* followed by other manipulation functions */ } Sadly, the two point types are not compatible with each other. A client that wants to use both libraries now has an interoperability problem when he wants to pass data between them. Even a client that uses only one of the libraries, let's call it "JoesLibrary", has to import Point from "JoesLibrary", even if its functionality is not quite what the client wants. It would be much nicer if the client could define his own Point struct that seamlessly interoperates with Joe's.
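Structural typing is one way to get that "my own Point that interoperates with Joe's" behaviour. Here is a minimal Python sketch using `typing.Protocol` (the names `HasXY`, `JoesPoint`, `FunkyPoint` and `distance` are made up). The client declares what it needs, and both libraries' unrelated point types qualify without either library knowing about the other. It only matches on member names, so a point type spelled with X and Y would not qualify, and it says nothing about memory layout.

```python
import math
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasXY(Protocol):
    """The client's own notion of a point: anything with x and y."""

    x: float
    y: float


@dataclass
class JoesPoint:  # from "JoesLibrary"
    x: float
    y: float


@dataclass
class FunkyPoint:  # from "FunkyLibrary", unrelated to JoesPoint
    x: float
    y: float

    def scaled(self, k: float) -> "FunkyPoint":
        return FunkyPoint(self.x * k, self.y * k)


def distance(a: HasXY, b: HasXY) -> float:
    """Client code written once against the protocol, usable with either library."""
    return math.hypot(a.x - b.x, a.y - b.y)


print(distance(JoesPoint(0, 0), FunkyPoint(3, 4)))  # 5.0
```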
In D this is currently impractical, but I would enjoy designing a way to make it work (before you point out that "what if x and y are in a different order in the two structs" and "it could be T X,Y in one and T x,y in the other", yes, I know, it's on my list of problems to cleverly solve). A similar problem exists with interfaces, where two unrelated libraries expose two similar classes with some common functions, but you can't cast them to a common type in D. This is a solved problem in Go (http://www.airs.com/blog/archives/277) and it's actually pretty easy for a compiler to magically cast a class to an interface that the class did not declare--if the underlying language is designed for that, anyway. In fact, in .NET at least, the same problem exists even if the libraries DO know about each other and are even written by the same person and use identical interfaces. The problem is, if I write two libraries A and B, and I want them to be interoperable, then I need to factor out the common structs and interfaces to a microscopic third library, I. But from the client's perspective, if a client only
Re: Rust updates
On Sunday, 8 July 2012 at 19:28:11 UTC, Walter Bright wrote: On 7/8/2012 6:49 AM, bearophile wrote: I think in Go the function stack is segmented and growable as in Go. This saves RAM if you need a small stack, and avoids stack overflows where lot of stack is needed. The trouble with segmented stacks are: 1. they have a significant runtime penalty Why? 2. interfacing to C code becomes problematic Isn't it possible to auto-commit new pages when C code needs it? I see that *moving* the stack would be a problem unless you have a means to adjust all pointers that point into the stack. If you need to call C code in 32-bit, you'd have to specify a maximum stack size.
Re: Rust updates
On Wednesday, 11 July 2012 at 16:45:17 UTC, David Piepgrass wrote: Anyway I think short vs long is much ado about nothing. No one complains that we have to type "int" instead of "integer", after all. Most languages have only a few keywords, which people quickly learn. As long as all the standard library functions are well-named, I don't care about the language keywords. Okay, I actually care a lot, just about the meaning of the keyword and not about whether it's abbreviated. I think D's use of "enum" for "static constant" and "static" for "thread singleton" (and three or four other things) is quite unfortunate, albeit understandable given the C heritage.
Re: Rust updates
Short keywords are only important with barebones editors like a default vi. Nobody would use this for real development. I started I long discussion on Reddit, because I complained that the goal of 5 letter keywords is primitive, and brings back memories of the time the compilers were memory constraint. ... As someone that values readable code, I don't understand this desire to turn every programming language into APL. Short or long, I don't think it matters if the IDE can help you with the long ones. I don't mind typing immutable, once, but if I had to do it 50 times a day? And somehow, even though I have been programming for over 20 years, I still type "reutrn" and "retrun" all the damn time! So "ret" would save me time. Anyway I think short vs long is much ado about nothing. No one complains that we have to type "int" instead of "integer", after all. Most languages have only a few keywords, which people quickly learn. As long as all the standard library functions are well-named, I don't care about the language keywords. Actually I think "fn" for functions is great, why? 1. Greppability. With the C syntax there is no way to search for function definitions. Even if we had an IDE to find functions for us, you are not always looking at source code in an IDE (you could be browsing a repository on the web) 2. Easier to parse. When the compiler sees "fn", it knows it's dealing with a function and not a variable or an expression. It seems especially beneficial inside functions, where perhaps X * Y might begin an expression (or is that impossible in D?) 3. Googlability. "function" will find results across all PLs, "fn" will narrow the search down quite a bit if you want to see code in Rust. These benefits (except 3) all exist for "function" as well as "fn", but while many languages use "fun", requiring "function" for all functions is almost unheard of (at least I haven't heard of it), why? It's too damn long! 
We write functions constantly, we don't want to type "function" constantly.
Re: Does D have too many features?
forum.dlang.org apparently failed to post this 10 minutes ago, retrying. On Tuesday, 10 July 2012 at 02:43:05 UTC, Era Scarecrow wrote: On Tuesday, 10 July 2012 at 01:41:29 UTC, bearophile wrote: David Piepgrass: This use case is pretty complex, so if I port this to D, I'd probably just cast away const/immutable where necessary. You are not the first person that says similar things. So D docs need to stress more that casting away const/immutable in D is rather more dangerous than doing the same thing in C++. ... Let's say a class/struct is a book with Page protectors signifying 'const(ant)'. You promise to return the book to the library without making any changes; Although you promised you wouldn't make changes, you still take the Page protectors off and make notes on the outer edges or make adjustments in the text, then return the book. Is this wise? This isn't C++. If something shouldn't change, then don't change it god damn it. If it needs to change it isn't const(ant) and shouldn't suggest it is. The difficulty, in case you missed it, is that somebody else (the Object class) says that certain functions are const, but in certain cases we really, really want to mutate something, either for efficiency or because "that's just how the data structure works". If a data structure needs to mutate itself when read, yeah, maybe its functions should not be marked const, but quite often the "const" is inherited from Object or some interface that (quite reasonably, it would seem) expects functions that /read stuff/ to be const. And yet we can't drop const from Object or such interfaces, because there is other code elsewhere that /needs/ const to be there. So far I have no solution to the dilemma in mind, btw. But the idea someone had of providing two (otherwise identical) functions, one const and one non-const, feels like a kludge to me, and note that anybody with an object would expect to be able to call the const version on any Object.
Seriously, it's not that hard a concept. I guess if something doesn't port well from C++ then redesign it. Some things done in C++ are hacks due to the language's limitations and faults. I was referring to a potential port from C#, which has no const. My particular data structure (a complex beast) contains a mutable tree of arbitrary size, which the user can convert to a conceptually immutable tree in O(1) time by calling Clone(). This marks a flag in the root node that says "read-only! do not change" and shares the root between the clones. At this point it should be safe to cast the clone to immutable. However, the original, mutable-typed version still exists. As the user requests changes to the mutable copy in the future, parts of the tree are duplicated to avoid changing the immutable nodes, with one exception: the read-only flag in various parts of the original, immutable tree will gradually be set to true. In this case, I don't think the D type system could do anything to help ensure that I don't modify the original tree that is supposed to be immutable. Since the static type of internal references must either be all mutable or all immutable, they will be typed mutable in the mutable copy, and immutable in the immutable copy, even though the two copies are sharing the same memory. And one flag, the read-only flag, must be mutable in this data structure, at least the transition from false->true must happen *after* the immutable copy is created; otherwise, Clone() would have to run in O(N) time, to mark every node read-only. This fact, however, does not affect the immutable copy in any way.
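Here is a minimal Python sketch of the data structure described above, under simplifying assumptions. It models a tree of nested dictionaries keyed by path elements, and `Node`, `Tree`, `clone`, `set` and `get` are illustrative names, not the actual C# code. `clone()` is O(1): it marks the root read-only and shares it. Later writes to the mutable copy duplicate only the nodes on the path they touch. When a read-only node is copied, its children become shared by two parents, so they are marked read-only too. That flag flipping from false to true is the only mutation the shared nodes ever see.

```python
class Node:
    __slots__ = ("value", "children", "frozen")

    def __init__(self, value=None, children=None):
        self.value = value
        self.children = children if children is not None else {}
        self.frozen = False  # once True, the node may be shared and must not change

    def thaw(self):
        """Return a node that is safe to mutate: self if unshared, else a copy."""
        if not self.frozen:
            return self
        # The copy will share these children with the frozen original,
        # so freeze them now (lazily, one level at a time).
        for child in self.children.values():
            child.frozen = True
        return Node(self.value, dict(self.children))


class Tree:
    def __init__(self, root=None):
        self.root = root if root is not None else Node()

    def clone(self):
        """O(1) snapshot: freeze the root and share it between both trees."""
        self.root.frozen = True
        return Tree(self.root)

    def set(self, path, value):
        # Copy-on-write along the path; untouched subtrees stay shared.
        node = self.root = self.root.thaw()
        for key in path:
            child = node.children.get(key)
            child = child.thaw() if child is not None else Node()
            node.children[key] = child
            node = child
        node.value = value

    def get(self, path):
        node = self.root
        for key in path:
            node = node.children[key]
        return node.value
```

After `snap = t.clone()` and `t.set("ab", 10)`, `snap.get("ab")` still returns the old value, while the untouched `"ac"` subtree is one node object shared by both trees and now flagged read-only.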
Re: Inherited const when you need to mutate
On Tuesday, 10 July 2012 at 02:43:05 UTC, Era Scarecrow wrote: On Tuesday, 10 July 2012 at 01:41:29 UTC, bearophile wrote: David Piepgrass: This use case is pretty complex, so if I port this to D, I'd probably just cast away const/immutable where necessary. You are not the first person that says similar things. So D docs need to stress more than casting away const/immutable in D is rather more dangerous than doing the same thing in C++. ... Let's say a class/struct is a book with Page protectors signifying 'const(ant)'. You promise to return the book to the library without making any changes; Although you promised you wouldn't make changes, you still take the Page protectors off and make make notes on the outer edges or make adjustments in the text, then return the book. Is this wise? This isn't C++. If something shouldn't change, then don't change it god damn it. If it needs to change it isn't const(ant) and shouldn't suggest it is. The difficulty, in case you missed it, is that somebody else (the Object class) says that certain functions are const, but in certain cases we really, really want to mutate something, either for efficiency or because "that's just how the data structure works". If a data structure needs to mutate itself when read, yeah, maybe its functions should not be marked const, but quite often the "const" is inherited from Object or some interface that (quite reasonably, it would seem) expects functions that /read stuff/ to be const. And yet we can't drop const from Object or such interfaces, because there is other code elsewhere that /needs/ const to be there. So far I have no solution to the dilemma in mind, btw. But the idea someone had of providing two (otherwise identical) functions, one const and one non-const, feels like a kludge to me, and note that anybody with an object would expect to be able to call the const version on any Object. Seriously, it's not that hard a concept. 
I guess if something doesn't port well from C++ then redesign it. Some things done in C++ are hacks due to the language's limitations and faults. I was referring to a potential port from C#, which has no const. My particular data structure (a complex beast) contains a mutable tree of arbitrary size, which the user can convert to a conceptually immutable tree in O(1) time by calling Clone(). This sets a flag in the root node that says "read-only! do not change" and shares the root between the clones. At this point it should be safe to cast the clone to immutable. However, the original, mutable-typed version still exists. As the user requests changes to the mutable copy in the future, parts of the tree are duplicated to avoid changing the immutable nodes, with one exception: the read-only flag in various parts of the original, immutable tree will gradually be set to true. In this case, I don't think the D type system could do anything to help ensure that I don't modify the original tree that is supposed to be immutable. Since the static type of internal references must be either all mutable or all immutable, they will be typed mutable in the mutable copy and immutable in the immutable copy, even though the two copies share the same memory. And one flag, the read-only flag, must be mutable in this data structure, or at least its transition from false to true must happen *after* the immutable copy is created; otherwise, Clone() would have to run in O(N) time to mark every node read-only. This fact, however, does not affect the immutable copy in any way.
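The O(1) Clone() described above can be sketched roughly as follows. This is a simplified, hypothetical illustration, not the actual data structure (which is B+tree-like); the names TreeNode, Tree, thaw and setValue are invented here. Cloning only sets the flag on the root, and a later write copies each frozen node on its path, pushing the flag one level down as it goes. That downward push is exactly the write into the shared, supposedly immutable tree that D's type qualifiers cannot describe.

```d
/// Hypothetical node of a copy-on-write tree. `frozen` means "this node may be
/// shared with another tree; copy it before writing to it".
final class TreeNode
{
    int value;
    TreeNode[] kids;
    bool frozen;

    this(int value, TreeNode[] kids = null)
    {
        this.value = value;
        this.kids = kids;
    }

    /// Returns an unfrozen shallow copy. The children are now reachable from
    /// two parents, so they are frozen here (one level only).
    TreeNode thaw()
    {
        foreach (k; kids)
            k.frozen = true; // mutates nodes that the other clone still uses
        return new TreeNode(value, kids.dup);
    }
}

/// Hypothetical tree handle with O(1) cloning and path copying on write.
struct Tree
{
    TreeNode root;

    /// O(1) clone: mark only the root read-only and share it.
    Tree clone()
    {
        root.frozen = true;
        return Tree(root);
    }

    /// Copies every frozen node on the way to the target, then writes.
    void setValue(size_t[] path, int v)
    {
        if (root.frozen)
            root = root.thaw();
        auto n = root;
        foreach (i; path)
        {
            if (n.kids[i].frozen)
                n.kids[i] = n.kids[i].thaw();
            n = n.kids[i];
        }
        n.value = v;
    }
}
```

After a.clone(), a write through a only copies the nodes along the modified path; untouched subtrees stay shared between both trees.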
Re: getNext
On Monday, 9 July 2012 at 07:53:41 UTC, David Piepgrass wrote: I don't know if this proposal went anywhere since 2010, but it occurs to me that there is a hidden danger here. alloca will allocate a sequence of separate temporaries. If the collection is large, the stack will overflow, and the client might not have a clue what happened. Amazing. My post unleashed four pages of comments and not one of them responded to my post :O I think Mehrdad is right that an in/out range should have its own name to distinguish it from an input range, but that doesn't necessarily mean that the same interface can't be used for both. I imagine a couple of advantages of:

T tmp;
for (T* front; (front = r.getNext(tmp)) !is null; )
    // do something with front

instead of:

for (; !r.empty; r.popFront())
    // do something with r.front

- If the range uses late binding, getNext() is faster because you're only calling one function instead of three (empty, front and popFront). When I program in C#, I am irritated enough that IEnumerator requires two interface calls (MoveNext and Current) to get each item. Late binding, of course, is necessary across DLL boundaries and can help avoid code bloat.
- If an input-only range has to unpack its elements (e.g. bit array => bool, or anything compressed), the range doesn't need to unpack repeatedly every time 'front' is accessed, nor does it need to reserve memory inside itself for a scratch area (you don't want scratch areas in every range if your app keeps track of thousands of ranges; plus, ranges tend to get passed by value, right?).

That said, it may be unreasonable for the compiler to support the necessary escape analysis (impossible in case you're importing .di files)... and maybe the existing empty/popFront/front is too well established to reconsider? (I am not familiar with the status quo.)
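To make the getNext idea concrete, here is a minimal sketch of an input-only range over the bits of a ulong, the "bit array => bool" case mentioned above. BitReader and countOnes are hypothetical names. Each element is unpacked once, into storage owned by the caller, and a single call both advances the range and signals exhaustion by returning null.

```d
/// Hypothetical input range over the low `remaining` bits of a ulong,
/// exposing getNext(ref store) instead of empty/front/popFront.
struct BitReader
{
    ulong bits;
    size_t remaining;

    /// Unpacks the next bit into caller-provided storage and returns a
    /// pointer to it, or null when the range is exhausted.
    bool* getNext(ref bool store)
    {
        if (remaining == 0)
            return null;
        store = (bits & 1) != 0;
        bits >>= 1;
        --remaining;
        return &store;
    }
}

/// Counts set bits using the one-call-per-element loop from the post.
size_t countOnes(BitReader r)
{
    bool tmp; // a single scratch slot, reused for every element
    size_t ones;
    for (bool* p; (p = r.getNext(tmp)) !is null; )
        if (*p)
            ++ones;
    return ones;
}
```

Note that the range itself holds no scratch area; the only temporary lives in the caller's frame and is reused on every iteration.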
Inherited const when you need to mutate
On Monday, 9 July 2012 at 16:02:38 UTC, Timon Gehr wrote: On 07/09/2012 05:00 PM, H. S. Teoh wrote: On Mon, Jul 09, 2012 at 01:44:24PM +0200, Timon Gehr wrote: On 07/09/2012 08:37 AM, Adam Wilson wrote: Object is now const-correct throughout D. This has been a dream for many of you. Today it is a reality. PITA. Forced const can severely harm a code base that wants to be flexible -- it leaks implementation details and is infectious. [...] Can you give an explicit example of code that is harmed by const correctness? 1. Most code that gives amortized complexity guarantees, e.g.:

interface Map(K, V){
    V opIndex(K k) const;
    // ...
}
class SplayTree(K, V) : Map!(K, V) {
    // ???
}

2. - hash table - opApply compacts the table if it is occupied too sparsely, in order to speed up further iteration. - toString iterates over all key/value pairs by means of opApply. Clearly, toString cannot be const in this setup. 3. Often, objects can cache derived properties to speed up the code. With 'const-correctness' in place, such an optimization is neither transparent nor doable in a modular way. I guess D does not have 'mutable' (like C++) to override const on methods? Caching anything slow-to-compute is my typical use case, and I know a hashtable design where the getter will move whatever value it finds to the front of a hash collision chain. Oh, and this is interesting: I implemented a B+tree-like data structure* in C# that supports O(1) cloning. It marks the root as "frozen", making it copy-on-write. In order to clone in O(1), the children are not marked as frozen until later, when someone actually wants to mutate one of the copies. A user can also make the tree immutable in O(1) time and freely make mutable copies of it. This use case is pretty complex, so if I port this to D, I'd probably just cast away const/immutable where necessary. C#, of course, has no const, so it was never a concern there. *It's actually way fancier than that; I should really write a CodeProject article on it. 
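The move-to-front hashtable mentioned above shows why such a getter cannot be const: a successful lookup relinks the collision chain. Below is a minimal, hypothetical sketch with int keys and values (MtfHashMap, insert, lookup and frontKey are invented names, and this is not the design referred to in the post). If lookup were declared const, the assignments to next and to the bucket array would be rejected by the compiler.

```d
/// Hypothetical separate-chaining hash map whose lookup moves the found
/// entry to the front of its chain (a self-organizing list).
final class MtfHashMap
{
    private static struct Entry
    {
        int key;
        int value;
        Entry* next;
    }

    private Entry*[] buckets;

    this(size_t bucketCount)
    {
        buckets = new Entry*[](bucketCount);
    }

    private size_t bucketOf(int key)
    {
        return cast(size_t) cast(uint) key % buckets.length;
    }

    void insert(int key, int value)
    {
        auto b = bucketOf(key);
        buckets[b] = new Entry(key, value, buckets[b]); // push at chain front
    }

    /// Not const: a hit is unlinked and moved to the front of its chain.
    int* lookup(int key)
    {
        auto b = bucketOf(key);
        Entry* prev = null;
        for (auto e = buckets[b]; e !is null; prev = e, e = e.next)
        {
            if (e.key != key)
                continue;
            if (prev !is null)
            {
                prev.next = e.next;
                e.next = buckets[b];
                buckets[b] = e;
            }
            return &e.value;
        }
        return null;
    }

    /// Key at the front of a bucket's chain (for inspection only).
    int frontKey(size_t bucket)
    {
        return buckets[bucket].key;
    }
}
```

With a single bucket every key collides, which makes the reordering easy to observe.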
Of course, the trouble is, you can call any const method on an immutable object, so any const method that mutates needs to be thread-safe. Many uses of C++ 'mutable' are thread-safe (e.g. most platforms guarantee atomic pointer-size writes, right? So two threads can cache the same int or two equivalent class instances, and it doesn't matter who wins)... but many other cases are not (e.g. the hashtable). This is not a solved problem, is it? Ideas?
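The benign race described above (two threads computing the same cached value, so it doesn't matter who wins) might look like the following sketch using druntime's core.atomic. The class Text and its hash function are hypothetical. Note that hash() is deliberately not const: writing cachedHash from a const method would require casting away const, which is exactly the dilemma being asked about.

```d
import core.atomic : atomicLoad, atomicStore, MemoryOrder;

/// Hypothetical immutable-content string with a lazily cached hash.
/// 0 means "not computed yet"; racing threads store identical values.
final class Text
{
    private string data;
    private shared size_t cachedHash;

    this(string s)
    {
        data = s;
    }

    size_t hash() // not const: see the dilemma in the post
    {
        auto h = atomicLoad!(MemoryOrder.raw)(cachedHash);
        if (h == 0)
        {
            foreach (c; data)
                h = h * 31 + c;
            if (h == 0)
                h = 1; // reserve 0 for "not computed"
            atomicStore!(MemoryOrder.raw)(cachedHash, h);
        }
        return h;
    }
}
```

This only works because the cached value is a single word computed deterministically; a structure like the move-to-front hashtable has no such idempotent write.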
Re: Congratulations to the D Team!
Thanks for doing this! I haven't contributed yet, but it was worrisome hearing about various pull requests languishing for long periods. Now maybe I should go learn how to use git... On Monday, 9 July 2012 at 07:56:40 UTC, Jonathan M Davis wrote: As far as I'm concerned, 3.minutes() is a prime example of what's wrong with UFCS. UFCS can be very useful, but oh how I hate that syntax (completely aside from the particular function being called, I think that 3.anything() is horrible). But obviously not everyone agrees. Certainly not. C# has had this syntax since 1.0 (albeit without extension methods until v3.0), but IIRC you could always write 3.ToString() or 3.GetHashCode() and, incidentally, int.Parse("3"), etc. Ruby has it too (not UFCS per se, but you can actually add methods to any class, including integers, IIRC).
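For readers unfamiliar with UFCS: a free function whose first parameter matches the receiver can be called with member syntax, so 3.minutes() is simply a call with 3 as the first argument. A tiny sketch with a hypothetical free function twice (not a member of int):

```d
/// Hypothetical free function. With UFCS, `3.twice()` is rewritten by the
/// compiler to `twice(3)`, and calls can be chained left to right.
int twice(int x)
{
    return 2 * x;
}
```

Chaining is where the syntax pays off: 3.twice().twice() reads left to right, while the equivalent nested call twice(twice(3)) reads inside out.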
Re: getNext
I've just had an idea that is so dark and devious, I was almost afraid to try it. But it works like a charm. Consider: T* getNext(R, E)(ref R range, ref E store = *(cast(E*) alloca(E.sizeof))) { ... } I don't know if this proposal went anywhere since 2010, but it occurs to me that there is a hidden danger here. Since the alloca call runs in the caller's stack frame (that is the whole trick), calling getNext in a loop makes alloca allocate a separate temporary on every call, and none of them is freed until the caller returns. If the collection is large, the stack will overflow, and the client might not have a clue what happened.
Re: run-time stack-based allocation
On Thursday, 10 May 2012 at 03:03:22 UTC, Andrei Alexandrescu wrote: On 5/9/12 3:17 PM, Tove wrote: On Tuesday, 8 May 2012 at 07:03:35 UTC, Gor Gyolchanyan wrote: Cool! Thanks! I'll definitely check it out! I hope it's DDOCed :-D I just invented an absolutely wicked way of using alloca() in the parent context... auto Create(void* buf=alloca(frame_size)) Yah, me too. http://forum.dlang.org/thread/i1gnlo$18g0$1...@digitalmars.com#post-i1gql2:241k6o:241:40digitalmars.com I found it by googling for my name and "dark" and "devious" :o). That is so awesome that it can't possibly be legal by the spec! This "runtime struct" sounds really cool too. Pinch me, I must be dreaming :D
Re: Why not all statement are expressions ?
int[void] intSet = [2:(), 3:(), 4:()] Oops, I meant void[int] intSet = [2:(), 3:(), 4:()] rather.
Re: Why not all statement are expressions ?
I'm usually fairly ambivalent about the idea of statements being expressions, but I would *love* for switch to be usable as an expression. For instance, in Haxe, you can do stuff like the following, which I get a ton of use out of and often wish D had:

a = switch(b) {
    case 1: "foo";
    case 2: "bar";
    case 3: "baz";
    case 4: "whee";
    default: "blork";
}

The D equivalents aren't terrible, but they aren't nearly as nice. This won't work anyway. We are talking about language grammar here. If made expression, statement would be of type void. Just like assert is. I see what you're saying, but this switch expression should really be of type string. I certainly wish more things were expressions. "a = if (x) y; else z;" isn't especially useful since we have "a = x ? y : z", but consider instead something that doesn't map so easily to an expression:

// very loosely based on some Android code I wrote recently
dpWidth = _lastKnownWidth = if (window.isVisible()) {
    auto m = context.getResources().getSystemMetrics();
    // final statement as value of "if" expr
    window.getWidth() / m.pixelDensity();
} else if (_lastKnownWidth != 0)
    _lastKnownWidth;
else
    screenInfo().getWidth();

Or how about:

auto area = {
    auto tmp = foo.bar(baz);
    tmp.width * tmp.height;
}

I also wish "void" were a first-class type with sizeof==0 for maximum efficiency:

int[void] intSet = [2:(), 3:(), 4:()]

Ditto for size of empty structs. D code should never need abominations like the C++ EBCO (empty base class optimization).
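One of the "D equivalents" alluded to above is to wrap the statement in a function literal and call it immediately, so each branch yields its value with return. A sketch under that assumption; label, Size and area are hypothetical names, and the body of area stands in for the foo.bar(baz) call from the post:

```d
/// Emulates the Haxe-style switch expression with an immediately invoked
/// function literal whose return type is inferred as string.
string label(int b)
{
    auto a = () {
        switch (b)
        {
            case 1: return "foo";
            case 2: return "bar";
            case 3: return "baz";
            case 4: return "whee";
            default: return "blork";
        }
    }();
    return a;
}

struct Size
{
    int width, height;
}

/// The "auto area = { ... }" example, written the same way.
int area(Size s)
{
    auto result = () {
        auto tmp = s; // stands in for foo.bar(baz)
        return tmp.width * tmp.height;
    }();
    return result;
}
```

It works, but the explicit returns and the trailing () are exactly the noise that a true switch or block expression would remove.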
Re: Let's stop parser Hell
On Sunday, 8 July 2012 at 21:22:39 UTC, Roman D. Boiko wrote: On Sunday, 8 July 2012 at 21:03:41 UTC, Jonathan M Davis wrote: It's been too long since I was actively working on parsers to give any details, but it is my understanding that because a hand-written parser is optimized for a specific grammar, it's going to be faster. My aim is to find out any potential bottlenecks and ensure that those are possible to get rid of. So, let's try. I believe it would not hurt generality or quality of a parser generator if it contained seams for inserting custom (optimized) code where necessary, including those needed to take advantage of some particular aspects of D grammar. Thus I claim that optimization for D grammar is possible. I'm convinced that the output of a parser generator (PG) can be very nearly as fast as hand-written code. ANTLR's output (last I checked) was not ideal, but the one I planned to make (a few years ago) would have produced faster code. By default the PG's output will not match the speed of hand-written code, but the user can optimize it. Assuming an ANTLR-like PG, the user can inspect the original output looking for inefficient lookahead, or cases where the parser looks for rare cases before common cases, and then improve the grammar and insert ... I forget all the ANTLR terminology ... syntactic predicates or whatever, to optimize the parser. So far the discussion favors an LL(*) parser like ANTLR, which is top-down recursive-descent. Either Pegged will be optimized with LL(*) algorithms, or a new parser generator created. 
Right, for instance I am interested in writing a top-down PG because I understand them better and prefer the top-down approach due to its flexibility (semantic actions, allowing custom code) and understandability (the user can realistically understand the output; in fact readability would be a specific goal of mine). Roman, regarding what you were saying to me earlier: In stage 2 you have only performed some basic analysis, like, e.g., matched braces to define some hierarchy. This means that at the time when you find a missing brace, for example, you cannot tell anything more than that braces don't match. Stage 2 actually can tell more than just "a brace is missing somewhere", because so many languages are C-like. So given this situation:

frob (c &% x)
    blip # gom;
}

it doesn't need to know what language this is to tell where the brace belongs. Even in a more nebulous case like:

frob (c &% x) bar @ lic
    blip # gom;
}

the brace probably belongs at the end of the first line. Perhaps your point is that there are situations where a parser that knows the "entire" grammar could make a better guess about where the missing brace/paren belongs. That's certainly true. However, just because it can guess better doesn't mean it can reinterpret the code based on that guess. I mean, I don't see any way to "back up" a parser by an arbitrary amount. A hypothetical stage 2 would probably be hand-written and could realistically back up and insert a brace/paren anywhere that the heuristics dictate, because it is producing a simple data structure and it doesn't need to do any semantic actions as it parses. A "full" parser, on the other hand, has done a lot of work that it can't undo, so the best it can do is report to the user "line 54: error: brace mismatch; did you forget a brace on line 13?" 
The heuristic is still helpful, but it has already parsed lines 13 to 54 in the wrong context (and, in some cases, has already spit out a series of error messages that are unrelated to the user's actual mistake). As I demonstrated in some examples, it could get the output which implies incorrect structure. I was unable to find the examples you refer to... this thread's getting a little unwieldy :)
Re: Let's stop parser Hell
Yeah, with a tree-transforming parser, I imagine the same thing, except my current [fantasy] is to convert a certain subset of D to multiple other languages automatically. Then I could write libraries that can easily be used by an astonishingly large audience. I certainly would like to see D targeting Android, but that's best done directly from D to ARM. That does sound very cool. Possibly difficult though, due to having to cater to the lowest common denominator in all of your API designs. No templated functions or ranges in your API, that's for sure. I'm sure there are some things where this is very doable though; it probably depends on what kind of libraries you are writing. Well, for templates, in general, it would be necessary to instantiate a particular set of templates and explicitly give them names in the target language. So for instance, I could define a Point!T struct in D, sure, but then I'd have to tell the language converter to create target-language specializations: in C#, PointD=Point!double, PointI=Point!int, etc. If the target were C++, the template could be translated to a C++ template, Point<T>, as long as there aren't any "static ifs" or other things that can't be translated. Notably, if a template P!T depends on another template Q!T, then P!T cannot be translated to a C++/C# P<T> unless Q!T was also translated as Q<T>. Adapting standard libraries could no doubt be a gigantic problem. I don't know how to begin to think about doing that. But for ranges in particular, I think the concept is too important to leave out of public interfaces. So I'd port the major range data structures to the target languages, most likely by hand, so that they could be used by converted code. As for D targeting Android, my intent is really to target X where X is any CPU/OS combo you can think of. I want to be able to get D, the language, not necessarily phobos or other niceties, to work on any platform, and to do so without much work. 
Cross-compiling to a new platform that has never been cross-compiled before should require zero coding. I understand. Conversion to C is an effective last resort. And, well, I hear a lot of compilers have even used it as a standard practice. I guess you'd be stuck with refcounting, though. I think that the D-directly-to-ARM is the current approach for cross-compiling. I critique it for its underwhelming lack of results. Yeah. I assume it involves weird object-file formats, calling conventions and ABIs. I guess very few want to get involved with that stuff, and very few have the slightest clue where to begin, myself included. (2) suffer from integration problems if you try to compile the expressions in separate files before compiling the rest of the front-end. Absolutely, I love language-integrated metaprogramming. Without it you end up with complicated build environments, and I hate those, cuz there isn't a single standard build environment that everybody likes. I think people should be able to just load up their favorite IDE and add all source files to the project and It Just Works. Or on the command line, do dmd *.d or whatever. Oh, and the ability to run the same code at meta-compile-time, compile-time and run-time, also priceless.
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 22:35:37 UTC, Roman D. Boiko wrote: On Saturday, 7 July 2012 at 22:25:00 UTC, David Piepgrass wrote: This is all true, but forgetting a brace commonly results in a barrage of error messages anyway. Code that guesses what you meant obviously won't work all the time, and phase 3 would have to take care not to emit an error message about a "{" token that doesn't actually exist (that was merely "guessed-in"). But at least it's nice for a parser to be /able/ to guess what you meant; for a typical parser it would be out of the question, upon detecting an error, to back up four source lines, insert a brace and try again. So you simply admit that error recovery is difficult to implement. For me, it is a must-have, and thus throwing away information is bad. I'm not seeing any tremendous error-handling difficulty in my idea. Anyway, I missed the part about information being thrown away...?
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 22:07:02 UTC, Roman D. Boiko wrote: On Saturday, 7 July 2012 at 21:52:09 UTC, David Piepgrass wrote: it seems easier to tell what the programmer "meant" with three phases, in the face of errors. I mean, phase 2 can tell when braces and parentheses are not matched up properly and then it can make reasonable guesses about where those missing braces/parentheses were meant to be, based on things like indentation. That would be especially helpful when the parser is used in an IDE, since if the IDE guesses the intention correctly, it can still understand broken code and provide code completion for it. And since phase 2 is a standard tool, anybody's parser can use it. There could be multiple errors that compensate for each other and make your phase 2 succeed and prevent phase 3 from doing proper error handling. Even knowing that there is an error, in many cases you would not be able to create a meaningful error message. And any error would make your phase-2 tree incorrect, so it would be difficult to recover from it by inserting an additional token or ignoring tokens until the parser is able to continue its work properly. All this would suffer for the same reason: you lose information. This is all true, but forgetting a brace commonly results in a barrage of error messages anyway. Code that guesses what you meant obviously won't work all the time, and phase 3 would have to take care not to emit an error message about a "{" token that doesn't actually exist (that was merely "guessed-in"). But at least it's nice for a parser to be /able/ to guess what you meant; for a typical parser it would be out of the question, upon detecting an error, to back up four source lines, insert a brace and try again.
Re: Let's stop parser Hell
What I like about it is not its performance, but how it matches the way we think about languages. Humans tend to see overall structure first, and examine the fine details later. The tree parsing approach is similarly nonlinear and can be modularized in a way that might be more intuitive than traditional EBNF. That reminds me, I forgot to mention another advantage I expected the three-phase approach to have, namely, that it seems easier to tell what the programmer "meant" with three phases, in the face of errors. I mean, phase 2 can tell when braces and parentheses are not matched up properly and then it can make reasonable guesses about where those missing braces/parentheses were meant to be, based on things like indentation. That would be especially helpful when the parser is used in an IDE, since if the IDE guesses the intention correctly, it can still understand broken code and provide code completion for it. And since phase 2 is a standard tool, anybody's parser can use it. Example:

void f() {
    if (cond)
        x = y + 1;
        y = z + 1;
    }
}   // The error appears to be here, but it's really 4 lines up.
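A stage-2 heuristic for the example above could work roughly like the following sketch. It is a deliberately naive, hypothetical illustration (guessMissingOpenBrace and indentOf are invented names): it assumes every opening brace ends a line and every closing brace starts one, and when a closing brace's indentation doesn't match the innermost open brace, it walks upward to the nearest line at that indentation that is followed by a deeper-indented line but lacks a '{'.

```d
import std.algorithm.searching : endsWith, startsWith;
import std.string : stripLeft, stripRight;

/// Width of the leading whitespace (each tab counts as one column here).
size_t indentOf(string line)
{
    return line.length - line.stripLeft.length;
}

/// Hypothetical heuristic: returns the index of the line that most likely
/// lacks a trailing "{", or -1 if every "}" matches an opener at its indent.
ptrdiff_t guessMissingOpenBrace(string[] lines)
{
    size_t[] openers; // indentation of each currently open "{"
    foreach (i, line; lines)
    {
        auto code = line.stripLeft.stripRight;
        if (code.startsWith("}"))
        {
            immutable k = indentOf(line);
            if (openers.length && openers[$ - 1] == k)
            {
                openers = openers[0 .. $ - 1]; // matched as expected
                continue;
            }
            // Mismatch: find a line at indent k that is followed by a deeper
            // line but does not itself open a brace.
            foreach_reverse (j; 0 .. i)
            {
                auto cand = lines[j].stripRight;
                if (indentOf(lines[j]) == k && !cand.endsWith("{")
                    && j + 1 < i && indentOf(lines[j + 1]) > k)
                    return cast(ptrdiff_t) j;
            }
            return -1; // no plausible culprit found
        }
        if (code.endsWith("{"))
            openers ~= indentOf(line);
    }
    return -1;
}
```

On the example, it points at the if (cond) line, four lines above the final brace where a conventional parser would report the error. A real tool would need to handle tabs, braces in the middle of lines, strings and comments.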
Re: Let's stop parser Hell
On Saturday, 7 July 2012 at 20:39:18 UTC, Roman D. Boiko wrote: On Saturday, 7 July 2012 at 20:26:07 UTC, David Piepgrass wrote: I'd like to add that if we give tree parsing first-class treatment, I believe the most logical approach to parsing has three or more stages instead of the traditional two (lex+parse): 1. Lexer 2. Tree-ification 3. Parsing to AST (which may itself use multiple stages, e.g. parse the declarations first, then parse function bodies later) The new stage two simply groups things that are in parentheses and braces. So an input stream such as the following: [...] I bet that after stage 2 you would have performed almost the same amount of work (in other words, spent almost the same time) as you would if you did full parsing. After that you would need to iterate the whole tree (possibly multiple times), modify (or recreate if the AST is immutable) its nodes, etc. Altogether this might be a lot of overhead. My opinion is that tree manipulation is something that should be available to clients of parser-as-a-library or even of parser+semantic analyzer, but not necessarily advantageous for the parser itself. Hmm, you've got a good point there, although simple tree-ification is clearly less work than standard parsing, since statements like "auto x = y + z;" would be quickly "blitted" into the same node in phase 2, but would become multiple separate nodes in phase 3. What I like about it is not its performance, but how it matches the way we think about languages. Humans tend to see overall structure first, and examine the fine details later. The tree parsing approach is similarly nonlinear and can be modularized in a way that might be more intuitive than traditional EBNF. On the other hand, one could argue it is /too/ flexible, admitting so many different approaches to parsing that a front-end based on this approach is not necessarily intuitive to follow; and of course, not using a standard EBNF-type grammar could be argued to be bad. Still... 
it's a fun concept, and even if the initial parsing ends up using the good-old lex-parse approach, semantic analysis and lowering can benefit from a tree parser. Tree parsing, of course, is just a generalization of linear parsing and so a tree parser generator (TPG) could work equally well for flat input.
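The tree-ification stage discussed above is small enough to sketch directly. The following is a hypothetical illustration (TokenTree, closerFor and treeify are invented names, and tokens are assumed to arrive already split, with each bracket as its own token): it groups a flat token stream by (), [] and {}, and returns null when the brackets don't balance.

```d
/// A node of the stage-2 tree: a plain token, or a bracketed group whose
/// `token` is the opening bracket and whose children are the nested items.
struct TokenTree
{
    string token;
    TokenTree[] children;
    bool isGroup;
}

/// Returns the closing bracket that matches `opener`.
string closerFor(string opener)
{
    switch (opener)
    {
        case "(": return ")";
        case "[": return "]";
        default:  return "}";
    }
}

/// Groups a flat token stream by (), [] and {}. Returns null if the brackets
/// are unbalanced (and, trivially, for empty input).
TokenTree[] treeify(string[] tokens)
{
    auto levels = new TokenTree[][](1); // levels[$ - 1] is the group being filled
    string[] openers;                   // opening bracket of each open group
    foreach (tok; tokens)
    {
        switch (tok)
        {
            case "(", "[", "{":
                openers ~= tok;
                levels.length += 1;
                break;
            case ")", "]", "}":
                if (openers.length == 0 || closerFor(openers[$ - 1]) != tok)
                    return null;
                auto group = TokenTree(openers[$ - 1], levels[$ - 1], true);
                levels = levels[0 .. $ - 1];
                openers = openers[0 .. $ - 1];
                levels[$ - 1] ~= group;
                break;
            default:
                levels[$ - 1] ~= TokenTree(tok);
                break;
        }
    }
    return openers.length == 0 ? levels[0] : null;
}
```

Nothing here knows anything about D, which is the language-agnostic property claimed for stage 2; a real version would also record source positions so that later stages can report errors.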
Re: Let's stop parser Hell
Since I didn't understand your question I assume that my statement was somehow incorrect (likely because I made some wrong assumptions about ANTLR). I didn't know about its existence until today and still don't understand it completely. What I think I understood is that it uses DFA for deciding which grammar rule to apply instead of doing backtracking. I also think that it uses DFA for low-level scanning (I'm not sure). ANTLR 3 doesn't use a DFA unless it needs to. If unlimited lookahead is not called for, it uses standard LL(k) or perhaps it uses the modified (approximate? I forget the name) LL(k) from ANTLR 2. DFA comes into play, for instance, if you need to check what comes after an argument list (of unlimited length) before you can decide that it *is* an argument list and start the "real" parsing. (The author says LL(k) is too inefficient, so he used a restricted form of it; personally I'm not convinced, but I digress.)
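The kind of decision that needs unbounded lookahead can be illustrated with a hand-written sketch (this is not what ANTLR generates). To tell whether the first parenthesized list in D's auto f(T)(T x) is a template parameter list, a parser has to skip to the matching ')', however far away it is, and check whether another '(' follows. The helper name hasSecondParamList is hypothetical.

```d
/// Hypothetical helper: given tokens and the index of a "(", skips to the
/// matching ")" (arbitrary lookahead) and reports whether another "(" follows.
bool hasSecondParamList(string[] toks, size_t open)
{
    assert(toks[open] == "(");
    size_t depth = 0;
    foreach (i; open .. toks.length)
    {
        if (toks[i] == "(")
            ++depth;
        else if (toks[i] == ")" && --depth == 0)
            return i + 1 < toks.length && toks[i + 1] == "(";
    }
    return false; // unbalanced: no decision possible
}
```

Because the scan never has to backtrack, its cost is linear in the length of the list, which is the same property a prediction DFA gives a generated parser.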
Re: Let's stop parser Hell
auto captures = syntaxNode.matchNodes( TOK_WHILE_NODE, OP_ENTER_NODE, OP_CAPTURE(0), OP_BEGIN, TOK_EXPRESSION, OP_END, OP_CAPTURE(1), OP_BEGIN, TOK_STATEMENT, OP_END, OP_LEAVE_NODE); I'm glad to hear you like the tree-parsing approach, Chad, although the particular syntax here looks pretty unfriendly :O -- does this represent something that you are working on right now? This kind of architecture leads to other interesting benefits, like being able to assert which symbols a pattern is designed to handle or which symbols are allowed to exist in the AST at any point in time. Thus if you write a lowering that introduces nodes that a later pass can't handle, you'll know very quickly, at least in principle. I wanted to make such a front-end so that I could easily make a C backend. I believe such a compiler would be able to do that with great ease. I really want a D compiler that can output ANSI C code that can be used with few or no OS/CPU dependencies. I would be willing to lose a lot of the nifty parallelism/concurrency stuff and deal with reference counting instead of full garbage collection, as long as it lets me EASILY target new systems (any phone, console platform, and some embedded microcontrollers). Then what I have is something that's as ubiquitous as C, but adds a lot of useful features like exception handling, dynamic arrays, templates, CTFE, etc etc. My ideas for how to deal with ASTs in pattern recognition and substitution followed from this. I tend to agree that it would be better to have a "general" node class with the node type as a property rather than a subtype and rather than a myriad of independent types, although in the past I haven't been able to figure out how to make this approach simultaneously general, efficient, and easy to use. 
I'm kind of a perfectionist, which perhaps holds me back sometimes :)

I'd like to add that if we give tree parsing first-class treatment, I believe the most logical approach to parsing has three or more stages instead of the traditional two (lex+parse):

1. Lexer
2. Tree-ification
3. Parsing to AST (which may itself use multiple stages, e.g. parse the declarations first, then parse function bodies later)

The new stage two simply groups things that are in parentheses and braces. So an input stream such as the following:

A man (from a [very ugly] house in the suburbs) was quoted as saying { I saw Batman (and Robin) last night! }

is converted to a tree where everything parenthesized or braced gets to be a child:

A man (
    from a [
        very ugly
    ] house in the suburbs
) was quoted as saying {
    ...
}

Some of the things I like about this approach are:

1. It's language-agnostic. Lots of languages and DSLs could re-use exactly the same code from stage 2. (Stage 1, also, is fairly similar between languages and I wonder if a parameterized standard lexer is a worthwhile pursuit.)
2. It mostly eliminates the need for arbitrary-length lookahead for things like D's template_functions(...)(...). Of course, the tokens will almost always end up getting scanned twice, but hey, at least you know you won't need to scan them more than twice, right? (Er, of course the semantic analysis will scan it several times anyway. Maybe this point is moot.)
3. It is very efficient for tools that don't need to examine function bodies. Such tools can easily leave out that part of the parser simply by not invoking the function-body sub-parser.
4. It leaves open the door to supporting embedded DSLs in the future. It's trivial to just ignore a block of text in braces and hand it off to a DSL later. It is similar to the way PEGs allow several different parties to contribute parts of a grammar, except that this approach does not constrain all the parties to actually use PEGs; for instance, if I am a really lazy DSL author and I already have a SQL parser lying around (whether it's LL(k), LALR, whatever), I can just feed the original input text to that parser (or, better, use the flat token stream, sans comments, that came out of the lexer).
5. It's risky 'cause I've never heard of anyone taking this approach before. Bring on the danger!

I have observed that most PLs (programming languages) use one of two versions of stage 2: (1) C-style, with structure indicated entirely with {}, (), [], and possibly <> (shudder), or (2) Python-style, with structure indicated by indentation instead of {}. My favorite is the Boo language, which combines these two, using Python style by default but also having a WSA (whitespace-agnostic) parsing mode with braces, and switching to WSA mode inside a Python-style module whenever the user uses an opener ("(", "{" or "[").
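A minimal sketch of the "general node class" idea discussed above: the node kind is a field rather than a subclass, which makes a simple shape matcher easy to write. Kind, AstNode and matchShape are hypothetical names, and the matcher is far simpler than the OP_CAPTURE-style API quoted in this post (it only checks one level of children).

```d
/// Node kinds are plain data; adding a kind does not require a new class.
enum Kind { While, Expression, Statement, Identifier }

/// One general node type for the whole AST.
final class AstNode
{
    Kind kind;
    AstNode[] children;

    this(Kind kind, AstNode[] children...)
    {
        this.kind = kind;
        this.children = children.dup; // variadic arrays may live on the stack
    }
}

/// Hypothetical matcher: if `n` has kind `root` and exactly the listed child
/// kinds, returns the children as captures; otherwise returns null.
AstNode[] matchShape(AstNode n, Kind root, Kind[] childKinds...)
{
    if (n.kind != root || n.children.length != childKinds.length)
        return null;
    foreach (i, k; childKinds)
        if (n.children[i].kind != k)
            return null;
    return n.children;
}
```

Because every node has the same type, a pass that lowers or rewrites part of the tree can build replacement nodes of any kind without casts or visitor boilerplate.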
Re: Let's stop parser Hell
Note that PEG does not require packrat parsing, even though it was developed to use it. I think it's a historical 'accident' that put the two together: Bryan Ford's thesis used the two together. Interesting. After trying to use ANTLR-C# several years back, I got disillusioned because nobody was interested in fixing the bugs in it (ANTLR's author is a Java guy first and foremost) and the required libraries came with neither source code nor a license (wtf.) So, for a while I was thinking about how I might make my own parser generator that was "better" than ANTLR. I liked the syntax of PEG descriptions, but I was concerned about the performance hit of packrat and, besides, I already liked the syntax and flexibility of ANTLR. So my idea was to make something that was LL(k) and mixed the syntax of ANTLR and PEG while using more sane (IMO) semantics than ANTLR did at the time (I've no idea if ANTLR 3 still uses the same semantics today...) All of this is 'water under the bridge' now, but I hand-wrote a lexer to help me plan out how my parser-generator would produce code. The output code was to be both more efficient and significantly more readable than ANTLR's output. I didn't get around to writing the parser-generator itself but I'll have a look back at my handmade lexer for inspiration. However, as I found a few hours ago, packrat parsing (typically used to handle PEG) has serious disadvantages: it complicates debugging because of frequent backtracking, it has problems with error recovery, and it typically disallows adding actions with side effects (because of the possibility of backtracking). These are important enough to make me reconsider my plans to use Pegged. I will try to analyze whether the issues are so fundamental that I (or somebody else) will have to create an ANTLR-like parser instead, or whether it is possible to introduce changes into Pegged that would fix these problems. I don't like the sound of this either. 
Even if PEGs were fast, difficulty in debugging, error handling, etc. would give me pause. I insist on well-rounded tools. For example, even though LALR(1) may be the fastest type of parser (is it?), I prefer not to use it due to its inflexibility (it just doesn't like some reasonable grammars), and the fact that the generated code is totally unreadable and hard to debug (mind you, when I learned LALR in school I found that it is possible to visualize how it works in a pretty intuitive way--but debuggers won't do that for you.) While PEGs are clearly far more flexible than LALR and probably more flexible than LL(k), I am a big fan of old-fashioned recursive descent because it's very flexible (easy to insert actions during parsing, and it's possible to use custom parsing code in certain places, if necessary*) and the parser generator's output is potentially very straightforward to understand and debug. In my mind, the main reason you want to use a parser generator instead of hand-coding is convenience, e.g. (1) to compress the grammar down so you can see it clearly, (2) have the PG compute the first-sets and follow-sets for you, (3) get reasonably automatic error handling. * (If the language you want to parse is well-designed, you'll probably not need much custom parsing. But it's a nice thing to offer in a general-purpose parser generator.) I'm not totally sure yet how to support good error messages, efficiency and straightforward output at the same time, but by the power of D I'm sure I could think of something... I would like to submit another approach to parsing that I dare say is my favorite, even though I have hardly used it at all yet. ANTLR offers something called "tree parsing" that is extremely cool. It parses trees instead of linear token streams, and produces other trees as output. 
I don't have a good sense of how tree parsing works, but I think that some kind of tree-based parser generator could become the basis for a very flexible and easy-to-understand D front-end. If a PG operates on trees instead of linear token streams, I have a sneaky suspicion that it could revolutionize how a compiler front-end works. Why? Because right now parsers operate just once, on the user's input, and from there you manipulate the AST with "ordinary" code. But if you have a tree parser, you can routinely manipulate and transform parts of the tree with a sequence of independent parsers and grammars. Thus, parsers would replace a lot of things for which you would otherwise use a visitor pattern or similar hand-written traversal code. I think I'll try to sketch out this idea in more detail later.
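The "sequence of independent tree-to-tree passes" idea can be approximated with ordinary hand-written D code, as in the sketch below. It is not ANTLR's tree-grammar notation, and `Node`, `foldConstants`, `dropTimesOne`, `show` and `runPasses` are all hypothetical. Each pass takes a tree and returns a new tree, and the front end simply runs the passes in order. A tree-parser generator would, presumably, produce passes like these from small grammars instead of hand-written pattern matching.

```d
import std.conv : to;

// Hypothetical mini-AST: leaves are numbers or variables, inner nodes are "+" or "*".
class Node
{
    string op;      // "+", "*", "num" or "var"
    long value;     // used when op == "num"
    string name;    // used when op == "var"
    Node left, right;

    this(string op, Node l, Node r) { this.op = op; left = l; right = r; }
    this(long v) { op = "num"; value = v; }
    this(string n) { op = "var"; name = n; }
}

alias Pass = Node function(Node);

// Pass 1: replace "num op num" with a single number (builds new nodes).
Node foldConstants(Node n)
{
    if (n.op != "+" && n.op != "*")
        return n;                       // leaves are unchanged
    Node l = foldConstants(n.left);
    Node r = foldConstants(n.right);
    if (l.op == "num" && r.op == "num")
        return new Node(n.op == "+" ? l.value + r.value : l.value * r.value);
    return new Node(n.op, l, r);
}

// Pass 2: rewrite "e * 1" and "1 * e" to "e".
Node dropTimesOne(Node n)
{
    if (n.op != "+" && n.op != "*")
        return n;
    Node l = dropTimesOne(n.left);
    Node r = dropTimesOne(n.right);
    if (n.op == "*" && r.op == "num" && r.value == 1)
        return l;
    if (n.op == "*" && l.op == "num" && l.value == 1)
        return r;
    return new Node(n.op, l, r);
}

// Fully parenthesized rendering, for inspection.
string show(Node n)
{
    if (n.op == "num")
        return n.value.to!string;
    if (n.op == "var")
        return n.name;
    return "(" ~ show(n.left) ~ " " ~ n.op ~ " " ~ show(n.right) ~ ")";
}

// Runs independent passes one after another.
Node runPasses(Node root, Pass[] passes...)
{
    foreach (pass; passes)
        root = pass(root);
    return root;
}
```

Running both passes on `x * (2 + -1)` first folds `2 + -1` into `1`, then rewrites `x * 1` to `x`; neither pass needs to know the other exists.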
Re: Let's stop parser Hell
Summary: everybody is welcome to join the effort of translating the DMD front end and improving Pegged. Also, I would like to invite those interested in the DCT project to help me with it. Right now I'm trying to understand whether it is possible to incorporate Pegged inside without losing anything critical (and I think it is very likely possible), and how exactly to do that. Dmitry offered to help improve the speed of Pegged (or of some other compiler). Anyone else? I'd really like to create a task force on this; it is of strategic importance to D. In Walter's own words, no new feature is going to push us forward since we're not really using the great goodies we've got, and CTFE technology is the most important.

Hi everybody! My name's David and I've been dreaming since around 1999 of making my own computer language, and never found the time for it. The first time I looked at D was around 2004 or so, and it just looked like a "moderately better C++", which I forgot about, having loftier ideas. When I found out about D2's metaprogramming facilities I instantly became much more interested, although I still wish to accomplish more than is possible ATM. I've been talking to my boss about reducing my working hours, mainly in order to have time to work on something related to D. My goal is to popularize a language that is efficient (as in runtime speed and size), expressive, safe, concise, readable, well-documented, easy-to-use, and good at finding errors in your code. In other words, I want a language that is literally all things to all people, a language that is effective for any task. I want to kill off this preconceived notion that most programmers seem to have, that fast code requires a language like C++ that is hard to use. The notion that Rapid Application Development is incompatible with an efficient executable is nonsense and I want to kill it :) To be honest I have some reservations about D, but of all the languages I know, D is currently closest to my ideal.
This work on parsers might be a good place for me to dive in. I have an interest in parsers and familiarity with LL, LALR, PEGs, and even Pratt parsers (fun!), but I am still inexperienced. I also like writing documentation and articles, but I always find it hard to figure out how other people's code works well enough to document it.

I'm having some trouble following this thread due to the acronyms: CTFE, DCT, AA. At least I managed to figure out that CTFE is Compile Time Function Execution. DCT and AA I already know as Discrete Cosine Transform and Anti-Aliasing, of course, but what do they mean to you?

One thing that has always concerned me about PEGs is that people always say PEGs are slower than traditional two-phase LALR(1) or LL(k) parsers. However, I have never seen any benchmarks. Does anyone know exactly how much performance you lose in an (optimized) PEG compared to an (optimized) LALR/LL parser + LL/regex lexer? Anyway, it's the weekend, during which I hope I can find a place to fit in with you guys.
Re: Proposal: takeFront and takeBack
(grain of salt, I'm new to D.) I'd vote for consumeFront always being available, because it's distinctly more convenient to call one function instead of two, especially when you expect that making a copy of front is cheap (e.g. a collection of pointers, numbers or slices). Ranges where 'front' returns a pointer to a buffer that popFront destroys (overwrites) are surely in the minority, right? So I hope they could be retrofitted to support consumeFront. But, given that popFront is allowed to be destructive to the value of front by re-using the same buffer (and that the proposed consumeFront might also be implemented with 'delayed destruction' of front), I am wondering how one is supposed to implement generic code correctly when this is unacceptable, e.g.:

    void append(Range1, Range2)(Range1 input, ref Range2 output)
    {
        // Usually works, unless input.popFront happens to be destructive?
        foreach (e; input)
            output ~= e;
    }
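A minimal D sketch of both points follows, under stated assumptions. `consumeFront` is written as a free function (the proposal's name, not an existing Phobos function). `ReusingRange` is a hypothetical range that reuses one buffer for every element, in the way `File.byLine` reuses its buffer. `appendCopies` is one possible defensive variant of `append` that duplicates mutable or const array elements before storing them.

```d
import std.range.primitives : ElementType, isInputRange, empty, front, popFront;
import std.traits : isArray;

// Hypothetical free-function version of the proposed consumeFront:
// copies r.front, advances r, and returns the copy.
auto consumeFront(R)(ref R r)
    if (isInputRange!R)
{
    auto e = r.front;
    r.popFront();
    return e;
}

// Hypothetical range that, like File.byLine, reuses a single buffer:
// popFront overwrites the memory that the previous front referred to.
struct ReusingRange
{
    string[] items;
    char[] buf;

    this(string[] items)
    {
        this.items = items;
        load();
    }

    private void load()
    {
        if (items.length == 0)
            return;
        buf.length = items[0].length;  // reuses the existing memory when the length does not grow
        buf[] = items[0][];            // copy the next item into the shared buffer
    }

    @property bool empty() const { return items.length == 0; }
    @property char[] front() { return buf; }
    void popFront() { items = items[1 .. $]; load(); }
}

// Defensive variant of append: array elements that are not immutable may alias
// a buffer that popFront reuses, so a private copy is stored instead.
void appendCopies(Range1, Range2)(Range1 input, ref Range2 output)
    if (isInputRange!Range1)
{
    alias E = ElementType!Range1;
    foreach (e; input)
    {
        static if (isArray!E && !is(typeof(E.init[0]) == immutable))
            output ~= e.dup;
        else
            output ~= e;
    }
}
```

Calling the original `append` with a `ReusingRange` over `["ab", "cd"]` and a `char[][]` output would store two slices of the same buffer, both reading "cd" at the end, whereas `appendCopies` stores "ab" and "cd". Note that this `consumeFront` has the same hazard: the slice it returns from a `ReusingRange` aliases the buffer that its own `popFront` call overwrites whenever another element follows.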