from:"David Piepgrass"

Re: D Ranges in C#

2013-06-01 Thread David Piepgrass

You shouldn't be using 32-bit indices on x64, that defeats the 
whole point of x64.


As of .NET 4.5, 64-bit array indexes are supported as well.

http://msdn.microsoft.com/en-us/library/hh285054.aspx


Don't forget that we're talking about a *hashtable* here. If a 
.NET hashtable used 64-bit indexes (or pointers) it would require 
8-12 bytes more memory per entry, specifically 32 bytes total, 
including overhead, if the key and value are 4 bytes each.


An in-memory hashtable that requires 64-bit indexes rather than 
32 bits would have to contain over 4 billion entries which would 
take at least 128 GB of RAM, assuming 8 bytes for each key-value 
pair!!! In fact it's worse than that, as the dictionary grows by 
size-doubling and contains a certain amount of unused entries at 
the end.
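The arithmetic behind that figure can be checked with a small C++ sketch. The 32-byte entry size is the figure quoted above; the helper and constant names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for `entries` table entries of
// `bytesPerEntry` bytes each. Ignores the unused slack left by size-doubling.
constexpr std::uint64_t tableBytes(std::uint64_t entries,
                                   std::uint64_t bytesPerEntry) {
    return entries * bytesPerEntry;
}

constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;
// 2^32 entries: about where 32-bit indexes run out.
constexpr std::uint64_t kEntries = std::uint64_t{1} << 32;

// 32 bytes per entry (the 64-bit figure quoted above) gives 128 GiB.
static_assert(tableBytes(kEntries, 32) == 128 * kGiB, "expected 128 GiB");

inline void tableBytesExample() {
    // A bare 8-byte key-value pair with no overhead would still need 32 GiB.
    assert(tableBytes(kEntries, 8) == 32 * kGiB);
}
```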


No thanks, I'd rather save those 8 bytes and accept the 4 billion 
limit, if you don't mind.

Re: D Ranges in C#

2013-06-01 Thread David Piepgrass


David Piepgrass:
In fact, most STL algorithms require exactly two iterators--a 
range--and none require only a single iterator


I think there are some C++ data structures that store many 
single iterators. If you instead store ranges you double the 
data amount.


Hashmaps would be the most common example. Usually implemented 
as a linked list of key-value pairs along with a vector of list 
iterators.


In theory. But the .NET hashtables are implemented with an 
*array* of key-value pairs and an array of *indexes*. The former 
forms a virtual linked list that is more efficient than a 
traditional linked list, and the latter is more efficient than a 
vector of iterators (especially on x64, as the indexes can be 
32-bit.)
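As a rough illustration of that layout, here is a minimal C++ sketch of a chained hash map whose chains are 32-bit indexes into a single entry array. It is only loosely modelled on the description above, not on the actual .NET source; the class name, the fixed bucket count and the int-to-int restriction are all assumptions made to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: no resizing, no removal, int keys and values.
class IndexedMap {
    struct Entry {
        int key;
        int value;
        std::int32_t next;  // index of the next entry in this bucket, -1 = end
    };
    std::vector<std::int32_t> buckets_;  // head index per bucket, -1 = empty
    std::vector<Entry> entries_;         // storage for the "virtual linked list"

    std::size_t bucketOf(int key) const {
        return static_cast<std::uint32_t>(key) % buckets_.size();
    }

public:
    explicit IndexedMap(std::size_t bucketCount = 16)
        : buckets_(bucketCount, -1) {
        assert(bucketCount > 0);
    }

    void set(int key, int value) {
        const std::size_t b = bucketOf(key);
        for (std::int32_t i = buckets_[b]; i >= 0; i = entries_[i].next) {
            if (entries_[i].key == key) { entries_[i].value = value; return; }
        }
        // Not found: append an entry and make it the new head of the chain.
        entries_.push_back(Entry{key, value, buckets_[b]});
        buckets_[b] = static_cast<std::int32_t>(entries_.size() - 1);
    }

    bool tryGet(int key, int& valueOut) const {
        for (std::int32_t i = buckets_[bucketOf(key)]; i >= 0;
             i = entries_[i].next) {
            if (entries_[i].key == key) { valueOut = entries_[i].value; return true; }
        }
        return false;
    }

    std::size_t size() const { return entries_.size(); }
};
```

Each chain link costs 4 bytes regardless of pointer width, which is the x64 saving mentioned above.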


Iterators are also useful for constructing sub-ranges, which 
proves useful in the implementation of some algorithms. Writing 
std::next_permutation in D with ranges is quite frustrating 
compared to C++.


https://github.com/D-Programming-Language/phobos/blob/master/std/algorithm.d#L10901
http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01045_source.html#l03619

Notice that in D you need to maintain a count of the elements 
popped off the back and then use takeExactly to retrieve that 
portion again. In C++ you just move the iterators around and 
create "ranges" from pairs of iterators as you need them.
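For readers who don't want to follow the links, the iterator style being described looks roughly like the sketch below. It is a from-memory rendering of the classic algorithm, not a copy of the libstdc++ source, and the function names are invented.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sketch of next-permutation on bidirectional iterators. Sub-ranges such as
// [next, last) are formed on the fly from whatever iterators are at hand.
template <typename BidirIt>
bool nextPermutationSketch(BidirIt first, BidirIt last) {
    if (first == last) return false;
    BidirIt i = std::prev(last);
    if (i == first) return false;  // a single element has no next permutation
    while (true) {
        BidirIt next = i;
        --i;
        if (*i < *next) {
            // Find the rightmost element greater than *i and swap it in.
            BidirIt j = last;
            while (!(*i < *--j)) {}
            std::iter_swap(i, j);
            std::reverse(next, last);  // restore ascending order in the tail
            return true;
        }
        if (i == first) {
            std::reverse(first, last);  // wrapped around to the first permutation
            return false;
        }
    }
}

inline std::vector<int> permutationAfter123() {
    std::vector<int> v{1, 2, 3};
    const bool more = nextPermutationSketch(v.begin(), v.end());
    assert(more);
    return v;  // {1, 3, 2}
}
```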


When I implemented nextPermutation in D, I constantly felt as 
if I was fighting with ranges instead of working with them - I 
knew exactly what I wanted to do, but ranges only provide a 
roundabout way of doing it.


Hmm, very interesting. I suppose the problem with C++ iterators 
is that they are useless without external context: you can't 
increment or decrement one without also comparing it to begin() 
or end() in its container, which implies that the caller must 
manually keep track of which container it came from. Thus, an 
iterator is hardly an improvement over a plain-old integer index, 
the only advantages being


1. You can dereference it (*if* you can be sure it doesn't point 
to end())
2. Unlike an index, it's compatible with non-random-access data 
structures


But perhaps the iterator concept could be improved by being made 
self-aware: if the iterator "knew" and could tell you when it was 
at the beginning or end of its container. This would increase the 
storage requirement in some circumstances but not others. For 
example, an iterator to a doubly-linked list node can tell when 
it is at the beginning or end, but an iterator to a singly-linked 
list node can only tell when it is at the end. A pointer inside 
an array may or may not be able to tell if it is at the end 
depending on how arrays work, e.g. perhaps the way D heap arrays 
work would allow an array iterator, implemented as a simple 
pointer, to still know when it is safe to increment and decrement.


The simplest possible .NET array iterator is an array reference 
plus an index, and again the iterator can know when it is at the 
beginning or end of the array--except that if the iterator came 
from inside a range, it would be unaware of the range's 
boundaries, which may be smaller than the array's boundaries.
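To make the "array reference plus index" idea concrete, here is a small C++ sketch of such a self-aware cursor. The type and its method names are hypothetical, and it has exactly the limitation described above: it knows the bounds of the whole array, not of any sub-range it may have come from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "self-aware" cursor: a container pointer plus an index.
template <typename T>
class ArrayCursor {
    const std::vector<T>* array_;
    std::size_t index_;

public:
    ArrayCursor(const std::vector<T>& array, std::size_t index)
        : array_(&array), index_(index) {
        assert(index <= array.size());
    }

    bool atBegin() const { return index_ == 0; }
    bool atEnd() const { return index_ == array_->size(); }

    const T& operator*() const {
        assert(!atEnd());  // dereferencing the end position is a bug
        return (*array_)[index_];
    }

    // Move only when it is safe, and report whether the move happened.
    bool tryAdvance() {
        if (atEnd()) return false;
        ++index_;
        return true;
    }
    bool tryRetreat() {
        if (atBegin()) return false;
        --index_;
        return true;
    }
};
```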

D Ranges in C#

2013-05-31 Thread David Piepgrass

I'm adding D-style ranges to my new C# collections library. In 
case anyone would like to comment, please see here:


http://loyc-etc.blogspot.ca/2013/06/d-style-ranges-in-c-net.html

Re: Java binaries

2013-02-19 Thread David Piepgrass


So how are C++ and C# pointers done in IL ?


There are two kind of pointers in C#: managed and unmanaged. 
Wrapped in a fixed statement (just to tell the garbage 
collector to keep fixed references), C# pointers will behave 
like any native language pointer. This is not the first topic 
where I read that misconception that slices are a problem for 
IL. From .net 2.0 (9 years ago) there is the ArraySegment<T> 
type doing exactly what D slices do. Also, in C# arrays are 
implicitly convertible to pointers.


IIRC, the biggest incompatibility between D and .NET is that D 
pointers can point to the stack, to unmanaged (non-GC) memory or 
to managed (GC) memory, while simultaneously having unlimited 
lifetime. In .NET, arguments that are passed by reference can 
point to GC or non-GC memory, but pointers inside objects 
(classes or boxed structs) can only point to (1) non-GC non-stack 
memory OR (2) the beginning of a GC object. The key problem: a 
single pointer cannot be used for both purposes!


D pointers and ranges can point to stack, GC or non-GC memory, 
regardless of the location of the pointer or range itself. Also, 
D pointers can point to the interior of an object and not just 
the beginning, while .NET pointers, in general, cannot.


This doesn't make a D implementation for .NET impossible, but if 
you want to run arbitrary D code on .NET, I think it would have 
to run inefficiently because it would constantly have to work 
around limitations of .NET. I think doing D in .NET efficiently 
would require a specialized version of D, or something.


Note that plain old C++ can be used in .NET because C++ pointers 
can't point to GC memory without special .NET-specific types. 
Thus, old-fashioned C++ avoids problems related to the .NET 
garbage collector.


Anyway, I don't see any use for a D IL compiler, since probably 
the language syntax will look 90% like C#.


How is "looking" like C# relevant? D looks 90% like C++ too, and 
D is still better. Certainly D is more powerful than C# on the 
whole.

Re: small idea

2013-01-09 Thread David Piepgrass


On Wednesday, 9 January 2013 at 15:10:47 UTC, bearophile wrote:

eles:


void make_a_equal_to_b(??a,!!b); //I give you "b", give me "a"


A saner syntax is just to use the same in/out/inout keywords at 
the call point. This is essentially what C# does (with one 
exception for COM):


make_a_equal_to_b(ref a, in b)

This feature was discussed several times in past for D. The 
advantage is more readability for the code and fewer surprises. 
The disadvantages are more typing, some code breakage of D2 
code (because even if at the beginning it's a warning, and 
later a deprecation, you will eventually need to enforce it 
with an error).


The same thing happens every time this is discussed: some people 
insist "ref" and "out" should be REQUIRED or else it should not 
be ALLOWED. Others don't want to break backward compatibility so 
they insist it can't be required. There is no common ground so it 
never gets to the "allowed" stage.


In C++ I actually use call site ref for & params even with no 
checking at all:

#define IN
#define OUT
In fact these are defined in a Microsoft header. I find them 
useful for documentation.
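A minimal sketch of how such markers read at a call site. The function below is a made-up C++ counterpart of the make_a_equal_to_b example from this thread, and the #ifndef guards are only there in case a platform header has already defined the macros.

```cpp
#include <cassert>

// Empty marker macros: they expand to nothing, so the compiler checks nothing.
#ifndef IN
#define IN
#endif
#ifndef OUT
#define OUT
#endif

// Hypothetical function: copies b into a.
inline void makeAEqualToB(int& a, const int& b) { a = b; }

inline int callSiteDemo() {
    int a = 0;
    int b = 42;
    makeAEqualToB(OUT a, IN b);  // pure documentation for the reader
    assert(a == b);
    return a;
}
```

Swapping the two markers would compile just as happily, which is the "no checking at all" part.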


Again, my proposal is that the compiler should allow ref/out and 
not warn when it is missing; if users want a warning/error for 
missing ref/out, they can ask for it per-module with a pragma (or 
something).


One more small D-specific problem is what to do if the first 
argument is a ref and you want to use UFCS.


In the past I suggested allowing implicit ref for structs (but 
probably not for classes) with UFCS, because the "this" parameter 
of a member function of a struct is passed by ref already.

Re: the Disruptor framework vs The Complexities of Concurrency

2012-12-12 Thread David Piepgrass

Maybe, but I'm still not clear what are the differences between 
a normal ring buffer (not a new concept) and this "disruptor" 
pattern..


Key differences with a typical lock-free queue:
- Lightning fast when used correctly. It observes that not only 
is locking expensive, even CAS (compare and swap) is not cheap, 
so it avoids CAS in favor of memory barriers (unless multiple 
writers are required.) Memory allocation is avoided too, by 
preallocating everything.
- Multicast and multisource: multiple readers can view the same 
entries.
- Separation of concerns: disruptors are a whole library instead 
of a single class, so disruptors support several configurations 
of producers and consumers, as opposed to a normal queue that is 
limited to one or two arrangements. To me, one particularly 
interesting feature is that a reader can modify an entry and then 
another reader can flag itself as "dependent" on the output of 
the first reader. So really it supports not just readers and 
writers but "annotators" that both read and write. And the set of 
readers and writers can be arranged as a graph.
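To illustrate just the first point (no locks, no CAS, everything preallocated), here is a sketch of a plain single-producer/single-consumer ring buffer in C++. It is not the Disruptor and has none of its multicast or dependency-graph features; the class name and interface are assumptions, and the acquire/release ordering on two atomic counters stands in for the memory barriers mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: exactly one producer thread and one consumer thread.
template <typename T>
class SpscRing {
    std::vector<T> slots_;              // allocated once, up front
    std::atomic<std::size_t> head_{0};  // next slot to read (written by consumer)
    std::atomic<std::size_t> tail_{0};  // next slot to write (written by producer)

public:
    // One slot is kept empty to distinguish "full" from "empty".
    explicit SpscRing(std::size_t capacity) : slots_(capacity + 1) {
        assert(capacity > 0);
    }

    bool tryPush(const T& value) {  // producer thread only
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % slots_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        slots_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the entry
        return true;
    }

    bool tryPop(T& out) {  // consumer thread only
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[head];
        head_.store((head + 1) % slots_.size(), std::memory_order_release);
        return true;
    }
};
```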


See also
http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work

Re: OT (partially): about promotion of integers

2012-12-12 Thread David Piepgrass

On Wednesday, 12 December 2012 at 06:19:14 UTC, Walter Bright 
wrote:
You're not going to get performance with overflow checking even 
with the best compiler support. For example, much arithmetic 
code is generated for the x86 using addressing mode 
instructions, like:


LEA EAX,16[8*EBX][ECX]  for 16+8*b+c

The LEA instruction does no overflow checking. If you wanted 
it, the best code would be:


MOV EAX,16
IMUL EBX,8
JO overflow
ADD EAX,EBX
JO overflow
ADD EAX,ECX
JO overflow

Which is considerably less efficient. (The LEA is designed to 
run in one cycle). Plus, often more registers are modified 
which impedes good register allocation.


Thanks for the tip. Of course, I don't need and wouldn't use 
overflow checking all the time--in fact, since I've written a big 
system in a language that can't do overflow checking, you might 
say I "never need" overflow checking, in the same way that C 
programmers "never need" constructors, destructors, generics or 
exceptions as demonstrated by the fact that they can and do build 
large systems without them.


Still, the cost of overflow checking is a lot bigger, and 
requires a lot more cleverness, without compiler support. Hence I 
work harder to avoid the need for it.


If you desire overflows to be programming errors, then you want 
an abort, not a thrown exception. I am perplexed by your desire 
to continue execution when overflows happen regularly.


I explicitly say I want to handle overflows quickly, and you 
conclude that I want an unrecoverable abort? WTF! No, I think 
overflows should be handled efficiently, and should be nonfatal.


Maybe it would be better to think in terms of the carry flag: it 
seems to me that a developer needs access to the carry flag in 
order to do 128+bit arithmetic efficiently. I have written code 
to "make do" without the carry flag, it's just more efficient if 
it can be used. So imagine an intrinsic that gets the value of 
the carry flag*--obviously it wouldn't throw an exception. I just 
think overflow should be handled the same way. If the developer 
wants to react to overflow with an exception/abort, fine, but it 
should not be mandatory as it is in .NET.


* Yes, I know you'd usually just ADC instead of retrieving the 
actual value of the flag, but sometimes you do want to just get 
the flag.
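For reference, "making do" without the carry flag usually means recovering the carry from unsigned wraparound, along the lines of this C++ sketch. The struct and function names are invented; a compiler may or may not turn this into ADD/ADC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit unsigned value split into two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Adds two 128-bit values. A wrapped unsigned sum is smaller than either
// operand, which is how the carry is detected without reading the flag.
inline U128 add128(U128 a, U128 b, bool& carryOut) {
    U128 r;
    r.lo = a.lo + b.lo;
    const std::uint64_t carry = (r.lo < a.lo) ? 1u : 0u;
    const std::uint64_t hiSum = a.hi + b.hi;
    const bool carry1 = hiSum < a.hi;
    r.hi = hiSum + carry;
    const bool carry2 = r.hi < hiSum;
    carryOut = carry1 || carry2;  // reported to the caller, never thrown
    return r;
}

inline void add128Example() {
    bool carry = false;
    const U128 r = add128(U128{UINT64_MAX, 0}, U128{1, 0}, carry);
    assert(r.lo == 0 && r.hi == 1 && !carry);
}
```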


Usually when there is an overflow I just want to discard one data 
point and move on, or set the result to the maximum/minimum 
integer, possibly make a note in a log, but only occasionally do 
I want the debugger to break.
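A sketch of that "clamp and carry on" style in C++, done here by widening to 64 bits rather than with any compiler support. The function name and the out-parameter are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Adds two 32-bit ints, clamping to the int32 range on overflow.
// `overflowed` lets the caller log, discard the data point, or ignore it.
inline std::int32_t saturatingAdd(std::int32_t a, std::int32_t b,
                                  bool& overflowed) {
    const std::int64_t wide = static_cast<std::int64_t>(a) + b;  // cannot overflow
    const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    overflowed = wide < lo || wide > hi;
    if (wide < lo) return static_cast<std::int32_t>(lo);
    if (wide > hi) return static_cast<std::int32_t>(hi);
    return static_cast<std::int32_t>(wide);
}

inline void saturatingAddExample() {
    bool overflowed = false;
    const std::int32_t r = saturatingAdd(INT32_MAX, 1, overflowed);
    assert(r == INT32_MAX && overflowed);
}
```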

Re: OT (partially): about promotion of integers

2012-12-11 Thread David Piepgrass

The problem, as I see it, is nobody actually cares about this. 
Why would I say something so provocative? Because I've seen D 
programmers go to herculean lengths to get around problems they 
are having in the language. These efforts make a strong case 
that they need better language support (UDAs are a primo 
example of this). I see nobody bothering to write a CheckedInt 
type and seeing how far they can push it, even though writing 
such a type is not a significant challenge; it's a 
bread-and-butter job.


I disagree with the analysis. I do want overflow detection, yet I 
would not use a CheckedInt in D for the same reason I do not 
usually use one in C++: without compiler support, it is too 
expensive to detect overflow. In my C++ I have a lot of math to 
do, and I'm using C++ because it's faster than C# which I would 
otherwise prefer. Constantly checking for overflow without 
hardware support would kill most of the performance advantage, so 
I don't do it.


I do use "clipped conversion" though: e.g. 
ClippedConvert(4)==32767. I can afford the overhead in 
this case because I don't do type conversions as often as 
addition, bit shifts, etc.
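A clipped conversion of this kind might look like the following C++ sketch. It is not the author's actual function: the name, the 64-bit signed source type and the restriction to targets narrower than 64 bits are all assumptions made to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical clipped (saturating) integer conversion.
template <typename To>
To clippedConvert(std::int64_t value) {
    static_assert(std::numeric_limits<To>::is_integer, "integer targets only");
    static_assert(sizeof(To) < sizeof(std::int64_t),
                  "target must be narrower than 64 bits");
    const std::int64_t lo = std::numeric_limits<To>::min();
    const std::int64_t hi = std::numeric_limits<To>::max();
    if (value < lo) return static_cast<To>(lo);  // clip instead of wrapping
    if (value > hi) return static_cast<To>(hi);
    return static_cast<To>(value);
}

inline void clippedConvertExample() {
    // 32767 is the largest value a 16-bit signed integer can hold.
    assert(clippedConvert<std::int16_t>(40000) == 32767);
    assert(clippedConvert<std::uint8_t>(-5) == 0);
}
```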


The C# solution is not good enough either. C# throws exceptions 
on overflow, which is convenient but is bad for performance if it 
happens regularly; it can also make a debugger almost unusable. 
Some sort of mechanism that works like an exception, but faster, 
would probably be better. Consider:


result = a * b + c * d;

If a * b overflows, there is probably no point to executing c * d 
so it may as well jump straight to a handler; on the other hand, 
the exception mechanism is costly, especially if the debugger is 
hooked in and causes a context switch every single time it 
happens. So... I dunno. What's the best semantic for an overflow 
detector?

Re: References in D

2012-10-06 Thread David Piepgrass


void main()
{
    void* x = a(b());
    c();
    while(goobledegook)
    {
        x = p();
        d(x);
    }
    e(x); /+ Crash! x is null. +/
}

Where did x's null value come from?  Not a. Not p; the while 
loop happened to be never executed.  To say "b" would be 
closer, but still imprecise.  Actually it was created in the 
q() function that was called by u() that was called by b() 
which then created a class that held the null value and was 
passed to a() that then dereferenced the class and returned the 
value stored in the class that happened to be null.  nulls 
create very non-local bugs, and that's why they frustrate me to 
no end sometimes.


Since this thread's attracted lots of commotion I thought I'd 
just drop by and +1 for non-nullable types, and +1 for your 
arguments.


I keep wondering, though, if it is 'enough' to solve the null 
problem, or if it would be possible to devise a more general 
mechanism for solving other problems too, like say, the fact that 
certain integers have to always be positive, or if you want to go 
more general, that a certain relationship must hold between two 
structures...


Not having used D's invariants so far (well, I haven't used D 
itself for a real project actually)... what's stopping D's 
invariant mechanism from handling all this?


http://dlang.org/class.html#invariants (as is typical of D 
documentation, this says nothing about invariants on structs, but 
the page about structs says that they support invariants with an 
X.)


I mean, problems are detected at runtime this way, and slightly 
too late, but still, it would be better than most popular 
languages that can't do anything about nulls at all. Since D's 
devs don't even seem to have enough time to implement D as 
described in TDPL (published more than two years ago), I wouldn't 
expect to see this feature in the D language in the near future.

Re: Feature request: extending comma operator's functionality

2012-10-05 Thread David Piepgrass

Because it's the only way to guarantee that x exists when you 
reach the end of the loop.


do {
  if(true) continue; //Yawn... skip.
  const x = ... ;
} while (predicate(x)); //What's x?


But the compiler could tell that there is a 'continue' before x 
was declared, and issue an error when it is used in while(...)

Re: Implicit instantiation of parameterless templates

2012-10-05 Thread David Piepgrass


On Friday, 5 October 2012 at 12:24:12 UTC, Paulo Pinto wrote:

On Friday, 5 October 2012 at 12:01:30 UTC, Piotr Szturmaj wrote:

Java and C# with their generics can do the following:

class List { }
class List<T> { }

List list = new List();
List<int> intList = new List<int>();

In D similar code can't work because we can't have both a type 
and a template with the same name. So this code must be 
rewritten to:

...


Why do you need this?

Java and C# only allow this type of code due to backwards 
compatibility, because their first version did not allow for 
generics, and their creators did not want to force everyone to 
recode their code bases.


The Java and C# situations are very different. In Java, generics 
are "erased" at runtime so a List<T> is the same thing as a 
List. In C#/.NET, however, List and List<T> are 'unrelated' 
types, which is what Piotr was talking about.


.NET allows "overloading" types based on the number of generic 
parameters, so for example Tuple<T1> is a different type than 
Tuple<T1,T2> (the runtime names are Tuple`1 and Tuple`2). Since C# 
has no "default arguments" for generics or "tuple template 
parameters", it is trivial to allow different types that have the 
same name but a different number of generic parameters. In D, 
however, the situation is a bit more complicated.

Re: Getting started with D - Phobos documentation sucks

2012-10-01 Thread David Piepgrass

I think documentation is really important, and something has to 
be done about it. How can a newcomer get started with D when he 
doesn't have a readable documentation of Phobos?


A couple of random things I'd like to see:

1. Improve index.html. It's the first thing new users are likely 
to see about Phobos and it appears to contain an overview of the 
modules, but in fact it only lists half the modules of Phobos and 
the description of most modules is too short to be useful. There 
should also be a getting-started guide that lists the most common 
data types and functions and which module contains them (such as 
to!T, Tuple and writeln), and it should also discuss the 'built-in' 
types for completeness, like slices, hashes and strings, since in 
other languages these are standard library components.


2. To make the documentation easier to Google, put the keyword 
"D2" on every page of the Phobos documentation, e.g. the heading 
could change from "std.file" to "std.file (D2)". Nowadays when I 
search for something about "D Language", I often find a page 
about D1 instead of D2.


The "articles" should be reviewed too. For example the page on 
tuples http://dlang.org/tuple.html makes it sound like you're 
supposed to define your own Tuple type instead of using the one 
in std.typecons; in fact it suggests


template Tuple(E...) { alias E Tuple; }

which is really a TypeTuple isn't it?

Re: I have a feature request: "Named enum scope inference"

2012-09-28 Thread David Piepgrass


I have a feature request: "Named enum scope inference"

The idea is, that whenever a named enum value is expected, you 
don't need to explicitly specify the scope of the enum value. 
This would reduce redundancy in typing, just like automatic 
type inference does.


Examples:

enum MyDirection { forward, reverse }
struct MyIterator(MyDirection dir)
{
...
}

int forward = 42; // Doesn't interfere with the next line...
auto itr = MyIterator!forward(); // Infers MyDirection.forward


I like the spirit of this feature, but as Alex pointed out, 
ambiguity is possible (which could theoretically cause errors in 
existing code) and while I'm not familiar with how the compiler 
is implemented, my spidey-sense thinks that what you're asking 
for could be tricky to implement (in a language that already has 
a very large amount of rules and features.) Plus, I don't like 
the fact that when you see something like "MyIterator!forward" by 
itself in code, there is no obvious clue that forward is an enum 
value and not a class name or a variable. So there is a sort of 
decrease in clarity of the entire language by increasing the 
total number of possible meanings that an identifier can have.


So I think this feature would need a more clear syntax, something 
to indicate that the value is an enum value. I don't currently 
have a really good counterproposal though.

Re: DIP19: Remove comma operator from D and provision better syntactic support for tuples

2012-09-25 Thread David Piepgrass


The built-in tuple is also quite useful when defining templates.

In essence, we have two kinds of tuples: the built-in language 
tuple is the "unpacked" tuple while Phobos hosts the "packed" 
one. They each have their own use case and they can coexist 
peacefully. But the language itself needs to standardize on one 
or the other.


+1, and it should standardize on "packed" (non-expanded) tuples 
because "unpacked" ones have very unusual behavior, and because 
it's impractical to eliminate "packed" tuples but practical to 
eliminate "unpacked" ones. "unpacked" tuples should only exist as 
an intermediate result (the result of .expand).


If the language made T… a packed tuple instead, then we could 
use the packed tuple everywhere and unpack it where necessary, 
and something like this could be used to make a packed tuple:


T getThings(T...)(T.expand t)
{
return T(t);
}

T t1;
T t2 = getThings!(T)(t1.expand);


"T.expand" naturally has the connotation "unpacked" to me, 
whereas what you really want to do is indicate that "t" is 
packed, right? Clearly, the syntax for a varargs template like 
this would have to change to indicate that T is non-expanded; 
unfortunately, I don't have a really compelling syntax to suggest.


P.S. If non-expanded tuples were the default, they should 
probably have a quicker syntax than "t.expand" to expand them. I 
suggest overloading unary * as in "*t"; this is known as the 
"explode" operator in boo.

Re: Neat: UFCS for integer dot operator suffix

2012-09-24 Thread David Piepgrass

I used this in a small unit library (partially accessible on 
github), to obtain code like:

auto distance = 100.km;
auto speed = 130.km/h; // division works, too.

auto timeToDestination = (distance/speed).hour; // distance/speed
// gives seconds => transformed in hours.

It was a nice exercise in using UFCS and mixins to create your 
own unit library (not only SI, but any kind of unit library).

And, you know what? I *never* used it after coding it. These 
examples are cute, they make for nice blog posts for F#, but the 
real-world usage is dubious to me (I know there were space-program 
crashes)


I quite like the implicit message in units: use the type system 
to help you catch errors at compile-time. Add to that a nice 
syntax and a showcase for D's generational capabilities and it's 
quite nice.


But, to my eyes, it's but a toy.


I wouldn't read too much into it. You're a library author, not 
(I assume) a scientific computing guy. So beyond playing with a 
few examples, your work on this library is done - you wouldn't 
be a client of it for the simple reason you don't intensively 
work with kilometers, speeds, dollars, and such. It's possible 
that a good and usable library of units could add value to a 
category of users.


IMO, you don't need to be a scientific computing guy to find unit 
checking useful, since almost any number conceptually has a unit 
on it. I would ask any programmer, how often do you accidentally 
use a measurement of 'bytes' where 'dwords' were expected, or use 
a variable as an array index when it was actually something 
totally different?


However, unit checking cannot be done satisfactorily in a 
library; it has two main problems when provided that way:
1. It's too bulky (too much syntax required, as units have to be 
spelled out constantly)
2. Values with traditionally-typed units don't interoperate with 
existing libraries, including very simple functions such as


int abs(int x) { return x > 0 ? x : -x; }
int square(int x) { return x*x; }

You can define an implicit conversion from e.g. 'Unit!"pixels"' 
to 'int' but then you'll need to manually cast it back, and the 
compiler can't check your cast to make sure it's correct.
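The interop problem can be shown with a small C++ sketch of a library-level unit type. Everything here (the tag structs, Quantity, absInt) is hypothetical and far simpler than a real units library; it only demonstrates how the unit is lost on the way through a plain-int function and has to be re-attached by hand.

```cpp
#include <cassert>

// Hypothetical unit tags and a tagged integer wrapper.
struct PixelsTag {};
struct BytesTag {};

template <typename UnitTag>
struct Quantity {
    int value;
    explicit Quantity(int v) : value(v) {}
    // Same-unit addition keeps the unit.
    Quantity operator+(Quantity other) const {
        return Quantity(value + other.value);
    }
    // Implicit conversion out, so existing int functions accept it.
    operator int() const { return value; }
};

using Pixels = Quantity<PixelsTag>;
using Bytes = Quantity<BytesTag>;

// An existing library function that knows nothing about units.
inline int absInt(int x) { return x > 0 ? x : -x; }

inline Pixels unitDemo() {
    const Pixels width(-30);
    const int plain = absInt(width);  // compiles, but the unit is gone
    // Re-tagging is manual and unchecked: Bytes(plain) would compile too.
    return Pixels(plain);
}

inline void unitDemoCheck() {
    assert(unitDemo().value == 30);
    assert((Pixels(10) + Pixels(20)).value == 30);
}
```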


IMO, solving these two problems requires a parallel type system 
to infer unit relationships automatically, either with direct 
language support, or a separate analysis tool that uses the 
compiler as a service (currently not possible with D).

Re: DIP19: Remove comma operator from D and provision better syntactic support for tuples

2012-09-24 Thread David Piepgrass

The analysis in there fails to construct a case even half 
strong that deprecating the comma operator could significantly 
help tuples.


That is because it does not base the discussion on the right
limitations of built-in tuples:

auto (a,b) = (1,"3");
(auto a, string b) = (1, "3");


Agreed, this is the key thing missing from D.

There is also no consideration in the DIP of what I consider one 
of D's most confusing "features": "pre-expanded tuples" or in 
other words, type tuples. These beasts can be very confusing when 
first encountered, and they do not behave like any data type in 
any other language I know of:


import std.stdio;    // for writeln
import std.typecons; // Contains Tuple!(...), which reminds me,
// how do I know which module contains a given feature?
// http://dlang.org/phobos/index.html doesn't mention it.
void call() { humm(1, 2); }

void humm(T...)(T x)       // x, a pre-expanded tuple
{
    //auto c = [x.expand]; // ERROR, expand undefined
                           // (it's already expanded!)
    auto a = x;            // a is also pre-expanded
    auto b = [ a, a ];     // int[], not Tuple!(int,int)[]
    //int d = derr(x);     // ERROR, have to un-expand it
    writeln(a);            // "12"
    writeln(b);            // "[1, 2, 1, 2]"
}
int derr(Tuple!(int,int) a) { return a[0] + a[1]; }

I know you guys are all used to this behavior but I'm telling 
you, pre-expanding is very weird. It would be nice if type tuples 
could somehow be unified with library tuples and behave like the 
latter.

Re: Review of Andrei's std.benchmark

2012-09-22 Thread David Piepgrass

- It is very strange that the documentation of printBenchmarks 
uses neither of the words "average" or "minimum", and doesn't say 
how many trials are done. I suppose the obvious interpretation is 
that it only does one trial, but then we wouldn't be having this 
discussion about averages and minimums, right? Øivind says tests 
are run 1000 times... but it needs to be configurable per-test 
(my idea: support a _x1000 suffix in function names, or 
_for1000ms to run the test for at least 1000 milliseconds; and 
allow a multiplier when running a group of benchmarks, e.g. a 
multiplier argument of 0.5 means to only run half as many trials 
as usual.) Also, it is not clear from the documentation what the 
single parameter to each benchmark is (define "iterations count".)


I don't think it's a good idea because the "for 1000 ms" 
doesn't say anything except how good the clock resolution was 
on the system. I'm as strongly convinced we shouldn't print 
useless information as I am we should print useful information.


I am puzzled about what you think my suggestion meant. I am 
suggesting allowing the user to configure how long benchmarking 
takes. Some users might want to run their benchmark for an hour 
to get stable and reliable numbers; others don't want to wait and 
want to see results ASAP. Perhaps the *same* user will want to 
run benchmarks quickly while developing them and then do a "final 
run" with more trials once their benchmark suite is complete. 
Also, some individual benchmark functions will take microseconds 
to complete; others may take seconds to complete. All I'm 
suggesting are simple ways to avoid wasting users' time, without 
making std.benchmark overly complicated.
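The "run for at least N milliseconds" idea is simple enough to sketch in C++. This says nothing about how std.benchmark works internally; the function, the result struct and the choice to record the fastest trial are assumptions for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical result of a time-budgeted benchmark run.
struct BudgetResult {
    std::size_t trials;
    std::chrono::steady_clock::duration fastest;
};

// Calls fn repeatedly until at least `budget` has elapsed. At least one
// trial always runs, so slow functions are measured once and fast ones
// many times, without the user picking a trial count.
template <typename Fn>
BudgetResult runForAtLeast(std::chrono::milliseconds budget, Fn fn) {
    using Clock = std::chrono::steady_clock;
    BudgetResult result{0, Clock::duration::max()};
    const Clock::time_point start = Clock::now();
    do {
        const Clock::time_point t0 = Clock::now();
        fn();
        const Clock::duration elapsed = Clock::now() - t0;
        if (elapsed < result.fastest) result.fastest = elapsed;
        ++result.trials;
    } while (Clock::now() - start < budget);
    assert(result.trials >= 1);
    return result;
}
```

A global multiplier like the 0.5 mentioned earlier would then just scale `budget` before the call.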

Re: Review of Andrei's std.benchmark

2012-09-21 Thread David Piepgrass

Some random comments about std.benchmark based on its 
documentation:


- It is very strange that the documentation of printBenchmarks 
uses neither of the words "average" or "minimum", and doesn't 
say how many trials are done


Because all of those are irrelevant and confusing.


Huh? It's not nearly as confusing as reading the documentation 
and not having the faintest idea what it will do. The way the 
benchmarker works is somehow 'irrelevant'? The documentation 
doesn't even indicate that the functions are to be run more than 
once!!



I don't think that's a good idea.


I have never seen you make such vague arguments, Andrei.

Re: Review of Andrei's std.benchmark

2012-09-21 Thread David Piepgrass

After extensive tests with a variety of aggregate functions, I 
can say firmly that taking the minimum time is by far the best 
when it comes to assessing the speed of a function.


Like others, I must also disagree in principle. The minimum sounds 
like a useful metric for functions that (1) do the same amount of 
work in every test and (2) are microbenchmarks, i.e. they measure 
a small and simple task. If the benchmark being measured either 
(1) varies the amount of work each time (e.g. according to some 
approximation of real-world input, which obviously may vary)* or 
(2) measures a large system, then the average and standard 
deviation and even a histogram may be useful (or perhaps some 
indicator whether the runtimes are consistent with a normal 
distribution or not). If the running-time is long then the max 
might be useful (because things like task-switching overhead 
probably do not contribute that much to the total).


* I anticipate that you might respond "so, only test a single 
input per benchmark", but if I've got 1000 inputs that I want to 
try, I really don't want to write 1000 functions nor do I want 
1000 lines of output from the benchmark. An average, standard 
deviation, min and max may be all I need, and if I need more 
detail, then I might break it up into 10 groups of 100 inputs. In 
any case, the minimum runtime is not the desired output when the 
input varies.
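The summary being asked for is cheap to compute once the per-run timings are collected. A C++ sketch, with invented names and using the population standard deviation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of a set of run times (any consistent unit).
struct TimingSummary {
    double min;
    double max;
    double mean;
    double stddev;  // population standard deviation
};

// Precondition: samples is not empty.
inline TimingSummary summarize(const std::vector<double>& samples) {
    assert(!samples.empty());
    TimingSummary s{};
    s.min = *std::min_element(samples.begin(), samples.end());
    s.max = *std::max_element(samples.begin(), samples.end());
    double sum = 0.0;
    for (double x : samples) sum += x;
    s.mean = sum / static_cast<double>(samples.size());
    double squares = 0.0;
    for (double x : samples) squares += (x - s.mean) * (x - s.mean);
    s.stddev = std::sqrt(squares / static_cast<double>(samples.size()));
    return s;
}
```

With 1000 inputs this gives four numbers instead of 1000 lines of output, and the same function can be applied to 10 groups of 100 if more detail is needed.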


It's a little surprising to hear "The purpose of std.benchmark is 
not to estimate real-world time. (That is the purpose of 
profiling)"... Firstly, of COURSE I would want to estimate 
real-world time with some of my benchmarks. For some benchmarks I 
just want to know which of two or three approaches is faster, or 
to get a coarse ball-park sense of performance, but for others I 
really want to know the wall-clock time used for realistic inputs.


Secondly, what D profiler actually helps you answer the question 
"where does the time go in the real-world?"? The D -profile 
switch creates an instrumented executable, which in my experience 
(admittedly not experience with DMD) severely distorts running 
times. I usually prefer sampling-based profiling, where the 
executable is left unchanged and a sampling program interrupts 
the program at random and grabs the call stack, to avoid the 
distortion effect of instrumentation. Of course, instrumentation 
is useful to find out what functions are called the most and 
whether call frequencies are in line with expectations, but I 
wouldn't trust the time measurements that much.


As far as I know, D doesn't offer a sampling profiler, so one 
might indeed use a benchmarking library as a (poor) substitute. 
So I'd want to be able to set up some benchmarks that operate on 
realistic data, with perhaps different data in different runs in 
order to learn about how the speed varies with different inputs 
(if it varies a lot then I might create more benchmarks to 
investigate which inputs are processed quickly, and which slowly.)


Some random comments about std.benchmark based on its 
documentation:


- It is very strange that the documentation of printBenchmarks 
uses neither of the words "average" or "minimum", and doesn't say 
how many trials are done. I suppose the obvious interpretation 
is that it only does one trial, but then we wouldn't be having 
this discussion about averages and minimums, right? Øivind says 
tests are run 1000 times... but it needs to be configurable 
per-test (my idea: support a _x1000 suffix in function names, or 
_for1000ms to run the test for at least 1000 milliseconds; and 
allow a multiplier when running a group of benchmarks, e.g. 
a multiplier argument of 0.5 means to only run half as many 
trials as usual.) Also, it is not clear from the documentation 
what the single parameter to each benchmark is (define 
"iterations count".)


- The "benchmark_relative_" feature looks quite useful. I'm also 
happy to see benchmarkSuspend() and benchmarkResume(), though 
benchmarkSuspend() seems redundant in most cases: I'd like to 
just call one function, say, benchmarkStart() to indicate "setup 
complete, please start measuring time now."


- I'm glad that StopWatch can auto-start; but the documentation 
should be clearer: does reset() stop the timer or just reset the 
time to zero? does stop() followed by start() start from zero or 
does it keep the time on the clock? I also think there should be 
a method that returns the value of peek() and restarts the timer 
at the same time (perhaps stop() and reset() should just return 
peek()?)
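
A rough sketch of the "peek and restart" method I have in mind, written as a free function. Note the assumptions: this uses the StopWatch in today's std.datetime.stopwatch module (not necessarily the one under review), and lap is a made-up name.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Hypothetical helper: returns the elapsed time and restarts from zero.
Duration lap(ref StopWatch sw)
{
    immutable elapsed = sw.peek();
    sw.reset();                 // zeroes the elapsed time
    if (!sw.running)
        sw.start();             // make sure the next lap is being timed
    return elapsed;
}
```

Because it is called with UFCS (sw.lap()), it reads like the member function I was asking for.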


- After reading the documentation of comparingBenchmark and 
measureTime, I have almost no idea what they do.

Re: Extending unittests [proposal] [Proof Of Concept]

2012-09-21 Thread David Piepgrass

However, what's truly insane IMHO is continuing to run a 
unittest block after it's already had a failure in it. Unless 
you have exceedingly simplistic unit tests, the failures after 
the first one mean pretty much _nothing_ and simply clutter the 
results.


I disagree. Not only are my unit tests independent (so of course 
the test runner should keep running tests after one fails) but 
often I do want to keep running after a failure.


I like the BOOST unit test library's approach, which has two 
types of "assert": BOOST_CHECK and BOOST_REQUIRE. After a 
BOOST_CHECK fails, the test keeps running, but BOOST_REQUIRE 
throws an exception to stop the test. When testing a series of 
inputs in a loop, it is useful (for debugging) to see the 
complete set of which ones succeed and which ones fail. For this 
feature (continuation) to be really useful though, it needs to be 
able to output context information on failure (e.g. "during 
iteration 13 of input group B").
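
Roughly what I mean, as a D sketch. Checker, check and require are hypothetical names standing in for BOOST_CHECK and BOOST_REQUIRE; nothing like this exists in D's built-in unittest support as far as I know.

```d
import std.format : format;

/// Collects soft failures; a stand-in for BOOST_CHECK / BOOST_REQUIRE.
struct Checker
{
    string[] failures;

    /// Soft assertion: records the failure and lets the test continue.
    void check(bool condition, lazy string context)
    {
        if (!condition)
            failures ~= context;
    }

    /// Hard assertion: aborts the current test by throwing.
    void require(bool condition, lazy string context)
    {
        if (!condition)
            throw new Exception(context);
    }
}

/// Example: checks a table of squares and reports each mismatch with its index.
string[] runSquareChecks(const int[] inputs, const int[] expected)
{
    Checker c;
    c.require(inputs.length == expected.length, "input/expected length mismatch");
    foreach (i, x; inputs)
        c.check(x * x == expected[i],
                format("iteration %s: %s squared should be %s", i, x, expected[i]));
    return c.failures;
}
```

The context string is lazy, so the formatting cost is only paid when a check actually fails.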

Re: SpanMode uses incorrect terminology (breadth)

2012-09-18 Thread David Piepgrass


Breadth-first (probably never required):
a/b
a/c
a/1.txt
a/2.txt
a/b/1.txt
a/b/2.txt
a/c/z
a/c/1.txt
a/c/z/1.txt
Defining property: number of /'s increases monotonically. Note 
how the deeper you go, the more spread out the children become. 
It's ALL children, then ALL grandchildren, then ALL 
great-grandchildren, etc.


I wouldn't bother implementing breadth-first. It's doubtful 
that anyone would want it, surely...?


Actually I prefer breadth-first search when searching the file 
system. When I search an entire volume, inevitably the 
(depth-first) search gets stuck in a few giant, deep directories 
like the source code of Mono or some other cave of source code, 
you know, something 12 directories deep with 100,000 files in it. 
A breadth-first search would be more likely to find the thing I'm 
looking for BEFORE going spelunking in these 12-deep caves.
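
For what it's worth, a true breadth-first walk is only a few lines on top of a shallow listing. This is just a sketch (breadthFirst is a made-up name, and it makes no attempt to guard against symlink loops):

```d
import std.file : DirEntry, SpanMode, dirEntries;

/// Lists every entry under `root`, one directory level at a time.
string[] breadthFirst(string root)
{
    string[] result;
    string[] queue = [root];          // directories waiting to be listed
    while (queue.length > 0)
    {
        string dir = queue[0];
        queue = queue[1 .. $];
        foreach (DirEntry e; dirEntries(dir, SpanMode.shallow))
        {
            result ~= e.name;
            if (e.isDir)
                queue ~= e.name;      // visit its children after this level
        }
    }
    return result;
}
```

A lazy range version would be nicer for searching, since you could stop as soon as the file turns up.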

Re: Would like to see ref and out required for function calls

2012-09-13 Thread David Piepgrass

I really think that optionally allowing ref and out at the call 
site is more damaging than beneficial. _Requiring_ it could be 
beneficial, since then you know that the arguments are being 
taken by ref, but if it's optional, it gives you a false sense 
of security and can be misleading.


It gives *who* a false sense of security? If it's optional then I 
*know* lack of ref/out doesn't imply that the parameter won't 
change. Only people who don't know the rules would have this 
false sense of security.


I think it would be nice to have it required, but it's very bad 
to break everyone's code. It could only be reasonably enforced 
with a compiler switch--or, wait, come to think of it, a pragma 
would probably be a better way to introduce language changes like 
this:


module foo;
pragma(callSiteRef);
// and would it make sense to offer an alternative to -property too?
pragma(property);

Now you can tell whether a program uses ref/out religiously or 
not.

Re: Would like to see ref and out required for function calls

2012-09-13 Thread David Piepgrass

On Thursday, 13 September 2012 at 15:01:28 UTC, Andrei 
Alexandrescu wrote:

On 9/13/12 10:53 AM, David Piepgrass wrote:
Walter and I have discussed this for quite a while. We have 
recently decided to disallow, at least in SafeD, escaping the 
address of a ref parameter. In the beginning we'll be overly 
conservative by disallowing taking the address of a ref 
altogether. I'll write a DIP on that soon.


Err, wouldn't that break a lot of stuff, a lot of which is 
actually safe code?

void a(ref int x) { b(&x); }
void b(int* x) { if(x != null) (*x)++; }


Yes. Disallowing taking the address of a local is conservative 
and would disallow a number of valid programs.


Arguably, such programs are in poor style anyway. A good 
program takes pointers only if it needs to keep them around; if 
all that's needed is to use the parameter transitorily or pass 
it down, ref is best.


Another common reason to use a pointer (instead of ref) is if 
it's optional (nullable). If the parameter is ref then the caller 
must go to the trouble of creating a variable.


However, this could be solved with a feature like the following:

int* find(string searchString, out int index) { ... }
// _ means "don't care", assuming no variable "_" is defined
void caller() { find("foo", out _); }

In fact this is arguably better for 'out' variables since the 
callee (find) no longer has to check whether 'index' is null 
before assigning it. However this doesn't totally solve the 
problem for 'ref' parameters, since such parameters are both 
output and input parameters and the programmer may want 'null' to 
have some special meaning as an input.
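
For comparison, this is the workaround available today: an extra overload that creates the throwaway variable for the caller. It is only a sketch; this find is a simplified stand-in that searches an int array rather than taking a search string.

```d
/// Returns a pointer to the first match (or null) and reports its position.
int* find(int[] haystack, int needle, out size_t index)
{
    foreach (i, item; haystack)
        if (item == needle)
        {
            index = i;
            return &haystack[i];
        }
    return null;   // `index` keeps its default value, 0
}

/// Overload for callers that do not care about the index.
int* find(int[] haystack, int needle)
{
    size_t ignored;   // the variable that `out _` would make unnecessary
    return find(haystack, needle, ignored);
}
```

With "out _" the second overload would not need to exist.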


Escaping the addresses of stack variables, not just ref 
parameters, is a general problem in "safe" D. Do you have any 
ideas about that?

Btw just a simple illustrative example:
int* unsafe1() { int x = 1; return unsafe2(&x); }
int* unsafe2(int* x) { return x; }
int unsafe3() { int y = 7; *unsafe1() = 8; return y; }
enum gaff = unsafe3(); // ICE, no line number given

Same thing. By and large safe programs will need to make more 
use of the garbage collector than others. It's the way things 
work; stack allocation can be made safer if we add typed 
regions, but that's a very significant escalation of 
complication. There is no simple solution to this today.


Same thing meaning that you'd propose disallowing taking the 
address of a stack variable in SafeD? (I guess this would include 
escaping 'this' within a struct.)
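
If I understand the suggestion correctly, safe code would look something like the following sketch: pass by ref when the callee only uses the value during the call, and go to the GC heap when a pointer has to outlive the call. The function names are invented for illustration.

```d
/// Borrowing through `ref`: nothing escapes, so this is fine in @safe code.
@safe void increment(ref int x)
{
    x++;
}

/// When the address must outlive the call, allocate on the GC heap
/// instead of pointing at a stack variable.
@safe int* makeCounter(int start)
{
    auto p = new int;
    *p = start;
    return p;
}

@safe int demo()
{
    int local = 1;
    increment(local);                  // reference passed down, no pointer taken
    int* counter = makeCounter(local); // heap copy that may be kept around
    increment(*counter);
    return *counter;
}
```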

Re: Would like to see ref and out required for function calls

2012-09-13 Thread David Piepgrass

I don't think there would be problems with allowing ref/out 
optionally at the call site. The thing is, however, that in 
this matter reasonable people may disagree.
I'd be unable to identify any pattern in engineers choosing one 
preference over the other.


Maybe C++ fans prefer pointers or implicit ref, C# fans prefer 
call-site ref?


Now that the subject has been broached, we do have good evidence 
of a pattern that generates significant and difficult bugs: 
escaping the address of a reference. In C++:


struct A {
A(int& host) : host_(host) {}
private:
int& host_;
};

In D:

class A { // or struct
this(ref int host) { _host = &host; }
private:
int* _host;
}

A solution we use for C++ is to require escaped addresses to be 
always passed as pointers or smart pointers.


Walter and I have discussed this for quite a while. We have 
recently decided to disallow, at least in SafeD, escaping the 
address of a ref parameter. In the beginning we'll be overly 
conservative by disallowing taking the address of a ref 
altogether. I'll write a DIP on that soon.


Err, wouldn't that break a lot of stuff, a lot of which is 
actually safe code?


void a(ref int x) { b(&x); }
void b(int* x) { if(x != null) (*x)++; }

Escaping the addresses of stack variables, not just ref 
parameters, is a general problem in "safe" D. Do you have any 
ideas about that?

Re: Would like to see ref and out required for function calls

2012-09-11 Thread David Piepgrass


void func (ref int[], int)

If ref/out were required at the call site, this destroys UFCS.

int[] array;
array.func(0); // error, ref not specified by caller


For UFCS, ref should be implied.

+1


Why? UFCS means uniform function call syntax.
It is already understood that the thing left of '.' may be passed 
by reference:


struct Foo { int x = 0; void f() { x++; } }
void obvious()
{
   Foo foo; foo.f(); // foo is passed to f() by reference
}

Perhaps your argument makes sense for classes, but not for 
structs. In any case the syntax (ref foo).f() would require extra 
work for Walter so I would not propose it. What I might propose 
instead is that, if the user requests (via command-line argument 
such as '-callSiteRef') that a warning be issued for arguments 
passed without 'ref' at the call site, then a situation like this 
should prompt a warning.


class Bar { int b; }
void changeBar(ref Bar b) { b = new Bar(); }
void warning()
{
Bar bar = new Bar();
bar.b = 10;
bar.changeBar(); // Warning: 'bar' is implicitly passed by 
// reference. To eliminate this warning, use 'changeBar(ref bar)' 
// instead or do not compile with '-callSiteRef'
}

Again, this problem only applies to classes, since it is 
understood that structs are normally passed by reference.


Also for 'const ref' parameters, callsite ref should not be 
necessary.



The callee might escape a pointer to the argument. Which is
'non-obvious' as well when there is no callsite ref.


If you're referring to the fact that it's easy to have a D 
pointer to a stack variable outlive the variable... I don't think 
that this 'flaw' (I think of it as a flaw, others may think of it 
as a feature) is a good enough reason to say 'call site ref 
should be required for const ref parameters'.



for value types, it is arguably important.


This is not necessarily a valid conclusion. Popularity does not 
imply importance.


I think 'ref' is a popular idea because people have used it in C# 
and liked it. I didn't start putting 'IN OUT' and 'OUT' in my C++ 
code until C# taught me the value of documenting it at the call 
site.



Generally speaking, if a parameter being ref/out is surprising, 
there is something wrong with the design.  (There are times it 
is non-obvious in otherwise good code, this seems uncommon.)


I often want to 'scan' code to see what it does. Especially for 
debugging, I want to see where the side-effects are QUICKLY. 
Guessing which parameters are 'ref' requires me to analyze the 
code in my head. Even if I myself wrote the code, it can be time 
consuming. That's why I would prefer to explicitly mark possible 
side effects with 'ref' (of course, when passing a class to a 
function, the class members may be modified when the reference to 
the class was passed by value. But it is far easier to keep track 
of which classes are mutable than to keep track of which 
parameters of which functions are 'ref', because functions far 
outnumber classes.)



IMHO it is better left to the future D editor.


That's probably a long way off.

Re: Would like to see ref and out required for function calls

2012-09-11 Thread David Piepgrass

Actually the darndest thing is that C# has retired the syntax 
in 5.0 (it used to be required up until 4.0). Apparently users 
complained it was too unsightly.


Andrei


Wh-huh?? Reference please. I have sought out info about C# 5 
multiple times and I never heard that.


Anyway I don't mind if ref is not required, but it ticks me off 
that it is not *allowed*. Even in C++ I can use "OUT" and "IN 
OUT" at both the definition and call sites (I may as well, since 
Windows header files #define them already). The compiler doesn't 
verify it but I find it useful to make the code self-documenting.


Some have said that "well if the compiler doesn't enforce it 
then it's pointless, you won't be able to tell if a call site 
without 'ref' is passed by ref". But no, it's not pointless, 
because (1) if you see a call site WITH 'ref' then clearly it is 
passed by reference, (2) I would use 'ref' consistently in my own 
code so that when I look back at my code a year later, the 
absence of 'ref' is a clear indication that it is an input 
parameter, and (3) if the compiler offered the option to issue a 
warning when 'ref' is absent, statement (2) would be true 100% of 
the time, in my code, instead of just 98%.


Most of the code I look at is my own so that's my primary motive 
for wanting 'ref'. Yes, if 'ref' were allowed, some people would 
not use it; so when looking at a new code base I'd have no 
guarantee that a parameter NOT marked ref is passed by value. But 
at least (1) still applies.

Re: handful and interval

2012-09-03 Thread David Piepgrass


if (a.among("struct", "class", "union")) { ... }
if (b.between(1, 100)) { ... }


Is between inclusive or not of the endpoints?


After quite a bit of thought, I think inclusive is the right 
way.


Then there's no way to specify an empty interval. I suppose 
with "between" that would not be relevant.


Perhaps b.between(1, 0) would always return false.

However I'd use different names: among=>isOneOf, 
between=>isInRange. I would also define another function inRange 
that ensures, rather than tests, that a value is in range:


string userInput = "-7";
int cleanInput = inRange(parse!int(userInput), 1, 100);
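
A minimal sketch of the three functions under the names I'm proposing (these are my names, not an existing Phobos API):

```d
/// True if `value` equals any of `candidates`.
bool isOneOf(T, Args...)(T value, Args candidates)
{
    bool found = false;
    foreach (candidate; candidates)   // unrolled over the argument list
        if (value == candidate)
            found = true;
    return found;
}

/// True if lo <= value <= hi (inclusive); an inverted range is always empty.
bool isInRange(T)(T value, T lo, T hi)
{
    return lo <= value && value <= hi;
}

/// Forces `value` into [lo, hi] rather than testing it.
T inRange(T)(T value, T lo, T hi)
{
    return value < lo ? lo : (value > hi ? hi : value);
}
```

With this definition isInRange(b, 1, 0) is false for every b, which is one way to spell an empty interval.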

Re: handful and interval

2012-09-03 Thread David Piepgrass

However I'd use different names: among=>isOneOf, 
between=>isInRange.
I forgot to state the reason, namely, I think boolean functions 
should be named so that you can tell they return bool, as 
"between" could easily be a function that places a value into a 
range rather than tests whether it is in range.

Re: Consistency, Templates, Constructors, and D3

2012-08-28 Thread David Piepgrass

And a postblit would end up being...? The extra 'this' makes 
it look like an obvious typo or a minor headache.


this this(this){} //postblitz?


This is not an appropriate syntax, not just because it looks 
silly, but because a postblit constructor is not really a 
constructor; it is a postprocessing function that is called 
after an already-constructed value is copied. So I don't think 
there's any fundamental need for postblit constructors to look 
like normal constructors.



I'm sure this case has an easy solution. How about:

   struct Foo {
   this new() { ... } // constructor
   this() { ... } // postblit
   }


 But now you're breaking consistency by not including a return 
type. maybe 'this this()' but that looks like a mistake or typo.


I don't see how "this this()" is any worse than "this(this)"; IMO 
neither name really expresses the idea "function that is called 
on a struct after its value is copied". But a postblit 
constructor doesn't work like normal constructors, so keeping the 
"this(this)" syntax makes sense to me even though it is not 
consistent with normal constructors. "this()" has the virtue of 
simplicity, but it's even less googlable than "this(this)".
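
To make the "called after the value is copied" point concrete, here is a small sketch using the current this(this) syntax. The Buffer type is invented for the example.

```d
/// A struct that owns its array, so every copy needs its own duplicate.
struct Buffer
{
    int[] data;

    /// Ordinary constructor: builds a new value from scratch.
    this(int[] initial)
    {
        data = initial.dup;
    }

    /// Postblit: runs on the new copy, after the fields were copied bitwise.
    this(this)
    {
        data = data.dup;
    }
}
```

Note that the postblit takes no arguments and never sees the original, which is why it behaves more like a fix-up hook than a constructor.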


And for overload distinction (new vs load), which is an issue 
beyond Memory Pools and effects and even larger codebase. 
There needs to be a consistent way to distinguish (by name) a 
constructor that loads from a file, and one that creates the 
object "manually".


 Isn't that more an API issue?


Sorry, I don't follow.

If we take your approach and suggestion, which one should the 
compiler assume?


Something globalSomething;

class Something {
 this defaultConstructor();
 this duplicate(); //or clone
 this copyGlobalSomething();
 this constructorWithDefault(int x = 100);
}

By signature alone... Which one? They are all legal, they are 
uniquely named, and they are all equal candidates. Order of 
functions are irrelevant.


It could work identically to how D functions today. A 'new()' 
constructor would be part of the root Object classes are 
derived of, and structs would have an implicit 'new()' 
constructor.


 But new wouldn't be a constructor then would it? It would 
still be based on allocating memory that's optionally 
different. Constructor and allocation are two different steps; 
And for it to seamlessly go from one to another defaults to 
having a set default constructor. Let's assume...


 class Object {
   this new() {
   //allocate
 return defaultConstructor();
   }
   this defaultConstructor() {}
 }

 Now in order to make a constructor (and then destructor) you 
either can:


 A) overload or use 'defaultConstructor', which would be 
publicly known
 B) overload new to do allocation the same way and call a 
different constructor and specifically add a destructor to make 
sure it follows the same lines.
 C) overload new to call the default allocator and then call a 
different constructor


 Now assuming you can make a different constructor by name, you 
then have to be able to specify a destructor the same way for 
consistency.


 class CustomType {
   this MyAwesomeConstuctor();
   void MyAwesomeDestructor();
 }

 Same problem, how do you tell it ahead of time without 
completely rewriting the rules? leaving it as 'this' and 
'~this' are simple to remember and work with, and factory 
functions should be used to do a bulk of work when you don't 
want the basic/bare minimum.


Sorry, I don't understand what you're getting at. I suspect that 
you're interpreting his proposal in a completely different way 
than I am, and then trying to expose the flaws in your 
interpretation of the proposal, and then I can't follow it 
because my interpretation doesn't have those flaws :)

Re: Consistency, Templates, Constructors, and D3

2012-08-28 Thread David Piepgrass


On Monday, 27 August 2012 at 20:22:47 UTC, Era Scarecrow wrote:

On Monday, 27 August 2012 at 14:53:57 UTC, F i L wrote:
in C#, you use 'new Type()' for both classes and structs, and 
it works fine. In fact, it has some benefit with generic 
programming. Plus, it's impossible to completely get away from 
having to understand the type, even in C++/D today, because we 
can always make factory functions:


 I'm sure in C# that all structs and classes are heap allocated 
(It takes after C++ very likely) that's the simplest way to do 
it. You can do that in C++ as well, but other than having to 
declare it a pointer first. In C++ they made structs 'classes 
that are public by default' by its definition I believe. 
Considering how C++ is set up that makes perfect sense.


You're mistaken as FiL pointed out. "new" is simply not a heap 
allocation operator in C#, it is a creation operator. Structs in 
C# are allocated on the stack or embedded in another object (on 
the stack or on the heap). "new X()" creates a new value of type 
X, which could be a struct on the stack or a class on the heap.


I like the way C# works in this regard because the way X is 
allocated is an implementation detail that is hidden from 
clients. If the type X is immutable, then I can freely change it 
from struct to class or vice versa without affecting clients that 
use X. (Mind you if X is mutable, the difference is visible to 
clients since x1 = x2 copies X itself, not a reference to X.)


Plus as mentioned, generic code can use "new T()" without caring 
what kind of type T is.
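
D can approximate this with a small template today. The following is only a sketch; make, Widget and Point are names I made up, and it assumes classes passed to make have a default constructor.

```d
class Widget { int size = 5; }
struct Point { int x = 7; }

/// Hypothetical helper: creates a default value without the caller knowing
/// whether T is a class (GC heap) or a struct (in place).
T make(T)()
{
    static if (is(T == class))
        return new T();
    else
        return T.init;
}
```

Generic code can then write make!T() for either kind of type, which is the property I like about C#'s "new T()".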

Re: Consistency, Templates, Constructors, and D3

2012-08-25 Thread David Piepgrass

Interestingly, the discussion so far has been all about 
syntax, not any significant new features. I'm thinking ... 
coercion of a class to any compatible interface (as in Go)?


We already have:

import std.range;
auto range = ...;
auto obj = inputRangeObject(range);
alias ElementType!(typeof(range)) E;
InputRange!E iface = obj;
writeln(iface.front);

So maybe we can do:

auto implementObject(Interface, T)(T t){...}
auto obj = implementObject!(InputRange!E)(range);


Well, my D-fu is too weak to tell whether it's doable. When it 
comes to ranges, the standard library already knows what it's 
looking for, so I expect the wrapping to be straightforward. Even 
if a template-based solution could work at compile-time, run-time 
(when you want to cast some unknown object to a known interface) 
may be a different story.


I am sometimes amazed by the things the Boost people come up 
with, that support C++03, things that I was "sure" C++03 couldn't 
do, such as lambda/inner functions (see "Boost Lambda Library", 
"Boost.LocalFunction" and "Phoenix"), scope(exit) 
(BOOST_SCOPE_EXIT), and a "typeof" operator (Boost.Typeof). If 
there were 1/4 as many D programmers as C++ ones, I might be 
amazed on a regular basis.


Also, it might be nice to have 'canImplement' for template 
constraints:


auto foo(T)(T v) if (canImplement!(T, SomeInterface)){...}


or 'couldImplement', assuming T doesn't officially declare that 
it implements the interface...
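
A naive version of such a constraint can be sketched with compile-time reflection. This one only compares member names, not signatures, so treat it as an illustration of the idea rather than a real implementation; Shape, Square and describe are made up for the example.

```d
interface Shape
{
    double area();
}

/// Name-only check: does T have a member for every member of Interface?
/// (A fuller version would also compare signatures.)
bool canImplement(T, Interface)()
{
    bool ok = true;
    foreach (name; __traits(allMembers, Interface))
        static if (!__traits(hasMember, T, name))
            ok = false;
    return ok;
}

/// Never declares that it implements Shape, but has the right member.
struct Square
{
    double side;
    double area() { return side * side; }
}

/// Accepts any T that structurally matches Shape, declared or not.
double describe(T)(T v) if (canImplement!(T, Shape)())
{
    return v.area();
}
```

The run-time case (casting an unknown object to a known interface) is not covered by this at all.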

Re: Consistency, Templates, Constructors, and D3

2012-08-25 Thread David Piepgrass

I'm inclined to think that constructors should use "init", in 
keeping with tradition.


Wow, what the hell am I saying. Scratch that sentence, I often 
wish I could edit stuff after posting.

Re: Consistency, Templates, Constructors, and D3

2012-08-24 Thread David Piepgrass

I've had a couple of ideas recently about the importance of 
consistency in a language design, and how a few languages I 
highly respect (D, C#, and Nimrod) approach these issues. This 
post is mostly me wanting to reach out to a community that 
enjoys discussing such issues, in an effort to correct any 
mis-conceptions I might hold, and to spread potentially good 
ideas to the community in hopes that my favorite language will 
benefit from our discussion.


The points you raise are good and I generally like your ideas, 
although it feels a little early to talk about D3 when D2 is 
still far from a comprehensive solution. Amazing that bug 1528 is 
still open for example: 
http://stackoverflow.com/questions/10970143/wheres-the-conflict-here


Regarding your idea for merging compile-time and run-time 
arguments together, it sounds good at first but I wonder if it 
would be difficult to handle in the parser, because at the call 
site, the parser does not know whether a particular argument 
should be a type or an expression. Still, no insurmountable 
difficulties come to mind.


I certainly like the idea to introduce a more regular syntax for 
object construction (as I have proposed before, see 
http://d.puremagic.com/issues/show_bug.cgi?id=8381#c1) but you 
didn't say whether it would be allowed to declare a static method 
called "new". I'd be adamant that it should be allowed: the 
caller should not know whether they are calling a constructor or 
not. Also, I'm inclined to think that constructors should use 
"init", in keeping with tradition.


A couple inconsistencies that come immediately to my mind about 
D2 are


1. Function calling is foo!(x, y)(z) but declaration is foo(x, 
y)(int z)
   And the compiler doesn't always offer a good error message. 
   I'm seeing
   "function declaration without return type. (Note that 
   constructors are always named 'this')"
   "no identifier for declarator myFunction!(Range)(Range r)"
2. Ref parameters are declared as (ref int x) but are not allowed 
to be called as (ref x) -- then again, maybe it's not a real 
inconsistency, but I'm annoyed. It prevents my code from 
self-documenting properly.
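
A tiny example showing both inconsistencies side by side (scale and bump are invented names):

```d
/// Declaration: compile-time parameters in the first list, no `!`.
T scale(T, int factor)(T z)
{
    return z * factor;
}

/// Declaration says `ref`...
void bump(ref int x)
{
    x++;
}

unittest
{
    // Call site: compile-time arguments need `!`.
    assert(scale!(int, 3)(5) == 15);

    int n = 1;
    bump(n);        // ...but the call site cannot say `ref n`.
    assert(n == 2);
}
```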


Obviously, D is easy compared to C++, but no language should be 
judged by such a low standard of learnability. So I am also 
bothered by various things about D that feel unintuitive:


1. Enums. Since most enums are just a single value, they are 
named incorrectly.
2. immutable int[] func()... does not return an immutable array 
of int[]?
3. 0..10 in a "foreach" loop is not a range. It took me awhile to 
find the equivalent range function, whose name is quite baffling: 
"iota(10)"
4. Eponymous templates aren't distinct enough. Their syntax is 
the same as a normal template except that the outer and inner 
members just happen to have the same name. This confused me the 
other day when I was trying to understand some code by Nick, 
which called a method inside an eponymous template via another 
magic syntax, UFCS (I like UFCS, but I might be a little happier 
if free functions had to request participation in it.)
5. The meaning is non-obvious when using "static import" and 
advanced imports like "import a = b : c, d" or "import a : b = c, 
d = e" or "import a = b : c = d".
6. the syntax of is(...)! It looks like a function or operator 
with an expression inside, when in fact the whole thing is one 
big operator. It's especially not obvious that "is(typeof(foo + 
bar))" means "test whether foo+bar is a valid and meaningful 
expression".
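
To illustrate points 3 and 6, here is a short sketch; squaresBelow and canAdd are names I made up.

```d
import std.algorithm : map;
import std.array : array;
import std.range : iota;

/// `0 .. n` works in foreach and in slices; iota(n) is the range form.
int[] squaresBelow(int n)
{
    return iota(n).map!(i => i * i).array;
}

/// is(typeof(expr)) is true when `expr` compiles and has a type.
enum canAdd(A, B) = is(typeof(A.init + B.init));

static assert(canAdd!(int, double));    // int + double is meaningful
static assert(!canAdd!(int, string));   // int + string is not
```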


Making matters worse, the language itself and most of its 
constructs are non-Googlable. For example if you don't remember 
how to declare the forwarding operator (alias this), what do you 
search for? If you see "alias _suchAndSuch this" and don't know 
what it means, what do you search for? (one might not think of 
removing the middle word and searching for that).
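
For anyone who lands here by searching, this is the construct in question, in a minimal sketch (Meters and half are made-up names):

```d
/// Wrapper that forwards to `value` wherever a double is expected.
struct Meters
{
    double value;
    alias value this;   // the "forwarding operator"
}

double half(double x)
{
    return x / 2;
}
```

With this, half(Meters(3.0)) compiles because the Meters value is implicitly converted through its `value` member.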


I even have trouble finding stuff in the TDPL e-book. The place where 
templates are discussed is odd: section 7.5 in chapter 7, 
"user-defined types", even though the template statement doesn't 
actually define a type. I know, I should just read the book 
again... say, where's the second edition? I got so disappointed 
when I reached the end of chapter 13 and it was followed by an 
index. No UFCS or traits or ranges mentioned in there anywhere... 
compile-time function evaluation is mentioned, but the actual 
acronym CTFE is not.


I also hope something will be changed about contracts. I am 
unlikely to ever use them if there's no option to keep SOME of 
them in release builds (I need them to work at all boundaries 
between different parties' code, e.g. official API boundaries, 
and it is preferable to keep them in all cases that they don't 
hurt performance; finally, we should consider that the class that 
contains the contracts may not know its own role in the program, 
so it may not know whether assert or enforce is best). Plus, 
the syntax is too verbose. Instead of


   in {

Re: Fragile ABI

2012-08-21 Thread David Piepgrass

I think the only reason we still use COM today is that, sadly, 
there is no other OO standard interoperable with all 
languages. C++ vtables are the closest competitor; I guess 
their fatal flaw is that there is no standard for memory 
management across C++ DLLs.


Even .NET with its goal of supporting multiple languages has the 
CLS as the common set of datatypes and OO concepts to support 
across .NET languages.

Given that OO has so many types of possible implementations, it 
is hard to implement an ABI that works across multiple 
languages.


Sure, but .NET apps are not limited to CLS. Two different .NET 
languages can easily interoperate outside the rules of CLS (as 
long as it is still within the rules of .NET). Whereas operating 
beyond the limits of COM is much harder. Besides that, CLS itself 
is far more expansive than COM, allowing function overloading, 
inheritance, constructor arguments, etc.


It's unfortunate that .NET has limitations that make it hard for 
languages with novel features, like D, to fit in. (D could target 
.NET, of course, but there would be a significant cost, in terms 
of either performance, interoperability with other .NET code, 
and/or placing limitations on what D code can do.)



Let's see how the improved COM (WinRT) turns out to be.


Sadly, WinRT is again intended to be Windows-only, so developers 
like me that hate lock-in will avoid it in preference for .NET 
(hi Mono!) and yucky old C.

Re: Fragile ABI

2012-08-21 Thread David Piepgrass


On Monday, 20 August 2012 at 18:37:00 UTC, R Grocott wrote:

On Monday, 20 August 2012 at 15:26:48 UTC, Kagamin wrote:
What you ask for sounds quite similar to COM composition with 
delegation.


Would anybody mind linking to resources which describe COM 
composition with delegation? It's been suggested twice in this 
thread as an alternative way to develop a non-fragile API, but 
anything related to COM is almost invisible to search engines 
(even moreso than D itself).


There's nothing novel about COM except aggregation, and 
aggregation is just an implementation detail where a class 
pretends that it implements an interface but the calls to that 
interface go to another object. Conceptually it's like "alias 
this" except that a dynamic cast (i.e. QueryInterface) is 
required to reach the second object:


http://msdn.microsoft.com/en-us/library/ms686558(v=vs.85)
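
In D terms the idea looks roughly like the sketch below. It is not real COM: query is a hypothetical stand-in for QueryInterface, and the identity and reference-counting rules that real aggregation has to follow are ignored.

```d
interface IPrinter
{
    string print();
}

interface IScanner
{
    void scan();
}

class InnerPrinter : IPrinter
{
    string print() { return "printed by inner object"; }
}

/// Outer does not implement IPrinter itself, but hands out the inner
/// object when asked for that interface.
class Outer
{
    private InnerPrinter inner;

    this() { inner = new InnerPrinter; }

    /// Stand-in for QueryInterface; returns null if unsupported.
    I query(I)()
    {
        static if (is(InnerPrinter : I))
            return inner;          // delegated to the aggregated object
        else
            return cast(I) this;   // dynamic cast, null when not implemented
    }
}
```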

For the most part COM sucks really bad: it is a very ordinary 
object-oriented ABI but without numerous features that we 
otherwise take for granted:


- In COM, you can't define static methods
- In COM, you can't overload functions
- In COM, constructors can't have arguments
- In COM, there are no fields, only properties
- In COM, class inheritance is not allowed (an interface IB can 
inherit from IA, but if you implement a class A that implements 
IA, you can't write a class B that derives from A and implements 
IB. In C++/ATL a template-based workaround is possible if A and B 
are in the same DLL.)


Moreover COM ABIs are fragile, in that there is almost zero 
support for adding or removing methods without either breaking 
everything or creating a new, independent, incompatible version 
(the only exception: you can safely add a method at the end of an 
interface, if you can be certain that no other interface inherits 
from it.)


Finally, it's Windows-only (although it has been reimplemented on 
Linux, e.g. for WINE) and modules must be registered in the 
Windows Registry.


I think the only reason we still use COM today is that, sadly, 
there is no other OO standard interoperable with all languages. 
C++ vtables are the closest competitor; I guess their fatal flaw 
is that there is no standard for memory management across C++ 
DLLs.

Re: Example of Rust code

2012-08-10 Thread David Piepgrass


I'd say we're doing all right.


Are you serious?


Yes. What's wrong with my D version? It's short and to the 
point, works, and produces optimal code.


Your version is basically a very long-winded way to say "auto x = 
5 - (3 + 1);" so it really has nothing to do with the example.


The point of the example was to represent a simple AST and store 
it on the stack, not to represent + and - operators as plus() and 
minus() functions.


(I must say though, that while ADTs are useful for simple ASTs, I 
am not convinced that they scale to big and complex ASTs, let 
alone extensible ASTs, which I care about more. Nevertheless ADTs 
are at least useful for rapid prototyping, and pattern matching 
is really nice too. I'm sure somebody could at least write a D 
mixin for ADTs, if not pattern matching.)
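
For reference, here is roughly what the example is about when written by hand in D, without any ADT support: a tagged struct, nodes on the stack, and a final switch in place of pattern matching. This is my own sketch of the idea, not a translation of the Rust code.

```d
/// A tiny expression tree node; children are pointers to other nodes.
struct Expr
{
    enum Kind { leaf, plus, minus }

    Kind kind;
    int value;               // used only when kind == Kind.leaf
    const(Expr)* lhs, rhs;   // used only for plus and minus
}

/// Evaluates the tree; `final switch` checks that every Kind is handled.
int eval(const(Expr)* e)
{
    final switch (e.kind)
    {
        case Expr.Kind.leaf:  return e.value;
        case Expr.Kind.plus:  return eval(e.lhs) + eval(e.rhs);
        case Expr.Kind.minus: return eval(e.lhs) - eval(e.rhs);
    }
}

int demo()
{
    // 5 - (3 + 1), with every node living on the stack
    auto five  = Expr(Expr.Kind.leaf, 5);
    auto three = Expr(Expr.Kind.leaf, 3);
    auto one   = Expr(Expr.Kind.leaf, 1);
    auto sum   = Expr(Expr.Kind.plus, 0, &three, &one);
    auto diff  = Expr(Expr.Kind.minus, 0, &five, &sum);
    return eval(&diff);
}
```

It works, but nothing stops you from reading `value` on a plus node, which is exactly what a real ADT would prevent.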


1. If you write FORTRAN code in D, it will not work as well as 
writing FORTRAN in FORTRAN.
2. If you write C code in D, it will not work as well as writing 
C in C.


Really? And here I genuinely thought D was good enough for all 
the things C and FORTRAN are used for.


3. If you write Rust code in D, it will not work as well as 
writing Rust in Rust.


I hope someday to have a programming system whose features are 
not limited to whatever features the language designers saw fit 
to include -- a language where the users can add their own 
features, all the while maintaining "native efficiency" like D. 
That language would potentially allow Rust-like code, D-like 
code, Ruby-like code and even ugly C-like code.


I guess you don't want to be the one to kickstart that PL. I've 
been planning to do it myself, but so far the task seems just too 
big for one person.

Re: Functional programming in D and some reflexion on the () optionality.

2012-08-08 Thread David Piepgrass

The problem isn't about following haskell precisely or not (I 
think we shouldn't). The problem is wanting to have everything, 
and resulting in getting nothing.


Let's take haskell as example. Function are all pure. So it 
doesn't matter when a function get executed or not, and, as a 
result, haskell don't need a explicit function call like () in 
D.


Some people find that great, and want it to be the case in D. 
So D drop () usage.


Now, as D don't enforce purity, when does the function get 
executed is important. As a result, complicated scheme is 
implemented to know when does the function get executed, and 
when it doesn't (You'll notice *4* families of scheme for that 
in D).


As a result, the design is overly complex, and defined nowhere. 
Just to have that haskell feature, that work well in haskell 
because of some other properties of the language D don't have.


What are the 4 "families of scheme to know when does the function 
get executed"?

Re: Functional programming in D and some reflexion on the () optionality.

2012-08-06 Thread David Piepgrass


class A { void B() {} }
auto a = new A().B();
// ^ semicolon expected following auto declaration, not '.'


Obviously. No clue what this snippet is trying to do.


Well I meant "int B() { return 0; }" of course. I think you 
deliberately miss the point.

Re: Functional programming in D and some reflexion on the () optionality.

2012-08-06 Thread David Piepgrass

To me, the first big failure of D to implement functional style 
is to not have first class functions. You get a function using 
& operator. But does it really make sense ? See code below :


void foo(){}
void bar(void function() buzz) {}

void main() { bar(foo); } // This will execute foo, and so fail.
// Functions are not first class objects.


void main() {
auto bar = &foo;
foo(); // Do something.
bar(); // Do the same thing.
auto buzz = &bar;
(*buzz)(); // Do the same thing.
}

Functions don't behave the same way if they are variables or 
declared in the source code.


Worse, foo was before a function call. Now it isn't anymore. 
foo, as a expression have a different meaning depending on what 
is done on it. It would become very confusing if foo return a 
reference, so it is an lvalue and & is a valid operation on the 
function call.


As D don't enforce purity like functional programing does, it 
can't be up to the compiler to decide when does the function 
get executed.


Then come UFCS. UFCS allow for function calls with parameters. 
It is still inconsistent.


void foo(T)(T t) {}

a.foo; // foo is called with a as argument.
&a.foo; // error : not an lvalue

Now let imagine that foo is a member function of a, &a.foo 
become a delegate. a.foo is still a function call. This is 
still quite inconsistent.


Implementing all this is almost impossible when you add 
@property into the already messy situation. Additionally, the 
current implement fails to provide the basics of functional 
programing, and break several abstraction provided by other 
languages features. C++ has proven that bad association of good 
language features lead to serious problems.


This require to be formalized in some way and not based on 
dmd's implementation. Inevitably, the process will lead to code 
breakage (adding or removing some ()/&).


Reading the @property thread, it seems that most people want to 
keep dmd's current behavior. Because code rely on it. This make 
sense, but if dmd's implement is the only goal, it means that 
other compiler are only to be reverse engineering dmd's 
behavior, and are guaranteed to lag behind. Considering this, I 
seriously wonder if it make sense to even try to follow dmd's 
behavior and decide whatever seems the right one when writing a 
D compiler, which will result in community split, or no 
alternative compiler produced for D.


I also have some proposal to fix thing, even some that would 
allow a.map!(...).array() to still be available. But 
inevitably, some other construct will broke. At this point, 
what matter isn't really what solution is adopted, but do we 
still want to be dependent on dmd implementation for D features.


I'm not sure if I understand your point perfectly, but I 
definitely feel that the way D handles optional parens is awful. 
The other day I noticed that the following is a syntax error (DMD 
2.059):


class A { void B() {} }
auto a = new A().B();
// ^ semicolon expected following auto declaration, not '.'

Even without silly errors like this, optional parentheses create 
ambiguities, and ambiguities are bad. Maybe there is a sane way 
for parentheses to be optional, but the way I've seen D behaving 
is *bizarre*.


The compiler should *expect* parentheses, and only assume that 
the parentheses are missing if it's the only way to compile 
without an immediate error. So for example,
- if foo is a non-@property function that returns another 
function, foo() must invoke foo itself and never the function 
that foo returns.
- if I say "&foo" where foo is a non-@property function, it 
should always take the address of the function, never take the 
address of the return value.
- The rules shouldn't change if you replace "foo" with a complex 
expression like "x.y[z]" or "new Module.ClassName".
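
A small sketch of how the first two rules read in code. The names answer and makeGetter are made up for illustration, and my understanding (an assumption, not something checked against every compiler version) is that D behaves this way for non-@property functions when the parentheses are written out explicitly:

```d
int answer() { return 42; }

// A function that returns another function (hypothetical example).
int function() makeGetter() { return &answer; }

void main()
{
    // One pair of parentheses invokes makeGetter itself and yields
    // the function pointer it returns.
    auto getter = makeGetter();
    static assert(is(typeof(getter) == int function()));

    // A second pair is needed to invoke the returned function.
    int value = makeGetter()();
    assert(value == 42);

    // '&' on a plain function name takes the address of the function,
    // not the address of its return value.
    int function() direct = &answer;
    assert(direct() == 42);
}
```

The point of spelling it out is that every call here is visible in the source, so nothing depends on the compiler guessing whether a call was intended.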

Re: D language and .NET platform

2012-07-29 Thread David Piepgrass

On Sunday, 29 July 2012 at 16:32:10 UTC, Alex Rønne Petersen 
wrote:

On 29-07-2012 17:36, bearophile wrote:

.NET is too limited to represent the language,

Can you tell us why?
Array slices. The .NET type system has no way to represent them 
because it's designed for precise GC, and array slices allow 
interior pointers in the heap (as opposed to the stack when 
passing a field of an object by reference to a function, or 
whatever).


D is theoretically designed for precise GC, too. But in .NET you 
can only have a reference to an array as a whole, so a slice must 
be represented as an array, offset and length. The real problem I 
see is that in D you can have a slice that does *not* refer to an 
array on the GC heap, such as a slice on a non-GC heap, or on the 
stack (currently, in fact, in D you can easily make pointers and 
slices that point to stack data outlive the stack frame, which 
the 'safe' .NET type system inherently prevents).


.NET allows one to break the type system using pointers (in 
functions marked 'unsafe'), so as far as I can tell D for .NET 
could theoretically do everything that native D does, but with 
some annoying caveats mainly related to garbage collection. For 
instance, in a slice, I believe you can't use the same memory 
word to refer to an array on the GC heap OR an array that is not 
on the GC heap (unless you want to pin all your arrays, and you 
really don't). IIUC, doing so can crash the garbage collector.


I'm thinking that a .NET D slice would be implemented as a 
reference to a GC array and two integers (start and length). If 
the slice refers to a non-GC array, it would be stored in the 
same space, as a null reference, a pointer cast to IntPtr, and a 
length. However, this would make the code for accessing a slice 
rather clumsy and/or inefficient.
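
To make the layout concrete, here is a rough sketch of that slice representation, written in native D rather than .NET. ManagedSlice, fromGC and fromRaw are hypothetical names, a D array stands in for the managed array reference, and size_t stands in for IntPtr. It only illustrates why every element access needs a branch:

```d
struct ManagedSlice(T)
{
    T[] gcArray;   // stand-in for the managed array reference; null for non-GC memory
    size_t start;  // element offset into gcArray, or the raw address when gcArray is null
    size_t length;

    static ManagedSlice fromGC(T[] array, size_t offset, size_t count)
    {
        assert(offset + count <= array.length);
        return ManagedSlice(array, offset, count);
    }

    static ManagedSlice fromRaw(T* pointer, size_t count)
    {
        return ManagedSlice(null, cast(size_t) pointer, count);
    }

    // Every access has to decide which of the two encodings is in use.
    ref T opIndex(size_t i)
    {
        assert(i < length);
        if (gcArray !is null)
            return gcArray[start + i];
        return (cast(T*) start)[i];
    }
}

void main()
{
    auto heap = [10, 20, 30, 40];
    auto a = ManagedSlice!int.fromGC(heap, 1, 2);
    assert(a[0] == 20 && a[1] == 30);

    int[3] onStack = [7, 8, 9];
    auto b = ManagedSlice!int.fromRaw(onStack.ptr, onStack.length);
    b[2] = 99;
    assert(onStack[2] == 99);
}
```

The extra word and the branch in opIndex are the "clumsy and/or inefficient" part: a native D slice is just a pointer and a length.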


.NET has other limitations too, but again I expect there would be 
workarounds.

Re: @trusted considered harmful

2012-07-28 Thread David Piepgrass


On Saturday, July 28, 2012 22:08:42 David Nadlinger wrote:
On Saturday, 28 July 2012 at 02:33:54 UTC, Jonathan M Davis 
But unfortunately wrong – you call S.save in the @trusted 
block… ;)


Yeah. I screwed that up. I was obviously in too much of a hurry 
when I wrote it. And actually, in this particular case, since the 
part that can't be @trusted is in the middle of an expression 
doing @system stuff, simply using an @trusted block wouldn't do 
the trick.


Have you guys thought about the possibility that the language 
could simply not trust any calls that were resolved using a 
template argument?


I'm a bit tired so I may be missing something, but it seems to me 
that (in a @trusted template) if the compiler uses an 
instantiated template parameter (e.g. actual type Foo standing in 
for template parameter T) to choose a function to call, the 
compiler should require that the function be @safe, based on the 
principle that a template cannot vouch for what it can't control. 
IOW, since a template can't predict what function actually gets 
called, the compiler should require whatever function gets called 
to be @safe.


If the programmer actually does want his template function to be 
able to call _unpredictable_ @system functions, he should mark 
his template as @system instead of @trusted.
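
A minimal sketch of the situation I mean, with made-up types (SafeRange, SneakyRange) and made-up function names. As far as I understand the current rules, a @trusted template ends up vouching for whatever save() the type argument supplies, while a @safe template is checked per instantiation:

```d
struct SafeRange
{
    int front = 1;
    SafeRange save() @safe { return this; }
}

struct SneakyRange
{
    int front = 2;
    // Imagine this did pointer tricks; @system means unchecked and not vouched for.
    SneakyRange save() @system { return this; }
}

// @trusted covers the whole body, including r.save(), although the template
// author cannot know which save() a given R will supply.
auto copyTrusted(R)(R r) @trusted { return r.save(); }

// @safe makes the compiler check every call in the body for each instantiation.
auto copySafe(R)(R r) @safe { return r.save(); }

void main() @safe
{
    // Compiles: a @system function is reached from @safe code via the template.
    auto a = copyTrusted(SneakyRange());
    assert(a.front == 2);

    static assert(__traits(compiles, copySafe(SafeRange())));
    static assert(!__traits(compiles, copySafe(SneakyRange())));
}
```

Under the rule suggested above, copyTrusted(SneakyRange()) would be rejected as well, because the call to save() was resolved through the template argument.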

Re: Impressed

2012-07-28 Thread David Piepgrass

I'd say this argument on which is "better", yield or ranges, is
a problem ill posed.

Yeah, since yielding is just a convenient way to implement an
input range, asking which is better is like asking "Which is
better, pick-up trucks or vehicles?"

"yield" adds real, nontrivial value, and is not entirely
implementable as a library. Walter and I saw some uses of it in
C# at Lang.NEXT that were quite impressive.

On the other hand yield's charter is limited when compared to
that of ranges. Yield goes with the very simple "go through
everything once" functionality, which is essentially input
ranges - only a tiny part of what ranges can do.

"yield" adds real, nontrivial value, and is not entirely
implementable as a library. Walter and I saw some uses of it in
C# at Lang.NEXT that were quite impressive.

Agreed. However, I have been looking at D's Fibers and I wonder
if an optimized implementation of them could provide the same
functionality reasonably well:

https://www.semitwist.com/articles/article/view/combine-coroutines-and-input-ranges-for-dead-simple-d-iteration

The only problem is performance (and perhaps memory usage, but
there are ways to reduce that). Someone reported that a trivial
fiber-based forward range had 26x the overhead of opApply for
iteration (70s vs 2.7s for 1 billion iterations). I wonder if the
fiber-switching could be optimized? But I looked at core/thread.d
and unless I'm missing something, the fiber switch does not
appear to do much work: it calls Thread.getThis() twice per
switch (= 4 times per iteration), getStackTop() (= rt_stackTop)
once, and a naked asm routine with 21 asm instructions. The
entire yield() process contains no branches; call() additionally
calls setThis() twice and checks if the Fiber threw an exception.
What's the easiest way to time something in D? I'm curious if
Thread.getThis() (= TlsGetValue()) is the bottleneck.
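
For anyone who wants to measure this themselves, here is a small sketch that times a fiber-based input range against a plain loop. Counter, sumViaFiber and sumViaLoop are hypothetical names, and the stopwatch module path is the one I know from current Phobos (std.datetime.stopwatch); older releases kept StopWatch elsewhere. The numbers will depend on compiler, flags and platform, so none are claimed here:

```d
import core.thread : Fiber;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

/// Input range whose values are produced by a fiber, one per switch.
final class Counter : Fiber
{
    int front;
    private int limit;

    this(int limit)
    {
        super(&produce);
        this.limit = limit;
        popFront();               // run the fiber up to its first yield
    }

    private void produce()
    {
        foreach (i; 0 .. limit)
        {
            front = i;
            Fiber.yield();        // switch back to the consumer
        }
    }

    bool empty() { return state == State.TERM; }
    void popFront() { call(); }   // switch into the fiber for the next value
}

long sumViaFiber(int n)
{
    long total = 0;
    foreach (v; new Counter(n))
        total += v;
    return total;
}

long sumViaLoop(int n)
{
    long total = 0;
    foreach (i; 0 .. n)
        total += i;
    return total;
}

void main()
{
    enum n = 1_000_000;
    auto sw = StopWatch(AutoStart.yes);
    immutable viaFiber = sumViaFiber(n);
    immutable fiberTime = sw.peek;
    sw.reset();
    immutable viaLoop = sumViaLoop(n);
    immutable loopTime = sw.peek;
    assert(viaFiber == viaLoop);
    writefln("fiber: %s, plain loop: %s", fiberTime, loopTime);
}
```

Each element costs two context switches (one call, one yield), which is the overhead being discussed.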

Anyway, stack-switching lets you do not only the same things as
C# 2's "yield return" but as far as I can tell, it can also do
everything that C# 5's "async/await" can do and more:

http://qscribble.blogspot.ca/2012/07/asyncawait-vs-stack-switching.html

i.e. stack switching can accomplish tasks that async/await
cannot, while I don't know of any cases of the reverse. async is
more limited because all functions involved in an async task must
be explicitly marked and transformed by the compiler, but stack
switching works no matter what code is involved; even C code can
be called on an asynchronous fiber task.

Re: @trusted considered harmful

2012-07-27 Thread David Piepgrass


I don't see flaw with 1.

However 2 doesn't sound right.

@trusted {
  // Do something dirty.
}

You aren't supposed to do dirty things in @trusted code. You're 
supposed to  safely wrap a system function to be usable by a 
safe function. The system function is supposed to be short and 
getting its hands dirty.


True, but since the proposal is that all functions should be 
either @safe or @system, a @trusted block is necessary in a @safe 
function in order to call @system functions. Perhaps you would 
suggest that a @trusted block should be able to _call_ @system 
code but not actually do anything unsafe directly? That sounds 
interesting, but it's not how @trusted currently works.

Re: Impressed

2012-07-27 Thread David Piepgrass


On Friday, 27 July 2012 at 01:56:33 UTC, Stuart wrote:

On Friday, 27 July 2012 at 00:10:31 UTC, Brad Anderson wrote:
D uses ranges instead of iterators. You can read more about 
them here: http://ddili.org/ders/d.en/ranges.html


I find ranges to be a vast improvement over iterators 
personally (I use iterators extensively in C++ for my job and 
lament not having ranges regularly).




On Friday, 27 July 2012 at 00:17:21 UTC, H. S. Teoh wrote:

D has something far superior: ranges.

http://www.informit.com/articles/printerfriendly.aspx?p=1407357&rll=1

Even better, they are completely implemented in the library. No
unnecessary language bloat just to support them.


I'm not very well up on ranges. I understand the general [1 ... 
6] type of ranges, but I really don't see how custom range 
functions could be as useful as the Yield support in VB.NET.


Yes, I think H. S. Teoh wrote that without knowing what 
C#/VB iterators actually are.


.NET has a concept of "enumerators" which are basically 
equivalent to D's "input ranges". Both enumerators and input 
ranges are easier to use and safer than C++ iterators. Neither 
enumerators nor input ranges require any language support to use, 
but both C# and D have syntactic sugar for them in the form of 
the foreach statement. Both C# enumerators and D input ranges can 
be infinite.


C#/VB "iterators", however, are an additional syntactic sugar 
that transforms a function into a state machine that provides an 
enumerator (or "enumerable"). These are indeed very useful, and 
missing from D. Here is an example of an iterator that I updated 
today:


public IEnumerable Overlays() {
    foreach (var ps in _patterns)
    {
        yield return ps.RouteLine;
        yield return ps.PermShapes;
        if (ps.Selected)
            yield return ps.SelShapes;
    }
}

It does not work like opApply; the compiler creates a heap object 
that implements IEnumerable or IEnumerator (depending on the 
return value that you ask for -- it is actually IEnumerator that 
works like a forward range, but foreach only accepts 
IEnumerable, which is a factory for IEnumerators).


In D you could use opApply to do roughly the same thing, but in 
that case the caller cannot treat the opApply provider like an 
ordinary collection (e.g. IIUC, the caller cannot use map or 
filter on the results).
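
Here is roughly what the opApply version of that iterator could look like in D. Pattern and Overlays are stand-ins I made up (strings replace the real shape types), so treat it as a sketch of the technique rather than a port of the C# code:

```d
import std.range.primitives : isInputRange;

// Hypothetical stand-in for the C# pattern objects.
struct Pattern
{
    string routeLine, permShapes, selShapes;
    bool selected;
}

struct Overlays
{
    Pattern[] patterns;

    // foreach hands the loop body to opApply as a delegate; a nonzero result
    // means the body executed break or return, so iteration must stop.
    int opApply(scope int delegate(ref string) dg)
    {
        foreach (ref ps; patterns)
        {
            if (auto r = dg(ps.routeLine)) return r;
            if (auto r = dg(ps.permShapes)) return r;
            if (ps.selected)
            {
                if (auto r = dg(ps.selShapes)) return r;
            }
        }
        return 0;
    }
}

// No empty/front/popFront, so range algorithms such as map and filter
// will not accept an Overlays value directly.
static assert(!isInputRange!Overlays);

string[] collect(Overlays overlays)
{
    string[] result;
    foreach (s; overlays)
        result ~= s;
    return result;
}

void main()
{
    auto o = Overlays([Pattern("r1", "p1", "s1", true),
                       Pattern("r2", "p2", "s2", false)]);
    assert(collect(o) == ["r1", "p1", "s1", "r2", "p2"]);
}
```

The body is about as short as the C# one, but the result is only usable through foreach, which is the limitation mentioned above.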

Re: Can you do this in D?

2012-07-26 Thread David Piepgrass


3. Is there any way of executing code or programs during compile
time?
I've seen an example of CTFE (Compile Time Function Evaluation),
although I'm unsure if this works for stuff like classes.
However, I am considering more advanced execution (not constants)
such as printing to a file during compiling for stuff like how
long compiling a certain function/template takes.


You can call any safe and pure D code at compile time (none of 
the code has to be marked pure explicitly, but it cannot access 
any static or global variables, call C code, access files, etc.) 
This is called CTFE=Compile-Time Function Evaluation.


The "pure" limitation isn't a huge restriction, since you can 
still edit member variables (fields) and the compiler can memoize 
the results of CTFE... although I don't know if it memoizes 
automatically, or if you have to use a template to accomplish it. 
For example if I do


enum twoPi = computePi() + computePi();

I don't know if the compiler computes PI once or twice. Does 
someone know? But if I define this template:


@property auto memoize(T, T code)() { return code; }

enum twoPi = memoize!(double,computePi()) + 
memoize!(double,computePi());


Then computePi is surely called only once, and thus you can cache 
the result of any computation for repeated use. (I don't know how 
to get the type 'double' to be inferred automatically, though.) 
You can also, of course, use enums for this purpose:


enum pi = computePi(); // computed only once
enum twoPi = pi + pi;

I don't think you can run "programs" at compile-time, but since 
you can call ordinary functions and use arbitrarily large 
structs, you can accomplish a lot. I believe the current released 
build, 2.059, can't use classes at compile time, but bearophile 
just implied that 2.060 can.
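
A self-contained sketch of the enum approach, for anyone who wants to try it. computePi here is a hypothetical stand-in (a slow Leibniz-series sum), not a real library function. The memoize template uses an alias parameter, which I believe lets the type be inferred; whether the argument expression is itself re-evaluated at each use is something I have not verified:

```d
// Hypothetical stand-in for computePi(): a slow Leibniz-series sum.
double computePi()
{
    double sum = 0;
    foreach (k; 0 .. 10_000)
        sum += (k % 2 == 0 ? 4.0 : -4.0) / (2 * k + 1);
    return sum;
}

// An enum initializer must be a compile-time constant, so this forces CTFE.
enum pi = computePi();
enum twoPi = pi + pi;   // reuses the constant; computePi is not called again here

// An alias parameter takes the value together with its type, so the caller
// does not have to spell out 'double'.
template memoize(alias value)
{
    enum memoize = value;
}

static assert(pi > 3.14 && pi < 3.15);
static assert(memoize!(computePi()) > 3.14);

void main()
{
    assert(twoPi > 6.28 && twoPi < 6.29);
}
```

The static asserts fail the build if the values are wrong, so the checks themselves also run at compile time.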



5. Why not support other operators like $, #, and @?
This is more of a rhetorical... as I know the language doesn't
need them, nor would I know if they would be binary/unary
prefix/etc or the precedence... although they would be nice to
have. Specifically I'd like $prefix to be stringification.


Just to clarify, because other people are making it sound like D 
could do this... no, D does not offer user-defined operators, 
only overloading of predefined operators. User-defined ops would 
certainly be a nice feature that I would like to have, but the D 
developers have too much to do already. Personally I think the D 
syntax and rules feel too ad-hoc and unintuitive right now; it 
should be simplified slightly, formalized more clearly, and 
debugged further before yet more features are piled on.

Re: DCT use cases - draft

2012-07-25 Thread David Piepgrass


On Wednesday, 23 May 2012 at 15:36:59 UTC, Roman D. Boiko wrote:

On Tuesday, 22 May 2012 at 18:33:38 UTC, Roman D. Boiko wrote:

I'm reviewing text right now

Posted an updated version, but it is still a draft:

http://d-coding.com/2012/05/23/dct-use-cases-revised.html


BTW, have you seen the video by Bret Victor entitled "Inventing 
on Principle"? This should be a use case for DCT:


http://vimeo.com/36579366

The most important part for the average (nongraphical) developer 
is his demo of writing a binary search algorithm. It may be 
difficult to use an ordinary debugger to debug CTFE, template 
overload resolution and "static if" statements, but something 
like Bret's demo, or what the Light Table IDE is supposed to do...


http://www.kickstarter.com/projects/ibdknox/light-table

...would be perfect for compile-time debugging, and not only 
that, it would also help people write their code in the first 
place, including (obviously) code intended for run-time.


P.S. oh how nice it would be if we could convince anyone to pay 
us to develop these compiler tools... just minimum wage would be 
so nice.

Re: DCT use cases - draft

2012-07-25 Thread David Piepgrass


On Wednesday, 23 May 2012 at 15:36:59 UTC, Roman D. Boiko wrote:

On Tuesday, 22 May 2012 at 18:33:38 UTC, Roman D. Boiko wrote:

I'm reviewing text right now

Posted an updated version, but it is still a draft:

http://d-coding.com/2012/05/23/dct-use-cases-revised.html


I think one of the key challenges will be incremental updates. 
You could perhaps afford to reparse entire source files on each 
keystroke, assuming DCT runs on a PC*, but you don't want to 
repeat the whole semantic analysis of several modules on every 
keystroke. (*although, in all seriousness, I hope someday to 
browse/write code in a smartphone/tablet IDE, without killing 
battery life)


D in particular makes standard IDE features difficult, if the 
code uses a lot of CTFE just to decide the meaning of the code, 
e.g. "static if" computes 1_000_000 digits of PI and decides 
whether to declare method "foo" or method "bar" based on whether 
the last digit is odd or even.


Of course, code does not normally waste the compiler's time 
deliberately, but these sorts of things can easily crop up 
accidentally. So DCT could profile its own operation and report 
to the user which analyses and functions are taking the longest 
to run.


Ideally, somebody would design an algorithm that, given a 
location where the syntax tree has changed, figures out what 
parts of the code are impacted by that change and only re-runs 
semantic analysis on the code whose meaning has potentially 
changed.


But maybe that is just too hard. A simple approach would be to 
just re-analyze the whole damn program, but prioritize analysis 
so that whatever code the user is looking at is re-analyzed 
first. This could be enhanced by a simple-minded dependency tree, 
so that changing module X does not trigger reinterpretation of 
module Y if Y does not directly or indirectly use X at all.
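
The simple-minded dependency tree could be sketched like this. The function name and the shape of the import map are assumptions for illustration; a real tool would get the import lists from the parser:

```d
import std.algorithm.sorting : sort;

/// Returns the changed module plus every module that imports it, directly
/// or indirectly. `imports` maps a module name to the modules it imports.
string[] modulesToReanalyze(string changed, string[][string] imports)
{
    // Invert the edges: for each module, who imports it?
    string[][string] importedBy;
    foreach (mod, deps; imports)
        foreach (dep; deps)
            importedBy[dep] ~= mod;

    // Breadth-first walk over the inverted edges, starting at the change.
    bool[string] seen;
    seen[changed] = true;
    string[] queue = [changed];
    for (size_t i = 0; i < queue.length; ++i)
    {
        if (auto users = queue[i] in importedBy)
        {
            foreach (user; *users)
            {
                if (user !in seen)
                {
                    seen[user] = true;
                    queue ~= user;
                }
            }
        }
    }
    sort(queue);
    return queue;
}

void main()
{
    string[][string] imports = [
        "app": ["x", "util"],
        "x": ["util"],
        "y": ["util"],
    ];
    // Changing x affects app (which imports x) but not y or util.
    assert(modulesToReanalyze("x", imports) == ["app", "x"]);
}
```

This is deliberately coarse: it works per module, so a slow "static if" in a widely imported module still drags in everything that uses it.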


By using multiple threads to analyze, any long computations 
wouldn't prevent analysis of the "easy parts"; but several 
threads could get stuck waiting on the same thing. For example, 
it would seem to me that if a module X contains a slow "static 
if" at module scope, ANY other module that imports X cannot 
resolve ANY unqualified function calls until that "static if" is 
done processing, because the contents of the "static if" MIGHT 
create new overloads that have to be considered*. So, when a 
thread gets stuck, it needs to be able to look for other work to 
do instead.


In any case, since D is Turing-complete and CTFE may enter 
infinite loops (or just very long loops), an IDE will need to 
occasionally terminate threads and restart analysis, so the 
analysis threads must be killable, but hopefully it could be 
designed so that analysis doesn't have to restart from scratch.


I guess immutable data structures will therefore be quite 
important in the design, which you seem to be aware of already.

Re: What is the compilation model of D?

2012-07-25 Thread David Piepgrass


I hope someone can give more details about this.


TDPL chapter 11 "Scaling Up".


That's where I was looking. As I said already, TDPL does not 
explain how compilation works, especially not anything about the 
low-level semantic analysis which has me most curious.

Re: What is the compilation model of D?

2012-07-25 Thread David Piepgrass


If you use rdmd to compile (instead of dmd), you *just* give it
your *one* main source file (typically the one with your "main()"
function). This file must be the *last* parameter passed to rdmd:


$rdmd --build-only (any other flags) main.d

Then, RDMD will figure out *all* of the source files needed (using
the full compiler's frontend, so it never gets fooled into missing
anything), and if any of them have been changed, it will
automatically pass them *all* into DMD for you. This way, you don't
have to manually keep track of all your files and pass them all into
DMD yourself. Just give RDMD your main file and that's it, you're
golden.


I meant to ask, why would it recompile *all* of the source files 
if only one changed? Seems like it only should recompile the 
changed ones (but still compile them together as a unit.) Is it 
because of bugs (e.g. the template problem you mentioned)?

Re: What is the compilation model of D?

2012-07-25 Thread David Piepgrass

Thanks for the very good description, Nick! So if I understand 
correctly, if


1. I use an "auto" return value or suchlike in a module Y.d
2. module X.d calls this function
3. I call "dmd -c X.d" and "dmd -c Y.d" as separate steps

Then the compiler will have to fully parse Y twice and fully 
analyze the Y function twice, although it generates object code 
for the function only once. Right? I wonder how smart it is about 
not analyzing things it does not need to analyze (e.g. when Y is 
a big module but X only calls one function from it - the compiler 
has to parse Y fully but it should avoid most of the semantic 
analysis.)


What about templates? In C++ it is a problem that the compiler 
will instantiate templates repeatedly, say if I use 
vector in 20 source files, the compiler will generate and 
store 20 copies of vector (plus 20 copies of 
basic_string, too) in object files.


1. So in D, if I compile the 20 sources separately, does the same 
thing happen (same collection template instantiated 20 times with 
all 20 copies stored)?
2. If I compile the 20 sources all together, I guess the template 
would be instantiated just once, but then which .obj file does 
the instantiated template go in?



$rdmd --build-only (any other flags) main.d

Then, RDMD will figure out *all* of the source files needed (using
the full compiler's frontend, so it never gets fooled into missing
anything), and if any of them have been changed, it will
automatically pass them *all* into DMD for you. This way, you don't
have to manually keep track of all your files and pass them all into
DMD yourself. Just give RDMD your main file and that's it, you're
golden.


Side note: Another little trick with RDMD: Omit the 
--build-only and it will compile AND then run your program:


Yes. (Unless you never import anything from phobos...I think.)
But it's very, very fast to parse. Lightning-speed if you compare
it to C++.


I don't even want to legitimize C++ compiler speed by comparing 
it to any other language ;)



- Is there any concept of an incremental build?


Yes, but there's a few "gotcha"s:

1. D compiles so damn fast that it's not nearly as much of an issue
as it is with C++ (which is notoriously ultra-slow compared
to...everything, hence the monumental importance of C++'s
incremental builds).


I figure as CTFE is used more, especially when it is used to 
decide which template overloads are valid or how a mixin will 
behave, this will slow down the compiler more and more, thus 
making incremental builds more important. A typical example would 
be a compile-time parser-generator, or compiled regexes.


Plus, I've heard some people complaining that the compiler uses 
over 1 GB RAM, and splitting up compilation into parts might help 
with that.


BTW, I think I heard the compiler uses multithreading to speed up 
the build, is that right?


It keeps diving deeper and deeper to find anything it can "start"
with. Once it finds that, it'll just build everything back up in
whatever order is necessary.


I hope someone can give more details about this.

- In light of the above (that the meaning of D code can be 
interdependent with other D code, plus the presence of mixins 
and all that), what are the limitations of 
__traits(allMembers...) and other compile-time reflection 
operations, and what kind of problems might a user expect to 
encounter?


Shouldn't really be an issue. Such things won't get evaluated until
the types/identifiers involved are *fully* analyzed (or at least to
the extent that they need to be analyzed). So the results of things
like __traits(allMembers...) should *never* change during
compilation, or when changing the order of files or imports (unless
there's some compiler bug). Any situation that *would* result in any
such ambiguity will get flagged as an error in your code.


Hmm. Well, I couldn't find an obvious example... for example, you 
are right, this doesn't work, although the compiler annoyingly 
doesn't give a reason:


struct OhCrap {
    void a() {}
    // main.d(72): Error: error evaluating static if expression
    // (what error? syntax error? type error? c'mon...)
    static if ([ __traits(allMembers, OhCrap) ].length > 1) {
        auto b() { return 2; }
    }
    void c() {}
}

But won't this be a problem when it comes time to produce 
run-time reflection information? I mean, when module A asks to 
create run-time reflection information for all the functions and 
types in module A... er, I naively thought the information would 
be created as a set of types and functions *in module A*, which 
would then change the set of allMembers of A. But, maybe it makes 
more sense to create that stuff in a different module (which A 
could then import??)


Anyway, I can't even figure out how to enumerate the members of a 
module A; __traits(allMembers, A) causes "Error: import Y has no 
members".


Aside: I first wrote the above code as follows

Re: What is the compilation model of D?

2012-07-25 Thread David Piepgrass



I find it shocking that anyone would consider 15 seconds slow to
compile for a large program. Yes, D's builds are lightning fast in
general, and 15 seconds is probably a longer build, but calling 15
seconds "slow-to-compile" just about blows my mind. 15 seconds for a
large program is _fast_. If anyone complains about a large program
taking 15 seconds to build, then they're just plain spoiled or naive.
I've dealt with _Java_ apps which took in the realm of 10 minutes to
compile, let alone C++ apps which take _hours_ to compile. 15 seconds
is a godsend.


I agree with Andrej, 15 seconds *is* slow for an edit-compile-run 
cycle, although it might be understandable when editing code that 
uses a lot of CTFE and static foreach and reinstantiates 
templates with a crapton of different arguments.


I am neither spoiled nor naive to think it can be done in under 
15 seconds. Fully rebuilding all my C# code takes less than 10 
seconds (okay, not a big program, but several smaller programs).


Plus, it isn't just build times that concern me. In C# I'm used 
to having an IDE that immediately understands what I have typed, 
giving me error messages and keeping metadata about the program 
up-to-date within 2 seconds. I can edit a class definition in 
file A and get code completion for it in file B, 2 seconds later. 
I don't expect the IDE can ever do that if the compiler can't do 
a debug build in a similar timeframe.

Re: Computed gotos on Reddit

2012-07-25 Thread David Piepgrass


OK I've taken your comments into account.
Now I think I finally got it right:

mov ecx, [ebx] ; ecx = code[pc]
inc ebx ; pc ++
jmp ecx ; goto code[pc], as ecx is already a pointer


Nope, ecx is an opcode, not a pointer. You need another 
indirection.


Man this has been frustrating to read. I understood what Dmitry 
was talking about at least a dozen posts ago, and that's 
without actually reading the article about interpreters (I did 
write a SNES emulator once, but it didn't use this cool 
technique. I did, however, have to write it in assembly because 
the C version was dog-slow because e.g. I couldn't capture the 
overflow/negative/zero flags in C.)

What is the compilation model of D?

2012-07-24 Thread David Piepgrass

(Maybe this should be in D.learn but it's a somewhat advanced 
topic)


I would really like to understand how D compiles a program or 
library. I looked through TDPL and it doesn't seem to say 
anything about how compilation works.


- Does it compile all source files in a project at once?
- Does the compiler have to re-parse all Phobos templates (in 
modules used by the program) whenever it starts?

- Is there any concept of an incremental build?
- Obviously, one can set up circular dependencies in which the 
compile-time meaning of some code in module A depends on the 
meaning of some code in module B, which in turn depends on the 
meaning of some other code in module A. Sometimes the D compiler 
can resolve the ultimate meaning, other times it cannot. I was 
pleased that the compiler successfully understood this:


// Y.d
import X;
struct StructY {
    int a = StructX().c;
    auto b() { return StructX().d(); }
}

// X.d
import Y;
struct StructX {
    int c = 3;
    auto d()
    {
        static if (StructY().a == 3 && StructY().a.sizeof == 3)
            return 3;
        else
            return "C";
    }
}

But what procedure does the compiler use to resolve the semantics 
of the code? Is there a specification anywhere? Does it have some 
limitations, such that there is code with an unambiguous meaning 
that a human could resolve but the compiler cannot?


- In light of the above (that the meaning of D code can be 
interdependent with other D code, plus the presence of mixins and 
all that), what are the limitations of __traits(allMembers...) 
and other compile-time reflection operations, and what kind of 
problems might a user expect to encounter?

Re: Just where has this language gone wrong?

2012-07-19 Thread David Piepgrass

I suspect that you have a C++ background. If this is not 
accurate, ignore the rest. But if it is accurate, my plea to 
you is: Learn other languages. C++ has next to no innovative 
language features (even C++11's take on lambdas is an 
abomination) and encourages defensive programming to the point 
where it's ridiculous (I mean, no default initialization of 
variables? In 2012?).


Actually, C# has no default initialization* of local variables, 
and I love it. Instead, it is a compile-time error to read a 
variable if the compiler cannot guarantee that you have 
initialized it. IMO this is much better than D's "let's 
initialize doubles to NaN so that something fishy will happen at 
runtime if you forget to initialize it" :)


* technically the compiler asks the runtime to bitwise 0-fill 
everything, but that's just an implementation detail required for 
the .NET verifier, and the optimizer can ignore the request to 
preinitialize.
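
To make the contrast concrete, here is a minimal D sketch of the NaN behaviour. The function name `average` and the data are made up for illustration; the only thing it relies on is that an uninitialized `double` starts out as NaN.

```d
import std.math : isNaN;
import std.stdio : writeln;

// Hypothetical example: the author forgot to write "double sum = 0;".
double average(const double[] xs)
{
    double sum;                  // D default-initializes this to double.nan
    foreach (x; xs)
        sum += x;                // NaN + anything is still NaN
    return sum / xs.length;
}

void main()
{
    immutable avg = average([1.0, 2.0, 3.0]);
    // The code compiles fine; the mistake only shows up at run time.
    writeln("average is NaN: ", isNaN(avg));
}
```

The equivalent C# would be rejected at compile time ("use of unassigned local variable"), which is the behaviour I prefer.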

Re: Need runtime reflection?

2012-07-17 Thread David Piepgrass

I want to imitate golang's interface in D, to study D's 
template. I wrote some code: https://gist.github.com/3123593

Now we can write code like golang:
--
interface IFoo {
void foo(int a, string b, float c);
}

struct Foo {
void foo(int a, string b, float c) {
writeln("Foo.foo: ", a, ", ", b, ", ", c);
}
}

struct FooFoo {
void foo(int a, string b, float c) {
writeln("FooFoo.foo: ", a, ", ", b, ", ", c);
}
}

GoInterface!(IFoo) f = new Foo;
f.foo(3, "abc", 2.2);

f = new FooFoo;
f.foo(5, "def", 7.7);
--

It is also very naive, does not support some features, like 
out/ref parameters, free functions *[1]* and so on. The biggest 
problem is downcast not supported. In golang, we can write code 
like *[2]*:
--
var p IWriter = NewB(10)
p2, ok := p.(IReadWriter)
--

Seems [p.(IReadWriter)] dynamically build a virtual table *[3]*, 
because the type of "p" is IWriter, it is *smaller* than 
IReadWriter, the cast operation must search methods and build 
vtbl at run time.

In D, GoInterface(T).opAssign!(V)(V v) can build a rich runtime 
information to *V* if we need. But if *V* is interface or base 
class, the type information not complete. So, seems like I need 
runtime reflection? and how can I do this in D? I did not find 
any useful information in the TypeInfo*.


--
[1] free functions support, e.g.
--
interface IFoo {
void foo(int a, string b, float c);
}
void foo(int self, int a, string b, float c) {
writefln("...");
}

GoInterface!(int) p = 1;
p.foo(4, "ccc", 6.6);
--
In theory no problem.


I, too, was enamored with Go Interfaces and implemented them for 
.NET:


http://www.codeproject.com/Articles/87991/Dynamic-interfaces-in-any-NET-language

And I wasn't the only one; later, someone else published another 
library for .NET with the exact same goal. This is definitely a 
feature I would want to see in D, preferably as a first-class 
feature, although sadly that would break any code that relies on 
ISomething being pointer-sized; Go uses fat pointers, and we use 
a thin-pointer implementation in .NET but it's inefficient (as 
every cast creates a heap-allocated wrapper, and 
double-indirection is needed to reach the real method.)
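
For readers who haven't seen the thin-pointer approach, here is a rough D sketch of what such a wrapper looks like when written by hand. The names `IFooWrapper` and `adapt` are hypothetical (they are not from my .NET library or from the gist above); a real library would generate the wrapper class automatically.

```d
import std.conv : to;
import std.stdio : writeln;

interface IFoo
{
    string foo(int a);
}

// Has the right method, but never declares that it implements IFoo.
struct Foo
{
    string foo(int a) { return "Foo.foo: " ~ a.to!string; }
}

// Hand-written stand-in for the wrapper an adaptation library would generate.
final class IFooWrapper(T) : IFoo
{
    private T target;
    this(T target) { this.target = target; }

    // Second indirection: the wrapper forwards to the real method.
    string foo(int a) { return target.foo(a); }
}

// Every "cast" allocates a new wrapper on the heap.
IFoo adapt(T)(T value)
{
    return new IFooWrapper!T(value);
}

void main()
{
    IFoo f = adapt(Foo());
    writeln(f.foo(3));   // first indirection: interface call into the wrapper
}
```

The allocation in `adapt` and the forwarding call in the wrapper are exactly the two costs mentioned above; a fat-pointer design like Go's avoids both.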


Anyway, they say it's possible to build runtime reflection in D 
but I've no idea how... has it never been done before?


Of course, runtime template instantiation won't be possible. 
Therefore, run-time casting will have to be more limited than 
compile-time casting.


Reflection to free functions would be really nice, but it might 
be less capable at run-time. Consider if there is a class A 
in third-party module MA that you want to cast to interface I, 
but class A is missing a function F() from I. So in your module 
(module MB) you define a free function F(B) and now you can do 
the cast. I guess realistically this can only happen at 
compile-time, since a run-time cast would naturally only look in 
module MA, not MB, for functions it could use to perform the 
cast. Presumably, it also requires that MA requested a run-time 
reflection table to be built, and is it possible to build a 
reflection table for a module over which you have no control?

Re: D front-end in D for D

2012-07-14 Thread David Piepgrass


On Saturday, 14 July 2012 at 10:48:56 UTC, Gor Gyolchanyan wrote:
I just got an amazing thought. If we end up getting a D 
front-end in D, I think it would be possible to make the 
back-end in the same space as the code being compiled. This 
means, having the back-end as a library solution. This would 
automatically provide 100% compile-time code introspection. This 
is just a thought. Not a proposal or anything. What do you guys 
think?


Compile-time code introspection is a job for the front-end. It's 
not very good to have code introspect itself at compile-time 
using a library... that would mean the library loads, parses and 
analyzes the very same code that the compiler has already loaded, 
parsed and analyzed. Sounds quite inefficient, and is it even 
legal to read files at compile time, and how would you know what 
paths to read?


Having the front+back-end as a library would, of course, be handy 
for run-time code generation, which is definitely useful 
too. In C# there's a handy library called RunSharp for that, and 
I miss it in C++. It would, however, mean bundling a complete 
compiler with your application, so the solution feels very heavy 
(as compared to the .NET framework, where developers can take for 
granted that the user's machine already has the libraries.)


I think, for multiple reasons including this use case, D should 
have a "lightweight subset" with a smaller standard library and a 
somewhat simpler language definition (that retains most of D's 
power), which could shrink the size of a program that uses 
runtime codegen. For simplicity, the D front-end written in D 
could use the same backend for CTFE as for its output. And one 
hopes that generated code could be garbage-collected.


However, presumably you'd have to include LLVM which I believe is 
around 1MB for a bare-minimum build (with no optimization passes 
included.)

Re: just an idea (!! operator)

2012-07-13 Thread David Piepgrass


On Friday, 13 July 2012 at 09:49:22 UTC, monarch_dodra wrote:
I don't know much about C#, but in C#, isn't EVERYTHING a 
reference type? Meaning it always makes sense to check if 
"myobject is null".


No, C# has value types (enums, primitives, and user-defined
types) which are not nullable. The null coalescing operator (and
null?.dot, if it existed) is still useful for nullable types of
course; plus, any value type has a nullable counterpart (e.g.
int? = nullable int).

Re: just an idea (!! operator)

2012-07-12 Thread David Piepgrass

Yeah, I've been planning to try and get this into D one day. 
Probably something like:
(a ?: b) ->  (auto __tmp = a, __tmp ? __tmp : b)


gcc used to have that extension and they dropped it...


But GCC can't control the C++ language spec. Naturally there is a 
reluctance to add nonstandard features. It's a successful feature 
in C#, however, and a lot of people (including me) have also been 
pestering the C# crew for "null dot" (for safely calling methods 
on object references that might be null.)


I don't see why you would use ?: instead of ??, though.
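
Until something like this is in the language, the lowering quoted above can be approximated in a library. This is only a sketch: `coalesce` is a hypothetical helper name and `Config` is a made-up class. The left operand is evaluated once because it is an ordinary parameter, and the right operand is `lazy` so it is only evaluated when needed.

```d
import std.stdio : writeln;

// Hypothetical helper: returns a if it is non-null (truthy), otherwise b.
// The lazy parameter means b is not evaluated unless a is null.
T coalesce(T)(T a, lazy T b)
{
    return a ? a : b;
}

class Config
{
    string name = "default";
}

void main()
{
    Config user = null;
    auto cfg = coalesce(user, new Config);   // user is null, so the fallback is built
    writeln(cfg.name);
}
```

It works, but `coalesce(a, b)` is clumsier than `a ?? b`, which is why I'd still like an operator.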

Re: Counterproposal for extending static members and constructors

2012-07-12 Thread David Piepgrass


On Thursday, 12 July 2012 at 17:35:51 UTC, H. S. Teoh wrote:

On Thu, Jul 12, 2012 at 06:25:03PM +0200, David Piepgrass wrote:

I'm putting this in a separate thread from
http://forum.dlang.org/thread/uufohvapbyceuaylo...@forum.dlang.org
because my counterproposal brings up a new issue, which could be 
summarized as "Constructors Considered Harmful":

http://d.puremagic.com/issues/show_bug.cgi?id=8381


So, if I understand your proposal correctly, you're essentially 
saying that the ctor of a given class C may return a derived 
class of C instead of just C itself?


No, it can also return a different class with the same name.


Isn't this just the "object factory" pattern in disguise?


It is a unification of syntax, just as UFCS is a unification of 
syntax. It solves multiple problems, including information 
hiding, and extending classes written by other parties.

Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread David Piepgrass


we can't just cast to IObject.

Oops, I meant IComparable

Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread David Piepgrass

On Thursday, 12 July 2012 at 17:51:32 UTC, Andrei Alexandrescu 
wrote:

On 7/12/12 1:40 PM, David Piepgrass wrote:
1. Most importantly, the C++ template approach is a big pain for 
large-scale systems, because in such systems you want to create 
DLLs/SOs and C++ has neither a standard ABI nor a safe way to 
pass around template instantiations between DLLs (in the presence 
of changes to internal implementation details). Similar problems 
exist for D, yes? It's a lot easier to define a standard ABI for 
classes than to solve the cross-DLL template problem.


The thing is, that can be done in an opt-in manner. People who 
want methods in the root of the hierarchy can define a root 
that defines them. But there's no way to opt out of inheriting 
Object. Basically it's nice to not force people to buy into a 
constrained environment without necessity.


But is the constrained environment we're talking about really all 
that constrained?


- 'const' is not overly harsh if the user has mechanisms to make 
that mean 'logical const'.
- regarding the 5 vtable entries (destructor, toString, toHash, 
opEquals, opCmp), well, that's only 20/40 bytes per process, and 
maybe we don't need opCmp that much.


Although having these in Object seems constraining in one way, 
removing them is constraining in a different way: you can no 
longer provide collection classes for "any" object without 
involving templates.


Wait a minute, though. Keeping in mind the problem of DLL 
interoperability, and the constraints on using templated + many 
DLLs together, what if D introduced the feature that Go and Rust 
have, the ability to adapt any object to a compatible interface?


interface IComparable {
   bool opEquals(IComparable rhs);
   int opCmp(IComparable rhs);
}

class Foo { /* could contain anything */ }

So let's say we remove all the methods from Object, but we still 
want people to be able to make a collection of "any object", such 
as Foo, and pass this collection between DLLs safely. Moreover we 
want there to be only a single instance of the collection class, defined 
in a single DLL (so this collection cannot be a template class).


Since a class Foo does not declare that it implements 
IComparable, and it might not even contain opCmp() and 
opEquals(), we can't just cast to IObject. Not in the current D, 
anyway.


But now add interface adaptation from Go/Rust. Foo might not 
define opEquals and opCmp itself, but any client can add those 
via UFCS, and the standard library would probably define opEquals 
via UFCS as reference equality already. Thus it is possible for 
any client to pretend that any class implements IComparable, by 
adding the missing pieces (if any) and casting to IComparable.

Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread David Piepgrass

On Thursday, 12 July 2012 at 04:15:48 UTC, Andrei Alexandrescu 
wrote:

Required reading prior to this: http://goo.gl/eXpuX

You destroyed, we listened.

I think Christophe makes a great point. We've been all thinking 
inside the box but we should question the very existence of the 
box. Once the necessity of opCmp, opEquals, toHash, toString is 
being debated, we get to some interesting points:


Well, I'm not convinced it is a good idea to eliminate the stuff 
from Object, nor to remove const (I think RawObject as a base 
class of Object has merit, but to remove the Object functions for 
everyone? I'm very suspicious.)


Some problems I would point out with the idea of "eliminate the 
stuff from Object and use more templates instead":


1. Most importantly, the C++ template approach is a big pain for 
large-scale systems, because in such systems you want to create 
DLLs/SOs and C++ has neither a standard ABI nor a safe way to 
pass around template instantiations between DLLs (in the presence 
of changes to internal implementation details). Similar problems 
exist for D, yes? It's a lot easier to define a standard ABI for 
classes than to solve the cross-DLL template problem.


2. Although templates are used a lot in C++, in D programs they 
are used even more and this proposal would increase template 
usage, so I'd expect the bloat problem to increase. However, 
merging identical functions (identical machine code) might be a 
sufficient solution.


3. The need for more templates slows down compilation. We know 
this is a huge problem in C++.


4. Template bloat is no big deal on desktops but it becomes a 
bigger problem as the target system gets smaller. Maybe some 
compromise should be made to ensure D remains powerful and 
capable on small targets.


There were two proposals yesterday that I liked. Taken together, 
they address all the problems that were raised with const 
functions in Object:


1. Provide a 'safe workaround' for const, for caching and lazy 
evaluation (implement it carefully to avoid breaking the 
guarantees of immutable)
2. Provide a class modifier that makes immutable(_) illegal, so 
the class uses "logical const" instead of "physical const".

Re: Inherited const when you need to mutate

2012-07-11 Thread David Piepgrass

Except that I don't see why Cached!(...) needs to physically 
separate the mutable state from the rest of the object. I mean, 
I see that Cached!(...) would have to cast away immutable 
(break the type system) in order to put mutable state in an 
immutable object, but if we set aside the current type system 
for a moment, *in principle* what's the big deal if the mutable 
state is physically located within the object? In many cases 
you can save significant time and memory by avoiding all that 
hashtable management, and performance Nazis like me will want 
that speed (when it comes to standard libraries, I demand 
satisfaction).


Now, I recognize and respect the benefits of transitive 
immutability:

1. safe multithreading
2. allowing compiler optimizations that are not possible in C++
3. ability to store compile-time immutable literals in ROM

(3) does indeed require mutable state to be stored separately, 
but it doesn't seem like a common use case (and there is a 
workaround), and I don't see how (1) and (2) are necessarily 
broken.


I must be tired.

Regarding (1), right after posting this I remembered the 
difference between caching to a "global" hashtable and storing 
the cached value directly within the object: the hashtable is 
thread-local, but the object itself may be shared between 
threads. So that's a pretty fundamental difference.


Even so, if Cached!(...) puts mutable state directly in the 
object, fast synchronization mechanisms could be used to ensure 
that two threads don't step on each other, if they both compute 
the cached value at the same time. If the cached value is 
something simple like a hashcode, an atomic write should suffice. 
And both threads should compute the same result so it doesn't 
matter who wins.
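
Here is roughly what I have in mind, as a sketch only. `HashCache` is a hypothetical name, the value 0 is reserved (by my own convention here) to mean "not computed yet", and the demo in `main` is single-threaded; the point is just that the slot is only touched through `core.atomic`, so two racing threads would each store the same value.

```d
import core.atomic : atomicLoad, atomicStore;
import std.stdio : writeln;

struct HashCache
{
    private shared size_t slot;   // 0 means "not computed yet" (assumed convention)

    size_t get(scope size_t delegate() compute)
    {
        auto h = atomicLoad(slot);
        if (h == 0)
        {
            h = compute();        // may run in several threads at once
            if (h == 0)
                h = 1;            // keep 0 reserved for "empty"
            atomicStore(slot, h); // every thread stores the same value
        }
        return h;
    }
}

void main()
{
    HashCache cache;
    int computations;
    size_t compute() { ++computations; return hashOf("some key"); }

    auto first  = cache.get(&compute);
    auto second = cache.get(&compute);
    writeln(first == second, " ", computations);
}
```

This does not address the type-system question (the slot would still have to be written through an immutable reference); it only shows that the synchronization itself is cheap.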

Re: Inherited const when you need to mutate

2012-07-11 Thread David Piepgrass

Suppose we had a caching solution (you could think of it as 
@cached, but it could be done in a library). The user would need 
to provide a const, pure function which returns the same value 
that is stored in the cache. This is enforceable. The only way 
to write to the cache, is by calling the function.

How far would that take us? I don't think there are many use 
cases for logically pure, apart from caching, but I have very 
little idea about logical const.


I think a caching solution would cover most valid needs and 
indeed would be checkable.


We can even try its usability with a library-only solution. The 
idea is to plant a mixin inside the object that defines a 
static hashtable mapping addresses of objects to cached values 
of the desired types. The destructor of the object removes the 
address of the current object from the hash (if there).


Given that the hashtable is global, it doesn't obey the regular 
rules for immutability, so essentially each object has access 
to a private stash of unbounded size. The cost of getting to 
the stash is proportional to the number of objects within the 
thread that make use of that stash.
Uh, it better not be proportional. Hashtable gives us O(1), one 
hopes.



Sample usage:

class Circle {
private double radius;
private double circumferenceImpl() const {
return radius * 2 * pi;
}
mixin Cached!(double, "circumference", circumferenceImpl);
...
}

auto c = new const(Circle);
Aside: what's the difference between this and new 
immutable(Circle)?



double len1 = c.circumference;
double len2 = c.circumference;

Upon the first use of property c.circumference, Lazy computes 
the value by calling this.circumferenceImpl() and stashes it in 
the hash. The second call just does a hash lookup.


In this example searching the hash may actually take longer 
than computing the thing, but I'm just proving the concept.
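
For anyone who wants to play with the idea, here is a hand-expanded sketch of what the mixin might boil down to for Circle. It is not the proposed Cached template itself: the `cache` and `timesComputed` names are made up, the counter exists only to make the caching visible, and the destructor step (removing the address from the table) is left out.

```d
import std.stdio : writeln;

class Circle
{
    private enum double pi = 3.14159265358979;
    private double radius;

    // Thread-local side table: object address -> cached value.
    private static double[const(void)*] cache;
    static int timesComputed;     // only here to make the caching visible

    this(double radius) { this.radius = radius; }

    private double circumferenceImpl() const
    {
        ++timesComputed;
        return radius * 2 * pi;
    }

    // const method, yet it can write to the static table because the
    // table is not part of the object.
    double circumference() const
    {
        auto key = cast(const(void)*) this;
        if (auto hit = key in cache)
            return *hit;
        auto value = circumferenceImpl();
        cache[key] = value;
        return value;
    }
}

void main()
{
    const Circle c = new Circle(1.0);
    auto len1 = c.circumference;
    auto len2 = c.circumference;
    writeln(len1 == len2, " ", Circle.timesComputed);
}
```

The second call is served from the table, so the implementation function runs only once per object per thread.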


If this is a useful artifact, Walter had an idea a while ago 
that we can have the compiler help by using the per-object 
monitor pointer instead of the static hashtable. Right now the 
pointer points to a monitor object, but it could point to a 
little struct containing e.g. a Monitor and a void*, which 
opens the way to O(1) access to unbounded cached data. The 
compiler would then "understand" to not consider that data 
regular field accesses, and not make assumptions about them 
being immutable.


Any takers for Cached? It would be good to assess its level of 
usefulness first.


I like this idea, and I suspect it could be used to implement not 
just caching but lazy immutable data structures.


Except that I don't see why Cached!(...) needs to physically 
separate the mutable state from the rest of the object. I mean, I 
see that Cached!(...) would have to cast away immutable (break 
the type system) in order to put mutable state in an immutable 
object, but if we set aside the current type system for a moment, 
*in principle* what's the big deal if the mutable state is 
physically located within the object? In many cases you can save 
significant time and memory by avoiding all that hashtable 
management, and performance Nazis like me will want that speed 
(when it comes to standard libraries, I demand satisfaction).


Now, I recognize and respect the benefits of transitive 
immutability:

1. safe multithreading
2. allowing compiler optimizations that are not possible in C++
3. ability to store compile-time immutable literals in ROM

(3) does indeed require mutable state to be stored separately, 
but it doesn't seem like a common use case (and there is a 
workaround), and I don't see how (1) and (2) are necessarily 
broken.


As a separate question, do you think it possible to implement 
Cached!(...) to access an immutable field by casting away 
immutable, without screwing up (1) and (2)?

Re: Congratulations to the D Team!

2012-07-11 Thread David Piepgrass

On Wednesday, 11 July 2012 at 18:21:24 UTC, Steven Schveighoffer 
wrote:
On Wed, 11 Jul 2012 14:01:44 -0400, deadalnix wrote:



On 11/07/2012 19:49, Andrei Alexandrescu wrote:

On 7/11/12 1:40 PM, Jakob Ovrum wrote:
Some classes don't lend themselves to immutability. Let's take 
something obvious like a class object representing a dataset in 
a database. How is an immutable instance of such a class useful?


This is a good point. It seems we're subjecting all classes to 
certain limitations for the benefit of a subset of those classes.

Andrei


Did you see the proposal from feep/tgehr on #d?

It basically states that you can overload a const method with a 
non-const one if:

 - You don't mutate any data that belongs to the parent.
 - You are prevented from creating any immutable instance of that 
class or any subclass.


I don't like this idea.  It means you could not use pure 
functions to implicitly convert mutable class instances to 
immutable (something that should be possible today).


I do like the idea. Please explain by example why a pure function 
could no longer convert mutable class instances to immutable? The 
proposal to restrict the use of immutable is only supposed to 
affect classes that specifically request it.



It also seems to allow abuses.  For example:

class A
{
   private int _x;
   public @property x() const { return _x; }
}

class B : A
{
   private int _x2;
   public @property x() { return _x2++; }
}


I think you would have to mark B somehow to indicate that 
immutable(B) is now illegal, e.g.


@mutating class B : A
{
   private int _x2;
   public @property override x() { return _x2++; }
}

Now I've completely changed the logistics of the x property so 
that it's essentially become mutable.


This kind of perversion is already possible when x() is const. 
x() is allowed to mutate and return a static or global variable.

Re: Let's stop parser Hell

2012-07-11 Thread David Piepgrass


On Tuesday, 10 July 2012 at 23:49:58 UTC, Timon Gehr wrote:

On 07/11/2012 01:16 AM, deadalnix wrote:

On 09/07/2012 10:14, Christophe Travert wrote:

deadalnix , dans le message (digitalmars.D:171330), a écrit :

D isn't 100% CFG. But it is close.

What makes D fail to be a CFG?

type[something] <= something can be a type or an expression.
typeid(something) <= same here
identifier!(something) <= again


'something' is context-free:

something ::= type | expression.


I don't see how "type | expression" is context free. The input 
"Foo" could be a type or expression, you can't tell which without 
looking at the context.
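
A small D example of what I mean, using the `type[something]` case from the list above. The names `Foo` and `Bar` are just placeholders; the two declarations have the same shape, and only the meaning of the identifier in brackets decides what they are.

```d
enum Foo = 3;          // here Foo is an expression (a constant)
alias Bar = string;    // here Bar is a type

void main()
{
    int[Foo] a;        // static array of three ints
    int[Bar] b;        // associative array keyed by string

    // Same syntax, different meaning depending on what the name refers to.
    static assert(is(typeof(a) == int[3]));
    static assert(is(typeof(b) == int[string]));
}
```

Whether that counts as "not context-free" or merely as "ambiguous until semantic analysis" is, I suppose, the real question.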

Re: Rust updates

2012-07-11 Thread David Piepgrass


  bool test(int x) { return x & 2 > 0; }

gives:

  foo.d(1): Error: 2 > 0 must be parenthesized when next to 
operator &


That reminds me, I was so happy the first two times I got an 
undefined symbol error in D. The compiler said: "Did you mean 
''?" LOL, don't tell me how it works... 
it's magic, right? I love a good error message.

Re: Rust updates

2012-07-11 Thread David Piepgrass

Oh, I can't tell you what a pet peeve PITA the C precedence is. 
Ugh! I know it's against D philosophy to change the precedence 
w.r.t. C, but how about a compromise: give a warning or error 
for "x&2 > 0", with error message: "add parenthesis around x&2 
to clarify your intention."


  bool test(int x) { return x & 2 > 0; }

gives:

  foo.d(1): Error: 2 > 0 must be parenthesized when next to 
operator &


Doh! You read my mind before I thought it :) I hadn't got around 
to bit fiddling in D yet.

Re: Rust updates

2012-07-11 Thread David Piepgrass


On Wednesday, 11 July 2012 at 18:31:23 UTC, David Piepgrass wrote:

The trouble with segmented stacks are:

1. they have a significant runtime penalty

Why?

Extra instructions generated for each function.

Every function? Why?


Looks like I misunderstood what "Segmented stacks" are. From an 
LLVM page:


Segmented stack allows stack space to be allocated 
incrementally than as a monolithic chunk (of some worst case 
size) at thread initialization. This is done by allocating 
stack blocks (henceforth called stacklets) and linking them 
into a doubly linked list. The function prologue is responsible 
for checking if the current stacklet has enough space for the 
function to execute; and if not, call into the libgcc runtime 
to allocate more stack space. Support for segmented stacks on 
x86 / Linux is currently being worked on.


I envision a rather different implementation for 32-bit code.

1. Reserve a normal stack with one 4K page committed + some known 
minimum amount of uncommitted memory, e.g. another 8 KB 
uncommitted with a guard page that the program can trap via OS 
facilities (signals, etc.)
2. When the stack overflows, move the stack to a new, much larger 
region of Virtual Memory. Much like languages that support 
compacting garbage collectors, the language / runtime environment 
must be designed to support this.
3. If one needs to call C code, one preallocates the maximum 
expected virtual memory needed, e.g. 32 MB.

Re: Rust updates

2012-07-11 Thread David Piepgrass


The trouble with segmented stacks are:

1. they have a significant runtime penalty

Why?

Extra instructions generated for each function.

Every function? Why?


2. interfacing to C code becomes problematic
Isn't it possible to auto-commit new pages when C code needs 
it?

...
There's no way to predict how much stack arbitrary C code will 
use.
Presumably one does not call arbitrary C code. Usually one knows 
what one might call in advance and can plan accordingly (and even 
if it is arbitrary, one at least knows *that* one is going to 
call C code and plan accordingly. Most C code doesn't allocate 
more than a few megabytes on the stack).

Re: just an idea (!! operator)

2012-07-11 Thread David Piepgrass


it is just an idea, i do not have any specific use in mind.

...
But we can't base the decision solely on this fact. Then we 
could add a million operators to the language just because they 
seem neat.


Actually, we could! Great idea, nimrod! (inside joke)

Re: Rust updates

2012-07-11 Thread David Piepgrass


On Wednesday, 11 July 2012 at 17:09:27 UTC, Timon Gehr wrote:

On 07/11/2012 06:45 PM, David Piepgrass wrote:

...
These benefits (except 3) all exist for "function" as well as 
"fn", but while many languages use "fun", requiring "function" 
for all functions is almost unheard of (at least I haven't heard 
of it), why? It's too damn long! We write functions constantly, 
we don't want to type "function" constantly.


You could have a look at JavaScript.


Ack! You got me. Dynamic languages aren't my thing. But JS being 
dynamically typed, it's not as bad since you don't have to 
specify the return type in addition.

Re: Rust updates

2012-07-11 Thread David Piepgrass

Rust has type classes from Haskell (with some simplifications 
for higher kinds), uniqueness typing, and typestates.


As nice as kinds, typestates, typeclasses and several pointer 
types may be, I was in the Rust mailing list and felt unable to 
participate because they kept using terminology that only PhDs in 
type systems understand. And googling for "kind" doesn't tell me 
a darn thing ;)


That's why I have gravitated to D, it's so much more familiar 
(sometimes too much so, e.g. I still need to 'break' in 'switch'? 
how many meanings for 'static'?) as well as very powerful. I 
would still like to learn about the mumbo-jumbo, though, and I 
know how nice pattern-matching can be from one Haskell-based 
course in university :)



This seems a bit overkill to me:
This is very strict, maybe too much strict:
Agreed about the int suffixes, but I wonder what Marco meant 
about "mass-casts" in D.


The safe pointer types are @T for shared, reference-counted 
boxes, and ~T, for uniquely-owned pointers.
I wonder how well these could be simulated in D. It seems to me 
Rust is carefully designed for performance, or at least real-time 
performance by avoiding garbage collection in favor of safely 
tracking ownership. That's good, but only now are they developing 
things like OOP support that I take for granted.



++ and -- are missing
Rust, like Go, seems very focused on making a "simple" language. 
Another reason that I prefer D.


the logical bitwise operators have higher precedence. In C, x & 
2 > 0 comes out as x & (2 > 0), in Rust, it means (x & 2) > 0, 
which is more likely to be what you expect (unless you are a C 
veteran).
Oh, I can't tell you what a pet peeve PITA the C precedence is. 
Ugh! I know it's against D philosophy to change the precedence 
w.r.t. C, but how about a compromise: give a warning or error for 
"x&2 > 0", with error message: "add parentheses around x&2 to 
clarify your intention."
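
To spell out the two readings, here is a small D sketch with both written with explicit parentheses. The function names are made up for illustration.

```d
import std.stdio : writeln;

// What most people mean by "x & 2 > 0": test bit 1.
bool bitIsSet(int x) { return (x & 2) > 0; }

// What C's precedence rules do with it: x & (2 > 0), which is x & 1.
bool cReading(int x) { return (x & (2 > 0)) != 0; }

void main()
{
    foreach (x; [1, 2, 3])
        writeln(x, ": intended=", bitIsSet(x), " C-style=", cReading(x));
}
```

For x = 1 and x = 2 the two functions disagree, which is exactly the kind of silent bug a warning would catch.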


Enums are datatypes that have several different 
representations. For example, the type shown earlier:


enum shape {
circle(point, float),
rectangle(point, point)
}
fn angle(vec: (float, float)) -> float {
alt vec {
  (0f, y) if y < 0f { 1.5 * float::consts::pi }
  (0f, y) { 0.5 * float::consts::pi }
  (x, y) { float::atan(y / x) }
}
}
alt mypoint {
{x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
{x, y} { /* Simply bind the fields */ }
}
let (a, b) = get_tuple_of_two_ints();


Records, tuples, and destructuring go so well together. I would 
love to have this.


I am particularly a fan of structural typing. I don't know if 
Rust uses it but Opa and other functional languages often do. You 
see, there's a problem that pops up in .NET all the time, and 
probably the same problem exists in D.


Any time two libraries want to use the same concept, but the 
concept is not in the standard library, they need to define it. 
For instance if there is no "Point" type in the standard library, 
but two unrelated libraries need points, they will both define 
their own (amazingly, Points are poorly thought out in .NET and 
tightly bound to GUI libraries, so people define their own in 
some cases):


// JoesLibrary
struct Point(T) { T x, y; /* followed by some manipulation 
functions */ }


// FunkyLibrary
struct Point(T) { T x, y; /* followed by other manipulation 
functions */ }


Sadly, the two point types are not compatible with each other. A 
client that wants to use both libraries now has an 
interoperability problem when he wants to pass data between them.


Even a client that uses only one of the libraries, let's call it 
"JoesLibrary", has to import Point from "JoesLibrary", even if its 
functionality is not quite what the client wants. It would be 
much nicer if the client could define his own Point struct that 
seamlessly interoperates with Joe's. In D this is currently 
impractical, but I would enjoy designing a way to make it work 
(before you point out that "what if x and y are in a different 
order in the two structs" and "it could be T X,Y in one and T x,y 
in the other", yes, I know, it's on my list of problems to 
cleverly solve).
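
As a starting point, a library-level conversion that copies fields by name is easy to sketch in today's D. Everything here is hypothetical: `structuralCast` is a made-up helper, and `JoesPoint` / `FunkyPoint` stand in for the two libraries' types (which would both be called Point in their own modules). Matching by name handles the "different order" case but not the "X vs x" case.

```d
import std.stdio : writeln;
import std.traits : FieldNameTuple;

// Stand-ins for the two libraries' Point types.
struct JoesPoint(T)  { T x, y; }
struct FunkyPoint(T) { T y, x; }   // same fields, different order

// Hypothetical helper: copy fields by name. It fails to compile if
// From lacks a field that To has, or if the types don't convert.
To structuralCast(To, From)(From src)
{
    To dst;
    foreach (name; FieldNameTuple!To)
        __traits(getMember, dst, name) = __traits(getMember, src, name);
    return dst;
}

void main()
{
    auto a = JoesPoint!int(3, 4);
    auto b = structuralCast!(FunkyPoint!int)(a);
    writeln(b.x, ", ", b.y);
}
```

This is an explicit copy, not the seamless interoperability I'm wishing for, but it shows the compiler already has the information it would need.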


A similar problem exists with interfaces, where two unrelated 
libraries expose two similar classes with some common functions, 
but you can't cast them to a common type in D. This is a solved 
problem in Go (http://www.airs.com/blog/archives/277) and it's 
actually pretty easy for a compiler to magically cast a class to 
an interface that the class did not declare--if the underlying 
language is designed for that, anyway.


In fact, in .NET at least, the same problem exists even if the 
libraries DO know about each other and are even written by the 
same person and use identical interfaces. The problem is, if I 
write two libraries A and B, and I want them to be interoperable, 
then I need to factor out the common structs and interfaces to a 
microscopic third library, I. But from the client's perspective, 
if a client only

Re: Rust updates

2012-07-11 Thread David Piepgrass


On Sunday, 8 July 2012 at 19:28:11 UTC, Walter Bright wrote:

On 7/8/2012 6:49 AM, bearophile wrote:
I think in Go the function stack is segmented and growable as in 
Go. This saves RAM if you need a small stack, and avoids stack 
overflows where lot of stack is needed.


The trouble with segmented stacks are:

1. they have a significant runtime penalty

Why?


2. interfacing to C code becomes problematic
Isn't it possible to auto-commit new pages when C code needs it? 
I see that *moving* the stack would be a problem unless you have 
a means to adjust all pointers that point into the stack. If you 
need to call C code in 32-bit, you'd have to specify a maximum 
stack size.

Re: Rust updates

2012-07-11 Thread David Piepgrass


On Wednesday, 11 July 2012 at 16:45:17 UTC, David Piepgrass wrote:
Anyway I think short vs long is much ado about nothing. No one 
complains that we have to type "int" instead of "integer", 
after all. Most languages have only a few keywords, which 
people quickly learn. As long as all the standard library 
functions are well-named, I don't care about the language 
keywords.


Okay, I actually care a lot, just about the meaning of the 
keyword and not about whether it's abbreviated. I think D's use 
of "enum" for "static constant" and "static" for "thread 
singleton" (and three or four other things) is quite unfortunate, 
albeit understandable given the C heritage.
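
For readers who don't know D, here is a small sketch of the overloaded meanings being complained about. All identifiers (`maxRetries`, `Color`, `moduleCounter`, `nextId`, `Config`) are made up for illustration.

```d
enum maxRetries = 3;          // "enum" as a manifest constant: a compile-time value

enum Color { red, green }     // "enum" as an enumeration, as in C

static int moduleCounter;     // module-level variable: thread-local by default in D
                              // ("static" is redundant at module scope)

int nextId()
{
    static int last;          // "static" local: persists across calls, one copy per thread
    return ++last;
}

struct Config
{
    // "static" member: belongs to the type, not to an instance
    static immutable string name = "demo";
}
```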

Re: Rust updates

2012-07-11 Thread David Piepgrass

Short keywords are only important with barebones editors like 
a default vi.

Nobody would use this for real development.


I started a long discussion on Reddit, because I complained 
that the goal of 5 letter keywords is primitive, and brings 
back memories of the time the compilers were memory constrained.

...
As someone that values readable code, I don't understand this 
desire to turn every programming language into APL.


Short or long, I don't think it matters if the IDE can help you 
with the long ones. I don't mind typing immutable, once, but if I 
had to do it 50 times a day? And somehow, even though I have been 
programming for over 20 years, I still type "reutrn" and "retrun" 
all the damn time! So "ret" would save me time.


Anyway I think short vs long is much ado about nothing. No one 
complains that we have to type "int" instead of "integer", after 
all. Most languages have only a few keywords, which people 
quickly learn. As long as all the standard library functions are 
well-named, I don't care about the language keywords.


Actually I think "fn" for functions is great, why?

1. Greppability. With the C syntax there is no way to search for 
function definitions. Even if we had an IDE to find functions for 
us, you are not always looking at source code in an IDE (you 
could be browsing a repository on the web)
2. Easier to parse. When the compiler sees "fn", it knows it's 
dealing with a function and not a variable or an expression. It 
seems especially beneficial inside functions, where perhaps X * Y 
might begin an expression (or is that impossible in D?)
3. Googlability. "function" will find results across all PLs, 
"fn" will narrow the search down quite a bit if you want to see 
code in Rust.


These benefits (except 3) all exist for "function" as well as 
"fn", but while many languages use "fun", requiring "function" 
for all functions is almost unheard of (at least I haven't heard 
of it), why? It's too damn long! We write functions constantly, 
we don't want to type "function" constantly.

Re: Does D have too many features?

2012-07-10 Thread David Piepgrass

forum.dlang.org apparently failed to post this 10 minutes ago, 
retrying.


On Tuesday, 10 July 2012 at 02:43:05 UTC, Era Scarecrow wrote:

On Tuesday, 10 July 2012 at 01:41:29 UTC, bearophile wrote:

David Piepgrass:
This use case is pretty complex, so if I port this to D, I'd 
probably just cast away const/immutable where necessary.


You are not the first person that says similar things. So D 
docs need to stress more that casting away const/immutable in 
D is rather more dangerous than doing the same thing in C++.

...
 Let's say a class/struct is a book with page protectors 
signifying 'const(ant)'. You promise to return the book to the 
library without making any changes; although you promised you 
wouldn't make changes, you still take the page protectors off 
and make notes on the outer edges or make adjustments in 
the text, then return the book.


 Is this wise? This isn't C++. If something shouldn't change, 
then don't change it god damn it. If it needs to change it 
isn't const(ant) and shouldn't suggest it is.


The difficulty, in case you missed it, is that somebody else (the 
Object class) says that certain functions are const, but in 
certain cases we really, really want to mutate something, either 
for efficiency or because "that's just how the data structure 
works". If a data structure needs to mutate itself when read, 
yeah, maybe its functions should not be marked const, but quite 
often the "const" is inherited from Object or some interface that 
(quite reasonably, it would seem) expects functions that /read 
stuff/ to be const.


And yet we can't drop const from Object or such interfaces, 
because there is other code elsewhere that /needs/ const to be 
there.


So far I have no solution to the dilemma in mind, btw. But the 
idea someone had of providing two (otherwise identical) 
functions, one const and one non-const, feels like a kludge to 
me, and note that anybody with an object would expect to be able 
to call the const version on any Object.


Seriously, it's not that hard a concept. I guess if something 
doesn't port well from C++ then redesign it. Some things done 
in C++ are hacks due to the language's limitations and faults.


I was referring to a potential port from C#, which has no const. 
My particular data structure (a complex beast) contains a mutable 
tree of arbitrary size, which the user can convert to a 
conceptually immutable tree in O(1) time by calling Clone(). This 
marks a flag in the root node that says "read-only! do not 
change" and shares the root between the clones. At this point it 
should be safe to cast the clone to immutable. However, the 
original, mutable-typed version still exists. As the user 
requests changes to the mutable copy in the future, parts of the 
tree are duplicated to avoid changing the immutable nodes, with 
one exception: the read-only flag in various parts of the 
original, immutable tree will gradually be set to true.


In this case, I don't think the D type system could do anything 
to help ensure that I don't modify the original tree that is 
supposed to be immutable. Since the static type of internal 
references must either be all mutable or all immutable, they will 
be typed mutable in the mutable copy, and immutable in the 
immutable copy, even though the two copies are sharing the same 
memory.


And one flag, the read-only flag, must be mutable in this data 
structure, at least the transition from false->true must happen 
*after* the immutable copy is created; otherwise, Clone() would 
have to run in O(N) time, to mark every node read-only. This 
fact, however, does not affect the immutable copy in any way.
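
To make the scheme concrete, here is a heavily simplified sketch that uses a plain binary search tree instead of the real data structure. `Node`, `Tree`, `thaw` and `insertInto` are names invented for illustration. Everything is typed mutable because the `frozen` flag of nodes reachable from a snapshot is written after the snapshot was taken.

```d
// Node of a simple binary search tree with a lazily propagated "frozen" flag.
class Node
{
    int key;
    Node left, right;
    bool frozen;   // true: shared between clones, must not be changed in place

    this(int key) { this.key = key; }

    // Shallow copy. The children become shared at this point, so they are
    // frozen now rather than at clone() time (the lazy false->true transition).
    Node thaw()
    {
        auto copy = new Node(key);
        copy.left = left;
        copy.right = right;
        if (left !is null) left.frozen = true;
        if (right !is null) right.frozen = true;
        return copy;
    }
}

struct Tree
{
    private Node root;

    // O(1): freeze the root and share it between both trees.
    Tree clone()
    {
        if (root !is null) root.frozen = true;
        return Tree(root);
    }

    void insert(int key) { root = insertInto(root, key); }

    bool contains(int key)
    {
        for (auto n = root; n !is null; n = key < n.key ? n.left : n.right)
            if (n.key == key) return true;
        return false;
    }

    private static Node insertInto(Node n, int key)
    {
        if (n is null) return new Node(key);
        if (n.frozen) n = n.thaw();   // copy instead of writing to a shared node
        if (key < n.key)      n.left  = insertInto(n.left, key);
        else if (key > n.key) n.right = insertInto(n.right, key);
        return n;
    }
}
```

After `auto snapshot = a.clone();`, inserting into `a` copies only the nodes along the insertion path, and `snapshot` keeps seeing the old contents.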

Re: getNext

2012-07-09 Thread David Piepgrass


On Monday, 9 July 2012 at 07:53:41 UTC, David Piepgrass wrote:
I don't know if this proposal went anywhere since 2010, but it 
occurs to me that there is a hidden danger here. alloca will 
allocate a sequence of separate temporaries. If the collection 
is large, the stack will overflow, and the client might not 
have a clue what happened.


Amazing. My post unleashed four pages of comments and not one of 
them responded to my post :O


I think Mehrdad is right that an in/out range should have its own 
name to distinguish it from an input range, but that doesn't 
necessarily mean that the same interface can't be used for both.


I imagine a couple of advantages of:


T tmp;
for (T* front; (front = r.getNext(tmp)) !is null; )
    // do something with *front


instead of:


for(; !r.empty; r.popFront())
    // do something with r.front


- If the range uses late-binding, getNext() is faster because 
you're only calling one function instead of 3. When I program in 
C#, I am quite irritated that IEnumerator requires 2 interface 
calls to get each item. Late binding, of course, is necessary 
across DLL boundaries and can help avoid code bloat.
- If an input-only range has to unpack its elements (e.g. bit 
array => bool, or anything compressed), the range doesn't need to 
unpack repeatedly every time 'front' is accessed, nor does it 
need to reserve memory inside itself for a scratch area (you 
don't want scratch areas in every range if your app keeps track 
of thousands of ranges; plus, ranges tend to get passed by value, 
right?).


That said, it may be unreasonable for the compiler to support the 
necessary escape analysis (impossible in case you're importing 
.di files)... and maybe the existing empty/popFront/front is too 
well established to reconsider? (I am not familiar with the 
status quo).
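
A rough sketch of the two interfaces side by side, using the bit-array case mentioned above. `BitRange` and `countSetBits` are hypothetical names, and the `getNext` signature is only my reading of the proposal, not an established Phobos API. The `return ref` annotation tells the compiler that the returned pointer may refer to the caller's storage.

```d
// Input range over the bits of a byte array, unpacking one bool per element.
struct BitRange
{
    const(ubyte)[] bytes;
    size_t pos;      // index of the next bit

    // Classic three-primitive interface.
    bool empty() const { return pos >= bytes.length * 8; }
    bool front() const { return ((bytes[pos / 8] >> (pos % 8)) & 1) != 0; }
    void popFront() { ++pos; }

    // Single-call alternative: unpacks into caller-provided storage and
    // returns a pointer to it, or null when the range is exhausted.
    bool* getNext(return ref bool store)
    {
        if (empty) return null;
        store = front;
        popFront();
        return &store;
    }
}

// Consumes the range with one call per element.
size_t countSetBits(BitRange r)
{
    size_t n;
    bool tmp;   // scratch space lives in the caller, not in the range
    for (bool* p; (p = r.getNext(tmp)) !is null; )
        if (*p) ++n;
    return n;
}
```

A range whose elements already sit in memory could return a pointer to the element itself and ignore `store`, which is where the escape-analysis worry comes from.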

Inherited const when you need to mutate

2012-07-09 Thread David Piepgrass


On Monday, 9 July 2012 at 16:02:38 UTC, Timon Gehr wrote:

On 07/09/2012 05:00 PM, H. S. Teoh wrote:

On Mon, Jul 09, 2012 at 01:44:24PM +0200, Timon Gehr wrote:

On 07/09/2012 08:37 AM, Adam Wilson wrote:
Object is now const-correct throughout D. This has been a 
dream for many of you. Today it is a reality.


PITA. Forced const can severely harm a code base that wants 
to be flexible -- it leaks implementation details and is 
infectious.

[...]

Can you give an explicit example of code that is harmed by 
const correctness?


1.

Most code that gives amortized complexity guarantees, eg:

interface Map(K, V){
    V opIndex(K k) const;
    // ...
}

class SplayTree(K, V) : Map!(K, V) {
    // ???
}

2.

- hash table
- opApply compacts the table if it is occupied too sparsely, in 
  order to speed up further iteration.
- toString iterates over all key/value pairs by the means of 
  opApply.


Clearly, toString cannot be const in this setup.

3.

Often, objects can cache derived properties to speed up the 
code. With 'const-correctness' in place, such an optimization 
is not transparent nor doable in a modular way.


I guess D does not have 'mutable' (like C++) to override const on 
methods? Caching anything slow-to-compute is my typical use case, 
and I know a hashtable design where the getter will move whatever 
value it finds to the front of a hash collision chain.


Oh, and this is interesting, I implemented a B+tree-like data 
structure* in C# that supports O(1) cloning. It marks the root as 
"frozen", making it copy-on-write. In order to clone in O(1), the 
children are not marked as frozen until later, when someone 
actually wants to mutate one of the copies. A user can also make 
the tree immutable in O(1) time and freely make mutable copies of 
it. This use case is pretty complex, so if I port this to D, I'd 
probably just cast away const/immutable where necessary. C#, of 
course, has no const so it was never a concern there.


*it's actually way fancier than that, I should really write a 
CodeProject article on it.


Of course, the trouble is, you can call any const method on an 
immutable object, so any const method that mutates needs to be 
thread safe. Many uses of C++ 'mutable' are thread-safe (e.g. 
most platforms guarantee atomic pointer-size writes, right? So 
two threads can cache the same int or two equivalent class 
instances, and it doesn't matter who wins)... but many other 
cases are not (e.g. the hashtable).


This is not a solved problem, is it? Ideas?
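
To show where the friction comes from, here is a small sketch of the caching case. `Polygon` and its members are invented for illustration. The cached getter cannot be marked const, so it cannot be called through a const reference, and the const-correct version has to recompute every time.

```d
class Polygon
{
    private double[] sides;
    private double cachedPerimeter;
    private bool cacheValid;

    this(double[] sides) { this.sides = sides.dup; }

    // Not const: it writes the cache. D has no 'mutable' escape hatch, and
    // casting away const to write is undefined behaviour.
    double perimeter()
    {
        if (!cacheValid)
        {
            cachedPerimeter = 0;
            foreach (s; sides) cachedPerimeter += s;
            cacheValid = true;
        }
        return cachedPerimeter;
    }

    // Const-correct but uncached alternative.
    double perimeterUncached() const
    {
        double total = 0;
        foreach (s; sides) total += s;
        return total;
    }
}
```

Given `const Polygon cp`, the call `cp.perimeter()` is rejected at compile time while `cp.perimeterUncached()` is accepted. That is the dilemma when the const comes from Object or an interface.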

Re: Congratulations to the D Team!

2012-07-09 Thread David Piepgrass

Thanks for doing this! I haven't contributed yet, but it was 
worrisome hearing about various pull requests languishing for 
long periods. Now maybe I should go learn how to use git...


On Monday, 9 July 2012 at 07:56:40 UTC, Jonathan M Davis wrote:
As far as I'm concerned, 3.minutes() is a prime example of 
what's wrong with UFCS. UFCS can be very useful, but oh how I 
hate that syntax (completely aside from the particular function 
being called, I think that 3.anything() is horrible).

But obviously not everyone agrees.


Certainly not. C# has had this syntax since 1.0 (albeit not 
extension methods until v3.0); IIRC you could always write 
3.ToString() or 3.GetHashCode() and, incidentally, 
int.Parse("3"), etc. Ruby has it too (not UFCS per se, but you 
can actually add methods to any class, including integers, IIRC).

Re: getNext

2012-07-09 Thread David Piepgrass

I've just had an idea that is so dark and devious, I was almost 
afraid to try it. But it works like a charm. Consider:


E * getNext(R, E)(ref R range,
  ref E store = *(cast(E*) alloca(E.sizeof)))
{
    ...
}


I don't know if this proposal went anywhere since 2010, but it 
occurs to me that there is a hidden danger here. alloca will 
allocate a sequence of separate temporaries. If the collection is 
large, the stack will overflow, and the client might not have a 
clue what happened.

Re: run-time stack-based allocation

2012-07-09 Thread David Piepgrass

On Thursday, 10 May 2012 at 03:03:22 UTC, Andrei Alexandrescu 
wrote:

On 5/9/12 3:17 PM, Tove wrote:

On Tuesday, 8 May 2012 at 07:03:35 UTC, Gor Gyolchanyan wrote:
Cool! Thanks! I'll definitely check it out! I hope it's DDOCed 
:-D




I just invented an absolutely wicked way of using alloca() in 
the parent context...

auto Create(void* buf=alloca(frame_size))


Yah, me too. 
http://forum.dlang.org/thread/i1gnlo$18g0$1...@digitalmars.com#post-i1gql2:241k6o:241:40digitalmars.com 
I found it by googling for my name and "dark" and "devious" :o).


That is so awesome that it can't possibly be legal by the spec!

This "runtime struct" sounds really cool too. Pinch me, I must be 
dreaming :D

Re: Why not all statement are expressions ?

2012-07-08 Thread David Piepgrass


int[void] intSet = [2:(), 3:(), 4:()]

oops, void[int] intSet = [2:(), 3:(), 4:()] rather.

Re: Why not all statement are expressions ?

2012-07-08 Thread David Piepgrass

I'm usually fairly ambivalent about the idea of statements 
being expressions, but I would *love* for switch to be usable 
as an expression. For instance, in Haxe, you can do stuff like 
the following, which I get a ton of use out of and often wish 
D had:

a = switch(b)
{
    case 1: "foo";
    case 2: "bar";
    case 3: "baz";
    case 4: "whee";
    default: "blork";
}

The D equivalents aren't terrible, but they aren't nearly as 
nice.


This won't work anyway. We are talking about language grammar 
here. If made expression, statement would be of type void. Just 
like assert is.


I see what you're saying, but this switch expression should 
really be of type string.


I certainly wish more things were expressions. "a = if (x) y; 
else z;" isn't especially useful since we have "a = x ? y : z", 
but consider instead something that doesn't map so easily to an 
expression:


// very loosely based on some Android code I wrote recently
dpWidth = _lastKnownWidth =
    if (window.isVisible()) {
        auto m = context.getResources().getSystemMetrics();
        // final statement as value of "if" expr
        window.getWidth() / m.pixelDensity();
    } else if (_lastKnownWidth != 0)
        _lastKnownWidth;
    else
        screenInfo().getWidth();

Or how about:

auto area = {
    auto tmp = foo.bar(baz);
    tmp.width * tmp.height;
}
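
For comparison, D as it stands can get fairly close to both examples with a delegate literal that is called on the spot, at the price of explicit `return` statements and a trailing `()`. A minimal sketch; `Size`, `measure`, `areaFor` and `describe` are made-up names.

```d
struct Size { int width, height; }

// Hypothetical helper standing in for foo.bar(baz).
Size measure(int scale) { return Size(2 * scale, 3 * scale); }

int areaFor(int scale)
{
    // A delegate literal, invoked immediately, plays the role of a block expression.
    auto area = {
        auto tmp = measure(scale);
        return tmp.width * tmp.height;
    }();
    return area;
}

string describe(int b)
{
    // The same trick emulates a switch "expression".
    return {
        switch (b)
        {
            case 1: return "foo";
            case 2: return "bar";
            default: return "blork";
        }
    }();
}
```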

I also wish "void" were a first-class type with sizeof==0 for 
maximum efficiency:


int[void] intSet = [2:(), 3:(), 4:()]

Ditto for size of empty structs. D code should never need 
abominations like the C++ EBCO.
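
As a point of reference, this is roughly where D stands today, as far as I can tell. The sketch assumes nothing beyond the core language; `Unit`, `Empty` and `makeSet` are illustrative names.

```d
alias Unit = void[0];   // a zero-length static array occupies no storage
struct Empty {}

static assert(Unit.sizeof == 0);
static assert(Empty.sizeof == 1);   // an empty struct still takes one byte

// The usual stand-in for a set: an associative array whose values are ignored.
bool[int] makeSet(int[] keys...)
{
    bool[int] set;
    foreach (k; keys) set[k] = true;
    return set;
}
```

With a first-class zero-size `void`, the `bool` payload per entry would not be needed.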

Re: Let's stop parser Hell

2012-07-08 Thread David Piepgrass


On Sunday, 8 July 2012 at 21:22:39 UTC, Roman D. Boiko wrote:

On Sunday, 8 July 2012 at 21:03:41 UTC, Jonathan M Davis wrote:
It's been too long since I was actively working on parsers to 
give any details, but it is my understanding that because a 
hand-written parser is optimized for a specific grammar, it's 
going to be faster.


My aim is to find out any potential bottlenecks and ensure that 
those are possible to get rid of. So, let's try.


I believe it would not hurt generality or quality of a parser 
generator if it contained seams for inserting custom (optimized) 
code where necessary, including those needed to take advantage 
of some particular aspects of D grammar. Thus I claim that 
optimization for D grammar is possible.


I'm convinced that the output of a parser generator (PG) can be 
very nearly as fast as hand-written code. ANTLR's output (last I 
checked) was not ideal, but the one I planned to make (a few 
years ago) would have produced faster code.


By default the PG's output will not be the speed of hand-written 
code, but the user can optimize it. Assuming an ANTLR-like PG, 
the user can inspect the original output looking for inefficient 
lookahead, or cases where the parser looks for rare cases before 
common cases, and then improve the grammar and insert ... I 
forget all the ANTLR terminology ... syntactic predicates or 
whatever, to optimize the parser.


So far discussion goes in favor of LL(*) parser like ANTLR, 
which is top-down recursive-descent. Either Pegged will be 
optimized with LL(*) algorithms, or a new parser generator 
created.


Right, for instance I am interested in writing a top-down PG 
because I understand them better and prefer the top-down approach 
due to its flexibility (semantic actions, allowing custom code) 
and understandability (the user can realistically understand the 
output; in fact readability would be a specific goal of mine)


Roman, regarding what you were saying to me earlier:
In stage 2 you have only performed some basic analysis, like, 
e.g., matched braces to define some hierarchy. This means that 
at the time when you find a missing brace, for example, you 
cannot tell anything more than that braces don't match.


Stage 2 actually can tell more than just "a brace is missing 
somewhere", because so many languages are C-like. Given this 
situation:


   frob (c &% x)
  blip # gom;
   }

It doesn't need to know what language this is to tell where the 
brace belongs. Even in a more nebulous case like:


   frob (c &% x) bar @ lic
  blip # gom;
   }

probably the brace belongs at the end of the first line.

Perhaps your point is that there are situations where a parser 
that knows the "entire" grammar could make a better guess about 
where the missing brace/paren belongs. That's certainly true.


However, just because it can guess better, doesn't mean it can 
reinterpret the code based on that guess. I mean, I don't see any 
way to "back up" a parser by an arbitrary amount. A hypothetical 
stage 2 would probably be hand-written and could realistically 
back up and insert a brace/paren anywhere that the heuristics 
dictate, because it is producing a simple data structure and it 
doesn't need to do any semantic actions as it parses. A "full" 
parser, on the other hand, has done a lot of work that it can't 
undo, so the best it can do is report to the user "line 54: 
error: brace mismatch; did you forget a brace on line 13?" The 
heuristic is still helpful, but it has already parsed lines 13 to 
54 in the wrong context (and, in some cases, has already spit 
out a series of error messages that are unrelated to the user's 
actual mistake).


As I demonstrated in some examples, it could get the output 
which implies incorrect structure


I was unable to find the examples you refer to... this thread's 
getting a little unwieldy :)

Re: Let's stop parser Hell

2012-07-07 Thread David Piepgrass

Yeah, with a tree-transforming parser, I imagine the same 
thing, except my current [fantasy] is to convert a certain 
subset of D to multiple other languages automatically. Then I 
could write libraries that can easily be used by an 
astonishingly large audience. I certainly would like to see D 
targeting Android, but that's best done directly from D to ARM.


That does sound very cool.  Possibly difficult though, due to 
having to cater to the lowest-common-denominator in all of your 
API designs.  No templated functions or ranges in your API, 
that's for sure.  I'm sure there are some things where this is 
very doable though; it probably depends on what kind of 
libraries you are writing.


Well, for templates, in general, it would be necessary to 
instantiate a particular set of templates and explicitly give 
them names in the target language. So for instance, I could 
define a Point!T struct in D, sure, but then I'd have to tell the 
language converter to create target-language specializations: in 
C#, PointD=Point!double, PointI=Point!int, etc. If the target 
were C++, the template could be translated to a C++ template, 
Point<T>, as long as there aren't any "static ifs" or other 
things that can't be translated. Notably, if a template P!T 
depends on another template Q!T, then P!T cannot be translated to 
a C++/C# P<T> unless Q!T was also translated as Q<T>.
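
On the D side, the named specializations could simply be aliases that a converter is told to emit. A minimal sketch: the `opBinary` member is only there to give the struct some behaviour, and how a converter would discover the aliases is left open.

```d
struct Point(T)
{
    T x, y;

    // Component-wise addition.
    Point opBinary(string op : "+")(Point rhs) const
    {
        return Point(x + rhs.x, y + rhs.y);
    }
}

// The explicit, named instantiations a converter would export to the target language.
alias PointD = Point!double;
alias PointI = Point!int;
```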


Adapting standard libraries could no doubt be a gigantic problem. 
I don't know how to begin to think about doing that.


But for ranges in particular, I think the concept is too 
important to leave out of public interfaces. So I'd port the 
major range data structures to the target languages, most likely 
by hand, so that they could be used by converted code.


As for D targeting Android, my intent is really to target X 
where X is any CPU/OS combo you can think of.  I want to be 
able to get D, the language, not necessarily phobos or other 
niceties, to work on any platform, and to do so without much 
work.  Cross-compiling to a new platform that has never been 
cross-compiled before should require zero coding.


I understand. Conversion to C is an effective last resort. And, 
well, I hear a lot of compilers have even used it as a standard 
practice. I guess you'd be stuck with refcounting, though.


I think that the D-directly-to-ARM is the current approach for 
cross-compiling.  I critique it for its underwhelming lack of 
results.


Yeah. I assume it involves weird object-file formats, calling 
conventions and ABIs. I guess very few want to get involved with 
that stuff, and very few have the slightest clue where to begin, 
myself included.


(2) suffer from integration problems if you try to compile the 
expressions in separate files before compiling the rest of the 
front-end.


Absolutely, I love language-integrated metaprogramming. Without 
it you end up with complicated build environments, and I hate 
those, cuz there isn't a single standard build environment that 
everybody likes. I think people should be able to just load up 
their favorite IDE and add all source files to the project and It 
Just Works. Or on the command line, do dmd *.d or whatever. Oh, 
and the ability to run the same code at meta-compile-time, 
compile-time and run-time, also priceless.

Re: Let's stop parser Hell

2012-07-07 Thread David Piepgrass


On Saturday, 7 July 2012 at 22:35:37 UTC, Roman D. Boiko wrote:

On Saturday, 7 July 2012 at 22:25:00 UTC, David Piepgrass wrote:
This is all true, but forgetting a brace commonly results in a 
barrage of error messages anyway. Code that guesses what you 
meant obviously won't work all the time, and phase 3 would 
have to take care not to emit an error message about a "{" 
token that doesn't actually exist (that was merely 
"guessed-in"). But at least it's nice for a parser to be 
/able/ to guess what you meant; for a typical parser it would 
be out of the question, upon detecting an error, to back up 
four source lines, insert a brace and try again.


So you simply admit that error recovery is difficult to 
implement. For me, it is a must-have, and thus throwing away 
information is bad.


I'm not seeing any tremendous error-handling difficulty in my 
idea. Anyway, I missed the part about information being thrown 
away...?

Re: Let's stop parser Hell

2012-07-07 Thread David Piepgrass


On Saturday, 7 July 2012 at 22:07:02 UTC, Roman D. Boiko wrote:

On Saturday, 7 July 2012 at 21:52:09 UTC, David Piepgrass wrote:
it seems easier to tell what the programmer "meant" with three 
phases, in the face of errors. I mean, phase 2 can tell when 
braces and parentheses are not matched up properly and then it 
can make reasonable guesses about where those missing 
braces/parentheses were meant to be, based on things like 
indentation. That would be especially helpful when the parser 
is used in an IDE, since if the IDE guesses the intention 
correctly, it can still understand broken code and provide 
code completion for it. And since phase 2 is a standard tool, 
anybody's parser can use it.


There could be multiple errors that compensate each other and 
make your phase 2 succeed and prevent phase 3 from doing proper 
error handling. Even knowing that there is an error, in many 
cases you would not be able to create a meaningful error 
message. And any error would make your phase-2 tree incorrect, 
so it would be difficult to recover from it by inserting an 
additional token or ignoring tokens until parser is able to 
continue its work properly. All this would suffer for the same 
reason: you lose information.


This is all true, but forgetting a brace commonly results in a 
barrage of error messages anyway. Code that guesses what you 
meant obviously won't work all the time, and phase 3 would have 
to take care not to emit an error message about a "{" token that 
doesn't actually exist (that was merely "guessed-in"). But at 
least it's nice for a parser to be /able/ to guess what you 
meant; for a typical parser it would be out of the question, upon 
detecting an error, to back up four source lines, insert a brace 
and try again.

Re: Let's stop parser Hell

2012-07-07 Thread David Piepgrass

What I like about it is not its performance, but how it matches 
the way we think about languages. Humans tend to see overall 
structure first, and examine the fine details later. The tree 
parsing approach is similarly nonlinear and can be modularized 
in a way that might be more intuitive than traditional EBNF.


That reminds me, I forgot to write another advantage I expected 
the three-phase approach to have, namely, that it seems easier to 
tell what the programmer "meant" with three phases, in the face 
of errors. I mean, phase 2 can tell when braces and parentheses 
are not matched up properly and then it can make reasonable 
guesses about where those missing braces/parentheses were meant 
to be, based on things like indentation. That would be especially 
helpful when the parser is used in an IDE, since if the IDE 
guesses the intention correctly, it can still understand broken 
code and provide code completion for it. And since phase 2 is a 
standard tool, anybody's parser can use it.


Example:

void f() {
    if (cond)
        x = y + 1;
        y = z + 1;
    }
} // The error appears to be here, but it's really 4 lines up.
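
A crude sketch of the kind of indentation heuristic I have in mind. The function names are hypothetical, and the rule is deliberately naive: it counts braces even inside strings and comments, and it treats any line followed by deeper indentation as a suspect. It blames the most recent line that is followed by a more deeply indented line but does not end in "{".

```d
import std.string : strip, stripLeft;

// Number of leading whitespace characters on a line.
size_t indentOf(string line)
{
    return line.length - line.stripLeft.length;
}

// Returns the 1-based number of the line where a '{' was most likely
// forgotten, or 0 if no unmatched '}' is found. Purely heuristic.
size_t guessMissingOpenBrace(string[] lines)
{
    int depth = 0;
    size_t candidate = 0;   // last line followed by deeper indentation but no '{'

    foreach (i, line; lines)
    {
        auto text = line.strip;
        if (text.length == 0) continue;

        foreach (ch; text)
        {
            if (ch == '{') ++depth;
            else if (ch == '}' && --depth < 0)
                return candidate;   // unmatched '}': blame the candidate line
        }

        if (i + 1 < lines.length && text[$ - 1] != '{'
            && indentOf(lines[i + 1]) > indentOf(line))
            candidate = i + 1;
    }
    return 0;
}
```

On the example above it points at the `if (cond)` line, four lines above where a conventional parser would complain.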

Re: Let's stop parser Hell

2012-07-07 Thread David Piepgrass


On Saturday, 7 July 2012 at 20:39:18 UTC, Roman D. Boiko wrote:

On Saturday, 7 July 2012 at 20:26:07 UTC, David Piepgrass wrote:
I'd like to add that if we give tree parsing first-class 
treatment, I believe the most logical approach to parsing has 
three or more stages instead of the traditional two 
(lex+parse):


1. Lexer
2. Tree-ification
3. Parsing to AST (which may itself use multiple stages, e.g. 
parse the declarations first, then parse function bodies later)


The new stage two simply groups things that are in parentheses 
and braces. So an input stream such as the following:


I bet that after stage 2 you would have performed almost the 
same amount of work (in other words, spent almost the same 
time) as you would if you did full parsing. After that you 
would need to iterate the whole tree (possibly multiple times), 
modify (or recreate if the AST is immutable) its nodes, etc. 
Altogether this might be a lot of overhead.


My opinion is that tree manipulation is something that should 
be available to clients of parser-as-a-library or even of 
parser+semantic analyzer, but not necessarily advantageous for 
parser itself.


Hmm, you've got a good point there, although simple 
tree-ification is clearly less work than standard parsing, since 
statements like "auto x = y + z;" would be quickly "blitted" into 
the same node in phase 2, but would become multiple separate 
nodes in phase 3.
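
To give an idea of how little work phase 2 does, here is a minimal sketch of a tree-ifier over an already-lexed token list. `TokenTree`, `parseGroup` and `treeify` are names I made up. Unclosed groups are tolerated silently here, and a stray closer simply becomes a leaf, whereas a real version would record and repair such errors.

```d
// A token tree node: either a leaf token or a bracketed group of children.
class TokenTree
{
    string text;            // leaf text, or the opening bracket for a group
    TokenTree[] children;   // empty for leaves
    this(string text) { this.text = text; }
}

// Collects nodes until the expected closer (or the end of input) is reached.
TokenTree[] parseGroup(string[] tokens, ref size_t pos, string closer = null)
{
    TokenTree[] nodes;
    while (pos < tokens.length)
    {
        auto tok = tokens[pos++];
        if (tok == closer) return nodes;   // end of the current group
        auto node = new TokenTree(tok);
        if (tok == "(")      node.children = parseGroup(tokens, pos, ")");
        else if (tok == "{") node.children = parseGroup(tokens, pos, "}");
        else if (tok == "[") node.children = parseGroup(tokens, pos, "]");
        nodes ~= node;
    }
    return nodes;   // input ended before the closer was seen
}

// Stage 2: group a flat token list by (), [] and {}.
TokenTree[] treeify(string[] tokens)
{
    size_t pos = 0;
    return parseGroup(tokens, pos);
}
```

The tokens of a simple statement just stay siblings inside their enclosing group. Only phase 3 would break them up into expression nodes.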


What I like about it is not its performance, but how it matches 
the way we think about languages. Humans tend to see overall 
structure first, and examine the fine details later. The tree 
parsing approach is similarly nonlinear and can be modularized in 
a way that might be more intuitive than traditional EBNF.


On the other hand, one could argue it is /too/ flexible, 
admitting so many different approaches to parsing that a 
front-end based on this approach is not necessarily intuitive to 
follow; and of course, not using a standard EBNF-type grammar 
could be argued to be bad.


Still... it's a fun concept, and even if the initial parsing ends 
up using the good-old lex-parse approach, semantic analysis and 
lowering can benefit from a tree parser. Tree parsing, of course, 
is just a generalization of linear parsing and so a tree parser 
generator (TPG) could work equally well for flat input.

Re: Let's stop parser Hell

2012-07-07 Thread David Piepgrass

Since I didn't understand your question I assume that my 
statement was somehow incorrect (likely because I made some 
wrong assumptions about ANTLR). I didn't know about its 
existence until today and still don't understand it completely. 
What I think I understood is that it uses DFA for deciding 
which grammar rule to apply instead of doing backtracking. I 
also think that it uses DFA for low-level scanning (I'm not 
sure).


ANTLR 3 doesn't use a DFA unless it needs to. If unlimited 
lookahead is not called for, it uses standard LL(k), or perhaps 
the modified (approximate? I forget the name) LL(k) from 
ANTLR 2. The DFA comes into play, for instance, if you need to 
check what comes after an argument list (of unlimited length) 
before you can decide that it *is* an argument list and start the 
"real" parsing. (The author says LL(k) is too inefficient so he 
used a restricted form of it; personally I'm not convinced, but I 
digress.)
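
To make the argument-list case concrete, here is a small hand-written sketch in Python. It is not ANTLR's generated code or its DFA; the function names, the token list and the "followed by another '('" rule are all invented for illustration. It scans past a balanced parenthesized group of any length and only then looks at the next token to decide what the group was:

```python
def skip_balanced(tokens, i):
    """Return the index just past the group that opens at tokens[i] == '('."""
    assert tokens[i] == "("
    depth = 0
    while i < len(tokens):
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise SyntaxError("unbalanced '('")


def classify_group(tokens, i):
    """Hypothetical rule: '(...)' followed by another '(' is a template
    argument list, otherwise it is an ordinary call argument list."""
    after = skip_balanced(tokens, i)
    if after < len(tokens) and tokens[after] == "(":
        return "template-args"
    return "call-args"


# f(T, (U))(x);
tokens = ["f", "(", "T", ",", "(", "U", ")", ")", "(", "x", ")", ";"]
print(classify_group(tokens, 1))  # template-args
print(classify_group(tokens, 8))  # call-args
```

The point is only that the decision cannot be made after a fixed number of tokens, since the first group may be arbitrarily long.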

Re: Let's stop parser Hell

2012-07-07 Thread David Piepgrass


auto captures = syntaxNode.matchNodes(
    TOK_WHILE_NODE,
    OP_ENTER_NODE,
    OP_CAPTURE(0),
    OP_BEGIN,
    TOK_EXPRESSION,
    OP_END,
    OP_CAPTURE(1),
    OP_BEGIN,
    TOK_STATEMENT,
    OP_END,
    OP_LEAVE_NODE);


I'm glad to hear you like the tree-parsing approach, Chad, 
although the particular syntax here looks pretty unfriendly :O -- 
does this represent something that you are working on right now?


This kind of architecture leads to other interesting benefits, 
like being able to assert which symbols a pattern is designed 
to handle or which symbols are allowed to exist in the AST at 
any point in time. Thus if you write a lowering that introduces 
nodes that a later pass can't handle, you'll know very quickly, 
at least in principle.
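
A rough sketch of that kind of assertion, in Python rather than D, and with a made-up `Node` class, made-up kind names and a made-up `check_pass_input` helper (none of this is from an existing compiler). Each pass declares the node kinds it was written for and the check fails as soon as a tree contains anything else:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                       # node type stored as a property, not a subclass
    children: list = field(default_factory=list)


def kinds_in(node):
    """Collect every node kind that occurs in the tree."""
    found = {node.kind}
    for child in node.children:
        found |= kinds_in(child)
    return found


def check_pass_input(tree, handled, pass_name):
    """Fail fast if the tree contains a kind this pass was not written for."""
    unexpected = kinds_in(tree) - handled
    if unexpected:
        raise ValueError(f"{pass_name} cannot handle: {sorted(unexpected)}")


# Hypothetical backend that only understands already-lowered constructs.
BACKEND_KINDS = {"block", "label", "goto", "if", "expr"}

lowered = Node("block", [Node("label"), Node("if", [Node("expr"), Node("goto")])])
check_pass_input(lowered, BACKEND_KINDS, "c-backend")   # passes silently

not_lowered = Node("block", [Node("while", [Node("expr"), Node("block")])])
try:
    check_pass_input(not_lowered, BACKEND_KINDS, "c-backend")
except ValueError as err:
    print(err)  # c-backend cannot handle: ['while']
```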


I wanted to make such a front-end so that I could easily make a 
C backend.  I believe such a compiler would be able to do that 
with great ease.  I really want a D compiler that can output 
ANSI C code that can be used with few or no OS/CPU 
dependencies.  I would be willing to lose a lot of the nifty 
parallelism/concurrency stuff and deal with reference counting 
instead of full garbage collection, as long as it lets me 
EASILY target new systems (any phone, console platform, and 
some embedded microcontrollers).  Then what I have is something 
that's as ubiquitous as C, but adds a lot of useful features 
like exception handling, dynamic arrays, templates, CTFE, etc 
etc.  My ideas for how to deal with ASTs in pattern recognition 
and substitution followed from this.


I tend to agree that it would be better to have a "general" node 
class with the node type as a property rather than a subtype and 
rather than a myriad of independent types, although in the past I 
haven't been able to figure out how to make this approach 
simultaneously general, efficient, and easy to use. I'm kind of a 
perfectionist which perhaps holds me back sometimes :)


I'd like to add that if we give tree parsing first-class 
treatment, I believe the most logical approach to parsing has 
three or more stages instead of the traditional two (lex+parse):


1. Lexer
2. Tree-ification
3. Parsing to AST (which may itself use multiple stages, e.g. 
parse the declarations first, then parse function bodies later)


The new stage two simply groups things that are in parentheses 
and braces. So an input stream such as the following:


A man (from a [very ugly] house in the suburbs) was quoted as 
saying {

I saw Batman (and Robin) last night!
}

Is converted to a tree where everything parenthesized or braced 
gets to be a child:


A man (
   from a [
   very ugly
   ] house in the suburbs
) was quoted as saying {
   ...
}
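
A minimal sketch of what stage 2 could look like, written in Python for brevity. The nested-list representation (each group is a list that keeps its opener and closer as first and last elements) and the name `treeify` are my assumptions for illustration, not an existing design:

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())


def treeify(tokens):
    """Group a flat token list into nested lists, one list per bracketed span."""
    root = []
    stack = [(root, None)]          # (children of the open group, expected closer)
    for tok in tokens:
        if tok in OPENERS:
            group = [tok]           # a group keeps its opener as first element
            stack[-1][0].append(group)
            stack.append((group, OPENERS[tok]))
        elif tok in CLOSERS:
            group, expected = stack.pop()
            if tok != expected:
                raise SyntaxError(f"expected {expected!r}, got {tok!r}")
            group.append(tok)
        else:
            stack[-1][0].append(tok)
    if len(stack) != 1:
        raise SyntaxError(f"missing {stack[-1][1]!r}")
    return root


tokens = "A man ( from a [ very ugly ] house ) said { hi }".split()
print(treeify(tokens))
# ['A', 'man', ['(', 'from', 'a', ['[', 'very', 'ugly', ']'], 'house', ')'],
#  'said', ['{', 'hi', '}']]
```

Nothing here knows what the tokens mean, which is why the same code could serve many languages.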

Some of the things I like about this approach are:

1. It's language-agnostic. Lots of languages and DSLs could 
re-use exactly the same code from stage 2. (Stage 1, also, is 
fairly similar between languages and I wonder if a parameterized 
standard lexer is a worthwhile pursuit.)


2. It mostly eliminates the need for arbitrary-length lookahead 
for things like D's template_functions(...)(...). Of course, the 
tokens will almost always end up getting scanned twice, but hey, 
at least you know you won't need to scan them more than twice, 
right? (er, of course the semantic analysis will scan it several 
times anyway. Maybe this point is moot.)


3. It is very efficient for tools that don't need to examine 
function bodies. Such tools can easily leave out that part of the 
parser simply by not invoking the function-body sub-parser.


4. It leaves open the door to supporting embedded DSLs in the 
future. It's trivial to just ignore a block of text in braces and 
hand it off to a DSL later. It is similar to the way PEGs allow 
several different parties to contribute parts of a grammar, 
except that this approach does not constrain all the parties to 
actually use PEGs; for instance if I am a really lazy DSL author 
and I already have a SQL parser laying around (whether it's 
LL(k), LALR, whatever), I can just feed the original input text 
to that parser (or, better, use the flat token stream, sans 
comments, that came out of the lexer.)


5. It's risky 'cause I've never heard of anyone taking this 
approach before. Bring on the danger!
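
To illustrate points 3 and 4, here is a sketch in Python of a tool that lists declarations without ever looking inside a function body. It assumes the stage-2 output is a nested list in which each group is a list starting with its opener; that shape, the sample input and the function names are assumptions of mine:

```python
def render(item):
    """Turn a token or a nested group back into text."""
    if isinstance(item, list):
        return " ".join(render(child) for child in item)
    return item


def signatures(tree):
    """Return the text before each top-level '{...}' group."""
    sigs, current = [], []
    for item in tree:
        if isinstance(item, list) and item[0] == "{":
            sigs.append(" ".join(render(part) for part in current))
            current = []            # the body group is dropped unparsed
        else:
            current.append(item)
    return sigs


tree = [
    "int", "add", ["(", "int", "a", ",", "int", "b", ")"],
    ["{", "return", "a", "+", "b", ";", "}"],
    "void", "main", ["(", ")"],
    ["{", "add", ["(", "1", ",", "2", ")"], ";", "}"],
]
print(signatures(tree))
# ['int add ( int a , int b )', 'void main ( )']
```

The same `isinstance(item, list) and item[0] == "{"` test is where a brace block could be handed to a separate DSL parser instead of being dropped.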


I have observed that most PLs (Programming Langs) use one of two 
versions of stage 2: (1) C-style, with structure indicated 
entirely with {}, (), [], and possibly <> (shudder), or (2) 
Python-style, with structure indicated by indentation instead of 
{}. My favorite is the Boo language, which combines these two, 
using Python style by default, but also having a WSA parsing mode 
(whitespace-agnostic) with braces, and switching to WSA mode 
inside a Python-style module whenever the user uses an opener 
("(,{,["

Re: Let's stop parser Hell

2012-07-07 Thread David Piepgrass

Note that PEG does not require packrat parsing, even though it 
was developed with it in mind. I think it's a historical 
'accident' that put the two together: Bryan Ford's thesis used 
the two together.


Interesting. After trying to use ANTLR-C# several years back, I 
got disillusioned because nobody was interested in fixing the 
bugs in it (ANTLR's author is a Java guy first and foremost) and 
the required libraries didn't come with source code or a license 
(wtf.)


So, for a while I was thinking about how I might make my own 
parser generator that was "better" than ANTLR. I liked the syntax 
of PEG descriptions, but I was concerned about the performance 
hit of packrat and, besides, I already liked the syntax and 
flexibility of ANTLR. So my idea was to make something that was 
LL(k) and mixed the syntax of ANTLR and PEG while using more sane 
(IMO) semantics than ANTLR did at the time (I've no idea if ANTLR 
3 still uses the same semantics today...) All of this is 'water 
under the bridge' now, but I hand-wrote a lexer to help me plan 
out how my parser-generator would produce code. The output code 
was to be both more efficient and significantly more readable 
than ANTLR's output. I didn't get around to writing the 
parser-generator itself but I'll have a look back at my handmade 
lexer for inspiration.


However, as I found a few hours ago, Packrat parsing (typically 
used to handle PEG) has serious disadvantages: it complicates 
debugging because of frequent backtracking, it has problems with 
error recovery, and it typically disallows actions with side 
effects (because of the possibility of backtracking). These are 
important enough to reconsider my plans of using Pegged. I will 
try to analyze whether the issues are so fundamental that I (or 
somebody else) will have to create an ANTLR-like parser instead, 
or whether it is possible to introduce changes into Pegged that 
would fix these problems.
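
The side-effect problem can be shown with a few toy combinators. This is a Python sketch of plain backtracking with PEG-style ordered choice; it is not Pegged's API and has no packrat memo table, and every name in it is made up:

```python
log = []                            # stands in for any side effect


def literal(text):
    def parse(src, pos):
        return pos + len(text) if src.startswith(text, pos) else None
    return parse


def action(name, parser):
    """Run a side effect as soon as the wrapped parser succeeds."""
    def parse(src, pos):
        end = parser(src, pos)
        if end is not None:
            log.append(name)
        return end
    return parse


def seq(*parsers):
    def parse(src, pos):
        for p in parsers:
            pos = p(src, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """Ordered choice: the first alternative that matches wins."""
    def parse(src, pos):
        for p in parsers:
            end = p(src, pos)
            if end is not None:
                return end
        return None
    return parse


# stmt := 'x' {saw_x} '='  /  'x' {saw_x} ';'
stmt = choice(
    seq(action("saw_x", literal("x")), literal("=")),
    seq(action("saw_x", literal("x")), literal(";")),
)
print(stmt("x;", 0))  # 2
print(log)            # ['saw_x', 'saw_x']
```

The input contains one `x`, yet the action ran twice, once for the alternative that was later abandoned. A memoizing parser would change how often it fires but not the underlying issue that it fires before the alternative is known to succeed.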


I don't like the sound of this either. Even if PEGs were fast, 
difficulty in debugging, error handling, etc. would give me 
pause. I insist on well-rounded tools. For example, even though 
LALR(1) may be the fastest type of parser (is it?), I prefer not 
to use it due to its inflexibility (it just doesn't like some 
reasonable grammars), and the fact that the generated code is 
totally unreadable and hard to debug (mind you, when I learned 
LALR in school I found that it is possible to visualize how it 
works in a pretty intuitive way--but debuggers won't do that for 
you.)


While PEGs are clearly far more flexible than LALR and probably 
more flexible than LL(k), I am a big fan of old-fashioned 
recursive descent because it's very flexible (easy to insert 
actions during parsing, and it's possible to use custom parsing 
code in certain places, if necessary*) and the parser generator's 
output is potentially very straightforward to understand and 
debug. In my mind, the main reason you want to use a parser 
generator instead of hand-coding is convenience, e.g. (1) to 
compress the grammar down so you can see it clearly, (2) have the 
PG compute the first-sets and follow-sets for you, (3) get 
reasonably automatic error handling.


* (If the language you want to parse is well-designed, you'll 
probably not need much custom parsing. But it's a nice thing to 
offer in a general-purpose parser generator.)
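
For reference, this is roughly what I mean by straightforward recursive descent with actions inserted during parsing. It is a hand-written Python sketch for a tiny expression grammar of my own choosing, not the output of any parser generator; the FIRST set of `atom` is worked out by hand in the comment, which is exactly the chore a PG would take over:

```python
import re


def tokenize(text):
    """Split into integer literals and single-character operators."""
    return re.findall(r"\d+|[-+*()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} but found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return value

    # expr := term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs   # action
        return value

    # term := atom ('*' atom)*
    def term(self):
        value = self.atom()
        while self.peek() == "*":
            self.advance()
            value *= self.atom()                                 # action
        return value

    # atom := NUMBER | '(' expr ')'      FIRST(atom) = {NUMBER, '('}
    def atom(self):
        tok = self.peek()
        if tok == "(":
            self.advance()
            value = self.expr()
            self.expect(")")
            return value
        if tok is not None and tok.isdigit():
            return int(self.advance())
        raise SyntaxError(f"expected a number or '(' but found {tok!r}")


print(Parser(tokenize("2 * (3 + 4) - 5")).parse())  # 9
```

Each grammar rule is one method, so a debugger's call stack reads like a parse tree, and the error messages fall out of the FIRST-set checks.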


I'm not totally sure yet how to support good error messages, 
efficiency and straightforward output at the same time, but by 
the power of D I'm sure I could think of something...


I would like to submit another approach to parsing that I dare 
say is my favorite, even though I have hardly used it at all yet. 
ANTLR offers something called "tree parsing" that is extremely 
cool. It parses trees instead of linear token streams, and 
produces other trees as output. I don't have a good sense of how 
tree parsing works, but I think that some kind of tree-based 
parser generator could become the basis for a very flexible and 
easy-to-understand D front-end. If a PG operates on trees instead 
of linear token streams, I have a sneaky suspicion that it could 
revolutionize how a compiler front-end works.


Why? because right now parsers operate just once, on the user's 
input, and from there you manipulate the AST with "ordinary" 
code. But if you have a tree parser, you can routinely manipulate 
and transform parts of the tree with a sequence of independent 
parsers and grammars. Thus, parsers would replace a lot of things 
for which you would otherwise use a visitor pattern, or 
something. I think I'll try to sketch out this idea in more 
detail later.
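
As a first rough illustration of a tree-to-tree pass, here is one lowering written as ordinary Python, not as a generated tree parser. The tuple encoding `(kind, child, ...)` and the node kinds are invented for the example. A tree grammar would express the same "pattern, then replacement" rule declaratively:

```python
def lower_while(node):
    """Rewrite every ('while', cond, body) node bottom-up; copy everything else."""
    if not isinstance(node, tuple):
        return node                      # leaf: identifier, literal, ...
    kind, *children = node
    children = [lower_while(child) for child in children]
    if kind == "while":
        cond, body = children
        # while (cond) body   =>   loop { if (!cond) break; body }
        return ("loop", ("if", ("not", cond), ("break",)), body)
    return (kind, *children)


tree = ("block",
        ("assign", "i", "0"),
        ("while", ("less", "i", "n"),
                  ("block", ("call", "f", "i"))))

print(lower_while(tree))
```

The output is another tree in the same encoding, so a second pass with its own small rule set can run on it, which is the chaining of independent parsers described above.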

Re: Let's stop parser Hell

2012-07-06 Thread David Piepgrass

To sum up: everybody is welcome to join the effort of translating 
the DMD front end and improving Pegged.


Also I would like to invite those interested in DCT project to 
help me with it. Right now I'm trying to understand whether it 
is possible to incorporate Pegged inside without losing 
anything critical (and I think it is very likely possible), 
and how exactly to do that.


Dmitry proposed to help improve Pegged (or some other 
compiler's) speed.


Anyone else?


I'd really want to create a task force on this, it is of 
strategic importance to D. In Walter's own words, no new 
feature is going to push us forward since we're not really 
using the great goodies we've got, and CTFE technology is the 
most important.


Hi everybody! My name's David and I've been dreaming since around 
1999 of making my own computer language, and never found the time 
for it. The first time I looked at D it was around 2004 or so, 
and it just looked like a "moderately better C++" which I forgot 
about, having more lofty ideas. When I found out about D2's 
metaprogramming facilities I instantly became much more 
interested, although I still wish to accomplish more than is 
possible ATM.


I've been talking to my boss about reducing my working hours, 
mainly in order to have time to work on something related to D. 
My goal is to popularize a language that is efficient (as in 
runtime speed and size), expressive, safe, concise, readable, 
well-documented, easy-to-use, and good at finding errors in your 
code.  In other words, I want a language that is literally all 
things to all people, a language that is effective for any task. 
I want to kill off this preconceived notion that most programmers 
seem to have, that fast code requires a language like C++ that is 
hard to use. The notion that Rapid Application Development is 
incompatible with an efficient executable is nonsense and I want 
to kill it :)


To be honest I have some reservations about D, but of all the 
languages I know, D is currently closest to my ideal.


This work on parsers might be a good place for me to dive in. I 
have an interest in parsers and familiarity with LL, LALR, PEGs, 
and even Pratt parsers (fun!), but I am still inexperienced.


I also like writing documentation and articles, but I always find 
it hard to figure out how other people's code works well enough 
to document it.


I'm having some trouble following this thread due to the 
acronyms: CTFE, DCT, AA. At least I managed to figure out that 
CTFE is Compile Time Function Execution. DCT and AA I already 
know as Discrete Cosine Transform and Anti-Aliasing, of 
course, but what do they mean to you?


One thing that has always concerned me about PEGs is that they 
always say PEGs are slower than traditional two-phase LALR(1) or 
LL(k) parsers. However, I have never seen any benchmarks. Does 
anyone know exactly how much performance you lose in an 
(optimized) PEG compared to an (optimized) LALR/LL parser + 
LL/regex lexer?


Anyway, it's the weekend, during which I hope I can find a place 
to fit in with you guys.

Re: Proposal: takeFront and takeBack

2012-07-05 Thread David Piepgrass


(grain of salt, I'm new to D.)

I'd vote for consumeFront being always available, because it's 
distinctly more convenient to call one function instead of two, 
especially when you expect that making a copy of front is cheap 
(e.g. a collection of pointers, numbers or slices). Ranges where 
'front' returns a pointer to a buffer that popFront destroys 
(overwrites) are surely in the minority, right? So I hope they 
would be retrofitted to support consumeFront.


But, given that popFront is allowed to be destructive to the 
value of front, by re-using the same buffer (and that the 
proposed consumeFront might also be implemented with 'delayed 
destruction' to front), I am wondering how one is supposed to 
implement generic code correctly when this is unacceptable, e.g.:


void append(Range1,Range2)(Range1 input, ref Range2 output)
{
	// Usually works, unless input.popFront happens to be destructive?
	foreach(e; input) output ~= e;
}
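
The same hazard is easy to reproduce outside D. In this Python analogue (my own illustration, not D range code), a generator plays the role of a range whose popFront overwrites the buffer behind front; the function names are made up, and the input length is assumed to be a multiple of the chunk size:

```python
def chunks_reusing_buffer(data, size):
    """Yield successive chunks, always through the same bytearray."""
    buf = bytearray(size)
    for start in range(0, len(data), size):
        buf[:] = data[start:start + size]   # overwrites the previous chunk
        yield buf


def append_naive(source, output):
    for element in source:
        output.append(element)              # stores a reference to the shared buffer


def append_copying(source, output):
    for element in source:
        output.append(bytes(element))       # snapshot taken before the next advance


naive, safe = [], []
append_naive(chunks_reusing_buffer(b"abcdef", 2), naive)
append_copying(chunks_reusing_buffer(b"abcdef", 2), safe)
print([bytes(b) for b in naive])  # [b'ef', b'ef', b'ef']
print(safe)                       # [b'ab', b'cd', b'ef']
```

Generic code has no way to tell which of the two behaviours it is dealing with unless the source advertises it, which is the question I am asking about ranges.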
