Re: What is the compilation model of D?

2012-07-25 Thread Roman D. Boiko

On Wednesday, 25 July 2012 at 20:25:19 UTC, David Piepgrass wrote:

I hope someone can give more details about this.


TDPL chapter 11 "Scaling Up".


That's where I was looking. As I said already, TDPL does not 
explain how compilation works, especially not anything about 
the low-level semantic analysis which has me most curious.


Strange, because it seems to me this chapter answers all your 
previous questions. What exact details are you interested in?


Re: What is the compilation model of D?

2012-07-25 Thread Roman D. Boiko

On Wednesday, 25 July 2012 at 19:54:31 UTC, David Piepgrass wrote:
It keeps diving deeper and deeper to find anything it can 
"start" with.
One it finds that, it'll just build everything back up in 
whatever

order is necessary.


I hope someone can give more details about this.


TDPL chapter 11 "Scaling Up".


Re: Study: build times for D programs

2012-07-24 Thread Roman D. Boiko
On Tuesday, 24 July 2012 at 15:06:58 UTC, Andrei Alexandrescu 
wrote:
Nevertheless, I think there is value in the study. We're 
looking at a real nontrivial application that wasn't written 
for a study, but for actual use, and that implements the same 
design and same functionality in both languages.


OK. And it could serve as a basis for further variations:

* introduce some feature (e.g., ranges), measure impact
* measure impact of multiple features alone and in combination

Of course, trivial changes would unlikely yield anything useful, 
but I believe there is a way to produce valuable data in a 
controlled research.




Re: Study: build times for D programs

2012-07-24 Thread Roman D. Boiko
On Tuesday, 24 July 2012 at 14:34:58 UTC, Andrei Alexandrescu 
wrote:
the D source is in D1 and should be adjusted to compile with 
D2),


That would provide performance (compilation and run-time) for D1 
only (with D2 compiler). Performance of a typical D2 app would 
likely be different.


Re: Let's stop parser Hell

2012-07-23 Thread Roman D. Boiko

On Monday, 23 July 2012 at 13:22:47 UTC, Philippe Sigaud wrote:
On Thu, Jul 5, 2012 at 10:00 PM, Tobias Pankrath 
 wrote:


If you are interested in parsing, than I wouldn't recommend 
the dragon book,

but this one

It really is an awesome book.


This little post to say a big 'Thanks' to Tobias. I'm reading 
Grune's
Parsing Techniques and find it absolutely wonderful: easy to 
read,
going into details and limits of the many parsing algos. Very 
cool!


Yes, I'm also reading it and find it amusingly high-quality.


Re: just an idea (!! operator)

2012-07-13 Thread Roman D. Boiko

On Friday, 13 July 2012 at 13:46:10 UTC, David Nadlinger wrote:

I guess that this operator is only really worth it in languages
where every type is nullable, though.

David


It might mean identity (return the argument unchanged) for value 
types.


Re: D versionning

2012-07-13 Thread Roman D. Boiko

On Friday, 13 July 2012 at 06:52:25 UTC, Adam Wilson wrote:
I hope Walter isn't against this, because I'm not seeing much 
community disagreement with this...


I would not be against having development and stable versions, 
but the price is not trivial: every pull request must be done in 
at least two branches, probably diverging significantly. And most 
benefits are already available: we have the git version and the 
last stable version (of course, the latter would be without the 
latest bug-fixes). That would mean slower progress in applying 
existing pull requests. (There are 100+ of those, aren't there?)


Also, nobody is preventing any person that considers this to be 
very important from creating a fork of stable branch and applying 
bug-fixes there. If this happens to be a very useful option, then 
it could be accepted as a policy.


So my point of view is that it might be too early to have such 
policy yet.


Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread Roman D. Boiko
On Thursday, 12 July 2012 at 17:51:32 UTC, Andrei Alexandrescu 
wrote:

On 7/12/12 1:40 PM, David Piepgrass wrote:
1. Most importantly, the C++ template approach is a big pain 
for
large-scale systems, because in such systems you want to 
create DLLs/SOs
and C++ has neither a standard ABI nor a safe way to pass 
around
template instantiations between DLLs (in the presence of 
changes to
internal implementation details). Similar problems exist for 
D, yes?
It's a lot easier to define a standard ABI for classes than to 
solve the

cross-DLL template problem.


The thing is, that can be done in an opt-in manner. People who 
want methods in the root of the hierarchy can define a root 
that defines them. But there's no way to opt out of inheriting 
Object. Basically it's nice to not force people to buy into a 
constrained environment without necessity.


+1

4. Template bloat is no big deal on desktops but it becomes a 
bigger
problem as the target system gets smaller. Maybe some 
compromise should
be made to ensure D remains powerful and capable on small 
targets.


I think virtuals are an equally, if not worse, problem in 
memory-constrained systems. The simple solution people choose 
for such - they use them judiciously. Here actually templates 
may be at an advantage because of their opt-in quality.


+1. Such seams give flexibility that otherwise would be extremely 
hard to achieve.


There were two proposals yesterday that I liked. Taken 
together, they
address all the problems that were raised with const functions 
in Object:


1. Provide a 'safe workaround' for const, for caching and lazy
evaluation (implement it carefully to avoid breaking the 
guarantees of

immutable)


We should explore this option in any case. At this point I'm 
starting to believe (a) we're doing the right thing by 
marginalizing the four methods aside from this issue, (b) 
caching is good for other things than this particular problem.


+2 (Especially if it would be possible to redefine 'lazy', so 
that it would evaluate at most once.)


Re: just an idea (!! operator)

2012-07-12 Thread Roman D. Boiko
On Thursday, 12 July 2012 at 17:35:05 UTC, Alex Rønne Petersen 
wrote:
But on the other hand, C# has had it from day one and it's 
still widely used and encouraged today:


It has been introduced in C# 2.0 and quickly gained high 
popularity.


Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread Roman D. Boiko

On Thursday, 12 July 2012 at 14:58:29 UTC, deadalnix wrote:

I think this is not a problem as big as it is stated.

Most of that code will be executed close to never, and 60Mb 
isn't a big deal for any modern computer, not even for most 
cell phones now.


L1 cache size is.


Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread Roman D. Boiko
On Thursday, 12 July 2012 at 13:49:29 UTC, Andrei Alexandrescu 
wrote:

Too complicated. I think we can afford one comparison.


One comparison for each of these basic usages. Would branch 
prediction work fine in each use case?


Anyway I vote for removing those methods from the root class.


Re: just an idea (!! operator)

2012-07-12 Thread Roman D. Boiko

On Thursday, 12 July 2012 at 12:51:38 UTC, Jacob Carlborg wrote:

On 2012-07-12 13:35, Jonas Drewsen wrote:


Or the operator?? could be borrowed from c#

auto a = foo ?? new Foo();

is short for:

auto a = foo is null ? new Foo() : foo;

/Jonas



I really like that operator. The existential operator, also 
known as the Elvis operator :) . It's available in many 
languages with slightly different semantics.


+1


Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread Roman D. Boiko
On Thursday, 12 July 2012 at 13:39:54 UTC, Steven Schveighoffer 
wrote:
On Thu, 12 Jul 2012 09:20:47 -0400, Andrei Alexandrescu 
 wrote:
If we define alternative free generic functions in object.d 
for the four culprit methods (and have the compiler, druntime, 
and stdlib use them instead of the methods), those functions 
can check whether a given class object has overridden the 
old-style functions. In that case, that means we're dealing 
with legacy classes and proceed the old-style way. Otherwise, 
proceed the new way.


Hm... I don't like this, it slows down a very basic function.

I think if we want a solution that allows old code to work, why 
not what Timon suggested? Have a base class for Object 
(RawObject was suggested) that does not implement the 
opFunctions.  It would still break code, but would be easy to 
fix (just specify your class derives from Object, not 
RawObject).


-Steve





Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread Roman D. Boiko

On Thursday, 12 July 2012 at 13:41:52 UTC, Roman D. Boiko wrote:
...

ups. I meant +1.


Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread Roman D. Boiko

On Thursday, 12 July 2012 at 12:43:01 UTC, Roman D. Boiko wrote:

On Thursday, 12 July 2012 at 12:36:18 UTC, RivenTheMage wrote:
On Thursday, 12 July 2012 at 12:06:49 UTC, Roman D. Boiko 
wrote:

Jon Skeet wrote on this long ago:


http://msmvps.com/blogs/jon_skeet/archive/2008/12/05/redesigning-system-object-java-lang-object.aspx

The fact that every object has a monitor associated with it 
was a
mistake in Java, and was unfortunately copied in .NET. This 
promotes the bad practice of locking on "this" and on types - 
both of which are typically publicly accessible references. I 
believe that unless a reference is exposed explicitly for the 
purpose of locking (like ICollection.SyncRoot) then you should 
avoid locking on any reference which other code knows about.


This has been discussed multiple times on the D forum, I 
believe.


Do you mean Monitor or all other issues from that post as well? 
Do you have any links? I would be interested to know 
conclusions.


OK, I found one myself from this post:
http://michelf.com/weblog/2012/mutex-synchonization-in-d/


Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread Roman D. Boiko

On Thursday, 12 July 2012 at 12:36:18 UTC, RivenTheMage wrote:

On Thursday, 12 July 2012 at 12:06:49 UTC, Roman D. Boiko wrote:

Jon Skeet wrote on this long ago:


http://msmvps.com/blogs/jon_skeet/archive/2008/12/05/redesigning-system-object-java-lang-object.aspx

The fact that every object has a monitor associated with it was 
a
mistake in Java, and was unfortunately copied in .NET. This 
promotes the bad practice of locking on "this" and on types - 
both of which are typically publicly accessible references. I 
believe that unless a reference is exposed explicitly for the 
purpose of locking (like ICollection.SyncRoot) then you should 
avoid locking on any reference which other code knows about.


This has been discussed multiple times on the D forum, I 
believe.


Do you mean Monitor or all other issues from that post as well? 
Do you have any links? I would be interested to know conclusions.


Re: All right, all right! Interim decision regarding qualified Object methods

2012-07-12 Thread Roman D. Boiko
On Thursday, 12 July 2012 at 04:15:48 UTC, Andrei Alexandrescu 
wrote:

Required reading prior to this: http://goo.gl/eXpuX

You destroyed, we listened.
[...]

What say you?

Andrei


I agree, they are not needed.

This situation is similar to C#, but there the problem doesn't 
seem to be so severe.


Jon Skeet wrote on this long ago:

http://msmvps.com/blogs/jon_skeet/archive/2008/12/05/redesigning-system-object-java-lang-object.aspx

http://stackoverflow.com/questions/3096028/why-gethashcode-is-in-object-class



Re: Congratulations to the D Team!

2012-07-11 Thread Roman D. Boiko
On Wednesday, 11 July 2012 at 18:21:24 UTC, Steven Schveighoffer 
wrote:

...

It also seems to allow abuses.  For example:

class A
{
   private int _x;
   public @property x() const { return _x; }
}

class B : A
{
   private int _x2;
   public @property x() { return _x2++; }
}

Now I've completely changed the logistics of the x property so 
that it's essentially become mutable.  In effect, I overrode 
the const piece of x completely to make it non-const without a 
cast, and anyone calling x() on an A instance cannot trust that 
it won't increment the effective value of x().


I think the solution to this overall problem is simply to make 
object.opEquals use the most derived types available for 
comparison.


-Steve


I thought about this example. You can have a const method 
returning different values anyway (instead of returning what base 
class' method would return), and A._x has not been mutated. From 
my point of view, it is just a different concept, but arguably 
not a worse one.




Re: Congratulations to the D Team!

2012-07-11 Thread Roman D. Boiko

On Wednesday, 11 July 2012 at 18:10:04 UTC, H. S. Teoh wrote:

On Wed, Jul 11, 2012 at 08:01:44PM +0200, deadalnix wrote:

...

Did you saw the proposal of feep/tgehr on #d ?

It basically state that you can overload a const method with a 
non

const one if :
 - You don't mutate any data that belong to the parent.
 - You are prevented to create any immutable instance of that 
classe

or any subclasse.


+1, very good, I like this idea!


It is a trade-off, but relatively nice one, IMO.


Re: Inherited const when you need to mutate

2012-07-11 Thread Roman D. Boiko
On Wednesday, 11 July 2012 at 14:42:43 UTC, Andrei Alexandrescu 
wrote:

On 7/11/12 10:33 AM, Timon Gehr wrote:

Any takers for Cached? It would be good to assess its level of
usefulness first.


Andrei


Well, it is inefficient.


The idea here is to assess functionality provided in order to 
decide whether to go down this route or not.


Andrei


Probably using a separate thread would be better for voting or 
discussing, since this one already has many branches of ideas.


Re: Inherited const when you need to mutate

2012-07-11 Thread Roman D. Boiko
On Wednesday, 11 July 2012 at 14:42:43 UTC, Andrei Alexandrescu 
wrote:

On 7/11/12 10:33 AM, Timon Gehr wrote:

Any takers for Cached? It would be good to assess its level of
usefulness first.


Andrei


Well, it is inefficient.


The idea here is to assess functionality provided in order to 
decide whether to go down this route or not.


Andrei


Probably using a separate thread would be better for voting, 
since this one already has many branches of ideas.


Re: Inherited const when you need to mutate

2012-07-11 Thread Roman D. Boiko
On Wednesday, 11 July 2012 at 12:55:39 UTC, Andrei Alexandrescu 
wrote:
Any takers for Cached? It would be good to assess its level of 
usefulness first.


As for me, lazy computation (with caching) would likely be the 
last feature I really miss in D.


Re: Let's stop parser Hell

2012-07-10 Thread Roman D. Boiko

On Tuesday, 10 July 2012 at 20:25:12 UTC, Jonathan M Davis wrote:

On Tuesday, July 10, 2012 21:25:52 Timon Gehr wrote:

FWIW, this is what most HTML parsers are doing.


Which is horrible. You pretty much have to with HTML because of 
the horrid
decision that it should be parsed so laxly by browsers, but 
pretty much
nothing else should do that. Either it's correct or it's not. 
Having the
compiler "fix" your code would cause far more problems that it 
would ever fix.


Not having control over parser or source code causes problems. 
Ability to deliver useful functionality (see my post above) is a 
different use case.


Re: Let's stop parser Hell

2012-07-10 Thread Roman D. Boiko

On Tuesday, 10 July 2012 at 19:41:29 UTC, Philippe Sigaud wrote:
On Tue, Jul 10, 2012 at 9:25 PM, Timon Gehr  
wrote:


Do people really what error-repairing parsers? I want my 
parsers to
tell me something is bad, and, optionally to advance a 
possible
repair, but definitely *not* to automatically repair a 
inferred error

and continue happily.



FWIW, this is what most HTML parsers are doing.


Ah, right. I can get it for HTML/XML. JSON also, maybe.
I was thinking of parsing a programming language (C, D, etc)

Consider me half-convinced :)


It would still generate errors. But would enable a lot of useful 
functionality: autocompletion, refactoring, symbol documentation 
in a tooltip, displaying method overloads with parameters 
as-you-type, go to definition, etc.


Re: Let's stop parser Hell

2012-07-10 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 16:37:56 UTC, Roman D. Boiko wrote:
Note that PEG does not impose to use packrat parsing, even 
though it was developed to use it. I think it's a historical 
'accident' that put the two together: Bryan Ford thesis used 
the two together.


Note that many PEG parsers do not rely on packrat (Pegged does 
not).

There are a bunch of articles on Bryan Ford's website by a guy
writting a PEG parser for Java, and who found that storing the 
last rules was enought to get a slight speed improvement, buth 
that doing anymore sotrage was detrimental to the parser's 
overall efficiency.


That's great! Anyway I want to understand the advantages and 
limitations of both Pegged and ANTLR, and probably study some 
more techniques. Such research consumes a lot of time but can 
be done incrementally along with development.


One disadvantage of Packrat parsers I mentioned was problematic 
error recovery (according to the article from ANTLR website). 
After some additional research, I found that it is not a critical 
problem. To find the exact place of error (from parser's 
perspective, not user's) one only needs to remember the farthest 
successfully parsed position (among several backtracking 
attempts) and the reason that it failed.


It is also possible to rerun parsing with some additional 
heuristics after failing, thus enabling advanced error repair 
scenarios.


Since Pegged doesn't use Packrat algorithm, this solution might 
be either not relevant or not applicable, but I doubt that there 
will be any fundamental problem with error recovery.


Unpleasant debugging experience, however, should be relevant for 
any parser that uses backtracking heavily.


Re: getNext

2012-07-09 Thread Roman D. Boiko

By the way, this thread is quite old.

getNext looks similar to what I wanted when first dealt with 
ranges, but now it looks too heavyweight.


What happened to this proposal anyway? Was it deferred, 
discarded, or what?


Re: getNext

2012-07-09 Thread Roman D. Boiko

On Monday, 9 July 2012 at 20:21:18 UTC, Mehrdad wrote:
that's not what I meant, but I think another solution is better 
anyway:


Why isn't transform taking in both an input and an output range 
in the first place? Of course they might be the same, but they 
don't have to be.


Could you make a detailed proposal? I still completely don't 
understand neither the problem you're trying to solve, nor the 
solution you have in mind.


I was reading all your posts in this thread very carefully.


Re: getNext

2012-07-09 Thread Roman D. Boiko

On Monday, 9 July 2012 at 15:34:35 UTC, Mehrdad wrote:
On Monday, 9 July 2012 at 15:16:52 UTC, Andrei Alexandrescu 
wrote:

the only primitive of output ranges is "put".

Andrei


Sorry? I don't know what you mean


template isOutputRange(R,E)

Returns true if R is an output range for elements of type E. An 
output range is defined functionally as a range that supports the 
operation


void put(R, E)(ref R r, E e);


Re: Let's stop parser Hell

2012-07-09 Thread Roman D. Boiko

On Monday, 9 July 2012 at 06:35:38 UTC, Jacob Carlborg wrote:
I'm not expert in this field but I've heard that it's harder to 
get good error reporting with a parser generator.


Good point!


Re: Let's stop parser Hell

2012-07-08 Thread Roman D. Boiko
David, I would suggest you to create a proof-of-concept prototype 
and pay attention to some non-trivial use cases. This way you'd 
likely be able to reduce major risks significantly before making 
any serious time investments.




Re: Let's stop parser Hell

2012-07-08 Thread Roman D. Boiko
I believe it would not hurt generality or quality of a parser 
generator if it contained sews for inserting custom (optimized)


s/sews/seams


Re: Let's stop parser Hell

2012-07-08 Thread Roman D. Boiko

On Sunday, 8 July 2012 at 21:03:41 UTC, Jonathan M Davis wrote:
It's been too long since I was actively working on parsers to 
give any details, but it is my understanding that because a 
hand-written parser is optimized for a specific grammar, it's 
going to be faster.


My aim is to find out any potential bottlenecks and ensure that 
those are possible to get rid of. So, let's try.


I believe it would not hurt generality or quality of a parser 
generator if it contained sews for inserting custom (optimized) 
code where necessary, including those needed to take advantage of 
some particular aspects of D grammar. Thus I claim that 
optimization for D grammar is possible.


Also, look at dmd and dmc vs other compilers. They use 
hand-written parsers and are generally much faster than their 
competitors.


Due to which particular design or implementation decisions? Would 
it be possible to name a few which are relevant in this context?


One thing to remember about hand-written parsers vs generative 
ones though is that they usually are completely different in 
terms of the type of parser that you write (e.g. hand-written 
parsers are generally recursive-decent parser whereas 
generative ones usually use bottom-up parsers).


So far discussion goes in favor of LL(*) parser like ANTLR, which 
is top-down recursive-descent. Either Pegged will be optimized 
with LL(*) algorithms, or a new parser generator created.





Re: Let's stop parser Hell

2012-07-08 Thread Roman D. Boiko

On Sunday, 8 July 2012 at 20:06:07 UTC, Jonathan M Davis wrote:
most of the focus right now from people interested in parsing 
seems to be on pegged and parser generators (which are very 
cool and in some ways much more interesting, but I seriously 
question that that's the performant way to go if you're looking 
to parse D specifically).


Can you provide a *specific* example of performance optimization 
which a custom D compiler would have, but parser generator would 
be unlikely to catch up.


Re: Let's stop parser Hell

2012-07-08 Thread Roman D. Boiko

On Sunday, 8 July 2012 at 09:14:32 UTC, Dmitry Olshansky wrote:

On 08-Jul-12 13:05, Tobias Pankrath wrote:

Yup, LL(*) is my favorite so far.


That's Terence Parr's discovery, right? I've always liked 
ANTLR, so if
PEGs turn out to have issues LL(*) sounds like a promising 
alternative.


We should also consider using GLR if LL(*) doesn't work.


GLR ... are you serious? It still does parsing in n^3 if I'm 
not mistaken.


http://www.antlr.org/papers/LL-star-PLDI11.pdf describes some 
disadvantages of GLR with respect to LL(*).


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Sunday, 8 July 2012 at 01:15:31 UTC, David Piepgrass wrote:
I'm not seeing any tremendous error-handling difficulty in my 
idea. Anyway, I missed the part about information being thrown 
away...?


(I will use the word "you" to refer to some abstract person or 
compiler.)


Error handling could mean either error reporting and stopping 
after the first error, or reporting several errors and continuing 
parsing + semantic analysis whenever possible, so that the user 
would get partial functionality like code completion, information 
about method overloads, or "go to definition" / find all usages, 
etc.


The first method is the less powerful option, but still requires 
constructing a meaningful error message which could help the user.


There are many possible error recovery strategies. For example, 
you might decide to insert some token according to the parsing 
information available up to the moment when error is discovered, 
if that would fix the problem and allow to continue parsing. 
Another option is to ignore a series of further tokens (treat 
them a whitespace, for example), until parser is able to continue 
its work. Also there are many heuristics for typical error 
situations.


All of these can only be performed if parser gathers the syntax 
information which can be inferred from source code according to 
the grammar rules.


In stage 2 you have only performed some basic analysis, like, 
e.g., matched braces to define some hierarchy. This means that at 
the time when you find a missing brace, for example, you cannot 
tell anything more than that braces don't match. Or, if the user 
inserts a brace in an incorrect location, it is only possible to 
say that it is either missing somewhere and somewhere else 
another brace is missing, or that it is missing in one place, and 
some other brace is redundant. In many cases you won't even 
notice that a brace is incorrectly placed, and pass the resulting 
tree to the 3rd stage. You don't take any hint from grammar about 
the exact locations where some token should be present.


Now, stage 3 heavily depends on the output of stage 2. As I 
demonstrated in some examples, it could get the output which 
implies incorrect structure, even if that has not been found in 
the previous stage. You would need to analyze so much information 
attempting to find the real roots of a problem, that effectively 
it would involve duplicating (implicitly) the logic of previous 
stage, but with combinatorial complexity growth.


The problems you would need to deal with are much more complex 
than I described here. So you wouldn't be able to deliver error 
recovery at all, or (if you could) it would be either trivial or 
would require needlessly complex logic. Breaking the system at 
the wrong boundary (when the number of dependencies that cross 
the boundary is large) always introduces artificial complexity.


Described above is the illustration of what I call information 
loss. I may have described something not as clearly as needed, 
but I didn't have the goal to provide a thorough and verifiable 
analysis. I speculated and simplified a lot. If you decide to 
ignore this, it is not a problem and I don't state that you will 
fail any of your goals. Just admit that __for me__ this approach 
is not a viable option. Everything written above is IMHO and 
there may be many ways to resolve the problems with various 
degrees of success.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 22:40:05 UTC, Andrei Alexandrescu
wrote:


http://www.antlr.org/wiki/display/~admin/ANTLR+v4+lexers


Thanks, nice article.

Also found another post:
http://www.antlr.org/wiki/display/~admin/2008/11/30/Example+tree+rewriting+with+patterns,
which is related to some other discussion in this thread :)



Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 22:45:01 UTC, Chad J wrote:

I see what you mean.

And I think that I expressed myself badly. Let me rephrase.

When the memory hierarchy is deep, every level would require at 
least one bit position. Or even every base class would require a 
separate bit. (I think that the former + several bits to 
distinguish among hierarchies.) Otherwise it would not be easy to 
check by a bit-mask. Even if the above is incorrect (and that is 
likely since I didn't try to encode that compactly for the real 
grammar), I think that in general that information would only be 
possible to store in a fairly large integral. Especially if we 
try to generate parser from grammar, and thus can't do 
fine-tuning to pack the information tightly. This overhead would 
be paid per each AST node __instance__. But that is not 
necessary, since we could store information in a lookup table 
only once per node __type__.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 22:25:00 UTC, David Piepgrass wrote:
This is all true, but forgetting a brace commonly results in a 
barrage of error messages anyway. Code that guesses what you 
meant obviously won't work all the time, and phase 3 would have 
to take care not to emit an error message about a "{" token 
that doesn't actually exist (that was merely "guessed-in"). But 
at least it's nice for a parser to be /able/ to guess what you 
meant; for a typical parser it would be out of the question, 
upon detecting an error, to back up four source lines, insert a 
brace and try again.


So you simply admit that error recovery is difficult to 
implement. For me, it is a must-have, and thus throwing away 
information is bad.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 22:05:05 UTC, Dmitry Olshansky wrote:
That would not always work, but yes that's what we should do 
IMHO.
I didn't do a deeper research on this, just shared my current 
understanding. When that doesn't work in a particular case, we 
will need to find the problem and solve that.


It's not what packrats do however (when I say algorithm I mean 
specifically packrat). And PEGs are commonly associated with 
packrats.

As Philippe Sigaud admitted, that is an incorrect association.

it's obvious how backtracking comes in ... whan say A fails 
somewhere in the middle of it.
Simply stop tracking respective state. Process others in 
parallel as I described above.


Yeah it could be done, but it doesn't buy you a thing in a lot 
of cases unlike in regular grammar. The problem is that state 
is not as simple as in regex for instance (even in regex that 
must capture  some submatches DFA won't do BTW). Thus you can't 
merge seemingly identical states because they depend on 
(different) interpretation of previous part of input.
Well, if you must track the path to a particular state, then DFA 
simply doesn't match the problem and there's nothing we can do.


That's why DFA in ANTLR is only used to do lookahead or lexing, 
it doesn't maintain path through states during parsing.
See above. Is maintaining path really necessary? I thought that 
DFA with additional states could be created for any particular 
situation of ambiguity, etc. This needs further research.


Of course there is a fair amount of bookkeeping that reduces 
doing

work all over again but it does ... allocate.


The amount of used memory would be proportional to the length 
of input
after the branching point times number of rules (actually, of 
states in
the DFA, which is likely larger but still finite for a 
particular grammar).


Finite and proportional to input size? Another way to say why 
an O(n) memory requirement for packrats?
Sorry, I incorrectly included the O(n) multiplier. It should be 
finite. When you go to the next character, you don't need 
previous states any more.


Yup. But I'm not talking definitions here I'm talking the 
notion of "integrated lexing" and lexing is certainly about 
characters or was it?


I thought integrated lexing was about doing it along with parsing 
and specifying its rules directly inside the grammar. From the 
former we get structural context to narrow-down available 
options, and from the latter we have a uniform grammar 
description.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 22:03:20 UTC, Chad J wrote:

enum : SyntaxElement
{
  AST_EXPRESSION  = 0x0001___,
AST_UNARY_EXPR= 0x_0001__ |


This would cause wasting space (probably a lot). Definitely it 
would not be easy to implement in a parser generator, when 
various language properties are not known beforehand for 
fine-grained tuning.


This approach of course has shameful nesting limitations, but I 
feel like these determinations could be fairly well optimized 
even for the general case.  For example: another approach that 
I might be more inclined to take is to give each token/symbol a 
low-valued index into a small inheritance table.


Depending on implementation, that might introduce the multiplier 
overhead of table access per each comparison (and there would be 
many in case of searching for nodes of specific type).


I would expect the regex engine to call the isA function as one 
of it's operations.  Thus placing an AST_EXPRESSION into your 
expression would also match an AST_NEGATE_EXPR too.


But actually it is not so difficult to implement in a very 
similar way to what you described. I was thinking about a lookup 
table, but different from a traditional inheritance table. It 
would be indexed by AST node type (integral enum value), and 
store various classification information as bits. Maybe this is 
what you meant and I misunderstood you... Example is here: 
https://github.com/roman-d-boiko/dct/blob/May2012drafts/fe/core.d 
(sorry, it doesn't show how to do classification, and has a 
different context, but I hope you get the idea). The advantage 
over storing hierarchical information directly in each token is 
obviously memory usage.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 21:52:09 UTC, David Piepgrass wrote:
it seems easier to tell what the programmer "meant" with three 
phases, in the face of errors. I mean, phase 2 can tell when 
braces and parenthesis are not matched up properly and then it 
can make reasonable guesses about where those missing 
braces/parenthesis were meant to be, based on things like 
indentation. That would be especially helpful when the parser 
is used in an IDE, since if the IDE guesses the intention 
correctly, it can still understand broken code and provide code 
completion for it. And since phase 2 is a standard tool, 
anybody's parser can use it.


There could be multiple errors that compensate each other and 
make your phase 2 succeed and prevent phase 3 from doing proper 
error handling. Even knowing that there is an error, in many 
cases you would not be able to create a meaningful error message. 
And any error would make your phase-2 tree incorrect, so it would 
be difficult to recover from it by inserting an additional token 
or ignoring tokens until parser is able to continue its work 
properly. All this would suffer for the same reason: you loose 
information.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 21:43:58 UTC, David Piepgrass wrote:
Still... it's a fun concept, and even if the initial parsing 
ends up using the good-old lex-parse approach, semantic 
analysis and lowering can benefit from a tree parser. Tree 
parsing, of course, is just a generalization of linear parsing 
and so a tree parser generator (TPG) could work equally well 
for flat input.


Exactly. Semantic analysis is what would benefit from that, as 
well as client code (for example, refactoring tools, etc.)


Parser would become quite complicated. Probably it would need to 
perform complex tree navigation (which might decrease performance 
proportionally to the average tree depth or even more, if parser 
algorithms were themselves fast). Any non-trivial query would 
require direct character manipulation (like comparing sub-strings 
of captures with string literals, etc.). Probably you would get 
frequent cache misses because of the need to jump throughout the 
(incomplete) AST. It would definitely be problematic to build an 
immutable AST model, thus (IMO) significantly and needlessly 
constraining you and users of your library. And likely you would 
have to deal with many more problems __without__ significant 
gains.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 21:35:52 UTC, Dmitry Olshansky wrote:

On 08-Jul-12 01:24, Roman D. Boiko wrote:

But PEG itself doesn't require backtracking parsing, does it?


It does. Specifically the algorithm is to try option A, try 
option B, try option C...
There is no algorithm, only __grammar__. It specifies that option 
A has higher priority than option B in expression A / B / C. But 
it doesn't forbid, for example, to analyze all in parallel (track 
state transitions) for each character consequently, as a proper 
DFA implementation would do, until some option succeeds and all 
previous fail. A greedy regex would also have to check all 
successor options (C in the exapmle above) to determine the 
longest one.


it's obvious how backtracking comes in ... whan say A fails 
somewhere in the middle of it.
Simply stop tracking respective state. Process others in parallel 
as I described above.


Of course there is a fair amount of bookkeeping that reduces 
doing work all over again but it does ... allocate.


The amount of used memory would be proportional to the length of 
input after the branching point times number of rules (actually, 
of states in the DFA, which is likely larger but still finite for 
a particular grammar).


Tokens.. there is no such term in use if we talk about 'pure' 
PEG.


Terminal symbols.


Characters.
Nope. According to the definition, PEG is a set of non-terminal 
symbols, terminal symbols, production rules, and a starting 
non-terminal. Terminals are not necessarily characters.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 20:27:14 UTC, Dmitry Olshansky wrote:
So my view on the matter: whether or not lexer is integrated is 
two-fold question: notation vs implementation.


E.g. it may make regular lexer-like things a part of notation 
thus having rules like:

Identifier  -> Alpha (Alpha|Number|_)*

Yet it may or may use the same engine for _all_ _rules_, in 
fact if implementation optimize things by ripping off regular 
parts of grammar to small DFA engines it would make some kind 
of lexer behind the scenes!


This is the implementation I have in mind as the only sane way to 
parse PEG efficiently. Plus specializing DFA to only check for 
those terminals which are allowed according to available 
structural information at the moment of their invocation.




Anyway highly hybrid parsers are all the rage now, and reasons 
are obvious - they runs fast and can be bend by some magic to 
quite convoluted grammars of modern languages ;)





Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 21:08:43 UTC, Dmitry Olshansky wrote:

You may misunderstood me as well, the point is still the same:
there are 2 things - notation and implementation, the fact that 
lexer is integrated in notation like in PEGs is not related to 
the fact that PEGs in their classic definition never use term 
Token and do backtracking parsing essentially on character 
level.


But PEG itself doesn't require backtracking parsing, does it? So 
that's an implementation detail, or a specific algorithm. Lexer 
separation seems to be independent of this.


As for lexing multiple times, simply use a free list of 
terminals (aka tokens). I still assume that grammar is 
properly defined, so that there is only one way to split 
source into tokens.




Tokens.. there is no such term in use if we talk about 'pure' 
PEG.


Terminal symbols.



Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 20:29:26 UTC, Dmitry Olshansky wrote:
And given the backtracking nature of PEGs you'll do your 
distributed thing many times over or ... spend a lot of RAM to 
remember not to redo it. I recall lexing takes even more then 
parsing itself.


I think that your conclusions are about statistical evidences of 
PEG misuses and poor PEG parser implementations. My point was 
that there is nothing fundamentally worse in having lexer 
integrated with parser, but there are performance advantages of 
having to check less possible cases when the structural 
information is available (so that lexSmth could be called when 
Smth is expected, thus requiring less switch branches if switch 
is used).


As for lexing multiple times, simply use a free list of terminals 
(aka tokens). I still assume that grammar is properly defined, so 
that there is only one way to split source into tokens.




Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 20:26:07 UTC, David Piepgrass wrote:
I'd like to add that if we give tree parsing first-class 
treatment, I believe the most logical approach to parsing has 
three or more stages instead of the traditional two (lex+parse):


1. Lexer
2. Tree-ification
3. Parsing to AST (which may itself use multiple stages, e.g. 
parse the declarations first, then parse function bodies later)


The new stage two simply groups things that are in parenthesis 
and braces. So an input stream such as the following:


I bet that after stage 2 you would have performed almost the same 
amount of work (in other words, spent almost the same time) as 
you would if you did full parsing. After that you would need to 
iterate the whole tree (possibly multiple times), modify (or 
recreate if the AST is immutable) its nodes, etc. Altogether this 
might be a lot of overhead.


My opinion is that tree manipulation is something that should be 
available to clients of parser-as-a-library or even of 
parser+semantic analyzer, but not necessarily advantageous for 
parser itself.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko
On Saturday, 7 July 2012 at 20:09:56 UTC, Andrei Alexandrescu 
wrote:

On 7/7/12 3:17 PM, Dmitry Olshansky wrote:
I'll have to point out that the whole point about integrated 
lexing is
moot. It's more of liability then benefit. At very least it's 
just

implementation curiosity not advantage.


Interesting. I'll have to ask for more details on why.

Andrei


+1. Personally I associate PEG with a parser that includes a 
__distributed__ lexer inside, which gives the advantage of having 
to check less alternatives at each step (that's similar to 
deterministic parsing). If lexer is separate, it seems to be more 
difficult to scan for only a subset of possible tokens.




Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko
On Saturday, 7 July 2012 at 20:04:21 UTC, Andrei Alexandrescu 
wrote:
Doesn't ANTLR use full-fledged character-level LL(*) parsing 
even in the tokenizer?


Since I didn't understand your question I assume that my 
statement was somehow incorrect (likely because I made some wrong 
assumptions about ANTLR). I didn't know about its existence until 
today and still don't understand it completely. What I think I 
understood is that it uses DFA for deciding which grammar rule to 
apply instead of doing backtracking. I also think that it uses 
DFA for low-level scanning (I'm not sure).


The idea to introduce DFA for both determining which rule to 
apply and lexing of terminal symbols appeared to me much earlier, 
and the suggestion to introduce them into Pegged is one of 
options which I think could extremely improve performance.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 19:58:37 UTC, Roman D. Boiko wrote:

On Saturday, 7 July 2012 at 19:50:37 UTC, Timon Gehr wrote:

http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method


OK, at least I didn't misunderstand. So my question was whether 
the alternative that I described and which exists in PEG is 
somehow worse than OPP. Wikipedia seems to provide an answer to 
that: OPP tend to be simpler. (I didn't investigate this topic 
further.)
OK, now I agree, the need to perform several nested calls in 
order to parse some expression is costly enough to consider OPP 
superior to a general PEG for expressions.


At first I was surprised when found that D doesn't define 
operator precedence explicitly, but instead provides a hierarchy 
of expression types. Then I somehow concluded that the approaches 
are equivalent. (I started learning parsing techniques only in 
February '12.) Since then I never reconsidered my past 
conclusions.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 19:50:37 UTC, Timon Gehr wrote:

http://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method


OK, at least I didn't misunderstand. So my question was whether 
the alternative that I described and which exists in PEG is 
somehow worse than OPP. Wikipedia seems to provide an answer to 
that: OPP tend to be simpler. (I didn't investigate this topic 
further.)


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 19:29:47 UTC, Dmitry Olshansky wrote:
Another idea that I've never realized is to add operator 
precedence grammar to pegged. Of course it should fit naturally 
with traditional PEG, for instance taking responsibility for 
parsing expressions.


But that's already available by explicitly defining expression 
grammar via nested rules. See for example the examples/dgrammar.d 
in Pegged. This way, for example, multiplication has precedence 
over addition. (It looks like I misunderstood you. Did I?)


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 18:55:57 UTC, Chad J wrote:

I was thinking the same thing.

My intent is to create a kind of regular-expression-of-nodes 
with push/pop operators to recognize ascent and descent on the 
tree.  Such a regular expression would allow one to capture 
subtrees out of generalized patterns and then place them into 
new trees that then become the input for the next pattern or 
set of patterns.  I think this is much closer to how I 
conceptualize semantic analysis than how semantic analysis is 
done in front ends like DMD: it should to be done with pattern 
recognition and substitution, not with myriad nested 
if-statements and while-loops.
Funny, we've discussed an idea to introduce some hybrid of regex 
and xpath for querying, searching and traversing AST with Dmitry 
a few weeks ago. A custom NDepend-like Code Query Language was 
the major alternative we considered. Discussion started on this 
forum and continued via email.



In this vision I do not use classes and inheritance for my AST.
 Instead I use structs that contain some kind of nodeType 
member that would be one of the tokens/symbols in the grammar, 
like TOK_WHILE_NODE in the above code.  Dynamic dispatch is 
instead performed by (very fast) DFAs recognizing parts of the 
AST.


Exactly. This idea first came to me in April after I implemented 
the first top-down recursive descent custom parser for a D 
subset. I tried Visitor pattern before that and wasn't happy. 
There are some subtle difficulties which I believe will be 
possible to overcome, most important being the need to introduce 
a mechanism for hierarchical classification (like a pow 
expression being an assign expression at the same time).




Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 16:56:06 UTC, Chad J wrote:
Yeah, it's good to hear this notion reinforced.  I had this 
suspicion that the packrat parser is not necessarily the 
best/fastest solution, mostly because of the large allocation 
that has to happen before you get O(n) performance.  Thus I 
figured that pegged might eventually use different parsing 
strategies underneath it all, possibly with a lot of 
special-casing and clever hand-tuned and profiled 
optimizations.  At least that's what makes sense to me.


At the very least, we could use DFA instead of backtracking where 
possible. This is the approach implemented in ANTLR, but I wanted 
to introduce them before I knew about existence of the latter, 
simply because this would likely produce the fastest parsers 
possible.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 16:49:09 UTC, deadalnix wrote:
I tried that. This is almost impossible, dmd's parser and AST 
are very tightly mixed with dmd's internals.


+1


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 16:27:00 UTC, Philippe Sigaud wrote:

I added dstrings because

1- at the time (a few months ago), the lists here were awash in 
UTF-32

discussions and I thought that'd be the way to go anyway
2- other D parsing libraries seemed to go to UTF32 also (CTPG)
3- I wanted to be able to parse mathematical notation like 
nabla,

derivatives, etc. which all have UTF32 symbols.


I propose to switch code to use S if(isSomeString!S) everywhere. 
Client code would first determine source encoding scheme, and 
then instantiate parsers specifying a string type. This is not a 
trivial change, but I'm willing to help implementing it.


Note that PEG does not impose to use packrat parsing, even 
though it was developed to use it. I think it's a historical 
'accident' that put the two together: Bryan Ford thesis used 
the two together.


Note that many PEG parsers do not rely on packrat (Pegged does 
not).

There are a bunch of articles on Bryan Ford's website by a guy
writting a PEG parser for Java, and who found that storing the 
last rules was enought to get a slight speed improvement, buth 
that doing anymore sotrage was detrimental to the parser's 
overall efficiency.


That's great! Anyway I want to understand the advantages and 
limitations of both Pegged and ANTLR, and probably study some 
more techniques. Such research consumes a lot of time but can be 
done incrementally along with development.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 16:14:13 UTC, Tobias Pankrath wrote:

Interesting, I thought that PEG ⊂ CFG holds.

See section 3.4 of http://bford.info/pub/lang/peg.pdf

It contains a simple proof that a non-context-free language (a^n) 
(b^n) (c^n) can be described with PEG syntax. There are 
infinitely many such languages.




Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 15:42:30 UTC, Chad J wrote:

On 07/05/2012 08:28 AM, Roman D. Boiko wrote:
Pegged may eventually become standard, if it will be 
performanceoptimized and a bit more customizable


Interesting, I thought you were hand-writing this stuff.

I'm a fan of pegged and made some pull requests, one of which 
was aimed at making it more customizable by allowing the user 
to define what type gets used as a parse tree node, thus 
allowing one to potentially use their parse tree as an AST (and 
maybe do a couple other things too). It's a WIP, but the proof 
of concept is done: DMD can handle the extra templating at 
compile time, so it works.


What kind of things did you want in terms of customizability?


There are many possible customization point which would make it a 
better fit for my project while also being useful in general.


The most critical was the one you implemented: ability to define 
a custom parse tree. I also needed the ability to use immutable 
structures (almost) everywhere, while currently they must be 
mutable. Also it would be great to avoid UTF conversions (instead 
use functions and types templated by the string type). I also 
wanted to add ability to reuse parts of previously generated AST 
which correspond to source code that has not been changed, or to 
identical source code parsed previously. (This would allow 
incremental parsing, e.g., while user is typing.) The latter 
would require to track source code positions separately, or even 
generating them on demand (this is the feature actively 
criticized by deadalnix in some previous discussions but central 
to DCT architecture). AST would only hold information about 
widths of its nodes.


I've also written some notes (10-15 suggestions) while studying 
Pegged code, which will be shared later.


However, as I found a few hours ago, Packrat parsing (typically 
used to handle PEG) has serious disadvantages: it complicates 
debugging because of frequent backtracking, it has problems with 
error recovery, and typically disallows to add actions with side 
effects (because of possibility of backtracking). These are 
important enough to reconsider my plans of using Pegged. I will 
try to analyze whether the issues are so fundamental that I (or 
somebody else) will have to create an ANTLR-like parser instead, 
or whether it is possible to introduce changes into Pegged that 
would fix these problems.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 14:17:54 UTC, Tobias Pankrath wrote:

Proper CFG parsers all are liner-time anyway.


To be picky here:
The languages that can be parsed in linear time are a strict 
subset of CFGs.
Also any PEG defines a language with linear-time parsing. Some 
non-context-free languages can be described with PEG.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 11:33:18 UTC, Dmitry Olshansky wrote:
Yup, LL(*) is my favorite so far. As for debugging and error 
recovery they are always a problem.


It's interesting, that I also wanted to use DFA for non-terminals 
(similarly to LL(*)), but was considering their usage as pure 
performance optimization. Concerns raised in the article seem to 
be very reasonable, I will likely reconsider my previous plans...


This forum is an endless source of high-quality feed-back for me.



Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 10:26:39 UTC, Roman D. Boiko wrote:
I think that Pegged can be heavily optimized in performance, 
and there are no
fundamental issues which would make it inherently slower than 
LALR or a hand-written D-specific parser.


Hmm... found an interesting article: 
http://www.antlr.org/papers/LL-star-PLDI11.pdf


It describes some disadvantages of Packrat parsing, like problems 
with debugging and error recovery. These are important for DCT, 
so I'll have to perform additional research.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 10:05:30 UTC, Dmitry Olshansky wrote:
I can tell you that they are not slower then one another in 
principle. Quality of implementations trumps every theoretical 
aspect here. LALR is usually fast even if implemented by book 
but they are hard to optimize futher and quite restrictive on 
"semantic extensions".


Proper CFG parsers all are liner-time anyway.


Exactly, that is also my point. I think that Pegged can be 
heavily optimized in performance, and there are no fundamental 
issues which would make it inherently slower than LALR or a 
hand-written D-specific parser.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 09:06:57 UTC, Roman D. Boiko wrote:

http://stackoverflow.com/questions/11373644/performance-of-parsers-peg-vs-lalr1-or-llk


So far it looks like LALR parsers may have lower constant factors 
than Packrat.


The difference could be minimized by paying attention to parsing 
of terminal symbols, which was in my plans already. It is not 
necessary to strictly follow Packrat parsing algorithm.


The benefits of Pegged, in my view, are its support of Parsing 
Expression Grammar (PEG) and compile-time evaluation. It is 
easily extensible and modifiable.


When I implemented recursive-descent parser by hand in one of 
early drafts of DCT, I strongly felt the need to generalize code 
in a way which in retrospect I would call PEG-like. The structure 
of my hand-written recursive-descent parser was a one-to-one 
mapping to an implemented subset of D specification, and I 
considered it problematic, because it was needed to duplicate the 
same structure in the resulting AST.


PEG is basically a language that describes both, the 
implementation of parser, and the language syntax. It greatly 
reduces implicit code duplication.


I think that generated code can be made almost as fast as a 
hand-written parser for a particular language (probably, a few 
percent slower). Especially if that language is similar to D 
(context-free, with fine-grained hierarchical grammar). 
Optimizations might require to forget about strictly following 
any theoretical algorithm, but that should be OK.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 00:45:36 UTC, David Piepgrass wrote:
This work on parsers might be a good place for me to dive in. I 
have an interest in parsers and familiarity with LL, LALR, 
PEGs, and even Pratt parsers (fun!), but I am still 
inexperienced.

...
One thing that has always concerned me about PEGs is that they 
always say PEGs are slower than traditional two-phase LALR(1) 
or LL(k) parsers. However, I have never seen any benchmarks. 
Does anyone know exactly how much performance you lose in an 
(optimized) PEG compared to an (optimized) LALR/LL parser + 
LL/regex lexer?


I decided to ask a question about this:

http://stackoverflow.com/questions/11373644/performance-of-parsers-peg-vs-lalr1-or-llk

Don't hesitate to edit it if you would like to clarify some 
aspect.


Re: Let's stop parser Hell

2012-07-07 Thread Roman D. Boiko

On Saturday, 7 July 2012 at 01:47:21 UTC, H. S. Teoh wrote:

Frankly, I don't know what DCT stands for (anyone?)
That's a working (not final) name of my project, a.k.a. The D 
Compiler Tools. I've used this acronym in previous threads but 
didn't explain this time. Current state is very far from 
completion, it is in research phase. Link: 
https://github.com/roman-d-boiko/dct, but I'd suggest to contact 
me at r...@d-coding.com if anyone is interested.


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 22:32:29 UTC, deadalnix wrote:
Well you did the mistake to introduce an over complex mechanism 
in log(n) to get back to source code when an obvious one in 
O(1) exists.


Let me doubt.


I'm fine with that, let you doubt.


Re: dfmt - D source code formatter

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 22:25:15 UTC, Walter Bright wrote:
I agree that whatever is inside the comment should be left 
alone. I was more talking about lining up comment blocks, etc.
Why would that be more difficult to do than code formatting? I'm 
wondering.




Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 22:11:41 UTC, deadalnix wrote:

Why not program instead of doing bureaucracy ?


To avoid programming things which are not needed or don't fit. 
I've thrown away several implementations already... time to 
reflect a little :)


But, actually, for DCT I do know what I need to do. That 
suggestion was with respect to any candidate for becoming 
standard. Everybody understands that their way (likely 
differently than others).




Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 20:28:32 UTC, Philippe Sigaud wrote:
Hmm 72 € by Springer, 55 € on Amazon. Something is not 
right.

Paperback vs perfect bound maybe?


http://www.komkon.org/~sher/books/parsing_techniques_2008.pdf
Not sure that it is legal, but definitely free.



Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko
isKeyword_Dummy (baseline): 2738 [microsec] total, 50 [ns / 
lookup].
This one calculates a sum of all identifier code units. Included 
for comparison.



isKeyword_Dictionary: 4247 [microsec] total, 242 [ns / lookup].
Check whether an identifier is a keyword using AA (dictionary) 
lookup.


isKeyword_SwitchByLengthThenByChar: 1593 [microsec] total, 91 
[ns / lookup].
Switch by identifier length at the top, then recursively switch 
by each character. Clearly a winner, but I think it can be 
improved further.


isKeyword_BinaryArrayLookup: 14351 [microsec] total, 820 [ns / 
lookup].

Binary search in an ordered array of keywords.

isKeyword_LinearArrayLookup: 59564 [microsec] total, 3405 [ns / 
lookup].

Ditto, search is linear.


isKeyword_UnicodeTrie: 4167 [microsec] total, 238 [ns / lookup].
Lookup a keyword in a trie, created by Dmitry. This will be 
improved.


isKeyword_UnicodeTrieBoolLookup: 3466 [microsec] total, 198 [ns 
/ lookup].
The same, but only determines whether an identifier is a keyword, 
not which exactly.



Total: 104183 identifiers + 17488 keywords.
Analyzed the largest Phobos file (DateTime? I didn't check the 
name.) Results are consistent for other files, too.


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 19:54:39 UTC, Philippe Sigaud wrote:

On Thu, Jul 5, 2012 at 8:28 PM, Andrei Alexandrescu
 wrote:

I'll be glad to buy for you any book you might feel you need 
for this.
Again, there are few things more important for D right now 
than exploiting
its unmatched-by-competition features to great ends. I don't 
want the lack
of educational material to hold you down. Please continue 
working on this

and let me know of what you need.


That's nice of you, if a bit extreme for a public mailing list 
:)

Andrei, money is no problem :)
Interest in the field of parsing is no problem.
Interest in D future is no problem.
Having a demanding job, and three children, is a problem. No, 
scratch

that, you know what I mean.


I have four, from 1 to 7 years old... Wouldn't call them a 
problem, though :)))


But hey, Roman is doing interesting things on keyword parsing 
right
now, and changing the parser generated by Pegged is not 
difficult. We
will see where this thread lead. (Roman, you should send your 
results
here, because I'm still taken aback by the built-in AA speed 
compared

to linear array look-up for 100 keywords).


Well, I wouldn't call those "results" yet. Just some benchmarks. 
Here they are:


isKeyword_Dummy (baseline): 427 [microsec] total, 28 [nanosec / 
lookup].
isKeyword_Dictionary: 1190 [microsec] total, 214 [nanosec / 
lookup].
isKeyword_SwitchByLengthThenByChar: 466 [microsec] total, 83 
[nanosec / lookup].
isKeyword_BinaryArrayLookup: 2722 [microsec] total, 490 [nanosec 
/ lookup].
isKeyword_LinearArrayLookup: 13822 [microsec] total, 2490 
[nanosec / lookup].
isKeyword_UnicodeTrie: 1317 [microsec] total, 237 [nanosec / 
lookup].
isKeyword_UnicodeTrieBoolLookup: 1072 [microsec] total, 193 
[nanosec / lookup].

Total: 22949 identifiers + 5551 keywords.

isKeyword_Dummy (baseline): 2738 [microsec] total, 50 [nanosec / 
lookup].
isKeyword_Dictionary: 4247 [microsec] total, 242 [nanosec / 
lookup].
isKeyword_SwitchByLengthThenByChar: 1593 [microsec] total, 91 
[nanosec / lookup].
isKeyword_BinaryArrayLookup: 14351 [microsec] total, 820 [nanosec 
/ lookup].
isKeyword_LinearArrayLookup: 59564 [microsec] total, 3405 
[nanosec / lookup].
isKeyword_UnicodeTrie: 4167 [microsec] total, 238 [nanosec / 
lookup].
isKeyword_UnicodeTrieBoolLookup: 3466 [microsec] total, 198 
[nanosec / lookup].

Total: 104183 identifiers + 17488 keywords.


As Dmitry said, we might hit a CTFE wall: memory consumption,
computation speed, ...
(*channelling Andrei*: then we will correct whatever makes CTFE 
a

problem. Right)

Philippe

(Hesitating between 'The Art of the Metaobject Protocol' and
'Compilers, Techniques and Tools', right now)





Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 18:33:50 UTC, Dmitry Olshansky wrote:

Count me as interested.
CTFE needs more correctness & speed though. So to put it 
blantly - no it's not possible right NOW.
BUT it doesn't prevent us from planing and doing a proof of 
concept. Pegged seems a good starting point even if we end up 
re-writing it from scratch.


IMO, first priority is to make code generated by Pegged work 
fast, and second priority - make code generation itself fast.


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko
On Thursday, 5 July 2012 at 18:28:21 UTC, Andrei Alexandrescu 
wrote:

On 7/5/12 2:16 PM, Philippe Sigaud wrote:
So Pegged or any other generator should *not* get the 
community focus right now.


Pegged should be the focus.


+10 (can I vote ten times?)


My plan would be as follow:

1- assemble a group of people knowing parsing. I don't think 
I'm exactly

knowledgeable, but I'm ready to be part of such a group.
2- create a github process.
3- translate an existing parser / adapt a D parser for Phobos. 
I'm ready

to be part of this (I'm sure I'll learn in the process)
4- spend 1-2 years fighting over LR parsing and such :) (Just 
kidding)

5- submit it to Phobos and have it adopted.


Sounds good. Replace 1-2 years with 1-2 months.


Well, probably with 3-4 months... :)


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 18:17:06 UTC, Philippe Sigaud wrote:

On 2012-07-05 18:32, Roman D. Boiko wrote:


My vote would be for Pegged, I guess.


As much as I'm flattered by that, my current impression is 
Pegged is very

far from being performant.

As a proof-of-concept that, in D,  it's possible to parse a 
string and
create a parse tree at compile-time and then generate code from 
this, it's

also successful. Go D!

As a parser proper, Pegged is awful :-) Nothing I'm ashamed of, 
as I learn
by coding. Hey, I just received the Dragon Book (International 
Edition),

I'm sure I'll learn many things in there.

So, if anyone is willing to change the code generated by 
Pegged, I'm game.
The results you showed me on keyword parsing are very 
interesting!


But, my impression is that the need for a 'D'-only parser and 
lexer is far
greater and much more imediate that the need for a parser 
generator. All
the reasons advanced upthread ask for a D parser, not a generic 
generator.
Parser generators are for those of us interested in having DSLs 
or macros

in D.
So Pegged or any other generator should *not* get the community 
focus right

now.


I'm sure it can generate **much** faster code. I'm going to focus 
on its part that generates D parser (i.e., to make it 
significantly faster and able to efficiently parse-as-you-type). 
Actually, I'm sure it will be able to beat any other parser with 
respect to performance. :)


1. So my plan is the following: invite whoever would want to help.
2. Prove my claims above in practice. :-)


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 16:38:27 UTC, Jacob Carlborg wrote:

On 2012-07-05 18:32, Roman D. Boiko wrote:


My vote would be for Pegged, I guess.


Aren't you voting on your own project :)


Well, I'm going to base parsing on Pegged, after tweaking it to
my needs.


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 16:28:57 UTC, Jonathan M Davis wrote:

It's all too common for someone to suggest that we should
do something or implement something without ever attempting to 
do it themselves, and in general, stuff around here gets done 
because someone really wants it done, takes the time to do it, 
and sees it through until its done and in Phobos.


- Jonathan M Davis


Resume: everybody is welcome to join effort of translating DMD 
front end, and improving Pegged.



Also I would like to invite those interested in DCT project to 
help me with it. Right now I'm trying to understand whether it is 
possible to incorporate Pegged inside without losing anything 
critical (and I think it is very likely possible), and how 
exactly to do that.


Dmitry proposed to help improve Pegged (or some other compiler's) 
speed.


Anyone else?



Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 16:14:27 UTC, Jacob Carlborg wrote:

Original post:

http://www.digitalmars.com/d/archives/digitalmars/D/DCT_use_cases_-_draft_168106.html#N168141


OK, fairly complete. Let it be the starting point. Why not put it 
on some wiki, then add some more, discuss, vote, etc.? Meanwhile 
create a matrix of features implemented in each of interesting 
standardization candidates. And pick up one of them as "standard" 
or "reference" parser.


My vote would be for Pegged, I guess.


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 15:40:53 UTC, Jacob Carlborg wrote:

On 2012-07-05 15:08, Roman D. Boiko wrote:


Anyway I propose to enumerate major use cases first.


Haven't we already done that a couple of times.


Well, we did something like that for DCT... but I doubt that it 
would fit general needs.


If we had, why haven't they been analyzed, classified, discussed, 
etc.? Or have they?


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 15:42:22 UTC, Caligo wrote:

Is the actual grammar available somewhere?  The online language
reference is all we got I guess? DMD doesn't seem to be using
yacc/bison either.

In PEG format, yes (not necessarily correct):
https://github.com/roman-d-boiko/Pegged/blob/dct/pegged/examples/dgrammar.d

I don't know about any others, though.


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 13:05:41 UTC, Dmitry Olshansky wrote:

On 05-Jul-12 17:04, Roman D. Boiko wrote:
Well why not.
But first I'll need to deliver some stuff on my GSOC project.


I bet that after you finish with GSOC optimizing Pegged will not 
be less relevant than it is now :) And as for DCT, it will take 
another half a year (at least) until it will become usable for 
any real needs (except the most trivial).




Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko
On Thursday, 5 July 2012 at 12:53:02 UTC, Denis Shelomovskij 
wrote:

05.07.2012 16:30, Roman D. Boiko пишет:
Forgot to add DDMD, which also has been forked and redesigned 
recently

by someone (thus two more different compiler frontends).

But overall I doubt that any project could become standard 
very soon.


Why? Even were they all almost equal we can select any one. The 
point is to select something (anything) to guide a coder who 
want to do something with this task to a 
good/up-to-date/supported parser.


Well, I guess we'd like to select one and not change it later, 
don't we?
I'm not sure we can decide which is the best currently and will 
stay the best in the future for all major use cases.


Anyway I propose to enumerate major use cases first.


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 12:32:19 UTC, Dmitry Olshansky wrote:
Then do it. It's all about having something so obviously good, 
fast and flexible that other stuff can be considered only as 
toys.


I might do one, but I'd rather just help other folks make it 
faster ;)


Would you help to make Pegged faster?


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko
Forgot to add DDMD, which also has been forked and redesigned 
recently by someone (thus two more different compiler frontends).


But overall I doubt that any project could become standard very 
soon.


Re: Let's stop parser Hell

2012-07-05 Thread Roman D. Boiko
On Thursday, 5 July 2012 at 12:11:33 UTC, Denis Shelomovskij 
wrote:
There are more and more projects requiring parsing D code (IDE 
plugins, DCT and dfmt now).


We definitely need a _standard_ fast D parser (suitable as 
tokenizer). We already have a good one at least in Visual D. 
Maybe dmd's parser is faster. If so, it can be (easily?) 
rewritten in D. We also already have some other parsers (in C# 
from Mono-D etc.).


What about to get one and call it standard?


Visual-D is not Boost-licensed (I think this would be possible to 
change)

Mono-D is written in C#, as you mentioned
Pegged may eventually become standard, if it will be performance 
optimized and a bit more customizable
Dscanner(https://github.com/Hackerpilot/Dscanner) from Brian 
Schott is pretty good, too.

SDC is another nice option
DIL (http://code.google.com/p/dil/) is very nice but GPL

I plan to try using Pegged inside my DCT project. Probably that 
will require huge modifications though...


Some more links from Pegged readme:

Hisayuki Mima's CTPG(https://github.com/youkei/ctpg), very 
similar, also done in D. Have a look!
Nick Sabalausky's Goldie 
(http://www.dsource.org/projects/goldie).
Benjamin Shropshire's dparser 
(http://dsource.org/projects/scrapple/browser/trunk/dparser).



Martin Nowak put these gists on the D newsgroup:
https://gist.github.com/1255439 - lexer generator
https://gist.github.com/1262321 - complete and fast D lexer




Re: Proposal: takeFront and takeBack

2012-07-05 Thread Roman D. Boiko

On Thursday, 5 July 2012 at 08:34:29 UTC, Roman D. Boiko wrote:
I make no conclusions, because have not run any benchmarks to 
estimate how complex the code should be to take those 30 ms. 
Such benchmarks would be valuable for discussion.


That has been written before I saw all benchmarks, so my post 
expired before being created.


Re: Proposal: takeFront and takeBack

2012-07-05 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 22:02:28 UTC, deadalnix wrote:

Le 04/07/2012 21:45, Jonathan M Davis a écrit :

On Wednesday, July 04, 2012 12:19:15 Jonathan M Davis wrote:
But at present, I'm seeing a performance improvement of 
approximately 70 -
80% in iterating over strings with consumeFront rather than 
front and

popFront (depending on the compiler flags and strings used).


I should clarify that. It's taking 70 - 80% as long to use 
consumeFront to
iterate over a string than it is to iterate using popFront and 
getting front
on every iteration. The way I worded that could imply that it 
was taking 20 -
30% as much time, which would be a _much_ larger improvement 
than I'm actually

seeing.

- Jonathan M Davis


The win is significant indeed.


I'm not sure. Let's speculate a bit with some almost random 
numbers.


E.g., suppose that some test without user code in a loop takes 70 
(or 80) ms (per iteration) instead of 100. Or, assuming that some 
optimization of front / popFront would make it proportionally 
twice faster, 35 instead of 50.


It may be significant, if user code inside the loop is relatively 
fast, but that is not often the case. Let's assume that it takes 
30 ms (that looks like pretty fast). So overall it would be (70 + 
30) vs (100 + 30) before front / popFront optimization, and (35 + 
30) vs (50 + 30) after (huge optimization). The latter is 65 vs 
80, which is about 81 vs 100 (instead of 70 vs 100 before 
optimization). If user code was slower, impact of front / 
popFront optimization would be relatively larger, and vice verse.


I make no conclusions, because have not run any benchmarks to 
estimate how complex the code should be to take those 30 ms. Such 
benchmarks would be valuable for discussion.


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko
On Wednesday, 4 July 2012 at 12:24:14 UTC, 
trav...@phare.normalesup.org (Christophe Travert) wrote:
I am not sure what would be worse, in the long run, between 
asking developpers to make front remain valid after popFront 
until next call to front, or having two different standard ways 
to iterate an input range (front/popFront and consumeFront).

Just to clarify (although I gave up with that idea):
my suggestion was not to require that value returned from front 
was valid until next call to front. It was the following:


in those ranges where value returned from front is invalidated 
after a call to popFront, define consumeFront in a such way that 
an equivalent of popFront is deferred until the next call to any 
of [front, popFront, consumeFront], and make empty take into 
account information whether popFront is pending.


Currently I think that idea was very bad, given that it would be 
difficult to find all ranges which invalidate previous front 
value on popFront.


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 12:40:41 UTC, Tobias Pankrath wrote:
You have to maintain 2 version of you algorithm. This is more 
work to test, to maintain, and it is likely to introduce more 
bugs.


More code == more bugs. More NPATH = more bugs and harder to 
test and to maintain.


In addition, this will result in practice in less generic 
code, because one may not need the version of the algorithm 
without consume primitives in his/her own code and not code it 
at all.


Since the performance problem seems mainly to materialize for 
string processing code an alternative to this proposal would be 
to add those functions only for strings. As far as I heared, 
most of phobos is already special cased for string so the 
maintenance overhead wouldn't be too big.


That seems to be reasonable.

Drawback is that Phobos algorithms would not ever take advantage 
of consumeFront for anything else (even if that was defined).


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 10:54:41 UTC, deadalnix wrote:
I do agree with that. However, the current proposal have the 
same drawback but on the code that use the range.


Both solutions are messy IMO.


What is error-prone in current client code?

If particular algorithm wants to take advantage of a potential 
speed-up, it may decide to check whether hasConsume!Range, and 
call consumeFront instead of front + popFront. Nobody is obliged 
to do so.


The only problem I can see is potential code duplication, which 
is lower priority in performance-critical part of code, IMO.




Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 08:40:46 UTC, Jonathan M Davis wrote:
It depends on the API. All that the word read indicates is that 
data was read.
It doesn't indicate anything about what was read from or what 
happened to it.
In some cases, read doesn't alter what's read at all. In others 
it does.


There's an iterator type that I deal with at work which uses 
read to indicate
that a value was read but the iterator wasn't moved, and it 
uses consume to
indicate that not only was the value read but that the iterator 
was moved.


- Jonathna M Davis


If we have both front and readFront, it would be natural to 
expect readFront to have side-effect on range. But I'm fine with 
both options, just personally I would select read.


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 08:31:05 UTC, Jonathan M Davis wrote:
But if the default is that there is no consumeFront, then it'll 
only be there when the programmer determines that they need it 
and defines it. So, the default is safe, but the option of 
efficiency is there if the programmer codes for it.
OK, you sold it to me :) If some developer decides to use 
consumeFront for efficiency reasons, it should not be a problem 
to statically check hasConsume. What I was trying to avoid was 
code bloat at clients, assuming that they used consumeFront to 
have less code (one call instead of two). That would be a very 
different use case, and currently proposed design doesn't fit 
that use case. But it is really minor comparing to potential risk.


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 08:14:54 UTC, Jonathan M Davis wrote:

On Wednesday, July 04, 2012 09:55:57 Roman D. Boiko wrote:

Still would be nice to get your (or community) feedback on my
previous proposals to localize custom logic inside ranges which
invalidate value returned from front.


It sounds messy and error-prone, to be honest. And any range 
that needed to

implement it and didn't would be broken.


It would be broken only in cases when consumeFront is used, and 
that would likely be easily detectable. Still I don't have any 
idea how many such ranges exist.


But overall I agree.


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 08:20:59 UTC, Mehrdad wrote:
I propose we just allow (but not require) popFront() to return 
ElementType!(R) instead of void?


That way, people who need the performance can check to see the 
return type, and use it without front() if needed.


That would be almost perfect. But has drawbacks:

* if element is big and may be not needed in all calls to 
popFront (when the user wants to ignore elements), there would be 
some overhead;


* it might be impossible to change some ranges which would work 
with current design; I'm not aware of such cases, though


* something else?


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 08:25:19 UTC, Mehrdad wrote:
Where have you last ever seen a read() method whose default 
behavior was NOT to modify the 
collection/range/iterator/stream/whatever?


+1


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 08:13:10 UTC, Mehrdad wrote:

What's wrong with moveFront()?


It has different semantics. For example, it is only supported for 
ranges which have movable elements. After moveFront, the element 
which was placed in the front position is replaced by E.init (E 
is element type).


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 04:33:14 UTC, Jonathan M Davis wrote:
If no one has any major objections to this scheme or can 
provide a better proposal, I'll create a pull request from what 
I have.


No objections.

Still would be nice to get your (or community) feedback on my 
previous proposals to localize custom logic inside ranges which 
invalidate value returned from front.




Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 07:53:32 UTC, Roman D. Boiko wrote:

On Wednesday, 4 July 2012 at 07:40:55 UTC, monarch_dodra wrote:
I'll propose "spendFront", which is pretty much the same 
thing, but is less loaded. I'm not entirely sure if it carries 
the meaning as well? I think it is a worth discussion.


I'm not a native speaker, but I feel that "spend" would be a 
wrong choice.


fetchFront was proposed by Dmitry, which I like more than 
spendFront, but slightly less than consumeFront.


Another option might be "readFront". That's somewhere in the 
middle amond "take", "consume" and "fetch"...


Re: Proposal: takeFront and takeBack

2012-07-04 Thread Roman D. Boiko

On Wednesday, 4 July 2012 at 07:40:55 UTC, monarch_dodra wrote:
I'll propose "spendFront", which is pretty much the same thing, 
but is less loaded. I'm not entirely sure if it carries the 
meaning as well? I think it is a worth discussion.


I'm not a native speaker, but I feel that "spend" would be a 
wrong choice.


fetchFront was proposed by Dmitry, which I like more than 
spendFront, but slightly less than consumeFront.


By the way, I don't have any associations from the above. Consume 
is widely used in everyday life, and it should not be a problem 
that there are various uses.


As for introducing a new concept, it looks like that would be 
necessary any way, because existing concepts correspond to 
something different :) For example, that was my original 
motivation to vote against takeFront.




Re: Proposal: takeFront and takeBack

2012-07-03 Thread Roman D. Boiko
Still, ranges which invalidate front on popFront could defer that 
(store a flag that popFront or consumeFront has been called and 
invalidate value previously returned from front in the next call 
to front). That would be intrusive with respect to such ranges, 
and a breaking change. But it would simplify client code 
significantly.


(This doesn't mean that I'm insisting, or strongly prefer some of 
my two ideas, just wanted to clearly restate the second one.)


Re: Proposal: takeFront and takeBack

2012-07-03 Thread Roman D. Boiko
On Tuesday, 3 July 2012 at 18:12:54 UTC, 
trav...@phare.normalesup.org (Christophe Travert) wrote:


Range have been designed with the idea that front is valid 
until next
call to popFront. If popFront was to be called right after 
front, then
it would just be a popFront that returns a value, and maybe a 
justPop or

something if you don't want to copy the value.

It's delicate to come now and decide that if a previously 
returned front

value is no longer valid after a call to popFront, it should be
documented by an enum. Who is going to review all libraries 
(written and
to come) to make sure the enum is placed when it should be? The 
reverse
should be done instead: if you want you range to be optimized 
by calling

takeFront, define something (for example... takeFront). Then use
algorithms that are specialized for takeFront.


takeFront -> consumeFront

hasConsume has been proposed by Jonathan already.

But not having to modify all existing ranges which invalidate 
value returned by front might be a good argument against my hack 
also.


However I'm not sure there are really many of such ranges. Those 
must return something with reference semantics in order to be 
able to invalidate it. Most common case would be when front value 
is always stored in the same memory (a buffer).


The benefit of my hacking idea is that clients would need no 
special casing.


  1   2   3   >