Sorry to be contrarian... but you did invite this with this subject line :-)
On Sun, Sep 7, 2014 at 4:21 AM, Jonathan S. Shapiro <[email protected]> wrote:

> On Fri, Sep 5, 2014 at 6:31 PM, Ben Kloosterman <[email protected]> wrote:
>
>> IMHO a better test would be to write it ASAP, get the language out, then
>> see how hard it is to re-factor when you self-host. I bet that will
>> change the whole way it's structured and force language changes to make
>> it easier.
>
> How would you go about that? Remember that we've already done a
> C++-hosted version of this compiler using YACC.

That is a very out-of-date version compared to some of the things that have
been discussed. I was thinking: get it self-hosting first using yacc, and
get yacc to create BitC code instead of C. If that proves problematic, then
keep improving.

>> "*ad hoc* parse rules and/or parse ambiguities" are common in nearly all
>> languages - with good reason: they have taken short cuts to get
>> something out the door.
>
> For many languages that is so. *None* of those languages are languages
> that anyone has attempted to do regular formal reasoning about
> successfully.

I can turn that around too :-) Are there any successful languages that have
done formal reasoning?

>> Javascript and Linux are great examples - our world is filled with
>> second-rate products because the better ones never get finished.
>
> Agreed. And if their approach had led to sufficiently survivable systems,
> we wouldn't be attempting to do BitC at all. But it didn't, did it?

Javascript continues to grow and improve. C# and Java are very survivable;
they just haven't done the lower-level stuff - C# could have replaced a lot
of C if they had gone just a bit further in that direction. There are other
languages like Rust which may close the window (though I think Rust is too
hard to use for the average dev), and new ones are coming. C# (like most)
has had massive overhauls, and safe set-based operations via LINQ have
probably doubled the productivity of devs in the last few years.
>> The whole purpose of modern languages / techniques is to be able to
>> re-factor and improve it...
>
> That's just not true. But it *is* true that incremental re-factoring and
> improvement in place of disciplined design has been the fad in computing
> for the last ten years. That has happened for reasons of economic and
> competitive pressure rather than technical merit.

No, it has happened for many reasons. As you explore more, alternatives and
design changes are forced on you, and business (and the world) can change
very quickly these days. A flexible, loose design, rather than a specific
but fixed one, will be better able to handle inevitable change as we learn
more (person, team, business, community) or as the preconditions change.
Most projects have fixed budgets; when there is too much over-engineering
(which is a bigger problem on many systems), or unexpected things come up,
you need to make short cuts. Most projects I see with great / fancy designs
NEVER get finished.

I saw a good one go belly-up recently: CSIRO, Australia's government
research organization, blew a good 10M building an underground automated
mining truck to put explosives in the ground. The main reason it failed was
that they were not happy with off-the-shelf components, so they built their
own robot arm, operating system, etc. By the time they got to the core
work, most of the budget was gone, and they couldn't overcome some serious
real-world issues with what was left.

This means to me that the more ambitious the goal, the more I want to see
the core system working with the key high-risk features, which is then
improved. Once you have a working system, the priority list becomes far
more accurate. For BitC this means things like mutable types, copy, regions
/ memory management, the new type system & interfaces, multiple code units,
and type classes.
After the initial version you were confident you had a good handle on all
of this, which I agree with, but I would like to see these in place (and
yes, I'm willing to help, or more). You may say a new parser is just a
nice-to-have that stops you working with some dirty things. To me: use yacc
or whatever to get it working; if that is too hard, then change.

> It has left us with systems that are not just insecure, but *not
> securable in principle*. BitC is a step on the path to changing that.
> Its goals can't be met by making it up as we go along.

I don't think the development approach has anything to do with security
(though I appreciate how hacks with tokens make it difficult to verify this
part). Security mainly depends on having a security model and sticking to
it. We are left with poor security models because when most of these things
were designed it was less of an issue, and we have learned since. The new
Windows 8 API has a great security model, which is better than Android's,
which is better than C#'s, which is better than Javascript's, which is
better than Java's, which is better than Modula's, which is better than
C's. (Some of these are arguable, but it's basically a timeline, and we
have learned better techniques.)

>> (which is the main reason the parser and tokeniser are split - while
>> it's worse / less efficient than a combined custom implementation, it
>> does allow better / simpler code).
>
> Can you cite a source for that (fairly implausible) assertion?

I don't see how this is implausible; it's just common sense - strange that
you should question it. Layering / staging is always less efficient but
allows simplifications. A single layer / stage allows tricks that serve
both concerns, tricks which require hacks once you split into layers /
stages.

> Historically, both parts were done by hand, and they *were* fused. Parser
> generators came about because maintaining parsers by hand is too hard to
> allow reasonable maintenance or refactoring.
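To make the layering point concrete, here is a toy sketch (names and grammar are mine, nothing to do with the actual BitC sources): a standalone tokenizer stage whose token stream feeds a separate parser stage. Each side stays simple because neither reaches into the other, even though a fused scanner could cheat and be faster.

```python
import re

# Toy token spec for a sum language; purely illustrative.
TOKEN_SPEC = [("NUM", r"\d+"), ("PLUS", r"\+"), ("SKIP", r"\s+")]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(src):
    """Stage 1: characters in, flat token stream out."""
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

def parse(tokens):
    """Stage 2: consumes tokens only, never raw characters.
    Grammar: sum -> NUM (PLUS NUM)*  (evaluated on the fly)."""
    toks = list(tokens)
    total = int(toks[0][1])
    i = 1
    while i < len(toks):
        assert toks[i][0] == "PLUS"
        total += int(toks[i + 1][1])
        i += 2
    return total

print(parse(tokenize("1 + 2 + 39")))  # prints 42
```

The phase boundary is just the token tuple: swap in a faster tokenizer (or a table-driven one) and the parser never notices.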
> Lexer generators came along with the same idea, but didn't prove to add
> enough value in practice to justify themselves.

I don't think we are disagreeing; that's basically what I was saying (or
intended to): layers / separate components simplify and improve
maintenance (and can sometimes be heavily optimized for their domain). You
can write a faster single stage with fewer dirty hacks, but it's harder to
maintain / more complex / more costly.

> The usual evolution of these things is that languages use a parser
> generator until they petrify, and then shift to a recursive descent
> hand-written parser for the sake of better error handling. This is
> actually why I'm doing an LL grammar rather than an LR grammar; the LL
> grammar can emit a recursive descent parser directly.

I agree with this, but the standard languages may have gotten the core
components out first, and then, with more people, had the resources to
dedicate someone to improving the parser / token generator. Even if you do
get large independent funding, that will be more likely with the core
concepts in place and partially proven.

> One of the features that I initially resisted in BitC was layout. It
> turns out that layout is useful, but its implementation in most languages
> is a *mess*. Most of the popular layout schemes violate the
> parser/tokenizer phase boundary. The most notorious of these is the
> Haskell reliance on parse-error(t) in the L function (L, incidentally,
> may be the most opaque specification I've ever seen).

:-)

> Michael Adams actually makes the point very well with regard to parsing
> of layout:
>
> The lack of a standard formalism for expressing these layout rules and of
> parser generators for such a formalism increases the complexity of
> writing parsers for these languages. Often, practical parsers for these
> languages have significant structural differences from the language
> specification.
> [T]he structural differences between the implementation and the
> specification make it difficult to determine if one accurately reflects
> the other.
>
> That statement could equally well be made of most modern language grammar
> specifications. C++ grammars, for example, are plagued with ambiguities
> and phase boundary violations that *commonly* lead to subtle
> disagreements between implementations, and it is not really possible to
> know from the standard which implementation is correct.
>
> Perhaps you have the impression that the parser generator is what has
> been slowing me down. Actually, that's not true. There are a lot of other
> things going on here, and I'm not getting to spend a whole lot of time on
> BitC. I make progress as I can.

Yes, I did have that impression. I know you want a great design, and you
have that; I just want to see it in reality (and work on the GC once it's
self-hosted), and not be caught in design paralysis.

Ben
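On the LL-grammar point earlier in the thread, here is a minimal sketch of why an LL(1) grammar "emits" a recursive descent parser directly: each production becomes one function, and a single token of lookahead picks the branch. The grammar and names here are a hypothetical toy, not BitC's.

```python
import re

def tokenize(src):
    # Trivial scanner for the toy grammar below.
    return re.findall(r"\d+|[()+\-*/]", src)

class Parser:
    """One function per LL(1) production:
         Expr   -> Term (('+'|'-') Term)*
         Term   -> Factor (('*'|'/') Factor)*
         Factor -> NUM | '(' Expr ')'
    """
    def __init__(self, toks):
        self.toks, self.pos = toks, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, tok):
        assert self.peek() == tok, f"expected {tok}, got {self.peek()}"
        self.pos += 1

    def expr(self):                       # Expr -> Term (('+'|'-') Term)*
        v = self.term()
        while self.peek() in ("+", "-"):
            op = self.toks[self.pos]; self.pos += 1
            t = self.term()
            v = v + t if op == "+" else v - t
        return v

    def term(self):                       # Term -> Factor (('*'|'/') Factor)*
        v = self.factor()
        while self.peek() in ("*", "/"):
            op = self.toks[self.pos]; self.pos += 1
            f = self.factor()
            v = v * f if op == "*" else v // f   # integer division for the toy
        return v

    def factor(self):                     # Factor -> NUM | '(' Expr ')'
        if self.peek() == "(":
            self.eat("(")
            v = self.expr()
            self.eat(")")
            return v
        v = int(self.peek()); self.pos += 1
        return v

print(Parser(tokenize("2 * (3 + 4)")).expr())  # prints 14
```

Because the code mirrors the productions one-to-one, error messages can name the production being parsed - which is exactly the error-handling advantage Jonathan mentions for hand-written recursive descent.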
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
