AST based language: was Re: [The Java Posse] Re: JavaFX - oddities in the language? Week 2.

Joshua Marinacci Thu, 10 Sep 2009 07:50:00 -0700

I suspect you are right. I've asked this question of many people and
gotten a variety of reasons why it won't work. The reasons are always
valid, but they always boil down to the same thing: compatibility
with existing systems. If we could start over fresh *for everything*,
then I think an AST-based language would work quite well, and enable lots of
very interesting things. I've changed the subject line since this is
really getting off topic now. My goal is just to think meta for a
second.

If we could design a language, and all of its tools, from scratch
today, how would we do things differently?

== Proposal ==

Consider a language that is defined not in terms of tokens but in
terms of its abstract syntax tree (I'm not a compiler guy so I hope
I'm using the right terms here). Instead of saying:

        conditional is defined by 'if' + '(' + mathematical expression + ')' plus optional '{' then a clause etc.

what if it was defined as:

        conditional is defined by a boolean expression followed by two blocks

The details such as the 'if' keyword, requiring braces, using
parentheses, etc. would all be up to the individual developer (or at
least defined by their tools) rather than defined by the language.
Some sort of neutral binary or XML format would be the true storage
mechanism, and everything else we think of as "the language" would be
defined at a higher level on a per-developer basis. The neutral
format would be semantically equivalent to the code that the developer
sees on screen, but the on-screen presentation would be specific
entirely to them.
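To make the proposal a bit more concrete, here is a minimal Java sketch (assuming Java 16+ for records). The stored tree holds only the structure, a boolean expression followed by two blocks, and two hypothetical per-developer renderers turn the same tree into different surface syntax. All names here (`Expr`, `Block`, `Conditional`, `Renderers`, and the `when`/`otherwise` keywords) are made up for illustration. A real system would also need a parser or structural editor going the other way, and the neutral binary/XML storage format is not shown.

```java
import java.util.List;

// Hypothetical AST: a conditional is "a boolean expression followed by two blocks".
// Nothing here says 'if', braces, or parentheses; those are presentation choices.
record Expr(String text) {}
record Block(List<String> statements) {}
record Conditional(Expr condition, Block thenBlock, Block elseBlock) {}

class Renderers {
    // One developer's view: C/Java-style keyword, parentheses, braces, semicolons.
    static String renderCStyle(Conditional c) {
        StringBuilder sb = new StringBuilder();
        sb.append("if (").append(c.condition().text()).append(") {\n");
        appendStatements(sb, c.thenBlock(), "    ", ";");
        sb.append("} else {\n");
        appendStatements(sb, c.elseBlock(), "    ", ";");
        sb.append("}\n");
        return sb.toString();
    }

    // Another developer's view: indentation-based, different keywords, no braces.
    static String renderIndentStyle(Conditional c) {
        StringBuilder sb = new StringBuilder();
        sb.append("when ").append(c.condition().text()).append(":\n");
        appendStatements(sb, c.thenBlock(), "  ", "");
        sb.append("otherwise:\n");
        appendStatements(sb, c.elseBlock(), "  ", "");
        return sb.toString();
    }

    // Emits each statement on its own line with the view's indent and terminator.
    private static void appendStatements(StringBuilder sb, Block b, String indent, String terminator) {
        for (String s : b.statements()) {
            sb.append(indent).append(s).append(terminator).append('\n');
        }
    }
}
```

Both renderings come from the same `Conditional` value, so neither developer's keyword or brace choices ever reach the stored form.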

== Advantages ==

There are huge advantages to this approach.

* tabs vs spaces goes away. You see whatever you wish to see, and it  
doesn't affect other developers.

* comments could be nicely formatted rich text, including lists,  
tables, and diagrams.

* Line numbers in stacktraces: Consider the work required to turn the
location of a bytecode exception back into a line and column number.
It would be easier to map back to the AST. The compiler / runtime
would emit some sort of AST marker, which the IDE would convert back to
its visualization of your line / column (assuming you are still
editing in terms of lines and columns). Most likely it would highlight
the exact problematic branch of the tree, not just a line and column.

* refactoring becomes far easier, and could enable far more
interesting refactoring changes than the simple ones we have today.

* since we are using a binary / xml blob for the real storage, we  
wouldn't have to worry about files and filenames anymore. What would  
matter is modules and compilation units. The actual files it's stored  
in become irrelevant.

* code analysis: tools which analyze your code should be able to do a  
better job when they work at the 'meaning' level rather than the  
'syntax' level.

* code visualizers: It should be trivial to build things which draw  
UML diagrams of your beans, or show nested structures with darkening  
backgrounds. Almost all of the cool things you want to do boil down to  
visualizing a branch of the tree, making possible all sorts of very  
interesting visualizations.

* many syntax errors go away: since the IDE knows what the valid tree
should look like, it can prevent anything which would create an
invalid tree. Rather than scanning the whole file 20 times a second, it
can look at what you've done in the last few seconds that just made
the tree invalid and isolate the error to that. The result is more
accurate error reporting, even before you get to the compiler.

* never ever worry about some other developer f**king up your  
indentation, line breaks, curly brace scheme, etc.

* the potential to use different keywords, line terminators, and other  
syntax of your choosing and have it be completely isolated to your  
environment. No other developer is affected.
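The stacktrace idea could work roughly like this, assuming the compiler gives every AST node a stable id and the runtime reports the id of the failing node instead of a line number. Each developer's IDE renders the tree its own way and records where each node ended up, so the same marker resolves correctly whether that developer indents with two spaces or with tabs. `Node`, `RenderedView`, and the id scheme are hypothetical, and the "code" is reduced to one label per node to keep the sketch short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical AST node with a stable id. The runtime would report this id
// (the "AST marker") instead of a line number.
record Node(int id, String label, List<Node> children) {
    Node(int id, String label) {
        this(id, label, List.of());
    }
}

// One developer's rendering of the tree, plus where each node landed in it.
class RenderedView {
    private final StringBuilder text = new StringBuilder();
    // node id -> {start offset, end offset} of that node's subtree in the text
    private final Map<Integer, int[]> spans = new HashMap<>();

    static RenderedView render(Node root, String indentUnit) {
        RenderedView view = new RenderedView();
        view.renderNode(root, 0, indentUnit);
        return view;
    }

    private void renderNode(Node n, int depth, String indentUnit) {
        int start = text.length();
        text.append(indentUnit.repeat(depth)).append(n.label()).append('\n');
        for (Node child : n.children()) {
            renderNode(child, depth + 1, indentUnit);
        }
        spans.put(n.id(), new int[] {start, text.length()});
    }

    private int[] spanOf(int nodeId) {
        int[] span = spans.get(nodeId);
        if (span == null) {
            throw new IllegalArgumentException("unknown node id " + nodeId);
        }
        return span;
    }

    // Translate an AST marker into a 1-based line number in *this* view.
    int lineOf(int nodeId) {
        int start = spanOf(nodeId)[0];
        int line = 1;
        for (int i = 0; i < start; i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    // The whole subtree the marker points at, for highlighting the branch.
    String highlight(int nodeId) {
        int[] span = spanOf(nodeId);
        return text.substring(span[0], span[1]);
    }
}
```

For a tree `method { loop { sum += cells[y][x] } return sum }`, `lineOf` gives line 3 for the inner statement in both the two-space and the tab view, and `highlight` on the loop node returns the loop together with its nested statement rather than a single line.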

== Cons ==

* You've got to use an IDE. Yes, no more blindly editing text files
in vi and emacs. Sorry. It's the 21st century. I edit images in
Photoshop, not on the command line. I will now edit programs in a
programming tool.

* You've got to write IDE support for this. Building this new language
requires also building an IDE plugin that understands it.

* Text diff tools (and therefore source control systems) would have to
be updated to understand this binary / xml format. In theory the diffs
should be better since you'd have a better idea of what semantically
changed (tree diffing, basically), but someone still has to write the
tools to do it.

* Two developers working on their own machines would each see the code
views they expect. But a developer helping a colleague at the
colleague's machine would see a view completely unfamiliar to them.

* Web-based code review tools would show a normalized view that is
unfamiliar to all developers, or else code review would have to become
a module inside the IDE that picks up the preferences of the developer
doing the reviewing.
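Tree diffing can be sketched as a naive positional comparison of two stored trees. This is a deliberate simplification: children are matched by index only, so a subtree moved or inserted in the middle of a list shows up as a series of changes, which real tree-diff algorithms handle much better. `TreeNode` and `TreeDiff` are hypothetical names. Pure reformatting never appears in the output, because whitespace and braces are not part of the stored tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stored-AST node: a kind (e.g. "call", "var") and a value.
// Whitespace, braces and keywords are not stored, so they cannot appear in a diff.
record TreeNode(String kind, String value, List<TreeNode> children) {
    TreeNode(String kind, String value) {
        this(kind, value, List.of());
    }
}

class TreeDiff {
    // Naive positional tree diff: children are matched by index only.
    // Real tree-diff algorithms also detect moved and inserted subtrees.
    static List<String> diff(TreeNode before, TreeNode after) {
        List<String> changes = new ArrayList<>();
        walk(before, after, "root", changes);
        return changes;
    }

    private static void walk(TreeNode a, TreeNode b, String path, List<String> out) {
        if (!a.kind().equals(b.kind()) || !a.value().equals(b.value())) {
            out.add("changed " + path + ": " + a.value() + " -> " + b.value());
        }
        int common = Math.min(a.children().size(), b.children().size());
        for (int i = 0; i < common; i++) {
            walk(a.children().get(i), b.children().get(i), path + "/" + i, out);
        }
        for (int i = common; i < b.children().size(); i++) {
            out.add("added " + path + "/" + i + ": " + b.children().get(i).value());
        }
        for (int i = common; i < a.children().size(); i++) {
            out.add("removed " + path + "/" + i + ": " + a.children().get(i).value());
        }
    }
}
```

Renaming the variable `sum` to `total` as the second argument of a `println` call, for instance, yields the single entry `changed root/1: sum -> total`, no matter how either developer had their view formatted.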

Crazy idea, but it's the 21st century. We can handle it.  Now if  
you'll excuse me I've got to go take my flying car in for repairs  
before my weekend trip to Mars.

- j

On Sep 10, 2009, at 1:28 AM, Peter Becker wrote:

>
> And it all starts with the language specs still being written at the
> abstraction level of a concrete syntax. Chapter 1: Tokenization.
>
>  Peter
>
>
> Joshua Marinacci wrote:
>> RANT!
>>
>> Why, in the 21st century, are we still writing code with ASCII symbols
>> in text editors, and worrying about the exact indentation and whether
>> to use tabs, spaces, etc.?!!
>>
>> Since the IDE knows the structure of our code, why aren't we just
>> sharing ASTs directly, letting your IDE format it to your desire, and
>> only sharing the underlying AST with your fellow developers? Encoding,
>> spaces, braces, etc. are details that only matter when presented to
>> the human.
>>
>> What we do today is like editing image files from the commandline!
>>
>> On Sep 9, 2009, at 7:32 PM, Ryan Waterer wrote:
>>
>>> While experienced programmers might not worry about the braces on a
>>> single line, they are invaluable to junior programmers. I've
>>> trained a few who couldn't understand why the following
>>> code segment simply stopped working. (Let's not even start a
>>> discussion about System.out.println as a valid debugging tool, ok?
>>> This is just an example of a n00blet mistake.)
>>>
>>> for (int y = 0; y < lines; y++)
>>>   for (int x = 0; x < columns; x++)
>>>      System.out.println("The sum is: " + sum);
>>>       sum += cells[y][x];
>>>
>>>
>>> I agree that the braces add a bit of "clutter" to the visual look and
>>> feel of code. However, I feel that they help with the overall
>>> maintainability of the code, and therefore I disregard the way
>>> they look.
>>>
>>> --Ryan
>>>
>>>
>>> On Wed, Sep 9, 2009 at 8:24 PM, Jess Holle <je...@ptc.com> wrote:
>>>
>>>    I'll agree on the newlines and indents, but the braces are silly.
>>>
>>>    One might debate the extra whitespace inside the ()'s, but I find
>>>    it more readable with the whitespace -- to each his/her own in
>>>    that regard.
>>>
>>>
>>>    TorNorbye wrote:
>>>>    On Sep 9, 5:27 pm, Reinier Zwitserloot <reini...@gmail.com> wrote:
>>>>
>>>>>    Here's a line from my code:
>>>>>
>>>>>    for ( int y = 0 ; x < lines ; y++ ) for ( int x = 0 ; x < columns ; x++ ) sum += cells[y][x];
>>>>>
>>>>    I guess that's where we disagree.
>>>>
>>>>    for (int y = 0; y < lines; y++) {
>>>>        for (int x = 0; x < columns; x++) {
>>>>            sum += cells[y][x];
>>>>        }
>>>>    }
>>>>
>>>>    is IMHO better because:
>>>>    (a) I can see immediately that I'm dealing with a nested construct
>>>>    here, and that it's O(n^2)
>>>>    (b) I can more easily set breakpoints on individual statements of
>>>>    this code while debugging - and similarly other "line oriented"
>>>>    operations (like quickfixes etc.) get more cluttery when it's all
>>>>    on one line. Profiling data / statement counts / code coverage
>>>>    highlighting for the line is also trickier when you mash multiple
>>>>    statements into one line.
>>>>    (c) I think it's less likely that I would have made the "x < lines"
>>>>    error that was in your code when typing it this way because the
>>>>    handling of y and x was done separately on separate lines (though
>>>>    this is a bit speculative)
>>>>    (d) I removed your spaces inside the parentheses, because they are
>>>>    Bad! Bad!
>>>>
>>>>    (OK, c and d are padding)
>>>>
>>>>    I am -not- looking to minimize the number of lines needed to express
>>>>    code. If I wanted that, I'd be coding in Perl. I deliberately add
>>>>    newlines to make the code more airy and to group logical operations
>>>>    together. I always insert a newline before the final return-statement
>>>>    from a function etc.
>>>>
>>>>    I think the extra vertical space you've gained, which arguably could
>>>>    help you orient yourself in your code by showing more of the
>>>>    surrounding context, is lost because the code itself is denser and
>>>>    more difficult to visually scan.
>>>>
>>>>    Oh no, a formatting flamewar -- what have I gotten myself into?
>>>>
>>>>    -- Tor
>>>>
>>>>    P.S. No tabs!
