Re: AST based language: was Re: [The Java Posse] Re: JavaFX - oddities in the language? Week 2.

Joshua Marinacci Thu, 10 Sep 2009 09:12:56 -0700

On Sep 10, 2009, at 8:15 AM, Reinier Zwitserloot wrote:

> In an AST based editing environment, this problem goes away. At write
> time you must of course have the plugin available to you, at which
>
Not necessarily. If the AST spec was written correctly you could have  
some sort of extensible block system (plugins, essentially).  If you  
are writing a regex block then the AST would contain some encoding of the  
regex along with some metadata about the plugin that created it.  This  
metadata could include a URL to download the plugin so that it works  
when you are in another IDE. It would also contain some sort of  
compilable representation, meaning you could still compile the code  
even if you don't have that particular plugin available to you. You  
just wouldn't be able to edit it.  Photoshop files do this. All text  
layers are vectors but keep rasterized metadata.  If you don't have  
the font that the layer was created with then you can't edit it, but  
you can still view it and composite it with other layers. It degrades  
gracefully. Now imagine if Photoshop not only told you the font you  
were missing, but offered to download it as well. AWESOM-O!
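
A minimal sketch of that kind of block in Java, as one way it could look rather than a spec. Every name here (PluginBlock, its fields, the example.com URL in the usage note) is hypothetical. The block carries the plugin's own encoding, metadata about where to get the plugin, and a plugin-independent form that can always be compiled.

```java
import java.util.Set;

/** An AST node created by an editor plugin. All names are hypothetical. */
final class PluginBlock {
    final String pluginId;      // which plugin created this block
    final String downloadUrl;   // where another IDE could fetch that plugin
    final String encodedSource; // plugin-specific payload, e.g. the regex text
    final String compiledForm;  // plugin-independent form the compiler understands

    PluginBlock(String pluginId, String downloadUrl,
                String encodedSource, String compiledForm) {
        this.pluginId = pluginId;
        this.downloadUrl = downloadUrl;
        this.encodedSource = encodedSource;
        this.compiledForm = compiledForm;
    }

    /** Compiling never needs the plugin; it only uses the fallback form. */
    String compile() {
        return compiledForm;
    }

    /** Editing is possible only when the creating plugin is installed. */
    boolean isEditable(Set<String> installedPlugins) {
        return installedPlugins.contains(pluginId);
    }

    /** What an IDE might show: the editable source, or a hint where to get the plugin. */
    String describe(Set<String> installedPlugins) {
        return isEditable(installedPlugins)
                ? "editable: " + encodedSource
                : "read-only; plugin available at " + downloadUrl;
    }
}
```

With a block built as new PluginBlock("regex", "https://example.com/regex-plugin", "[a-z]+", "Pattern.compile(\"[a-z]+\")"), an IDE without the regex plugin can still call compile(), and describe() falls back to the download hint. That is the same graceful degradation as the Photoshop text layer.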


> lit:Patt(ctrl+space to autocomplete it to java.util.regexp.Pattern)
> and then you type a regular expression; the job of marking the end is
> trivial where in raw character typing mode it's almost unsolvable.
>
> You can take this same idea even further and add support for macros:
> foreach could have been implemented as a macro, but this time, the AST
> node carries its origin  with it. This way, I can switch my editor at
> will, and type/read either in macro syntax, or in the desugared form.
> Making the IDE extensively pluggable would be so much easier. There
> would be no closure debate at all - those who like em use a macro
> plugin that renders closures as Anonymous Inner Class constructs.

I didn't follow the closure debate too closely, but I'm not sure this  
is true. Aren't there things which closures can do (semantically) that  
anonymous inner classes can't? Was it just syntax sugar?
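
One semantic difference that is easy to show is what 'this' refers to. The sketch below uses Java 8 lambdas, which arrived after this thread and are not the same thing as any of the closure proposals under debate, so treat it as an illustration only. ThisDemo and Namer are made-up names. The proposals also discussed other differences, such as non-local control flow, which this sketch does not cover.

```java
/** Shows that 'this' means different things in a lambda and an anonymous class. */
final class ThisDemo {
    interface Namer {
        String name();
    }

    @Override
    public String toString() {
        return "outer";
    }

    /** Inside an anonymous inner class, 'this' is the inner object itself. */
    Namer viaAnonymousClass() {
        return new Namer() {
            @Override
            public String name() {
                return this.toString();
            }

            @Override
            public String toString() {
                return "inner";
            }
        };
    }

    /** Inside a lambda, 'this' is the enclosing ThisDemo instance. */
    Namer viaLambda() {
        return () -> this.toString();
    }
}
```

Calling new ThisDemo().viaAnonymousClass().name() gives "inner", while new ThisDemo().viaLambda().name() gives "outer". A macro that blindly expands one form into the other would change the meaning of code that mentions 'this', so at least in this respect it is not pure syntax sugar.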

> Because whitespace, import statements, and other ambiguities melt
> away, in certain ways the canonical textual representation (which ISNT
> how you're supposed to edit things and would be extremely unwieldy) is

Yes. It would probably be XML just to make life easier for tools (and  
to allow by hand editing in the rare cases when you must fix  
something) but 99% of the time the AST on disk representation is  
simply a black box. You must always use the tools.
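
A rough sketch of what that split could look like, assuming a made-up XML vocabulary (the element names and the Conditional class are hypothetical, not any real format). There is one canonical on-disk form, and each developer's tools render it however they like.

```java
/** A conditional as a tree node: a boolean test plus two blocks. */
final class Conditional {
    final String test;      // boolean expression, kept as text for brevity
    final String thenBlock; // block run when the test is true
    final String elseBlock; // block run when the test is false

    Conditional(String test, String thenBlock, String elseBlock) {
        this.test = test;
        this.thenBlock = thenBlock;
        this.elseBlock = elseBlock;
    }

    /** Canonical on-disk form: identical for every developer. */
    String toXml() {
        return "<conditional>"
                + "<test>" + escape(test) + "</test>"
                + "<then>" + escape(thenBlock) + "</then>"
                + "<else>" + escape(elseBlock) + "</else>"
                + "</conditional>";
    }

    /** One developer's view: C-style keywords and braces. */
    String renderBraces() {
        return "if (" + test + ") { " + thenBlock + " } else { " + elseBlock + " }";
    }

    /** Another developer's view of the same node, with different keywords. */
    String renderKeywords() {
        return "when " + test + " do " + thenBlock + " otherwise " + elseBlock + " end";
    }

    /** Minimal XML escaping for element text. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Both render methods read the same three fields, so switching views never touches what is stored. Only toXml() defines what other developers and tools receive.<then>a()</then><else>b()</else></conditional>")) throw new AssertionError("xml");
if (!c.renderBraces().equals("if (x < 10) { a() } else { b() }")) throw new AssertionError("braces");
if (!c.renderKeywords().equals("when x < 10 do a() otherwise b() end")) throw new AssertionError("keywords");
</test>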

> The biggest issue remains that so much of the entirety of the
> development ecosystem is built around the notion that source lives as
> raw streams of characters. There would definitely have to be a human-
> readable canonical representation so you can interop with such tools
> until they also see the light. There may also be an interesting lesson
> in how many typical geeks doing professional writing use something
> like HTML or LaTeX, writing it essentially 'raw', instead of using
> open office or word. I think there are different reasons for that, but
> it is nevertheless interesting to see that shiny, graphical tools are
> losing to raw char streams in some areas.

Yep. Backwards compatibility is a bitch. :)

- J

>
> To the galaxy, and beyond!
>
> On Sep 10, 4:49 pm, Joshua Marinacci <jos...@marinacci.org> wrote:
>> I suspect you are right. I've asked this question of many people and
>> gotten a variety of reasons why it won't work. The reasons are always
>> valid,  but they always boil down to the same thing: compatibility
>> with existing systems.  If we could start over fresh *for  
>> everything*,
>> then I think an AST-based language would work quite well, and enable lots of
>> very interesting things. I've changed the subject line since this is
>> really getting off topic now.  My goal is to just think meta for a
>> second.
>>
>> If we could design a language, and all of its tools, from scratch
>> today; then how would we do things differently?
>>
>> == Proposal ==
>>
>> Consider a language that is defined not in terms of tokens but in
>> terms of its abstract syntax tree (I'm not a compiler guy so I hope
>> I'm using the right terms here). Instead of saying:
>>
>>         conditional is defined by 'if' + '(' + mathematical expression + ')'
>>         plus optional '{' then a clause etc.
>>
>> what if it was defined as:
>>
>>         conditional is defined by a boolean expression followed by two blocks
>>
>> The details such as the 'if' keyword, requiring braces, using
>> parentheses, etc. would all be up to the individual developer (or at
>> least defined by their tools) rather than defined by the language.
>> Some sort of neutral binary or XML format would be the true storage
>> mechanism and everything else we think of as "the language" would be
>> defined at a higher level on a per developer basis.  The neutral
>> format would be semantically equivalent to the code that the  
>> developer
>> sees on screen, but specific entirely to them.
>>
>> == Advantages ==
>>
>> There are huge advantages to this approach.
>>
>> * tabs vs spaces goes away. You see whatever you wish to see, and it
>> doesn't affect other developers.
>>
>> * comments could be nicely formatted rich text, including lists,
>> tables, and diagrams.
>>
>> * Line numbers in stacktraces:  Consider the work required to turn  
>> the
>> location of the bytecode exception back into a line and column  
>> number.
>> It would be easier to map back to the AST. The compiler / runtime
>> would emit some sort of AST marker which the IDE would convert back  
>> to
>> its visualization of your line / column (assuming you are still
>> editing in terms of lines and columns). Most likely it would  
>> highlight
>> the exact problematic branch of the tree, not just a line and column.
>>
>> * refactoring becomes far easier, and could enable far more
>> interesting refactoring changes than the simple ones we have today.
>>
>> * since we are using a binary / xml blob for the real storage, we
>> wouldn't have to worry about files and filenames anymore. What would
>> matter is modules and compilation units. The actual files it's stored
>> in become irrelevant.
>>
>> * code analysis: tools which analyze your code should be able to do a
>> better job when they work at the 'meaning' level rather than the
>> 'syntax' level.
>>
>> * code visualizers: It should be trivial to build things which draw
>> UML diagrams of your beans, or show nested structures with darkening
>> backgrounds. Almost all of the cool things you want to do boil down  
>> to
>> visualizing a branch of the tree, making possible all sorts of very
>> interesting visualizations.
>>
>> * many syntax errors go away: since the IDE knows what the valid tree
>> should look like, it can prevent anything which would create an
>> invalid tree. rather than scanning the whole file 20 times a second  
>> it
>> can look at what you've done in the last few seconds that just made
>> the tree invalid and isolate the error to that. the result is more
>> accurate error reporting, even before you get to the compiler.
>>
>> * never ever worry about some other developer f**king up your
>> indentation, line breaks, curly brace scheme, etc.
>>
>> * the potential to use different keywords, line terminators, and  
>> other
>> syntax of your choosing and have it be completely isolated to your
>> environment. No other developer is affected.
>>
>> == Cons ==
>>
>> * You've got to use an IDE. Yes, no more blindly editing text files
>> with vi and emacs. Sorry. It's the 21st century. I edit images in
>> Photoshop, not the command line. I will now edit programs in a
>> programming tool.
>>
>> * You've got to write IDE support for this. Building this new  
>> language
>> requires also building an IDE plugin that understands it.
>>
>> * Text diff tools (and therefore source control systems) would have  
>> to
>> be updated to understand this binary / xml format. In theory the  
>> diffs
>> should be better since you'd have a better idea of what semantically
>> changed (tree diffing, basically), but someone's still got to write  
>> the
>> tools to do it.
>>
>> * Two developers working on their own machines would see the code
>> views they expect. One developer trying to help a second developer on
>> his machine would see a view completely unfamiliar to what they  
>> expect.
>>
>> * Web based code review tools would show a normalized view that is
>> unfamiliar to all developers, or else code review tools would have to
>> be a new module inside the IDE to pick up the prefs of the developer
>> doing the reviewing.
>>
>> Crazy idea, but it's the 21st century. We can handle it.  Now if
>> you'll excuse me I've got to go take my flying car in for repairs
>> before my weekend trip to Mars.
>>
>> - j
>>
>> On Sep 10, 2009, at 1:28 AM, Peter Becker wrote:
>>
>>
>>
>>
>>
>>> And it all starts with the language specs still being written at  
>>> the
>>> abstraction level of a concrete syntax. Chapter 1: Tokenization.
>>
>>>  Peter
>>
>>> Joshua Marinacci wrote:
>>>> RANT!
>>
>>>> Why, in the 21st century, are we still writing code with ascii
>>>> symbols
>>>> in text editors, and worried about the exact indentation and  
>>>> whether
>>>> to use tabs, spaces, etc?!!
>>
>>>> Since the IDE knows the structure of our code, why aren't we just
>>>> sharing ASTs directly, letting your IDE format it to your desire,  
>>>> and
>>>> only sharing the underlying AST with your fellow developers.
>>>> Encoding,
>>>> spaces, braces, etc. is a detail that only matters when presented  
>>>> to
>>>> the human.
>>
>>>> What we do today is like editing image files from the commandline!
>>
>>>> On Sep 9, 2009, at 7:32 PM, Ryan Waterer wrote:
>>
>>>>> While experienced programmers might not worry about the braces  
>>>>> on a
>>>>> single line, they become invaluable to any junior programmers.   
>>>>> I've
>>>>> trained a few in which they couldn't understand why the following
>>>>> code segment simply stopped working.  (Let's not even start a
>>>>> discussion about System.out.println as a valid debugging tool, ok?
>>>>> This is just an example of a n00blet mistake )
>>
>>>>> for (int y = 0; y < lines; y++)
>>>>>   for (int x = 0; x < columns; x++)
>>>>>      System.out.println("The sum is: " + sum);
>>>>>       sum += cells[y][x];
>>
>>>>> I agree that the braces add a bit of "clutter" to the visual look
>>>>> and
>>>>> feel of code.  However,  I feel that it helps with the overall
>>>>> maintainability of the code and therefore, I disregard the way  
>>>>> that
>>>>> it looks.
>>
>>>>> --Ryan
>>
>>>>> On Wed, Sep 9, 2009 at 8:24 PM, Jess Holle <je...@ptc.com
>>>>> <mailto:je...@ptc.com>> wrote:
>>
>>>>>    I'll agree on the newlines and indents, but the braces are  
>>>>> silly.
>>
>>>>>    One might debate the extra whitespace inside the ()'s, but I  
>>>>> find
>>>>>    it more readable with the whitespace -- to each his/her own in
>>>>>    that regard.
>>
>>>>>    TorNorbye wrote:
>>>>>>    On Sep 9, 5:27 pm, Reinier Zwitserloot <reini...@gmail.com>
>>>>>> <mailto:reini...@gmail.com> wrote:
>>
>>>>>>>    Here's a line from my code:
>>
>>>>>>>    for ( int y = 0 ; x < lines ; y++ ) for ( int x = 0 ; x < columns ; x++ ) sum += cells[y][x];
>>
>>>>>>    I guess that's where we disagree.
>>
>>>>>>    for (int y = 0; y < lines; y++) {
>>>>>>        for (int x = 0; x < columns; x++) {
>>>>>>            sum += cells[y][x];
>>>>>>        }
>>>>>>    }
>>
>>>>>>    is IMHO better because:
>>>>>>    (a) I can see immediately that I'm dealing with a nested
>>>>>> construct
>>>>>>    here, and that it's O(n^2)
>>>>>>    (b) I can more easily set breakpoints on individual statements
>>>>>> of this
>>>>>>    code while debugging - and similarly other "line oriented"
>>>>>> operations
>>>>>>    (like quickfixes etc) get more cluttery when it's all on one
>>>>>> line.
>>>>>>    Profiling data / statement counts / code coverage highlighting
>>>>>> for the
>>>>>>    line is also trickier when you mash multiple statements into
>>>>>> one line.
>>>>>>    (c) I think it's less likely that I would have made the "x <
>>>>>> lines"
>>>>>>    error that was in your code when typing it this way because  
>>>>>> the
>>>>>>    handling of y and x were done separately on separate lines
>>>>>> (though
>>>>>>    this is a bit speculative)
>>>>>>    (d) I removed your spaces inside the parentheses, because they
>>>>>> are
>>>>>>    Bad! Bad!
>>
>>>>>>    (Ok c and d are padding)
>>
>>>>>>    I am -not- looking to minimize the number of lines needed to
>>>>>> express
>>>>>>    code.  If I wanted that, I'd be coding in Perl.  I
>>>>>> deliberately add
>>>>>>    newlines to make the code more airy and to group logical
>>>>>> operations
>>>>>>    together. I always insert a newline before the final return-
>>>>>> statement
>>>>>>    from a function etc.
>>
>>>>>>    I think the extra vertical space you've gained, which arguably
>>>>>> could
>>>>>>    help you orient yourself in your code by showing more of the
>>>>>>    surrounding context, is lost because the code itself is denser
>>>>>> and
>>>>>>    more difficult to visually scan.
>>
>>>>>>    Oh no, a formatting flamewar -- what have I gotten myself  
>>>>>> into?
>>
>>>>>>    -- Tor
>>
>>>>>>    P.S. No tabs!
> >


