I suspect you are right. I've asked this question of many people and gotten a variety of reasons why it won't work. The reasons are always valid, but they always boil down to the same thing: compatibility with existing systems. If we could start over fresh *for everything*, then I think an AST-based language would work quite well, and enable lots of very interesting things.

I've changed the subject line since this is really getting off topic now. My goal is to just think meta for a second.
If we could design a language, and all of its tools, from scratch today, then how would we do things differently?

== Proposal ==

Consider a language that is defined not in terms of tokens but in terms of its abstract syntax tree (I'm not a compiler guy so I hope I'm using the right terms here). Instead of saying:

  conditional is defined by 'if' + '(' + mathematical expression + ')' plus optional '{' then a clause etc.

what if it was defined as:

  conditional is defined by a boolean expression followed by two blocks

The details such as the 'if' keyword, requiring braces, using parentheses, etc. would all be up to the individual developer (or at least defined by their tools) rather than defined by the language. Some sort of neutral binary or XML format would be the true storage mechanism, and everything else we think of as "the language" would be defined at a higher level on a per-developer basis. The neutral format would be semantically equivalent to the code that the developer sees on screen, while the on-screen view would be specific entirely to them.

== Advantages ==

There are huge advantages to this approach.

* tabs vs spaces goes away. You see whatever you wish to see, and it doesn't affect other developers.
* comments could be nicely formatted rich text, including lists, tables, and diagrams.
* Line numbers in stacktraces: Consider the work required to turn the location of the bytecode exception back into a line and column number. It would be easier to map back to the AST. The compiler / runtime would emit some sort of AST marker which the IDE would convert back to its visualization of your line / column (assuming you are still editing in terms of lines and columns). Most likely it would highlight the exact problematic branch of the tree, not just a line and column.
* refactoring becomes far easier, and could enable far more interesting refactoring changes than the simple ones we have today.
* since we are using a binary / XML blob for the real storage, we wouldn't have to worry about files and filenames anymore. What would matter is modules and compilation units. The actual files it's stored in become irrelevant.
* code analysis: tools which analyze your code should be able to do a better job when they work at the 'meaning' level rather than the 'syntax' level.
* code visualizers: It should be trivial to build things which draw UML diagrams of your beans, or show nested structures with darkening backgrounds. Almost all of the cool things you want to do boil down to visualizing a branch of the tree, making possible all sorts of very interesting visualizations.
* many syntax errors go away: since the IDE knows what the valid tree should look like, it can prevent anything which would create an invalid tree. Rather than scanning the whole file 20 times a second, it can look at what you've done in the last few seconds that just made the tree invalid and isolate the error to that. The result is more accurate error reporting, even before you get to the compiler.
* never ever worry about some other developer f**king up your indentation, line breaks, curly brace scheme, etc.
* the potential to use different keywords, line terminators, and other syntax of your choosing and have it be completely isolated to your environment. No other developer is affected.

== Cons ==

* You've got to use an IDE. Yes, no more blindly editing text files with vi and emacs. Sorry. It's the 21st century. I edit images in Photoshop, not the command line. I will now edit programs in a programming tool.
* You've got to write IDE support for this. Building this new language requires also building an IDE plugin that understands it.
* Text diff tools (and therefore source control systems) would have to be updated to understand this binary / XML format.
  In theory the diffs should be better since you'd have a better idea of what semantically changed (tree diffing, basically), but someone's still got to write the tools to do it.
* Two developers working on their own machines would see the code views they expect. One developer trying to help a second developer on his machine would see a view completely different from what they expect.
* Web based code review tools would show a normalized view that is unfamiliar to all developers, or else code review tools would have to be a new module inside the IDE to pick up the prefs of the developer doing the reviewing.

Crazy idea, but it's the 21st century. We can handle it. Now if you'll excuse me I've got to go take my flying car in for repairs before my weekend trip to Mars.

- j

On Sep 10, 2009, at 1:28 AM, Peter Becker wrote:

> And it all starts with the language specs still being written at the
> abstraction level of a concrete syntax. Chapter 1: Tokenization.
>
> Peter
>
> Joshua Marinacci wrote:
>> RANT!
>>
>> Why, in the 21st century, are we still writing code with ascii symbols
>> in text editors, and worried about the exact indentation and whether
>> to use tabs, spaces, etc?!!
>>
>> Since the IDE knows the structure of our code, why aren't we just
>> sharing ASTs directly, letting your IDE format it to your desire, and
>> only sharing the underlying AST with your fellow developers. Encoding,
>> spaces, braces, etc. is a detail that only matters when presented to
>> the human.
>>
>> What we do today is like editing image files from the commandline!
>>
>> On Sep 9, 2009, at 7:32 PM, Ryan Waterer wrote:
>>
>>> While experienced programmers might not worry about the braces on a
>>> single line, they become invaluable to any junior programmers. I've
>>> trained a few in which they couldn't understand why the following
>>> code segment simply stopped working. (Let's not even start a
>>> discussion about System.out.println as a valid debugging tool, ok?
>>> This is just an example of a n00blet mistake )
>>>
>>> for (int y = 0; y < lines; y++)
>>>     for (int x = 0; x < columns; x++)
>>>         System.out.println("The sum is: " + sum);
>>>         sum += cells[y][x];
>>>
>>> I agree that the braces add a bit of "clutter" to the visual look and
>>> feel of code. However, I feel that it helps with the overall
>>> maintainability of the code and therefore, I disregard the way that
>>> it looks.
>>>
>>> --Ryan
>>>
>>> On Wed, Sep 9, 2009 at 8:24 PM, Jess Holle <je...@ptc.com> wrote:
>>>
>>> I'll agree on the newlines and indents, but the braces are silly.
>>>
>>> One might debate the extra whitespace inside the ()'s, but I find
>>> it more readable with the whitespace -- to each his/her own in
>>> that regard.
>>>
>>> TorNorbye wrote:
>>>> On Sep 9, 5:27 pm, Reinier Zwitserloot <reini...@gmail.com> wrote:
>>>>
>>>>> Here's a line from my code:
>>>>>
>>>>> for ( int y = 0 ; x < lines ; y++ ) for ( int x = 0 ; x < columns ; x++ ) sum += cells[y][x];
>>>>>
>>>> I guess that's where we disagree.
>>>>
>>>> for (int y = 0; y < lines; y++) {
>>>>     for (int x = 0; x < columns; x++) {
>>>>         sum += cells[y][x];
>>>>     }
>>>> }
>>>>
>>>> is IMHO better because:
>>>> (a) I can see immediately that I'm dealing with a nested construct
>>>> here, and that it's O(n^2)
>>>> (b) I can more easily set breakpoints on individual statements of this
>>>> code while debugging - and similarly other "line oriented" operations
>>>> (like quickfixes etc) get more cluttery when it's all on one line.
>>>> Profiling data / statement counts / code coverage highlighting for the
>>>> line is also trickier when you mash multiple statements into one line.
>>>> (c) I think it's less likely that I would have made the "x < lines"
>>>> error that was in your code when typing it this way because the
>>>> handling of y and x were done separately on separate lines (though
>>>> this is a bit speculative)
>>>> (d) I removed your spaces inside the parentheses, because they are
>>>> Bad! Bad!
>>>>
>>>> (Ok c and d are padding)
>>>>
>>>> I am -not- looking to minimize the number of lines needed to express
>>>> code. If I wanted that, I'd be coding in Perl. I deliberately add
>>>> newlines to make the code more airy and to group logical operations
>>>> together. I always insert a newline before the final return-statement
>>>> from a function etc.
>>>>
>>>> I think the extra vertical space you've gained, which arguably could
>>>> help you orient yourself in your code by showing more of the
>>>> surrounding context, is lost because the code itself is denser and
>>>> more difficult to visually scan.
>>>>
>>>> Oh no, a formatting flamewar -- what have I gotten myself into?
>>>>
>>>> -- Tor
>>>>
>>>> P.S. No tabs!
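
To make the Proposal section at the top of this message a little more concrete, here is a rough sketch in Java (the language this list speaks). It is only an illustration under loud assumptions: AstSketch, Node, Leaf, Block, Conditional, Style, and the "when / then / otherwise / end" keywords are all invented for this example and do not correspond to any existing tool or storage format. The tree stores a conditional as "a boolean expression followed by two blocks"; the keywords, braces, indentation, and line terminators live only in each developer's Style.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one neutral tree, two developer-specific views. */
public class AstSketch {

    /** Marker for anything stored in the neutral tree. */
    public interface Node {}

    /** An expression or simple statement, kept as opaque text for brevity. */
    public static final class Leaf implements Node {
        final String text;
        public Leaf(String text) { this.text = text; }
    }

    /** An ordered list of statements. */
    public static final class Block implements Node {
        final List<Node> statements;
        public Block(Node... statements) { this.statements = Arrays.asList(statements); }
    }

    /** "A boolean expression followed by two blocks": no keywords, no braces. */
    public static final class Conditional implements Node {
        final Leaf condition;
        final Block whenTrue;
        final Block whenFalse;
        public Conditional(Leaf condition, Block whenTrue, Block whenFalse) {
            this.condition = condition;
            this.whenTrue = whenTrue;
            this.whenFalse = whenFalse;
        }
    }

    /** Per-developer surface syntax; it is never stored with the tree. */
    public static final class Style {
        final String open, afterCondition, otherwise, close, indent, terminator;
        public Style(String open, String afterCondition, String otherwise,
                     String close, String indent, String terminator) {
            this.open = open;
            this.afterCondition = afterCondition;
            this.otherwise = otherwise;
            this.close = close;
            this.indent = indent;
            this.terminator = terminator;
        }
    }

    /** One developer likes braces, semicolons, and four spaces. */
    public static final Style BRACES = new Style("if (", ") {", "} else {", "}", "    ", ";");
    /** Another prefers words, tabs, and no line terminator. */
    public static final Style WORDS = new Style("when ", " then", "otherwise", "end", "\t", "");

    /** Renders the same tree as text in the given style; pad is the current indentation. */
    public static String render(Node node, Style s, String pad) {
        if (node instanceof Leaf) {
            return pad + ((Leaf) node).text + s.terminator + "\n";
        }
        if (node instanceof Block) {
            StringBuilder out = new StringBuilder();
            for (Node child : ((Block) node).statements) {
                out.append(render(child, s, pad));
            }
            return out.toString();
        }
        // The only remaining node type in this sketch is Conditional.
        Conditional c = (Conditional) node;
        return pad + s.open + c.condition.text + s.afterCondition + "\n"
                + render(c.whenTrue, s, pad + s.indent)
                + pad + s.otherwise + "\n"
                + render(c.whenFalse, s, pad + s.indent)
                + pad + s.close + "\n";
    }

    /** A tiny tree both developers would share. */
    public static Node sample() {
        return new Conditional(new Leaf("sum > 100"),
                new Block(new Leaf("report(sum)")),
                new Block(new Leaf("sum = 0")));
    }

    public static void main(String[] args) {
        System.out.print(render(sample(), BRACES, ""));
        System.out.println();
        System.out.print(render(sample(), WORDS, ""));
    }
}
```

Both calls in main render the same tree returned by sample(); only the Style argument differs, which is the sense in which tabs vs spaces and brace placement stop being something developers share. A real system would also need the reverse direction (editing a view back into the tree), which this sketch does not attempt.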