Casper: I don't think you grokked my point. I'm saying it's impossible to build any java, vanilla or otherwise, that can handle this, for the reasons I stated: you'd have to flip the architecture upside down and resolve 'DSL' properly midway through tokenizing it. Be aware that this automatically means that any error caused by the DSL provider *HAS TO* stop the parsing right there on the spot, with no further error reporting for anything that follows the DSL block. The trick IDEs use, where they still produce a class file and replace whatever methods have syntax errors in them with dummies that throw exceptions, would be impossible.

You'd be giving up an awful lot. Don't get me wrong, I love the idea, but I haven't seen a workable proposal yet. I'm leaning towards the notion that it's impossible to get right. Fan tries to use a sufficiently arcane separator (bar-angle, so <| special code goes here |>), but if java uses the same thing, then you can't embed fan in java. That's not a solution.

Here's a simplistic approach to something that might actually work:

1. Identifier resolution is decoupled from the rest of the source file for parsing. In other words, the parser will parse all import statements, resolve them, and only then continue on its way.

2. Blocks start with a hash, followed by a type identifier. This type identifier is resolved only according to import statements; to make this smooth, the definitions for how to handle these blocks MUST ALWAYS be top level members, no exceptions. Now the parser does not have to consider inner classes and such to resolve the name; the process of checking the current package and all import statements suffices.

3. The tokenizer will remember the character that followed the token (i.e. the non-identifier character that immediately followed the last identifier character in the DSL name, which can be a space, a quote, a brace, whatever), and stuffs it back into the source view. The tokenizer then hands the raw source (as a Reader or some such) off to the .tokenize() method of the provider. That method MUST return an object (of any type), and MUST have consumed exactly up to (and including) the closing element of the DSL block.

4. During compilation, the DSL block (which is an expression which can have an arbitrary type, including void) is translated into a pure java expression by calling the .parse() method of the DSL provider.

5. Exceptions during the tokenize phase result in the immediate end of parsing that java source file, as javac will not know where to continue.

Exceptions during the parse method aren't nearly as drastic; it just means there's an error in the DSL block and the block's expression is of an unknown type - certainly not rocket science compared to the advanced error recovery employed in many IDEs.

    public interface DSLProvider<T> {
        public T tokenize(SourceReader reader);
        public String parse(T token, Context c);
    }

Some open issues are:

What should 'parse' return - there's an argument to be made for 'bytecode', 'raw java source as a String', and 'a JCExpression object' (from javac's internal AST classes). Each has its advantages and disadvantages.

Context is some useful construct that allows access to variables legal in the current scope, the filer (for looking up types), and similar things. A lot of this API already exists (annotation processor API).

Such a system could rather easily support a wide variety of stuff you may wish to inject into java source files:

- String literals
- Regexp literals - the compiled regexp tree would be stored into the class file.
- XML literals
- multiline and/or raw string literals.
- python - even including python's whitespace based delimiting as the mechanism to delimit the block ITSELF, if you think that is a good idea.
- Clojure, LISP, and other lisp dialects.
- just about every programming language in existence (incl. ruby, Javascript, C, C#, C++, fortran, ada, and, sure, why not - APL).

The documentation should stress that the .tokenize() method really should try its very best to return and not throw an exception.

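To make the interface above a bit more concrete, here's a rough sketch of what a provider for the regexp case might look like. Only the DSLProvider shape comes from the proposal; everything else is an assumption for illustration: SourceReader is a tiny hypothetical stand-in with just read() and unread(), Context is an empty placeholder, RegexpToken and RegexpProvider are made-up names, and parse() returns raw java source as a String, which is just one of the three candidate return types.

```java
// Hypothetical stand-in for the proposal's SourceReader: a view over the raw
// source that lets a provider read characters and push the last one back.
final class SourceReader {
    private final String source;
    private int pos;

    SourceReader(String source) { this.source = source; }

    int read() { return pos < source.length() ? source.charAt(pos++) : -1; }

    void unread() { if (pos > 0) pos--; }
}

// Empty placeholder; the real Context would expose scope, the filer, and so on.
interface Context {}

// Same shape as the interface in the proposal.
interface DSLProvider<T> {
    T tokenize(SourceReader reader);
    String parse(T token, Context c);
}

// Made-up token type: the raw pattern text plus its trailing flag letters.
final class RegexpToken {
    final String pattern;
    final String flags;

    RegexpToken(String pattern, String flags) {
        this.pattern = pattern;
        this.flags = flags;
    }
}

final class RegexpProvider implements DSLProvider<RegexpToken> {
    @Override
    public RegexpToken tokenize(SourceReader in) {
        int c = in.read();
        while (c == ' ' || c == '\t') c = in.read();   // skip blanks after the DSL name
        // A real provider should try hard to return something instead of throwing;
        // throwing keeps this sketch short.
        if (c != '/') throw new IllegalStateException("expected '/' to open the block");

        StringBuilder pattern = new StringBuilder();
        while ((c = in.read()) != '/') {
            if (c == -1) throw new IllegalStateException("unterminated regexp block");
            pattern.append((char) c);
            if (c == '\\') {                           // keep the escaped char, so "\/" does not end the block
                int next = in.read();
                if (next == -1) throw new IllegalStateException("unterminated regexp block");
                pattern.append((char) next);
            }
        }

        StringBuilder flags = new StringBuilder();
        while ((c = in.read()) != -1 && Character.isLetter(c)) flags.append((char) c);
        if (c != -1) in.unread();                      // hand the first foreign character back to javac
        return new RegexpToken(pattern.toString(), flags.toString());
    }

    @Override
    public String parse(RegexpToken token, Context c) {
        // Only the 'i' flag is handled here; anything else is silently ignored.
        String flagExpr = token.flags.contains("i")
                ? ", java.util.regex.Pattern.CASE_INSENSITIVE" : "";
        return "java.util.regex.Pattern.compile(\"" + escape(token.pattern) + "\"" + flagExpr + ")";
    }

    // Escape backslashes and quotes so the pattern survives as a java string literal.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

class RegexpDslDemo {
    public static void main(String[] args) {
        // The source as javac would see it right after the '#regexp' name.
        SourceReader in = new SourceReader(" /[abc]d\\s+(\\d*)/i; int next;");
        RegexpProvider provider = new RegexpProvider();
        RegexpToken token = provider.tokenize(in);
        System.out.println(provider.parse(token, new Context() {}));
        System.out.println("javac resumes at: " + (char) in.read());
    }
}
```

Note that tokenize() does no real work: it only finds the closing slash and the flags, then pushes back the first character it doesn't own so javac can carry on from there. Everything that can fail in an interesting way is deferred to parse(), which is the whole point of splitting the two phases.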
Hypothetical source:

    int x = #python:
        5 + 5
    int thisIsJavaAgain;

    String longString = #long """This is a long string where \backslashes need not be escaped""" + "this is parsed by javac again";

    Pattern p = #regexp /[abc]d\s+(\d*)/i;

Presuming that the context object is sufficiently advanced, this should also be possible, especially if you add a way to parse a java snippet in that context:

    private final Comparator<Integer> absoluteComparator = #closure Comparator(Integer a, Integer b) {
        return Integer.compare(Math.abs(a), Math.abs(b));
    };

Of course, trying to include java inside such a block has the same issue as javac's original problem: How does the closure DSL provider know where the closure ends without being as complicated as javac's tokenizer? Theoretically java itself could be implemented with this scheme, and you could then start the snippet parser at the 'return' statement, getting a tokenized object back, which you can then get parsed during your parse phase by calling on javac's own parse method.

The central point is this: You have to split tokenizing and parsing. This is yet another instance where fan tries to take the easy way out.

On Sep 2, 7:39 pm, Casper Bang <casper.b...@gmail.com> wrote:
> > tell me how the compiler could possibly sort this out? The only way is
> > for the compiler to hand off the entire process of TOKENIZING this
> > stream to the DSL provider for 'longString', which is an entirely
> > different architecture - right now all java parsers do the fairly
> > usual thing of tokenizing the whole deal, then tree-izing the whole
> > thing, and only then starting the process of resolving 'DSL' into
> > "java.lang.DSL" or whatever you had in mind.
>
> Oh sure, I should had mentioned explicitly how this obviously won't
> work with a vanilla javac. Anyway here's the original post I was
> referring: http://www.jroller.com/scolebourne/entry/enhancing_java_multi_lingual...
>
> > You'd have to create very specific rules about how the compiler can
> > find the end of the DSL string. I've thought about this and have not
> > been able to come up with a particularly sensible rule. The only one I
> > can think of is to stick to C-esque rules: strings are things in
> > double or single quotes, and use backslash internally for escapes, and
> > braces are supposed to be matched. However, these restrictions already
> > remove most other languages: You can't put python in there (multi-line
> > strings will screw up java's parser), you can't put regular
> > expressions in there (no rule enforcing matched quotes or braces). You
> > can't put XML in there (no rule enforcing matched braces or quotes).
> > No go.
>
> Well it's not a trivial issue no, but this is how it work in
> Fan: http://fandev.org/sidewalk/topic/438
>
> /Casper