Hi,

or just do it like html does it...

@DSL(lang="Brainfuck") {
    /*
    ++++++++++[>+++++++>++++++++++>+++>
    +<<<<-]>++.>+.+++++++..+++.>++.<<+++++++
    ++++++++.>.+++.------.--------.>+.>.
    */
}

javac ignores the comment; preprocessors like projectlombok.org could do
whatever they want with the comment (convert to string, generate code,
etc.). (Of course the escaping char issue remains...)

regards,
- - -
http://michael-bien.com

On Sep 2, 10:22 pm, Reinier Zwitserloot <reini...@gmail.com> wrote:
> Casper: I don't think you grokked my point.
>
> I'm saying it's impossible to build any java, vanilla or otherwise,
> that can handle this. For the reasons I stated: You'd have to flip the
> architecture upside down and resolve 'DSL' properly midway through
> tokenizing it. Be aware that this automatically means that any error
> caused by the DSL provider *HAS TO* stop the parsing right there on
> the spot, no further error reporting for anything that follows the DSL
> block. Tricks IDEs do to make a class file with whatever methods have
> syntax errors in them replaced with dummies that throw exceptions
> would be impossible.
>
> You'd be giving up an awful lot.
>
> Don't get me wrong, I love the idea, but I haven't seen a workable
> proposal yet. I'm leaning towards the notion that it's impossible to
> get right. Fan tries to use a sufficiently arcane separator
> (bar-angle, so <| special code goes here |>), but if java uses the same
> thing, then you can't embed fan in java. That's not a solution.
>
> Here's a simplistic approach to something that might actually work:
>
> 1. identifier resolution is decoupled from the rest of the source file
> for parsing. In other words, the parser will parse all import
> statements, resolve them, and only then continue on its way.
>
> 2. blocks start with a hash, followed by a type identifier. This type
> identifier is resolved only according to import statements; to make
> this smooth, the definitions for how to handle these blocks MUST
> ALWAYS be top level members, no exceptions.
> Now the parser does not
> have to consider inner classes and such to resolve the name; the
> process of checking the current package and all import statements
> suffices.
>
> 3. The tokenizer will remember the character that followed the token
> (e.g. the non-identifier character that immediately followed the last
> identifier character in the DSL name, which can be a space, a quote, a
> brace, whatever), and restuffs this back into the source view. The
> tokenizer then hands the raw source (as a Reader or some such) off to
> the .tokenize() method of the provider. The tokenizer MUST return any
> object, and have consumed exactly up to (and including) the closing
> element of the DSL block.
>
> 4. During compilation, the DSL block (which is an expression which can
> have an arbitrary type, including void) is translated into a pure java
> expression by calling the .parse() method of the DSL provider.
>
> 5. Exceptions during the tokenize phase result in the immediate end of
> parsing that java source file, as javac will not know where to
> continue. Exceptions during the parse method aren't nearly as drastic;
> it just means there's an error in the DSL block and the block's
> expression is of an unknown type - certainly not rocket science
> compared to the advanced error recovery employed in many IDEs.
>
> public interface DSLProvider<T> {
>     public T tokenize(SourceReader reader);
>     public String parse(T token, Context c);
> }
>
> some open issues are: What should 'parse' return - there's an argument
> to be made for: 'bytecode', 'raw java source as a String', and 'a
> JCExpression object' (from javac's internal AST classes). Each has its
> advantages and disadvantages.
>
> Context is some useful construct that allows access to variables legal
> in the current scope, the filer (for looking up types), and similar
> things. A lot of this API already exists (annotation processor API).
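
To make the tokenize/parse split in the quoted interface a bit more
concrete, here is a minimal sketch of a provider for the raw long string
case. Only DSLProvider and its two method names come from the post.
SourceReader (here just a String plus a cursor), the empty Context, and
LongStringProvider are hypothetical stand-ins, and parse is assumed to
return Java source as a String, which is only one of the three options
listed above.

```java
public class LongStringDslSketch {

    /** Hypothetical stand-in for SourceReader: the raw source plus a cursor. */
    static final class SourceReader {
        private final String source;
        private int pos;

        SourceReader(String source) { this.source = source; }

        /** Returns the next character, or -1 at end of input. */
        int read() { return pos < source.length() ? source.charAt(pos++) : -1; }

        /** Everything the provider left unconsumed; the host tokenizer would resume here. */
        String rest() { return source.substring(pos); }
    }

    /** Placeholder only; the post leaves the contents of Context open. */
    static final class Context { }

    /** The interface as given in the post. */
    interface DSLProvider<T> {
        T tokenize(SourceReader reader);
        String parse(T token, Context c);
    }

    /** Hypothetical provider for blocks of the form: #long """raw text""" */
    static final class LongStringProvider implements DSLProvider<String> {

        @Override
        public String tokenize(SourceReader reader) {
            int c;
            do { c = reader.read(); } while (c != -1 && Character.isWhitespace(c));
            for (int i = 0; i < 3; i++) {              // expect the opening """
                if (c != '"') throw new IllegalStateException("expected opening \"\"\"");
                if (i < 2) c = reader.read();
            }
            StringBuilder body = new StringBuilder();
            int quoteRun = 0;                          // consecutive '"' characters seen
            while ((c = reader.read()) != -1) {
                body.append((char) c);
                quoteRun = (c == '"') ? quoteRun + 1 : 0;
                if (quoteRun == 3) {                   // closing """ consumed: stop exactly here
                    body.setLength(body.length() - 3);
                    return body.toString();
                }
            }
            throw new IllegalStateException("unterminated long string");
        }

        @Override
        public String parse(String token, Context c) {
            // Translate the raw text into an ordinary, escaped Java string literal.
            StringBuilder literal = new StringBuilder("\"");
            for (char ch : token.toCharArray()) {
                switch (ch) {
                    case '\\': literal.append("\\\\"); break;
                    case '"':  literal.append("\\\""); break;
                    case '\n': literal.append("\\n");  break;
                    case '\r': literal.append("\\r");  break;
                    default:   literal.append(ch);
                }
            }
            return literal.append('"').toString();
        }
    }

    public static void main(String[] args) {
        SourceReader reader =
            new SourceReader(" \"\"\"C:\\temp\\new\"\"\" + \"parsed by javac again\";");
        LongStringProvider provider = new LongStringProvider();
        String token = provider.tokenize(reader);
        System.out.println("token:  " + token);
        System.out.println("java:   " + provider.parse(token, new Context()));
        System.out.println("resume: " + reader.rest());
    }
}
```

The tokenize step only has to find the end of the block and hand back
something opaque; all the real translation is deferred to parse, so a
failure there no longer leaves the host tokenizer lost in the file.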
>
> Such a system could rather easily support a wide variety of stuff you
> may wish to inject into java source files:
>
> - String literals
> - Regexp literals - the compiled regexp tree would be stored into the
>   class file.
> - XML literals
> - multiline and/or raw string literals.
> - python - even including python's whitespace based delimiting as the
>   mechanism to delimit the block ITSELF, if you think that is a good
>   idea.
> - Clojure, LISP, and other lisp dialects.
> - just about every programming language in existence (incl. ruby,
>   Javascript, C, C#, C++, fortran, ada, and, sure, why not - APL).
>
> The documentation should stress that the .tokenize() method really
> should try its very best to return and not throw an exception.
>
> hypothetical source:
>
> int x = #python:
>     5 + 5
> int thisIsJavaAgain;
>
> String long = #long """This is a long string where \backslashes need
> not be escaped""" + "this is parsed by javac again";
> Pattern p = #regexp /[abc]d\s+(\d*)/i;
>
> Presuming that the context object is sufficiently advanced, this
> should also be possible, especially if you add a way to parse a java
> snippet in that context:
>
> private final Comparator<Integer> absoluteComparator = #closure
> Comparator(Integer a, Integer b) { return Integer.compare(Math.abs(a),
> Math.abs(b)); };
>
> Of course, trying to include java inside such a block has the same
> issue as javac's original problem: How does the closure DSL provider
> know where the closure ends without being as complicated as javac's
> tokenizer? Theoretically java itself could be implemented with this
> scheme, and you could then start the snippet parser at the 'return'
> statement, getting a tokenized object back, which, during your parse
> phase, you can get parsed by calling on javac's own parse method.
>
> The central point is this: You have to split tokenizing and parsing.
> This is yet another instance where fan tries to take the easy way out.
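
The host side of steps 2, 3 and 5 could look roughly like the sketch
below, using the #regexp line from the hypothetical source. It rests on
heavy assumptions: a Map stands in for resolving the block name through
import statements, BlockScanner is a hypothetical cut-down version of the
provider's tokenize step that only reports where the block ends, and every
'#' is treated as a block start (a real tokenizer would first have to skip
string literals and comments). It is not how javac works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HostTokenizerSketch {

    /** Hypothetical provider hook: returns the index just past the block's closing element. */
    interface BlockScanner {
        int scanToEnd(String source, int start);
    }

    /** Scans "/pattern/flags": ends after the closing unescaped slash and any letter flags. */
    static int scanRegexp(String src, int start) {
        int i = start;
        while (i < src.length() && Character.isWhitespace(src.charAt(i))) i++;
        if (i >= src.length() || src.charAt(i) != '/') {
            throw new IllegalArgumentException("expected '/' at index " + i);
        }
        i++;                                         // opening slash
        while (i < src.length() && src.charAt(i) != '/') {
            if (src.charAt(i) == '\\') i++;          // skip the escaped character too
            i++;
        }
        if (i >= src.length()) throw new IllegalArgumentException("unterminated regexp");
        i++;                                         // closing slash
        while (i < src.length() && Character.isLetter(src.charAt(i))) i++;
        return i;
    }

    /** Splits source into "java:" segments and "<name>:" DSL segments, in order. */
    static List<String> split(String src, Map<String, BlockScanner> providers) {
        List<String> segments = new ArrayList<>();
        int javaStart = 0;
        int i = 0;
        while (i < src.length()) {
            if (src.charAt(i) != '#') { i++; continue; }
            int nameEnd = i + 1;
            while (nameEnd < src.length()
                    && Character.isJavaIdentifierPart(src.charAt(nameEnd))) {
                nameEnd++;
            }
            String name = src.substring(i + 1, nameEnd);
            BlockScanner provider = providers.get(name);
            if (provider == null) {
                // As in step 5: without a provider the host cannot know where to continue.
                throw new IllegalArgumentException("no DSL provider for #" + name);
            }
            segments.add("java:" + src.substring(javaStart, i));
            // The provider starts AT the character that followed the name (step 3).
            int end = provider.scanToEnd(src, nameEnd);
            segments.add(name + ":" + src.substring(nameEnd, end));
            javaStart = i = end;                     // ordinary Java tokenizing resumes here
        }
        segments.add("java:" + src.substring(javaStart));
        return segments;
    }

    public static void main(String[] args) {
        Map<String, BlockScanner> providers = Map.of("regexp", HostTokenizerSketch::scanRegexp);
        String source = "Pattern p = #regexp /[abc]d\\s+(\\d*)/i; int thisIsJavaAgain;";
        for (String segment : split(source, providers)) {
            System.out.println(segment);
        }
    }
}
```

Any exception thrown while a provider scans for the end of its block
propagates out of split, which mirrors the point that a failure in the
tokenize phase ends parsing of the whole file.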
>
> On Sep 2, 7:39 pm, Casper Bang <casper.b...@gmail.com> wrote:
>
> > > tell me how the compiler could possibly sort this out? The only way is
> > > for the compiler to hand off the entire process of TOKENIZING this
> > > stream to the DSL provider for 'longString', which is an entirely
> > > different architecture - right now all java parsers do the fairly
> > > usual thing of tokenizing the whole deal, then tree-izing the whole
> > > thing, and only then starting the process of resolving 'DSL' into
> > > "java.lang.DSL" or whatever you had in mind.
>
> > Oh sure, I should have mentioned explicitly how this obviously won't
> > work with a vanilla javac. Anyway, here's the original post I was
> > referring to: http://www.jroller.com/scolebourne/entry/enhancing_java_multi_lingual...
>
> > > You'd have to create very specific rules about how the compiler can
> > > find the end of the DSL string. I've thought about this and have not
> > > been able to come up with a particularly sensible rule. The only one I
> > > can think of is to stick to C-esque rules: strings are things in
> > > double or single quotes, and use backslash internally for escapes, and
> > > braces are supposed to be matched. However, these restrictions already
> > > remove most other languages: You can't put python in there (multi-line
> > > strings will screw up java's parser), you can't put regular
> > > expressions in there (no rule enforcing matched quotes or braces). You
> > > can't put XML in there (no rule enforcing matched braces or quotes).
> > > No go.
>
> > Well, it's not a trivial issue, no, but this is how it works in
> > Fan: http://fandev.org/sidewalk/topic/438
>
> > /Casper

You received this message because you are subscribed to the Google Groups "The Java Posse" group.