Hi

The current Simple language in camel-core has reached its limit in
terms of maintenance and of how easily we can, for example, add new
functions and operators. Likewise, the current error reporting is not
precise enough to point out where in the expression String the problem
is.

The implementation uses regular expressions, and that is one of
the key problems. I think it has grown to the limit of what we can
maintain.

So I have experimented this weekend with building a prototype based on
the principle of a recursive descent parser
http://en.wikipedia.org/wiki/Recursive_descent_parser

At first I looked into using a parser framework such as JavaCC or
ANTLR. The former would be able to parse the input, but building the
AST nodes would require its tree compiler, which was not exposed
through Maven. Also, JavaCC is not really maintained anymore. People on
stackoverflow recommended ANTLR. It has more bells and whistles, but
as far as I could see ANTLR requires JAR files at runtime.

And frankly I just wanted fairly simple code that anybody would be
able to look into and help with.

So I cooked up some prototype code based on the principle from the
Wikipedia article above.
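
To give an idea of what the recursive descent principle looks like in
code, here is a minimal sketch. It is not the code from the prototype.
The class name TinyPredicateParser and the tiny grammar (a ${ }
function, the == operator, and a true/false literal) are assumptions
made up for illustration. Each grammar rule becomes one method, and the
parser keeps track of the position so it can say where it gave up.

```java
// Hypothetical illustration only; the real prototype is structured differently.
final class TinyPredicateParser {
    private final String input;
    private int pos; // index of the next character to read

    TinyPredicateParser(String input) {
        this.input = input;
    }

    // predicate := function ws* "==" ws* literal
    String[] parsePredicate() {
        String left = parseFunction();
        skipWhitespace();
        expect("==");
        skipWhitespace();
        String right = parseLiteral();
        if (pos != input.length()) {
            throw error("unexpected trailing input");
        }
        return new String[] {left, "==", right};
    }

    // function := "${" name "}"
    private String parseFunction() {
        expect("${");
        int end = input.indexOf('}', pos);
        if (end < 0) {
            throw error("missing closing }");
        }
        String name = input.substring(pos, end);
        pos = end + 1;
        return name;
    }

    // literal := "true" | "false"
    private String parseLiteral() {
        for (String candidate : new String[] {"true", "false"}) {
            if (input.startsWith(candidate, pos)) {
                pos += candidate.length();
                return candidate;
            }
        }
        throw error("expected true or false");
    }

    // Consumes the given token or fails with the current position.
    private void expect(String token) {
        if (!input.startsWith(token, pos)) {
            throw error("expected " + token);
        }
        pos += token.length();
    }

    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at location " + pos);
    }
}
```

Calling new TinyPredicateParser(text).parsePredicate() returns the
left hand side, the operator and the right hand side, or throws an
exception carrying the index where parsing stopped.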

So far I got a working prototype that is much better at parsing and
reporting exactly where the problem is.
For example suppose you do a predicate in a content based router, such as:

<simple>${header.high} = true</simple>

Notice how I have mistyped the == operator, as there is only one = sign.

In the old code, the regexp parser would not catch this problem, and
the predicate would evaluate to true at runtime. There are two
reasons for that. First, the old parser is not good at detecting errors
and pinpointing the problem. The grammar is not really well defined,
as it's based on a somewhat complicated regular expression.
Second, the old parser would evaluate both predicates and
expressions as expressions first, and then convert the expression to
a predicate. So in the example above, it would be rendered as
"someHeaderValue = true" as an expression. And when converted to a
predicate it would be true, as the expression is not empty.
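
The second reason can be modelled in a few lines. This is a simplified
sketch of the behaviour described above, not the actual camel-core
code. The class OldSimpleModel and both method names are made up for
illustration.

```java
// Simplified model of the old behaviour; names are hypothetical.
final class OldSimpleModel {

    // Stand-in for expression evaluation: substitutes the header value
    // and leaves the rest of the text, including the stray "=", untouched.
    static String evaluateAsExpression(String text, String headerValue) {
        return text.replace("${header.high}", headerValue);
    }

    // Stand-in for the expression-to-predicate conversion:
    // any non-empty result counts as true.
    static boolean toPredicate(String expressionResult) {
        return expressionResult != null && !expressionResult.isEmpty();
    }
}
```

With this model the mistyped predicate matches no matter what the
header contains, because the rendered text is never empty.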

The new parser in the prototype is improved as it
- runs in two modes: predicate or expression
- has a grammar and is able to parse the input and report precisely
where the problem is.

So for example what you see now is

org.apache.camel.language.simple.SimpleIllegalSyntaxException:
unexpected character symbol at location 15
${header.high} = true
               *

There is a * sign below where the problem is. If you view the text
above in a monospaced font, you will see the star directly below the
= sign.
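
A message in that layout could be built roughly like this. It is only
a sketch. The helper class SyntaxErrorFormatter is a made-up name and
is not how SimpleIllegalSyntaxException is implemented in the
prototype.

```java
// Hypothetical helper that renders a three-line syntax error message.
final class SyntaxErrorFormatter {

    // Builds: the message with the location, the expression itself,
    // and a line with a star under the offending character.
    static String format(String message, String expression, int index) {
        if (index < 0 || index > expression.length()) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(message).append(" at location ").append(index).append('\n');
        sb.append(expression).append('\n');
        for (int i = 0; i < index; i++) {
            sb.append(' '); // pad so the star lines up with the index
        }
        sb.append('*');
        return sb.toString();
    }
}
```

Since the index is zero-based, location 15 in the example is the
single = sign, and the star gets 15 spaces of padding in front of it.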

The exception message can be improved even more, as we could say
something about an unknown operator, etc.

Likewise I decided to let the new simple language be more restrictive
in terms of function placeholders. You must now always use ${ }
placeholders, as it makes the parsing easier, and for end users there
is no confusion.

header.high == true
    Should be written as
${header.high} == true


The prototype is currently in my github at
https://github.com/davsclaus/camel-simple2


There is still some work to do
- implement the remaining binary operators (basically copy code from
old and adjust a little bit)
- add support for "and" and "or" grouping operators. I am inclined to
rename them to "&&" and "||", which are the operators you would use in
Java code etc.
- add support for "++" and "--" unary operators, so people for example
can use that to increment a counter in XML DSL without using any java
code
- possibly add support for ( ) groups so you can define precedence
- possibly add support for math operators if they would make sense
- refine error messages just a tad
- the parser is currently not thread safe, so we may want to refine
this (it stores some state during parsing)
- after parsing there is a bit of logic to prepare the AST before we turn
it into Camel expressions/predicates, especially due to blocks and
binary operators. We may relax this, as the binary operators could
potentially "whirl through" the whitespace noise and detect
their right and left hand side expressions. Currently I am removing any
noise in between, so the right/left hand side is exactly next to the
operator.
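
To illustrate the last point, the noise removal can be thought of
roughly like this. It is a sketch on a flat list of token strings. The
prototype works on AST nodes, and the class AstPreparation and its
methods are names I made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "remove the noise" step; not the prototype's code.
final class AstPreparation {

    // Drops whitespace-only tokens so each binary operator sits
    // directly between its left and right hand side.
    static List<String> removeNoise(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (!token.trim().isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Returns {left, operator, right} for the operator at the given index.
    static String[] bind(List<String> tokens, int operatorIndex) {
        if (operatorIndex <= 0 || operatorIndex >= tokens.size() - 1) {
            throw new IllegalArgumentException(
                "binary operator needs a left and a right hand side");
        }
        return new String[] {
            tokens.get(operatorIndex - 1),
            tokens.get(operatorIndex),
            tokens.get(operatorIndex + 1)
        };
    }
}
```

Once the whitespace tokens are gone, the operator only has to look at
its direct neighbours, which is why the current code does the cleanup
first.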


So if anyone wants to help out, has ideas for improvements, or has
any grief with the old simple language that we can fix in the new
code, then feel free to help out.




-- 
Claus Ibsen
-----------------
FuseSource
Email: cib...@fusesource.com
Web: http://fusesource.com
Twitter: davsclaus, fusenews
Blog: http://davsclaus.blogspot.com/
Author of Camel in Action: http://www.manning.com/ibsen/
