rewriter needed changes

Benoit Chesneau Sun, 09 Jan 2011 16:33:16 -0800

There are 2 tickets open for the rewriter:

https://issues.apache.org/jira/browse/COUCHDB-1017
https://issues.apache.org/jira/browse/COUCHDB-1005


The first one is about testing the type of a value so that it can be
encoded (or decoded) when it comes from the path or query string. 1017
talks about strings, but it could be integers as well. This isn't
possible currently.

The second is about having a more powerful rewriter. The first
intention of _rewriter was to offer a simple way to dispatch urls to a
resource (_show, _update, _list, _view, doc, attachment) based on path
terms (string, ":var", "*"). Path specifications are obtained by
breaking the url into tokens via the "/" separator. Then we match them
against path terms. That's how we find urls. There is also the
possibility to use query arguments as a path term. A rewriter like
this is the easiest implementation we found, and so far it is the only
one that obtained a consensus.
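
To make the current behaviour concrete, here is a minimal Python
sketch of that token-by-token matching. It is not the actual Erlang
code in CouchDB; the function name match_path is hypothetical, and it
assumes "*" only appears as the last term and binds the rest of the
path.

```python
def match_path(pattern, path):
    """Match a URL path against a rule made of literal, ':var' and '*' terms.

    Returns a dict of bindings on success, or None when the path does not match.
    """
    # Break both the rule and the url into tokens on the "/" separator.
    terms = [t for t in pattern.split("/") if t]
    tokens = [t for t in path.split("/") if t]
    bindings = {}
    for i, term in enumerate(terms):
        if term == "*":
            # Assumption: "*" is the last term and swallows the rest of the path.
            bindings["*"] = "/".join(tokens[i:])
            return bindings
        if i >= len(tokens):
            return None  # the path is shorter than the rule
        if term.startswith(":"):
            bindings[term[1:]] = tokens[i]  # variable term: bind the token
        elif term != tokens[i]:
            return None  # literal term: must match exactly
    # Without a trailing "*", both sides must have the same number of tokens.
    return bindings if len(terms) == len(tokens) else None
```

For example, match_path("/a/:foo/*", "/a/b/c/d") binds foo to "b" and
"*" to "c/d". Every binding is a plain string, which is the limitation
1017 points at.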

The feature asked for in 1005 needs more power than simple pattern matching.

The more people use CouchApps with CouchDB facing the web directly
(without any proxy), the more people will ask for such features.

I see 2 alternatives, both with easy pattern matching, that we can use
to solve this problem:


1.

Put the variable between "<>" like this: <key>.
Then optionally say what the type of the variable is: <int:key> for an integer.

Ex:

{
     "from": "/a/b/<key>/<int:id>",
     "to":"/c/<key>",
     "query": {
          "key": "<int:id>"
      }
}

/a/b/c/13 -> /c/c?key=13


This solves 1017 and potentially 1005.
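
A rough Python sketch of how such typed variables could be bound and
substituted. Everything here is an assumption made for illustration:
the names CONVERTERS, bind, substitute and rewrite are hypothetical,
only the "int" type comes from the proposal, and the rule used in the
usage note takes the query value from the id variable, which is what
the /c/c?key=13 result suggests.

```python
import re
from urllib.parse import urlencode

# Hypothetical converters; only "int" is mentioned in the proposal.
CONVERTERS = {"str": str, "int": int}
# Matches <name> or <type:name>.
VAR = re.compile(r"<(?:(\w+):)?(\w+)>")


def bind(from_pattern, path):
    """Bind the <name> / <type:name> terms of from_pattern to the tokens of path."""
    terms = from_pattern.strip("/").split("/")
    tokens = path.strip("/").split("/")
    if len(terms) != len(tokens):
        return None
    bindings = {}
    for term, token in zip(terms, tokens):
        m = VAR.fullmatch(term)
        if m is None:
            if term != token:
                return None  # literal term mismatch
            continue
        type_name, name = m.group(1) or "str", m.group(2)
        try:
            # An unknown type name raises KeyError: that is a broken rule.
            bindings[name] = CONVERTERS[type_name](token)
        except ValueError:
            return None  # e.g. <int:id> against a non-numeric token
    return bindings


def substitute(template, bindings):
    """Replace every <name> or <type:name> in template with its bound value."""
    return VAR.sub(lambda m: str(bindings[m.group(2)]), template)


def rewrite(rule, path):
    """Apply a {"from", "to", "query"} rule to path, or return None on no match."""
    bindings = bind(rule["from"], path)
    if bindings is None:
        return None
    target = substitute(rule["to"], bindings)
    query = {k: substitute(v, bindings) for k, v in rule.get("query", {}).items()}
    return target + ("?" + urlencode(query) if query else "")
```

With the rule {"from": "/a/b/<key>/<int:id>", "to": "/c/<key>",
"query": {"key": "<int:id>"}}, rewrite(rule, "/a/b/c/13") gives
"/c/c?key=13", while "/a/b/c/x" does not match because "x" is not an
integer.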

2. Use mongrel2 pattern matching:

<snip>
URL patterns always match from the start, routes are broken into
prefix and pattern part. We use the routes to find the longest
matching prefix and then test the pattern. If the pattern matches,
then the route works. If the route doesn't have a pattern, then it's
assumed to match, and you're done.

The only caveat is you have to wrap your pattern parts in parenthesis,
but these don't mean anything other than to delimit where a pattern
starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
work.

Here's the list of characters you can use in your patterns:

. (period) All characters.
\a Letters.
\c Control characters.
\d Digits.
\l Lowercase letters.
\p Punctuation characters.
\s Space characters.
\u Uppercase letters.
\w Alphanumeric characters.
\x Hexadecimal digits.
\z The 0 character (null terminator).
[set] Just like a regex [] where set is a set of chars, like [0-9] for all digits.
[^set] Inverse character set, so [^0-9] is anything but digits.
* Longest match of 0 or more of the preceding character.
+ Longest match of 1 or more of the preceding character.
- Shortest match of 0 or more of the preceding character.
? 0 or 1 match of the preceding character.
\bxy Balanced match a substring starting with x and ending in y. So
\b() will match balanced parentheses.
$ End of the string.
Using the uppercase version of an escaped character makes it work the
opposite way (i.e., \A matches any character that isn't a letter). The
backslash can be used to escape the following character, disabling its
special abilities (i.e., \\ will match a backslash).

Anything that's not listed here is matched literally.

</snip>

This solution is really simple: it removes the useless things you have
in regexps and gives complete power to the users. Also, this kind of
parsing is relatively easy to do in Erlang.
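
To give an idea of the amount of work, here is a small Python sketch
of the prefix + pattern lookup described in the excerpt. It is not
mongrel2's code and not a proposed Erlang implementation: it cheats by
translating the pattern into a Python regex, and the names CLASSES,
to_regex, split_route and find_route are hypothetical. It assumes
[set] only contains plain characters and ranges, and it does not
support \bxy.

```python
import re
import string

# Character classes from the quoted list, written as regex set bodies.
CLASSES = {
    "a": "A-Za-z",
    "c": r"\x00-\x1f\x7f",
    "d": "0-9",
    "l": "a-z",
    "p": re.escape(string.punctuation),
    "s": r" \t\n\r\f\v",
    "u": "A-Z",
    "w": "A-Za-z0-9",
    "x": "0-9A-Fa-f",
    "z": r"\x00",
}


def to_regex(pattern):
    """Translate a mongrel2-style pattern into a Python regex string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 1
            if i >= len(pattern):
                raise ValueError("dangling backslash")
            esc = pattern[i]
            if esc == "b":
                raise ValueError("balanced match is not supported in this sketch")
            if esc.lower() in CLASSES:
                # An uppercase letter inverts the class, e.g. \D is "not a digit".
                negate = "^" if esc.isupper() else ""
                out.append("[" + negate + CLASSES[esc.lower()] + "]")
            else:
                out.append(re.escape(esc))  # escaped literal character
        elif ch == "[":
            # Assumption: the set holds only plain characters and ranges.
            end = pattern.index("]", i + 1)
            out.append(pattern[i:end + 1])
            i = end
        elif ch == "-":
            out.append("*?")  # shortest match of 0 or more
        elif ch in ".*+?":
            out.append(ch)
        elif ch == "$" and i == len(pattern) - 1:
            out.append(r"\Z")  # end of the string
        else:
            out.append(re.escape(ch))  # everything else is literal
        i += 1
    return "".join(out)


def split_route(route):
    """Split '/images/(.*.jpg)' into the prefix '/images/' and the pattern '.*.jpg'."""
    start = route.find("(")
    if start == -1:
        return route, None  # no pattern part
    return route[:start], route[start + 1:route.rindex(")")]


def find_route(routes, path):
    """Return the target of the route with the longest matching prefix, or None."""
    best = None
    for route, target in routes.items():
        prefix, pattern = split_route(route)
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, pattern, target)
    if best is None:
        return None
    prefix, pattern, target = best
    if pattern is None:
        return target  # no pattern: the prefix match is enough
    # The pattern is tested from the start of what follows the prefix.
    rest = path[len(prefix):]
    return target if re.match(to_regex(pattern), rest, re.DOTALL) else None
```

With routes = {"/images/(.*.jpg)": "img", "/": "root"}, the path
"/images/cat.jpg" resolves to "img" and "/index.html" to "root". A
real parser would match the pattern directly instead of going through
a regex engine, which is the part that should be easy to write.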


There may be a third solution. If we use something like emonk, erlv8,
... we could have the rewriter in a js function. But it won't happen
in the next 6 months. I'm rather in favor of the second solution
though, and quite ready to start a new parser.

Any thoughts?


- benoît
