On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau <bchesn...@gmail.com> wrote:
> There are 2 tickets open for the rewriter:
>
> https://issues.apache.org/jira/browse/COUCHDB-1017
> https://issues.apache.org/jira/browse/COUCHDB-1005
>
> The first one is about checking the type of a value so that it can be
> encoded (or decoded) when taken from the path or query string. 1017
> talks about strings, but it could be integers as well. This isn't
> possible currently.
>
> The second is about a more powerful rewriter. The original intention of
> _rewriter was to offer a simple way to dispatch urls to a resource
> (_show, _update, _list, _view, doc, attachment) based on path terms
> ("string", ":var", "*"). Path specifications are obtained by breaking
> the url into tokens on the "/" separator; then we match them against
> the path terms. That's how we find urls. There is also the possibility
> to use query arguments as a path term. A rewriter like this was the
> easiest implementation we found, and so far it is the only one that
> reached consensus.
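
A minimal sketch of the token matching described above, in Python
rather than Erlang. The function name and the exact behaviour of "*"
(here it swallows the rest of the path) are assumptions made for
illustration, not the actual rewriter code.

```python
def match_path(pattern, path):
    """Return a dict of bindings if path matches pattern, else None."""
    # Break both the rule and the request path into tokens on "/".
    terms = [t for t in pattern.split("/") if t]
    tokens = [t for t in path.split("/") if t]
    bindings = {}
    for i, term in enumerate(terms):
        if term == "*":
            # Assumption: "*" takes everything that is left, possibly nothing.
            bindings["*"] = "/".join(tokens[i:])
            return bindings
        if i >= len(tokens):
            return None              # path is shorter than the rule
        if term.startswith(":"):
            bindings[term[1:]] = tokens[i]   # ":var" binds one token
        elif term != tokens[i]:
            return None              # plain strings must match exactly
    # Without a trailing "*", the path must not have extra tokens.
    return bindings if len(tokens) == len(terms) else None


print(match_path("/a/:id/*", "/a/42/x/y"))  # {'id': '42', '*': 'x/y'}
print(match_path("/a/b", "/a/c"))           # None
```

Note that every binding comes back as a string, which is the
limitation the first ticket is about.
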
>
> The feature requested in 1005 needs more power than simple pattern
> matching.
>
> The more people use CouchApps with CouchDB directly facing the web
> (without any proxy), the more they will ask for such features.
>
> I see 2 alternative, easy pattern matching schemes we can use to solve
> this problem:
>
>
> 1.
>
> Put variables between "<>", like this: <key>.
> Then optionally state the type of the variable: <int:key> for an integer.
>
> Ex:
>
> {
>     "from": "/a/b/<key>/<int:id>",
>     "to":"/c/<key>",
>     "query": {
>          "key": "<int:id>"
>      }
> }
>
> /a/b/c/13 -> /c/c?key=13
>
>
> This solves 1017 and potentially 1005.
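
A rough sketch of how the "<type:name>" idea could be evaluated, in
Python for brevity. Everything here (CONVERTERS, compile_from,
substitute, rewrite) is hypothetical and only illustrates the
proposal. The rule reads the query value from <int:id>, which is what
the expected result /c/c?key=13 requires.

```python
import re
from urllib.parse import urlencode

# Hypothetical converters: "int" comes from the proposal, the untyped
# default keeps the matched token as a string.
CONVERTERS = {"int": (r"\d+", int), None: (r"[^/]+", str)}
VAR = re.compile(r"<(?:(\w+):)?(\w+)>")


def compile_from(pattern):
    """Turn "/a/b/<key>/<int:id>" into a regex plus a name -> cast map."""
    regex, casts, pos = "", {}, 0
    for m in VAR.finditer(pattern):
        kind, name = m.group(1), m.group(2)
        part, cast = CONVERTERS[kind]      # unknown types raise KeyError
        regex += re.escape(pattern[pos:m.start()]) + f"(?P<{name}>{part})"
        casts[name] = cast
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex + r"\Z"), casts


def substitute(template, values):
    """Replace each <name> or <type:name> with the (converted) value."""
    return VAR.sub(
        lambda m: str(CONVERTERS[m.group(1)][1](values[m.group(2)])),
        template)


def rewrite(rule, path):
    regex, casts = compile_from(rule["from"])
    m = regex.match(path)
    if m is None:
        return None
    values = {k: casts[k](v) for k, v in m.groupdict().items()}
    target = substitute(rule["to"], values)
    query = {k: substitute(v, values)
             for k, v in rule.get("query", {}).items()}
    return target + "?" + urlencode(query) if query else target


rule = {"from": "/a/b/<key>/<int:id>", "to": "/c/<key>",
        "query": {"key": "<int:id>"}}
print(rewrite(rule, "/a/b/c/13"))  # /c/c?key=13
print(rewrite(rule, "/a/b/c/x"))   # None, because "x" is not an integer
```
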
>
> 2. Use mongrel2 pattern matching:
>
> <snip>
> URL patterns always match from the start, routes are broken into
> prefix and pattern part. We use the routes to find the longest
> matching prefix and then test the pattern. If the pattern matches,
> then the route works. If the route doesn't have a pattern, then it's
> assumed to match, and you're done.
>
> The only caveat is you have to wrap your pattern parts in parentheses,
> but these don't mean anything other than to delimit where a pattern
> starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
> work.
>
> Here's the list of characters you can use in your patterns:
>
> . (period) All characters.
> \a Letters.
> \c Control characters.
> \d Digits.
> \l Lowercase letters.
> \p Punctuation characters.
> \s Space characters.
> \u Uppercase letters.
> \w Alphanumeric characters.
> \x Hexadecimal digits.
> \z The 0 character (null terminator).
> [set] Just like a regex [] where set is a set of chars, like [0-9] for all digits.
> [^set] Inverse character set, so [^0-9] is anything but digits.
> * Longest match of 0 or more of the preceding character.
> + Longest match of 1 or more of the preceding character.
> - Shortest match of 0 or more of the preceding character.
> ? 0 or 1 match of the preceding character.
> \bxy Balanced match a substring starting with x and ending in y. So
> \b() will match balanced parentheses.
> $ End of the string.
> Using the uppercase version of an escaped character makes it work the
> opposite way (i.e., \A matches any character that isn't a letter). The
> backslash can be used to escape the following character, disabling its
> special abilities (i.e., \\ will match a backslash).
>
> Anything that's not listed here is matched literally.
>
> </snip>
>
> This solution is really simple: it removes the useless things you have
> in regexps and gives complete power to the users. Also, this kind of
> parsing is relatively easy to do in Erlang.
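
To make the "longest prefix, then pattern" idea concrete, here is a
small Python sketch. It is not mongrel2's code. translate() covers
only part of the list above (no uppercase classes, \b, \c, \p or \z),
and returning None when the pattern of the best prefix fails, instead
of trying a shorter prefix, is an assumption.

```python
import re

# Partial, assumed mapping from the listed classes to Python regex syntax.
CLASSES = {"a": "[A-Za-z]", "d": "[0-9]", "l": "[a-z]", "u": "[A-Z]",
           "w": "[A-Za-z0-9]", "x": "[0-9A-Fa-f]", "s": r"\s"}


def translate(pattern):
    """Translate a subset of the pattern syntax into a Python regex."""
    out, i = "", 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in CLASSES:
                out += CLASSES[nxt]
            elif nxt.isalnum():
                raise ValueError("unsupported class: \\" + nxt)
            else:
                out += re.escape(nxt)      # escaped literal, e.g. \\
            i += 2
        elif ch == "[":
            end = pattern.index("]", i + 1)
            out += pattern[i:end + 1]      # sets are copied unchanged
            i = end + 1
        else:
            if ch == "-":
                out += "*?"                # shortest match of 0 or more
            elif ch in ".*+?$":
                out += ch                  # same meaning in Python
            else:
                out += re.escape(ch)       # everything else is literal
            i += 1
    return out


def split_route(route):
    """Split "/images/(.*.jpg)" into ("/images/", ".*.jpg")."""
    if "(" not in route:
        return route, None
    prefix, rest = route.split("(", 1)
    return prefix, rest.rsplit(")", 1)[0]


def find_route(routes, path):
    """Pick the route with the longest matching prefix, then test its pattern."""
    best = None
    for route in routes:
        prefix, pattern = split_route(route)
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[1])):
            best = (route, prefix, pattern)
    if best is None:
        return None
    route, prefix, pattern = best
    if pattern is None or re.match(translate(pattern), path[len(prefix):]):
        return route
    return None


routes = ["/", "/images/(.*.jpg)", r"/users/(\d+$)"]
print(find_route(routes, "/images/cat.jpg"))  # /images/(.*.jpg)
print(find_route(routes, "/about"))           # /
```
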
>
>
> There may be a third solution. If we use something like emonk, erlv8,
> ... we could write the rewriter as a JS function. But it won't happen
> in the next 6 months. I rather favour the second solution though, and
> am quite ready to start a new parser.
>
> Any thoughts?
>
>
> - benoît
>

Since then I have started couchapp_legacy:

https://github.com/benoitc/couchapp_legacy

It embeds a new rewriter doing both reverse and regexp-based
dispatching, with some other features like:

- A plugin system for resource handlers; currently there are a rewriter
and a proxy handler.
- Route caching: rules are built only on first access or when the
design doc changes.
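
The route caching idea can be sketched like this in Python. It is not
the couchapp_legacy code: RouteCache and compile_rules are
hypothetical names, and using the design doc's _rev to detect a
change is an assumption.

```python
class RouteCache:
    """Cache compiled rewrite rules per design document."""

    def __init__(self, compile_rules):
        # compile_rules is a hypothetical callable that turns the raw
        # "rewrites" list into whatever the dispatcher needs.
        self._compile = compile_rules
        self._cache = {}   # ddoc id -> (rev, compiled rules)

    def get(self, ddoc):
        ddoc_id, rev = ddoc["_id"], ddoc["_rev"]
        cached = self._cache.get(ddoc_id)
        # Build on first access, or rebuild when the revision differs.
        if cached is None or cached[0] != rev:
            cached = (rev, self._compile(ddoc.get("rewrites", [])))
            self._cache[ddoc_id] = cached
        return cached[1]


builds = []
cache = RouteCache(lambda rules: builds.append(1) or list(rules))
ddoc = {"_id": "_design/app", "_rev": "1-abc",
        "rewrites": [{"from": "/a", "to": "/b"}]}
cache.get(ddoc)
cache.get(ddoc)                      # served from the cache
cache.get(dict(ddoc, _rev="2-def"))  # design doc changed, rebuilt
print(len(builds))  # 2
```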

TODO:
- variable transformations: string -> int, for example


There will be other features in the couchapp_legacy plugin (current
name) soon. I hope it helps to push the conversation further.

- benoit
