On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau <bchesn...@gmail.com> wrote:
> There are 2 tickets open for the rewriter:
>
> https://issues.apache.org/jira/browse/COUCHDB-1017
> https://issues.apache.org/jira/browse/COUCHDB-1005
>
> The first one is about testing the types of values so that they can
> be encoded (or decoded) from the path or query string. 1017 speaks
> about strings, but it could be integers as well. This isn't possible
> currently.
>
> The second is about having a more powerful rewriter. The first
> intention of _rewrite was to offer a simple way to dispatch urls to a
> resource (_show, _update, _list, _view, doc, attachment) based on path
> terms (string, ":var", "*"). Path specifications are obtained by
> breaking the url into tokens via the "/" separator, then we match them
> against path terms. That's how we find urls. There is also the
> possibility to use query arguments as a path term. A rewriter like
> this was the easiest implementation we found, and so far the only one
> that reached consensus.
>
> The feature asked for in 1005 needs more power than simple pattern
> matching.
>
> The more people use CouchApps with CouchDB directly facing the web
> (without any proxy), the more they will ask for such features.
>
> I see 2 alternative, easy pattern-matching schemes we could use to
> solve this problem:
>
>
> 1.
>
> Put the variable between "<>", like this: <key>.
> Then optionally state the type of the variable: <int:key> for an integer.
>
> Ex:
>
> {
>     "from": "/a/b/<key>/<int:id>",
>     "to": "/c/<key>",
>     "query": {
>         "key": "<int:id>"
>     }
> }
>
> /a/b/c/13 -> /c/c?key=13
>
>
> This solves 1017 and potentially 1005.
>
> 2. Use mongrel2 pattern matching:
>
> <snip>
> URL patterns always match from the start, routes are broken into
> prefix and pattern part. We use the routes to find the longest
> matching prefix and then test the pattern. If the pattern matches,
> then the route works. If the route doesn't have a pattern, then it's
> assumed to match, and you're done.
>
> The only caveat is you have to wrap your pattern parts in parentheses,
> but these don't mean anything other than to delimit where a pattern
> starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
> work.
>
> Here's the list of characters you can use in your patterns:
>
> .       (period) All characters.
> \a      Letters.
> \c      Control characters.
> \d      Digits.
> \l      Lowercase letters.
> \p      Punctuation characters.
> \s      Space characters.
> \u      Uppercase letters.
> \w      Alphanumeric characters.
> \x      Hexadecimal digits.
> \z      The 0 character (null terminator).
> [set]   Just like a regex [] where set is a set of chars, like [0-9]
>         for all digits.
> [^set]  Inverse character set, so [^0-9] is anything but digits.
> *       Longest match of 0 or more of the preceding character.
> +       Longest match of 1 or more of the preceding character.
> -       Shortest match of 0 or more of the preceding character.
> ?       0 or 1 match of the preceding character.
> \bxy    Balanced match of a substring starting with x and ending in y.
>         So \b() will match balanced parentheses.
> $       End of the string.
>
> Using the uppercase version of an escaped character makes it work the
> opposite way (i.e., \A matches any character that isn't a letter). The
> backslash can be used to escape the following character, disabling its
> special abilities (i.e., \\ will match a backslash).
>
> Anything that's not listed here is matched literally.
>
> </snip>
>
> This solution is really simple: it removes the useless things you have
> in regexps and still gives complete power to the users. Also, this kind
> of parsing is relatively easy to do in Erlang.
>
>
> There may be a third solution. If we use something like emonk, erlv8,
> ... we could have the rewriter in a JS function. But that won't happen
> in the next 6 months. I strongly favor the second solution though,
> and am quite ready to start a new parser.
>
> Any thoughts?
>
>
> - benoît
>
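To make option 1 a bit more concrete, here is a rough Python sketch of how typed <type:name> variables could be matched and substituted. It is only an illustration, not the CouchDB rewriter (which is written in Erlang). The names CONVERTERS, compile_from, substitute, rewrite and RULE are made up for this example, and the query value is written as <int:id> on the assumption that this is what produces the /c/c?key=13 result quoted above.

```python
import re
from urllib.parse import urlencode

# Hypothetical converters: type name -> (regex fragment, cast function).
CONVERTERS = {
    "string": (r"[^/]+", str),
    "int": (r"\d+", int),
}

# Matches <name> or <type:name>.
VAR = re.compile(r"<(?:(\w+):)?(\w+)>")


def compile_from(pattern):
    """Turn "/a/b/<key>/<int:id>" into a regex plus a name -> cast map."""
    casts, parts, pos = {}, [], 0
    for m in VAR.finditer(pattern):
        type_name, name = m.group(1) or "string", m.group(2)
        fragment, cast = CONVERTERS[type_name]
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        parts.append("(?P<%s>%s)" % (name, fragment))    # typed variable
        casts[name] = cast
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("^" + "".join(parts) + "$"), casts


def substitute(template, values):
    """Replace each <name> or <type:name> in template with its value."""
    def repl(m):
        cast = CONVERTERS[m.group(1) or "string"][1]
        return str(cast(values[m.group(2)]))
    return VAR.sub(repl, template)


def rewrite(rule, path):
    """Return the rewritten url, or None if the rule does not match."""
    regex, casts = compile_from(rule["from"])
    m = regex.match(path)
    if m is None:
        return None
    values = {name: casts[name](raw) for name, raw in m.groupdict().items()}
    target = substitute(rule["to"], values)
    query = {k: substitute(v, values) for k, v in rule.get("query", {}).items()}
    return target + ("?" + urlencode(query) if query else "")


RULE = {
    "from": "/a/b/<key>/<int:id>",
    "to": "/c/<key>",
    "query": {"key": "<int:id>"},
}

print(rewrite(RULE, "/a/b/c/13"))
```

A path whose last segment is not made of digits (for example /a/b/c/x) does not match the <int:id> term, so rewrite returns None and the next rule could be tried.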
Since then I started couchapp_legacy:

https://github.com/benoitc/couchapp_legacy

It embeds a new rewriter doing both reverse and regexp-based
dispatching, with some other features like:

- Resource handlers plugin system; currently there are a rewriter and a
  proxy handler.
- Route caching: rules are built only on first access or when the
  design doc is changed.

TODO:

- variable transformations: string -> int, for example

There will be other features in the couchapp_legacy plugin (current
name) soon. I hope it helps to push the conversation further.

- benoit
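
To illustrate the route caching idea (rules built only on first access or when the design doc changes), here is a minimal Python sketch. It is not couchapp_legacy's code: RouteCache and compile_rules are hypothetical names, and it assumes that comparing the design doc's _rev is enough to detect a change.

```python
class RouteCache:
    """Cache compiled rules per design doc, rebuilt when its _rev changes."""

    def __init__(self, compile_rules):
        self._compile = compile_rules  # hypothetical rule compiler
        self._cache = {}               # ddoc _id -> (_rev, compiled rules)

    def get(self, ddoc):
        ddoc_id, rev = ddoc["_id"], ddoc["_rev"]
        cached = self._cache.get(ddoc_id)
        if cached is None or cached[0] != rev:
            # First access, or the design doc was updated: rebuild.
            cached = (rev, self._compile(ddoc.get("rewrites", [])))
            self._cache[ddoc_id] = cached
        return cached[1]


calls = []  # records every compilation, to show when rebuilding happens


def compile_rules(rules):
    calls.append(len(rules))
    return [(rule["from"], rule["to"]) for rule in rules]


cache = RouteCache(compile_rules)
ddoc = {
    "_id": "_design/app",
    "_rev": "1-abc",
    "rewrites": [{"from": "/a", "to": "/b"}],
}
cache.get(ddoc)
cache.get(ddoc)                      # same _rev: served from the cache
cache.get(dict(ddoc, _rev="2-def"))  # new _rev: rules are rebuilt
print(len(calls))
```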