On 20.01.2011 16:29, Benjamin Young wrote:
On 1/18/11 5:47 PM, Benoit Chesneau wrote:
On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau<bchesn...@gmail.com>
wrote:
There are 2 tickets open for the rewriter:

https://issues.apache.org/jira/browse/COUCHDB-1017
https://issues.apache.org/jira/browse/COUCHDB-1005

The first one is about checking the types of values so they can be
encoded (or decoded) from the path or query string. 1017 speaks about
strings, but it could be integers as well. This isn't currently possible.

The second is to have a more powerful rewriter. The original intention of
_rewriter was to offer a simple way to dispatch URLs to a resource
(_show, _update, _list, _view, doc, attachment) based on path terms
(string, ":var", "*"). Path specifications are obtained by breaking the
URL into tokens on the "/" separator; then we match them against the path
terms. That's how we find URLs. It is also possible to use
query arguments as a path term. A rewriter like this was the easiest
implementation we found and, as it is, the only one that obtained consensus.
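
The token-by-token matching described above can be sketched roughly as
follows. This is only an illustrative Python sketch, not CouchDB's Erlang
code: the function name match_path is hypothetical, and the way "*"
captures the rest of the path is an assumption.

```python
def match_path(pattern, path):
    """Return a dict of bindings if path matches pattern, else None.

    Hypothetical helper: path terms are a literal string, ":var" or "*".
    """
    terms = [t for t in pattern.split("/") if t]
    tokens = [t for t in path.split("/") if t]
    bindings = {}
    for i, term in enumerate(terms):
        if term == "*":
            # Assumption: "*" swallows every remaining token.
            bindings["*"] = "/".join(tokens[i:])
            return bindings
        if i >= len(tokens):
            return None  # path is shorter than the pattern
        if term.startswith(":"):
            bindings[term[1:]] = tokens[i]  # bind the variable
        elif term != tokens[i]:
            return None  # literal term mismatch
    if len(tokens) != len(terms):
        return None  # path is longer than the pattern
    return bindings
```

For example, match_path("/a/:id", "/a/42") binds id to the string "42";
nothing in this scheme can say that id should be an integer.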

The feature asked for in 1005 needs more power than simple pattern matching.

The more people use CouchApps with CouchDB facing the web directly
(without any proxy), the more they will ask for such features.

I see 2 alternative, easy pattern-matching approaches we can use to
solve this problem:


1.

Put variables between "<>", like this: <key>.
Then optionally say what the type of the variable is: <int:key> for an
integer.

Ex:

{
  "from": "/a/b/<key>/<int:id>",
  "to": "/c/<key>",
  "query": {
    "key": "<int:key>"
  }
}

/a/b/c/13 -> /c/c?key=13


This solves 1017 and potentially 1005.
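
A rough Python sketch of how such typed variables could be compiled and
matched. It is an illustration only: the names CONVERTERS, compile_rule
and match are hypothetical, and treating an untyped <key> as a string
that cannot contain "/" is an assumption.

```python
import re

# Hypothetical converter table: type name -> (sub-pattern, conversion).
CONVERTERS = {"int": (r"\d+", int), "str": (r"[^/]+", str)}
VAR = re.compile(r"<(?:(\w+):)?(\w+)>")


def compile_rule(pattern):
    """Turn "/a/<key>/<int:id>" into a compiled regex plus a type map."""
    types = {}
    regex = ""
    pos = 0
    for m in VAR.finditer(pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal part
        kind = m.group(1) or "str"  # assumption: untyped means string
        name = m.group(2)
        sub, conv = CONVERTERS[kind]
        regex += "(?P<%s>%s)" % (name, sub)
        types[name] = conv
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex), types


def match(pattern, path):
    """Return converted bindings, or None if the path does not match."""
    rx, types = compile_rule(pattern)
    m = rx.fullmatch(path)
    if m is None:
        return None
    return {name: types[name](value) for name, value in m.groupdict().items()}
```

With the rule above, match("/a/b/<key>/<int:id>", "/a/b/c/13") gives key
as the string "c" and id as the integer 13, while "/a/b/c/xyz" does not
match at all.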

2. Use mongrel2 pattern matching:

<snip>
URL patterns always match from the start. Routes are broken into a
prefix and a pattern part. We use the routes to find the longest
matching prefix and then test the pattern. If the pattern matches,
then the route works. If the route doesn't have a pattern, then it's
assumed to match, and you're done.

The only caveat is you have to wrap your pattern parts in parentheses,
but these don't mean anything other than to delimit where a pattern
starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
work.

Here's the list of characters you can use in your patterns:

. (period) All characters.
\a Letters.
\c Control characters.
\d Digits.
\l Lowercase letters.
\p Punctuation characters.
\s Space characters.
\u Uppercase letters.
\w Alphanumeric characters.
\x Hexadecimal digits.
\z The 0 character (null terminator).
[set] Just like a regex [] where set is a set of chars, like [0-9] for
all digits.
[^set] Inverse character set, so [^0-9] is anything but digits.
* Longest match of 0 or more of the preceding character.
+ Longest match of 1 or more of the preceding character.
- Shortest match of 0 or more of the preceding character.
? 0 or 1 match of the preceding character.
\bxy Balanced match a substring starting with x and ending in y. So
\b() will match balanced parentheses.
$ End of the string.
Using the uppercase version of an escaped character makes it work the
opposite way (i.e., \A matches any character that isn't a letter). The
backslash can be used to escape the following character, disabling its
special abilities (i.e., \\ will match a backslash).

Anything that's not listed here is matched literally.

</snip>
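
To make the prefix-then-pattern procedure concrete, here is a small
Python sketch. It is not mongrel2's implementation: split_route,
translate and find_route are hypothetical names, and the pattern part is
handled by translating a subset of the characters listed above into a
Python regex ([set] and \bxy are not handled, and unsupported escapes
such as \c, \p and \z raise an error).

```python
import re

# Subset of the escapes listed above, as regex character-class bodies.
CLASSES = {"a": "A-Za-z", "d": "0-9", "l": "a-z", "u": "A-Z",
           "w": "A-Za-z0-9", "x": "0-9A-Fa-f", "s": r"\s"}


def translate(pattern):
    """Translate a subset of the pattern syntax into a Python regex."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in CLASSES:
                out.append("[" + CLASSES[nxt] + "]")
            elif nxt.lower() in CLASSES:
                out.append("[^" + CLASSES[nxt.lower()] + "]")  # uppercase negates
            elif nxt.isalpha():
                raise ValueError("unsupported escape: \\" + nxt)
            else:
                out.append(re.escape(nxt))  # escaped literal, e.g. \\
            i += 2
        elif ch == "-":
            out.append("*?")  # shortest match of 0 or more
            i += 1
        elif ch in ".*+?$":
            out.append(ch)  # same meaning in Python regexes
            i += 1
        else:
            out.append(re.escape(ch))  # everything else is literal
            i += 1
    return "".join(out)


def split_route(route):
    """Split "/images/(.*.jpg)" into ("/images/", ".*.jpg")."""
    prefix, paren, rest = route.partition("(")
    if not paren:
        return prefix, None  # no pattern part
    return prefix, rest[:-1] if rest.endswith(")") else rest


def find_route(routes, path):
    """Pick the longest matching prefix, then test its pattern."""
    best = None
    for route in routes:
        prefix, pattern = split_route(route)
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[1])):
            best = (route, prefix, pattern)
    if best is None:
        return None
    route, prefix, pattern = best
    if pattern is None:
        return route  # a route without a pattern is assumed to match
    rest = path[len(prefix):]
    return route if re.match(translate(pattern), rest) else None
```

Note that in this sketch a failed pattern test does not fall back to a
shorter prefix, which is one reading of the description above.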

This solution is really simple: it removes the useless things you have
in regexps and gives complete power to the users. Also, this kind of
parsing is relatively easy to do in Erlang.


There may be a third solution. If we use something like emonk, erlv8,
... we could have the rewriter in a JS function. But it won't happen
in the next 6 months. I strongly support the second solution though,
and am quite ready to start a new parser.

Any thoughts ?


- benoît

Since then I have started couchapp_legacy:

https://github.com/benoitc/couchapp_legacy

It embeds a new rewriter doing both reverse and regexp-based
dispatching, with some other features like:

- A plugin system for resource handlers; currently a rewriter and a
proxy handler.
- Route caching: rules are built only on first access or when the
design doc is changed.
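
The route caching idea can be sketched like this in Python. How
couchapp_legacy actually detects a changed design doc is not stated
here, so comparing the document's _rev is an assumption, and RouteCache
and build_rules are hypothetical names.

```python
class RouteCache:
    """Build rules on first access and rebuild only when the doc changes."""

    def __init__(self, build_rules):
        self._build_rules = build_rules  # callable: raw rules -> compiled rules
        self._cache = {}  # design doc id -> (rev, compiled rules)

    def get(self, ddoc):
        entry = self._cache.get(ddoc["_id"])
        # Assumption: a different _rev means the design doc was changed.
        if entry is None or entry[0] != ddoc["_rev"]:
            compiled = self._build_rules(ddoc.get("rewrites", []))
            entry = (ddoc["_rev"], compiled)
            self._cache[ddoc["_id"]] = entry
        return entry[1]
```

Repeated requests against the same revision reuse the compiled rules;
only a new revision triggers a rebuild.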

TODO:
- variable transformations: string -> int, for example


There will be other features in the couchapp_legacy plugin (current
name) soon. I hope it helps to push the conversation further.

- benoit
Benoit,

Thanks for starting this conversation! :) I'd played with building a
RegEx-based rewriter for CouchDB, but I'm new to Erlang, so it's
nowhere near production ready. It's great to see someone else has an
interest in this piece of the puzzle as well.

In the legacy couchapp there's a route that uses an options section to
define patterns. It seems like a promising direction for extending the
rewriter. I'd like to propose we build something like this:

{
  "method": "GET",
  "from": "/page/:page",
  "to": "/_show/post/:page",
  "params": {
    "page": {
      "match": "\\w*",
      "type": "string"
    }
  }
}

If the parameter appears in the params section, we should use its
"match" rather than the standard (.*) pattern. "type" in that section
would refer to the output type. Variables would continue to be
represented with the colon notation to keep the URL space clean (vs.
using RegEx in the URL as I'd planned to do).
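
A minimal Python sketch of how a rule with a params section might be
applied. This is a guess at the semantics, not an existing
implementation: compile_rule and rewrite are hypothetical names, the
"match" values are treated as Python regexes, and "method" and "type"
are ignored here.

```python
import re


def compile_rule(rule):
    """Build one regex from "from", using each param's "match" if given."""
    params = rule.get("params", {})
    parts = []
    for token in rule["from"].strip("/").split("/"):
        if token.startswith(":"):
            name = token[1:]
            # Fall back to the standard (.*) pattern mentioned above.
            sub = params.get(name, {}).get("match", ".*")
            parts.append("(?P<%s>%s)" % (name, sub))
        else:
            parts.append(re.escape(token))
    return re.compile("/" + "/".join(parts))


def rewrite(rule, path):
    """Return the rewritten path, or None if the rule does not apply."""
    m = compile_rule(rule).fullmatch(path)
    if m is None:
        return None
    target = rule["to"]
    for name, value in m.groupdict().items():
        # Naive substitution; assumes no variable name prefixes another.
        target = target.replace(":" + name, value)
    return target
```

With the rule above, "/page/hello" would be rewritten to
"/_show/post/hello", while "/page/he-llo" would not match because "-" is
not covered by \w.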

One other helpful addition might be an "engine" option to set the
matching system to use. I'd prefer using PCRE, you've mentioned Mongrel,
someone else might want grep. :)

Thanks for starting this discussion, Benoit. I look forward to your
thoughts.

Later,
Benjamin

Benjamin,

This is quite a simple example. Should the rewriter still be based on
the path, i.e. on slashes as separators (as it currently is), or would
things like this also be possible:

{
  "from": "/page/:x/:y/:z",
  "to": "/_show/post/:x-:y-:z/something",
  "params": {
    "x": {
      "match": "\\d"
    },
    "y": {
      "match": "\\d"
    },
    "z": {
      "match": "\\d"
    }
  }
}

Cheers,
  Volker
