Re: [Repoze-dev] unifying url dispatch and traversal

Chris McDonough Thu, 11 Jun 2009 19:10:49 -0700

So now that this work is done, I'm having some major problems
explaining its finer points in documentation.  I'm a bit worried that
I'll not explain it satisfactorily, and that will cause support and
adoption issues later.  Sorry about writing the novel below.  I don't
really expect anybody to read this, much less reply, but maybe the act
of writing it will help me think about how to make changes that will
simplify things a bit.

On the BFG trunk, nothing much is different from BFG 0.8+ if you don't
try to use *both* routes and traversal within the same application.
In fact, almost all existing applications will run unchanged.  The
only ones that won't run unchanged are those that make use of a routes
"context factory".

An application that uses Routes exclusively to map URLs to code will
still have declarations like this:

   <route
     path=":foo/:bar"
     name="foobar"
     view=".views.foobar"
     />

   <route
     path=":baz/:buz"
     name="bazbuz"
     view=".views.bazbuz"
     />

In other words, each route typically corresponds with a single view
function, and when the route is matched during a request, the view
attached to it is invoked.  Typically, applications that use only URL
dispatch won't have any "view" statements in them.  Simple enough.

Under the hood, up until 0.9.1, when such route statements were
executed, we'd register a view for a special "IRoutesContext"
interface as the context interface and IRequest as a request interface
using the route name as the view name.  On the BFG trunk, however, we
ditched the "IRoutesContext" interface because we dropped the concept
of "context factories" in favor of unifying the idea of a "root
factory" as something both traversal and URL dispatch can use.  So
instead, when we run into route declarations like the above on the
trunk, we register a view for each route for the context interface
``None`` (implying any context) and a route-statement-specific
(dynamically-constructed) request interface type, using the empty
string as the view name (implying the default view).  In either case,
the point is to make it so that the named view will only be called
when the route it's attached to actually matches, and in the simplest
case they are logically equivalent.
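
To make the two registration schemes a bit more concrete, here is a toy
sketch in plain Python.  It is not BFG's code: the "registry" is just a
dict keyed by (context interface, request interface, view name), strings
stand in for interfaces, and the names register_old and register_new are
invented for illustration.

```python
# Toy model of view registration; NOT repoze.bfg's real registry.
# Interfaces are represented by plain strings (or None for "any context").

def register_old(registry, route_name, view):
    # Up to 0.9.1: shared IRoutesContext, plain IRequest,
    # and the route name doubles as the view name.
    registry[("IRoutesContext", "IRequest", route_name)] = view

def register_new(registry, route_name, view):
    # Trunk: any context (None), a per-route request interface,
    # and the empty view name (the default view).
    registry[(None, "IRequest:" + route_name, "")] = view

def foobar():
    return "foobar"

def bazbuz():
    return "bazbuz"

old, new = {}, {}
for name, view in [("foobar", foobar), ("bazbuz", bazbuz)]:
    register_old(old, name, view)
    register_new(new, name, view)

# When the "foobar" route matches, each scheme finds exactly one view.
assert old[("IRoutesContext", "IRequest", "foobar")] is foobar
assert new[(None, "IRequest:foobar", "")] is foobar
```

Either way the key is unique per route, which is why the two schemes
behave the same in the simplest case.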

As before, an application that uses *traversal* exclusively to map
URLs to code just won't have any "route" declarations.  Instead, its
ZCML (or bfg_view decorators) will imply declarations that look like
this:

   <view
     name="foobar"
     view=".views.foobar"
     />

   <view
     name="bazbuz"
     view=".views.bazbuz"
     />

The above view statements register a view using the context interface
``None`` and the IRequest request interface, with a view name matching
the name= argument.  The "foobar" view above will match the URL
/a/b/c/foobar or /foobar, etc, assuming that no view is named "a",
"b", or "c" during traversal.  Nothing about this has changed since,
well, forever.
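
For readers who want to see how a view name falls out of traversal,
here is a minimal sketch.  It assumes a model graph built from nested
dicts and a hypothetical traverse() helper; the real traverser does
more than this, so treat it as an illustration of the idea only.

```python
def traverse(root, path):
    """Walk path segments from root; return (context, view_name, subpath).

    The first segment that cannot be looked up becomes the view name,
    and whatever follows it is the subpath.
    """
    context = root
    segments = [s for s in path.split("/") if s]
    for i, segment in enumerate(segments):
        try:
            context = context[segment]
        except (KeyError, TypeError):
            return context, segment, segments[i + 1:]
    # Every segment was consumed: the default (empty-named) view applies.
    return context, "", []

# Hypothetical model graph: /a/b/c are traversable, "foobar" is not.
root = {"a": {"b": {"c": {}}}}

context, view_name, subpath = traverse(root, "/a/b/c/foobar")
assert context is root["a"]["b"]["c"] and view_name == "foobar"

context, view_name, subpath = traverse(root, "/foobar")
assert context is root and view_name == "foobar"
```

In both cases the view name is "foobar"; only the context differs.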

No example applications that use both <route> and <view> declarations
within the same application really exist, but this has always been
possible, and it still is.  It works exactly how it did in 0.8+:

   <route
     path=":foo/:bar"
     name="foobar"
     view=".views.foobar"
     />

   <view
     name="bazbuz"
     view=".views.bazbuz"
     />

In all versions of BFG after 0.8 (including the trunk), this will
register a ".views.foobar" view that will be invoked when the url
matches ":foo/:bar" and will register a view named "bazbuz" against
any context/request interface pair that will be invoked when no routes
match and the URL resolves the view name "bazbuz".

So far so good.

The shit starts to hit the fan when I try to explain how to use these
two concepts *together* in more interesting ways. The trunk
unification effort has made this possible.  Here is the catalog of
horrors.

1.  The "view" declaration has grown a "route_name" attribute.

On the trunk, the "view" declaration has sprouted a "route_name"
attribute.  It's meant to associate a particular view declaration with
a route, using the route's name, in order to indicate that the view
should *only be invoked when the route matches*.  For example:

   <route
     path="/abc"
     name="abc"
     view=".views.abc"
     />

   <view
     name="bazbuz"
     view=".views.bazbuz"
     route_name="abc"
     />

The above <view> declaration is completely useless, because the view
name will never be matched when the route it references matches.  Only
the view associated with the route itself (".views.abc") will ever be
invoked when the route matches, because the default view is always
invoked when a route matches and when no post-match traversal is
performed.  But, if you add a special token to the route's "path"
named "*traverse" that matches a path remainder, associating a <view>
statement with a <route> statement starts to make a bit more sense:

   <route
     path="/abc/*traverse"
     name="abc"
     view=".views.abc"
     />

   <view
     name="bazbuz"
     view=".views.bazbuz"
     route_name="abc"
     />

Under this circumstance, traversal is performed *after* the route
matches.  So a url like "/abc/bazbuz" (and potentially
"/abc/def/ghi/bazbuz") might be matched by the "bazbuz" view
declaration above, at least if the default root factory was willing to
traverse intermediate names.  The traversal path, respectively, for
each example I just mentioned, is "bazbuz" and "def/ghi/bazbuz".
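
A rough sketch of how a "*traverse" token can capture the path
remainder, using nothing but the standard "re" module.  The functions
compile_route and match are hypothetical helpers, not the matcher BFG
actually uses, and the pattern syntax handled here is only the subset
shown in this message.

```python
import re

def compile_route(path):
    """Turn a pattern like '/abc/*traverse' or ':foo/:bar' into a regex."""
    parts = []
    for segment in path.strip("/").split("/"):
        if segment.startswith("*"):
            # "*name" swallows the rest of the URL, slashes included.
            parts.append("(?P<%s>.*)" % segment[1:])
        elif segment.startswith(":"):
            # ":name" matches exactly one path segment.
            parts.append("(?P<%s>[^/]+)" % segment[1:])
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "$")

def match(path, url):
    """Return the dict of captured names, or None if the URL doesn't match."""
    m = compile_route(path).match(url)
    return m.groupdict() if m else None

assert match("/abc/*traverse", "/abc/bazbuz") == {"traverse": "bazbuz"}
assert match("/abc/*traverse", "/abc/def/ghi/bazbuz") == {
    "traverse": "def/ghi/bazbuz"}
assert match("/abc/*traverse", "/xyz/bazbuz") is None
```

The captured "traverse" value is what then gets handed to traversal.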

It's pretty difficult to explain traversal in general.  People who
choose to use routes exclusively as a by-god framework choice just
don't care about traversal, and they never will.  I don't really care
too much about trying to explain traversal to these folks; they can
just use route statements without any "*traverse" token in the path
pattern, and they'll be quite happy.

But trying to explain traversal-after-route-match to people who
understand both concepts is difficult, because combining the two
concepts seems to break a law of "the magical number seven plus or
minus 2"
(http://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two),
at least for me.  These people need to understand 1) URL pattern
matching, 2) root factories and 3) the traversal algorithm, and the
interactions between all of them.  I'm not sure there is (or should
be) a solution to this, but it's definitely an advanced concept.  I
just don't know if I can explain it adequately in narrative
documentation to even bother.

Another hard thing to explain about the "route_name" attribute and
traversal-after-route-match even to people who do understand
traversal: a view that *doesn't* spell the route name won't match when
the route matches, even if it's defined in ZCML and seems like it
would otherwise. For example, in the below example, the "bazbuz" view
will never be invoked when the "abc" route matches even if the URL
ends with "bazbuz" and everything else indicates it would match:

   <route
     path="/abc/*traverse"
     name="abc"
     view=".views.abc"
     />

   <view
     name="bazbuz"
     view=".views.bazbuz"
     />

The "bazbuz" view won't match when the URL is "/abc/bazbuz" because
its declaration doesn't match: the "route_name" attribute is missing.
I'm thinking of changing this so that it *will* match.  This implies
deriving the route-specific request interface classes from
non-route-specific request interface classes, which isn't very hard to
do.  I just don't know that one is clearly better than the other,
really.
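
The inheritance idea can be sketched with ordinary Python classes
standing in for interfaces.  Everything here is hypothetical (the class
names, the lookup function, the dict registry); the point is only that
a lookup which walks the inheritance chain finds views registered for
the general request type once the route-specific type derives from it.

```python
class IRequest:
    """Stand-in for the general request interface."""

class IsolatedAbcRequest:
    """Route-specific request type with no link to IRequest."""

class DerivedAbcRequest(IRequest):
    """Route-specific request type derived from IRequest."""

def lookup(views, request_type, name):
    # Try the most specific type first, then each base in MRO order.
    for base in request_type.__mro__:
        view = views.get((base, name))
        if view is not None:
            return view
    return None

# A view declared without route_name is registered against IRequest.
views = {(IRequest, "bazbuz"): ".views.bazbuz"}

assert lookup(views, IsolatedAbcRequest, "bazbuz") is None
assert lookup(views, DerivedAbcRequest, "bazbuz") == ".views.bazbuz"
```

A view registered against the derived type would still win over the
general one, because the more specific type is consulted first.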

2. The "route" declaration's "view" attribute is now optional.

It's now possible to define a <route> statement that has no "view"
attribute.  In 0.8 - 0.9.1, this was not possible.

   <route
     path="/abc"
     name="abc"
     />

By itself, the above route statement is useless.  It will cause a
match when a request is processed, but since it isn't associated with
any view, a not found error will be returned unconditionally.
However, when you couple it with one or more views, it begins to make
sense:

   <route
     path="/abc"
     name="abc"
     />

   <view
     name=""
     view=".views.abc"
     route_name="abc"
     />

The above pair of declarations is actually logically equivalent to:

   <route
     path="/abc"
     name="abc"
     view=".views.abc"
     />

The reason we allow for the first (more verbose) form (and routes with
no "view" attribute) is to allow traversal to match views after
a route match, ala:

   <route
     path="/abc/*traverse"
     name="abc"
     />

   <view
     name=""
     view=".views.abc"
     route_name="abc"
     />

   <view
     name="def"
     view=".views.def"
     route_name="abc"
     />

   <view
     name="ghi"
     view=".views.ghi"
     route_name="abc"
     />

We *could* maybe make this a bit more obvious by making <view>
declarations that are meant to match only a particular <route>
declaration into subdirectives of the <route>, ala:

   <route
     path="/abc/*traverse"
     name="abc"
     >

     <view
       name="def"
       view=".views.def"
       />

     <view
       name="ghi"
       view=".views.ghi"
       />

   </route>

.. but this would mean that people couldn't extend applications which
used post-routematch traversal with extra views within a separate ZCML
file.  I'm a bit loath to do *both* (the subdirective and the
route_name attribute).

3.  Route declarations need to come *before* view declarations which
name them in ZCML.

Currently, due to implementation vagaries, <route> directives that are
referred to by the ``route_name`` of <view> directives must *precede*
the view directive.  For example, this will work:

   <route
     path="/abc/*traverse"
     name="abc"
     />

   <view
     name=""
     view=".views.abc"
     route_name="abc"
     />

But this will raise an error at parse time:

   <view
     name=""
     view=".views.abc"
     route_name="abc"
     />

   <route
     path="/abc/*traverse"
     name="abc"
     />

I can probably solve this with enough elbow grease, but maybe not
soon.
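
The ordering constraint is what you'd expect from any loader that
processes directives in a single pass.  The sketch below is a toy
loader, not BFG's ZCML machinery; ConfigurationError and load are
made-up names, and directives are modeled as (kind, attrs) pairs.

```python
class ConfigurationError(Exception):
    """Raised by this toy loader; a hypothetical name, not BFG's exception."""

def load(directives):
    """Process (kind, attrs) pairs in document order, in a single pass."""
    routes, views = {}, []
    for kind, attrs in directives:
        if kind == "route":
            routes[attrs["name"]] = attrs["path"]
        elif kind == "view":
            route_name = attrs.get("route_name")
            # A view may only name a route that has already been seen.
            if route_name is not None and route_name not in routes:
                raise ConfigurationError("unknown route %r" % route_name)
            views.append((route_name, attrs["name"], attrs["view"]))
    return routes, views

route = ("route", {"path": "/abc/*traverse", "name": "abc"})
view = ("view", {"name": "", "view": ".views.abc", "route_name": "abc"})

load([route, view])          # route first: accepted
try:
    load([view, route])      # view first: rejected
except ConfigurationError as exc:
    print("error:", exc)
```

Lifting the restriction would amount to collecting the views first and
resolving their route names only after every directive has been read.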

4. Route statements need to be ordered relative to each other; view
statements don't.

<route> statement ordering is very important, because routes are
evaluated in a specific order, unlike traversal, which depends on
emergent behavior rather than an ordered list of directives.  It's
difficult to explain why this is the case.
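
One way to show why route order matters: when two patterns can match
the same URL, whichever is checked first wins.  This is a minimal
sketch with hand-written regexes and a hypothetical first_match helper,
not the real route matcher.

```python
import re

# Two routes that both match "/abc/1"; the names are made up.
routes = [
    ("generic", re.compile(r"^/(?P<foo>[^/]+)/(?P<bar>[^/]+)$")),
    ("specific", re.compile(r"^/abc/(?P<id>[^/]+)$")),
]

def first_match(routes, url):
    """Return (route_name, matchdict) for the first route that matches."""
    for name, pattern in routes:
        m = pattern.match(url)
        if m:
            return name, m.groupdict()
    return None

# Declared generic-first, the specific route is never reached.
assert first_match(routes, "/abc/1") == (
    "generic", {"foo": "abc", "bar": "1"})

# Swap the order and the same URL lands on the other route.
assert first_match(list(reversed(routes)), "/abc/1") == (
    "specific", {"id": "1"})
```

View lookup has no such list to walk, which is why <view> statements
can appear in any order relative to each other.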

5.  The "route" declaration can mention a "factory"

This has always been the case, but on the trunk, the "factory"
mentioned by a route statement now implies a "root factory", meaning
that it can potentially return something that can be traversed after a
route is matched.  For example, the following route declaration names
a factory:

   <route
    factory=".models.root_factory"
    path="/abc/*traverse"
    name="abc"
    />

The factory in .models.root_factory might look like so:

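class Root:
    def __getitem__(self, name):
        # 'self' traverses back to the root object itself
        if name == 'self':
            return self
        # any other name ends traversal and becomes the view name
        raise KeyError(name)

def root_factory(environ):
    # root factories accept the WSGI environment
    return Root()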

The URL "/abc/foo" would try to invoke the "foo" view using the root
object as the view's context.  However, "/abc/self" would try to
invoke the default view against the root object after its __getitem__
had been called once.
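
Putting the pieces together, the whole route-then-traverse flow for
this example can be sketched as below.  This is a simplification under
stated assumptions: the route match is reduced to a prefix check on
"/abc/", __getitem__ is assumed to raise KeyError for unknown names,
and resolve is a hypothetical helper rather than anything in BFG.

```python
class Root:
    def __getitem__(self, name):
        if name == "self":
            return self
        # Assumption: an unknown name ends traversal via KeyError.
        raise KeyError(name)

def root_factory(environ):
    return Root()

def resolve(url, prefix="/abc/"):
    """Return (context, view_name), or None if the route doesn't match."""
    if not url.startswith(prefix):
        return None
    # The route matched: its factory supplies the traversal root.
    context = root_factory({"PATH_INFO": url})
    # Traverse the remainder captured by "*traverse".
    for segment in [s for s in url[len(prefix):].split("/") if s]:
        try:
            context = context[segment]
        except KeyError:
            return context, segment
    return context, ""

context, view_name = resolve("/abc/foo")
assert isinstance(context, Root) and view_name == "foo"

context, view_name = resolve("/abc/self")
assert isinstance(context, Root) and view_name == ""
```

So "/abc/foo" asks for the "foo" view on the root, while "/abc/self"
consumes its one segment and asks for the default view.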

This is really just an explaining-traversal problem, I suppose.

---

So... catalog of horrors over.  In conclusion....

One idea I had while writing this is to just deprecate the <route>
statement entirely and instead have only <view> directives, ala:

   <view
     route=":abc/:def"
     name=""
     view=".views.abc"
     />

   <view
     route=":abc/:def"
     name="foobar"
     view=".views.foobar"
     />

   <view
     route=":ghi/:jkl"
     name=""
     view=".views.abc"
     />

It would mean that <view> statements would grow a bunch of bullshit to
support all the extra "route" attributes, and it would make it
impossible to use the bfg_view decorator in complete symmetry with the
<view> ZCML declaration, because code definition order is usually
radically different than ZCML directive ordering (code definition
ordering is unimportant).  But it might be "an answer" to reducing
some complexity.

On 6/10/09 11:21 PM, Chris McDonough wrote:
> This work has now been done and merged into the trunk.  See
> http://svn.repoze.org/repoze.bfg/trunk/CHANGES.txt for more info.  I'll
> probably release an alpha soon into the BFG "dev" index, maybe numbered
> something like "0.9.5" or so.
>
> - C
>
> On 6/5/09 11:33 AM, Chris McDonough wrote:
>> Paul and Tres recently taught a repoze.bfg tutorial at the Plone
>> Symposium at Penn State.  Tres mentioned to me that, by the reactions
>> of the tutorial attendees, he thought having two separate-but-equal
>> ways to do URL-to-code mapping (traversal vs. url dispatch/aka routes)
>> was too confusing.  He then suggested an alternative.
>>
>> Currently, a configured repoze.bfg application is in one of three
>> modes:
>>
>> - traversal-only, when a "root factory" is used but no routes are
>>      configured
>>
>> - routes-only, when routes are configured but no root factory is used.
>>
>> - a hybrid model where if a route can't be matched, the system falls
>>      back to traversal from a single root.  In this mode, both a root
>>      factory and routes are configured.
>>
>> Tres' suggestion was essentially to cause BFG to always operate in a
>> hybrid mode, where a "root factory" was used to generate the context
>> object for views even when they are found via a route match instead of
>> via traversal.
>>
>> For example, let's say we had a root factory that looked like this:
>>
>>        class Root:
>>            pass
>>
>>        root = Root()
>>
>>        def root_factory(environ):
>>            return root
>>
>> .. and we configure it in to our BFG application like so:
>>
>> from repoze.bfg.router import make_app
>> from myapp.models import root_factory
>>
>> return make_app(root_factory)
>>
>> Currently the above "root_factory" callback is only used when the URL
>> is matched as a result of traversal.  But when any <route> matches, it
>> is ignored.  Instead, when a <route> matches:
>>
>> - If there's a "factory=" attribute on the route declaration, it names
>>      a "context factory".  The context factory is called when this route
>>      matches.
>>
>> - If there's no "factory=" attribute on the route declaration, a
>>      default routes context factory is used.
>>
>> But BFG never calls a "root factory" for an object matched via a route.
>>
>> In a system that operated under Tres' model, we'd essentially take
>> away the difference between a "root factory" and a "context factory".
>> Instead:
>>
>> - Each route will match a URL pattern.
>>
>> - If a route's URL pattern is matched on ingress, if the route has a
>>      "factory" attribute, the factory will be assumed to be a "root
>>      factory", and it will return a context object appropriate for that
>>      route.  If the route does *not* have a "factory" attribute, the
>>      "default root factory" would be used to compose the context.
>>
>> - There would be a "default root factory", used when no supplied route
>>      matches or when no "factory=" attribute was supplied along with a
>>      route statement.  What this boils down to is that the system will
>>      have a "default route" that will match any URL, but will be last in the
>>      route check ordering.  The default route will always use the
>>      "default root factory" as its factory.
>>
>> Benefits:
>>
>> - Makes the difference between an application that uses routes and one
>>      that doesn't far less pronounced.  Essentially, this change unifies
>>      the two models.
>>
>> - Adds the ability to do traversal through some set of names *after* a
>>      route is matched.  We'd allow some special signifier to be placed
>>      within a route path, ala "/foo/bar/*subpath"; we'd resolve the root
>>      related to "/foo/bar", then just traverse with the path info
>>      captured in "subpath".
>>
>> Risks:
>>
>> - If a "factory" is specified on a route, it will need to point at a
>>      function that had the same call/response convention as a traversal
>>      root factory.  This will break code.  "Context factories" accept
>>      key/value pairs assumed to be items that matched in the URL match.
>>      These would cease working, and would need to be rewritten as root
>>      factories, which accept a WSGI environment.
>>
>> - URL generation may become more difficult and costly.
>>
>> I'm apt to do this for 1.0, even at the risk of breaking code, because
>> it does nicely unify the traversal vs. routes story, which is
>> definitely the most up-in-the-air part of BFG today.
>>
>> Anybody have any objections?
>>
>> - C
>> _______________________________________________
>> Repoze-dev mailing list
>> Repoze-dev@lists.repoze.org
>> http://lists.repoze.org/listinfo/repoze-dev
>>
>