Re: Refactoring QuerySet

Jeremy Dunck Sun, 25 Feb 2007 16:50:11 -0800

On 2/25/07, Malcolm Tredinnick <[EMAIL PROTECTED]> wrote:
> I noticed today that a new GIS branch has been created, which includes a
> goal to develop a custom QuerySet-derivative. There has also been a
> discussion on trying to work out an API for aggregates, which seems to
> be moving along, although I suspect the design has a bit more work
> required yet. Michael Radziej has inquired about trying to integrate
> custom SQL as a natural QuerySet result. That's possibly a bit
> pie-in-the-sky for all cases, but we might be able to get close.



It appears that I'll have more work to do to get GeoTypes in working
order for Django.  It's infrequently maintained and is currently
specific to psycopg one.

I'm working with the initd.org people to see if they'll accept such changes.

I agree that queryset shipping bits of text is not terribly flexible,
and I'm guessing SqlAlchemy is the quickest way to get there.  I
personally think SQLAlchemy goes *way* beyond the 80/20 point the DJ
ORM hits, but delegating the hard problems to it is probably useful.

In any case, I have to get to trunk and get GeoTypes updated for me to
care about QuerySet terribly, so my vote shouldn't count for much.

Cheers,
  Jeremy

>
> All of these (plus no doubt some other things I don't remember or know
> about) are going to impact QuerySets. So I thought I just lob out my
> plans/ideas for refactoring that part of the code prior to 1.0.
>
> What I would like here is for anybody planning to play in this space to
> have a think about whether their particular use-case is going to get
> clearer harder to implement under this scheme, so that changes can be
> made now.
>
> No really urgent action required on this. I keep bashing away at the
> code for this (and tweaking the design) whenever I have time. And I
> realise everybody at PyCon already has their head full of other stuff. I
> just want to inject the thoughts into peoples' brains since there are a
> bunch of you sitting in Texas talking about "the future".
>
> Apologies for the length. This makes Jacob's admin refactoring post look
> short. Life sucks like that sometimes.
>
> How QuerySets work at the moment
> ---------------------------------
> By magic!
>
> Seriously, the QuerySet class performs a few jobs: it constructs the
> database query (SQL), retrieves the data by calling the appropriate
> backend, and then provides an API for accessing the data in a
> Python-friendly format.
>
> There are really two portions that I want to separate here: the
> mechanics of getting the data from the database (constructing the query,
> particularly) and providing the Python interface via model instances,
> etc.
>
> When we construct an SQL query at the moment, it is mostly done by
> pushing string fragments around inside a couple of functions in the
> QuerySet class. This makes it very difficult to fiddle with query
> construction because you need to get at the right string fragment at
> just the right moment. You cannot go back later and tweak any portion of
> the query. The obvious example of where this is painful is if we
> implement slicing functionality for MS SQL Server: that database does
> not have the non-SQL-standard LIMIT/OFFSET keywords. Instead, you
> construct a sub-select in the FROM clause to get the equivalent
> behaviour. I've no doubt screwed up the details slightly there, but the
> important point here is that it's basically impossible without really
> special-case handling ("if using_ms_sql: ...") to hijack the FROM
> construction and not put in LIMIT/OFFSET bits. Oracle has similar, but
> not quite as difficult to accommodate requirements for extracting an
> ordered portion of the results, too.
>
> What would the new (QuerySet) world order look like
> ----------------------------------------------------
> What I propose to do is to split out the SQL construction from the body
> of QuerySet and make it into a separate class. This class (let's call it
> Query, for simplicity) would end up being a "query" attribute on
> QuerySets and it's __str__ function would return the appropriate SQL
> statement.
>
> The Query class would not push string fragments around. Instead, it
> would keep the parts of the SQL statement separated until asked to
> combine them (in __str__, for example). So there would be attributes
> called, say, select, tables, where, having, grouping, limit and offset.
> Initially, "having" and "grouping" wouldn't be used by our current
> functionality, but they're there for extending the Query class (and for
> use by things like aggregates and custom tweaks).
>
> The exact contents of these attributes isn't really important for the
> purposes of this discussion, but they will be Python objects that you
> can poke at and shuffle as much as you like. For example, the "select"
> attribute would be a list of (column name, alias) pairs (where alias
> could be None) and the "tables" attribute would be a (possibly nested)
> list of (table name, alias, join type) triples. Since the table names
> just need to be convertible to strings, it would be possible to use a
> Query class for the table name, giving us a way to inject
> SQL-Server-required views into the FROM clause.
>
> The Query class is primarily responsible for generating the SQL string.
> However, when I started writing code for this, it rapidly becomes
> apparent that it will need to know the database engine it is coding for.
> Tasks like constructing the right SQL for slicing (LIMIT/OFFSET vs
> SQL-Server requirements), for example, need that knowledge to construct
> the right string.
>
> So, at some level, it might make sense to pass off *all* of the database
> interaction to this class. The QuerySet class would create a Query
> instance that contains the right information, tell it the database
> engine, pass it the values for the parameters (maybe not all in the same
> place) and then call "run_the_query_please()" and get back an iterator
> over the results.
>
> This helps on another, slightly crazier level, too: if you want to
> replace the database-backed ORM with another type of storage, but were
> happy to keep using the Django API for querying, all the SQL-specific
> stuff is now in one place and you would only need to write your own
> Query replacement or subclass. Not a driving requirement, but very much
> nice to have, since not everything runs out of databases.
>
> How does this help for custom SQL and extra QuerySet functionality?
> -------------------------------------------------------------------
> Writing custom SQL fragments now becomes possible, because you have
> direct access to the individual pieces of the query before they are
> combined into a string. I'm not saying it's trivial, but that's not the
> goal. We try to make the common things easy and the hard things
> possible. This is in the "hard things are possible" category.
>
> Adding functionality to QuerySets as far as interfacing with the
> databases via extra SQL is another motivation here. You would subclass
> QuerySet and tell your subclass to use your subclass of Query for its
> default query construction (probably overriding QuerySet.__init__ --
> details to be determined, but they aren't important at this point
> anyway, since there are about 6 ways to do it). Then you can make
> whatever changes you want to a subclass (or even straight replacement)
> of the Query class. For example, you could add a "sum()" method to the
> derived QuerySet -- if we don't have it in the default version -- and
> this sum() method would know what method to call in the Query subclass
> to add the right things to the query.
>
> To clarify that a little bit, suppose you create MyQuerySet and MyQuery
> as subclasses of QuerySet and Query, respectively:
>
>         MyQuerySet.sum() knows what method in MyQuery to call and with
>         what information. It also knows how to present the result. Okay,
>         sum() is a trivial example, since it's going to return a number,
>         but, still, it's a number and not a model instance or a sequence
>         and you could imagine cases where the return type is something
>         more complex.
>
>         MyQuery has a method (let's call it sum()) that knows to add a
>         sum(*) to the select attribute and possibly something to the
>         grouping attribute as well. That's all it does: modified the
>         query based on the controls passed in from the caller.
>
> Addressing Michael Radjiez's (amongst others) request for somehow
> returning the results of custom queries and making it act like a real
> QuerySet: I think this can be done, to some extent, by subclassing
> QuerySet. Your subclass handles your custom Query modifications and also
> supports the normal QuerySet interface, so a call to, say, filter() on
> your custom QuerySet will work.
>
> The hard part about integrating custom queries with QuerySets is because
> we haven't evaluated your custom portion of the query at the time you
> start calling other QuerySet methods on it. So the new methods need to
> be able to extent the existing query that is being built up. At the
> moment, we cannot do that, because the query construction is fairly dumb
> about building up the query fragments and kind of stomps all over any
> custom fragments you might want to inject. By keeping all the fragments
> in the Query class and not combining them until the final __str__
> conversion, it makes it easier to augment an existing query.
>
> Is this really necessary?
> -------------------------
> I (and others, especially Russell) have spent quite a while living in
> the current QuerySet code fixing bugs with query construction and
> scratching our heads wondering how on earth we are going to fix some
> other cases. For example, we often don't use the most efficient (or the
> correct -- in some cases) type of table join when constructing queries.
> Joining QuerySets with particular combinations of disjunctions and
> conjunctions when common fields are involved leads to incorrect results.
> Thinking about how to join in support for aggregates just gives me a
> nose bleed, because constructing the GROUP BY and HAVING clauses is
> fiddly. All of these problems are much easier to fix by restructuring
> how queries are created -- they are, in fact, the motivation for this
> design.
>
> Based on periodic IRC visits and reading django-users, people often
> try/want to insert custom output columns in the select portion of a
> query, which are then linked to something custom in the where clause,
> but we don't make that possible. You cannot currently do things like
> insert a custom "tableA.column1 = tableB.column2" fragment in a where
> clause. For anything custom like this, the (not totally unacceptable)
> solution at the moment is to write out the whole thing as a custom SQL
> statement. I think the above should make some of this a bit easier and
> the rest no harder than it is now. This is another case where I'm not
> sure it's a driving requirement for the changes, since we aren't trying
> to be a completely isomorphic Python interface over SQL, so asking
> people to drop down to SQL if they want to use SQL functionality isn't
> evil. But it's a nice side-effect.
>
> Things I think become easier this way
> -------------------------------------
>         - aggregates and other computed output columns
>         - custom fragments in portions of a query (as opposed to writing
>         a whole query for the database)
>         - adding new query types (GIS?)
>         - different database syntax (LIMIT/OFFSET in both MS SQL Server
>         and Oracle)
>
> I don't know of any announced plans/wishes that affect QuerySets that
> would become more difficult under this change, so please sing out if you
> know of any.
>
> Regards,
> Malcolm
>
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Refactoring QuerySet

Reply via email to