Re: [GSOC] Multiple Database API proposal

Alex Gaynor Fri, 20 Mar 2009 22:39:19 -0700

On Sat, Mar 21, 2009 at 1:25 AM, Malcolm Tredinnick <
malc...@pointy-stick.com> wrote:


>
> On Sat, 2009-03-21 at 00:41 -0400, Alex Gaynor wrote:
> >
> >
> >         > One suggestion Eric Florenzano had was that we go above and
> >         beyond
> >         > just storing the methods and parameters, we don't even
> >         execute them
> >         > at all until absolutely necessary.
> >
> >
> >         Excuse me for a moment whilst I add Eric to a special list
> >         I've been
> >         keeping. He's trying to make trouble.
> >
> >         Ok, back now... There are at least two problems with this.
> >
> >         (a) Backwards incompatible in that some querysets would return
> >         noticeably different results before and after that change. It
> >         would be
> >         subtle, quiet and very difficult to detect without auditing
> >         every line
> >         of code that contributes to a queryset. The worst kind of
> >         change for us
> >         to make from the perspective of the users.
> >
> > In what scenario does it return different results? The one place I can
> > think of is:
> >
> > query = queryset.order_by('I AM NOT A REAL FIELD, HAHA')
> > render_to_response('template.html', {'q': query})
> >
> > which would raise an exception in the template instead of in the view.
>
> It's related to eager/deferred argument evaluation (which is done for
> the same reasons): any "smart" object like Q objects would require
> changing to handle deferring things correctly. They can currently be
> designed to evaluate only once and will work correctly.
>

I don't see this as an issue, simply because whatever happens in the
instantiation of these objects would be the same for whatever connection was
in use.
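
To make the eager-versus-deferred difference concrete, here is a toy sketch.
Everything in it (KNOWN_FIELDS, check_field, EagerQuery, DeferredQuery) is
made up for illustration and is not how Django's QuerySet is implemented.
It only shows where the bad order_by() from the example above would blow up
in each scheme:

```python
# Toy model only: KNOWN_FIELDS and both classes are hypothetical.
KNOWN_FIELDS = {"id", "title"}


def check_field(name):
    """Raise if the field name is not one the toy model knows about."""
    if name not in KNOWN_FIELDS:
        raise LookupError(f"unknown field: {name!r}")


class EagerQuery:
    """Validates each call as soon as it is made."""

    def __init__(self, ordering=()):
        self.ordering = tuple(ordering)

    def order_by(self, name):
        check_field(name)  # a bad name fails here, i.e. in the view
        return EagerQuery(self.ordering + (name,))

    def __iter__(self):
        return iter(())  # the toy has no rows


class DeferredQuery:
    """Only records calls; validation waits until the query is iterated."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def order_by(self, name):
        # Nothing is checked yet, the call is just stored.
        return DeferredQuery(self.calls + (("order_by", name),))

    def __iter__(self):
        for _method, name in self.calls:
            check_field(name)  # a bad name fails here, i.e. in the template
        return iter(())
```

With the eager class the exception surfaces at the order_by() call; with the
deferred one it surfaces only when something iterates the object.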


>
> >
> >
> >         (b) Intentionally not done right now and not because I'm
> >         whimsical and
> >         arbitrary (although I am). The problem is it requires storing
> >         all sorts
> >         of arbitrarily complex Python objects. Which breaks pickling,
> >         which
> >         breaks caching. People tend to complain, a lot, about that
> >         last bit.
> >
> >         That's why the Where.add() converts things to more basic types
> >         when they
> >         are added (via a filter() command).  If somebody really needs
> >         lazily
> >         evaluated parameters, it's easy enough via a custom Q-like
> >         object, but
> >         so far nobody has asked for that if they've gotten stuck doing
> >         it. It's
> >         even something we could consider adding to Django, although
> >         it's not a
> >         no-brainer given the potential to break caching.
> >
> > I vaguely recall there being a ticket about this that you wontfixed,
> > although that may have been about deferring calling callables :).  In
> > any event the caching issue was one I hadn't considered, although one
> > solution would be not to pickle it with the ability to switch to a
> > different query type, it's a bit of a strange restriction, but I don't
> > think it's one that would practically affect people, and it's less
> > restrictive.
>
> You wrote a really long sentence there that didn't make a lot of sense
> (too many prepositions and commas, not enough nouns and full stops).
> Unclear which restriction you're arguing against, but the picklability
> of querysets is pretty much a requirement. It's something people really
> use.
>
> However, before we go too far down this path: this is a very minor
> thing. It's unlikely to be required. Adding it "because we can" is an
> argument Eric can propose at some much later date if it's not absolutely
> *required* for multi-db stuff. I think we won't need to worry about this
> at all.
>

Just to clear that up, what I was saying was:

When you pickle a QuerySet, we build up the entire Query as we would right
before SQL execution and then just pickle that.  Then the restriction is
that you can't change the database type to be used on an unpickled query.
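
A rough sketch of that idea, using a stand-in class rather than the real
QuerySet. The names here (PicklableQuery, build, dumps, loads, using, and the
backend strings) are all invented for the example, and the "built" query is
just a dict of basic types:

```python
import pickle


class PicklableQuery:
    """Hypothetical stand-in for a queryset that records filter() calls."""

    def __init__(self, backend, pending=(), frozen=False):
        self.backend = backend
        self.pending = tuple(pending)  # recorded filter() arguments
        self.frozen = frozen           # True once restored from a pickle

    def filter(self, **conditions):
        return PicklableQuery(self.backend, self.pending + (conditions,), self.frozen)

    def using(self, backend):
        # The restriction: an unpickled query is tied to its database type.
        if self.frozen:
            raise ValueError("cannot change the database type of an unpickled query")
        return PicklableQuery(backend, self.pending)

    def build(self):
        """Resolve the recorded calls into basic types for one backend."""
        where = [pair for cond in self.pending for pair in cond.items()]
        return {"backend": self.backend, "where": where}

    def dumps(self):
        # Pickle the fully built form, not the recorded Python objects.
        return pickle.dumps(self.build())

    @classmethod
    def loads(cls, payload):
        state = pickle.loads(payload)
        pending = tuple({key: value} for key, value in state["where"])
        return cls(state["backend"], pending, frozen=True)
```

Only plain dicts, lists, tuples and strings ever reach pickle here, so caching
keeps working; the price is that using() is refused on the restored object.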


>
> >
> >
> >         [...]
> >         >
> >         > Thanks for all the review Malcolm.
> >
> >
> >         No problems.
> >
> >         > One question that I didn't really ask in the initial post is
> >         what
> >         > parameters should a "DatabaseManager" receive on its
> >         methods. One
> >         > suggestion is the Query object, since that gives the user the
> >         maximal
> >         > amount of information; however, my concerns there are that
> >         it's not a
> >         > public API, and having a private API as a part of the public
> >         API feels
> >         > klunky.
> >
> >
> >         At first glance, I believe the word you're looking for is
> >         "wrong". :-)
> >
> > Yes, that's the one.
> >
> >
> >         Definitely a valid concern.
> >
> >         >   OTOH there isn't really another data structure that
> >         carries around
> >         > the information that someone writing their sharding logic (or
> >         whatever other
> >         > scheme they want to implement) will inevitably want to have.
> >
> >
> >         Two solutions spring to mind, although I haven't thought this
> >         through a
> >         lot: it's not particularly germane to the proposal since it's
> >         something
> >         we can work out a bit later on. I've got limited time
> >         today (something
> >         about a beta release coming up), so I wanted to just get out
> >         responses
> >         to the two people who posted items for discussion. I suspect
> >         there's a
> >         lot of thinking needed here about the concept as a whole and I
> >         want to
> >         do that. Anyway...
> >
> >         One option is to use the piece of public API that is available
> >         which
> >         will always be carrying around a Query object: the QuerySet.
> >         Query
> >         objects don't exist in isolation. However, this sounds
> >         problematic
> >         because the implementation is going to be working at a very
> >         low-level --
> >         database managers are only really interesting to
> >         Query.as_sql() and its
> >         dependencies. But that leads to the next idea, ...
> >
> >         The other is to work out a better place for this database
> >         manager in the
> >         hierarchy. It might be something that lives as an attribute on
> >         a
> >         QuerySet. Something like the user provides a function that
> >         picks the
> >         database based on "some information" that is available to it and
> >         the base
> >         method selects the right database to use. Since it lives in
> >         the QuerySet
> >         namespace, it can happily access the "query" attribute there
> >         without any
> >         encapsulation violations. The database manager then becomes
> >         two pieces,
> >         an algorithm on QuerySet (that might just dispatch to the real
> >         algorithm
> >         on Query), plus some user-supplied code to make the right
> >         selections.
> >         That latter thing could be a callable object if you need the
> >         full class
> >         structure. But the stuff QuerySet/Query needs to know about is
> >         probably
> >         a much smaller interface than *requiring* a full class. (Did
> >         any of that
> >         make sense?)
> >
> >         I think this -- the database manager concept -- is the part of
> >         your
> >         proposal that is most up in the air with respect to what the
> >         API looks
> >         like. Which is fine. The fact that it's something to consider
> >         is good
> >         enough to know. Certainly put some thought into the problem,
> >         but don't
> >         sweat the details too much just yet (in the application
> >         period). This is
> >         one of those hard areas where you probably do need to think
> >         about it so
> >         much it costs you sleep, you forget to eat and so on.
> >
> >
> >
> > The concept of a database manager is somewhat important as it makes
> > automating your multidb strategy far easier.
>
> That's never been argued against.
>
> >  My concern with just passing a QuerySet is it doesn't really hold any
> > information. If I want to, say, shard on the id then I need to poke at
> > the Query (the same for any information about the query other than the
> > type which we already know from the method),
>
> Hmm ... maybe. I think you might have the dependency directions reversed
> here. Think a bit more about what I wrote with regard to providing some
> methods to make the choice. QuerySet/Query could provide the worker
> routine which passes necessary information to a callback that is
> provided by the user, for example. That's why the design requires
> thinking here: there are at least two directions the control could flow
> and I suspect you're getting into difficulties from the direction you're
> currently approaching (with DatabaseManager controlling the show and doing
> all the work).
>
> If DatabaseManager has to poke at Query, we've probably lost, because
> then it's tied to that Query class, not to the concept of storage
> management selection (which should work with any type of Query object
> and even general QuerySets).
>
> Don't try to solve this now. The concept of this type of utility is a
> good one. But it's a problem that requires thinking. So think up a dozen
> alternatives and filter them down to two or three.
>
> >  and if we always need to actually touch the Query then passing the
> > QuerySet is a bit of an end run around.
>
> No. It's encapsulation. You're passing in the public object and only use
> methods on the public object.


Except right now the public API on a queryset doesn't give you any
information about what your "query" is asking for.  Therefore we are back to
asking what pieces of information do we reasonably want to have to decide
what database we are querying, and what's a reasonable format for providing
them.
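
One possible shape for that, purely as a sketch of the callback direction
suggested above: the queryset gathers a small, plain summary of itself and
hands it to a user-supplied function. The class RoutedQuerySet, the method
database_for, the keys of the info dict and the alias names are all
assumptions made up for this example, not an existing or proposed API:

```python
class RoutedQuerySet:
    """Hypothetical queryset that asks a user callback which database to use."""

    def __init__(self, model_name, choose_db, filters=None):
        self.model_name = model_name
        self._choose_db = choose_db           # user-supplied callable
        self._filters = dict(filters or {})   # exact-match filters only

    def filter(self, **conditions):
        merged = {**self._filters, **conditions}
        return RoutedQuerySet(self.model_name, self._choose_db, merged)

    def database_for(self, operation):
        # The worker routine: build a plain summary, then let the callback pick.
        info = {
            "model": self.model_name,
            "operation": operation,
            "filters": dict(self._filters),
        }
        return self._choose_db(info)


def shard_on_id(info):
    """Example user callback: route by id parity, else fall back to 'default'."""
    pk = info["filters"].get("id")
    if pk is None:
        return "default"
    return "shard%d" % (pk % 2)
```

Here RoutedQuerySet("Entry", shard_on_id).filter(id=7).database_for("select")
picks "shard1", and the callback never sees the Query object itself, only the
summary dict.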


>
> >
> > The nice thing about the DatabaseManager concept (as I've conceived it)
> > is that it can be implemented entirely separately and after the rest
> > of the API.
>
> The concept isn't dependent on the implementation. It can be added later
> whether it's a separate class or a method on Querysets. It's a utility
> feature, pretty much by definition, so however it's implemented, it can
> be added later (just like multi-db support could always be added later
> to the ORM, however it was implemented).
>
> Regards,
> Malcolm
>
>
>
>
> >
>
As always, thanks for the time and thoughts,

Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." --Voltaire
"The people's good is the highest law."--Cicero

