[sqlalchemy] Ordered merge of shard results (update)

2008-05-03 Thread Kyle Schaffrick

On Thu, 1 May 2008 12:20:39 -0400
Michael Bayer [EMAIL PROTECTED] wrote:
 
 at first the second method seemed a little hacky to me, but now that
 i think about it, we're in the sharded query so removing the extra  
 entity is not a big deal.The first method is probably not as  
 fragile as you think, as if someone uses MyClass.someproperty in a
 SQL context, it is meant to have real meaning in that context, and in
 all cases it should be an InstrumentedAttribute.   Even an end-user- 
 defined attribute will use comparable_property() and will ensure
 that it generates reasonable SQL.

The specific problem I'm referring to is more that, in effect, the first
version digs around in the mappers of the entities in the query and
tries to find the key of some property corresponding to the expression.
Then, during the actual merge, it does getattr(instance, key_name)
(where instance is some instance of the user's mapped class in the
results) to get a value to compare on.

If what comes out of that attribute is too dissimilar from what actually
in the database (due to some custom getter function), the behavior will
range from breakage (whatever it is doesn't implement __cmp__) to
silently incorrect output (it orders differently from whats in the
database).

The second method avoids this entirely since it has direct access to the
data in the same form the DB sees, via the hidden column entities.

 But adding the clause explicitly
 to the SQL is a good idea here regardless since its analgous to ORDER
 BY, which also adds the clause to the SQL (just in a different way).

Yeah, I had wondered to myself if it would be possible to use clauses
added by ORDER BY to do the merge, but I wasn't sure if it always added
a clause to the selection, since some DBs don't require that (for
example PostgreSQL, which can ORDER BY arbitrary expressions that don't
themselves appear in the selected columns). I haven't bothered to check
if the various dialects exploit such features, if so it would make such
reuse more difficult than it probably already would be. It's just a
blue sky idea anyway.

 I haven't looked at the code yetbut you should also take a look
 at the Query implementation in user_defined_state (which is going to
 be 0.5 RSN) to see if you need to make any modifications over there.
 in particular the implementation style of a generative method is  
 different and you might want to use that newer system.
 
 It might be worth it for us just to totally consider this feature  
 within 0.5 since 0.4 is quickly becoming feature-frozen.

That sounds good, actually the _generative decorator simplifies things
for me somewhat. I'm in the process now of rebasing against
user_defined_state.

The one thing I haven't checked yet is if inheritance breaks the meth =
_generative(__foo_condition)(meth) call, since it's not clear to me that
the __foo_condition names are bound inside ShardedQuery's definition. I
haven't gotten far enough that I can test it. Could be interesting :)

Kyle

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---



[sqlalchemy] Ordered merge of shard results (update)

2008-05-01 Thread Kyle Schaffrick

I've done some more work on this little project and have managed to
produce a significantly more idiomatic version. Since it has turned into
a full blown patch series, I'll save your inboxes and direct you
to the whole queue at

  http://raidi.us/edarc/sqlalchemy

This contains the iterator result-concatenation patch and the
_merge_ordering() patch I already posted. The new material is
ShardedQuery._merge_by(), which behaves a lot like order_by() does:

 q = sess.query(Person).order_by(Person.age)
 mq = q._merge_by(Person.age)

There are two versions of its implementation, one based on the other.

The original one tries to find mapper properties on the instances or
column entities that match the expressions given. The upside is that it
doesn't alter the generated SQL. The downside is that it's kind of
fragile and can silently break in wierd ways, such as if the mapped
class sets up it's own property accessors that mutate the value in a way
that might change the comparison behavior and resultant ordering. It
also gives up and raises an exception if it can't find said properties
or columns (for reasons which may or may not constitute a user error).

The second version eschews this magic for a more robust method, which
involves adding column entities to the query that are used find the
merge ordering and are then stripped from the results before the query
returns them. Upside is that AFAIK it works with any expression that can
be order_by()'ed, downside is obviously that it changes the generated
SQL. It does, however, check to see if the user already explicitly added
a matching column entity to the query, and uses that instead of
duplicating it.

I didn't add tests for _merge_by() in these patches, but I do have some
standalone tests that I've run on them. I'll probably add some tests
into the SA suite for it in the next day or two.

There are some rough edges, particularly reduce_ordering_expression(),
that I'm fairly sure are not done The Right Way, and there are probably
some latent bugs, so I'd be interested in any feedback.

Enjoy,
Kyle

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~--~~~~--~~--~--~---