Re: [sqlalchemy] Alembic migrations hanging on busy RDS PostgreSQL servers

2018-01-17 Thread George V. Reilly
Thanks. A brief outage seems like the lesser evil.
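For reference, a commonly suggested mitigation for this scenario is to set a short PostgreSQL `lock_timeout` on the migration connection so the ALTER fails fast instead of queuing behind long-running readers, and then retry. A minimal retry sketch (the helper and all names below are hypothetical, not from this thread):

```python
import time

def run_with_retry(attempt, retries=5, wait=0.1, is_retryable=lambda exc: True):
    """Call attempt() until it succeeds, sleeping between tries.

    On PostgreSQL, attempt() would execute `SET lock_timeout = '2s'`
    followed by the ALTER on the same connection, so the DDL errors out
    quickly instead of blocking every query queued behind its lock request.
    """
    last_exc = None
    for _ in range(retries):
        try:
            return attempt()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            last_exc = exc
            time.sleep(wait)
    raise last_exc

# Toy demonstration: fail twice (as a lock-timeout would), then succeed.
calls = {"n": 0}

def flaky_alter():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("canceling statement due to lock timeout")
    return "column added"

result = run_with_retry(flaky_alter, retries=5, wait=0)
```

With a real engine, `attempt` would be a closure running both statements inside a single `engine.begin()` block.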

On Tuesday, January 16, 2018 at 1:06:45 PM UTC-8, Mike Bayer wrote:
>
> even recent PG's seem to still do this: 
>
> https://dba.stackexchange.com/a/111092 
>
> so your only option other than downtime is to create a new table, 
> migrate all the data towards it, then drop the old table and rename - 
> then you'd need to restore additional data you might have lost while 
> that occurred.   the "batch" mode of Alembic helps with this though 
> I'd definitely want to produce this as a fixed SQL script, test it on 
> a staging database first, then run in production. 
>
> then again, the ALTER will probably be extremely quick if you can in 
> fact pause the load for a few minutes, since you are defaulting to 
> NULL. 
>
> On Tue, Jan 16, 2018 at 3:42 PM, George V. Reilly <george@gmail.com> wrote:
> > Twice recently, on two different PostgreSQL 9.5 databases hosted at Amazon
> > RDS, we've been unable to apply Alembic migrations. We have successfully run
> > dozens of Alembic migrations in the past against one of these databases, but
> > those were quieter times for us. As far as I can tell, it's because our
> > databases are too busy now and the load never lets up.
> >
> > The Alembic process just hangs and other database queries start backing up.
> > The only recourse is to kill the Alembic process. In at least one case,
> > we've also had to track down the ALTER statement in pg_stat_activity and
> > terminate the associated pid. Once I had to reboot the database at RDS
> > before it recovered.
> >
> > The SQL generated for one of these migrations is:
> >   ALTER TABLE redacted ADD COLUMN report_format VARCHAR(4)
> > which is as simple as it gets.
> >
> > In both cases, we're trying to add columns to tables that are hardly ever
> > modified—and not at all while the Alembic migrations were executing.
> > However, many of our most frequent SELECTs JOIN to these tables, so they're
> > constantly being read from.
> >
> > Aside from making a planned outage, do we have any good options?
> >
> > Thanks!
> > /George Reilly
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


[sqlalchemy] Alembic migrations hanging on busy RDS PostgreSQL servers

2018-01-16 Thread George V. Reilly
Twice recently, on two different PostgreSQL 9.5 databases hosted at Amazon 
RDS, we've been unable to apply Alembic migrations. We have successfully 
run dozens of Alembic migrations in the past against one of these databases 
but those were quieter times for us. As far as I can tell, it's because our 
databases are too busy now and the load never lets up.

The Alembic process just hangs and other database queries start backing up. 
The only recourse is to kill the Alembic process. In at least one case, 
we've also had to track down the ALTER statement in pg_stat_activity and 
terminate the associated pid. Once I had to reboot the database at RDS 
before it recovered.

The SQL generated for one of these migrations is:
  ALTER TABLE redacted ADD COLUMN report_format VARCHAR(4)
which is as simple as it gets.

In both cases, we're trying to add columns to tables that are hardly ever 
modified—and not at all while the Alembic migrations were executing. 
However, many of our most frequent SELECTs JOIN to these tables, so they're 
constantly being read from.

Aside from making a planned outage, do we have any good options?

Thanks!
/George Reilly



Re: [sqlalchemy] Capturing query parameters

2015-09-27 Thread George V. Reilly


On Saturday, September 19, 2015 at 8:14:31 AM UTC-7, Michael Bayer wrote:
>
>
>
> On 9/18/15 7:58 PM, George Reilly wrote:
>
>
> Thanks for the prompt answer. I tried after_cursor_execute, but it didn't 
> really help as I just got a tuple of the values, without a clear way to 
> associate them with the parameter names. Ultimately, I found that the 
> following gives me the best results:
>
> named_params = {}
> for k, v in query_params.iteritems():
>     named_params[k] = v
>     if v is None:
>         label = text.binds[k]._identifying_key
>         for param_dict in multiparams:
>             if label in param_dict:
>                 named_params[k] = param_dict[label]
>                 break
>
> Now I can identify which queries are being called repeatedly with 
> identical parameters; I couldn't before.
>
>
> OK, you can also take a look at the "context" passed there, which has the
> parameters in a consistent format, I think in .compiled_parameters. Poke
> around it to see.

Ah, that made it simpler.

def _extract_parameters_from_results(self, query_results):
    params_dict = {}
    for p in getattr(query_results.context, 'compiled_parameters', []):
        params_dict.update(p)
    return params_dict

> Changing topic somewhat, I find it frustrating that the session does not 
> cache queries.
>
> "caching queries" is an ambiguous term.  It could mean:
>
> 1. that we cache the production of the SQL string given a Query or 
> select() object.   A lot of that kind of caching *does* go on behind the 
> scenes in the ORM's flush process, but if you want it for SELECT objects 
> via Query, we have BakedQuery for that: 
> http://docs.sqlalchemy.org/en/rel_1_0/orm/extensions/baked.html 
>
> 2. that the results of an ORM query invocation are cached.   This is a 
> simple feature to implement, but as a user, it gets confusing very fast, 
> considering that the ORM emits lots of different kinds of statements in 
> different situations, especially if you're looking to invalidate the 
> cache.  For this reason, we ask that our users make an investment in 
> understanding the mechanics of query result caching, hence it is available 
> as an example suite, which you can see at Dogpile Caching: 
> http://docs.sqlalchemy.org/en/rel_1_0/orm/examples.html#module-examples.dogpile_caching
>  
> .
>
>
> 
> http://docs.sqlalchemy.org/en/rel_1_1/orm/session_basics.html#is-the-session-a-cache
>  
> makes it clear that this is deliberate, but I think that section could 
> benefit from further explanation about _why_ SQLAlchemy does not cache 
> queries.
>
> This section refers explicitly to the fact that "caching queries" as you 
> refer to here is a "second level cache", and links to the above dogpile 
> example.  The Session's caching is limited to transaction level identity 
> caching only.
>
>
Fair enough. Thanks

/George 



[sqlalchemy] SQLTap 0.3.10 released

2015-09-27 Thread George V. Reilly
SQLTap is a very useful library for profiling SQLAlchemy queries. It helps
you understand where and when SQLAlchemy issues queries, how often they are
issued, how many rows are returned, and whether you are issuing queries with
duplicate parameters. The last two are new in tonight's 0.3.10 release,
which incorporates improved reporting from me.

https://github.com/inconshreveable/sqltap



[sqlalchemy] deleting many-to-many association records

2014-05-21 Thread George V. Reilly
I've spent time unsuccessfully trying to fix some problems with a 
many-to-many table
and lazy joins.

Here's a simplified repro:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import random
from sqlalchemy import create_engine, ForeignKey
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker, joinedload

engine = create_engine('sqlite:///appts.db', echo=True)
Base = declarative_base()

#LAZYJOIN = 'dynamic'
LAZYJOIN = 'joined'

class Appointment(Base):
    __tablename__ = 'appointments'

    id = Column(Integer, primary_key=True, autoincrement=True, nullable=False)
    subject = Column(String)
    persons = relationship(
        'AppointmentPerson',
        cascade='delete, save-update, merge, expunge',
        lazy=LAZYJOIN)
    # rel1 = relationship('Foo')
    # rel2 = relationship('Bar')

class Person(Base):
    __tablename__ = 'persons'

    id = Column(Integer, primary_key=True, autoincrement=True, nullable=False)
    name = Column(String)

class AppointmentPerson(Base):
    # association table
    __tablename__ = 'appointment_persons'

    appointment_id = Column(Integer, ForeignKey(Appointment.id, ondelete='CASCADE'),
                            nullable=False, primary_key=True)
    person_id = Column(Integer, ForeignKey(Person.id), nullable=False,
                       primary_key=True)
    person = relationship(Person)
    # Other necessary columns omitted

Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def create_data(num_appts=20, num_people=20):
    random.seed(12345)

    session = Session()
    persons = []

    for i in range(num_people):
        person = Person(name='Person_%03d' % (i+1))
        session.add(person)
        persons.append(person)

    session.flush()

    for i in range(num_appts):
        appt = Appointment(subject='Appt_%03d' % (i+1))
        session.add(appt)
        session.flush()

        random.shuffle(persons)

        attendee_count = random.randrange(4) + 1
        for k in range(1, attendee_count):
            p = persons[k]
            print i, k, p.name
            ap = AppointmentPerson(appointment_id=appt.id, person_id=p.id)
            appt.persons.append(ap)
            session.add(ap)

    session.commit()

def query_all_appointments(session):
    query = session.query(Appointment)
    if LAZYJOIN == 'joined':
        query = query.options(joinedload(Appointment.persons)
                              .joinedload(AppointmentPerson.person))
    return query.all()

def show_data():
    print 'show_data'
    session = Session()
    appts = query_all_appointments(session)
    print len(appts)
    for a in appts:
        people = ', '.join([ap.person.name for ap in a.persons])
        x = '{0}: {1}'.format(a.subject, people)

def update_appointment_people(appt_id):
    session = Session()
    appt = session.query(Appointment).filter(Appointment.id == appt_id).one()
    appt.persons.delete()
    session.commit()

if __name__ == '__main__':
    create_data()
    show_data()
    update_appointment_people(7)

The code originally used

Appointment.persons = relationship(
    'AppointmentPerson',
    cascade='delete, save-update, merge, expunge',
    lazy='dynamic')

where everything worked, but accessing the persons on every appointment
triggered a separate query for AppointmentPersons, followed by
queries for each Person.

Changing the relationship to `lazy='joined'` and using joinedload

query = (session.query(Appointment).
         options(joinedload(Appointment.persons).
                 joinedload(AppointmentPerson.person)))
return query.all()

reduces this to one SQL query:

SELECT appointments.id AS appointments_id,
       appointments.subject AS appointments_subject,
       persons_1.id AS persons_1_id,
       persons_1.name AS persons_1_name,
       appointment_persons_1.appointment_id AS appointment_persons_1_appointment_id,
       appointment_persons_1.person_id AS appointment_persons_1_person_id
FROM appointments
LEFT OUTER JOIN appointment_persons AS appointment_persons_1
    ON appointments.id = appointment_persons_1.appointment_id
LEFT OUTER JOIN persons AS persons_1
    ON persons_1.id = appointment_persons_1.person_id

which is great.

Aside: In the original code, we're using MySQL, not SQLite. On my MacBook,
I have been unable to reproduce the pathological select behavior with
MySQL 5.6, whereas it repros consistently on Amazon's Elastic Beanstalk
with MySQL 5.5 at RDS. (SQLAlchemy 0.9.2, Python 2.7 in both cases.)

However, with `lazy='joined'` and using joinedload,
I no longer know 

Re: [sqlalchemy] deleting from a many-to-many association table

2014-05-21 Thread George V. Reilly
Thanks! I went with

Appointment.persons = relationship(
    'AppointmentPerson',
    cascade='delete-orphan, delete, save-update, merge, expunge',
    lazy=LAZYJOIN)

and used the slice notation to empty the InstrumentedList:

appt.persons[:] = []  

In my real app, I spent some time trying to figure out why the appt.persons
collection remained empty when I created new AppointmentPersons. When I had
the original lazy='dynamic' relationship, the magic Query collection
automatically got populated as I created new AppointmentPersons. Eventually
I shrugged and decided to append them to appt.persons in this update case.

Regarding the empty-the-collection-then-append-possibly-identical-objects
pattern, how do you feel about that? Is there a straightforward way to do
it more efficiently?
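For reference, one common alternative to the clear-then-re-add pattern is to diff the desired keys against the existing collection and only add or remove the rows that changed, leaving unchanged association rows untouched. A sketch (the helper and names are hypothetical, not from this thread):

```python
def sync_association(collection, desired_ids, key, factory):
    """Mutate `collection` in place so its `key` values match `desired_ids`.

    collection  - e.g. appt.persons (a list of AppointmentPerson objects)
    desired_ids - the person_ids the appointment should end up with
    key         - attribute name holding the id, e.g. 'person_id'
    factory     - builds a new association object for a given id
    """
    existing = {getattr(obj, key): obj for obj in collection}
    wanted = set(desired_ids)
    # Remove associations that are no longer wanted...
    for obj_id, obj in existing.items():
        if obj_id not in wanted:
            collection.remove(obj)
    # ...and append only the genuinely new ones.
    for obj_id in wanted - set(existing):
        collection.append(factory(obj_id))

# Toy usage with a plain class standing in for AppointmentPerson:
class AP(object):
    def __init__(self, person_id):
        self.person_id = person_id

persons = [AP(1), AP(2), AP(3)]
sync_association(persons, [2, 3, 4], 'person_id', AP)
ids = sorted(ap.person_id for ap in persons)
```

Because the unchanged objects are never removed, the unit of work has no deletes or re-inserts to flush for them.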



[sqlalchemy] Cross-shard queries

2012-03-11 Thread George V. Reilly
We ran into a nasty problem with sharding a while back. We came up with an 
effective workaround, but we don't fully understand the problem. Perhaps 
someone here can provide more insight.

We shard requests to our production MySQL databases using 
sqlalchemy.ext.horizontal_shard.ShardedSession. Our query_chooser uses a 
consistent hashing scheme on our account_ids to compute shard_ids. In 
almost all cases, we know the account_id before performing a query, 
therefore a query goes directly to the correct shard and only to that 
shard. In the other cases, we're typically trying to find the account_id(s) 
given some other key.

The nasty problem arose thus. We made a cross-shard query, which apparently 
opened a database connection to more than one of the sharded databases. The 
query returned an account_id and our associated SQLAlchemy Account object. 
We then added data to the account and committed it — all on the same 
scoped_session. Result: the thread (process?) blocks on the connections to 
the other databases.

The workaround is to call commit() on the session after performing the 
cross-shard query, and then call scoped_session() to make a fresh session 
before adding data to the account.

Anyone understand what the underlying problem was? Is it in SQLAlchemy, 
MySQL-python, or some lower level?

The pattern that we're moving towards with these cross-shard queries is to
perform them in a distinct session using a contextmanager which commits the
session. Account_ids and other keys are returned from the contextmanager,
but SQLAlchemy objects are not. One (or more) of our SQLAlchemy Account
objects are then created in the main session. I'm worried about the
more-than-one case, since those accounts may be distributed across several
shards.

Is this safe?
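The contextmanager pattern described above can be sketched as follows, with a stand-in session object so it runs without a database (the names are assumptions, not from this thread):

```python
from contextlib import contextmanager

@contextmanager
def cross_shard_session(session_factory):
    """Open a short-lived session for a cross-shard query.

    Commit (or roll back) before leaving, so connections checked out on
    every shard are released instead of pinning their transactions open.
    Return plain ids from the block, never ORM objects, and re-load the
    objects in the caller's own, single-shard session.
    """
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()

# Stand-in session to demonstrate the lifecycle; in the real app
# session_factory would be the sharded sessionmaker.
class FakeSession(object):
    def __init__(self):
        self.events = []
    def query_account_ids(self):
        return [101, 102]
    def commit(self):
        self.events.append("commit")
    def rollback(self):
        self.events.append("rollback")
    def close(self):
        self.events.append("close")

fake = FakeSession()
with cross_shard_session(lambda: fake) as s:
    account_ids = s.query_account_ids()
```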




Re: [sqlalchemy] Logging raw SQL statements

2011-02-09 Thread George V. Reilly
The combination of create_engine(..., echo=True) and
  logging.basicConfig(level=logging.DEBUG)
  logging.getLogger('sqlalchemy').setLevel(logging.DEBUG)
does the trick for me now.

Thanks!
/George
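Put together as a self-contained sketch (note: newer SQLAlchemy documentation scopes the logger name to 'sqlalchemy.engine', and echo=True alone is often enough):

```python
import logging

from sqlalchemy import create_engine, text

# Route SQLAlchemy's engine logging to stderr at DEBUG level.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('sqlalchemy.engine').setLevel(logging.DEBUG)

# echo=True also emits each statement via the sqlalchemy.engine logger.
engine = create_engine('sqlite://', echo=True)
with engine.connect() as conn:
    value = conn.execute(text('SELECT 1')).scalar()
```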




[sqlalchemy] Training or consultant?

2010-08-18 Thread George V. Reilly
Is there such a thing as SQLAlchemy training or a SA consultant? I'm
starting to think that my team might benefit from some time with
someone who really knows their stuff.

/George Reilly, Seattle




[sqlalchemy] Sharing types between dialects

2010-05-06 Thread George V. Reilly
Our production database uses MySQL. For various reasons, the schema
often uses MySQL-specific DDL. We want that represented in our
mapper classes. We also want to be able to exercise our SQLAlchemy
unit tests against both local MySQL databases and SQLite databases.

Using the MySQL-specific types wasn't a problem with SA 0.5.x, but
it is with 0.6. I've started working around it like this:

class MSBigInteger(sqltypes.TypeDecorator):
    impl = sqltypes.Integer

    def load_dialect_impl(self, dialect):
        return dialect.type_descriptor(
            mysql.base.MSBigInteger if dialect.name == 'mysql'
            else sqltypes.INTEGER)

This approach falls apart when the type's __init__ takes custom
parameters, such as collation or charset for the strings.

class MSLongText(sqltypes.TypeDecorator):
    impl = sqltypes.String

    def load_dialect_impl(self, dialect):
        self.dialect = dialect
        return dialect.type_descriptor(
            mysql.base.MSLongText if dialect.name == 'mysql'
            else sqltypes.TEXT)

    def __init__(self, **kwargs):
        # FIXME: figure out what the dialect is in __init__ (somehow)
        # so we can call the correct dialect_impl
        if 'collation' in kwargs: del kwargs['collation']
        if 'charset' in kwargs: del kwargs['charset']
        super(MSLongText, self).__init__(**kwargs)

What's a good way to fix this? I'm not wedded to the TypeDecorator
for MSBigInteger either, if there's a better approach.




[sqlalchemy] Re: Sharing types between dialects

2010-05-06 Thread George V. Reilly
Thanks, Michael.

On May 6, 7:48 am, Michael Bayer mike...@zzzcomputing.com wrote:
 On May 6, 2010, at 3:35 AM, George V. Reilly wrote:

  Our production database uses MySQL. For various reasons, the schema
  often uses MySQL-specific DDL. We want that represented in our
  mapper classes. We also want to be able to exercise our SQLAlchemy
  unit tests against both local MySQL databases and SQLite databases.

  Using the MySQL-specific types wasn't a problem with SA 0.5.x, but
  it is with 0.6.

 would be curious to know why that is.

I was getting errors like this:

  ...
  File "...\sqlalchemy-0.6.0-py2.6.egg\sqlalchemy\schema.py", line 1958, in create_all
    bind.create(self, checkfirst=checkfirst, tables=tables)
  File "...\sqlalchemy-0.6.0-py2.6.egg\sqlalchemy\engine\base.py", line 1504, in create
    self._run_visitor(ddl.SchemaGenerator, entity, connection=connection, **kwargs)
  ...
  File "...\sqlalchemy-0.6.0-py2.6.egg\sqlalchemy\sql\visitors.py", line 48, in _compiler_dispatch
    return getter(visitor)(self, **kw)
  File "...\sqlalchemy-0.6.0-py2.6.egg\sqlalchemy\sql\compiler.py", line 1136, in visit_create_table
    first_pk=column.primary_key and not first_pk
  File "...\sqlalchemy-0.6.0-py2.6.egg\sqlalchemy\dialects\sqlite\base.py", line 234, in get_column_specification
    colspec = self.preparer.format_column(column) + " " + self.dialect.type_compiler.process(column.type)
  File "...\sqlalchemy-0.6.0-py2.6.egg\sqlalchemy\engine\base.py", line 734, in process
    return type_._compiler_dispatch(self)
  File "...\sqlalchemy-0.6.0-py2.6.egg\sqlalchemy\sql\visitors.py", line 48, in _compiler_dispatch
    return getter(visitor)(self, **kw)
AttributeError: 'SQLiteTypeCompiler' object has no attribute
'visit_TINYINT'

  What's a good way to fix this? I'm not wedded to the TypeDecorator
  for MSBigInteger either, if there's a better approach.

 your load_dialect_impl() should be returning the fully constructed instance 
 of MSBigInteger, there's no need to call dialect.type_descriptor here.
 
Like this, presumably:

class MSBigInteger(sqltypes.TypeDecorator):
    impl = sqltypes.Integer

    def load_dialect_impl(self, dialect):
        return mysql_base.MSBigInteger() \
            if dialect.name == 'mysql' else sqltypes.INTEGER()


 If the type did have some DBAPI-specific subclass that needs to be invoked 
 there, type_descriptor should be handed the fully constructed MSBigInteger, 
 not the class.

 ultimately I'd like to add a new type to SQLAlchemy core which serves the 
 purpose of delegating some type to different backend implementations.   It 
 would look like this:

 Column('foo', CompositeType(default=Integer(), mysql=mysql.INTEGER(precision=12)))

 CompositeType is a subclass of TypeDecorator and only needs to store those 
 initial types, and return them within load_dialect_impl().   Since you've 
 already written most of that logic yourself you can try going with that 
 approach.

This seems to work, but it's not well tested:

class CompositeType(sqltypes.TypeDecorator):
    def __init__(self, default_type, **kwargs):
        self.default_type = default_type
        self.dialect_types = kwargs
        super(CompositeType, self).__init__()

    def load_dialect_impl(self, dialect):
        return self.dialect_types.get(dialect.name, self.default_type)

class MSString(CompositeType):
    impl = sqltypes.VARCHAR

    def __init__(self, length=None, **kwargs):
        super(MSString, self).__init__(
            default_type=sqltypes.VARCHAR(),
            mysql=mysql_base.MSString(length, **kwargs))

class ToDoElement(Base):  # declarative base
    text = Column(MSString(256, charset='utf8'), nullable=False,
                  default='')
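For reference, later SQLAlchemy releases (0.7.2 and up) added TypeEngine.with_variant(), which covers this per-dialect-type use case without a custom TypeDecorator. A sketch (the table and column names here are made up):

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import mysql
from sqlalchemy.schema import CreateTable

metadata = MetaData()
todo = Table(
    "todo_elements", metadata,
    Column("id", Integer, primary_key=True),
    # Generic VARCHAR everywhere, MySQL-specific VARCHAR with a charset
    # when compiled against the mysql dialect.
    Column(
        "text",
        String(256).with_variant(mysql.VARCHAR(256, charset="utf8"), "mysql"),
        nullable=False,
    ),
)

mysql_ddl = str(CreateTable(todo).compile(dialect=mysql.dialect()))
generic_ddl = str(CreateTable(todo).compile())
```

The same Table can then be created against SQLite in unit tests and against MySQL in production, which is exactly the split described above.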




[sqlalchemy] Inserting comments into SQL queries

2010-04-13 Thread George V. Reilly
I'm at the MySQL conference. A couple of speakers have recommended
adding SQL comments to queries for debugging; e.g., attributing a
query to a higher-level operation, or that can be parsed by a slave
during replication.

Is there a way to do this in SQLAlchemy? That is, the generated SQL
has a /* comment */ inserted by my code.

/George V. Reilly, Seattle
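One way to do this in later SQLAlchemy releases (0.7 and up, so newer than the 0.6 era of this thread) is the before_cursor_execute event with retval=True, which lets a listener rewrite the statement before it reaches the driver. A sketch with an assumed comment string:

```python
from sqlalchemy import create_engine, event, text

engine = create_engine('sqlite://')

@event.listens_for(engine, 'before_cursor_execute', retval=True)
def add_comment(conn, cursor, statement, parameters, context, executemany):
    # Prepend a /* ... */ comment naming the high-level operation; with
    # retval=True, the (statement, parameters) we return is what executes.
    comment = conn.info.get('query_comment')
    if comment:
        statement = '/* {} */ {}'.format(comment, statement)
    return statement, parameters

with engine.connect() as conn:
    # In a real app this might come from per-request context.
    conn.info['query_comment'] = 'calendar.load_week'
    value = conn.execute(text('SELECT 1')).scalar()
```

The comment then shows up in slow-query logs and in SHOW PROCESSLIST, which is what the conference talks were recommending it for.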




[sqlalchemy] Re: Sharding with _BindParamClause

2010-04-02 Thread George V. Reilly
Michael Bayer wrote:
 check out r0ddd638f1d90 in mercurial.  I've added the function from the
 example below, plus support for in_op(), to the attribute_shard example.
 The old ClauseVisitor method is removed and replaced with this more robust
 method.

Very nice! Thanks, Michael.

/George




[sqlalchemy] Re: Sharding with _BindParamClause

2010-04-02 Thread George V. Reilly
On Apr 2, 4:43 pm, George V. Reilly george.v.rei...@gmail.com
wrote:
 Michael Bayer wrote:
  check out r0ddd638f1d90 in mercurial.  I've added the function from the
  example below, plus support for in_op(), to the attribute_shard example.
  The old ClauseVisitor method is removed and replaced with this more robust
  method.

 Very nice! Thanks, Michael.

I blogged about this at 
http://blogs.cozi.com/tech/2010/04/sqlalchemy-sharding.html

/George




[sqlalchemy] Re: Sharding with _BindParamClause

2010-04-01 Thread George V. Reilly
On Mar 30, 4:42 pm, Michael Bayer mike...@zzzcomputing.com wrote:
 George V. Reilly wrote:
  We're using SQLAlchemy sharding to partition accounts across a couple
  of databases. We want to add more partitions, but first we need to
  eliminate some unnecessary cross-partition queries.
 
  This works well most of the time, but we're finding that some queries
  do not have a value. These are all of the form
 
      SELECT shopping_list_items.version AS shopping_list_items_version
      FROM shopping_list_items
      WHERE shopping_list_items.account_id = :param_1
          AND shopping_list_items.shopping_list_item_id = :param_2
 
  and :param1 is of the form _BindParamClause(u'%(63636624 param)s',
  None, type_=UUID())
 
  Typically, I'm seeing this come out of the innards of SQLAlchemy,
  as one of several queries triggered by, say, a session.merge().

 The only Query() I can see getting generated that would have non-valued
 bindparams would be during a _get().  The values should be present in
 query._params.   If you need more info I can dig in to recall how the keys
 of that dictionary are formatted in this case.

Thanks, Michael. Here's what I came up with:

class FindShardableId(sqlalchemy.sql.ClauseVisitor):
    def __init__(self, ids, key_fields, get_shard, params):
        self.ids = ids
        self.key_fields = key_fields
        self.get_shard = get_shard
        self.params = params

    def visit_binary(self, binary):
        if not self._check_side(binary, binary.left, binary.right):
            # Lazy load properties tend to be reversed,
            # with the constant on the left
            self._check_side(binary, binary.right, binary.left)

    def _check_side(self, binary, side, other_side):
        if isinstance(side, sqlalchemy.Column) and side.name in self.key_fields:
            if binary.operator == sqlalchemy.sql.operators.eq:
                value = getattr(other_side, 'value', None)
                if (value is None and
                        isinstance(other_side,
                                   sqlalchemy.sql.expression._BindParamClause)):
                    value = self.params.get(other_side.key)
                if value is not None:
                    self.ids.append(self.get_shard(value))
                    return True
            elif binary.operator == sqlalchemy.sql.operators.in_op:
                for bind in other_side.clauses:
                    self.ids.append(self.get_shard(bind.value))
                return True

class QuerySharder(object):
    def sessionmaker(self, **sessionmaker_args):
        Session = sqlalchemy.orm.sessionmaker(
            class_=sqlalchemy.orm.shard.ShardedSession,
            **sessionmaker_args)
        Session.configure(
            shards=self.shards,
            shard_chooser=self._shard_chooser,
            id_chooser=self._id_chooser,
            query_chooser=self._query_chooser)
        return Session

    def _query_chooser(self, query):
        ids = []
        if query._criterion is not None:
            FindShardableId(
                ids,
                set(['account_id', 'account_guid']),
                lambda account_id:
                    self.shard_manager.shard_id_from_guid(account_id),
                query._params
            ).traverse(query._criterion)
        if len(ids) == 0:
            logging.warn("\n\n! Executing query against all shards; "
                         "this may not be optimal:\n\t{0}\n"
                         "\tParams: {1}\n".format(
                             str(query), str(query._params)))
            return self.shards.keys()
        else:
            return ids

I really don't like the
    isinstance(other_side, sqlalchemy.sql.expression._BindParamClause)
in the middle of _check_side. Is there a cleaner way to do this?

I found that a combination of
* the above _check_side and two-sided visit_binary
* doing a better job of declaring ForeignKey relationships in Columns
* some explicit primaryjoins in calls to relation()
cleaned up all the cases where SA wasn't providing the ids in queries.

Perhaps the sharding sample in SA 0.6 could be expanded?
--
/George V. Reilly, Seattle




[sqlalchemy] Sharding with _BindParamClause

2010-03-30 Thread George V. Reilly
We're using SQLAlchemy sharding to partition accounts across a couple
of databases. We want to add more partitions, but first we need to
eliminate some unnecessary cross-partition queries.

class FindShardableId(sqlalchemy.sql.ClauseVisitor):
    def __init__(self, ids, key_fields, get_shard):
        self.ids = ids
        self.key_fields = key_fields
        self.get_shard = get_shard

    def _check_side(self, binary, side, other_side):
        has_constant = (isinstance(side, sqlalchemy.Column)
                        and side.name in self.key_fields
                        and binary.operator == sqlalchemy.sql.operators.eq
                        and getattr(other_side, 'value', None))
        if has_constant:
            self.ids.append(self.get_shard(other_side.value))
        return has_constant

    def visit_binary(self, binary):
        if not self._check_side(binary, binary.left, binary.right):
            # Lazy load properties tend to be reversed,
            # with the constant on the left
            self._check_side(binary, binary.right, binary.left)

def query_chooser(query):
    ids = []
    if query._criterion is not None:
        FindShardableId(
            ids,
            set(['account_id', 'account_guid']),
            lambda account_id: shard_manager.shard_id_from_guid(account_id)
        ).traverse(query._criterion)

    if len(ids) == 0:
        logging.warn("\n\nExecuting query against all shards; "
                     "this may not be optimal:\n"
                     "\t{0}".format(str(query)))
        return shards.keys()
    else:
        return ids

This works well most of the time, but we're finding that some queries
do not have a value. These are all of the form

SELECT shopping_list_items.version AS shopping_list_items_version
FROM shopping_list_items
WHERE shopping_list_items.account_id = :param_1
AND shopping_list_items.shopping_list_item_id = :param_2

and :param_1 is of the form _BindParamClause(u'%(63636624 param)s', 
None, type_=UUID())

Typically, I'm seeing this come out of the innards of SQLAlchemy,
as one of several queries triggered by, say, a session.merge().

How do we work around this?

Thanks!
/George V. Reilly, Seattle
