[sqlalchemy] Bulk update using multi threading and exception handling

2014-07-22 Thread Milind Vaidya
I am developing a back-end scripts to populate a bunch of tables.

1. dbinterface.py : contains all mapper classes and DBHandler class which 
has required static methods for queries.
e.g. 

class Destination(Base): 
 __table__ = Base.metadata.tables['destination']


@staticmethod
def _save(rows):
try:
DBHandler.session.add_all(rows)
DBHandler.session.commit()
except Exception as exp:
logger.debug(Error saving data: %s, str(exp))

 Question 1: Is this a good idea to handle exceptions like this ? The save 
method accepts list of objects to be saved. If there is problem with one 
object I don't want to prevent others from getting updated. Other 
alternative will be handling exception in calling code, roll back and retry 
saving the list.


2. basetables.py : This will populate base tables in the DB. The data will 
be fetched from a web service using async calls.

def populate(hosts):
100 '''Fetch property value for each host'''
101 with concurrent.futures.ThreadPoolExecutor(max_workers=5) as 
executor:
102 future_to_host = {executor.submit(call__ws, host.host_name): 
host
103 for host in hosts}
104 for future in concurrent.futures.as_completed(future_to_host):
105 host = future_to_host[future]
106 try:
107 property = future.result()
108 except Exception as exp:
109 logger.debug(Error fetching hosts from webervice: \
110 %s, exp)
111 else:
112 host.property = property
113 logger.info(host: %s, property: %s, host, property)
114 finally:
115 try:
116 colo = find_colo(host.host_name)
117 logger.info(Colo: %s, colo)
118 except Exception as exp:
119 logger.debug(Invalid colo: %s, exp)
120 hosts.remove(host)
121 else:
122 host_colo = DBHandler.fetch_colo(colo)
123 if host_colo is not None:
124 host.emitter_host_colo = host_colo
125 logger.info(Host Colo: %s, host_colo)
126 else:
127 hosts.remove(host)
128 logger.debug(Removing Host error processing 
colo : %s, host)
129  DBHandler._save(hosts)

Questions 2 : is this a good model considering there will be 1000s of hosts 
being updated in a bulk update and fact being session is static variable in 
DBHandler ? 
Colo is another table and hence mapper object. If the colo is not present 
for any of the host it will be fetched from static method fetch_colo of DB 
handler based on name. Now there is one to many mapping from colo to host, 
aka one colo can represent many hosts. In such case, would the call to DB 
be avoided if the colo is already fetched previously and only relationship 
with new host will be taken care of?

Pardon my ignorance for I am newbie to sqlalchemy n  python in general

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


[sqlalchemy] [SQLAlchemy-migrate] Add a TSVectorType column to 4 existing tables

2014-07-22 Thread Yoann Mouscaz
Hi ! 

Does somebody here can tell me something about database migration with 
SQLAlchemy-migrate on a TurboGears application please ? 

In fact I want to add a column containing a TSVectorType (
http://sqlalchemy-searchable.readthedocs.org/en/latest/) to 4 of my 
existing tables and I’m not sure to know how to proceed… 
ex : 

search_vector = Column(TSVectorType('name', 'content'))


I watch the given example here 
https://sqlalchemy-migrate.readthedocs.org/en/latest/changeset.html with 
the col.create() of sqlalchemy-migrate but this example seems to be made 
for a « standart » type of column :/ 
ex : 

col = Column('col1', String, default='foobar')col.create(table, 
populate_default=True)



So I’m asking here for some advice from you guys…

Thanks a lot for your time.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Bulk update using multi threading and exception handling

2014-07-22 Thread Michael Bayer

On Jul 22, 2014, at 7:47 AM, Milind Vaidya kava...@gmail.com wrote:

 I am developing a back-end scripts to populate a bunch of tables.
 
 1. dbinterface.py : contains all mapper classes and DBHandler class which has 
 required static methods for queries.
 e.g. 
 
 class Destination(Base): 
  __table__ = Base.metadata.tables['destination']
 
 
 @staticmethod
 def _save(rows):
 try:
 DBHandler.session.add_all(rows)
 DBHandler.session.commit()
 except Exception as exp:
 logger.debug(Error saving data: %s, str(exp))
 
  Question 1: Is this a good idea to handle exceptions like this ? The save 
 method accepts list of objects to be saved. If there is problem with one 
 object I don't want to prevent others from getting updated. Other alternative 
 will be handling exception in calling code, roll back and retry saving the 
 list.

the use case of if there is a problem with one object i don't want others 
having an issue can't be handled generically, for the simple reason that if 
object X is dependent on object Y, a failure to persist X means Y can't be 
persisted either.

It sounds like what you're really dealing with is inserting unrelated rows.   
The mechanics of transactions at both the SQLAlchemy level and in many cases at 
the DB level prevents the transaction from proceeding from a failed INSERT, 
*unless* you use a savepoint.

So the pattern, when you expect *individual* rows to fail, is:

for row in rows:
try:
with session.begin_nested():
session.add(row)
except orm_exc.IntegrityError:
   logger.error(Error, exc_info=True)
session.commit()

the above will not be as performant as a mass insert of all the rows as once as 
the Session needs to flush for each one individually.


 
 2. basetables.py : This will populate base tables in the DB. The data will be 
 fetched from a web service using async calls.
 
 def populate(hosts):
 100 '''Fetch property value for each host'''
 101 with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
 102 future_to_host = {executor.submit(call__ws, host.host_name): host
 103 for host in hosts}
 104 for future in concurrent.futures.as_completed(future_to_host):
 105 host = future_to_host[future]
 106 try:
 107 property = future.result()
 108 except Exception as exp:
 109 logger.debug(Error fetching hosts from webervice: \
 110 %s, exp)
 111 else:
 112 host.property = property
 113 logger.info(host: %s, property: %s, host, property)
 114 finally:
 115 try:
 116 colo = find_colo(host.host_name)
 117 logger.info(Colo: %s, colo)
 118 except Exception as exp:
 119 logger.debug(Invalid colo: %s, exp)
 120 hosts.remove(host)
 121 else:
 122 host_colo = DBHandler.fetch_colo(colo)
 123 if host_colo is not None:
 124 host.emitter_host_colo = host_colo
 125 logger.info(Host Colo: %s, host_colo)
 126 else:
 127 hosts.remove(host)
 128 logger.debug(Removing Host error processing colo 
 : %s, host)
 129  DBHandler._save(hosts)
 
 Questions 2 : is this a good model considering there will be 1000s of hosts 
 being updated in a bulk update and fact being session is static variable in 
 DBHandler ? 
 Colo is another table and hence mapper object. If the colo is not present for 
 any of the host it will be fetched from static method fetch_colo of DB 
 handler based on name. Now there is one to many mapping from colo to host, 
 aka one colo can represent many hosts. In such case, would the call to DB be 
 avoided if the colo is already fetched previously and only relationship with 
 new host will be taken care of?
 
 Pardon my ignorance for I am newbie to sqlalchemy n  python in general

I'm not very familiar with concurrent.futures but I will note that the Session 
isn't thread safe, as it refers to a single DBAPI Connection/transaction in 
progress as well as lots of internal state, and all the objects associated with 
a given Session are in fact proxies to the state of that Session and 
transaction so they aren't either.   If you are running concurrent, multiple 
threads, you typically have a Session per thread, and every object that is 
handled within that thread should originate within that thread.   To pass the 
state of objects between threads, you should use the Session.merge() method.   
I talk in depth about why the Session works this way in my talk The SQLAlchemy 
Session in Depth 
http://www.sqlalchemy.org/library.html#thesqlalchemysessionindepth .   There is 
also discussion of this in 

[sqlalchemy] Re: condition on every query

2014-07-22 Thread mail
I had the same problem and decided to look into implementing a custom 
relationship like you suggested. I'll post what I have here in case anyone 
else has the same problem. :) 

The first thing I noticed was the query_class attribute on the 
RelationshipProperty class. If you don't want to implement a new 
relationship, you can just do this, which is a little cleaner than 
respecifying the primaryjoin:

children = relationship(Child, query_class=MyPrefilteredQuery)


(Where MyPrefilteredQuery is the PreFilteredQuery class you implemented as 
per 
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PreFilteredQuery 
.)

Making your own relationship is pretty straightforward from there, though:
 


from sqlalchemy.util.langhelpers import public_factory
from sqlalchemy.orm.relationships import RelationshipProperty

class MyPrefilteredRelationship(RelationshipProperty):
def __init__(self, *args, **kwargs):
kwargs['query_class'] = MyPrefilteredQuery 
super(MyPrefilteredRelationship, self).__init__(*args, **kwargs)

filtered_relationship = public_factory(MyPrefilteredRelationship, 
.mylib.db) 


Then you can use it like so:

from mylib.db import filtered_relationship 

class Parent(Base):
__table__ = parent_table
children = filtered_relationship(Child)


On Sunday, February 20, 2011 8:19:58 PM UTC-8, robert wrote:

 On Feb 20, 8:59 pm, Michael Bayer mike...@zzzcomputing.com wrote: 
  the cleanest and most foolproof way is to join on relationships that 
 have the condition: 
  
  related_non_deleted = relationship(Related, 
 primaryjoin=and_(Related.deleted==False, Related.foo_id==id), 
 viewonly=True) 

 I agree that the explicit approach is the most foolproof.  The main 
 reason I was trying to avoid it is that, in my case, the condition is 
 ubiquitous. Thus all of my relationships that look like this 

 children = relationship(Child) 

 would have to change to 

 children = relationship(Child, primaryjoin=and_(Child.deleted==False, 
 Child.parent_id==id) 

 Perhaps I could write a custom relationship() to simplify things. 
 I'll give that a try.  I'll also have to 
 add .filter(foo.deleted==False) to all top level queries.  Maybe a 
 custom Session that always answers a query with that filter already 
 added would be cleaner.  There may ultimately be many queries and many 
 relationships in this app, and I'm trying to avoid having to remember 
 (or worse, have someone else remember) to always include the deleted 
 condition. 

 Thanks for your help.  I've got a bit of digging to do now.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Updating children of a one-to-many bi-directional relationship

2014-07-22 Thread Bala Ramakrishnan
Hi Michael,
thanks for the reply.

I thought I had tested the code, but I did not do so properly.

The essence of the logic I posted works. Here are the few corrections:

You are right that I should not use strings for partition_id when it is 
defined as integer. This is a piece of test code I cooked up. In the actual 
production code, partition_id comes from another API and it is already 
integer.

The other bug I had in my code was that:

in the lines:

for db_part in db_parts:
dev_part = dev_parts.get(db_part.partition_id, None)
if dev_part is not None:
# update db partition with that from device
db_part.partition_id = dev_part.partition_id
db_part.partition_name = dev_part.partition_name
# remove the device partition since we have updated the db
del dev_part[db_part.partition_id]

The last line should have read: del dev_parts[db_part.partition_id]

You also had mentioned that the code doesn’t seem to show anything that would 
result in any DELETEs emitted.

I am using a list comprehension to update the list of test_partitions 
associated with the test_device database entry.
This list comprehension updates the test_partition list ('partitions' 
attribute) for the device by keeping only those partitions
that are in the dict data structure (simulated to get data from a data 
collector on the device).

The lines of code that updates the 'partitions' attribute is:
# First delete partitions in database that are not in device
db_parts[:] = [db_part for db_part in db_parts
  if dev_parts.get(db_part.partition_id, None) is not None]


db_parts was set in an earlier line of code as follows:

ndev = 
sess.query(TestDevice).filter(TestDevice.dns_name=='testdev3').first()
print 'ndev:', ndev
if ndev is not None:
db_parts = ndev.partitions   #  --- partitions attribute set through 
backref on TestPartition object.

After I made the changes, the code works as expected:

Before update, the test_partition table:

basicinv= select * from test_partition ;
 id | partition_name | partition_id | device_id 
++--+---
 42 | test456|  456 |41
 43 | test345|  345 |41
 44 | test123|  123 |41
(3 rows)

After running the code:

basicinv= select * from test_partition ;
 id | partition_name | partition_id | device_id 
++--+---
 42 | test456-2  |  456 |41
 43 | test345-1  |  345 |41
 45 | test678|  678 |41
 46 | test567|  567 |41
(4 rows)

basicinv= 


As I expected, partitions with primary keys 42 and 43 got updated (ie. not 
deleted and inserted with new primary key rows).

Thanks for your feedback.



On Monday, July 21, 2014 7:13:46 AM UTC-7, Michael Bayer wrote:

 not to mention you’re using strings to set integer values, again a bad 
 idea, can only confuse your database:

 part.partition_id = ‘345’   # — this is a string


 partition_id = sa.Column(sa.Integer, nullable=False)  # — should be integer


 with this program you need to create a short test case that actually runs 
 and step through it with pdb as well as echo=True to analyze more 
 accurately what’s happening and when.



 On Jul 21, 2014, at 10:10 AM, Michael Bayer mik...@zzzcomputing.com 
 javascript: wrote:

 the code there doesn’t seem to show anything that would result in any 
 DELETEs emitted.   a DELETE here would only occur if you deassociated a 
 TestDevice and a TestPartition by removing from the TestDevice.partitions 
 collection or setting a TestPartition.device to None, and I don’t see that. 
   All of the manipulations you’re doing with part.partition_id have nothing 
 to do with any of that, SQLAlchemy’s relationships have no idea what you’re 
 doing with those, and overall it's a bad idea to mix the usage of 
 “obj.foreign_key = id” along with direct manipulation of the relationship 
 (where you say dev_part.device  = ndev) together.   SQLAlchemy’s 
 relationship management code knows nothing about any of those foreign key 
 sets.  See 
 http://docs.sqlalchemy.org/en/rel_0_9/faq.html#i-set-the-foo-id-attribute-on-my-instance-to-7-but-the-foo-attribute-is-still-none-shouldn-t-it-have-loaded-foo-with-id-7
 .







 On Jul 21, 2014, at 2:55 AM, Bala Ramakrishnan bal...@gmail.com 
 javascript: wrote:

 I had asked this question on Stack Overflow. The details are at this link:


 http://stackoverflow.com/questions/24836816/updating-a-few-children-in-one-to-many-relationship-deletes-all-rows-and-adds-ne

 The summary is I have a parent class A and a bidirectional one-to-many 
 relationship with class B. When I update the class B list for an instance 
 of A, the update may involve deleting some class B instances, updating some 
 of them, and adding new ones. However, I find that SqlAlchemy deletes all 
 ROWS of 

[sqlalchemy] SQLAlchemy 0.8.7 Released

2014-07-22 Thread Michael Bayer
Hey list -

SQLAlchemy release 0.8.7 is now available.

Release 0.8.7 contains a selected set of bug fix backports from release 0.9, 
including a selected set of ORM and extension fixes as well as dialect fixes 
for MySQL, Postgresql, and SQL Server.

Users should carefully review the Changelog 
(http://docs.sqlalchemy.org/en/latest/changelog/changelog_08.html#change-0.8.7) 
to note which behaviors and issues are affected. We'd like to thank the many 
contributors who helped with this release.

SQLAlchemy 0.8.7 is available on the Download Page:

http://www.sqlalchemy.org/download.html

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


[sqlalchemy] SQLAlchemy 0.9.7 Released

2014-07-22 Thread Michael Bayer
Hey list -

SQLAlchemy release 0.9.7 is now available.

The 0.9 series is now entering maintenance mode where major new features are 
now targeted at the 1.0 series. Release 0.9.7 contains a wide range of 
bugfixes, many of which are identified as regressions from previous 0.9.X 
releases including three introduced in 0.9.5. All relevant fixes are of course 
forwards-ported to the 1.0 series and a handful are also backported to 0.8.7, 
being released at the same time.

A small number of new features include Postgresql JSONB support, improved 
Postgresql text matching features and a new event hook which allows database 
exceptions to be intercepted and re-thrown as new ones.

Users should carefully review the Changelog 
(http://docs.sqlalchemy.org/en/latest/changelog/changelog_09.html#change-0.9.7) 
to note which behaviors and issues are affected. We'd like to thank the many 
contributors who helped with this release.

SQLAlchemy 0.9.7 is available on the Download Page:

http://www.sqlalchemy.org/download.html

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


[sqlalchemy] flake8 tips?

2014-07-22 Thread Jonathan Vanasco
i recently introduced flake8 testing into our automated builds. (an idea 
inspired by mike bayer)

i already ran into an issue with Flake8 generating errors for ` == True` 
comparisons ( which are required under SqlAlchemy)

does anyone have any tips/tricks/gotchas for dealing with flake8 in 
SqlAlchemy projects ?

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


RE: [sqlalchemy] flake8 tips?

2014-07-22 Thread Michael Nachtigal
You can try adding # nopep8 or # noqa at the end of any line to suppress 
pep8/flake8 messages for that line.

There may be more in the docs about this about how to suppress which errors 
specifically:

https://pypi.python.org/pypi/flake8

--Mike

From: sqlalchemy@googlegroups.com [sqlalchemy@googlegroups.com] on behalf of 
Jonathan Vanasco [jonat...@findmeon.com]
Sent: Tuesday, July 22, 2014 5:36 PM
To: sqlalchemy@googlegroups.com
Subject: [sqlalchemy] flake8 tips?

i recently introduced flake8 testing into our automated builds. (an idea 
inspired by mike bayer)

i already ran into an issue with Flake8 generating errors for ` == True` 
comparisons ( which are required under SqlAlchemy)

does anyone have any tips/tricks/gotchas for dealing with flake8 in SqlAlchemy 
projects ?


--
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
sqlalchemy+unsubscr...@googlegroups.commailto:sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to 
sqlalchemy@googlegroups.commailto:sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] flake8 tips?

2014-07-22 Thread Jonathan Vanasco
thanks. I already added  `# noqa` to those lines.

that pep8 test is actually bad, digging into the docs, it's actually 
recommending the worst case format.  there's a bunch of debate on it in a 
github ticket.

do you know of any other odd behaviors? 

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Multiple tables and foreignkey constraints

2014-07-22 Thread Ken Roberts
If I'm understanding correctly, something like the following:

# Keep track of manufacturer names
manufacturers_table = Table(u'manufacturer', metadata,
Column(u'id', Integer, primary_key=True,
Column(u'name', String(20))
)

# Keep track of model names
models_table = Table(u'model', metadata,
Column(u'id', Integer, primary_key=True,
Column(u'name', String(20))
)

# Keep track of inputs available to models
sources_table = Table(u'source', metadata,
Column(u'id', Integer, primary_key=True,
Column(u'pjlink', String(2)),
Column(u'text', String(20))
)

# Foreign key table for mfgr/model/sources
projector_table = Table(u'projector', metadata,
Column(u'manufacturer_id', Integer, primary_key=True, 
ForeignKey(u'manufacturer.id')),
Column(u'model_id', Integer, primary_key=True, ForeignKey(u'model.id')),
Column(u'source_id', Integer, primary_key=True, 
ForeignKey(u'source.id')),
)

# Persistent storage for installed projectors
# model_id would be an index for model_table.id
installed_table = Table(u'installed', metadata,
Column(u'id', Integer, primary_key=True),
Column(u'name', String(20)),
Column(u'model_id', Integer)
)


Or am I still off in left field?

-- 
You received this message because you are subscribed to the Google Groups 
sqlalchemy group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.