Re: [sqlalchemy] User defined type with custom SQL?

2012-11-23 Thread Vlad K.

On 11/18/2012 03:48 PM, Michael Bayer wrote:

There's a lot of documentation on this, not sure how you're not finding it, 
unless you're not using 0.8 (which is required):

1. documentation for SQL bind/result column processing: 
http://docs.sqlalchemy.org/en/latest/core/types.html#types-sql-value-processing

2. a full example using PostGIS: 
http://docs.sqlalchemy.org/en/latest/orm/examples.html#examples-postgis   (this 
example has an ORM-centric version in 0.7 too, which is no longer current..)

3. geoalchemy is pretty relevant since you can read its source (current tip 
uses new techniques) to see how they are doing it.



Thanks for your reply.


I guess I'll have to try it with 0.8 then because the examples I tried 
do not, indeed, work for 0.7.9, or I'm doing something wrong. I did 
check geoalchemy source but just couldn't figure it out myself.
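
For the record, here is roughly how I understand the 0.8 hooks would apply to my case (an untested sketch; the class name is made up and the PostGIS function calls and SRIDs are just carried over from my original question):

    from sqlalchemy import func
    from sqlalchemy.types import UserDefinedType

    class GISPoint(UserDefinedType):

        def get_col_spec(self):
            return "GEOMETRY"

        def bind_expression(self, bindvalue):
            # wrap the bound WKT string in SQL function calls on INSERT/UPDATE
            return func.transform(func.PointFromText(bindvalue, 4269), 32661)

        def column_expression(self, col):
            # convert the stored geometry back to text on SELECT
            return func.ST_AsText(col, type_=self)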



Thanks,


V




[sqlalchemy] User defined type with custom SQL?

2012-11-18 Thread Vlad K.


Hello list,


I can't find an example for a user defined type that has to produce 
custom SQL. What I want is to implement a PostGIS geometry column and it 
basically has to do three things:



1. Implement own SQL for insert/update, to produce for example:

INSERT INTO ... (..., xyzzy, ...) VALUES (..., transform(PointFromText(%s, 4269), 32661), ...)


UPDATE ... SET xyzzy = transform(PointFromText(%s, 4269), 32661) ...


2. Internally carry a tuple X that will be bound as the string 
"POINT({0} {1})".format(x[0], x[1])


3. Parse PostgreSQL array result back into tuple X



I've managed to produce #2 and #3 with a UserDefinedType, but I don't 
know how to tell it to produce custom SQL, aside from bind_processor, for 
#1. I'm also using get_col_spec on the UserDefinedType for the DDL.
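
Roughly, the working part for #2 and #3 currently looks like this (a simplified sketch; I'm assuming the array result comes back from psycopg2 as a two-element sequence):

    from sqlalchemy.types import UserDefinedType

    class PointType(UserDefinedType):

        def get_col_spec(self):
            return "GEOMETRY"

        def bind_processor(self, dialect):
            def process(value):
                # tuple (x, y) -> WKT string, per #2
                if value is None:
                    return None
                return "POINT({0} {1})".format(value[0], value[1])
            return process

        def result_processor(self, dialect, coltype):
            def process(value):
                # array result -> tuple, per #3
                if value is None:
                    return None
                x, y = value
                return (float(x), float(y))
            return process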



BTW, GeoAlchemy is beside the point here. I want to learn how to produce 
custom types like this.



Thanks,

--


.oO V Oo.


Work Hard,
Increase Production,
Prevent Accidents,
and
Be Happy!  ;)




Re: [sqlalchemy] Double many-to-many relation filter headache

2012-10-01 Thread Vlad K.



To answer my own question, it seems SQLA won't automatically set up the 
inner joins if you supply individual columns to session.query(); I had 
to pass the declarative model class itself for this to work as expected.


Unless I'm doing something wrong, I guess I should use deferred columns 
to avoid loading unnecessary data for lists... (that's why I was 
querying individual columns instead of the entire model).
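
In case it helps anyone else, the deferred-column variant I have in mind looks roughly like this (attribute names are hypothetical):

    from sqlalchemy.orm import defer

    # query the full entity so the joins behave as expected,
    # but skip the heavy columns for list views
    items = (session.query(Item)
             .options(defer("description"), defer("body"))
             .all())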


--


.oO V Oo.


Work Hard,
Increase Production,
Prevent Accidents,
and
Be Happy!  ;)




[sqlalchemy] Double many-to-many relation filter headache

2012-09-28 Thread Vlad K.


Hello list,

I don't think my brain is currently capable of figuring out the SQL 
itself, let alone how to do it with SQLA.


I've got the following models:

City
Location
Item
Category


Location belongs to one of many Cities via simple Location.city_id 
foreign key
Location belongs to one or more Categories via LocationCategoryAssoc 
relationship(secondary) many-to-many
Item belongs to one or more Location via LocationItemAssoc 
relationship(secondary) many-to-many


I need to list Items that:
- have Item.some_flag == True
- are present in Location X  (have relationship with Location.id == X)
- belong to Category Y (have relationship with Category.category_id == Y)

Affected properties:

City.city_id
Category.category_id

Location.city_id

LocationCategoryAssoc.location_id   (fkey to Location)
LocationCategoryAssoc.category_id (fkey to Category)

LocationItemAssoc.location_id (fkey to Location)
LocationItemAssoc.item_id  (fkey to Item)

Item.some_flag



The Item.some_flag == True is simple, of course, but I'm not sure how to 
construct the query, joins and filters for the rest. The result I want 
is a list of Item instances.
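
In rough terms, what I think I'm after is something like the following (untested; it assumes relationships Item.locations and Location.categories configured over the secondary tables, and the attribute names are guesses):

    from sqlalchemy import and_

    items = (
        session.query(Item)
        .filter(Item.some_flag == True)
        .filter(Item.locations.any(and_(
            Location.location_id == location_x_id,
            Location.categories.any(Category.category_id == category_y_id),
        )))
        .all()
    )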




Many thanks.

--


.oO V Oo.


Work Hard,
Increase Production,
Prevent Accidents,
and
Be Happy!  ;)




Re: [sqlalchemy] Filtering for element in PgArray column

2012-08-14 Thread Vlad K.

On 08/13/2012 10:13 PM, Michael Bayer wrote:

we can keep turning the crank here, here's a full series of examples to make 
this happen:


Thanks for the example, that's definitely something I'll need sooner or 
later, already implemented my own PostgreSQL POINT data type and was 
wondering how to use compile properly.


But I was also wondering how to bind variables into a query (using the 
sqlalchemy.orm.query.Query object on the session) without stuffing them 
in the string, for a quick and dirty solution. Query.params() to the rescue:



rows = session.query(Model)\
    .filter(text("array_column @> ARRAY[:element_value]"))\
    .params(element_value='1234')


I suppose this is valid usage, I mean the query works as expected.

--


.oO V Oo.


Work Hard,
Increase Production,
Prevent Accidents,
and
Be Happy!  ;)




[sqlalchemy] Filtering for element in PgArray column

2012-08-13 Thread Vlad K.


Hi all,


what is the proper way to filter for an element in a PostgreSQL ARRAY 
column? in_() does not produce valid syntax for this on PostgreSQL (it 
produces (value) instead of ARRAY[value]).


For now I'm doing the following, but I'm not sure how to bind values and 
not have such a gaping SQL injection vuln (even though somevalue is 
checked against a list of allowed (unicode) values), I want proper binding:



rows = session.query(Model).filter(text(
    "{0} @> ARRAY['{1}']".format(Model.__table__.c.array_column, 'somevalue'))).all()



Which should produce:


SELECT * FROM model_table WHERE array_column @> ARRAY['somevalue']


--


.oO V Oo.


Work Hard,
Increase Production,
Prevent Accidents,
and
Be Happy!  ;)




Re: [sqlalchemy] Filtering for element in PgArray column

2012-08-13 Thread Vlad K.

On 08/13/2012 08:53 PM, Michael Bayer wrote:

we don't currently have native support for ARRAY operations and there is some 
infrastructure work that should make it easier in 0.8 to add.

for now, when you need custom operators use op():

Model.array_column.op('@>')(somevalue)


I tried that, but that doesn't work because the second operand has to be 
wrapped in ARRAY[]; afaik that's the only way to look up a value in an 
array column (which is also GiST-indexable).



If I do

Model.array_column.op('@>')('ARRAY[123]')

I get

SELECT * FROM model_table WHERE array_column @> 'ARRAY[123]'


and I need ARRAY[123] without quotes.
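
One workaround I can think of, for when the value is already validated and can be inlined, is literal_column(); a sketch (note the right-hand side is interpolated, not bound):

    from sqlalchemy import literal_column

    session.query(Model).filter(
        Model.array_column.op('@>')(literal_column("ARRAY[123]"))
    )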

--


.oO V Oo.


Work Hard,
Increase Production,
Prevent Accidents,
and
Be Happy!  ;)




[sqlalchemy] Need help using PostGIS + GeoAlchemy

2012-08-09 Thread Vlad K.


Hello,


I'm trying to use PostGIS with GeoAlchemy, being totally new to both. 
According to available documentation and what I need (A point 
representing longitude and latitude of row), I have


    class PointOfInterest(Base):
        __tablename__ = ...

        ...

        geom = geoalchemy.GeometryColumn(Point(2))



To read from geom column, I apparently need:

   poi = PointOfInterest()

   lon = session.scalar(poi.geom.x)
   lat = session.scalar(poi.geom.y)


And to set the value I need

   poi.geom = geoalchemy.WKTSpatialElement("POINT({0} {1})".format(lon, lat))


Of course, the above is possibly vulnerable to SQL injection, so I must 
make sure lon and lat are really numbers.



But is there a simpler/safer/saner way to do both? Can I update the geom 
column somehow to set only one value of the pair?



Also, I'm not quite sure how to query rows with X, Y being within N units 
of a reference point X0, Y0?
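
The best I've come up with so far for the radius part is calling the PostGIS functions directly through func, roughly like this (untested sketch; x0, y0 and n are placeholders, and I'm ignoring SRID handling):

    from sqlalchemy import func

    nearby = (
        session.query(PointOfInterest)
        .filter(func.ST_DWithin(
            PointOfInterest.geom,
            func.ST_GeomFromText("POINT({0} {1})".format(x0, y0)),
            n))
        .all()
    )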



Thanks.


--


.oO V Oo.


Work Hard,
Increase Production,
Prevent Accidents,
and
Be Happy!  ;)




Re: [sqlalchemy] schema design feedback

2012-07-01 Thread Vlad K.


I see two approaches here. The first is partitioning, as has been 
suggested, except partitions need to be defined upfront, if I'm not 
mistaken, at least in MySQL (expression + size). In PostgreSQL 
you need custom triggers on write operations on the main table, so I'm 
not sure per-customer partitioning would work, as that would require 
rewriting the triggers each time a customer is added or removed.


Perhaps time-based partitioning is better, e.g. monthly, so you have 12 
partitions, or anything else with a fixed number of partitions.


The other approach, which can be combined with partitioning, is pivot 
tables and/or materialized views, especially if you need graphing. You 
can have triggers that update hourly, daily, weekly, monthly pivot data 
which is used for graphing, and you actually select through main log 
data only if you need to recreate the pivot data or look for particular 
record.


Unless I misunderstood the original intent?




On 07/01/2012 12:49 PM, Wolfgang Keller wrote:

Having a single table to store all the customer's logs didn't seem to
work because when the device is removed and the logs have to be
deleted the table gets locked for a while.

Huh? Bad choice of DBMS/OS? Bad application design?


My question is, is this a good idea? Am I missing something?

I would go for one table for all customers, but for _partitioning_ the
log table into one partition per customer.

A database such as e.g. PostgreSQL running on an operating system that
knows how to combine multitasking and I/O (read: _not_ MS Windows)
should be perfectly capable of serving clients while deleting an entire
table partition.

Sincerely,

Wolfgang




--

.oO V Oo.




Re: [sqlalchemy] Understanding memory usage under SQLA

2012-06-21 Thread Vlad K.


On 05/17/2012 05:09 PM, Claudio Freire wrote:
Precompiling queries in SQLA, to populate the various SQLA's compiler 
caches, doing some queries that cause libpq and psycopg2 to exercise 
(and thus to allocate whatever permanent data structures it needs to), 
all at load time, will help keep fragmentation to a minimum. 
Fragmentation is a complex issue, and both python and SQLA are quite 
prone to it. But it can be worked around. 



Hi, I'm going back to this email because I'm interested in this idea of 
precompiling queries. What exactly did you mean? Obtaining the 
finalized query string?



--

.oO V Oo.




[sqlalchemy] Understanding memory usage under SQLA

2012-05-17 Thread Vlad K.

  
  

Hello.

I have a problem when processing relatively large number of rows.
For example, when selecting 5000 main rows, each having a number of
many-to-one relationships, memory usage shown by top skyrockets into
200+ MB range (RES), while heapy shows cca 20MB of Python heap.
PostgreSQL backend via psycopg2.

I've made a minimum example case based on the problem I'm noticing
in my Pyramid app, so the session.commit() at line 130 is there to
simulate commit done by Transaction used in Pyramid at the end of
each request. If I'm understanding things correctly, committing
would expire all objects involved in the session, and I even tried
manual session.expunge(row), but there is no difference in memory
usage.

The following is source of an example case. Requires SQLAlchemy
(tested with 0.7.5 and 0.7.7), guppy, psycopg2 (tested with 2.4.2
and 2.4.4). Happens both on Fedora 15 64-bit and CentOS 6.2 32-bit,
though of course the 32-bit shows some 30% lower RES in top.

http://pastebin.com/UFgduWVw


Usage: setup a test database, update line 25 config. Prepopulate
database with -p flag, then run again without any flags.

I don't see where and how any objects would remain in memory, and
heapy showing much lower memory use suggests something is retained
in the involved C extensions?  I also tried with pympler; a diff
before and after selecting rows shows nothing near what top reports.
I guess there is no "leak" in the traditional sense of the word because
repeating the task does not yield growing memory consumption. It
stabilizes at a certain value and stays there.

Heapy before selecting rows:

Partition of a set of 102014 objects. Total size = 13160672
  bytes.
   Index  Count   % Size   % Cumulative  % Kind (class / dict of
  class)
   0  45901  45  4395296  33   4395296  33 str
   1  26041  26  2186184  17   6581480  50 tuple
   2   7039   7   900992   7   7482472  57 types.CodeType
   3   6836   7   820320   6   8302792  63 function
   4    235   0   761608   6   9064400  69 dict of module
   5    608   1   689792   5   9754192  74 dict (no owner)
   6    676   1   648544   5  10402736  79 dict of type
   7    676   1   608344   5  11011080  84 type
   8    199   0   206248   2  11217328  85 dict of class
   9    185   0   167320   1  11384648  87
  sqlalchemy.sql.visitors.VisitableType
  334 more rows. Type e.g. '_.more' to view.

Heapy after 5000 rows have been selected:

Partition of a set of 102587 objects. Total size = 16455168
  bytes.
   Index  Count   % Size   % Cumulative  % Kind (class / dict of
  class)
   0  45923  45  4397632  27   4397632  27 str
   1  1   0  3146024  19   7543656  46
  sqlalchemy.orm.identity.WeakInstanceDict
   2  26090  25  2189480  13   9733136  59 tuple
   3   7039   7   900992   5  10634128  65 types.CodeType
   4   6859   7   823080   5  11457208  70 function
   5    235   0   761608   5  12218816  74 dict of module
   6    657   1   705048   4  12923864  79 dict (no owner)
   7    676   1   650464   4  13574328  82 dict of type
   8    676   1   608344   4  14182672  86 type
   9    199   0   206248   1  14388920  87 dict of class
  372 more rows. Type e.g. '_.more' to view.




What am I doing wrong? I'm hoping something trivial and blatantly
obvious that I'm oblivious to. :)


Thanks.

-- 

.oO V Oo.
  








Re: [sqlalchemy] Understanding memory usage under SQLA

2012-05-17 Thread Vlad K.


Hi, thanks for your reply.

Yes, I know the Python memory management model and that while it may 
free memory internally, it does not return it to the OS. I've read somewhere 
that this has been improved in 2.7 as well, but my testing on Fedora's 2.7.1 
still shows the issue. That's why I thought perhaps there are dangling 
references in the C extension parts that are not visible to heapy (is that 
even possible?).


I tried with yield_per() and you once told me how that can't work if I 
use subqueryload, so I tried without subqueries. The problem is that the 
script then shoots from 5 seconds to over a minute to process the same 
dataset, which is unacceptable to me for other reasons (and is expected, 
as there are two additional queries for each of the 5000 rows, making 
the app do 10001 queries plus ORM overhead on each). However, with 
yield_per() the memory consumption stays as low as before the querying 
begins.


I've got three possible solutions here. One is repeated querying with a 
limited result set AND subquery loading, which works like yield_per 
except that it requires additional sorting and an offset (see the sketch 
below). I just tried that and it indeed consumes much less memory. With 
sets of 500 rows at a time (and with full subqueryloads) the memory 
consumption is 1/10 of loading all rows at once, which figures: 500 is 
1/10 of 5000. This is acceptable.
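
The batched variant looks roughly like this (a sketch; model and relationship names are made up):

    from sqlalchemy.orm import subqueryload

    batch_size = 500
    offset = 0
    while True:
        batch = (session.query(ModelA)
                 .options(subqueryload(ModelA.children))
                 .order_by(ModelA.id)
                 .limit(batch_size)
                 .offset(offset)
                 .all())
        if not batch:
            break
        for row in batch:
            process(row)  # placeholder for the real work
        session.expunge_all()  # let the previous batch be garbage collected
        offset += batch_size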


Another is (materialized) views on the DB end, with triggers and an entire 
new model to select data from.


And yet another solution is to drop the ORM and construct queries manually, 
returning relational data as subselects in arrays, and add a thin 
ORM-like layer that just converts row columns to named tuples so that 
the consumers of this data can use the same model interface. But I'm 
guessing this is no different than the (materialized) views approach, 
except the combining is done in the DB and not in the Python app. I 
still need a separate model class or named tuples.





.oO V Oo.


On 05/17/2012 03:21 PM, Michael Bayer wrote:

There's a few different parts to what you're asking.

The first is that you're comparing Python's use of OS memory (I'm 
assuming this is the 200+ MB) to Python's actual amount of objects 
present.   This is a common mistake.  Python up through version 2.6 
does not release memory back to the OS once taken - this was improved 
in 2.7.   There's an old article about this here: 
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm 
as well as Alex Martelli's answer: 
http://stackoverflow.com/a/1316799/34549 .


Second is, what exactly is the large object you're creating here ?   
Answer - first, psycopg2 by default buffers the result set fully 
before returning it to SQLAlchemy - so it is first a list of 5000 
tuples.  Second, the ORM itself also by default buffers the full set 
of rows from the result set in the form of mapped objects, so 5000 
objects plus their related objects.A way to modify this behavior 
is to use the yield_per() option of Query, which will also in the case 
of psycopg2 tell psycopg2 to use its server side cursors feature 
which does not buffer.


However, yield_per() is not compatible with eager loading as eager 
loading involves being able to load collections across the full set of 
original objects.   Typically the better way to deal with large 
numbers of rows is to paginate, using either LIMIT/OFFSET or using 
window functions (see 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery ).


Thirdly, there is a modest growth in memory when a series of mappings 
are used for the first time, including the configuration of mappers, 
initialization of TypeEngine value processors, and such.   But the 
initial large resultset is the main thing causing the higher initial 
memory footprint.You'll notice this isn't a leak at all, as it 
doesn't grow.



On May 17, 2012, at 7:33 AM, Vlad K. wrote:



Hello.

I have a problem when processing relatively large number of rows. For 
example, when selecting 5000 main rows, each having a number of 
many-to-one relationships, memory usage shown by top skyrockets into 
200+ MB range (RES), while heapy shows cca 20MB of Python heap. 
PostgreSQL backend via psycopg2.


I've made a minimum example case based on the problem I'm noticing in 
my Pyramid app, so the session.commit() at line 130 is there to 
simulate commit done by Transaction used in Pyramid at the end of 
each request. If I'm understanding things correctly, committing would 
expire all objects involved in the session, and I even tried manual 
session.expunge(row), but there is no difference in memory usage.


The following is source of an example case. Requires SQLAlchemy 
(tested with 0.7.5 and 0.7.7), guppy, psycopg2 (tested with 2.4.2 and 
2.4.4). Happens both on Fedora 15 64-bit and CentOS 6.2 32-bit, 
though of course the 32-bit shows some 30% lower RES in top.


http://pastebin.com/UFgduWVw


Usage: setup a test database, update line 25 config. Prepopulate 
database with -p flag, then run again without any flags.

[sqlalchemy] SQLA in a DB cluster

2012-05-05 Thread Vlad K.


Hi all.

I'm interested in learning proper application design patterns and best 
practices when using a SQLA powered web application (Pyramid based) in a 
DB cluster (PostgreSQL 9.1, psycopg2 driver).



The primary requirement for the cluster is realtime backup, so the idea 
is to have two or more DB backends in master-slave (WAL replicated 
probably) setup.


The second requirement is failover. If the master fails, the system 
should promote a slave into master, forget about former master and 
notify the admin. Similarly if a slave fails, it should be taken out of 
the equation, admin notified.


Last but not least is loadbalancing of selects and/or parallel queries 
to take advantage of more than one DB node being available. Each 
application request is basically treated like a single transaction in 
Pyramid, and there are mixed cases: some requests are read-only, some are 
read-write, and the application treats every request as a single 
transaction. There is no scenario where a single request results in 
multiple isolated read queries.



Instead of reinventing the wheel I'm looking at using pgpool-II, which 
can achieve all this, but I'm not sure to what degree with SQLA. My 
primary concern is pgpool's inability to loadbalance queries within a 
single explicit transaction block, which is the case here (esp. with 
Pyramid and the psycopg2 driver). I am also not sure whether to use pgpool's 
replication, because it is statement-level (i.e. what happens if a node 
fails mid-transaction?), or WAL replication.
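
For the read/write split specifically, one application-side alternative I've been considering (independent of pgpool) is routing through a custom Session.get_bind(); a rough sketch, where the engine URLs and the flush check are assumptions on my part:

    from sqlalchemy import create_engine
    from sqlalchemy.orm import Session, sessionmaker

    master = create_engine("postgresql+psycopg2://app@db-master/appdb")
    replica = create_engine("postgresql+psycopg2://app@db-replica/appdb")

    class RoutingSession(Session):
        def get_bind(self, mapper=None, clause=None):
            # send flushes (writes) to the master, plain reads to a replica
            if self._flushing:
                return master
            return replica

    DBSession = sessionmaker(class_=RoutingSession)

Of course this routes per statement, so it has the same caveat as pgpool about reads and writes mixing inside one explicit transaction.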



Any advice, pointers, tutorials, or your own experience?



Many thanks.


--

.oO V Oo.




[sqlalchemy] SQLA in a DB cluster (repost)

2012-05-05 Thread Vlad K.


Hi all.


I'm posting this again because it seems my original post never reached 
the list?






--

.oO V Oo.




[sqlalchemy] Security issues?

2012-03-08 Thread Vlad K.


Hi all,

I was wondering if there is some kind of security announcements mailing 
list or anything where we could be informed of security issues directly 
from the horse's mouth? :) I did join the mailing list after 0.7.0 was 
released so I can't tell if this particular issue was mentioned, a quick 
google search of the group archives suggests it was not.



While this does not affect me because my apps use 0.7.5, I've just 
learned today, through CentOS/RH updates, about SQL injection 
vulnerability in versions prior to 0.7.0b wrt LIMIT and OFFSET clauses 
not being sanitized.


https://bugzilla.redhat.com/show_bug.cgi?id=783305


The SQLA changelog does mention this:

The limit/offset keywords to select() as well
as the value passed to select.limit()/offset()
will be coerced to integer.  [ticket:2116]
(also in 0.6.7)



I do read changelogs when I install/upgrade, but not coercing values is 
not the same as allowing SQL injection, ie. if string param was used, it 
should be properly escaped / bound, no? So this particular entry did not 
alarm me, especially since I always manually coerce integer input params 
to ints, a habit I developed in my PHP days.



Looking through the patches I see integer_or_raise() was added to SQL 
expressions (and utils) which is great, and I'm not sure whether 
Postgresql dialect had the same vuln as the mentioned Oracle dialect (in 
ticket 2116), as I'm using Postgres exclusively, and I'm not sure where 
to even begin looking.


Still, I wonder if there are more holes? Not a critique, just an honest 
question: I thought SQLA always used binding?



Thanks,

--

.oO V Oo.




Re: [sqlalchemy] Re: Something in Pyramid is preventing gc on SQLAlchemy objects

2012-02-27 Thread Vlad K.


Thanks for the info, I've commented on that separate issue as well.

.oO V Oo.


On 02/25/2012 04:14 AM, Yap Sok Ann wrote:

I too encountered the same problem that wasted me days, until I caught
pdtb_sqla_queries with my very limited memory profiling skill.

I have actually filed a bug here:

https://github.com/Pylons/pyramid_debugtoolbar/issues/51

and there is a separate issue to make it less leaky:

https://github.com/Pylons/pyramid_debugtoolbar/issues/52





[sqlalchemy] Re: Something in Pyramid is preventing gc on SQLAlchemy objects

2012-02-23 Thread Vlad K.


/facepalm


Okay, debugtoolbar does that. If I disable it from config, everything 
works fine. My fault. Been hunting for this memory leak for two days 
straight and it never occurred to me that even the command line 
bootstrapped Pyramid, when using development.ini, will have the debug 
toolbar on.


Sorry for wasting your time, folks.


I'm sending this email to the SQLAlchemy list too, where I asked about 
this problem, for further reference in case anyone else has the same 
problem.



.oO V Oo.


On 02/23/2012 02:24 PM, Vlad K. wrote:


Hi all.

I have a Pyramid bootstrapped script that has to process thousands of 
rows (on relatively complex model relationships) per call from command 
line. Pyramid is bootstrapped to reuse models and helper functions 
from the main web application this script is part of.


The problem I'm having is that in each iteration and/or individual 
transaction, the memory keeps growing, seems like SQLAlchemy objects 
are not released. A self-sufficient example script is here:



https://gist.github.com/d669e958c54869c69831


Without bootstrapping Pyramid (comment out line 266 and set env=None 
or something), the gcdelta is 0 or negative. With Pyramid bootstrapped 
(note that nothing else touches the application or request, only 
bootstrap is called) the gcdelta is in thousands per 200 iterations.



Fedora 16 x86_64
PostgreSQL 9.1

Pyramid 1.2.1 (except sqlalchemy and psycopg2, all dependencies are 
default given with 1.2.1 installation)

SQLAlchemy 0.7.5
Psycopg2 2.4.4
Transaction 1.1.1
Zope.Sqlalchemy 0.6.1






Re: [sqlalchemy] Re: Working with large IN lists

2012-02-23 Thread Vlad K.


.oO V Oo.


On 02/23/2012 02:50 PM, Michael Bayer wrote:

I'd note that those subqueryloads() render the effectiveness of yield_per() to 
be almost nil.


I know. I've replicated the real use case in the application, which has 
yet to see proper optimization, including better query planning and a 
reduction of unnecessary joins and relationships. I'd rather investigate 
implementing prepared statements and basically do lazy="select" 
instead of subqueries.




The next thing I'd look at is that second query for ModelC.   You can also load 
those upfront so that you don't need to do a query each time:

modelcs = dict(
    sess.query(ModelC.id_a, ModelC).join(ModelC.a)
        .filter(ModelA.parent_id == <same integer you're using against ModelA>)
)

then you have a dictionary of id_a -> ModelC as you loop through your ModelA records.   All the 
ModelC's are already in the session, so you don't need to use merge(), which is also not a super quick 
operation.  If an id_a is not in the dictionary then you know to create a new ModelC and use 
Session.add() to put it in.   If you're limiting the ModelA rows using a LIMIT or window recipe like I mentioned 
earlier, you'd apply that same criteria to the loading of the modelcs.

Later on you're doing something with query(ModelC).first() in a loop which is 
also something I hope isn't in the real application - looking there is seems 
like you'd only need to say query(ModelC).delete().


Might not be visible from this test script, but the scenario is this. 
ModelA represents certain data that has to be exported to external 
services (XMLRPC, REST and similar). In an ideal situation I'd just select 
all ModelA rows that have to be exported (by comparing the timestamp of last 
modification to the timestamp of the process run), but I can't do that because 
if such a transaction fails, it has to remain remembered for the next 
batch run. So I use the ModelC table, which logs these pending transactions.


So the first phase selects rows from ModelA that are up for export and 
creates transaction logs in ModelC.


The second phase then loads and exports rows one by one from ModelC 
(joined with ModelA and everything else required for the export). 
However, if a single transaction fails, the entire script exits and 
continues when called next time. This I have to do for other reasons 
(preventing overload on a possibly downed external service, etc., so I 
can't skip that row and fetch the next).


It may happen, and does regularly, that on subsequent runs of the 
process there are no new ModelA rows to load, but there are ModelC rows 
that failed from the last call (since they're in the table, it means they 
were not processed).


And that's the logic in query(ModelC).first(), processing, and then 
delete(). Also note that each row has to be an individual transaction 
(load, send to external service, remove from ModelC table), which means 
I can't rely on session/identity caching by pre-loading data instead 
of joins and subqueries.




Watching the SQL emitted with echo=True, and in some cases also seeing how 
large the results coming in are using echo='debug', is something I strongly 
recommend when first profiling an application.


Yes, I use logging and see all the SQL emitted.



Thanks for your input, I appreciate all the help and advice I can get. 
Still a ton of stuff to learn about SQLA.




V




Re: [sqlalchemy] Re: Working with large IN lists

2012-02-22 Thread Vlad K.


Hi,

thanks for your reply. I haven't yet tested this with a profiler to see 
exactly what is happening, but the bottom line is that the 
overall memory use grows with each iteration (or transaction processed), 
to the point of grinding the server to a halt, and top shows only the 
Python process involved consuming all the memory.


I've already modified code to read one row at a time, by first creating 
a list of IDs to be affected, then going through that list and selecting 
+ updating/inserting one transaction at a time.


I suppose I can solve the problem entirely on the SQL side with a stored 
function but that's a maintenance overhead I'd like to avoid if possible.


Meanwhile I've gotten rid of convenience relationships and in some 
aspects decided on lazy="select" instead of "subquery" or "joined", and have 
brought down total memory use. Now the entire process can finish with 
the amount of RAM available on the server, but it still shows linear 
growth from the start to the end of the process.


.oO V Oo.


On 02/22/2012 07:23 PM, Michael Bayer wrote:

When we want to test if a Python program has a leak, we do that via seeing 
how many uncollected objects are present.   This is done via gc:

import gc
print "total number of objects:", len(gc.get_objects())

That's the only real way to measure if the memory used by Python objects is growing 
unbounded.  Looking at the memory usage on top shows what the interpreter 
takes up - the CPython interpreter in more modern releases does release memory back, but 
only occasionally.   Older versions don't.

If you're doing an operation that loads thousands of rows, those rows are 
virtually always loaded entirely into memory by the DBAPI, before your program 
or SQLAlchemy is ever given the chance to fetch a single row.   I haven't yet 
looked closely at your case here, but that's often at the core of scripts that 
use much more memory than expected.

There's ways to get *some* DBAPIs to not do this (particularly psycopg2, if 
you're using Postgresql, see 
http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=yield_per#sqlalchemy.orm.query.Query.yield_per
 and 
http://docs.sqlalchemy.org/en/latest/core/connections.html?highlight=stream_results#sqlalchemy.engine.base.Connection.execution_options),
  though the better solution is to usually try loading chunks of records in at 
a time (one such recipe that I use for this is here: 
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/WindowedRangeQuery) .  Or 
better yet consider if the problem can be solved entirely on the SQL side (this 
entirely depends on exactly what you're trying to do with the data in question).





Re: [sqlalchemy] Re: Working with large IN lists

2012-02-22 Thread Vlad K.


Yes, definitely growing at a rate of 700-800 per iteration.

.oO V Oo.


On 02/22/2012 07:23 PM, Michael Bayer wrote:

When we want to test if a Python program has a leak, we do that via seeing 
how many uncollected objects are present.   This is done via gc:

import gc
print "total number of objects:", len(gc.get_objects())





Re: [sqlalchemy] Re: Working with large IN lists

2012-02-22 Thread Vlad K.


Okay, thanks to this article:

http://neverfear.org/blog/view/155/Investigating_memory_leaks_in_Python


I made a similar plot of object counts over time, showing the top 50 types. The 
resulting PDF is here (you might wish to download it first, Google 
messes it up for me):


https://docs.google.com/open?id=0ByLiBlA59qDwYTY1MGIzYWEtYjMxZi00ZDVlLTk0OTEtOGI2ZjA3NDgyM2Y3


Everything seems to grow linearly in count. Something is keeping all 
those objects referenced somewhere. What could possibly be the cause?



.oO V Oo.





[sqlalchemy] Working with large IN lists

2012-02-21 Thread Vlad K.


Hi all,

I have to read thousands of rows from a table and compile some data, but 
under certain conditions update those rows, all with the same value. The ratio 
of reads to writes here is as wide as possible: sometimes no rows, 
sometimes a few, and sometimes all rows that are read have to be updated. 
The last scenario is what concerns me.



for row in query.yield_per(100):
    # Do something with data

    if some_condition:
        row.some_column = 123
        session.flush()


I am thinking about just adding the row's ID to a list:

list_of_ids = []
for row in query.yield_per(100):
    # Do something with data

    if some_condition:
        list_of_ids.append(row.primary_key)

and near the end of the transaction do:

session.query(Model).filter(Model.primary_key.in_(list_of_ids)).update(
    {"some_column": 123}, False)


Yes, I'm aware of the increased memory requirement to store the ID list on 
the application side, and no, I don't need to lock the rows for update; 
the logic of an atomic update at the end is sufficient for my case.


But I think, and real-use benchmarks will probably show (I haven't 
tested yet), that a single update query will work faster. I need the lowest 
transaction processing time on the application side for the entire call, 
even if it takes more memory and more database iron.


What I'm concerned with here is whether there are any limits or significant 
overheads with large in_() lists.


The backend is PostgreSQL via psycopg2.
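
If the list does turn out to be a problem, one fallback would be to chunk the update (a sketch; the chunk size is arbitrary):

    CHUNK = 1000
    for i in range(0, len(list_of_ids), CHUNK):
        chunk = list_of_ids[i:i + CHUNK]
        (session.query(Model)
         .filter(Model.primary_key.in_(chunk))
         .update({"some_column": 123}, False))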



Thanks

--

.oO V Oo.




Re: [sqlalchemy] Re: Working with large IN lists

2012-02-21 Thread Vlad K.


Thanks for your replies.

Using the IN list definitely speeds up the process, but I hate the 
resulting query, which uses a bound variable for each and every element of 
the list.


But I have another problem with this: there's a massive memory leak 
somewhere. Take a look at this model:




class GatewayTransaction(Base):
    __tablename__ = "gateway_transactions"

    realestate_id = Column(Integer, ForeignKey("realestate.realestate_id",
                                               ondelete="set null",
                                               onupdate="cascade"),
                           primary_key=True)

    portal_id = Column(Text, primary_key=True)
    realestate_portal_id = Column(Unicode)
    operation = Column(Text, nullable=False)
    agency_id = Column(Integer, ForeignKey("agencies.agency_id",
                                           ondelete="set null",
                                           onupdate="cascade"),
                       nullable=False)

    agency_portal_id = Column(Unicode, nullable=False)
    agency_export_token = Column(Unicode, nullable=False)
    user_id = Column(Integer, ForeignKey("users.user_id",
                                         ondelete="set null",
                                         onupdate="cascade"),
                     nullable=False)

    mod_images = Column(Boolean)

    agency = relationship("Agency", lazy="joined")
    realestate = relationship("Realestate", lazy="joined")
    user = relationship("User", lazy="joined")




Now, when I do this:


for row in some_query.all():
    gt = session.query(GatewayTransaction)\
        .filter(GatewayTransaction.realestate_id == row.realestate_id)\
        .filter(GatewayTransaction.portal_id == k)\
        .first() or GatewayTransaction()

    # Do some data processing
    #
    #

    # Update existing or insert as new
    gt = session.merge(gt)
    session.flush()


It is very, very slow: it takes minutes to process 2000 rows, memory 
usage skyrockets into the multi-GB range, and I have to terminate it 
before it starts swapping like hell. With lazy="select", it flies fast, 
done in a couple of seconds with very little memory consumed, because at 
this point there are no rows in the table, so nothing is additionally 
selected, only inserted. Still, why would a join slow things down so 
drastically and shoot Python memory usage (not the DB's) sky-high?


Also, even if I try session.expunge(gt) or expunge_all() (previously 
preparing the rows to be loaded one by one from a list of IDs), the 
memory always keeps growing, as if the instances never die and never get 
garbage collected...




.oO V Oo.


On 02/21/2012 04:35 PM, Simon King wrote:


A *long* time ago (SQLAlchemy 0.3), I had some performance problems
with large IN clauses, and the reason turned out to be SQLAlchemy
taking a lot of time to build long lists of bindparam objects. I've no
idea if this is still the case these days. The best thing you can do
is just try it.

Simon






[sqlalchemy] Select ... for update of

2012-01-11 Thread Vlad K.


I need to select for update, but locking the row in only one table in a 
query with outer joins. I'm using the PostgreSQL backend. It has to be an outer 
join, so I need FOR UPDATE OF [tablename]. How do I do that with SQLA? 
So far I was using Query.with_lockmode("update"), but I can't specify 
the table with that...


Thanks!

--

.oO V Oo.




Re: [sqlalchemy] Order by the sequence in_ ?

2012-01-03 Thread Vlad K.


Thanks, I think that's exactly what I was looking for!


.oO V Oo.


On 12/28/2011 06:10 PM, Michael Bayer wrote:

On Dec 27, 2011, at 8:37 PM, Vlad K. wrote:


Hi all.


I need to select some rows where pkey is in a sequence. How do I order by that 
very sequence?


images_all = 
session.query(AdImage).filter(AdImage.image_id.in_(images)).order_by(  ?  
).all()


Postgresql backend.

typically with case():

order_by(
    case([
        (Adimage.id == 3, "A"),
        (Adimage.id == 1, "B"),
        (Adimage.id == 9, "C"),
    ])
)

unless you can use a simpler transformation on AdImage.id that converts it into 
a sortable value.

The above can be generalized:

case([(AdImage.id == value, literal(index)) for index, value in 
enumerate(images)])
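
Putting the two snippets together, the query from the original post would then read roughly as follows (using AdImage.image_id as in the question):

    from sqlalchemy import case, literal

    ordering = case([
        (AdImage.image_id == value, literal(index))
        for index, value in enumerate(images)
    ])
    images_all = (session.query(AdImage)
                  .filter(AdImage.image_id.in_(images))
                  .order_by(ordering)
                  .all())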








[sqlalchemy] Order by the sequence in_ ?

2011-12-27 Thread Vlad K.


Hi all.


I need to select some rows where pkey is in a sequence. How do I order 
by that very sequence?



images_all = 
session.query(AdImage).filter(AdImage.image_id.in_(images)).order_by(  
?  ).all()



Postgresql backend.


Thanks!

--

.oO V Oo.




[sqlalchemy] SQLA without the ORM?

2011-12-16 Thread Vlad K.


Hi all!

I have a few scenarios here that I believe are best solved without the 
ORM overhead. For example, various log tables that do not require a 
primary key, the rows are practically immutable, but are queried back 
for statistical analysis. It is my understanding that I cannot use the 
ORM without a primary key of some kind?


I am looking through the docs and I believe I should look into SQL 
Expression Language section for that, am I correct? Which basically 
means I should be using the expressions directly on the connection 
object (connection.execute()) instead of using the 
sqlalchemy.orm.scoped_session object?



Can you give me any pointers?



Thanks!


--

.oO V Oo.




Re: [sqlalchemy] SQLA without the ORM?

2011-12-16 Thread Vlad K.


So basically, if I'm understanding the docs correctly, and what you just 
wrote:


Using the session object does not mean using the ORM. The ORM comes into 
play with Mapper and mapped instances, which in turn require a primary 
key to be defined. So, I can use session.execute() to do non-ORM querying? And 
ResultProxy to work with the returned data?


How would I autocreate the tables if I don't use mappings, with DDL 
events and pure SQL?
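
To make the question concrete, I imagine the Core side would look roughly like this (table and column names made up; engine and session assumed to exist already):

    from sqlalchemy import MetaData, Table, Column, Integer, DateTime, Text

    metadata = MetaData()

    log_table = Table(
        "request_log", metadata,
        Column("logged_at", DateTime),
        Column("status", Integer),
        Column("message", Text),
    )

    metadata.create_all(engine)  # emits CREATE TABLE for any missing tables

    session.execute(log_table.insert().values(status=200, message=u"ok"))
    result = session.execute(
        log_table.select().where(log_table.c.status >= 500))
    for row in result:
        print row.message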



Thanks.

.oO V Oo.


On 12/16/2011 10:03 PM, Jon Nelson wrote:

On Fri, Dec 16, 2011 at 2:55 PM, Vlad K. <v...@haronmedia.com> wrote:

Hi all!

I have a few scenarios here that I believe are best solved without the ORM
overhead. For example, various log tables that do not require a primary key,
the rows are practically immutable, but are queried back for statistical
analysis. It is my understanding that I cannot use the ORM without a primary
key of some kind?

I am looking through the docs and I believe I should look into SQL
Expression Language section for that, am I correct? Which basically means I
should be using the expressions directly on the connection object
(connection.execute()) instead of using the sqlalchemy.orm.scoped_session
object?

I don't use scoped_session but I do use the sessionmaker Session
instances from the .orm namespace, and I rarely use the ORM itself.

My pattern usually goes like this:

session_factory = sa.orm.sessionmaker()
sess = session_factory()

sess.begin()
try:
    ...  # do stuff with sess
except:
    sess.rollback()
    # grump loudly (e.g. log the failure)
    raise
else:
    sess.commit()  # if appropriate, sometimes rollback
    sess.close()   # probably unnecessary








[sqlalchemy] Properly handling sessions in rollback

2011-11-13 Thread Vlad K.


Hi.

Imagine the following scenario:



session = DBSession()
readonly_model = session.query(ReadOnlyModel).get(id)

# Readonly means the model will NOT have its data changed
# in the life of the transaction(s).


method_one(readonly_model.readonly_data, param_1, param_2, ...)
method_two(readonly_model.readonly_data, param_3, param_4, ...)

session.commit()



Now, the code here is the caller, and methods one and two are 
logically separate, meaning they do not know anything about the caller, 
nor does the caller know anything about the internal states and processing 
involved in the methods.


Suppose a transaction fails in one or both methods, say an 
IntegrityError. The methods know how to handle this, catch the 
exceptions and adapt accordingly. They manually do session.rollback() 
and proceed within new transaction.


The problem is, if this happens in method_one, then readonly_model is 
expired and has no readonly_model.readonly_data when method two is 
called. From what I've read in the docs, I am supposed to do 
session.refresh(), but that means the caller must know what happened 
within the methods one and two (rollback happened).


One way would be to extract data from the readonly_model:

readonly_data = readonly_model.readonly_data

method_one(readonly_data, param_1, param_2, ...)
...

Now, I am certain that this readonly data will NOT change between two 
transactions, 100% sure it will never happen in the life of entire 
request / all transactions involved in the process, so basically 
extracting it from the session/model is safe, within the logic of the code.


Any suggestions/advices about this? Am I doing something completely 
wrong? What if the readonly_data was not 100% sure not to change? Is my 
only option to have the caller somehow know there was rollback involved 
and refresh the session?



Thanks!

--

.oO V Oo.




Re: [sqlalchemy] Properly handling sessions in rollback

2011-11-13 Thread Vlad K.




Hi, thanks for your reply.



On 11/13/2011 05:15 PM, Michael Bayer wrote:

 From what I've read in the docs, I am supposed to do session.refresh(),

No that's not required at all.  All the objects that are still referenced 
outside the session, stay present in the session and will reload their data 
when accessed.



But they're not; I'm getting "Instance XY is not present in this Session" 
for readonly_model when method_two is called, if there was a rollback in 
method_one.


Now, what I forgot to say is that I'm doing this within a Pyramid 
application which uses ZODB Transactions, so I can't directly access 
session.commit(), .rollback() or .savepoint(). I am not quite sure what 
else Transaction does when it encounters a failed state; I've already 
asked on the Pyramid mailing list about having multiple transactions 
within the same request and am waiting to resolve that issue.







What you will lose is whatever pending changes were present in method one, 
whatever data was flushed but not committed.

But when you say method one proceeds within a new transaction - doesn't that 
imply that it re-tried its full series of steps, so that the required state is apparent 
at the end of method one ?if method one is where the data is being set up for method 
two, it of course would have to re-establish that state for which method two is dependent.


The issue is basically this. At the end of a HTTP request, after all 
main data has been read, changed and flushed to db, I have to make 
certain statistical entries for several relations. Now, these 
statistical entries have current date as one part of the primary key, 
meaning each row is one DAY.


So the stats recording method (two calls because there are two 
statistical models/tables) tries to insert the date as a new row, and if that 
fails with an IntegrityError (the row has already been created by another 
process), it then tries to update the existing rows. I can't merge, because 
between merge's select and the decision whether to insert or update, another 
process may have created the entry, so I would STILL get the IntegrityError 
and have to retry the steps.


I know that his may not be a best approach, perhaps I should just have 
insert only statistical entries and then aggregate everything into daily 
pivot tables, but nevertheless I wish to understand and learn how to 
properly deal with rollbacks and retrying transactions.



Anyway, even if I catch the IntegrityError, the transaction fails with 
"TransactionFailedError: An operation previously failed, with 
traceback."  Meanwhile (since posting this original question) I've 
learned that the error belongs to Zope and not SQLAlchemy per se, and 
this whole problem may be because of that.


I've also tried with savepoints so that the integrity error above would 
rollback only THAT attempt, but I still encounter the 
TransactionFailedError...


So I'm guessing that methods one and two can simply use savepoints 
and isolate their own failures from the outside transaction? That way 
I preserve the logical isolation of the code, keeping in mind only that all 
pending data will be flushed with the savepoint. That would make this a 
problem of the Pyramid framework's integration with the ZODB transaction package.
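
In code, the savepoint idea I have in mind is roughly this (untested; the model and columns are hypothetical, and how it behaves under zope.transaction is exactly what I still need to verify):

    from sqlalchemy.exc import IntegrityError

    def record_daily_stat(session, day, amount):
        savepoint = session.begin_nested()       # SAVEPOINT
        try:
            session.add(DailyStat(day=day, amount=amount))
            session.flush()                      # raises if the row for `day` exists
            savepoint.commit()                   # RELEASE SAVEPOINT
        except IntegrityError:
            savepoint.rollback()                 # ROLLBACK TO SAVEPOINT
            session.query(DailyStat).filter_by(day=day).update(
                {"amount": DailyStat.amount + amount}, False)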





One way would be to extract data from the readonly_model:

readonly_data = readonly_model.readonly_data

method_one(readonly_data, param_1, param_2, ...)

is the issue here attempting to avoid redundant SELECT of the data ?   
readonly_model.readonly_data will otherwise refresh itself when you access it.


No, only to avoid the "Instance not present in Session" error that appears if 
method_one failed and I'm trying to use readonly_model again.




if it were me, I'd never be even trying to redo an operation that rolled 
back.99% of the time if my app has to rollback a transaction in the middle of an 
operation, that's a bug in the application no different from a crash.


But with IntegrityErrors there is no other way? Relying on this error 
keeps the atomicity within the MVCC of the database (PostgreSQL in my 
case) and not in the application (via merge, or select for update -- 
which btw can't be done for inserts).





If I absolutely needed to use the pattern of try to insert a record with a key, it 
might fail because it's unique, I'd use Session.begin_nested() so that the method 
in question in effect uses SAVEPOINT, gets its own transactional state that doesn't 
affect things outside of it.


Yes, see my comments above about savepoints.





Re: [sqlalchemy] Properly handling sessions in rollback

2011-11-13 Thread Vlad K.


On 11/13/2011 06:16 PM, Michael Bayer wrote:
But they're not; I'm getting "Instance XY is not present in this 
Session" for readonly_model when method_two is called, if there was a 
rollback in method_one.


That would indicate you add()-ed it during the transaction.  Any data 
that was created during the transaction is gone - that has to be 
regenerated.  You should really run the whole series of steps 
completely when a transaction fails.


Which would mean I have to select the model again, which I thought would 
be done with session.refresh(), as I originally posited. But no data has 
been created in this case: I have a single select and then an attempted 
(failed) insert into another table using another model altogether.


But it doesn't really matter. The proper way is to do it with savepoints, 
and the caller should not care about what happens within callees as long 
as the callees isolate themselves using savepoints.




I am going to ask that you take this message to the Pyramid list, and 
tell them I sent you.  I just came back from Ploneconf and met with 
Chris, Lawrence, and everyone else who uses zope.transaction heavily. 
 I learned about it and gained new appreciation for it.  But my 
skepticism was all about, what about this use case ?.  So here's 
another example where SQLAlchemy by itself doesn't have an issue, but 
zope.transaction either has a different way of working here, or 
doesn't yet handle this case. But assuming my suggestion below 
regarding SAVEPOINT isn't doable, it's a zope.transaction issue.




Yes, again thanks for joining the topic there. Savepoints are doable with 
zope.transaction; I am just not sure to what extent. I can't seem to find 
any docs, so I have to walk through the code, and not all features are 
commented or docstring'd.



I'm assuming you're mixing the terms ZODB transaction and 
zope.transaction here, as its the latter which integrates with 
SQLAlchemy, from my understanding.  The ZODB is just one of many kinds 
of data sources that can participate in a zope.transaction.


Actually, the transaction package is part of ZODB; at least it is listed 
as such both on PyPI and in any available docs I managed to find. There 
is no zope.transaction package.


This is the package:

http://pypi.python.org/pypi/transaction/1.1.1

And its homepage link 404'd. The only available docs I managed to find are:

http://www.zodb.org/documentation/tutorial.html#transactions
http://www.zodb.org/documentation/guide/transactions.html?highlight=transaction


I suppose it interacts with

http://pypi.python.org/pypi/zope.sqlalchemy



Vlad





[sqlalchemy] Specifying foreign keys in relationship

2011-11-11 Thread Vlad K.


Hi,


I have two models, A and B. Model B contains two foreign keys into table 
A, because it is a comparator model that describes a certain logical 
interaction between two A models. However, I want model B to contain a 
relationship to both so I can access them through the model B instance:



class ModelB(Base):
    __tablename__ = ...

    id_b = ...  # primary
    some_comparison_data = ...

    main_model_id = Column(Integer, ForeignKey("models_a.id_a",
        ondelete="cascade", onupdate="cascade"))
    duplicate_model_id = Column(Integer, ForeignKey("models_a.id_a",
        ondelete="cascade", onupdate="cascade"))

    main_model = relationship(ModelA, foreign_keys=[ ??? ], lazy="joined")
    duplicate_model = relationship(ModelA, foreign_keys=[ ??? ], lazy="joined")




ModelA has no keys back to B, this is basically a one-to-two 
relationship from B to A.


Now, as you can see, I don't know what to specify for the foreign keys. I 
tried with foreign_keys=[main_model_id] and
foreign_keys=[duplicate_model_id], but it complains that it couldn't 
determine the join condition and that I should use primaryjoin. Even so, I 
wouldn't know how to use it because I don't know what to specify:
a class variable? a string?  The examples in the docs are not for 
declarative...



Thanks.

--

.oO V Oo.




Re: [sqlalchemy] Specifying foreign keys in relationship

2011-11-11 Thread Vlad K.


Ah, so, thanks. My logic was that I could specify which foreign key to 
use for the relationship, which is basically a subset of the primaryjoin 
condition, but in my opinion cleaner. So, that wouldn't work? Must I 
always use primaryjoin?




I was looking a few paragraphs below, under "Multiple Relationships 
against the same Parent/Child", which is basically my situation, and 
which shows a mapper-based solution.


http://www.sqlalchemy.org/docs/orm/relationships.html#multiple-relationships-against-the-same-parent-child



Thanks for help!

.oO V Oo.


On 11/11/2011 07:32 PM, Michael Bayer wrote:

On Nov 11, 2011, at 3:46 AM, Vlad K. wrote:


Hi,


I have two models, A and B. Model B contains two foreign keys into table A, because it is 
a comparator model that describes certain logical interaction between two A 
models. However, I want model B to contain a relationship to both so I can access them 
through the model B instance:


class ModelB(Base):
__tablename__ = ...

id_b = ... # primary
some_comparison_data = ...

main_model_id = Column(Integer, ForeignKey("models_a.id_a", ondelete="cascade", 
onupdate="cascade"))
duplicate_model_id = Column(Integer, ForeignKey("models_a.id_a", ondelete="cascade", 
onupdate="cascade"))

main_model = relationship(ModelA, foreign_keys=[ ??? ], lazy="joined")
duplicate_model = relationship(ModelA, foreign_keys=[ ??? ], 
lazy="joined")



ModelA has no keys back to B, this is basically a one-to-two relationship 
from B to A.

I would classify this as two distinct many to one relationships from B to A.



Now as you can see I don't know what to specify for foreign keys.

You don't need to specify foreign_keys here as the Column objects on ModelB already have 
valid ForeignKey objects back to ModelA.   You do however need to specify 
primaryjoin for each relationship() here, as it is otherwise ambiguous how a 
join from ModelB to ModelA should proceed:

main_model = relationship(ModelA, 
primaryjoin=ModelA.id_a==ModelB.main_model_id)
duplicate_model = relationship(ModelA, 
primaryjoin=ModelA.id_a==ModelB.duplicate_model_id)



I tried with foreign_keys=[main_model_id]and
foreign_keys=[duplicate_model_id]   but it complains it couldn't determine join 
condition and that I should use primaryjoin. Evenso I wouldn't know how to use 
it because I don't know what to specify:class variable? a string?   The 
examples in the docs are not for declarative...

I'm always looking to improve the documentation.  In this case, all examples in the 
relationship documentation now use declarative for most examples, including 
primaryjoin:

http://www.sqlalchemy.org/docs/orm/relationships.html#specifying-alternate-join-conditions-to-relationship

It's also in the declarative extension docs:

http://www.sqlalchemy.org/docs/orm/extensions/declarative.html#configuring-relationships

Perhaps you were looking at older documentation ?







[sqlalchemy] Impedance mismatch: too much queries?

2011-10-02 Thread Vlad K.


I have an impedance mismatch problem, namely a model in X categories, 
each having its own set of fields and all having a set of common fields. 
I solved that with:


1. master table, contains the sequenced pkey and common fields, as 
well as category discriminator
2. individual category tables with fields specific to that category, and 
foreign key pkey to the master table.


I've been reading about Joined Table Inheritance and this seems like the 
right way to do it; however, I'm not quite sure I can use SQLAlchemy's 
polymorphic patterns. Here's why.


When I create a new model of a certain category I of course know which 
category it is, so that's not the problem. But when the system queries a 
model by ID, it does not know in advance which category it 
belongs to. So whatever I do, I am forced to do two queries, first to the 
master table and then to the proper category table. AFAIK this is the 
only way to do it, unless there's some SQL trick that can join a table name 
dynamically based on a column value (and I don't mean triggers)?


Anyways, the problem I'm having is when I query the whole set. Each 
master-category pair belongs to a user, and I have to query ALL the 
models that belong to the user. So I end up querying everything from the 
master table that belongs to that user, and then FOR EACH, load up the 
belonging category model from appropriate category table. This means 
that if I had 10k rows, I'd have to issue 10k selects, next to the 
master select to pull in all 10k master table rows (and from what I 
ready psycopg2 can't yield, it has to buffer them all).


Thankfully, these kinds of queries are not done on user demand (realtime 
web experience), but nightly within an automated dispatch system (dump 
to xml). Still, I'm concerned that I'm doing something wrong if I 
have to issue 10k selects for a single user, with a few hundred users in 
the system!


Am I missing some design pattern here? Should I simply denormalize into 
one big table?



The backend is PostgreSQL 9.0.



Thanks!


--

.oO V Oo.




Re: [sqlalchemy] Impedance mismatch: too much queries?

2011-10-02 Thread Vlad K.


Ah, a solution just came to me. If I had 10 categories and denormalized 
the user_id into them, I could reverse the queries and do only 10 selects, 
filtered by user_id and joined with the master table on the pkey. I'd end up 
with an equal number of rows selected, but only 10 queries issued. The 
denormalized user_id would not pose any problems, and I can cascade 
updates and deletes via the master table, which itself cascades from the 
users table.


Thoughts?


.oO V Oo.


On 10/02/2011 04:12 PM, Vlad K. wrote:


I have an impedance mismatch problem, namely a model in X categories, 
each having its own set of fields and all having a set of common 
fields. I solved that with:


1. master table, contains the sequenced pkey and common fields, as 
well as category discriminator
2. individual category tables with fields specific to that category, 
and foreign key pkey to the master table.


I've been reading on the Joined Table Inheritance and this seems like 
a right way to do it, however I'm not quite sure I can use 
SQLAlchemy's polymorphic patterns. Here's why.


When I create new model of certain category I of course know which 
category it is, so that's not the problem. But when the system queries 
the model (a model) by ID, it does not know in advance which category 
it belongs to. So whatever I do I am forced to do two queries, first 
to the master table and then to the proper category table. AFAIK this 
is the only way to do it unless there's some SQL trick that can join 
tablename dynamically based on column value (and I don't mean triggers)?


Anyways, the problem I'm having is when I query the whole set. Each 
master-category pair belongs to a user, and I have to query ALL the 
models that belong to the user. So I end up querying everything from 
the master table that belongs to that user, and then FOR EACH, load up 
the belonging category model from appropriate category table. This 
means that if I had 10k rows, I'd have to issue 10k selects, next to 
the master select to pull in all 10k master table rows (and from what 
I ready psycopg2 can't yield, it has to buffer them all).


Thankfully, these kinds of queries are not done on user demand 
(realtime web experience), but within an automated dispatch system 
nightly (dump to xml), but still I'm concerned that I'm doing 
something wrong if I have to issue 10k selects for single user, having 
few hundred users in the system!


Am I missing some design pattern here? Should I simply go big table 
denormalize?



The backend is PostgreSQL 9.0.



Thanks!







Re: [sqlalchemy] UNION in many-to-many relation

2011-09-23 Thread Vlad K.


Hi, thanks for your help! That works.

.oO V Oo.


On 09/19/2011 11:03 PM, Michael Bayer wrote:

On Sep 19, 2011, at 12:45 PM, Vlad K. wrote:


Hi!

I have a model, let's call it Resource. And another, let's call it Container. Each 
container can have any number of Resources, so they're in a many-to-many relationship. 
What's worse, they're in an association object pattern relationship because the 
association also carries the amount of a particular Resource associated with the Container.

Now, when a Container is viewed I need to construct a (html) table listing ALL 
resources available (simple select all from resources), but ordered so that those 
resources that are in the container (amount > 0) are at the top (ordered desc), 
followed by the resources that are not in the container (implicitly amount = 0).

this is an ORDER BY derived from join.  SQL would be like:

select * from resource left outer join container_to_resource on 
resource.id=container_to_resource.resource_id and 
container_to_resource.container_id=<my container id>  order by 
coalesce(container_to_resource.count, 0)

ORM:

Session.query(Resource).\
    outerjoin(ResourcesInContainers.resource).\
    filter(ResourcesInContainers.container_id == mycontainer.id).\
    order_by(func.coalesce(ResourcesInContainers.amount, 0))

I'd stay away from UNION as they are awkward and rarely needed - only if you 
have two disjoint selectables that really need to be in the same result.






[sqlalchemy] Implementing a radio button behavior?

2011-09-08 Thread Vlad K.


Hi all!

I'm looking for a programming pattern to deal with radio-button-like 
behavior, that is, in a group of rows all sharing the same group_id, only one 
can have a flag column set to true. Without using table triggers that 
make all rows in the group lose the flag if the updating row carries it 
(or doing it in the application), is there any SQLAlchemy pattern for 
what I'm looking for here?


As an aside to this question, I've noticed that the order of queries 
issued before flush() is not preserved by the flush(). Is there any way 
to enforce the order?




Thanks!

--

.oO V Oo.




Re: [sqlalchemy] Re: Question on session.expunge.all()

2011-09-08 Thread Vlad K.


No, I can't truncate the table for other reasons, as I mentioned in my 
original question. :)


The issue here was not how to sync the data, but whether processed rows 
stay in session even though the objects (model instances) are discarded 
at the end of each iteration (each csv row), or in other words whether I 
have to expunge_all() or not.


It seems I don't have to (SQLAlchemy 0.7, PostgreSQL backend).


Thanks!

.oO V Oo.


On 09/08/2011 06:47 PM, Victor Olex wrote:

Since you are effectively overwriting the table with new file
contents, the fastest may well be to truncate the table then insert
all contents. If you were to just append and update then
session.merge() is convenient way to do this though I am not sure if
the fastest.





Re: [sqlalchemy] Implementing a radio button behavior?

2011-09-08 Thread Vlad K.


For example the following:

row = session.query(Model).filter_by(pkey=pkey_value).first() or Model()
row.some_field = 123
...


session.query(Model).filter_by(nonprimary_key=some_value).update({...},
    False)

session.merge(row)
session.flush()

When flush() gets called, the merge() is executed (query sent to DB) 
before the update called above it, in this particular example.




.oO V Oo.


On 09/08/2011 04:37 PM, Michael Bayer wrote:

On Sep 8, 2011, at 9:32 AM, Vlad K. wrote:



As a by the way to this question, I've noticed that the order of queries 
given before flush() is not preserved for the flush(). Any way to enforce the order?

Trying to parse what this means.   Suppose you did a single SELECT, loaded five objects.  
Then changed them and did a flush.   What is the order of queries to be 
preserved?

Guessing, perhaps you mean, the order in which a particular object became 
present in the Session, that's the order in which UPDATE statements should be 
emitted.  UPDATE statements are in fact ordered in terms of the primary key 
of the row.   The reason for this is to minimize the chance of deadlocks.
Process A and process B both need to update primary key 1 and 2 in a table.   
If process A starts with 1 and process B starts with 2, you have a deadlock.
So an ordering that is deterministic across transactions, where PK ordering is 
a pretty good assumption in most cases, is the best behavior here.

If you need UPDATE statements in a specific order, you can A. emit flush() specifically 
against a Session in which you're controlling what's dirty, B. use 
query.update(), C. use the Core SQL language instead of the ORM for this particular 
series of operations (though query.update() likely a happy medium).









Re: [sqlalchemy] Implementing a radio button behavior?

2011-09-08 Thread Vlad K.


Yes that's how I know the order of events. I just checked the logs again 
and put some sleep() between update() and merge(). It appears that the 
update() does some kind of implicit flush because that commits the 
dirtied properties of the row instance BEFORE the update is issued, so 
that when merge() comes, everything appears in sync to the session.


So, in short, I'm doing this:

1. select row or new
2. assign some values to it
3. issue an update to a group of other rows (this should happen FIRST)
4. merge the row with db, flush, commit. (this should happen SECOND)

What I'm getting:

1. SELECTed row data
2. UPDATE on changed values (implicit flush of dirtied session data)
3. UPDATE as expected in step 3 above
4. COMMIT



.oO V Oo.


On 09/08/2011 10:04 PM, Michael Bayer wrote:

On Sep 8, 2011, at 3:32 PM, Vlad K. wrote:


For example the following:

row = session.query(Model).filter_by(pkey=pkey_value).first() or Model()
row.some_field = 123;
...


session.query(Model).filter_by(nonprimary_key=some_value).update({...}, false)
session.merge(row)
session.flush()

When flush() gets called, the merge() is executed (query sent to DB) before the 
update called above it, in this particular example.

That isn't correct, query.update() emits UPDATE immediately.   Do you have a 
SQL log illustrating what is being emitted ?





Re: [sqlalchemy] Implementing a radio button behavior?

2011-09-08 Thread Vlad K.


Yes, I earlier said it was merge() that took effect before update() 
because that's what it looked like (I didn't know about autoflush). Putting 
a sleep before update() and merge() showed that merge() issued no SQL 
because the autoflush (as you say) triggered by the update() practically 
synced the session with the database.


The update() does technically affect the row in the database which is 
already selected, in the session and dirtied, but not via the primary key. 
What I'm doing is this:


1. select a row into session
2. assign some data to it (dirties it)
3. if this row's flag property is set to true, first set flag=false to 
all rows in the same group (the Update), this one included

4. now merge this row

Or via plain SQL:

1. SELECT ...
2. UPDATE tablename SET flag=false WHERE group_id=123;
3. UPDATE tablename SET flag=true, ... WHERE primary_key=456;


The end result is that only one row in the group can have the flag set 
to true. The blanket set flag=false is imho faster and cleaner than 
finding out which row in the group has the flag and then updating just 
that one row, before our main model row. No?


I thought that session was only tracking changes via primary key so it 
never occurred to me that session would realize it is holding a row 
that's about to be updated, so it issues a flush first... Or am I 
misunderstanding what is going on here?


Turning autoflush off did the trick and the updates are now in order. 
Many thanks for your help!




.oO V Oo.


On 09/09/2011 12:17 AM, Michael Bayer wrote:

On Sep 8, 2011, at 6:00 PM, Vlad K. wrote:


Yes that's how I know the order of events. I just checked the logs again and put some 
sleep() between update() and merge(). It appears that the update() does some kind of 
implicit flush because that commits the dirtied properties of the row 
instance BEFORE the update is issued, so that when merge() comes, everything appears in 
sync to the session.

that's autoflush, which is part of the update() (earlier you said the merge() 
was taking effect before the update()).  It's a little strange to change an 
attribute on an object that dirties it for update, then manually do an update() 
that affects the same object - is it the same attribute you're trying to update 
there ?

Anyway, turn off autoflush.   Here's some recipes to do that as needed:

http://www.sqlalchemy.org/trac/wiki/UsageRecipes/DisableAutoflush









Re: [sqlalchemy] Question on session.expunge.all()

2011-09-07 Thread Vlad K.


Great, thanks!


.oO V Oo.


On 09/06/2011 04:48 PM, Michael Bayer wrote:

On Sep 6, 2011, at 10:40 AM, Vlad K. wrote:


I have a products database which is daily syncronized with an external source 
via a csv file. There are several thousand rows in question. The 
synchronization does two things:

1. Update only price if changed for existing products
2. Insert new products if they don't exist with all fields from csv

But basically, for each row in the csv, after the row is processed (one of the 
above two things is done), I don't need the object in session anymore. Memory 
and performance are of course an issue, and I can't find a way to test memory 
consumption with or without expunge_all() so my questions are:

1. Do I need to session.expunge_all() after each csv row is processed, or are 
they automatically garbage collected?
2. Is there any significant overhead inherent in expunge_all() that I'm not 
seeing right now?

Performance-wise, it seems the task is complete in more or less same time with 
or without expunge_all()

In modern SQLAlchemy, the Session maintains only weak references to objects that are 
clean, that is, are persistent in the database and have no pending changes to 
be flushed.  As all references to them are lost, they are garbage collected by the 
Python interpreter.  Note that objects are strongly referenced when they are present in 
the collection or attribute of a parent object, until that parent is also garbage 
collected.  There is an overhead to process which occurs when the object is 
dereferenced and removed from the session (weakref callbacks handle the accounting).  But 
calling expunge_all() probably isn't doing much here as the objects are likely being 
cleaned out in the same way regardless.


While I'm at it, I also need to delete rows in the database that do not have 
corresponding row in the csv file (say linked by csv_key field), the first 
solution that comes to mind is building a list of keys in the csv file (few 
thousand keys) and then doing:

session.query(Product).filter(not_(Product.product_id.in_(csv_keys))).delete()

I believe there is less overhead in sending such a large (but single!) query to 
the database and leaving it to determine what to delete by itself, than 
selecting each row in the database and checking if its csv_key exists in the 
csv_keys list on the application side and then issuing delete statements for 
rows that matched the criteria. Am I wrong?

That's definitely a dramatically faster way to do things, rather than to load each record 
individually and mark as deleted - it's the primary reason delete() and update() are 
there.   You'll probably want to send False as the value of 
synchronize_session to the delete() call so that it doesn't go through the effort of 
locating local records that were affected (unless you need that feature).






[sqlalchemy] Question on session.expunge.all()

2011-09-06 Thread Vlad K.


I have a products database which is synchronized daily with an external 
source via a csv file. There are several thousand rows in question. The 
synchronization does two things:


1. Update only price if changed for existing products
2. Insert new products if they don't exist with all fields from csv

But basically, for each row in the csv, after the row is processed (one 
of the above two things is done), I don't need the object in the session 
anymore. Memory and performance are of course an issue, and I can't find 
a way to test memory consumption with or without expunge_all(), so my 
questions are:


1. Do I need to session.expunge_all() after each csv row is processed, 
or are they automatically garbage collected?
2. Is there any significant overhead inherent in expunge_all() that I'm 
not seeing right now?


Performance-wise, it seems the task completes in more or less the same 
time with or without expunge_all()




While I'm at it, I also need to delete rows in the database that do not 
have corresponding row in the csv file (say linked by csv_key field), 
the first solution that comes to mind is building a list of keys in the 
csv file (few thousand keys) and then doing:


session.query(Product).filter(not_(Product.product_id.in_(csv_keys))).delete()

I believe there is less overhead in sending such a large (but single!) 
query to the database and leaving it to determine what to delete by 
itself, than selecting each row in the database and checking if its 
csv_key exists in the csv_keys list on the application side and then 
issuing delete statements for rows that matched the criteria. Am I wrong?


(I can't truncate table and reinsert products for other reasons)


The database is Postgresql.


Thanks!


--

.oO V Oo.




Re: [sqlalchemy] Re: Building hierarchy tree in reverse

2011-08-02 Thread Vlad K.


Yes, I'm using PostgreSQL, and now that you've linked to the docs, I 
remember there was a possibility for recursion. Thanks for the suggestion, 
I'll look into it.


.oO V Oo.


On 08/02/2011 03:41 AM, Gunnlaugur Briem wrote:
You could look for recursive CTE (Common Table Expressions), if your 
database engine supports such queries. See e.g. 
http://www.postgresql.org/docs/8.4/static/queries-with.html for 
PostgreSQL. That allows arbitrary-depth queries, as opposed to join 
chains that have to assume a fixed depth. You could probably apply two 
recursive queries, one downward and one upward from the given node, to 
avoid querying the whole tree.


SQLAlchemy has no support for CTEs directly, though of course you can 
construct the query manually and execute and fetch results through 
SQLAlchemy. You *can* get some support for recursive queries 
under SQLAlchemy in https://github.com/marplatense/sqla_hierarchy/ but 
be warned, that project is ... youthful :)


Regards,

- Gulli






[sqlalchemy] Building hierarchy tree in reverse

2011-08-01 Thread Vlad K.


Hi.

I have a problem and am not sure where to begin. I need to construct a 
hierarchy tree, something like adjacency_list but in reverse. More 
precisely, I need the entire branch, but only the branch containing a given 
node ID. In practice, I need this for a product category tree menu which 
shows items in the currently selected branch only (where, for example, I 
don't need the children of bba, or ba, or A, because the node ID is not in 
their branches):


A
B
    ba
    bb
        bba
        bbb
        bbc - this node id is given at first
            bbca
            bbcb
            bbcd
    bc
    bd
C
D


Basically, the way I see it, I need to:

1. Find node by ID
2. Find node's children
3. Find node's siblings
4. Node's parent becomes node, repeat from step 3 as long as there's a 
parent


The table is constructed with a parent_id foreign key referencing itself, 
and I can fetch the entire tree down to level X using joinedload_all as 
shown in this example:


http://www.sqlalchemy.org/trac/browser/examples/adjacency_list/adjacency_list.py

I have an idea how to do it manually but I was wondering if there is a 
feature of SQLAlchemy I could use. I'd google for any similar problems 
or implementations since I don't think this is an uncommon problem, but 
I'm not sure what to look for.


Thanks!

--

.oO V Oo.




[sqlalchemy] Managing one-to-one relation?

2011-07-05 Thread Vlad K.


Hi.

I have a situation where I have X data models, and while each has its 
own properties, they all share a set of fields with common meaning, like 
id, title, description, and some others.


How would you recommend approaching this problem? If I weren't using 
SQLAlchemy, I'd have one main table with a serial pkey and several 
subtables (not using the database's inheritance functionality) in a 
one-to-one relation via the pkey. The backend is Postgres.



Thanks!

--

.oO V Oo.




Re: [sqlalchemy] Managing one-to-one relation?

2011-07-05 Thread Vlad K.


Many thanks, that's exactly what I need!

.oO V Oo.


On 07/05/2011 10:13 PM, Conor wrote:

SQLAlchemy can model this via joined table inheritance:
http://www.sqlalchemy.org/docs/orm/inheritance.html#joined-table-inheritance.
You are still responsible for setting up the tables in the way you
described, but SQLAlchemy will take care of the rest.

-Conor


