[sqlalchemy] Referencing DELETE table from nested statement
Hi,

How can I reference a table that I am deleting from in an inner/nested statement?

Here is my simplified original version, with no inner condition:

    rownum = sql.expression.over(sql.func.row_number(),
                                 partition_by=table.c.some_id,
                                 order_by=table.c.dw_created_date.desc())
    rownum = rownum.label("rownum")

    condition = table.c.dw_valid_to_date == None
    primary = sql.expression.select([table.c.dw_some_key, rownum])
    primary = primary.where(condition).alias("primary")

    condition = primary.c.rownum == 1
    unique = sql.expression.select([primary.c.dw_some_key])
    unique = unique.where(condition).alias("unique")

    condition = table.c.dw_some_key.notin_(unique)
    delete = sql.expression.delete(table, whereclause=condition)

Which produces something like the following (manually formatted, with comments added about my intention):

    DELETE FROM foo a
    WHERE a.dw_key NOT IN (
        SELECT primary.dw_key
        FROM (
            SELECT foo.dw_key AS dw_key,
                   row_number() OVER (PARTITION BY foo.some_id
                                      ORDER BY foo.dw_created_date DESC) AS rownum
            FROM foo
            WHERE foo.dw_valid_to_date IS NULL
            -- Need to add this:
            -- AND a.some_id = foo.some_id
        ) AS primary
        WHERE primary.rownum = 1
    )

I want to add the condition commented out in the above statement. My attempt was to alias the table.

Aliased version with the inner condition:

    inner = table.alias("inner")
    rownum = sql.expression.over(sql.func.row_number(),
                                 partition_by=inner.c.some_id,
                                 order_by=inner.c.dw_created_date.desc())
    rownum = rownum.label("rownum")

    condition = and_(inner.c.dw_valid_to_date == None,
                     inner.c.some_id == table.c.some_id)
    primary = sql.expression.select([inner.c.dw_some_key, rownum])
    primary = primary.where(condition).alias("primary")

    # Select unique and filter non-unique
    condition = primary.c.rownum == 1
    unique = sql.expression.select([primary.c.dw_some_key])
    unique = unique.where(condition).alias("unique")

    condition = table.c.dw_some_key.notin_(unique)
    delete = sql.expression.delete(table, whereclause=condition)

However, this produces a product of the inner statement and the original table:

    DELETE FROM footable
    WHERE footable.dw_some_key NOT IN (
        SELECT primary.dw_some_key
        FROM (
            SELECT inner.dw_some_key AS dw_some_key,
                   row_number() OVER (PARTITION BY inner.some_id
                                      ORDER BY inner.dw_created_date DESC) AS rownum
            -- Here is the problem introduced by the alias:
            FROM footable AS inner, footable
            WHERE inner.dw_valid_to_date IS NULL
              AND inner.some_id = footable.some_id
        ) AS primary
        WHERE primary.rownum = 1
    )

You can see that the inner condition is not referencing the table used in the DELETE statement, but a new reference to the table.

How can I construct a statement where I can reference the tables as I need to in my case?

Thanks for any hints,

Stefan
Re: [sqlalchemy] Referencing DELETE table from nested statement
Why don't you use a NOT EXISTS correlated subquery? That way the subquery can refer to the outer table completely, rather than having just one column on which you can call IN.

On Dec 3, 2014, at 12:36 PM, Stefan Urbanek wrote:

> Hi, How can I reference a table that I am deleting from in an inner/nested statement? [...]
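A minimal sketch of the NOT EXISTS shape being suggested, reusing the table and column names from the post above. This is untested, and the explicit .correlate() call is an assumption about what the enclosing DELETE needs; the idea is that the key comparison lives in the EXISTS clause itself, so the window-function subquery never has to reference the outer table:

    from sqlalchemy import sql

    inner = table.alias("inner_t")
    rownum = sql.expression.over(sql.func.row_number(),
                                 partition_by=inner.c.some_id,
                                 order_by=inner.c.dw_created_date.desc())
    rownum = rownum.label("rownum")

    # Rank rows per some_id; note there is no reference to the outer
    # table anywhere in this subquery.
    ranked = sql.expression.select([inner.c.dw_some_key, rownum])
    ranked = ranked.where(inner.c.dw_valid_to_date == None).alias("ranked")

    # The correlated key comparison goes in the EXISTS clause instead.
    keep = sql.expression.exists(
        sql.expression.select([ranked.c.dw_some_key])
        .where(ranked.c.rownum == 1)
        .where(ranked.c.dw_some_key == table.c.dw_some_key)
        .correlate(table))

    delete = sql.expression.delete(table, whereclause=~keep)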
[sqlalchemy] Handling big Python objects
Hello list,

Sorry for the possibly noob question; I've googled around without much success looking for an answer.

Basically, I am given a series of huge Python objects (instances of a Simulation class), each containing an enormous amount of information: when I cPickle one (with the highest protocol), it can result in files 200-250 MB in size, although rarely it can get up to 2 GB. I am looking for intelligent ways to store these objects in a database. I have to say that I don't have much control over this Simulation class, so I can't change its internal structure; I'm just looking for a better alternative to what I am doing.

So, what I am doing now is basically storing this huge object as a string. I have these two methods:

    import cPickle
    import zlib
    import base64

    def serialize(my_simulation):
        my_db_object = base64.b64encode(
            zlib.compress(cPickle.dumps(my_simulation, cPickle.HIGHEST_PROTOCOL)))
        return my_db_object

    def deserialize(my_db_object):
        my_simulation = cPickle.loads(
            zlib.decompress(base64.b64decode(my_db_object)))
        return my_simulation

I can use them to store/retrieve these big Python objects to/from the database, but I feel it's not a particularly effective way to handle this problem. I've tried to get my head around BLOBs and LargeBinary stuff, but I'm not sure I'm on the right path here.

I'd appreciate any suggestion on how to approach the problem, to make the storing/retrieving of these objects a bit less time/memory consuming (especially time).

On a related note, I am now using MySQL as a backend, but I am open to suggestions about possible alternatives that may make this problem more tractable: I have some difficulties installing other backends (like PostgreSQL or psycopg2), but I know I can use Access and Oracle backends. I know that Access is not particularly famous in the SQLAlchemy world. Of course, if any other backend has advantages over MySQL in dealing with this issue, I may try to convince IT to get it installed on our machines.

All suggestions and comments are most welcome. Thank you in advance for your help.

Andrea.
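A minimal sketch of the LargeBinary direction mentioned above, with made-up table and column names. With a binary column the base64 step becomes unnecessary, since the DBAPI accepts raw bytes; note that on MySQL the generic LargeBinary type renders as BLOB (64 KB maximum), so payloads of this size would likely need the dialect-specific LONGBLOB:

    import cPickle
    import zlib

    from sqlalchemy import Table, Column, Integer, MetaData
    from sqlalchemy.dialects.mysql import LONGBLOB

    metadata = MetaData()
    simulations = Table('simulations', metadata,
                        Column('id', Integer, primary_key=True),
                        Column('payload', LONGBLOB))  # plain BLOB caps at 64 KB

    def serialize(my_simulation):
        # Raw bytes in, raw bytes out; no base64 needed for a binary column.
        return zlib.compress(cPickle.dumps(my_simulation,
                                           cPickle.HIGHEST_PROTOCOL))

    def deserialize(payload):
        return cPickle.loads(zlib.decompress(payload))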
Re: [sqlalchemy] Handling big Python objects
I would recommend just storing them on disk and letting the OS VMM deal with caching for speed. If you are not constrained for space, I would recommend not zlib-ing them either.

AM

On 12/3/14 1:18 PM, Andrea Gavana wrote:

> Hello list, sorry for the possibly noob question; I've googled around without much success looking for an answer. [...]
Re: [sqlalchemy] Handling big Python objects
What do you want to be able to do with the objects? I'd recommend storing them as files and then referencing the filename in the database.

Ed

On 12/03/2014 04:18 PM, Andrea Gavana wrote:

> Hello list, sorry for the possibly noob question; I've googled around without much success looking for an answer. [...]
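A rough sketch of the file-plus-reference pattern being recommended; the mapped class, the paths, and the storage location are all illustrative assumptions:

    import os
    import cPickle

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    STORAGE_ROOT = r'\\server\sim_store'    # hypothetical, access-controlled share

    class SimulationRecord(Base):
        __tablename__ = 'simulation_records'
        id = Column(Integer, primary_key=True)
        path = Column(String(500))          # only the reference lives in the DB

    def save_simulation(session, sim_id, sim):
        # Pickle to disk; store only the file's path in the database.
        path = os.path.join(STORAGE_ROOT, '%s.pickle' % sim_id)
        with open(path, 'wb') as handle:
            cPickle.dump(sim, handle, cPickle.HIGHEST_PROTOCOL)
        session.add(SimulationRecord(id=sim_id, path=path))
        session.commit()

    def load_simulation(session, sim_id):
        record = session.query(SimulationRecord).get(sim_id)
        with open(record.path, 'rb') as handle:
            return cPickle.load(handle)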
Re: [sqlalchemy] Handling big Python objects
On Wednesday, December 3, 2014 4:23:31 PM UTC-5, Ams Fwd wrote:

> I would recommend just storing them on disk and letting the OS VMM deal with caching for speed. If you are not constrained for space, I would recommend not zlib-ing them either.

I'll second storing them to disk. Large-object support in all the databases is a pain and not very optimal. Just pickle/unpickle a file and use the db to manage that file.
Re: [sqlalchemy] Handling big Python objects
On Wednesday, December 3, 2014 10:42:27 PM UTC+1, Jonathan Vanasco wrote:

> I'll second storing them to disk. Large-object support in all the databases is a pain and not very optimal. Just pickle/unpickle a file and use the db to manage that file.

Thanks to all of you who replied. A couple of issues that I'm sure I will encounter by leaving the files on disk:

1. Other users can easily delete/overwrite/rename the files on disk, which is something we really, really do not want;

2. The whole point of a database was to have everything centralized in one place, not leaving the simulation files scattered around like a mess across the whole network drive;

3. As an aside, not zlib-ing the files saves about 5 seconds/simulation (over a 20-second save) but increases the database size by 4 times. I'll have to check if this is OK.

Thank you again for your interest.

Andrea.
Re: [sqlalchemy] Handling big Python objects
On Wed, Dec 3, 2014 at 7:23 PM, Andrea Gavana wrote:

> 3. As an aside, not zlib-ing the files saves about 5 seconds/simulation (over a 20-second save) but increases the database size by 4 times. I'll have to check if this is OK.

You can usually specify a compression level of 1 to get a considerable speedup, if you care about speed more than disk efficiency.
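Concretely, the level argument in question (zlib's default is 6; 1 trades compression ratio for speed, 9 the reverse):

    import zlib

    with open('simulation.pickle', 'rb') as handle:   # any large byte string
        payload = handle.read()

    fast = zlib.compress(payload, 1)    # fastest, larger output
    small = zlib.compress(payload, 9)   # smallest output, slowest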
[sqlalchemy] How to handle dynamically-changing tables (gaining and losing columns)
I'm creating a web application where the underlying tables will be changing during the operation of the application. Each user is registered to an agency. Each agency has its own schema of data tables. Users log in to edit their data within their agency's schema. Example:

    my_agency_schema
    - table_one
        - col_a
        - col_b
    - table_two
    - table_three

The initial version of the application was straightforward: all tables and columns were static and hard-coded, so any given table always had the same columns. But now we want to give the user the ability to add or remove certain columns. Let's say a user adds a new column to table_one: how does the SQLAlchemy model now know to look for that? How do I keep my SQLAlchemy models in line with the changes happening to the tables (or is it even possible)?

I tested using database reflection, and got it working in a sort of hacky workaround that I am not proud of. With my hack, each CRUD request generates a new SQLAlchemy model that uses introspection to reflect the columns of the table, then uses an instance of that model to perform the update. It is fairly resource-intensive to reflect the table on each request, so I really don't think I should go this route.

I considered using a table (or a cache like Redis) to keep track of which columns each table of each agency has, so that the table doesn't need to be reflected each time, but that sounds like a large (and somewhat messy) undertaking, and I want to be sure that what I implement actually makes sense first.

To put it simply, I want to know what makes sense to do in a situation like this, where table columns are being added and removed during program execution. I'm not sure if SQLAlchemy is meant to be used for a problem like this. How would you handle such a problem?
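One possible middle ground, sketched under the assumption that all schema changes go through the application itself: reflect each table once, cache the Table object, and invalidate the cache entry whenever the application issues an ALTER TABLE. This only covers a single process; with multiple workers the invalidation signal would have to travel through something shared, such as the Redis cache mentioned above.

    from sqlalchemy import MetaData, Table

    metadata = MetaData()
    _table_cache = {}

    def get_table(engine, schema, name):
        # Reflect on first use only; subsequent calls hit the cache.
        key = (schema, name)
        if key not in _table_cache:
            _table_cache[key] = Table(name, metadata, schema=schema,
                                      autoload=True, autoload_with=engine)
        return _table_cache[key]

    def invalidate(schema, name):
        # Call this after altering the table so the next access re-reflects.
        table = _table_cache.pop((schema, name), None)
        if table is not None:
            metadata.remove(table)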
Re: [sqlalchemy] Referencing DELETE table from nested statement
Thanks. I used exists (with a reversed condition):

    condition = primary.c.rownum != 1
    unique = sql.expression.exists([1], whereclause=condition)
    delete = sql.expression.delete(table, whereclause=unique)

I'm still getting:

    FROM footable AS inner, footable

for the inner-most statement. Note that I have deeper nesting: the primary is already nested, because I can't use window functions in the WHERE clause. Note also that I need the inner.key = to_be_deleted.key condition in the inner-most SELECT, not in the EXISTS portion; see my condition for primary in my example above. Or how did you mean to use the exists() construct?

Stefan

On Wednesday, December 3, 2014 2:05:32 PM UTC-5, Michael Bayer wrote:

> Why don't you use a NOT EXISTS correlated subquery? That way the subquery can refer to the outer table completely, rather than having just one column on which you can call IN. [...]
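For what it's worth, a sketch of one way to suppress the stray FROM entry: calling .correlate(table) on the inner SELECT tells it not to render footable in its own FROM list and to treat the reference as correlated to the enclosing statement instead. This is untested against the full nested statement, so the deeper nesting may still need attention:

    inner = table.alias("inner_t")
    rownum = sql.expression.over(sql.func.row_number(),
                                 partition_by=inner.c.some_id,
                                 order_by=inner.c.dw_created_date.desc())
    rownum = rownum.label("rownum")

    primary = sql.expression.select([inner.c.dw_some_key, rownum])
    primary = primary.where(inner.c.dw_valid_to_date == None)
    primary = primary.where(inner.c.some_id == table.c.some_id)
    primary = primary.correlate(table)   # footable now comes only from the DELETE
    primary = primary.alias("primary")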
Re: [sqlalchemy] Handling big Python objects
On 12/3/14 2:23 PM, Andrea Gavana wrote:

> 1. Other users can easily delete/overwrite/rename the files on disk, which is something we really, really do not want;

If this is Windows, group policies are your friends :). If this is Linux, permissions plus a secondary service to access the files are a decent choice.

> 2. The whole point of a database was to have everything centralized in one place, not leaving the simulation files scattered around like a mess across the whole network drive;

The last time I did it, a post-processing step in my data pipeline organized the files into a multi-level folder structure based on the first x characters of their sha1.

> 3. As an aside, not zlib-ing the files saves about 5 seconds/simulation (over a 20-second save) but increases the database size by 4 times. I'll have to check if this is OK.

Whether to use compression or not depends on your needs. If the difference in time consumed is so stark, I would highly recommend compression.

HTH,
AM
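The folder layout described, sketched with stdlib pieces; the root path and file extension are illustrative:

    import os
    import hashlib

    def storage_path(root, payload):
        # e.g. root/ab/cd/abcd1234....pickle, fanning files out so that
        # no single directory grows unmanageably large.
        digest = hashlib.sha1(payload).hexdigest()
        return os.path.join(root, digest[:2], digest[2:4], digest + '.pickle')

    def store(root, payload):
        path = storage_path(root, payload)
        directory = os.path.dirname(path)
        if not os.path.isdir(directory):
            os.makedirs(directory)
        with open(path, 'wb') as handle:
            handle.write(payload)
        return path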
Re: [sqlalchemy] Handling big Python objects
Hi,

On Thursday, December 4, 2014 12:02:42 AM UTC+1, Ams Fwd wrote:

> If this is Windows, group policies are your friends :). If this is Linux, permissions plus a secondary service to access the files are a decent choice.

Yes, this is Windows, but no, I can't go around and tell the users that the simulation they just saved is not accessible anymore. The database is part of a much larger user interface application, where users can generate simulations and then decide whether or not they are relevant enough to be stored in the database. At a rate of 300 MB per simulation (or more), the size issue arrives quickly.

> The last time I did it, a post-processing step in my data pipeline organized the files into a multi-level folder structure based on the first x characters of their sha1.

Again, I am dealing with non-Python people, and in general with people who are extremely good at what they do but don't care about the overall IT architecture, as long as it works and is recognizable from a Windows Explorer point of view. A file-based approach is unfortunately not a good option in the current setup.

> Whether to use compression or not depends on your needs. If the difference in time consumed is so stark, I would highly recommend compression.

I will probably go that way; 5 seconds more or less does not make that much of a difference overall. I just wish the backends didn't complain when I pass them cPickled objects (bytes instead of strings...).

Andrea.
Re: [sqlalchemy] Handling big Python objects
If you need to handle large objects, you should look into DBAPI drivers that can stream results. The only drivers I know of that can handle a stream are psycopg2 (PostgreSQL) and oursql (MySQL). There have been a handful of recipes/threads in the archives from people using streams for blobs.

I would still do a filesystem-based approach. You wouldn't have to b64 your files, and you could use external tools to compress (e.g. bzip2, which takes longer but compresses more). Your users should never know where this data is; it should be treated as a hidden/secure storage area that only the app uses, just like your MySQL data directory.
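And if the blob does stay in the database, deferring the column at least keeps ordinary queries from pulling hundreds of megabytes across the wire until the attribute is actually touched. A sketch with assumed names:

    from sqlalchemy import Column, Integer, LargeBinary
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import deferred

    Base = declarative_base()

    class Simulation(Base):
        __tablename__ = 'simulations'
        id = Column(Integer, primary_key=True)
        # The payload is loaded only when simulation.payload is accessed.
        payload = deferred(Column(LargeBinary))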