Re: [sqlalchemy] storing a large file into a LargeBinary question
Thanks for the feedback, Michael. Lots of good information in there. I will read up on buffer() and memoryview(), and also on custom SQLAlchemy types.

Thanks again,
    Andre

On 13-02-12 04:33 PM, Michael Bayer wrote:

On Feb 12, 2013, at 3:22 PM, Andre Charbonneau <andre.charbonn...@nrc-cnrc.gc.ca> wrote:

> Greetings everyone,
>
> I have a piece of code in a web app where I need to store a large binary file (an uploaded file written to disk by the Apache server) in an object's LargeBinary attribute. That's easy enough to do with:
>
>     myobject.uploaded_file = xyz.file.read()
>
> The problem is that I don't want to load the entire file into memory when I set the LargeBinary attribute. [snip]

In the old days these streaming binary interfaces were common, but as memory has become plentiful you don't see them used anymore. Even with systems like Oracle, you see client libraries reliant upon setting the allowed memory size to be bigger than the largest value you need to store.

psycopg2 does work with buffer() and memoryview() objects as input to a bytea column, and you could send these in as arguments, where SQLAlchemy should pass them through (or if not, it's easy to make a custom type that passes them through). These objects don't appear to work around having to load the data into memory, though; they just make memory usage more efficient by removing the need for it to be copied internally. I'm not familiar enough with them to know whether they support some way to stream from a file handle.

There's also a facility I hadn't previously heard of in PostgreSQL and psycopg2 called the large object system, which stores the data in an entirely separate table, pg_largeobject. Dropping into psycopg2, you can store and retrieve these objects using the lobject interface: http://initd.org/psycopg/docs/connection.html#connection.lobject

As far as how to get that data into your table, it seems you'd need to link to the OID of your large object rather than using bytea: http://www.postgresql.org/docs/current/static/lo-funcs.html . So you'd need to forego the usage of bytea. Again, SQLAlchemy types could be created which transparently perform these tasks against the OID.

I'd ask on the psycopg2 list as to which feature they recommend, and I'm betting they will say that memory is cheap and plentiful these days and you should just assume the data will fit into memory.
--
André Charbonneau
Research Computing Support Analyst
Shared Services Canada | National Research Council Canada
Services partagés Canada | Conseil national de recherches Canada
100 Sussex Drive | 100, promenade Sussex
Ottawa, Ontario K1A 0R6 Canada
andre.charbonn...@ssc-spc.gc.ca
Telephone | Téléphone: 613-993-3129
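Following up on the custom-type idea: below is a minimal sketch of a pass-through type, assuming (as Michael notes) that psycopg2 accepts buffer()/memoryview objects for bytea. The class name RawBytea and the mmap usage are illustrative, not from the thread:

    from sqlalchemy.types import UserDefinedType

    class RawBytea(UserDefinedType):
        """bytea type that hands the bound value to psycopg2 untouched,
        so a buffer()/memoryview can be adapted without an extra copy."""

        def get_col_spec(self):
            return "BYTEA"

        def bind_processor(self, dialect):
            return None  # no Python-side conversion on the way in

        def result_processor(self, dialect, coltype):
            return None  # hand back whatever psycopg2 returns

    # Usage sketch: map the column with RawBytea instead of LargeBinary,
    # then assign a memoryview over an mmap'ed upload, so no Python-level
    # copy of the file contents is made (the full value still travels to
    # the server in a single INSERT):
    #
    #     uploaded_file = Column(RawBytea)
    #
    #     import mmap
    #     with open("/tmp/upload.bin", "rb") as f:
    #         mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    #         myobject.uploaded_file = memoryview(mm)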
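And a minimal sketch of the large-object route, dropping to the raw psycopg2 connection as Michael suggests, since SQLAlchemy has no built-in OID/large-object type. The documents table, file_oid column, row id, and chunk size are hypothetical; the lobject calls are from the psycopg2 docs linked above:

    from sqlalchemy import create_engine

    engine = create_engine("postgresql+psycopg2://user:pw@localhost/mydb")

    raw = engine.raw_connection()  # DBAPI-level connection proxy
    try:
        # oid=0 asks the server to assign a new OID for the large object
        lob = raw.connection.lobject(0, "wb")
        with open("/tmp/upload.bin", "rb") as f:
            for chunk in iter(lambda: f.read(64 * 1024), b""):
                lob.write(chunk)  # streamed to pg_largeobject, 64 KB at a time
        oid = lob.oid
        lob.close()

        # Link the row to the large object; only the OID lives in the table.
        cur = raw.cursor()
        cur.execute("UPDATE documents SET file_oid = %s WHERE id = %s", (oid, 42))
        raw.commit()
        # To read back later: raw.connection.lobject(oid, "rb").read()
    finally:
        raw.close()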
[sqlalchemy] storing a large file into a LargeBinary question
Greetings everyone,

I have a piece of code in a web app where I need to store a large binary file (an uploaded file written to disk by the Apache server) in an object's LargeBinary attribute. That's easy enough to do with:

    myobject.uploaded_file = xyz.file.read()

The problem is that I don't want to load the entire file into memory when I set the LargeBinary attribute. If my understanding is correct, the above call will first load the entire content of the uploaded file into memory, and that is then assigned to the myobject.uploaded_file LargeBinary attribute. Correct? (Then SQLAlchemy eventually issues the INSERT statement to store the data in the database... but I don't really know how that data transfer is done.)

I have tried to find another way of passing the data to the LargeBinary attribute that would not load the entire file into memory at once, but would instead stream it in chunks during the INSERT statement, and I was not able to find anything. :-(

Has anyone managed to implement something like this before, or does anyone know where I can read more about possible ways of doing this with SQLAlchemy?

Thanks a lot,
    Andre
Re: [sqlalchemy] storing a large file into a LargeBinary question
One piece of extra information I forgot to mention in my original email: the myobject.uploaded_file LargeBinary column attribute maps to a PostgreSQL bytea column.

Best regards,
    Andre

On 13-02-12 03:22 PM, Andre Charbonneau wrote:

Greetings everyone,

I have a piece of code in a web app where I need to store a large binary file (an uploaded file written to disk by the Apache server) in an object's LargeBinary attribute. [snip]
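For context, a minimal sketch of the kind of mapping being described here (the class and column names are guesses based on the thread); SQLAlchemy's LargeBinary renders as bytea on PostgreSQL:

    from sqlalchemy import Column, Integer, LargeBinary, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class MyObject(Base):
        __tablename__ = "myobjects"

        id = Column(Integer, primary_key=True)
        filename = Column(String)
        uploaded_file = Column(LargeBinary)  # emitted as bytea on PostgreSQL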
Re: [sqlalchemy] attribute for the size of a LargeBinary
On 12-10-05 11:27 AM, Michael Bayer wrote:

On Oct 5, 2012, at 10:26 AM, Andre wrote:

> I would like to make this more efficient by adding an attribute to the DocumentBase class above which simply does a SQL query to get the size of file_object at the DB level, thinking that this will not require loading the binary object into memory to compute its size. Of course, this behavior could be achieved by adding an extra field (such as file_size) to my DB tables with binary objects, set when the binary object is stored in the database, but I thought it would be more elegant if the DB didn't have that extra column just to keep track of the binary object's size.

IMO there's no question you'd use SQL-side functions to get this length, unless they were found to be inefficient. Storing the length separately arguably breaks normalization.

> After reading and searching the web, I think one way to do this is using a column_property attribute in my DocumentBase class, such as:
>
>     @declared_attr
>     def file_size(cls):
>         return column_property(select([func.length(cls.file_object)]).where(cls.id == id))
>
> But the above is not working for me, and gives me the following error:
>
>       File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/compiler.py", line 388, in visit_column
>         raise exc.CompileError("Cannot compile Column object until "
>     CompileError: Cannot compile Column object until it's 'name' is assigned.
>
> I'm sure I'm missing something very basic here. Or maybe there is a simpler way of doing this?

This was a little confusing, because I kept assuming it was the call to func.length(cls.file_object) that was throwing things off. But it's not (well, it is, but separately); it's that WHERE clause. The column_property() is an expression that's loaded inline with your class, so unless you are correlating a different table back onto the parent table, you don't need a WHERE clause for it. The usage of WHERE here would at most describe selecting from an alias of the table correlated against its unaliased form, but that's not needed here. Also, the mechanics of column_property() + declarative mixins + declared_attr are already getting into territory where things are more prone to not being possible, due to Python class mechanics (which also give us all these great mixin capabilities in the first place, not to detract from them!). In fact, the cls.id == id there is creating a binary expression against the Python built-in function id().

Anyway, there's no need to embed a SELECT in here; you just need length(col). So remove the select(), and also navigate around that deferred() to get at the Column (this is something maybe I can improve; I added a ticket for that):

    from sqlalchemy import Column, Integer, LargeBinary, String, func
    from sqlalchemy.ext.declarative import declarative_base, declared_attr
    from sqlalchemy.orm import Session, column_property, deferred

    Base = declarative_base()

    class DocumentBase(Base):
        __abstract__ = True

        id = Column(Integer, primary_key=True)
        filename = Column(String)

        @declared_attr
        def file_object(cls):
            return deferred(Column('file_object', LargeBinary))

        @declared_attr
        def file_size(cls):
            return column_property(func.length(cls.file_object.columns[0]))

    class Document(DocumentBase):
        __tablename__ = 'document'

    print Session().query(Document)

SQL:

    SELECT length(:length_2) AS length_1, document.id AS document_id, document.filename AS document_filename
    FROM document

Now, if you instead want it to work as "when I access file_size, it emits a SQL query", you can do that too; just defer() that expression as well:

    @declared_attr
    def file_size(cls):
        return deferred(func.length(cls.file_object.columns[0]))

Then you'll get SQL output like this when you say some_document.file_size:

    SELECT length(file_object) AS length_1
    FROM document
    WHERE document.id = ?
Thanks Michael for the quick and detailed reply. This works and is exactly what I needed. (I'm using SQLAlchemy 0.7.)

Many thanks. Much appreciated.

Regards,
    Andre
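To round out Michael's example, a short usage sketch (the engine URL is hypothetical; Document is the class from his reply above):

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    engine = create_engine("postgresql+psycopg2://user:pw@localhost/mydb")
    Session = sessionmaker(bind=engine)
    session = Session()

    doc = session.query(Document).first()
    # With the column_property() version, file_size is selected inline with
    # the row; with the deferred() variant, this attribute access emits
    # SELECT length(file_object) ... WHERE document.id = ?. Either way the
    # binary column itself is never loaded into memory.
    print doc.filename, doc.file_size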