Re: [sqlalchemy] storing a large file into a LargeBinary question

2013-02-13 Thread Andre Charbonneau
Thanks for the feedback Michael.  Lots of good information in there.
I will read up on buffer() and memoryview() and also on custom
SQLAlchemy types.

Thanks again,
  Andre

On 13-02-12 04:33 PM, Michael Bayer wrote:
 On Feb 12, 2013, at 3:22 PM, Andre Charbonneau 
 andre.charbonn...@nrc-cnrc.gc.ca wrote:

 Greetings everyone,
 I have a piece of code in a web app where I need to store a large binary
 file (uploaded file stored on disk by Apache server), into an object's
 LargeBinary attribute.

 That's pretty easy to do with a syntax like:

myobject.uploaded_file = xyz.file.read()


 The problem is that I don't want to load the entire file into memory
 when I set the LargeBinary attribute.



 If my understanding is correct, the above call will first cause the
 entire content of the uploaded file to be loaded into memory and then
 that is assigned to the myobject.uploaded_file LargeBinary attribute. 
 Correct?

 (Then when sqlalchemy eventually issues the INSERT statement to store
 the data in the database... But then I don't really know how the data
 transfer is done...)


 I have tried to find another way of passing the data to the LargeBinary
 object that would not have to load the entire file into memory at once,
 but stream it in chunks during the INSERT statement, but I was not able
 to find anything. :-(
 In the old days these streaming binary interfaces were common, but as memory 
 has become plentiful you don't see them used anymore.  Even with systems like 
 Oracle, you see client libraries that rely on setting the allowed memory size 
 to be bigger than the largest value you need to store.

 psycopg2 does work with buffer() and memoryview() objects as the input to 
 a bytea column, and you could send these in as arguments where SQLAlchemy 
 should pass them through (or if not, it's easy to make a custom type that 
 passes them through).  These objects don't appear to work around having 
 to load the data into memory, though; they just make memory usage more efficient by 
 removing the need for it to be copied internally.  I'm not familiar enough 
 with them to know whether they support some way to stream from a 
 file handle or not.
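
For what it's worth, here is a minimal sketch (not from the original post) of the pass-through custom type idea described above; the class name PassThroughBinary is made up, and it assumes psycopg2 will accept a buffer()/memoryview() (for example over an mmap'd file) as the bound value for a bytea column:

import mmap

from sqlalchemy.types import TypeDecorator, LargeBinary

class PassThroughBinary(TypeDecorator):
    # Hand a pre-built buffer()/memoryview() straight through to the DBAPI
    # instead of first copying the whole file into an ordinary string.
    impl = LargeBinary

    def process_bind_param(self, value, dialect):
        # Assumes the caller already wrapped the data, e.g. in a memoryview
        # over an mmap'd file; pass it to psycopg2 unchanged.
        return value

# Possible usage (assumes uploaded_file is mapped with PassThroughBinary):
#
#     with open(upload_path, 'rb') as f:
#         mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
#         myobject.uploaded_file = memoryview(mapped)  # or buffer(mapped) on Python 2

As noted above, this avoids an extra in-process copy of the data, but it does not by itself avoid the file being read when the INSERT parameters are built.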

 There's also a facility I'd not previously heard of in Postgresql and 
 psycopg2 called the large object system, which appears to use an entirely 
 separate table, pg_largeobject, to store the values.  Dropping into psycopg2, 
 you can store and retrieve these objects using the lobject interface:

 http://initd.org/psycopg/docs/connection.html#connection.lobject

 As far as how to get that data into your table, it seems like you'd need to 
 link to the OID of your large object rather than using bytea: 
 http://www.postgresql.org/docs/current/static/lo-funcs.html .  So you'd need 
 to forgo the use of bytea.  Again, SQLAlchemy types could be created which 
 transparently perform these tasks against the OID.
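
To make the lobject route concrete, here is a rough sketch (again, not from the original thread), assuming a psycopg2-backed Session; the documents table, file_oid column, upload_path and doc_id values are illustrative names only:

from sqlalchemy import text

# Unwrap the psycopg2 connection used by the current Session transaction.
raw_conn = session.connection().connection

lob = raw_conn.lobject(0, 'wb')        # oid=0 asks the server to allocate a new OID
with open(upload_path, 'rb') as f:
    for chunk in iter(lambda: f.read(1024 * 1024), b''):
        lob.write(chunk)               # stream the upload in 1 MB chunks
oid = lob.oid
lob.close()

# Store only the OID in our own table instead of a bytea value.
session.execute(
    text("UPDATE documents SET file_oid = :oid WHERE id = :doc_id"),
    {"oid": oid, "doc_id": doc_id},
)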

 I'd ask on the psycopg2 list as to which feature they recommend, and I'm 
 betting they will likely say that memory is very cheap and plentiful these 
 days and you should just assume the data will fit into memory.





-- 
André Charbonneau
Research Computing Support Analyst
Shared Services Canada | National Research Council Canada
Services partagés Canada | Conseil national de recherches Canada
100 Sussex Drive | 100, promenade Sussex
Ottawa, Ontario  K1A 0R6
Canada
andre.charbonn...@ssc-spc.gc.ca
Telephone | Téléphone:  613-993-3129





[sqlalchemy] storing a large file into a LargeBinary question

2013-02-12 Thread Andre Charbonneau
Greetings everyone,
I have a piece of code in a web app where I need to store a large binary
file (uploaded file stored on disk by Apache server), into an object's
LargeBinary attribute.

That's pretty easy to do with a syntax like:

myobject.uploaded_file = xyz.file.read()


The problem is that I don't want to load the entire file into memory
when I set the LargeBinary attribute.



If my understanding is correct, the above call will first cause the
entire content of the uploaded file to be loaded into memory and then
that is assigned to the myobject.uploaded_file LargeBinary attribute. 
Correct?

(Then when sqlalchemy eventually issues the INSERT statement to store
the data in the database... But then I don't really know how the data
transfer is done...)


I have tried to find another way of passing the data to the LargeBinary
object that would not have to load the entire file into memory at once,
but stream it in chunks during the INSERT statement, but I was not able
to find anything. :-(


Anyone managed to implement something like this before, or know where I
can read some more info about possible ways of doing this with sqlalchemy?


Thanks a lot,
  Andre



-- 
André Charbonneau
Research Computing Support Analyst
Shared Services Canada | National Research Council Canada
Services partagés Canada | Conseil national de recherches Canada
100 Sussex Drive | 100, promenade Sussex 
Ottawa, Ontario  K1A 0R6
Canada
andre.charbonn...@ssc-spc.gc.ca
Telephone | Téléphone:  613-993-3129





Re: [sqlalchemy] storing a large file into a LargeBinary question

2013-02-12 Thread Andre Charbonneau
One piece of extra information I forgot to mention in my original email
is that the myobject.uploaded_file LargeBinary column attribute maps to
a Postgresql bytea column.

Best regards,
Andre

On 13-02-12 03:22 PM, Andre Charbonneau wrote:
 Greetings everyone,
 I have a piece of code in a web app where I need to store a large binary
 file (uploaded file stored on disk by Apache server), into an object's
 LargeBinary attribute.

 That's pretty easy to do with a syntax like:

 myobject.uploaded_file = xyz.file.read()


 The problem is that I don't want to load the entire file into memory
 when I set the LargeBinary attribute.



 If my understanding is correct, the above call will first cause the
 entire content of the uploaded file to be loaded into memory and then
 that is assigned to the myobject.uploaded_file LargeBinary attribute. 
 Correct?

 (Then when sqlalchemy eventually issues the INSERT statement to store
 the data in the database... But then I don't really know how the data
 transfer is done...)


 I have tried to find another way of passing the data to the LargeBinary
 object that would not have to load the entire file into memory at once,
 but stream it in chunks during the INSERT statement, but I was not able
 to find anything. :-(


 Anyone managed to implement something like this before, or know where I
 can read some more info about possible ways of doing this with sqlalchemy?


 Thanks a lot,
   Andre






-- 
André Charbonneau
Research Computing Support Analyst
Shared Services Canada | National Research Council Canada
Services partagés Canada | Conseil national de recherches Canada
100 Sussex Drive | 100, promenade Sussex
Ottawa, Ontario  K1A 0R6
Canada
andre.charbonn...@ssc-spc.gc.ca
Telephone | Téléphone:  613-993-3129





Re: [sqlalchemy] attribute for the size of a LargeBinary

2012-10-05 Thread Andre Charbonneau
On 12-10-05 11:27 AM, Michael Bayer wrote:
 On Oct 5, 2012, at 10:26 AM, Andre wrote:

 
  I would like to make this more efficient by adding an attribute to the 
  DocumentBase class above, which simply does a SQL query to get the size of 
  the file_object at the DB level, thinking that this will not require loading 
  the binary object into memory to compute its size.  Of course, this behavior 
  could be done by adding an extra field (such as file_size) to my DB tables 
  with binary objects, which would be set when the binary object is stored in 
  the database, but I thought that it would be more elegant if the DB didn't 
  have that extra column just to keep track of the binary object size.

 IMO there's no question you'd use SQL side functions to get this length, 
 unless 
 they were found to be inefficient.   Storing the length separately arguably 
 breaks normalization.

 
  After reading and searching the web, I think one way to do this is using a 
  column_property attribute in my DocumentBase class, such as:
 
  @declared_attr
  def file_size(cls):
      return column_property(
          select([func.length(cls.file_object)]).where(cls.id == id))
 
  But the above is not working for me, and gives me the following error:
 
    File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/compiler.py", line 388, in visit_column
      raise exc.CompileError("Cannot compile Column object until "
  CompileError: Cannot compile Column object until it's 'name' is assigned.
 
  I'm sure I'm missing something very basic here.  Or maybe there is a 
  simpler 
  way of doing this?

 this was a little confusing because I kept assuming it was calling 
 func.length(cls.file_object) that was throwing things off.  But it's not (well 
 it is, but separately), it's that WHERE clause.  The column_property() is an 
 expression that's loaded inline with your class, so unless you are correlating 
 a different table back onto the parent table, you don't need a WHERE clause for 
 it.  The usage of WHERE here would at most describe selecting from an alias of 
 the table correlated against its unaliased form, but that's not needed here, and 
 also the mechanics of column_property() + declarative mixins + declared_attr are 
 already getting into territories where things are more prone to not being 
 possible, due to Python class mechanics (which also give us all these great 
 mixin capabilities in the first place, not to detract from them!).

 In fact the cls.id == id there is creating a binary expression against the 
 Python built-in function id().

 Anyway, there's no need to embed a SELECT in here, you just need len(col), so 
 remove the select(), and also we need to navigate around that deferred() to 
 get 
 at the Column  (this is something maybe I can improve, I added a ticket for 
 that):

 class DocumentBase(Base):
     __abstract__ = True

     id = Column(Integer, primary_key=True)  # primary key column, implied by the SQL output below
     filename = Column(String)

     @declared_attr
     def file_object(cls):
         return deferred(Column('file_object', LargeBinary))

     @declared_attr
     def file_size(cls):
         return column_property(func.length(cls.file_object.columns[0]))

 class Document(DocumentBase):
     __tablename__ = 'document'

 print Session().query(Document)

 SQL:

 SELECT length(:length_2) AS length_1, document.id AS document_id, 
 document.filename AS document_filename
 FROM document

 now even if you wanted it to work as "when I call file_size, it emits a SQL 
 query", you can do that too - just defer() that expression also:

     @declared_attr
     def file_size(cls):
         return deferred(func.length(cls.file_object.columns[0]))

 then you'll get SQL output like this when you say some_document.file_size:

 SELECT length(file_object) AS length_1
 FROM document
 WHERE document.id = ?
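
As a small usage note (not from the original thread): because the column_property() version attaches file_size as a mapped SQL expression, it can also be used directly in query criteria, with the length computed by the database. Here session is assumed to be an existing Session, and the 10 MB cutoff is arbitrary:

big_docs = (
    session.query(Document)
    .filter(Document.file_size > 10 * 1024 * 1024)   # arbitrary 10 MB threshold
    .order_by(Document.file_size.desc())
    .all()
)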
Thanks Michael for the quick and detailed reply.
This works and is exactly what I needed.
Many thanks.  Much appreciated.

Regards,
  Andre

  (I'm using sqlalchemy 0.7)
 
  Thanks a lot,
Andre
 


-- 
Andre Charbonneau
Shared Services Canada
Telephone: 613-993-3129