2009/2/17 Mladen Turk <mt...@apache.org>:
> Graham Dumpleton wrote:
>>
>> 2009/2/17 Joe Orton <jor...@redhat.com>:
>>>>
>>>> I did use to perform a dup, but was told that this would cause
>>>> problems with file locking. Specifically was told:
>>>
>>> I'm getting lost here. What has file locking got to do with it? Does
>>> mod_wsgi rely on file locking somehow?
>>
>
> I'm lost as well :)
Consider:

    fd1 = ...
    lock(fd1)
    fd2 = dup(fd1)
    close(fd2)   # under some lock APIs this releases the lock, even
                 # though fd2 is not the last reference to the
                 # underlying file object
    write(fd1)   # the lock has already been released, so it is no
                 # longer guaranteed that this is the only writer
    close(fd1)

At least that is how I understand it from what has been explained to me
and pointed out in various documentation.

So, if fd2 is the file descriptor created for the file bucket in Apache,
and it gets closed before the application later wants to write to the
file through fd1, then the application has lost the exclusive ownership
it acquired by way of the lock. Something else could have acquired the
lock and started modifying the file on the basis that it has exclusive
ownership at that time.

>> In WSGI applications, it is possible for the higher level Python web
>> application to pass back a file object reference for the response with
>> the intent that the WSGI adapter use any optimised methods available
>> for sending it back as the response. This is where file buckets come
>> into the picture to begin with.
>
> Now it looks like you are trying to intermix third party
> maintained native OS file descriptors and file buckets.
> You can create the apr_file_t from apr_os_file_t

Which is what it does. Simplified code below:

    apr_os_file_t fd = -1;
    apr_file_t *tmpfile = NULL;

    fd = PyObject_AsFileDescriptor(filelike);

    apr_os_file_put(&tmpfile, &fd, APR_SENDFILE_ENABLED, self->r->pool);

> (Think you'll have platform portability issues there)

The optimisation is only supported on UNIX systems.

> but the major problem would be to ensure the life cycle
> of the object, since Python has its own GC and httpd has
> its pool.
> IMHO you will need a new apr_bucket provider written in
> Python and C for something like that.

CPython uses reference counting. What is referred to as GC in Python is
actually just a mechanism that kicks in under certain circumstances to
break cycles between reference-counted objects.
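To make the dup/close hazard concrete, here is a minimal sketch, assuming
POSIX fcntl() record locks (which is what Python's fcntl.lockf() uses).
POSIX specifies that closing ANY descriptor referring to a file drops all
of the process's record locks on that file, so closing the dup'd fd2
silently releases the lock taken through fd1. Because POSIX locks never
conflict within one process, a forked child is used to probe whether the
lock is still held:

```python
import fcntl
import os
import tempfile

# Create a scratch file to lock.
tmp = tempfile.NamedTemporaryFile(delete=False)
path = tmp.name
tmp.close()

fd1 = os.open(path, os.O_RDWR)
fcntl.lockf(fd1, fcntl.LOCK_EX)   # exclusive POSIX record lock via fd1
fd2 = os.dup(fd1)                 # second descriptor, same open file
os.close(fd2)                     # POSIX: closing ANY descriptor for the
                                  # file drops this process's locks on it

# Fork a child to probe whether the lock is still held.
pid = os.fork()
if pid == 0:
    probe = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        os._exit(0)               # lock acquired: parent's lock was lost
    except OSError:
        os._exit(1)               # lock still held by parent
_, status = os.waitpid(pid, 0)
lock_was_lost = (os.WEXITSTATUS(status) == 0)
print("lock released by close(fd2):", lock_was_lost)

os.close(fd1)
os.unlink(path)
```

On a POSIX system this prints that the lock was indeed released, which is
exactly the failure mode described above. (Note that flock()-style BSD
locks attach to the open file description instead and do not behave this
way; the problem is specific to fcntl()-style locks.)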
Having a special bucket type which holds a reference to the Python file
object will not help anyway. This is because the close() method of the
Python file object can be called prior to the file bucket being
destroyed. That closing of the Python file object would occur before the
delayed write of the file bucket resulting from the EOS optimisation.
So, same problem as when using a naked file descriptor.

Also, using a special bucket type opens another can of worms. This is
because multiple interpreters are supported, as well as multithreading.
Thus it would be necessary to track the named interpreter in use within
the bucket, and then to reacquire the lock on that interpreter and
ensure thread state is correctly reinstated. Although possible to do,
it gets a bit messy.

Holding onto the file descriptor to allow the optimisation isn't really
desirable for other reasons as well. The WSGI specification effectively
requires the response content to have been flushed out to the client
before the final call back into the application to clean things up. In
that final call back, where the application closes stuff like files, it
could technically rewrite the content of the file. If Apache had not
finished writing out the contents of the file, presuming the Python file
object hadn't been closed, then Apache would end up writing different
content to what was expected, and possibly truncated content if the file
was resized.

In summary, you need a way of knowing that when you flush something it
really has been flushed and that Apache is all done with it.

Graham
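The ordering constraint being described can be sketched as follows. This
is an illustrative WSGI application and a toy driver (serve_once is a
hypothetical name, not anything from Apache or mod_wsgi): a compliant
server must exhaust the response iterable, i.e. actually transmit the
data, before invoking close(), so that by the time the application's
cleanup runs the file contents have already gone out:

```python
import io

def application(environ, start_response):
    # A hypothetical application handing back a file-like object for the
    # response body, hoping the server uses an optimised path for it.
    body = io.BytesIO(b"hello world\n")
    start_response("200 OK", [("Content-Type", "text/plain")])
    file_wrapper = environ.get("wsgi.file_wrapper")
    if file_wrapper is not None:
        return file_wrapper(body, 4096)
    return iter(lambda: body.read(4096), b"")

def serve_once(app):
    # Toy driver showing the ordering the WSGI spec (PEP 333) requires:
    # finish transmitting the iterable first, only then call close(), so
    # the application knows the data was really sent before it cleans up.
    def start_response(status, headers):
        pass
    result = app({}, start_response)
    try:
        sent = b"".join(result)   # flush everything to the client first
    finally:
        if hasattr(result, "close"):
            result.close()        # only now may the application clean up
    return sent

print(serve_once(application))    # prints b'hello world\n'
```

If the server instead deferred the actual write (as the EOS file-bucket
optimisation does) until after close() had run, the application could
have already closed, rewritten, or truncated the file, which is the
failure described above.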