WSGI question: reading headers before message body has been read

2009-01-18 Thread Ron Garret
I'm writing a WSGI application and I would like to check the content-
length header before reading the content to make sure that the content
is not too big in order to prevent denial-of-service attacks.  So I do
something like this:

def application(environ, start_response):
    status = '200 OK'
    headers = [('Content-Type', 'text/html'), ]
    start_response(status, headers)
    if int(environ['CONTENT_LENGTH']) > 1000: return 'File too big'

But this doesn't seem to work.  If I upload a huge file it still waits
until the entire file has been uploaded before complaining that it's
too big.

Is it possible to read the HTTP headers in WSGI before the request
body has been read?

Thanks,
rg

--
http://mail.python.org/mailman/listinfo/python-list


Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Diez B. Roggisch

Ron Garret wrote:

> Is it possible to read the HTTP headers in WSGI before the request
> body has been read?


AFAIK that is nothing that WSGI defines - it's an implementation-detail 
of your server. Which one do you use?


Diez


Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Petite Abeille


On Jan 18, 2009, at 8:01 PM, Ron Garret wrote:


> def application(environ, start_response):
>     status = '200 OK'
>     headers = [('Content-Type', 'text/html'), ]
>     start_response(status, headers)
>     if int(environ['CONTENT_LENGTH']) > 1000: return 'File too big'


How would that work for chunked transfer-encoding?

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/



Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Ron Garret
On Jan 18, 11:29 am, Diez B. Roggisch <de...@nospam.web.de> wrote:

> AFAIK that is nothing that WSGI defines - it's an implementation-detail
> of your server. Which one do you use?

Apache at the moment, with lighttpd as a contender to replace it.

rg



Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Ron Garret
On Jan 18, 11:43 am, Petite Abeille <petite.abei...@gmail.com> wrote:

> How would that work for chunked transfer-encoding?

It wouldn't.  But many clients don't use chunked-transfer-encoding
when uploading files whose size is known.  In that case it would be
nice to let users know that their upload is going to fail BEFORE they
waste hours waiting for 10GB of data to go down the wire.

rg



Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Diez B. Roggisch

Ron Garret wrote:

> Apache at the moment, with lighttpd as a contender to replace it.

Together with mod_wsgi?

Diez


Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Ron Garret
On Jan 18, 12:40 pm, Diez B. Roggisch <de...@nospam.web.de> wrote:

> Together with mod_wsgi?

Yes.  (Is there any other way to run WSGI apps under Apache?)

rg



Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Graham Dumpleton
On Jan 19, 6:01 am, Ron Garret <r...@flownet.com> wrote:
> I'm writing a WSGI application and I would like to check the content-
> length header before reading the content to make sure that the content
> is not too big in order to prevent denial-of-service attacks.  So I do
> something like this:
>
> def application(environ, start_response):
>     status = '200 OK'
>     headers = [('Content-Type', 'text/html'), ]
>     start_response(status, headers)
>     if int(environ['CONTENT_LENGTH']) > 1000: return 'File too big'

You should be returning 413 (Request Entity Too Large) error status
for that specific case, not a 200 response.

You should not be returning a bare string as response content, as it is
very inefficient (the server iterates over a string one character at a
time and sends it that way); wrap it in a list.
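A minimal sketch of both fixes, written for Python 3 and PEP 3333 (so response bodies are bytes): the size check comes first, an oversized request gets a 413 status, and every body is returned as a one-item list. The 1000-byte limit is taken from the original post; the `respond` helper, treating a missing or empty CONTENT_LENGTH as 0, and answering 400 to a malformed one are assumptions of this sketch. On its own this does not stop the client from uploading; that still depends on 100-continue support in the client and server.

```python
MAX_CONTENT_LENGTH = 1000  # limit from the original post, in bytes


def respond(start_response, status, text):
    """Hypothetical helper: send a small plain-text response."""
    body = text.encode('utf-8')
    start_response(status, [('Content-Type', 'text/plain; charset=utf-8'),
                            ('Content-Length', str(len(body)))])
    # A one-item list: the server writes one block, not one per character.
    return [body]


def application(environ, start_response):
    raw = environ.get('CONTENT_LENGTH', '').strip()
    try:
        length = int(raw) if raw else 0  # missing/empty treated as 0 (assumption)
    except ValueError:
        return respond(start_response, '400 Bad Request', 'Bad Content-Length')
    if length > MAX_CONTENT_LENGTH:
        # Decided from the header alone; environ['wsgi.input'] is never read.
        return respond(start_response, '413 Request Entity Too Large',
                       'File too big')
    # Real code would read and process environ['wsgi.input'] here.
    return respond(start_response, '200 OK', 'Upload accepted')
```

During development, wrapping the app with the standard library's `wsgiref.validate.validator` will flag a response that is a bare string instead of an iterable of bytes.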

> But this doesn't seem to work.  If I upload a huge file it still waits
> until the entire file has been uploaded before complaining that it's
> too big.
>
> Is it possible to read the HTTP headers in WSGI before the request
> body has been read?

Yes.

The issue is that in order to avoid the client sending the data, the
client needs to send the HTTP/1.1 'Expect: 100-continue' header to
indicate that it is waiting for a 100 Continue response before sending
the data. You don't need to handle that yourself, as Apache/mod_wsgi
does it for you, but the only web browser I know of that supports
100-continue is Opera. Clients like curl also support it, though. In
other words, if people use IE, Firefox or Safari, the request content
will be sent regardless.
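To make the handshake concrete, here is a hedged client-side sketch using only Python's `socket` module: it sends the request headers with 'Expect: 100-continue', waits for the server's interim answer, and transmits the body only if that answer is 100. The function names, the 'Connection: close' header and the 10-second timeout are assumptions of this sketch. Real clients usually stop waiting after a short delay and send the body anyway if no interim response arrives, which this sketch does not do.

```python
import socket


def build_request_head(host, path, content_length):
    """Request line and headers only; the body is held back."""
    return (f'POST {path} HTTP/1.1\r\n'
            f'Host: {host}\r\n'
            f'Content-Length: {content_length}\r\n'
            'Expect: 100-continue\r\n'
            'Connection: close\r\n'
            '\r\n').encode('ascii')


def status_code(response_head):
    """Extract the numeric status from the first line of a response head."""
    status_line = response_head.split(b'\r\n', 1)[0]
    return int(status_line.split()[1])


def read_response_head(sock):
    """Read until the blank line that ends a response's header block."""
    data = b''
    while b'\r\n\r\n' not in data:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError('connection closed before a full response head')
        data += chunk
    return data


def upload(host, port, path, body, timeout=10.0):
    """POST `body`, sending it only after the server answers 100 Continue.
    Returns the status code of the final response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request_head(host, path, len(body)))
        code = status_code(read_response_head(sock))
        if code != 100:
            return code  # e.g. 413: refused before any body bytes were sent
        sock.sendall(body)
        return status_code(read_response_head(sock))
```

Against a server that rejects the request from its headers and honours 100-continue, `upload(...)` should return 413 without the body ever leaving the client.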

There is still more to this, though. First, if you are going to handle
413 errors in your own WSGI application and you are using mod_wsgi
daemon mode, the request content is still sent by the browser
regardless, even with Opera. This is because the act of transferring
the content across to the mod_wsgi daemon process triggers the return
of 100-continue to the client, and so it sends the data. There is a
ticket for mod_wsgi to implement proper 100-continue support for daemon
mode, but it will be a while before that happens.

Rather than having your WSGI application handle 413 error cases, you
are better off letting Apache/mod_wsgi handle them for you. To do that,
all you need to do is use the Apache 'LimitRequestBody' directive. This
checks the content length for you and sends a 413 response without the
WSGI application even being called. When using daemon mode, this check
is done in the Apache child worker processes, so for the 100-continue
case the data will not be read at all, and the client can be stopped
from sending it if it is Opera.

The only caveat is that the currently available mod_wsgi has a bug
such that 100-continue requests do not always work in daemon mode.
You need to apply the fix in:

  http://code.google.com/p/modwsgi/issues/detail?id=121

For details on LimitRequestBody directive see:

  http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestbody
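LimitRequestBody takes a size in bytes (for example `LimitRequestBody 1048576` in the Apache configuration), and Apache is the right place for it. Purely to illustrate the logic it applies, here is a hypothetical WSGI middleware, `limit_request_body`, that performs the equivalent check in Python: it looks only at CONTENT_LENGTH and answers 413 without calling the wrapped application. Unlike the Apache directive, it runs only after mod_wsgi has handed the request to Python, so in daemon mode it does not avoid the 100-continue problem described above.

```python
def limit_request_body(app, limit):
    """Hypothetical middleware mirroring what LimitRequestBody checks:
    refuse a request whose declared Content-Length exceeds `limit` bytes."""
    def middleware(environ, start_response):
        raw = environ.get('CONTENT_LENGTH', '').strip()
        try:
            length = int(raw) if raw else 0
        except ValueError:
            length = 0  # malformed header: let the wrapped app deal with it
        if length > limit:
            body = b'Request Entity Too Large'
            start_response('413 Request Entity Too Large',
                           [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(body)))])
            return [body]  # the wrapped app is never called
        return app(environ, start_response)
    return middleware


def upload_app(environ, start_response):
    # Stand-in application that would normally store the upload.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'stored']


application = limit_request_body(upload_app, 1000)
```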

Graham


Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Graham Dumpleton
On Jan 19, 6:43 am, Petite Abeille <petite.abei...@gmail.com> wrote:

> How would that work for chunked transfer-encoding?

Chunked transfer encoding of request content is not supported by the
WSGI specification, as WSGI requires that CONTENT_LENGTH be set and
disallows reading more than the declared content length, where
CONTENT_LENGTH is supposed to be taken as 0 if it is not provided.
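Read literally, those rules mean an application derives the body size from CONTENT_LENGTH alone. A small sketch of a reader that follows them, assuming a missing or empty CONTENT_LENGTH means 0 and using a hypothetical `RequestTooLarge` exception for the size check; the `io.BytesIO` object stands in for the server's `wsgi.input` stream.

```python
import io


class RequestTooLarge(Exception):
    """Hypothetical error raised when the declared body exceeds the limit."""


def read_request_body(environ, max_bytes):
    """Read exactly CONTENT_LENGTH bytes from wsgi.input, never more."""
    raw = environ.get('CONTENT_LENGTH', '').strip()
    length = int(raw) if raw else 0  # absent or empty -> 0
    if length > max_bytes:
        # Checked before touching wsgi.input, so nothing is read.
        raise RequestTooLarge(f'{length} bytes declared, limit is {max_bytes}')
    if length <= 0:
        return b''
    return environ['wsgi.input'].read(length)


# Usage with an in-memory stand-in for wsgi.input:
env = {'CONTENT_LENGTH': '5', 'wsgi.input': io.BytesIO(b'helloEXTRA')}
print(read_request_body(env, max_bytes=100))  # b'hello'
```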

If you use Apache/mod_wsgi 3.0 (currently in development, so you need
to use the Subversion copy), you can step outside what WSGI strictly
allows and still handle chunked transfer encoding on request content,
but you still don't have a CONTENT_LENGTH with which to check in
advance whether more data than expected is going to be sent.
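Without a CONTENT_LENGTH to check up front, the best an application can do is count bytes as it reads and give up once a limit is passed. The sketch below assumes a server that, outside strict WSGI, lets `wsgi.input` be read in blocks until it returns an empty bytes object at the end of the body; `read_capped`, `RequestTooLarge` and the 8 KB block size are illustrative. It only bounds how much is buffered: whatever the client sent before the abort has already crossed the wire.

```python
import io


class RequestTooLarge(Exception):
    """Hypothetical error raised once the running total passes the limit."""


def read_capped(stream, limit, block_size=8192):
    """Read a body of unknown length, aborting once more than `limit`
    bytes have arrived. Assumes stream.read() returns b'' at end of body."""
    chunks = []
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += len(block)
        if total > limit:
            raise RequestTooLarge(f'more than {limit} bytes received')
        chunks.append(block)
    return b''.join(chunks)


# Usage with an in-memory stand-in for a chunked request body:
print(len(read_capped(io.BytesIO(b'x' * 3000), limit=10_000)))  # 3000
```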

If you want to know how to handle chunked transfer encoding in
mod_wsgi, you are better off asking on the mod_wsgi list.

Graham


Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Ron Garret
On Jan 18, 1:21 pm, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:

Thanks for the detailed response!

rg



Re: WSGI question: reading headers before message body has been read

2009-01-18 Thread Diez B. Roggisch

Ron Garret wrote:

> Yes.  (Is there any other way to run WSGI apps under Apache?)


Well, not as easily, but of course you can use mod_python or even
CGI/FastCGI to invoke a WSGI application.
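For the CGI route, Python's standard library already includes an adapter, `wsgiref.handlers.CGIHandler`. A minimal sketch, assuming the script is installed as an ordinary CGI program; the stand-in application and the GATEWAY_INTERFACE guard (CGI servers set that variable for the scripts they launch) are illustrative choices. Under plain CGI a new process is started for every request.

```python
import os
from wsgiref.handlers import CGIHandler


def application(environ, start_response):
    # Stand-in WSGI app; any WSGI callable can be passed to CGIHandler.
    body = b'Hello from a WSGI app running under CGI\n'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


if __name__ == '__main__' and 'GATEWAY_INTERFACE' in os.environ:
    # Only hand control to CGIHandler when a web server launched this
    # script as CGI; it reads os.environ and stdin and writes to stdout.
    CGIHandler().run(application)
```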


However, the original question - that's a tough one.

According to this, it seems one can use an Apache directive to prevent
mod_wsgi from even passing a request to the application if it exceeds
a certain size:


http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines

Search for "Limiting Request Content".

However, I'm not sure how early that happens. I can only suggest you try
contacting Graham Dumpleton directly; he is very responsive.



Diez
