Re: HTTPSConnection script fails, but only on some servers (long)

Steve Holden Tue, 12 Apr 2005 00:38:09 -0700

Paul Winkler wrote:

This is driving me up the wall... any help would be MUCH appreciated.
I have a module that I've whittled down into a 65-line script in
an attempt to isolate the cause of the problem.

(Real domain names have been removed in everything below.)

SYNOPSIS:

I have 2 target servers, at https://A.com and https://B.com.
I have 2 clients, wget and my python script.
Both clients are sending GET requests with exactly the
same urls, parameters, and auth info.

wget works fine with both servers.
The python script works with server A, but NOT with server B.
On server B, it provokes a "Bad Gateway" error from Apache.
In other words, the problem seems to depend on both the client
and the server. Joy.

Logs on server B show malformed URLs ONLY when the client
is my python script, which suggests the script is broken...
but logs on server A show no such problem, which suggests
the problem is elsewhere.

DETAILS

Note, the module was originally written for the express
purpose of working with B.com;  A.com was added as a point of reference
to convince myself that the script was not totally insane.
Likewise, wget was tried when I wanted to see if it might be
a client problem.

Note the servers are running different software and return different
headers. wget -S shows this when it (successfully) hits url A:

 1 HTTP/1.1 200 OK
 2 Date: Tue, 12 Apr 2005 05:23:54 GMT
 3 Server: Zope/(unreleased version, python 2.3.3, linux2) ZServer/1.1
 4 Content-Length: 37471
 5 Etag:
 6 Content-Type: text/html;charset=iso-8859-1
 7 X-Cache: MISS from XXX.com
 8 Keep-Alive: timeout=15, max=100
 9 Connection: Keep-Alive

... and this when it (successfully) hits url B:

 1 HTTP/1.1 200 OK
 2 Date: Tue, 12 Apr 2005 04:51:30 GMT
 3 Server: Jetty/4.2.9 (Linux/2.4.26-g2-r5-cti i386 java/1.4.2_03)
 4 Via: 1.0 XXX.com
 5 Content-Length: 0
 6 Connection: close
 7 Content-Type: text/plain

The only things notable to me, apart from the server software, are the
"Via:" and "Connection:" headers. The "Content-Length: 0" from B is
also odd, but that doesn't seem to be a problem when the client is wget.

Sadly I don't grok HTTP well enough to spot anything really
suspicious.

The apache ssl request log on server B is very interesting.
When my script hits it, the request logged is like:

A.com - - [01/Apr/2005:17:04:46 -0500] "GET https://A.com/SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406 HTTP/1.1" 502 351

... which, apart from the 502, I thought reasonable until I realized
there's not supposed to be a protocol or domain in there at all. So this
is clearly wrong. When the client is wget, the log shows something more
sensible like:

A.com - - [01/Apr/2005:17:11:04 -0500] "GET /SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406 HTTP/1.0" 200 -

... which looks identical except for not including the spurious
protocol and domain, and the response looks as expected (200 with size
0).
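
The only real difference between those two log lines is the request
target. The script hands the full URL to conn.request(), and httplib
writes it verbatim into the request line (the "absoluteURI" form from
RFC 2616 section 5.1.2). wget sends only the path and query (the
"abs_path" form). Below is a minimal sketch of reducing a full URL to
the path-plus-query target. It is written for Python 3's urllib.parse
rather than the script's Python 2.3 modules, and the helper name
origin_form is hypothetical:

```python
from urllib.parse import urlsplit


def origin_form(url):
    """Split an absolute URL into (host, port, target), where target is the
    path-plus-query string that belongs on the HTTP request line.

    Hypothetical helper for illustration; assumes an absolute http(s) URL.
    """
    parts = urlsplit(url)
    target = parts.path or "/"  # an empty path still needs "/" on the wire
    if parts.query:
        target += "?" + parts.query
    # .hostname is lowercased by urlsplit; .port is None when no port is given
    return parts.hostname, parts.port, target


host, port, target = origin_form(
    "https://B/SkinServlet/zopeskin"
    "?action=updateSkinId&facilityId=1466&skinId=406")
print("GET %s HTTP/1.1" % target)  # the request line wget-style clients send
```

If target, rather than the full URL, is passed as the second argument to
conn.request(), the request line should match wget's. The host name then
travels only in the Host: header.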

So, that log appears to be strong evidence that the problem is in my
client script, right? The failing request is coming in with some bad
crap in the path, which JBoss can't handle, so it barfs and Apache
responds with Bad Gateway. Right?

So why does the same exact client code work when hitting server A??
No extra gunk in the logs there. AFAICT there is nothing in the script
that could lead to such an odd request only on server B.


THE SCRIPT

#!/usr/bin/python2.3

from httplib import HTTPSConnection
from urllib import urlencode
import re
import base64

url_re = re.compile(r'^([a-z]+)://([A-Za-z0-9._-]+)(:[0-9]+)?')

target_urls = {
    'B': 'https://B/SkinServlet/zopeskin',
    'A': 'https://A/zope/manage_main',
}

auth_info= {'B':    ('userXXX', 'passXXX'),
            'A':    ('userXXX', 'passXXX'),
            }

def doRequest(target, **kw):
    """Provide a trivial interface for doing remote calls.
    Keyword args are passed as query parameters.
    """
    url = target_urls[target]
    user, passwd = auth_info[target]
    proto,host,port=url_re.match(url).groups()
    if port:
        port = int(port[1:])   # remove the ':' ...
    else:
        port = 443
    creds = base64.encodestring("%s:%s" % (user, passwd))
    headers = {"Authorization": "Basic %s" % creds }
    params = urlencode(kw).strip()
    if params:
        url = '%s?%s' % (url, params)
    body = None # only needed for POST
    args =('GET', url, body, headers)
    print "ARGS: %s" % str(args)
    conn = HTTPSConnection(host)
    conn.request(*args)
    response = conn.getresponse()
    data = response.read()
    if response.status >= 300:
        print
        msg = '%i ERROR reported by remote system %s\n' % (response.status, url)
        msg += data
        raise IOError, msg
    print "OK!"
    return data

if __name__ == '__main__':
    print "attempting to connect..."
    result1 = doRequest('A', skey='id', rkey='id')
    result2 = doRequest('B', action='updateSkinId',
                        skinId='406',  facilityId='1466')
    print "done!"


# EOF
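
For comparison, here is a minimal Python 3 sketch of the same doRequest
logic with three changes that seem worth trying. First, it sends only the
path and query as the request target. Second, it passes the parsed port
to the connection. Third, it builds the Basic credentials with
base64.b64encode, which, unlike base64.encodestring, does not append a
trailing newline to the header value. The following are illustrative
assumptions, not part of the original script:

- the split into build_request and send_request (hypothetical names)
- sorting the query parameters
- the placeholder hosts and credentials

```python
import base64
import http.client
from urllib.parse import urlencode, urlsplit

# Placeholder targets and credentials, mirroring the anonymized originals.
TARGET_URLS = {
    "B": "https://B/SkinServlet/zopeskin",
    "A": "https://A/zope/manage_main",
}
AUTH_INFO = {
    "B": ("userXXX", "passXXX"),
    "A": ("userXXX", "passXXX"),
}


def build_request(target, **params):
    """Return (host, port, path, headers) for a GET to the named target."""
    parts = urlsplit(TARGET_URLS[target])
    host = parts.hostname
    port = parts.port or 443  # default HTTPS port when the URL has none
    path = parts.path or "/"
    # Sorted only so the query string is deterministic (an assumption).
    query = urlencode(sorted(params.items()))
    if query:
        path = "%s?%s" % (path, query)
    user, passwd = AUTH_INFO[target]
    # b64encode adds no trailing newline, unlike base64.encodestring.
    raw = ("%s:%s" % (user, passwd)).encode("ascii")
    creds = base64.b64encode(raw).decode("ascii")
    headers = {"Authorization": "Basic %s" % creds}
    return host, port, path, headers


def send_request(target, **params):
    """Perform the GET and return the body, raising IOError on status >= 300."""
    host, port, path, headers = build_request(target, **params)
    conn = http.client.HTTPSConnection(host, port)
    try:
        # Only the path (plus query) goes on the request line; http.client
        # sends the host name separately in the Host header.
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        data = response.read()
    finally:
        conn.close()
    if response.status >= 300:
        raise IOError("%i ERROR reported by remote system %s\n%s"
                      % (response.status, path, data.decode("latin-1")))
    return data
```

With this version, send_request('B', action='updateSkinId', skinId='406',
facilityId='1466') would put /SkinServlet/zopeskin?... on the request line,
as wget does. This sketch has not been run against either server, so it is
not confirmed that this alone clears the 502 from B.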


So... what the heck is wrong here?

at-wits-end-ly y'rs,

Paul Winkler

Paul:

I don't claim to have analyzed exactly what's going on here, but the most significant difference between the two is that you are accessing site B using HTTP 1.1 via an HTTP 1.0 proxy (as indicated by the "Via:" header).

Whether this is a clue or a red herring, time alone will tell.

It's possible that wget and your client code aren't using the same proxy settings, for example.
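
One way to test the proxy theory from the Python side: wget reads proxy
settings from environment variables such as https_proxy (and from
.wgetrc), while httplib's HTTPSConnection ignores them and always connects
directly. The minimal Python 3 sketch below checks whether the environment
names an HTTPS proxy. If it does, the sketch opens the connection through
that proxy with a CONNECT tunnel, using set_tunnel. The function name
make_connection is hypothetical. The sketch assumes a plain http:// proxy
URL, and it assumes port 80 when the proxy URL gives none.

```python
import http.client
import urllib.request
from urllib.parse import urlsplit


def make_connection(host, port=443):
    """Return an HTTPSConnection to host:port, tunnelling through the HTTPS
    proxy named in the environment (e.g. https_proxy), if there is one."""
    proxy = urllib.request.getproxies().get("https")
    if not proxy:
        # No proxy configured: connect directly, as the original script does.
        return http.client.HTTPSConnection(host, port)
    parts = urlsplit(proxy)
    # Assumes an http:// proxy; port 80 is a guess when the URL omits one.
    conn = http.client.HTTPSConnection(parts.hostname, parts.port or 80)
    # Connect to the proxy first, then CONNECT through to the real server;
    # TLS is negotiated with host once the tunnel is up.
    conn.set_tunnel(host, port)
    return conn
```

Suppose the environment names a proxy that wget is using and the script
is not. Routing the script through that proxy is then a quick check of
whether the proxy explains the difference.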

regards
 Steve
--
Steve Holden        +1 703 861 4237  +1 800 494 3119
Holden Web LLC             http://www.holdenweb.com/
Python Web Programming  http://pydish.holdenweb.com/

--
http://mail.python.org/mailman/listinfo/python-list
