> Please excuse this really long post, but there's a lot to cover.
>
> First, let me introduce myself. My name is Marc Saegeser and I've been a
> committer on the Jakarta-Tomcat project for over a year. I was the release
> manager for the Tomcat 3.2.2-3.2.4 releases. I'm not currently a committer
> on any Jakarta-Commons projects.
Welcome :)

> Now a question. What is the status of the HttpClient 2.0 release? The code
> is currently tagged alpha 1 but the RELEASE_PLAN_2_0.txt document hasn't
> been modified since October, 2001. I ask because, depending on how imminent
> an actual release is, some of the changes that I'm proposing should probably
> be made on a separate branch.

Maybe Rodney could comment on that.

> Here's my story. I have need of something like HttpClient in my product but
> I found that I had to extend it somewhat. The extensions are very generic
> and I believe useful to others so I'd like to add them to the HttpClient
> project. I also found several bugs that I fixed along the way. I've
> documented these changes below.
>
> I need to be able to use HttpClient (or a derivative) to navigate around the
> web pretty much like a regular user-agent. I want to be able to access any
> site and any web application that I can reach with a reasonably modern
> browser. HttpClient does a good job of implementing the client side of RFC
> 2616. Unfortunately, there are lots of sites and some very big name
> applications that do not implement the server side correctly. Some sites
> (Yahoo! in particular) actually require a broken client implementation just
> to log in. Here are two examples of things I've found so far.
> RFC2616/10.3.3 forbids changing a 302 redirected POST method into a GET
> method but acknowledges that most clients are broken in this regard (this is
> the failure that Yahoo! requires). I have found sites that send relative
> URLs in the Location: header of a redirect (this violates RFC2616/14.30).
> Supporting these sites will require 'breaking' HttpClient. I propose adding
> some kind of flag to put HttpClient into a 'compatibility mode' that
> implements this and any other required broken behaviour.

That sounds reasonable.

> A second need is to provide a mechanism for getting user acknowledgment for
> certain actions.
> For example, when redirecting from secure to non-secure sites.
>
> I am going to start working on these changes next but I want to discuss them
> with the HttpClient community to see if they feel they belong in the commons
> HttpClient project or if the project should be forked.
>
> Anyway, below is a description of the modified and new files. The patches
> and new files are attached.
>
> Modified files...
>
> Cookie.java
> - Added support for old Netscape cookies. The biggest difference is that
>   the test for valid domains is different for Netscape cookies and RFC 2109
>   cookies.
> - Added space after the semicolons separating the values. This is
>   required by sites that only implement the old Netscape cookie specification.
> - Added additional date format for expiration times.
>
> HttpConnection.java
> - The write*() and print*() methods now throw HttpRecoverableException.
>
> HttpMethodBase.java
> - Added a new exception class, HttpRecoverableException. There are some
>   error conditions that we can try to recover from internally. The biggest
>   one I found was when a server unexpectedly closed the socket. In this case
>   we should just try to re-open the connection and try the request again.
> - Fixed a problem with the handling of 100 status codes. If we get a 100
>   after we've already sent the request body, RFC 2616 states that the response
>   should be ignored. The current implementation incorrectly broke out of
>   the loop looking for the response.
> - Always recreate the cookie header. A redirect response may have
>   included additional cookies that we need to send with the redirected request
>   and the path may have changed, thus requiring a different cookie set.
> - Fixed readResponseBody implementation. A new version of this function
>   also takes an output stream. This makes it easier for subclasses to use
>   this implementation directly instead of having to re-implement it in order
>   to support things like saving the response to a file.
> - Better support for responses that don't contain a Content-Length or
>   Transfer-Encoding header. By the specification, if these headers are both
>   absent, the response has no body content. In the real world what this means
>   is that the server probably didn't know the length when the response was
>   committed. It just sends the response and closes the connection when the
>   body is complete. This assumption falls apart when we get a response that
>   *can not* contain a body. In this case, the simple implementation keeps
>   reading looking for a response body and actually ends up reading the next
>   response headers as the body. I've added a list of responses that,
>   according to the specification, can not ever have a body and fixed
>   readResponseBody() to not read a body for these responses.
>
> URIUtil.java
> - Added getPath() method. This method returns the path portion of a
>   given URL. The only difference from java.net.URL.getPath() is that this
>   method returns "/" if the URL's path is empty.
>
> GetMethod.java
> - Switched to new HttpMethodBase.readResponseBody().
>
> New files...
>
> HttpMultiClient.java
> - Replacement for HttpClient. This class serves two purposes. First it
>   handles off-site redirects. Second, it is intended to be used within a
>   multithreaded application that, like a browser, may have more than one
>   request outstanding to a given server and have requests going to more than
>   one server.
> - Since HttpMultiClient, unlike HttpClient, simultaneously handles
>   requests for multiple servers it can't use HttpMethod classes directly
>   because they only include path information, not server information. A new
>   interface, HttpUrlMethod, is used that extends HttpMethod.
>
> HttpSharedState.java
> - A simple wrapper around HttpState to synchronize access to data. This
>   is required to support the multi-threaded nature of HttpMultiClient.
>
> HttpConnectionManager.java
> - This is actually the heart of HttpMultiClient.
>   It keeps track of available HttpConnections for host:port combinations.
>   The number of connections to a given host:port is limited (per RFC 2616)
>   and if the limit is reached calls to getConnection() will block until a
>   connection becomes available.
>
> HttpRecoverableException.java
> - Extends HttpException. This exception is thrown when a potentially
>   recoverable error has occurred (e.g. a socket connection was closed
>   unexpectedly). Higher level code can attempt to try the operation again.
>
> HttpUrlMethod.java
> - An interface that extends HttpMethod. HttpUrlMethod classes are
>   initialized with a fully qualified URL instead of just the path component.
>
> UrlGetMethod.java
> UrlPostMethod.java
> UrlDeleteMethod.java
> UrlOptionsMethod.java
> UrlPutMethod.java
> - These classes extend their respective method classes and implement
>   HttpUrlMethod.

From my point of view, these changes are fine as they don't seem to modify
the API too much (and if they did, that wouldn't be a big problem for me, as
I'm still using the HTTP client 1.0), and add some useful functionality.
I would be OK with directly modifying HttpMethod, but I definitely could
understand if some didn't agree.

Remy
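
P.S. To make the HttpConnectionManager description above concrete, here is a
minimal sketch of the "limit connections per host:port and block when the
limit is reached" idea. It is not the code from the attached patch: the class
name HostConnectionLimiter and its methods acquire(), tryAcquire() and
release() are hypothetical, it only hands out slots rather than real
HttpConnection objects, and it assumes a current JDK with
java.util.concurrent. The limit is a constructor argument; RFC 2616 section
8.1.4 suggests 2 for a single-user client.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

// Hypothetical illustration only; not the HttpConnectionManager from the patch.
final class HostConnectionLimiter {
    private final int maxPerHost;
    // One semaphore per "host:port" key, created lazily on first use.
    private final ConcurrentMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    HostConnectionLimiter(int maxPerHost) {
        if (maxPerHost < 1) {
            throw new IllegalArgumentException("maxPerHost must be >= 1");
        }
        this.maxPerHost = maxPerHost;
    }

    private Semaphore permitsFor(String host, int port) {
        // Host names are case-insensitive, so normalise before building the key.
        String key = host.toLowerCase(Locale.ROOT) + ":" + port;
        return permits.computeIfAbsent(key, k -> new Semaphore(maxPerHost, true));
    }

    /** Blocks until a connection slot for host:port is free, then claims it. */
    void acquire(String host, int port) throws InterruptedException {
        permitsFor(host, port).acquire();
    }

    /** Claims a slot if one is free right now; returns false instead of blocking. */
    boolean tryAcquire(String host, int port) {
        return permitsFor(host, port).tryAcquire();
    }

    /** Returns a slot. Callers must release exactly once per successful acquire. */
    void release(String host, int port) {
        permitsFor(host, port).release();
    }
}
```

A caller would wrap each request in acquire()/release() (release in a finally
block), so a third concurrent request to the same host:port simply waits
until one of the first two finishes, while requests to other servers proceed
independently.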