Marc Saegesser wrote: [snip]
>Now a question. What is the status of the HttpClient 2.0 release? The code
>is currently tagged alpha 1 but the RELEASE_PLAN_2_0.txt document hasn't
>been modified since October, 2001. I ask because, depending on how imminent
>an actual release is, some of the changes that I'm proposing should probably
>be made on a separate branch.
>
It's waiting on the committers being comfortable that it's ready. I've been doing mainly maintenance on httpclient recently, so I'm not the best one to decide when it's ready to go.

>Here's my story. I have need of something like HttpClient in my product but
>I found that I had to extend it somewhat. The extensions are very generic
>and I believe useful to others so I'd like to add to the HttpClient project.
>I also found several bugs that I fixed along the way. I've documented these
>changes below.
>
Cool.

>I need to be able to use HttpClient (or a derivative) to navigate around the
>web pretty much like a regular user-agent. I want to be able to access any
>site and any web application that I can reach with a reasonably modern
>browser. HttpClient does a good job of implementing the client side of RFC
>2616. Unfortunately, there are lots of sites and some very big name
>applications that do not implement the server side correctly. Some sites
>(Yahoo! in particular) actually require a broken client implementation just
>to log in. Here are two examples of things I've found so far.
>RFC2616/10.3.3 forbids changing a 302 redirected POST method into a GET
>method but acknowledges that most clients are broken in this regard (this is
>the failure that Yahoo! requires). I have found sites that send relative
>URLs in the Location: header of a redirect (this violates RFC2616/14.30).
>Supporting these sites will require 'breaking' HttpClient. I propose adding
>some kind of flag to put HttpClient into a 'compatibility mode' that
>implements this and any other required broken behaviour.
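
For the sake of discussion, here is a rough sketch of what such a flag might look like for the two redirect cases mentioned above. Everything in it is made up for illustration: RedirectSketch, Next, follow302 and the strict parameter are hypothetical names, not anything in HttpClient or in the attached patches. It also simplifies strict mode to "keep the method and require an absolute Location"; a closer reading of RFC 2616 would additionally ask the user before following a redirected POST.

```java
import java.net.URI;

/** Hypothetical helper for discussion only; not part of HttpClient. */
final class RedirectSketch {

    /** The follow-up request a client would send after the redirect. */
    static final class Next {
        final String method;
        final URI uri;

        Next(String method, URI uri) {
            this.method = method;
            this.uri = uri;
        }
    }

    /**
     * Decide how to follow a 302 response.
     *
     * strict == true:  keep the original method and reject a relative Location.
     * strict == false: turn POST into GET and resolve a relative Location
     *                  against the URI of the request that was redirected.
     */
    static Next follow302(boolean strict, String method, URI requestUri, String location) {
        URI target = URI.create(location);
        if (!target.isAbsolute()) {
            if (strict) {
                throw new IllegalArgumentException(
                        "relative Location not accepted in strict mode: " + location);
            }
            // Relaxed mode: tolerate the relative reference.
            target = requestUri.resolve(target);
        }
        // Relaxed mode mimics the common browser behaviour of re-issuing POST as GET.
        String nextMethod = (!strict && "POST".equals(method)) ? "GET" : method;
        return new Next(nextMethod, target);
    }
}
```

Passing the mode as a plain boolean keeps the sketch short; in practice it would more likely live on the client or state object so that every method picks it up.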
>
This sounds like a great idea.

>A second need is to provide a mechanism for getting user acknowledgment for
>certain actions. For example when redirecting from secure to non-secure
>sites.
>
>I am going to start working on these changes next but I want to discuss them
>with the HttpClient community to see if they feel they belong in the commons
>HttpClient project or if the project should be forked.
>
You've emailed the development community. I'm not sure many of the 'user' community hang out here. My preference on this one is that it belongs in httpclient as a strict vs relaxed mode.

>Anyway, below is a description of the modified and new files. The patches
>and new files are attached.
>
>Modified files...
>
>Cookie.java
> - Added support for old Netscape cookies. The biggest difference is that
>the test for valid domains is different for Netscape cookies and RFC 2109
>cookies
> - Added space after the semicolons separating the values. This is
>required by sites that only implement the old Netscape cookie specification.
> - Added additional date format for expiration times.
>
>HttpConnection.java
> - The write*() and print*() methods now throw HttpRecoverableException.
>
>HttpMethodBase.java
> - Added a new exception class, HttpRecoverableException. There are some
>error conditions that we can try to recover from internally. The biggest
>one I found was when a server unexpectedly closed the socket. In this case
>we should just try to re-open the connection and try the request again.
> - Fixed a problem with the handling of 100 status codes. If we get a 100
>after we've already sent the request body, RFC 2616 states that the response
>should be ignored. The current implementation incorrectly broke out of
>the loop looking for the response.
>
This last one sounds like a bug that should be fixed anyway.

>
> - Always recreate the cookie header. A redirect response may have
>included additional cookies that we need to send with the redirected request
>and the path may have changed thus requiring a different cookie set.
>
Ditto.

>
> - Fixed readRequestBody implementation. A new version of this function
>also takes an output stream. This makes it easier for subclasses to use
>this implementation directly instead of having to re-implement it in order
>to support things like saving the response to a file.
> - Better support for responses that don't contain a Content-Length or
>Transfer-Encoding header. By the specification, if these headers are both
>absent, the response has no body content. In the real world what this means
>is that the server probably didn't know the length when the response was
>committed. It just sends the response and closes the connection when the
>body is complete. This assumption falls apart when we get a response that
>*can not* contain a body. In this case, the simple implementation keeps
>reading looking for a response body and actually ends up reading the next
>response headers as the body. I've added a list of responses that,
>according to the specification, can not ever have a body and fixed
>readResponseBody() to not read a body for these responses.
>
Again, sounds like another bug.

>URIUtil.java
> - Added getPath() method. This method returns the path portion of a
>given URL. The only difference from java.net.URL.getPath() is that this
>method returns "/" if the URL's path is empty.
>
>GetMethod.java
> - Switched to new HttpMethodBase.readResponseBody().
>
>New files...
>
>HttpMultiClient.java
> - Replacement for HttpClient. This class serves two purposes. First it
>handles off-site redirects. Second, it is intended to be used within a
>multithreaded application that, like a browser, may have more than one
>request outstanding to a given server and have requests going to more than
>one server.
> - Since HttpMultiClient, unlike HttpClient, simultaneously handles
>requests for multiple servers it can't use HttpMethod classes directly
>because they only include path information, not server information. A new
>interface, HttpUrlMethod, is used that extends HttpMethod.
>
>HttpSharedState.java
> - A simple wrapper around HttpState to synchronize access to data. This
>is required to support the multi-threaded nature of HttpMultiClient.
>
>HttpConnectionManager.java
> - This is actually the heart of HttpMultiClient. It keeps track of
>available HttpConnections for host:port combinations. The number of
>connections to a given host:port is limited (per RFC 2616) and if the limit
>is reached calls to getConnection() will block until a connection becomes
>available.
>
>HttpRecoverableException.java
> - Extends HttpException. This exception is thrown when a potentially
>recoverable error has occurred (e.g. a socket connection was closed
>unexpectedly). Higher level code can attempt to try the operation again.
>
>HttpUrlMethod.java
> - An interface that extends HttpMethod. HttpUrlMethod classes are
>initialized with a fully qualified URL instead of just the path component.
>
>UrlGetMethod.java
>UrlPostMethod.java
>UrlDeleteMethod.java
>UrlOptionsMethod.java
>UrlPutMethod.java
> - These classes extend their respective method classes and implement
>HttpUrlMethod.
>
>Marc Saegesser
>
These all sound like good additions. What I think we need to work out is how do we turn this on or off?

--
dIon Gillard, Multitask Consulting
http://www.multitask.com.au/developers
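
On the blocking getConnection() behaviour described for HttpConnectionManager above, here is a minimal sketch of one way a per-host:port limit could work. It is not the attached HttpConnectionManager, which I am not reproducing here; HostLimiterSketch and its methods are hypothetical names, and it only hands out permits rather than real HttpConnection objects. The limit of 2 used in the usage note is the figure RFC 2616 section 8.1.4 suggests for a single-user client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch for discussion only; tracks permits, not real connections. */
final class HostLimiterSketch {
    private final int maxPerHost;
    // One fair semaphore per "host:port" key, created on first use.
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    HostLimiterSketch(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    private Semaphore permitsFor(String host, int port) {
        return permits.computeIfAbsent(host + ":" + port,
                key -> new Semaphore(maxPerHost, true));
    }

    /** Blocks until a connection slot for host:port is free. */
    void acquire(String host, int port) throws InterruptedException {
        permitsFor(host, port).acquire();
    }

    /** Waits at most timeoutMillis; returns false if no slot became free in time. */
    boolean tryAcquire(String host, int port, long timeoutMillis) throws InterruptedException {
        return permitsFor(host, port).tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns a slot; call once for every successful acquire. */
    void release(String host, int port) {
        permitsFor(host, port).release();
    }
}
```

A caller would create it with new HostLimiterSketch(2), call acquire(host, port) before opening or reusing a connection, and call release(host, port) in a finally block once the response has been read. The timed variant is there because a caller that blocks forever is hard to recover from if a connection is never released.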