Hi St.Ack,
Your feedback is really appreciated. I am quite happy that we now have
one less development list to spam ;)

See my comments inline

> The upgrade took way longer than I anticipated, a couple of days rather 
> than a couple of hours.  While some of the time was spent on refactoring 
> only slightly related to the httpclient upgrade and testing to see all 
> httpclient used features still work post upgrade, the bulk of the time 
> was spent on redoing our auth system to fit the redesigned httpclient 
> auth system. I had trouble figuring out how things work now in the 
> absence of examples. Our usage is a little out of the ordinary in that we 
> manage our own store of credentials and decide when to load them onto an 
> httpmethod.  Previously, HttpAuthenticator would select the scheme for 
> me.  Now it seems like I have to do it myself using AuthChallengeParser 
> and then iterate over the returns.  In general the new auth system 
> changes look to be for the best. 

I am not sure whether this helps in your particular case, but note that
you can replace the standard authentication schemes with custom ones and
register additional custom schemes if desired:

http://jakarta.apache.org/commons/httpclient/3.0/authentication.html#Custom%20authentication%20scheme
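
For the manual scheme selection described in the quoted text, a rough
sketch in plain Java may help to frame the problem. It deliberately does
not use the HttpClient API: ChallengeSelector, schemeOf and select are
hypothetical names, and the idea of a caller-supplied preference list is
an assumption about how one might want to pick among the challenges.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of HttpClient: picks one challenge out of
// the WWW-Authenticate values a server returned, by scheme preference.
class ChallengeSelector {

    // The scheme name is the first space-delimited token, lower-cased.
    static String schemeOf(String challenge) {
        String trimmed = challenge.trim();
        int space = trimmed.indexOf(' ');
        String scheme = space < 0 ? trimmed : trimmed.substring(0, space);
        return scheme.toLowerCase(Locale.ENGLISH);
    }

    // Returns the challenge for the most preferred scheme, or null if the
    // server offered none of the preferred schemes.
    static String select(List<String> challenges, List<String> preferred) {
        Map<String, String> byScheme = new LinkedHashMap<String, String>();
        for (String challenge : challenges) {
            byScheme.put(schemeOf(challenge), challenge);
        }
        for (String scheme : preferred) {
            String match = byScheme.get(scheme.toLowerCase(Locale.ENGLISH));
            if (match != null) {
                return match;
            }
        }
        return null;
    }
}
```

With a preference list of "digest" then "basic", a server offering both
would yield the Digest challenge; credentials from your own store could
then be loaded for that scheme only.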

> The IBM SSL socket timeout issues I'm seeing when I get an SSLSocket 
> with a timeout (I set the timeout by getting a socket with the null arg 
> constructor then doing an SSLSocket$connect with a timeout).  The 
> exceptions do not happen when I use SUN JVM 1.4.2.  These are probably 
> IBM JVM issues but I'll list them here anyways:
> 
> 1. The IBM JVM 141 (cxia321411-20030930) NPEs setting the NoTcpDelay.  
> Is anyone else seeing this?
>  java.lang.NullPointerException
>     at com.ibm.jsse.bf.setTcpNoDelay(Unknown Source)
>     at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:683)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1328)
> 
> 2. Using the IBM JVM 142, it's saying SSL connection not open when we go 
> to use inputstreams.
>  java.net.SocketException: Socket is not connected
>     at java.net.Socket.getInputStream(Socket.java:726)
>     at com.ibm.jsse.bs.getInputStream(Unknown Source)
>     at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:715)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1328)
> 

We have already had a few reports of semantic incompatibilities between
IBM JSSE and Sun JSSE. It appears that the IBM JSSE implementation, unlike
Sun's, does not like attempts to set socket parameters when the socket is
closed. I believe it is clearly a bug in IBM JSSE, but we can think about
working around it in HttpClient.
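
One possible shape for such a workaround, as a sketch in plain Java
rather than actual HttpClient code: connect first, then apply the socket
options only if the socket reports itself connected and open, and treat a
failure to set them as non-fatal. SafeSocketOptions and its methods are
hypothetical names, and whether this avoids the IBM JSSE failures quoted
above is untested.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper, not HttpClient code: connect first, set options after.
class SafeSocketOptions {

    // Connects with a timeout, then tries to apply the options.
    static Socket connect(Socket socket, String host, int port,
            int connectTimeoutMs, int soTimeoutMs, boolean tcpNoDelay)
            throws IOException {
        socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        applyOptions(socket, soTimeoutMs, tcpNoDelay);
        return socket;
    }

    // Returns true if both options were set, false if the socket was not in
    // a usable state or the provider rejected the calls.
    static boolean applyOptions(Socket socket, int soTimeoutMs,
            boolean tcpNoDelay) {
        // isConnected() stays true after close(), so check both.
        if (!socket.isConnected() || socket.isClosed()) {
            return false;
        }
        try {
            socket.setTcpNoDelay(tcpNoDelay);
            socket.setSoTimeout(soTimeoutMs);
            return true;
        } catch (SocketException e) {
            return false;
        } catch (RuntimeException e) {
            // Assumption: tolerate provider bugs such as the NPE quoted above.
            return false;
        }
    }
}
```

Swallowing the failure is a design choice that favours keeping the
connection usable over guaranteeing the options; a caller that depends on
the read timeout would want to check the return value.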


> By way of feedback on the 3.0 API, I'll describe the two places where 
> the API is lacking regards our requirements forcing us to do yucky 
> overlays.  First some context.  The crawler must record the response 
> headers and response content exactly as it comes back over the wire and 
> it's supposed to be tenacious.
> 
> Regards recording exactly what the server sent us, we overlay 
> HttpConnection with a version that wraps the socket input and output 
> streams.  Here's the diff:
> 
> +// HERITRIX import.
> +import org.archive.util.HttpRecorder;
> +
>  /**
>   * An abstraction of an HTTP {@link InputStream} and {@link OutputStream}
>   * pair, together with the relevant attributes.
> @@ -676,7 +679,6 @@
>              highly interactive environments, such as some client/server
>              situations. In such cases, nagling may be turned off through
>              use of the TCP_NODELAY sockets option." */
> -
>              socket.setTcpNoDelay(this.params.getTcpNoDelay());
>              socket.setSoTimeout(this.params.getSoTimeout());
> 
> @@ -701,8 +703,23 @@
>              if (inbuffersize > 2048) {
>                  inbuffersize = 2048;
>              }
> -            inputStream = new BufferedInputStream(socket.getInputStream(), inbuffersize);
> -            outputStream = new BufferedOutputStream(socket.getOutputStream(), outbuffersize);
> +            // START HERITRIX Change
> +            HttpRecorder httpRecorder = HttpRecorder.getHttpRecorder();
> +            if (httpRecorder == null) {
> +                inputStream = new BufferedInputStream(
> +                    socket.getInputStream(), inbuffersize);
> +                outputStream = new BufferedOutputStream(
> +                    socket.getOutputStream(), outbuffersize);
> +            } else {
> +                inputStream = httpRecorder.inputWrap((InputStream)
> +                    (new BufferedInputStream(socket.getInputStream(),
> +                    inbuffersize)));
> +                outputStream = httpRecorder.outputWrap((OutputStream)
> +                    (new BufferedOutputStream(socket.getOutputStream(),
> +                    outbuffersize)));
> +            }
> +            // END HERITRIX change.
> +
> 

What exactly does httpRecorder do? We could probably think of a less
intrusive way of getting the same thing done.
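
If the recorder only needs a byte-for-byte copy of what passes over the
socket (an assumption on my part, since I have not seen HttpRecorder),
the wrapping in the quoted diff amounts to a tee stream. A minimal sketch
of the input side in plain Java follows; RecordingInputStream is a
hypothetical name and is not the Heritrix class.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a recorder: copies every byte that is read
// through it into an in-memory buffer. Skipped bytes are not recorded.
class RecordingInputStream extends FilterInputStream {
    private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();

    RecordingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            recorded.write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            recorded.write(buf, off, n);
        }
        return n;
    }

    // mark/reset would replay bytes and record them twice, so disable them.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void mark(int readLimit) {
        // Intentionally a no-op.
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    // Snapshot of everything read so far, exactly as it came off the wire.
    byte[] recordedBytes() {
        return recorded.toByteArray();
    }
}
```

A wrapper like this could sit around whatever stream the connection hands
out, which is why a hook for supplying such wrappers might be less
intrusive than overlaying HttpConnection.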

> The other overlay we make is of HttpParser so we can persist through a 
> bad header parse:
> 
>  /apache/commons/httpclient/HttpParser.java src/java/org/apache/commons/httpclient/HttpParser.java
> --- /home/stack/bin/commons-httpclient-3.0-alpha2/src/java/org/apache/commons/httpclient/HttpParser.java        2004-09-19 13:41:05.000000000 -0700
> +++ src/java/org/apache/commons/httpclient/HttpParser.java      2004-09-29 14:23:03.000000000 -0700
> @@ -185,11 +185,21 @@
>                  // Otherwise we should have normal HTTP header line
>                  // Parse the header name and value
>                  int colon = line.indexOf(":");
> +                // START HERITRIX Change
> +                // Don't throw an exception if can't parse.  We want to keep
> +                // going even though header is bad. Rather, create
> +                // pseudo-header.
>                  if (colon < 0) {
> -                    throw new ProtocolException("Unable to parse header: " + line);
> +                    // throw new ProtocolException("Unable to parse header: " +
> +                    //      line);
> +                    name = "HttpClient-Bad-Header-Line-Failed-Parse";
> +                    value = new StringBuffer(line);
> +
> +                } else {
> +                    name = line.substring(0, colon).trim();
> +                    value = new StringBuffer(line.substring(colon + 1).trim());
>                  }
> -                name = line.substring(0, colon).trim();
> -                value = new StringBuffer(line.substring(colon + 1).trim());
> +               // END HERITRIX change.
>              }
> 

This is a known problem. Basically, it appears there is no one right way
to parse the HTTP status line and headers that fits all types of
applications. Our plan is to provide a plugin mechanism for custom HTTP
parsers in version 4:

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25468
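
To illustrate the lenient behaviour from the quoted diff on its own,
here is a small sketch in plain Java. It is not the HttpClient parser and
not the planned plugin interface: LenientHeaderParser and parseLine are
hypothetical names, only the pseudo-header name is taken from the diff,
and folded continuation lines are not handled.

```java
// Hypothetical sketch: a header-line parser that never throws on a
// malformed line but records it under a pseudo-header instead.
class LenientHeaderParser {

    // Pseudo-header name taken from the quoted Heritrix change.
    static final String BAD_HEADER_NAME =
        "HttpClient-Bad-Header-Line-Failed-Parse";

    // Returns a two-element array {name, value}. A line without a colon is
    // kept verbatim as the value of the pseudo-header.
    static String[] parseLine(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return new String[] { BAD_HEADER_NAME, line };
        }
        String name = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        return new String[] { name, value };
    }
}
```

A strict parser would throw at the point where this one falls back to the
pseudo-header, which is the difference a pluggable parser would let each
application choose.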

Cheers,

Oleg
