Re: refresh header proxy
Kalnichevski, Oleg wrote: I think all you need to know is what the header looks like, as I did look at the logs. It simply ignores the header. The header looks like this: Refresh: 0; URL=https:// Well, things _may_ be a little bit more complicated than that. [ ... ] I had to do some parsing of this type of header when writing a parser that extracted these from their in-HTML incarnation. At the time I couldn't find much out about them either. FWIW, the following regexp caught a lot of the HTML pages I saw in the wild: ;\s*[Uu][Rr][Ll]=\s*([^\s]+)\s*$ The main thing to watch out for was the variation in case of the URL= part. This may not be an issue if the header is generated by an actual HTTP server (as opposed to being in some HTML or added by a CGI script). -- Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
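For illustration, here is a minimal Java sketch of applying the regexp quoted above to a Refresh header value. The class and method names (RefreshHeader, extractUrl) are made up for this example, and the example.com URLs in the usage note are placeholders; only the expression itself comes from the message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RefreshHeader {
    // Same expression as quoted in the message: ';', then "URL=" in any
    // mix of upper and lower case, then the target up to trailing whitespace.
    private static final Pattern URL_PART =
            Pattern.compile(";\\s*[Uu][Rr][Ll]=\\s*([^\\s]+)\\s*$");

    /** Returns the URL part of a Refresh value, or null if none is present. */
    static String extractUrl(String headerValue) {
        Matcher m = URL_PART.matcher(headerValue);
        return m.find() ? m.group(1) : null;
    }
}
```

With this sketch, extractUrl("0; URL=https://example.com/next") yields the URL, the lower-case form "5;url=http://example.com/" is accepted too, and a bare delay such as "5" yields null.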
Re: NTLM class
On Thursday, August 14, 2003, at 03:36 pm, Michael Becke wrote: +1 for me as well. Me too (+1, obviously non-binding). I'm about a quarter of the way through integrating rc1 into some code and the internal JCE hidden setup would be a total spanner-in-the-works. -- Mike
Re: normalizing a URI with ..'s in it ?
Michael Becke wrote: On Wednesday, July 23, 2003, at 06:18 PM, Mike Moran wrote: [Oleg agreed] Right. Out of interest, which set of test cases does the URI class use, the ones from rfc2396 or rfc2396bis? The tests are from rfc2396bis. This is verging rapidly off-topic, but I was wondering if you knew anywhere I could keep up-to-date on the standards track of rfc2396bis? I've written some code to do the normalization we were talking about and I am swithering about whether I should enable it. The different handling of paths such as /../../ between rfc2396 and rfc2396bis may have knock-on effects in my code, so it would be nice if they were *standard* knock-on effects :-) -- Mike
Re: normalizing a URI with ..'s in it ?
On Wednesday, Jul 23, 2003, at 20:37 Europe/London, Michael Becke wrote: Mike Moran wrote: Btw, I presume this is the algorithm given in section 5.2 of http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html#absolutize? If so, this is just a draft (draft-fielding-uri-rfc2396bis-03.txt). It does actually differ from rfc2396 in how it handles abnormal URLs (though I think that's irrelevant here). Yes, this is the algorithm. We decided to upgrade to ensure that URI parsing was consistent across Apache. I think this was at the request of Roy Fielding. Oleg, is that correct? [Oleg agreed] Right. Out of interest, which set of test cases does the URI class use, the ones from rfc2396 or rfc2396bis? The string my/relative/../../another/relative would never be output from merge() or given to remove_dot_segments() in the section 5.2 algorithm. If you are just applying remove_dot_segments() to this string then it will get confused and output a weird answer because it's not expecting that input (ie a path that doesn't have a / at the start). I may be wrong, but I didn't think normalization could be applied to anything but absolute URLs. I agree that when resolving a path relative to a base URI a relative path should never be passed to remove_dot_segments(). However, according to section 6.2.2.3 remove_dot_segments() can be used for path segment normalization. I guess what it comes down to is that normalization is meant to generate a URI with a valid absolute path. The value output in this case is a little strange but I think it's correct. Essentially normalize should not be used on relative URIs. I would agree. Doesn't this mean that normalize() should throw an exception if it *is* called on a non-absolute URI? -- Mike
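To make the "weird answer" concrete, here is a small Java sketch of remove_dot_segments. It follows the wording of section 5.2.4 of RFC 3986, the document that the rfc2396bis drafts eventually became, so it is an assumption that it matches draft-03 step for step. It is not the code in HttpClient's URI class, and the class name DotSegments is invented for this example.

```java
class DotSegments {
    /** Removes "." and ".." segments, following the steps of RFC 3986 section 5.2.4. */
    static String removeDotSegments(String path) {
        String in = path;
        StringBuilder out = new StringBuilder();
        while (!in.isEmpty()) {
            if (in.startsWith("../")) {            // step A
                in = in.substring(3);
            } else if (in.startsWith("./")) {      // step A
                in = in.substring(2);
            } else if (in.startsWith("/./")) {     // step B
                in = in.substring(2);
            } else if (in.equals("/.")) {          // step B
                in = "/";
            } else if (in.startsWith("/../")) {    // step C
                in = in.substring(3);
                removeLastSegment(out);
            } else if (in.equals("/..")) {         // step C
                in = "/";
                removeLastSegment(out);
            } else if (in.equals(".") || in.equals("..")) { // step D
                in = "";
            } else {                               // step E: move one segment across
                int next = in.indexOf('/', 1);
                if (next < 0) {
                    next = in.length();
                }
                out.append(in, 0, next);
                in = in.substring(next);
            }
        }
        return out.toString();
    }

    // Drops the last segment and its preceding "/", if any, from the output.
    private static void removeLastSegment(StringBuilder out) {
        int idx = out.lastIndexOf("/");
        out.setLength(idx < 0 ? 0 : idx);
    }
}
```

Fed the relative string from the thread, my/relative/../../another/relative, this sketch returns /another/relative: the relative path silently turns into an absolute one, which is the kind of surprise that argues for rejecting relative input. The path /../../ comes out as / under these steps.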
Re: DO NOT REPLY [Bug 21754] - NullPointerException when releasing connection
On Monday, Jul 21, 2003, at 09:32 Europe/London, Kalnichevski, Oleg wrote: Mike, I can guarantee that there will be no API changes whatsoever as long as you are using the stable 2.0 branch. However, if we are not allowed to change even internal APIs on the development 2.1 branch, I personally might also just seriously consider stopping contributing to HttpClient at all. I understand this as a recent bugbear of yours :-) I am not asking for no change, far from it. Instead, what I am asking is: what is the supported 'user profile'? For example, will this profile support 'component assemblers' as well as 'User-Agent users'? From my point of view I see the HttpClient library as a grab-bag of components to stick together, not a single entry point. I have followed the discussion around the need to refactor the internals of the HttpClient library for some time and I would certainly agree that they need reworking. But what is the end goal? Is it almost a User-Agent (just add water) ie the HttpClient class-to-be? The main reason I am coming to the HttpClient library is because existing solutions such as the Sun one or innovation.ch have bugs or aren't transparent enough. I already have a User-Agent which does everything the HttpClient class does. I just need a way to plug in an HTTP 1.1 provider which performs better and does not have the same bugs/deficiencies as Sun/innovation.ch. If you wish I can itemize exactly what is wrong with these libraries and tell you what I need. Again, many thanks for the work (I think I've probably spent more time and chars talking about this issue than I'll use doing the actual work :-) ) -- Mike Moran
Re: Post Method
Michael Becke wrote: I think there may be a bug here as well. According to the spec, space characters should be represented as '+' but URIUtil is encoding them as '%20'. I think the reserved character set is perhaps also incorrect. According to rfc 1738 ;, /, ?, :, @, = and & are the reserved chars but URI is also including +, $ and ,. My guess is that most servers translate all hex encoded characters but it seems that we are not quite to spec. You may want to double-check this with rfc2396, which updates rfc1738. My interpretation of the '+'/'%20' issue was that both were legal escapings of space, however it may be worth another reading on my part. -- Mike Moran
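As a point of comparison only, the JDK itself shows both escapings, depending on which class is asked. The sketch below uses java.net.URLEncoder, which produces form-style (application/x-www-form-urlencoded) output, and java.net.URI, which escapes a path. The wrapper class SpaceEscaping is invented for this example, and none of this says what HttpClient's URIUtil should do.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class SpaceEscaping {
    /** Form-style encoding: a space becomes '+'. */
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Path escaping through java.net.URI: a space becomes "%20". */
    static String pathEncode(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Form-style decoding accepts both '+' and "%20" as a space. */
    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here formEncode("a b") gives a+b, pathEncode("/a b") gives /a%20b, and formDecode turns both a+b and a%20b back into "a b". That fits the reading that a form-decoding server treats both as a space, while '+' only carries that meaning in form-encoded data.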
Re: Questions related to the use of HttpClient classes
Adrian Sutton wrote: On Friday, June 6, 2003, at 08:55 AM, Om Narayan wrote: [ ... ] I have another question to add: How do I prime the connections in the MultiThreadedHttpConnectionManager pool? I am running into a problem where it is taking a long time (almost 30 secs using https) to open the first connection. As a result the session bean that is opening the connection is timing out. I need a way to make the first call to create the connection work without timing out. Any suggestion is welcome. I'd say the delay here is actually initialising JSSE rather than actually making the first connection. There should be a way to initialise JSSE without actually making a connection. [ ... ] This might be difficult. I ran into a problem like this while using a JCE implementation. The setup time seems to be caused by Sun's implementation of SecureRandom. The following thread may be useful: http://forum.java.sun.com/thread.jsp?thread=4250&forum=2&message=11205 -- Mike
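One way to test the SecureRandom theory is to pay the start-up cost ahead of the first request. The sketch below is only that, a sketch: it uses standard JDK calls (SecureRandom, SSLContext), but the class SslWarmUp is invented here, and whether warming up this way actually shortens the first HTTPS connection made through HttpClient is an assumption that would need measuring on the JVM in question.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.net.ssl.SSLContext;

class SslWarmUp {
    /** Seeds a SecureRandom and sets up an SSLContext; returns elapsed milliseconds. */
    static long warmUp() {
        long start = System.currentTimeMillis();
        try {
            SecureRandom random = new SecureRandom();
            random.nextBytes(new byte[1]);     // forces the generator to seed itself
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, null, random);  // default key and trust managers
            context.getSocketFactory();        // touch the factory as well
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        return System.currentTimeMillis() - start;
    }

    /** Runs the warm-up on a daemon thread so application start-up is not blocked. */
    static Thread warmUpInBackground() {
        Thread t = new Thread(new Runnable() {
            public void run() {
                warmUp();
            }
        }, "ssl-warm-up");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Logging the value returned by warmUp() at application start would at least show whether the 30 seconds are spent here or in the connection itself.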
Re: 300 Multiple Choices handling?
Adrian Sutton wrote: Hi Mike, HttpClient returns 300 as the status code as would be expected in such a case. Sounds reasonable. Does it also make the body available in this case? The developer is then free to select whichever option they want. The URL you gave however returns 302 not 300 and HttpClient throws an exception because cross-site redirects are not supported. Umm. I think HttpClient and wget must disagree:

    $ wget --server-response http://www.blooberry.com//indexdotpreview/html/index8.htm
    --12:48:55-- http://www.blooberry.com//indexdotpreview/html/index8.htm
     => `index8.htm'
    Resolving www.blooberry.com... done.
    Connecting to www.blooberry.com[204.122.16.82]:80... connected.
    HTTP request sent, awaiting response...
     1 HTTP/1.1 300 Multiple Choices
    ...

A telnet to port 80 for that page also gives 300 Multiple Choices. I'll create a patch for the docs to mention 300 responses. Anything particularly important about them that I should note? I'm not sure what the docs should say other than pointing out that you'll need to parse or display the body in some non-HTTP way to get any sense out of it. -- Mike Moran
Re: 300 Multiple Choices handling?
Ortwin Glück wrote: Has anybody ever seen a 300 in the wild? Yes. I gave an example in the email I sent. This page is linked to on that site. However, I would say that they aren't numerous. -- Mike Moran
Re: 300 Multiple Choices handling?
Adrian Sutton wrote: On Monday, June 2, 2003, at 09:57 PM, Mike Moran wrote: Adrian Sutton wrote: [ ... ] Umm. I think HttpClient and wget must disagree:

    $ wget --server-response http://www.blooberry.com//indexdotpreview/html/index8.htm
    --12:48:55-- http://www.blooberry.com//indexdotpreview/html/index8.htm
     => `index8.htm'
    Resolving www.blooberry.com... done.
    Connecting to www.blooberry.com[204.122.16.82]:80... connected.
    HTTP request sent, awaiting response...
     1 HTTP/1.1 300 Multiple Choices
    ...

A telnet to port 80 for that page also gives 300 Multiple Choices. Interesting I do get a 300 from telnet, but a 302 from HttpClient:

    302
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML><HEAD>
    <TITLE>302 Found</TITLE>
    </HEAD><BODY>
    <H1>Found</H1>
    The document has moved <A HREF="http://www.eskimo.com/notfound.html">here</A>.<P>
    </BODY></HTML>

[ ... ] I wonder... is HttpClient perhaps parsing out www.blooberry.com/ as the value for the Host: header? My URL is slightly bogus, though perhaps technically valid. What happens with: http://www.blooberry.com/indexdotpreview/html/index8.htm (Note no double slash) www.eskimo.com and www.blooberry.com appear to be on the same subnet; www.eskimo.com is perhaps the default VirtualHost. Just wondering ... -- Mike Moran
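As a quick sanity check on the "slightly bogus, though perhaps technically valid" URL, this is how the JDK's own java.net.URI splits it. The helper class HostHeaderCheck is invented for the example, and the result says nothing about how HttpClient parsed the URL at the time; it only shows what a correct split looks like.

```java
import java.net.URI;

class HostHeaderCheck {
    /** Host component, which is what a Host: header would normally carry. */
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    /** Raw path component, exactly as written in the URL. */
    static String pathOf(String url) {
        return URI.create(url).getRawPath();
    }
}
```

For http://www.blooberry.com//indexdotpreview/html/index8.htm this gives the host www.blooberry.com and the path //indexdotpreview/html/index8.htm, so the double slash belongs to the path and should not leak into the Host: value.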
HashMap does use equals() [was Re: DO NOT REPLY [Bug 18355] - HttpState cannot differentiate credentials for different hosts with sameRealm names]
[EMAIL PROTECTED] wrote: [ ... ] The reason is because HashMap only compares the hashCodes of the objects and never consults equals. [ ... ] This is not the case. HashMap uses equals() when the hashCode() of two objects are the same. If you want good performance it is a good idea to have the hashCode() different or at least evenly distributed. However, it is not necessary. As a test, try the following code:

    import java.util.*;

    public class HashMapThing {

        private static class Key {
            private int hashCode;
            private String content;
            private String tag;

            public Key(int hashCode, String content, String tag) {
                this.hashCode = hashCode;
                this.content = content;
                this.tag = tag;
            }

            public int hashCode() {
                System.out.println("hashCode() called on " + toString());
                return hashCode;
            }

            public boolean equals(Object o) {
                System.out.println("equals() called on " + toString() + ", " + o.toString());
                if (o instanceof Key) {
                    Key other = (Key) o;
                    return this.content.equals(other.content);
                } else {
                    return false;
                }
            }

            public String toString() {
                return "tag: \"" + tag + "\" code: " + hashCode + " content: \"" + content + "\"";
            }
        }

        public static void main(String[] args) throws Exception {
            // A and B share a hash code and content; C shares only the hash code.
            Key entryA = new Key(0, "0", "A");
            Key entryB = new Key(0, "0", "B");
            Key entryC = new Key(0, "1", "C");

            System.out.println("Adding entries:");
            Map map = new HashMap();
            System.out.println("A");
            map.put(entryA, "1");
            System.out.println("B");
            map.put(entryB, "2");
            System.out.println("C");
            map.put(entryC, "3");

            System.out.println("\nGetting entries:");
            String out1 = (String) map.get(entryA);
            String out2 = (String) map.get(entryC);

            System.out.println("\nEntries:");
            System.out.println("1: " + out1);
            System.out.println("2: " + out2);
        }
    }

-- Mike
Re: [REMINDER] HttpClient IRC event - irc log
On Friday, February 28, 2003, at 01:51 AM, Dennis Cook wrote: Here is the complete session: [ ... ] Jandalf 1) I'm kinda amazed that cross host redirects has been absent for so long. It seems glaring. [ ... ] Indeed it is. A large amount of the redirects I see are cross-site redirects. It makes sense, given the number of cgi scripts there are which count people leaving a site to go elsewhere. Incidentally, some sites don't even give an absolute URL; you have to resolve it relative to the current URL ie you treat the redirected-from URL as the base. I would think the RFC says this is a no-no, and I probably wouldn't expect HttpClient to go this far. However, it's one example why I wouldn't use any built-in redirection. -- Mike
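Treating the redirected-from URL as the base can be sketched with java.net.URI.resolve(). The class RedirectTarget and the example.com URLs below are made up for illustration; this is a sketch of what a caller handling redirects itself might do, not of anything HttpClient does.

```java
import java.net.URI;

class RedirectTarget {
    /**
     * Resolves a Location value against the URL that produced the redirect.
     * Absolute Location values pass through unchanged.
     */
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }
}
```

With the request URL http://example.com/a/b/c.html, a Location of ../d.html resolves to http://example.com/a/d.html, /top.html resolves to http://example.com/top.html, and an absolute cross-site value such as http://other.example/x is returned as is.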
Re: Significant HttpClient HttpMethodBase overhaul. Need earlyfeedback
Sam Maloney wrote: Makes sense to me, I would definitely agree on your point that 'Client' logic should be in HttpClient and not in HttpMethodBase. (I would say redirect, auth and even auto-retry would count as 'Client' logic). [ ... ] I would also agree. I would like and expect my use of the library to be one-to-one with respect to HTTP methods eg I ask it to do one thing and it does it. If anything makes the fetch not `complete' then it informs me and I decide whether to do a retry/follow a redirect/handle authentication. Also: - In my context the limits on redirects are not local to a call ie the number of redirects that are allowed on any one call may be affected by the general number of fetches that have been done so far. - I require full control of when a fetch is done. This is to allow limiting of number and type of hits to a web server. If auto-retry is enabled then I have no way of throttling or scheduling hits. As I think Laura has mentioned, it would be nice if HttpClient could be decomposed into reusable components for implementing a user agent, but it is essential that I be able to peel away any user agent layer that exists. From the sound of it, the current HttpMethodBase does too much on its own for me to use it. -- Mike
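The two requirements above (a fetch limit shared across calls, and the caller deciding when each hit happens) could be sketched roughly as follows. Everything here is hypothetical: FetchBudget, FetchResult, SingleFetcher and CallerDrivenFetch are names invented for this example, and SingleFetcher stands in for whatever one-request-one-response component the library might expose.

```java
/** Counts fetches across many calls, so limits are not local to one request. */
class FetchBudget {
    private int remaining;

    FetchBudget(int totalFetches) {
        this.remaining = totalFetches;
    }

    /** Consumes one fetch if any are left; returns false when exhausted. */
    synchronized boolean tryAcquire() {
        if (remaining <= 0) {
            return false;
        }
        remaining--;
        return true;
    }

    synchronized int remaining() {
        return remaining;
    }
}

/** Outcome of exactly one HTTP exchange (hypothetical type). */
class FetchResult {
    final int status;
    final String location; // Location header value, or null

    FetchResult(int status, String location) {
        this.status = status;
        this.location = location;
    }
}

/** Hypothetical transport that performs one request and never retries or redirects. */
interface SingleFetcher {
    FetchResult fetchOnce(String url);
}

class CallerDrivenFetch {
    /** Follows redirects only while the shared budget allows another hit. */
    static FetchResult fetch(String url, SingleFetcher fetcher, FetchBudget budget) {
        FetchResult last = null;
        String current = url;
        while (budget.tryAcquire()) {
            last = fetcher.fetchOnce(current);
            boolean redirect = last.status >= 300 && last.status < 400
                    && last.location != null;
            if (!redirect) {
                return last;
            }
            current = last.location;
        }
        return last; // budget exhausted; null if no fetch was allowed at all
    }
}
```

The design point is that the loop lives in the caller's code, so throttling, scheduling and the redirect limit can all depend on state the library knows nothing about.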
Re: DO NOT REPLY [Bug 17487] - waitForResponse is using busywait
Michael Becke wrote: Agreed. A second thread should not be needed though. It can be done using something like: [ ... ] Oleg Kalnichevski wrote: [ ... ] Oleg On Thu, 2003-02-27 at 15:36, Ortwin Glück wrote: Oleg Kalnichevski wrote: Odi, are you sure you want to have an extra thread per HttpMethod? I do not think so Oleg Better than a busy wait, isn't it? I just wanted to butt into this to point out that on some platforms, such as 2.2 linux, a thread equates to a process id, and you can quickly run out of them. In these cases, a busy wait is far preferable to a new Thread. Also firing off a Thread when things are slow can cause sudden spikes in Thread use. I've recently seen an analogous problem with an older version of jboss when doing RMI connections which was a pain in the arse to work around. -- Mike
Re: DO NOT REPLY [Bug 10807] - Handle virtual hosts, relativeurls, multi-homing
[EMAIL PROTECTED] wrote: [ ... ] --- Additional Comments From [EMAIL PROTECTED] 2003-02-27 18:37 --- I'd like to go ahead and tackle this one, but I need a little clarification. Does the following correctly describe what we want? - we want to perform a get on www.google.com, let's say www.google.com has X > 1 IP addresses - we want to specify which IP address x to actually connect to - we want www.google.com in the Http header instead of x If this is the case, it sounds like we may want support for custom DNS resolution. Though this might be a little more than is needed for this simple case I think that is what it boils down to. I would support addition regardless, but then that's just me. I assume by custom DNS resolution you mean passing in the resolved values eg the HttpClient library is told: here's the Name/IP mapping, do a GET to this IP with this Host: Name? -- Mike
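Stripped of any library, "do a GET to this IP with this Host: Name" looks roughly like the sketch below, written against plain java.net.Socket. The class PinnedHostRequest and its methods are invented for illustration and are not a proposal for the HttpClient API; the caller is assumed to have chosen the InetAddress already.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;

class PinnedHostRequest {
    /** Builds a minimal GET request whose Host header carries the DNS name. */
    static String buildGet(String hostHeader, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + hostHeader + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    /**
     * Connects to the given address (no DNS lookup happens here) and sends
     * the request; the caller reads the response and closes the socket.
     */
    static Socket send(InetAddress address, int port, String hostHeader, String path)
            throws IOException {
        Socket socket = new Socket(address, port);
        OutputStream out = socket.getOutputStream();
        out.write(buildGet(hostHeader, path).getBytes("US-ASCII"));
        out.flush();
        return socket;
    }
}
```

The socket level only ever sees the address, while the server sees the name in the Host: header, which is the separation the three bullet points ask for.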
Re: DO NOT REPLY [Bug 17487] - waitForResponse is using busywait
Oleg Kalnichevski wrote: Mike May I add your comments to the bug report? Yes, sure. -- Mike
Re: DO NOT REPLY [Bug 10807] - Handle virtual hosts, relativeurls, multi-homing
Michael Becke wrote: I would support addition regardless, but then that's just me. I assume by custom DNS resolution you mean passing in the resolved values eg the HttpClient library is told: here's the Name/IP mapping, do a GET to this IP with this Host: Name? I was thinking more along the lines of an interface that would provide DNS resolution. This interface would be used by HttpConnection when opening connections. In general DNS names would be used everywhere except for when creating sockets. That would also suffice. If possible, I would prefer to pass an implementation of the interface in to HttpConnection upon creation rather than have it as a global setting, but I presume that's easier anyway. One other thing is that, currently, as a side-effect of using the Socket(DNS name, ...) constructor, the DNS lookup and Socket connection processes seem to be rolled into one. I was wondering if it is worthwhile setting a separate timeout for DNS lookup? As it stands, if DNS becomes slow for some reason then, even if the remote server is responding quickly, you'll get exceptions. Most people wouldn't care, but it would be nice to be able to set a longer timeout for DNS responses if you happened to know they sometimes take a while. This would also allow some leeway for an implementation of this DNS interface to do retries internally without worrying about a `connection' timeout externally. -- Mike (moran) :-)
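A resolver interface with its own timeout might look something like the sketch below. All of the names (HostResolver, SystemResolver, FixedResolver, TimedResolver) are hypothetical and nothing here is taken from HttpClient. Note the trade-off: the timeout is implemented with a helper thread per lookup, which is exactly the cost raised in the busy-wait thread, so treat it as one option rather than a recommendation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical hook a connection could be given at creation time. */
interface HostResolver {
    InetAddress resolve(String host) throws UnknownHostException;
}

/** Default behaviour: defer to the JVM's own lookup. */
class SystemResolver implements HostResolver {
    public InetAddress resolve(String host) throws UnknownHostException {
        return InetAddress.getByName(host);
    }
}

/** Caller-supplied Name/IP mapping; performs no DNS traffic at all. */
class FixedResolver implements HostResolver {
    private final String host;
    private final byte[] address;

    FixedResolver(String host, byte[] address) {
        this.host = host;
        this.address = address.clone();
    }

    public InetAddress resolve(String name) throws UnknownHostException {
        if (!host.equalsIgnoreCase(name)) {
            throw new UnknownHostException(name);
        }
        return InetAddress.getByAddress(host, address);
    }
}

/** Wraps another resolver and gives lookups their own time limit. */
class TimedResolver implements HostResolver {
    private final HostResolver delegate;
    private final long timeoutMillis;

    TimedResolver(HostResolver delegate, long timeoutMillis) {
        this.delegate = delegate;
        this.timeoutMillis = timeoutMillis;
    }

    public InetAddress resolve(final String host) throws UnknownHostException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<InetAddress> lookup = executor.submit(new Callable<InetAddress>() {
                public InetAddress call() throws Exception {
                    return delegate.resolve(host);
                }
            });
            return lookup.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new UnknownHostException("DNS lookup timed out: " + host);
        } catch (ExecutionException e) {
            throw new UnknownHostException(host + ": " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new UnknownHostException("interrupted while resolving " + host);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Passing a resolver instance to each connection, rather than installing it globally, keeps the lookup timeout independent of the socket connect timeout and lets different connections use different mappings.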
nogoop: HttpClient competition
I just found this on my travels: http://www.nogoop.com/product_16.html I thought I'd mention it as there was a thread going on a little while ago about HttpClient competitors. They make explicit comparisons to HttpClient. -- Mike
Re: using httpclient without a HttpClient object (was Redirects?)
On Monday, February 3, 2003, at 09:33 PM, Laura Werner wrote: Jeffrey Dever wrote: Is there anyone out there that has code that actually calls the HttpMethod.execute()? Anything that looks like this: HttpState state = new HttpState(); HttpConnection connection = new HttpConnection(host, port, secure); HttpMethod method = new GetMethod(path); int status = method.execute(state, connection); I do. I'm the one who doesn't use HttpClient at all, because it's too simplistic for me. I need to maintain a single HttpConnectionManager but a bunch of HttpState objects (one per thread in my application), so I have my own function that does the same thing as HttpClient.executeMethod. [ ... ] I would second the request to leave entry points into the `engine' behind HttpClient.java. As far as I understand it, HttpClient.java is just an implementation of a simple user agent. My expected use (and limited current use) of the API would likely not even mention HttpClient.java and would itself constitute a user-agent. To take the example of redirects, this is something I need control of (auto-redirects is the first thing I turn off in Sun's HttpURLConnection). -- Mike
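For reference, turning off auto-redirects in Sun's HttpURLConnection is a single call on the connection. The wrapper class ManualRedirects below is invented for this example; setInstanceFollowRedirects is the standard JDK method, and no network traffic happens until the caller asks for the response.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class ManualRedirects {
    /** Opens (but does not yet connect) a connection that will not follow redirects. */
    static HttpURLConnection open(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) new URL(url).openConnection();
            connection.setInstanceFollowRedirects(false); // 3xx is handed back to the caller
            return connection;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller would then check getResponseCode() itself and decide whether, when and where to follow a 3xx response.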
Re: Configurable DNS resolver?
Ortwin Glück wrote: Michael Becke wrote: If you set the host of a method or HttpClient to an IP address then it will connect to that address. DNS names are not required, but will be resolved using the default Java method if used. Mike This causes problems on Multi-Homed sites. A DNS name is required in the HTTP request (Host request header) to uniquely reference the site. I was having a look through the latest release code and that was one of the things that occurred to me. Am I right in thinking that it would be HostConfiguration/HttpConnection that would require extra methods/calls to make a distinction between the host connected to (Socket level) and the advertised host (Host: header)? -- Mike