Re: Any interest in an HTML DOM in C / C# / similar? (was Re: State of the AJAX Union)
Can you explain how wrapping a C/C# DOM implementation and using that from Perl is in conflict with using Perl's regex engine? John On Thu, 23 Nov 2006, Christopher Hart wrote: For one particular application, I need the speed of Perl's regex engine and I have not been able to match it in C# or even with limited attempts using C++ and Boost's regex library. So, regardless of feature availability on other platforms, I'm going to continue pursuing DOM & JS functionality in Perl extending WWW::Mechanize. That said, I do a lot of *other* work in C# and what you suggest would be useful. If you already have something in the works (you mentioned publishing some code), I'd be interested in learning more. On 11/23/06, John J Lee <[EMAIL PROTECTED]> wrote: On Wed, 22 Nov 2006, Christopher Hart wrote: > Would an "easier" (yet still monumental) starting point be to tackle the DOM > implementation independent of a JS engine? [...] > This seems like a great open source project - it's way too much to handle > for most individual developers, but I think could be tackled with a > moderately organized team of folks with a good design laid out in advance. [...] So, having pooh-poohed Christopher's proposal, I'm going to make a very similar proposal of my own. Give me some slack, I have the excuse of actually having put in some implementation effort on this, and published the code :-) (but not in Perl). And I'm asking with a view to writing some more code myself (albeit only a slim chance of that happening). So, my question / semi-serious proposal is: is anybody here interested in collaborating in writing a portable HTML DOM library and DOM builder in a language *other* than Perl, with the intent of wrapping it for Perl and other languages (Python is my own interest)? I'm afraid I'm not interested in Perl 6's cross-language support, though. If it were me working on it, it would probably have to be C, C++ or C# (or possibly Java, or something weirder like Caml). 
Cross-language use is one reason for my thinking about doing it in one of those languages. Memory usage and execution speed is another. I lean towards C or C#. John
Any interest in an HTML DOM in C / C# / similar? (was Re: State of the AJAX Union)
On Wed, 22 Nov 2006, Christopher Hart wrote: Would an "easier" (yet still monumental) starting point be to tackle the DOM implementation independent of a JS engine? [...] This seems like a great open source project - it's way too much to handle for most individual developers, but I think could be tackled with a moderately organized team of folks with a good design laid out in advance. [...] So, having pooh-poohed Christopher's proposal, I'm going to make a very similar proposal of my own. Give me some slack, I have the excuse of actually having put in some implementation effort on this, and published the code :-) (but not in Perl). And I'm asking with a view to writing some more code myself (albeit only a slim chance of that happening). So, my question / semi-serious proposal is: is anybody here interested in collaborating in writing a portable HTML DOM library and DOM builder in a language *other* than Perl, with the intent of wrapping it for Perl and other languages (Python is my own interest)? I'm afraid I'm not interested in Perl 6's cross-language support, though. If it were me working on it, it would probably have to be C, C++ or C# (or possibly Java, or something weirder like Caml). Cross-language use is one reason for my thinking about doing it in one of those languages. Memory usage and execution speed is another. I lean towards C or C#. John
Re: State of the AJAX Union
On Wed, 22 Nov 2006, Christopher Hart wrote:
> I agree that folks have been talking about JS for a long time, and that
> it's frustrating, but what I'm suggesting is that we need to tackle a
> different problem first.
[...]

An HTML DOM implementation is a necessary part of JS support, sure (though Stefan points out that for web app testing -- as opposed to scraping -- an XML DOM may be sufficient for some purposes).

Forgive me for being blunt, but that's not the blinding flash of insight that's needed. What's needed is for somebody to actually write and publish some code -- and then for people to *keep on* working on it. No doubt it sometimes happens, but I've never seen a "free-range" open source project (as opposed to one started within a company) start up based on one person's plan and another's code.

perspiration-not-inspiration-ly y'rs,

John
Re: State of the AJAX Union
On Wed, 22 Nov 2006, Christopher Hart wrote: I'm willing to take a crack at laying out a vision, high level objectives and some implementation requirements based on my experiences and see how [...] Everyone who's seriously interested is willing to do that. Indeed, many have surely done that already, including myself. John
Re: State of the AJAX Union
On Wed, 22 Nov 2006, apv wrote: I've also been interested for a long time and tried to work on this 2 years ago but didn't get far enough to bother trying to release anything. [...] I would gladly throw down if there was a group effort with a real plan. I'm not the right hacker to lead this project though. Publish and be damned! Doesn't matter if you don't want to lead something. Somebody has to get the ball rolling if it's to happen, so why not you? John
Re: State of the AJAX Union
On Wed, 22 Nov 2006, Stefan Seifert wrote:
[...]
> I too thought about that. Maybe using the JavaScript or
> JavaScript::Spidermonkey module and XML::DOM. I will certainly experiment
> around with them, as we need it at work. Doesn't seem to be

Sigh, we've had this same little discussion at least five times here. The browser object model is not the XML DOM. It is the HTML DOM (which is ill-defined in practice, and is not really a superset of the XML DOM), plus other stuff. There is currently no implementation of it outside of browsers. Plus you have to build the damned DOM in the first place :-)

> too hard to me, but of course, I'm underestimating that :)

Yes. As I've said many times before here, getting something working is not too hard, getting something useful is harder (how much depends on the audience, I guess), getting something good is a lot of work. Maybe this is universally true, but especially so of JS support for LWP :-)

John
Re: State of the AJAX Union
On Fri, 3 Nov 2006, Christopher Hart wrote:
> I know there is a rich history of challenges implementing any kind of
> JavaScript interpretation using Mechanize or any other web
> scripting/automation utility, but I was wondering if anyone has tried to
> focus on "Mechanizing" AJAX? I realize this would take at least some
> degree of JavaScript interpretation and most likely some kind of internal
> DOM representation to maintain the

No doubt you could profitably concentrate your implementation / bug-fixing effort on the DOM features you're interested in, but I don't think there's any terribly obvious closed subset of the DOM &c. that you could implement and save yourself lots of work as compared with implementing the full monty.

> state of the page, and that it's probably extraordinarily challenging.

Probably only extraordinarily challenging in that it involves lots of work -- that's why nobody has done it :-)

> Nonetheless, with the increasing popularity of AJAX, it seems like it
> eventually needs to be done. I'm watching more and more of the sites I've
> written automation for slowly migrate to AJAX and it's getting
> increasingly difficult to work around these designs.
[...]

Whether or not it "needs" to be done, it won't be, unless somebody steps up to do it.

John
Re: using LWP getting a PDF file which comes up blank
On Tue, 8 Aug 2006, Churton Budd wrote: [...] using LWP for display within this portal. When I get the return of the PDF, the adobe acrobat plugin pops up but it comes up blank. For multi page ECG's it comes up with multiple blank pages. I have saved this blank file and looked at it, it seems like it has the same size as a PDF which displays in the inherent web application. Loading this saved file into Adobe from the desktop, its still blank. Some characters hex code are different though (so I'm not sure this is an encoding issue). Anyone have any thoughts why the PDF file displays [...] Didn't read your code, but: Are you processing the PDF file at any stage as a text file? If so, stop doing that :) John
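To make the "stop treating it as text" advice concrete, here is a minimal sketch in Python rather than Perl (in Perl the analogous step is calling binmode on the output handle). The helper name save_binary and the sample bytes are invented for illustration; the point is only that the body is written byte for byte, with no newline translation or encoding layer in between.

```python
import os
import tempfile

def save_binary(path, payload):
    """Write the response body exactly as received: bytes in, bytes out."""
    # "wb" disables newline translation and any text encoding layer.
    with open(path, "wb") as handle:
        handle.write(payload)

# Hypothetical stand-in for a fetched PDF body; real PDFs contain
# arbitrary bytes such as 0x0D 0x0A and values above 0x7F.
fake_pdf = b"%PDF-1.4\r\n\xe2\xe3\xcf\xd3\r\n%%EOF\n"

target = os.path.join(tempfile.mkdtemp(), "ecg.pdf")
save_binary(target, fake_pdf)

# Read it back in binary mode to confirm nothing was altered on the way.
with open(target, "rb") as handle:
    round_tripped = handle.read()
```

If the saved file differs from what the server sent in even a few bytes, a viewer may still open it yet render blank pages, which would fit the symptom described.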
Re: Query Results on Multiple Pages
On Mon, 10 Jul 2006, flynfast wrote: I'm trying to write a script to send a query to the patent and trademark office webpage and capture the URL's pointing to the patents identified. The problem is that the results appear on more than one page (like Google lists its results on multiple pages). How do I write a script that will access the other pages? I hereby wager 50 British pence that you'll find that on CPAN. Have you looked for a patent search module there? (the winner must collect his/her winnings in person ;-) John
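For anyone who does end up writing this by hand, the general shape is a loop that keeps following the "next page" link until there isn't one. The sketch below is in Python and makes several assumptions: the fetch callable, the link patterns and the example.test URLs are all invented, and a real site will need its own patterns (or a proper HTML parser).

```python
import re
from urllib.parse import urljoin

# Assumption: result links and the "next page" link can be picked out with
# these patterns; a real site needs its own patterns or an HTML parser.
RESULT_LINK = re.compile(r'<a href="([^"]+)" class="result">')
NEXT_LINK = re.compile(r'<a href="([^"]+)">Next</a>')

def collect_results(start_url, fetch, max_pages=50):
    """Follow "Next" links from start_url and gather every result URL."""
    found, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        found.extend(urljoin(url, href) for href in RESULT_LINK.findall(html))
        match = NEXT_LINK.search(html)
        url = urljoin(url, match.group(1)) if match else None
    return found

# Canned pages standing in for the network.
PAGES = {
    "http://example.test/search?p=1":
        '<a href="/pat/1" class="result">A</a> <a href="search?p=2">Next</a>',
    "http://example.test/search?p=2":
        '<a href="/pat/2" class="result">B</a>',
}
results = collect_results("http://example.test/search?p=1", PAGES.__getitem__)
```

The seen set and max_pages cap are there so that a site whose last page links back to itself cannot trap the script in a loop.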
Re: Java script FAQ revised
On Thu, 6 Apr 2006, Peter Stevens wrote:
[...]
> One typical use of Javascript is to perform argument checking before
> posting to the server. The URL you want is probably just buried in the
> Javascript function. Do a regular expression match on
> |$mech->content()| to find the link that you want and |$mech->get| it
> directly (this assumes that you know what you are looking for in
> advance).
>
> In more difficult cases, the Javascript is used for URL mangling to
> satisfy the needs of some middleware. In this case you need to figure out
> what the Javascript is doing (why are these URLs always really long?).
> There is probably some function with one or more arguments which
> calculates the new URL.
[...]

Another very common thing that's important for would-be scrapers is manipulation of forms (adding form controls and list items, submitting forms). In a sense that's just URL manipulation, of course, but in the FAQ it might be useful to draw people's attention to this specific case.

Script can also set cookies.

John
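The "regular expression match on the page content" step from the quoted FAQ might look roughly like this. It is a Python sketch rather than Perl, and the page text, the openReport function and the URLs are all made up; the pattern only covers the simple case of a quoted string assigned to window.location.

```python
import re
from urllib.parse import urljoin

# Hypothetical page content; the function name and URL are invented.
page = """
<a href="javascript:openReport()">Report</a>
<script>
function openReport() {
    window.location = '/reports/view.cgi?id=42';
}
</script>
"""

# Look for a quoted string assigned to window.location inside the script.
pattern = re.compile(r"""window\.location\s*=\s*['"]([^'"]+)['"]""")
match = pattern.search(page)

# Resolve the captured path against the (assumed) URL of the page itself.
target = urljoin("http://example.test/index.html", match.group(1)) if match else None
```

Once the URL is in hand it can be fetched directly, bypassing the script. As the FAQ says, this only works when you already know what you are looking for.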
Re: Java script FAQ [was Re: :Mechanize]
On Wed, 5 Apr 2006, Mike Schilli wrote: [...] As soon as someone gets going and comes up with a reference implementation (every browser naturally has its own DOM implementation, that's why IE and Firefox behave differently at times), WWW::Mech is in business. How cool would that be! [...] Sadly, that's not something that's going to come out of the Mozilla camp: http://article.gmane.org/gmane.comp.mozilla.devel.dom/4227 John
Re: Javascript Execution
On Sat, 17 Dec 2005, Andy Lester wrote: > On Sat, Dec 17, 2005 at 12:16:29PM -0500, Christopher Hart ([EMAIL > PROTECTED]) wrote: > > There are also JavaScript engines available in C and Java > > (SpiderMonkey and Rhino, respectively, available on mozilla.org). You > > may be able to leverage those. > > I didn't know about SpiderMonkey. I'm going to have a look at it to see > if it will fit into WWW::Mechanize. Hi Andy As I've posted about here before a few times (search Gmane), I actually did this with my Python port of WWW::Mechanize a few years back, using spidermonkey. My implementation was a first-cut half-baked thing, but I did get it working for a few pages. I decided that was enough excitement for me ;-) I know a few people used it for projects of their own and improved on it a bit, though (eg. one guy used it in a college project to make JS-using pages accessible on non-JS devices, by having a proxy server and executing the JS there -- nice idea). The code is still available at wwwsearch.sf.net I made use of the Perl wrapper of SpiderMonkey to write something very similar for Python. IIRC, I had to extend it a little over what was in the Perl thing. I used an existing HTML DOM, but had to modify both the DOM, and of course the DOM builder (and add event stuff and browser object model). This is where the work lies :-) If you intend to try this, and you're not intimately familiar with the bizarre ways in which people can and do use
Re: URI::javascript and LWP::Protocol::javascipt...who'done it?
On Fri, 18 Nov 2005, Christian Montanari wrote: [...] > My Quest has been in the dream of many others already. > It is all about tackling this javascripting trought WWW::Mechanize but, tell > me if > my ideas about this topic is wrong, it seems that no good souls has ever yet > done it! [...] http://search.gmane.org/search.php?group=gmane.comp.lang.perl.modules.lwp&query=javascript http://thread.gmane.org/gmane.comp.lang.perl.modules.lwp/1285 http://article.gmane.org/gmane.comp.lang.perl.modules.lwp/1107/match=javascript Win32::IE::Mechanize John
Mailing list archives 2001-2005?
I've lost the archives for this list again. I'm sure somebody has one on the web. Can anybody point me to it? There are lots of links to old sites that stop in 2001, and GMANE seems to start in 2005, but I can't find anything between 2001 and 2005. Cheers John
Re: Bug in cookies in libwww-perl-5.803
On Thu, 28 Jul 2005, Mysql user wrote: > I'm trying to write a perl program to access the configuration of a VOIP > telephone through its web interface. The web interface assigns you a > session id cookie once you've logged in. It works with browsers but not > with libwww-perl5.803 as shipped with Fedora Core 4. > > Here is the set-cookie header: > Set-Cookie: SessionId="ab6931f2c09b05c9"; Version=1; Path=/ > > > Here is the correct cookie being sent (by Firefox): > Cookie: SessionId="bf754500cc94652f" > > Here is the cookie being sent by LWP: > Cookie: $Version=1; SessionId="\"ab6931f2c09b05c9\""; $Path="/" [...] Have you tried turning off RFC 2965 handling? Even then, perhaps Version 1 cookies won't be downgraded to V0 cookies, I don't recall. But try it and see. John
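For comparison only, this is how the "turn off RFC 2965 handling" experiment looks with Python's http.cookiejar (the standard-library cookie module, not LWP). It says nothing definite about what HTTP::Cookies does; the CannedResponse class and the host name are invented so that the sketch runs without a network.

```python
import email.message
import http.cookiejar
import urllib.request

class CannedResponse:
    """Just enough of a response object for CookieJar.extract_cookies()."""
    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie
    def info(self):
        return self._headers

# rfc2965=False is already Python's default; it is spelled out for clarity.
policy = http.cookiejar.DefaultCookiePolicy(rfc2965=False)
jar = http.cookiejar.CookieJar(policy)

# Placeholder host; the Set-Cookie value is the one quoted in the report.
url = "http://phone.example.test/"
first = urllib.request.Request(url)
jar.extract_cookies(
    CannedResponse('SessionId="ab6931f2c09b05c9"; Version=1; Path=/'), first)

# Ask the jar which Cookie header it would send on the next request.
second = urllib.request.Request(url)
jar.add_cookie_header(second)
sent = second.get_header("Cookie")
```

Printing sent shows what would go out on the wire; the thing to look for is whether the $Version and $Path attributes from the failing LWP request have disappeared.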
Re: Javascript and WWW::Mechanize or LWP::UserAgent?
On Thu, 14 Jul 2005, Peter Stevens wrote:
> Under the heading of small serious amounts of work...
>
> I mentioned previously Win32::IE::Mechanize - does anybody have any
> ideas on how to do the same thing with Firefox under Linux?
[...]

Warning: I'm not up-to-date on this, take what I say with a pinch of salt.

I see you want linux, but first a comment about doing this on Windows: I believe there's a (MS-)COM IWebBrowser2 wrapper of Firefox's XP-COM interfaces, so *in theory* you should be able to point Win32::IE::Mechanize at that under Windows. I wouldn't be at all surprised if it were much harder than it should be, though (due to the complexities of COM, XP-COM and Firefox rather than the Perl side particularly)... Also, note the comment below about XP-COM only supporting in-process clients -- not sure exactly how one does things, given this.

Under linux, I guess you'd have to do one of:

1. Extend the Perl module to interface with XP-COM direct (note XP-COM, unlike COM, is in-process only IIUC, so I guess you have to rebuild Firefox with your new code, which may be a "wonderful learning experience", even if you *are* a battle-hardened C++ veteran ;-)

2. Build Perl support into Firefox. I don't know if such functionality still exists in Firefox (there used to be at least some support for Perl, but I have a feeling that was a long time ago, not sure if it's still there...). Also, I don't know if it allowed external processes to talk to the browser.

3. Forget Perl and just write what you want in JavaScript. Not ideal, I know, but practical: obviously JS support is excellent in Firefox. See Selenium for inspiration.

John
Re: Javascript and WWW::Mechanize or LWP::UserAgent?
[John Lee] > That's not a small amount of work you've just set Warren to do. :-) > > (speaking as somebody who made a semi-serious attempt at it, in Python) [deborah sciales] > Well, I guess it depends on his set of needs, and he does have > tokeparser and treebuilder, etc to use. > > If his javascript is inside of script tags, he can use treebuilder to > get those nodes, and then work with them. I see a host of Javascript > modules on CPAN. > > Here's why I would not try to write this in Python just yet, unless i > had the time: > > Perl -MCPAN -e shell; > > cpan> i/JavaScript/ > > > use a module from CPAN, use a few modules from CPAN, or patch and > improve a module! I don't think people here are interested in Python/Perl comparisons. Nevertheless, the problems with the existing libraries for either language are roughly the same, AFAICT (I too started with existing libraries. That was kind of the easy part). Are there a *specific* set of HTML parsing, HTML (not XML) DOM-building (with all the
Re: Javascript and WWW::Mechanize or LWP::UserAgent?
[Warren Pollans] > The problem I'm running into is "trying to deal with scripts that use > javascript" - so far, I've had to ignore them or, at least, those [deborah sciales] > You might also try writing your own javascript parsing routines? That's not a small amount of work you've just set Warren to do. :-) (speaking as somebody who made a semi-serious attempt at it, in Python) John
Re: Javascript and WWW::Mechanize or LWP::UserAgent?
On Mon, 11 Jul 2005, Warren Pollans wrote: > I've been using WWW::Mechanize to automate testing of cgi scripts - > works great! > > The problem I'm running into is "trying to deal with scripts that use > javascript" - so far, I've had to ignore them or, at least, those [...] > I really like being able to test from my unix box instead of having to > find a windows box to run quicktest pro or winrunner on. Try Selenium instead of mechanize. Not ideal for scraping (you have to drag in the entire browser, including its GUI, to use it, and the "driven" mode is currently buggy), but good for functional testing. Written in cross-browser JavaScript (in good OO style, too). http://selenium.thoughtworks.com/ It's quite new, but it's the only free tool in its class (cross-browser playback of functional tests) that I know of. With any luck, somebody may write a decent test recorder for it too (ie. record a test simply by doing what you'd do if you were manually testing -- ATM there are HTTP-proxying test recorders that try and do this, but one written in JS is what's really needed). John
Re: Authentication problem?
On Sat, 12 Mar 2005, Andrew Johnson wrote:
> I've been wrestling with a script to scrape some information off of
[...]
> What else could I try?
[...]

Hi Andrew

Read some past messages on this list from me. I think I've made the same guesses about fifty times now ;-/ and most of the debugging hints are always taken from the same fairly small set. Feel free to come back if you've tried those and are still stuck, of course! IIRC this list is on gmane now, so it should be easy to search.

Does this list have a FAQ, anybody?

My own FAQs (Python, not Perl, but that's not really all *that* relevant):

http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
http://wwwsearch.sourceforge.net/ClientCookie/doc.html#debugging

Other standard responses:

1. use WWW::Mechanize
2. perldoc lwpcook

John
Re: Architectural Question rgd LWP::UserAgent, WWW::Mechanize
On Sat, 5 Mar 2005, Robert Barta wrote: > I am using WWW::Mechanize/LWP and some of their subclasses now for > several things and I see an architectural problem I will be facing in > some future: > > For downstream developers (and for me) I need to offer a facility > to choose a user agent which supports a number of features: > > - local caching > > - specialized cookie handling for specific web sites > > - scripting (controlling the user agent via a dedicate language > and not via Perl method calls to WWW::Mechanize). > > - triggering of application specific code at particular events > (page loaded, link selection, page unload) > > - maybe optional JavaScript/DOM coverage later > > Now much of this functionality is already there (I have implemented > scripting recently), but somehow spread over several packages in > incompatible ways. But for a downstream developer it is not possible > to say something like this: > > my $ua = new LWP::UserAgent::Pluggable; > > $ua->add_plugin (new LWP::UserAgent::Plugins::Cache (size => '4M')); > $ua->add_plugin (new LWP::UserAgent::Plugins::Scriptable (plan => ...)); > $ua->add_plugin (new LWP::UserAgent::Plugins::Hooks ( > ('http://specialsite/page' => sub { do something; })); > > Does this make sense? Yes! Python's urllib2 works like this, so I'm sure looking at that is well worth the time if you want something similar in Perl. I extended it in a fairly simple way in Python 2.4, and it now works quite nicely to support all kinds of things (cookies, auth (various flavours), http, ftp, gopher etc., refresh handling, referer handling, http-equiv, redirection, seek()-able responses, robots.txt observance...) using a single, relatively simple, plugin handler interface. Caching (of both content and connections) would naturally and easily fit into that. 
Recently noticed the yum package manager / urlgrabber developers have added more features (what I assume are decent implementations of throttling, persistent connections, mirror selection, etc ...), I assume mostly using the same plugin handler system (though they're pretty application-focused).

There's no requirement to shoehorn everything into some elegant scheme in order to enable customisation and re-use, though, is there? Module designs need effort expended to keep them open and reusable, true, but that doesn't mean (mythical) perfect genericity (although really generic interfaces can sometimes be just the ticket and very useful, as with urllib2's handlers).

A few examples of where, despite urllib2's rather nice handlers, I don't feel a need to fit into any grand generic interface: For cookie policy, I have (in ClientCookie, and now cookielib in Python stdlib), CookiePolicy objects -- *not* a handler -- rather, each cookie handler *has* a CookieJar, which *has* a CookiePolicy. Hooks as you describe might well be done best with explicit support from standard handlers, I would guess (though I wouldn't know for sure 'till I try). Mind you, I have a couple of useful debug handlers, eg. for printing redirected response bodies.

Never tried scripting, but I don't see any obvious reason for wanting that as a plugin handler in the urllib2 sense (FWIW, never looked at it, but I know there's a scripting system based on urllib2 + my libraries (in turn based in large part on ports from LWP), called PBP). I've not considered more elaborate generic plugin systems that might offer the opportunity for having eg. this kind of scripting as a plugin to some browser object (too much else more valuable I could do first!), but maybe that'd be an interesting idea to think about a bit.

In my port of WWW::Mechanize, I added simple methods back on top of the urllib2 handler system, mostly for convenience of *removing* handlers without rebuilding an opener object each time (eg.
Browser.handle_refresh(handle) -- where handle is a boolean arg). Works fairly nicely, I think. I also started on Javascript support. You need a browser model for that (same goes for proper Referer handling, though eg. my mechanize.HTTPRefererProcessor is written as an object that works just like any other handler -- it just happens to use a Browser class in its implementation), so the sort of handlers I refer to above aren't the main issue. See DOMForm and python-spidermonkey here: http://wwwsearch.sourceforge.net/ Enough rambling. Hope this helps stir you to write something interesting and share it... John
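For readers who haven't seen the handler system being described, here is a small sketch using urllib.request, which is what urllib2 became in Python 3. The HookProcessor class and its on_load callback are invented for illustration (they are not part of the standard library or of mechanize); they show one way a "page loaded" hook could sit alongside the stock handlers.

```python
import urllib.request

class HookProcessor(urllib.request.BaseHandler):
    """Runs a callback on each response, in the spirit of a 'page loaded' hook."""

    def __init__(self, on_load):
        self.on_load = on_load

    def _fire(self, request, response):
        self.on_load(request.full_url)
        return response  # processors must hand the response on

    # The opener finds processors by method name: <scheme>_response.
    def http_response(self, request, response):
        return self._fire(request, response)

    def https_response(self, request, response):
        return self._fire(request, response)

    def data_response(self, request, response):
        return self._fire(request, response)

loaded = []
opener = urllib.request.build_opener(HookProcessor(loaded.append))

# A data: URL keeps the demonstration off the network.
body = opener.open("data:text/plain,hello").read()
```

The same opener would fire the hook for http and https URLs too. Cookie handling, redirects and so on are just further handlers passed to build_opener, which is the plugin-style composition the original question was asking for.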
Re: R: Help!! I'm stuck! - using LWP for single sign on purposes
On Tue, 1 Mar 2005, Andrea Setti wrote: > Thank you for the answer. > > I had a look th WWW::Mechanize and it does almost everything that i need. > > The only thing i cannot understand is: how can i forward the cookie to the > real browser? > I need to fetch it from the real login page and then forward it to the > referrer... [...] You don't have to forward the cookies explicitly -- it's all done under the covers automatically. You just have to make sure it's switched on. I don't recall if cookie handling is on by default in mechanize, though. I would imagine so, but don't trust me: read the docs. John
Re: R: Help!! I'm stuck! - using LWP for single sign on purposes
On Tue, 1 Mar 2005, Peter Stevens wrote: > HTTP::Cookies has two submodules. one for Mozilla browsers and one for > Microsoft browsers. Unfortunately the MS version does not support saving > the cookies. (BTW - everybody knows, Firefox is the better browser ;-) ). [...] Those are only needed if you want to interoperate with those browsers. Use HTTP::Cookies itself otherwise. John
Re: Mechanize - redirect problem
On Fri, 25 Feb 2005, Martin Kos wrote: > hi john > > > It wants this header (or similar, but this is a minimal one): > > Accept: text/html > i have added this header and it just works!!! thanks a LOT! > > > Maybe mechanize should sent an Accept header by default? > i think that would be a good idea for the text/html type. > > > BTW, Martin: I debugged this by just looking at what Firefox sends. Get > > livehttpheaders. > very handy firefox-plugin! i haven't knew it before. > how have you "see" that mechanize is missing the accept-header and that > the servers "needs" it ? was it only a guessing because firefox sends it? 1. Blindly copied firefox headers that I noticed mechanize (in fact, Python httplib/urllib2/mechanize) didn't send, or had obviously different values (the latter, in the case of Accept). 2. Saw that it now worked. 3. Deleted hdrs until it stopped working again :-) John
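The three-step procedure above (copy the browser's headers, confirm it works, delete headers until it breaks) can be sketched as a small loop. This is Python, and everything specific in it is assumed: the header values are placeholders and pretend_server_accepts stands in for "send the request and see whether the right page comes back".

```python
def minimal_headers(headers, works):
    """Drop headers one at a time, keeping only those the server needs."""
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if works(trial):
            needed = trial  # request still succeeds without it
    return needed

# Hypothetical header set copied from a browser.
browser_headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "Accept-Language": "en",
    "Accept-Encoding": "gzip",
}

# Stand-in for a real request: this pretend server only insists on Accept.
def pretend_server_accepts(trial):
    return "Accept" in trial

result = minimal_headers(browser_headers, pretend_server_accepts)
```

With a real server the works callback would issue the request each time, so the loop costs one request per header; for a handful of headers that is usually acceptable.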
Re: Mechanize - redirect problem
On Tue, 22 Feb 2005, Martin Kos wrote:
[...]
> i try to login to the page http://mymobile.sunrise.ch/ but it seems like
> mechanize is not doing the redirect that is on the start site... if i
> try with my browser or wget i get redirect to a page like
> http://mymobile.sunrise.ch/portal/res/guest;jsessionid=HCCISJ1USYYSVQFIGZAXRAQ?paf_dm=full&paf_gear_id=11&?successURL=/portal/res/member%3Bjsessionid%3DHCCISJ1USYYSVQFIGZAXRAQ
>
> i tried it with a simple "get" but it doesn't work and i don't see what
> the problem could be... any idea what i'm doing wrong?

It wants this header (or similar, but this is a minimal one):

Accept: text/html

Maybe mechanize should send an Accept header by default?

BTW, Martin: I debugged this by just looking at what Firefox sends. Get livehttpheaders.

John
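In Python's urllib.request, attaching that header to a request looks like the sketch below (the URL is a placeholder and nothing is sent). The Perl equivalent would set the same header on the user agent or the request object.

```python
import urllib.request

# Placeholder URL; constructing a Request does not touch the network.
request = urllib.request.Request(
    "http://example.test/",
    headers={"Accept": "text/html"},
)

# urllib.request.urlopen(request) would then send it with that header.
accept = request.get_header("Accept")
```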
Re: automating javascript data forms
On Tue, 18 Jan 2005, Edward Peschko wrote: > hey all, > > I've got a data retrieval problem - I need to get data from a secure > website (ie: https) which has forms using javascript. > > What base technology can I use to do this? Will LWP suffice? > > I can't believe this isn't a FAQ - I searched up and down for this, > without luck. Is there an easy workaround around javascript? [...] http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html the bit on "Embedded script is messing up my web-scraping. What do I do?" applies just as much to Perl as to Python (with minor translation...). John
Re: automating javascript data forms
On Wed, 19 Jan 2005, Peter Stevens wrote: > Edward Peschko wrote: [...] > >If there was an integration between LWP and seamonkey, what form > >of integration would people feel would be most useful? [...] > I think seamonkey integration would be a good thing and see it as an > alternative to mech. Essentially the same methods as mech, but there > would be two advantages: > >1. Since the browser is supported by (an increasing number of) > websites, there will be fewer issues of "it works under > Firefox/IE6/etc, but not with my script". >2. support for javascript. A lot of sites use javascript to do > argument checking before dispatching to the actual link. I'd like > to invoke a method 'click on the button', have it do the > javascript and get/post/whatever the link. As it is, I have to use > hard coded URLs or do regex matching on the javascript to find > where the button actually posts to. Inelegant at best and fragile > at worst. [...] Do you mean spidermonkey (Mozilla's JavaScript interpreter)? Or do you mean Mozilla itself, through XP-COM? (Wasn't seamonkey the original project to get a working browser out of Netscape's source code? Or is there some project now to make the Mozilla source code usable as a library?) The latter would be essentially a replacement for LWP, rather than something that you would integrate with it. If you mean the former, that doesn't remove the need for LWP and mechanize. I got a first attempt at automatic JavaScript interpretation working for the Python port of mechanize and parts of LWP: http://wwwsearch.sourceforge.net/DOMForm/ http://wwwsearch.sourceforge.net/python-spidermonkey/ http://wwwsearch.sourceforge.net/mechanize/ If there's a good HTML DOM parser for Perl, it will be fairly easy to get something like this working with a few changes to Claes Jacobssen's JavaScript module (Perl wrapper of spidermonkey, which I borrowed from when doing the stuff above). 
Never did anything with it, though: I think it would be a LOT of work to make it work really well, certainly if nobody has already written a browser-style (rather than standards-compliant!) HTML DOM for you. I had to hack a DOM together from somebody's unmaintained pre-standards implementation of the HTML DOM. The tree builder literally gave me a headache (the version on my web site is certainly very incorrect, though if anybody is interested in doing a Perl version, I can probably dig out some patches that people sent me to make it work something approaching correctly). I don't want to put people off, though: a module that gives a useful level of compatibility with real browsers that is much better than my effort is quite doable in somebody's spare time, I think. It would be nice to see it done well -- browsers are such heavy things to drag into your code when all you want to do is fetch one lousy URL without poring over somebody else's JavaScript! John
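To give a feel for why "getting something working" is the easy part, here is a toy tree builder on top of Python's html.parser. The Node and TreeBuilder classes and the find_all helper are invented for this sketch. It ignores text, implied end tags, script content, events and every browser quirk, which is exactly the work described above as the hard part.

```python
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "meta", "link"}  # never get an end tag

class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag, self.attrs = tag, dict(attrs or [])
        self.parent, self.children = parent, []

class TreeBuilder(HTMLParser):
    """Naive builder: mostly trusts the page to nest and close its tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name, if any.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

def find_all(node, tag):
    """Depth-first list of every node named tag at or below node."""
    hits = [node] if node.tag == tag else []
    for child in node.children:
        hits.extend(find_all(child, tag))
    return hits

builder = TreeBuilder()
builder.feed('<form action="/go"><input name="q"><p>Hi<br>there</form>')
inputs = find_all(builder.root, "input")
forms = find_all(builder.root, "form")
```

Even this tiny input already relies on a guess (the unclosed p is silently closed when the form ends). Real pages force many more such guesses, and they have to match what browsers do.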
Re: asp sessions
On Thu, 30 Dec 2004, [EMAIL PROTECTED] wrote: [...] > Why does WWW::Mechanize get directed to cookieerror.htm? Beats me too. I get the same problem with this site using Python's urllib2 &c. I tried near-identical headers to Firefox, and got the error page. Another guess: Sounds odd, I know, but I wonder if IIS/ASP.NET is insisting on a persistent connection? I haven't tested that theory, but I do notice that neither Lynx, nor Python/urllib2, nor libwww-perl (IIRC on the latter) use persistent connections, and all fail; Firefox does, and succeeds. [...] > I tried to establish a session with the server using IO::Socket::SSL > using the example from the POD. You can see the code at the following > URI: [...] Dropping back to a low level is obviously the right plan of attack, but I can't help with the Perl code... John
Re: asp sessions
On Thu, 30 Dec 2004, [EMAIL PROTECTED] wrote: > > I am trying to determine why the following commands to WWW::Mechanize::Shell > result as they do: > > [EMAIL PROTECTED] trwww]$ perl -MWWW::Mechanize::Shell -e 'shell' > >get https://www.setsivr.odjfs.state.oh.us/Login.asp > Retrieving https://www.setsivr.odjfs.state.oh.us/Login.asp(200) > https://www.setsivr.odjfs.state.oh.us/cookieerror.htm> [...] A guess: JavaScript setting a cookie? John
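If the guess is right, the usual workaround is to read the cookie name and value out of the page's script and put the cookie in the jar by hand before requesting the next page. A Python sketch with http.cookiejar follows; the make_cookie helper, the cookie name "cookietest" and the host are all hypothetical.

```python
import http.cookiejar
import urllib.request

def make_cookie(name, value, domain, path="/"):
    """Build a session cookie by hand, as if script had set document.cookie."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )

jar = http.cookiejar.CookieJar()
# Hypothetical name and value; read the real ones out of the page's script.
jar.set_cookie(make_cookie("cookietest", "1", "www.example.test"))

# Ask the jar which Cookie header it would attach to the login request.
request = urllib.request.Request("https://www.example.test/Login.asp")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
```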
Re: Warnings in HTTP::Cookies
On Mon, 27 Sep 2004, Ed Avis wrote: [...] > It's difficult to produce a self-contained test case since this > program spends its time hitting someone else's website. I hope that [...] Easy but helpful would be to turn on HTTP::Cookies' debugging and post the output (censored if necessary). John
Re: Help, Please: Can't Get a Hold of
On Mon, 13 Sep 2004, Daniel E. Doherty wrote:
[...]
> Here is the javascript function that gets invoked:
> function FormSubmit(objForm)
> {
>     var strVersion = new String(navigator.appVersion);
>     var arrVersion = strVersion.split(" ");
>     var intVersion = new Number(arrVersion[0]);
>
>     objForm.BrowserName.value = navigator.appName;
>     if (navigator.appName == "Netscape")
>     {
>         //alert("here");
>         objForm.action = "NSPrint.asp";
>     }
>     objForm.submit();
> }
>
> One of the solutions you recommend is to do in python what the script
> does. It looks to me like this script just submits the form or prints
> for Navigator (though I don't know much about javascript). It would
[...]

The last line submits the form. The rest of it doesn't. For example, objForm.action = "blah" sets the form element's action attribute to "blah", thus causing objForm.submit() to submit to a different URL. Google for "HTML 4.01 spec", and read the documentation for the Form element's action attribute.

Then try looking at the HTML that contains the JavaScript function above, and figure out what objForm.BrowserName.value = "whatever" does (search for "BrowserName" in the HTML).

John
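Doing "in Python what the script does" could be sketched as below. The page URL, the original action "Print.asp" and the BankId field are invented; BrowserName is assumed to be an ordinary (probably hidden) form control, and the form is assumed to be sent as a urlencoded POST. Check all of that against the real HTML.

```python
from urllib.parse import urlencode, urljoin

def form_submit(page_url, form_action, fields, app_name):
    """Mimic FormSubmit(): fill BrowserName, maybe retarget, then 'submit'."""
    data = dict(fields)
    data["BrowserName"] = app_name           # objForm.BrowserName.value = ...
    action = form_action
    if app_name == "Netscape":
        action = "NSPrint.asp"               # objForm.action = "NSPrint.asp"
    # objForm.submit(): resolve the action against the page and encode fields.
    return urljoin(page_url, action), urlencode(data).encode("ascii")

# Hypothetical page URL, original action and field values.
url, body = form_submit(
    "http://example.test/reports/form.asp",
    "Print.asp",
    {"BankId": "12345"},
    "Netscape",
)
```

The returned url and body are what a real request would then carry. Which appName value to pretend to be decides which of the two server-side scripts receives the form.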
Re: Help, Please: Can't Get a Hold of
On Wed, 8 Sep 2004, Daniel E. Doherty wrote: > I hit a page on the FDIC website that allows me to download Bank > Performance Reports, so-called "Call Reports." I can fill in the fields > on the page, but the button that kicks off the file transfer is > generated by an HTML tag like this: > > [...] > Is this right? I'm no HTML expert, but my 1998 O'Reilly book on HTML, > covering HTML 4.0 states that is a synonym for > . No, they're not synonymous. Quoting from my FAQ on my Python module based on HTML::Forms (see second bullet point -- your callback JavaScript function is FormSubmit()): http://wwwsearch.sourceforge.net/ClientForm/#faq Why does .click()ing on a button not work for me? - Clicking on a RESET button doesn't do anything, by design - this is a library for web automation, not an interactive browser. Even in an interactive browser, clicking on RESET sends nothing to the server, so there is little point in having .click() do anything special here. - Clicking on a BUTTON TYPE=BUTTON doesn't do anything either, also by design. This time, the reason is that that BUTTON is only in the HTML standard so that one can attach callbacks to its events. The callbacks are functions in SCRIPT elements (such as Javascript) embedded in the HTML, and their execution may result in information getting sent back to the server. ClientForm, however, knows nothing about these callbacks, so it can't do anything useful with a click on a BUTTON whose type is BUTTON. - Generally, embedded script may be messing things up in all kinds of ways. See the answer to the next question. John
Re: How to simulate https secured login using lwp
On Wed, 8 Sep 2004, Joseph Alotta wrote: [...] > > thing, so I don't read any impoliteness into your request, but only > > because of conscious effort not to. [...] > He did say "please". I think his request was polite and to the point. > > You are asking too much from non-native speakers. Let's see how well > you do in his language. You're right: I probably should have said that in private email (which I guess would be OK, extrapolating from the fact that *I'd* rather be told if my use of a language came across as impolite). Sorry, Suryya: no impoliteness was meant on *my* end either! John
Re: How to simulate https secured login using lwp
On Wed, 8 Sep 2004, Suryya Ghosh wrote: > How to simulate https secured login using lwp? [...] > I have open ssl installed in my system and Net ssleay 2.25 installed in > my system , but while loging in i am getting arespnse of 302 Moved > Temporarily Doesn't sound like an SSL problem. Does it redirect you somewhere you don't expect to end up? Or are you asking how to follow the redirection? [...] > please provide me a code sample. A friendly tip: no matter how politely phrased, directly commanding people to give you code comes across as rude. "Does anyone have any code samples?" or "I'd be grateful if anyone can show me..." work fine. I know there are big differences between languages and cultures in this kind of thing, so I don't read any impoliteness into your request, but only because of conscious effort not to. John
Re: Cookie2: $Version="1" by default? (fwd)
On Mon, 9 Aug 2004, Andy Lester wrote: [...] > LWP is RFC-compliant. Gisle has done a marvelous job of making sure it > does just what the RFC says. > > WWW::Mechanize is a subclass and superset of LWP that does more > "browser-like" stuff. Mechanize is meant as a browser in an object, > whereas LWP does the strictly correct thing. > > That division of labor has served us well for a few years now. [...] I guess that makes a fair amount of sense. RFC 2965 really is dodo-like in its dead-ness, though. I think it's unusual in that respect: Most RFCs are much healthier, even if they're frequently ignored. The only practical uses of 2965 code I can think of are on intranets (*maybe*), and 2109 cookies (viz, those that have a Version attribute of 1 and arrive in a Set-Cookie: header). AFAICT, it makes sense to treat 2109 cookies as if they were 2965 cookies. Not sure if LWP does so, though, and I don't imagine it's a burning issue in many users' minds John
Re: Cookie2: $Version="1" by default? (fwd)
[Juan] > Why Cookie2: $Version="1" is still sent by default by LWP? > No browser sends that header by default, neither MSIE, Mozilla nor Konqueror. > > I suggest to remove it (at least by default). > I find much more useful to make LWP masquerade as MSIE instead > following an RFC nobody follows. [...] I agree that RFC 2965 handling should probably be switched off by default. Did you actually run into a problem though, or are you just paranoid?-) John
Re: WWW:Mechanize help clicking button
On Thu, 5 Aug 2004, Joseph Alotta wrote: [...] > I think > there is something going on in the java code in the first part. [...] That sounds like a fair bet. Your next step is to figure out what that something is. (Actually, it's JavaScript code. Java != JavaScript -- the two are quite different (JavaScript is much better designed <0.5 wink>).) Luckily, it really doesn't require any JavaScript knowledge to figure out what the code does -- if you know Perl, you won't have any trouble guessing what's going on. You just have to knuckle down and read it. John
Re: javascript, and cookie
On Mon, 9 Aug 2004, Richard Lawson wrote: [...] > I saw your post at libwww mail list but no answer. > It's an old post but it's the closest thing in the archives to my issue. > I have a similar problem where the page sets a client cookie and I need to > set it in LWP, but I can't seem to confirm that the cookies are being sent. [...] 1. Get a copy of ethereal 2. Turn on LWP's debug output HTH John
Cookie2: $Version="1" by default? (fwd)
Juan asked me to forward this to this list. (just this once, Juan; get yourself a free email account to post from -- eg. fastmail.fm) John -- Forwarded message -- Date: Mon, 09 Aug 2004 21:27:18 GMT From: JUANMARCOSMOREN <[EMAIL PROTECTED]> To: John J Lee <[EMAIL PROTECTED]> Subject: Cookie2: $Version="1" by default? Could you post this message to <[EMAIL PROTECTED]> ? I did not post there in the first place because perl.org has changed its policy to not allow mails from terra.es: > Recipient: <[EMAIL PROTECTED]> > Reason: Mail from terra.es rejected because it does not accept bounces. This > violates RFC 821/2505/2821 http://www.rfc-ignorant.org/ --8<-- Why Cookie2: $Version="1" is still sent by default by LWP? No browser sends that header by default, neither MSIE, Mozilla nor Konqueror. I suggest to remove it (at least by default). I find much more useful to make LWP masquerade as MSIE instead of following an RFC nobody follows. Could people on this list at least express their feelings about this subject? Do you want something that works or just something that follows an RFC nobody follows? Making LWP more MSIE compliant will make it more useful for everyone. Juan
RE: url/query question...
bruce, please don't cross-post unless you have some valid reason for it. On Mon, 28 Jun 2004, bruce wrote: [...] > however, if you examine the headers between the server/browser app, you can > more or less.. see what's being transfered back/forth... in this case, the > content/post data is available, and looks to be some ~6k of data... Sure. > it was my understanding that combining this information with the URl, > """should""" be able to get to the targeted page.. assuming all things are > equal.. however.. this does not appear to be the case.. There's no "should" here. No standard (prescriptive or de-facto) says that URL query string parameters and POST data are interchangeable. In many cases, server code merely *happens* to work that way. > i've been able to successfully simulate what a post does with a number of > sites, by simply combining the URL with the requisite data and dropping the [...] Yup, no surprise there. Equally, no surprise that it *doesn't* work in other cases. John
Re: url/query question...
This is nothing to do with win32, so I've cut that list from the To: line. On Sun, 27 Jun 2004, bruce wrote: [...] > i was under the impression that if i concatenated the url and the > content/query from the headers, that i'd be able to "simulate" the submit What do you mean by "the content/query from the headers"? I guess you mean the POST data? POST data != header data. An HTTP request contains 1. GET / POST line (containing the URL path), 2. headers, and 3. data. If you're taking a POST request you sniffed by some means, and issuing the corresponding GET request (GET /foo.cgi?post=data&goes=here HTTP/1.1), then, yes, whether or not that works is indeed entirely dependent on the way the code on the server was written. [...] > with the stjohn's site, the header information indicates that ~6-8k of > information is in the content portion of the URL. could this be correct?? Yes. > when i try to stuff this much (cut/paste) into the browser url/address it > cuts it off.. Don't do that, then. Do a POST instead, using LWP. > i was under the impression that you were limited with regards > to the size of the content/query portion of the URL... [...] Apparently so. POST data is not part of the URL, though. John
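To make the three parts of a request concrete, here is a small sketch using Python's standard urllib (nothing is sent over the network). The host example.com is a placeholder; the path and parameters are the ones from the GET example above. The same encoded parameters end up in the request line for GET and in the body for POST.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"post": "data", "goes": "here"}
encoded = urlencode(params)

# GET: the parameters travel in the URL, as a query string on the request line.
get_req = Request("http://example.com/foo.cgi?" + encoded)

# POST: the URL stays short and the parameters travel in the request body.
post_req = Request("http://example.com/foo.cgi", data=encoded.encode("ascii"))

for req in (get_req, post_req):
    print(req.get_method(), req.selector, "body:", req.data)
```

Because the POST body is not part of the URL, it is not subject to whatever length limit a browser's address bar imposes.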
Re: www::mechanize issues
On Sat, 29 May 2004, bruce wrote:
> hi...

Hi

[...]
> basically, i'm looking to be able to get class schedule information from the
> http://lca.lehman.cuny.edu/dept/registrar/schedule/coursefinder.asp site.
[...]

    #!/usr/bin/perl -w
    use WWW::Mechanize;

    my $b = WWW::Mechanize->new();
    $b->get("http://lca.lehman.cuny.edu/dept/registrar/schedule/coursefinder.asp");
    $b->form_number(1);    # WWW::Mechanize numbers forms from 1
    print $b->current_form()->dump();
    $b->field("u_input", "CHE");
    $b->field("sortby", "Instructor");
    # Save the result page so it can be inspected in a browser.
    open(F, ">out.html") or die "can't write out.html: $!";
    print F $b->submit()->content();
    close(F)

or you could defect: http://wwwsearch.sf.net/mechanize/

    #!/usr/bin/env python
    import mechanize

    b = mechanize.Browser()
    b.open("http://lca.lehman.cuny.edu/dept/registrar/schedule/coursefinder.asp")
    b.select_form(nr=0)
    print b  # I confess this is actually a bit of an accident (see below)
    b["u_input"] = ["CHE"]
    b["sortby"] = ["Instructor"]
    f = open("out.html", "w")
    f.write(b.submit().read())
    f.close()

Why does 'print b' print the current form? Because Browser delegates all unknown attribute access to ClientForm.HTMLForm. .__str__() is one such method. For almost everything, this is useful, but it's not really what one wants in this particular case. Since mechanize (the Python module) is still alpha (though fairly dilute of bugs, I believe), I shall go away now and add Browser.form_as_string() and Browser.__str__() methods :-)

John
Re: URI support for OpenURL
On Thu, 13 May 2004, Tim Brody wrote: [...] > To the best of my knowledge there aren't any other standards for the > transport of bibliographic data through URIs. Sounds fair enough to me. > Besides that, OpenURL is > likely to become the standard method of linking within the multi-billion > dollar scholarly publishing industry: > http://www.crossref.org/02publishers/16openurl.html [...] Oh joy. racket-is-a-more-accurate-word-than-industry-ly y'rs John
Re: cookie handling patch
On Thu, 1 Apr 2004, JUANMARCOSMOREN wrote: [...] > > >Aleksandr Guidrevitch wrote: > > >>We've found that LWP incorrectly handles cookies > > >>containing ';' in the cookie value. > > >>The patch (test case and fix) is attached [...] > So, why do you want ';' in cookies if they are not handled > correctly by the most used HTTP implementations (MSIE and Mozilla)? Right. > > According RFC in **quoted** string you can put almost anything. > > See http://www.cse.ohio-state.edu/cgi-bin/rfc/rfc2109.html for > > definition of cookie: > > People don't care much about the HTTP RFC what people really want is to > be compatible with MSIE and Mozilla. [...] The algorithm in browsers has apparently always been pretty much 'split on ';'', so right again. John
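A minimal sketch of the "split on ';'" behaviour described above, to show what it does to a quoted value containing a semicolon. This is an illustration of the naive algorithm only, not the actual code of any browser or of LWP; the cookie name and value are made up.

```python
def naive_cookie_split(header_value):
    """Split a Set-Cookie value on ';' with no regard for quoting."""
    parts = [p.strip() for p in header_value.split(";")]
    name, _, value = parts[0].partition("=")
    return name, value, parts[1:]


# A quoted value containing ';' is cut at the first semicolon.
name, value, attrs = naive_cookie_split('sid="a;b"; path=/')
print(name, value, attrs)
```

The value comes back truncated at the semicolon and the remainder is treated as a (meaningless) attribute, which is why servers that want to interoperate avoid ';' in cookie values regardless of what the RFC's quoted-string syntax allows.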
ANN: mailing list for Python web client / URL programming
[yeah, I know this is a Perl list, but I thought people here might be interested, since I like to follow the Perl list] A new list for discussion of anything related to either web-client software or URL-processing / -fetching software written in Python. This includes, but is not limited to, the software at the wwwsearch.sf.net site. To subscribe or post messages to the list ([EMAIL PROTECTED]), visit the Mailman Info Page: http://lists.sourceforge.net/lists/listinfo/wwwsearch-general John
Re: HTTP traffic? (use LWP::Debug qw(conns); not working)
On Sun, 25 Jan 2004, Philippe 'BooK' Bruhat wrote: > Le dimanche 25 janvier 2004 à 21:22, John J Lee écrivait: > > > > > In fact, you can already use HTTP::Proxy to see inside a HTTPS connection: [...] > > Any recommendations for a specific one? > > Well, I was talking about my pet module, HTTP::Proxy. Version 0.12 on a > CPAN mirror near you. ;-) [...] Gah, sorry, not reading carefully again, am I? Thanks. John
Re: HTTP traffic? (use LWP::Debug qw(conns); not working)
On Sun, 25 Jan 2004, Philippe 'BooK' Bruhat wrote: > Le dimanche 25 janvier 2004 à 15:46, John J Lee écrivait: > > > > BTW, anybody have any tips on software / usage thereof for HTTPS proxying, > > for debugging purposes, and how to set up with LWP? I've always used > > browser plugins or debugging output from (Python) code until now. > > > > Well, in June 2004, I suppose HTTP::Proxy will support working in a man > in middle manner, so this kind of thing should be quite easy to do. Whoops -- I scanned that thread, then forgot about it ;-) Thanks. > (I have yet to understand how to use Net::SSLeay, though.) Worked out of the box for me (I'm running somebody else's code, though). > In fact, you can already use HTTP::Proxy to see inside a HTTPS connection: > set HTTPS_PROXY to point to your HTTP::Proxy proxy, use env_proxy > with your LWP::UA object. LWP::UA does a GET https://www.example.com/ > to the proxy, which will fetch the data with SSL, and return it in a > plain (cleartext) HTTP session. [...] Any recommendations for a specific one? Hmm, do these proxies check certificates / revocation lists when they do that (not that I care, particularly -- just curious)? What happens when they fail, if so? John
HTTP traffic? (use LWP::Debug qw(conns); not working)
Attempting to look at the network traffic generated by a Perl program that uses LWP for doing HTTPS POSTs, I put this in the driver script: use LWP::Debug qw(conns); But, though I see some debugging messages, I don't actually see the HTTP headers or body data. Same happens with plain HTTP (no SSL involved). The docs say: conns : show all data transfered over the connections I don't get HTTP headers or data from this, either: use LWP::Debug qw(+); What's wrong? I'm using LWP 5.69, Perl 5.6.1, Debian 2.2. BTW, anybody have any tips on software / usage thereof for HTTPS proxying, for debugging purposes, and how to set up with LWP? I've always used browser plugins or debugging output from (Python) code until now. John
Re: Problem logging on to site with MECHANIZE
On Fri, 23 Jan 2004, Gedanken wrote: > On Fri, 23 Jan 2004, John J Lee wrote: > > Yuck. Does it also work if you wave a dead chicken at it? ;-) > > Why not check the HTTP headers to find out what's going wrong? > > the headers are identical as far as i can tell. after all, the code > snippet i sent doesnt actually change anything. whether its mechanize > having problems or the javascript on the servers, i have not a clue. > > i agree with your chicken waving comment, i just dont have a better > explanation. Forgot to add: the browser must be doing it "right", and if the browser reloads, it's not doing that because it happens to feel like it: There must (presumably!) be some reason -- even if bogus -- why it does so. John
Re: Problem logging on to site with MECHANIZE
On Fri, 23 Jan 2004, Gedanken wrote: > On Fri, 23 Jan 2004, John J Lee wrote: > > Yuck. Does it also work if you wave a dead chicken at it? ;-) > > Why not check the HTTP headers to find out what's going wrong? > > the headers are identical as far as i can tell. after all, the code [...] Did you actually check to make sure (eg. with ethereal)? John
Re: Problem logging on to site with MECHANIZE
On Fri, 23 Jan 2004, bzzt wrote: > I'm trying to log on to this site (www.thecityvibe.com/forum/) with the > followin script but doesn't seem to succeed. Anyone knows what the problem > might be? (without reading your script): no cookie jar? I don't recall if WWW::Mechanize makes one by default if none is supplied to the constructor. If not, that could be your problem. Look at the HTTP headers. What do you get back from the server? John
Re: Different outcomes with same request
On Fri, 23 Jan 2004, Justin Cook wrote: [...HTML saved from browser and fetched with LWP appear different...] > going on here? Is it the difference between a dynamic page and a static > page being posted to? No. > Am I not recieving all the chunks of response in > time to get the transaction id? Is my regex just plain lame? Could be (I haven't checked). > I'm a > semi-newbie and have played with this for several days but to no avail. > Any help would greatly be appreciated. Change your script to save the response data (ie. the HTML) you fetched with LWP to a file. Compare it with the HTML you saved from your browser. If you find the data LWP is fetching seems wrong, you have to figure out what LWP is doing different from your browser -- if you get stuck there after some effort, ask here again. If you find that the data LWP is fetching seems OK, debug your parsing code. John
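The comparison step can be sketched with Python's difflib. The two strings below are made-up stand-ins for the file saved from the browser and the file written out by the script; in practice they would be read from disk.

```python
import difflib

# Hypothetical contents: what the browser saved vs. what the script fetched.
browser_html = "<html>\n<input name='txn' value='12345'>\n</html>\n"
script_html = "<html>\n<p>Please enable cookies</p>\n</html>\n"

diff = list(difflib.unified_diff(
    browser_html.splitlines(),
    script_html.splitlines(),
    fromfile="browser.html",
    tofile="script.html",
    lineterm="",
))
print("\n".join(diff))
```

If the diff shows an error page or a missing field on the script's side, the problem is in the requests being sent; if the two documents match, the problem is in the parsing code.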
Re: Problem logging on to site with MECHANIZE
On Fri, 23 Jan 2004, Gedanken wrote: [...] > basically i manually set the form action... to the same thing it was > already set to. and voila, stuff starts working. Ill edit your version > below to show you what i mean. Yuck. Does it also work if you wave a dead chicken at it? ;-) Why not check the HTTP headers to find out what's going wrong? John
Re: Different outcomes with same request
On Fri, 23 Jan 2004, Philippe 'BooK' Bruhat wrote: [...] > Maybe the transaction is put in the page by some javascript > (document.print?). Your browser saves the resulting page, while > WWW::Mechanize works on what the server sends. No, browsers always save the original document. At least, that's what they've always done when I've asked them to save... John
Re: found my mech problem
On Tue, 23 Dec 2003, Gedanken wrote: [...] > because, unbeknownst to me, the action for that form happened to have the > phrase '&lang=FR' in it. well apparently &lang has a special meaning, as > i can see from my request object that it has been encoded into an escape > sequence against my will =) [...] What byte string do you get? John
Re: RFC: WWW::Mechanize::Compress or LWP patch?
On Wed, 3 Dec 2003, John J Lee wrote: [...] > Not in KDE 3.2: it decompresses automatically, so when you save or open > with KWrite, it's just 200_gzip.xml. ...and I'd take a guess that's because Safari (Apple's browser based on Konqueror) does the same, because 3.2 apparently includes a lot of changes merged back from Safari. John
Re: RFC: WWW::Mechanize::Compress or LWP patch?
On Wed, 3 Dec 2003, Gisle Aas wrote: > [EMAIL PROTECTED] writes: [...] > > http://diveintomark.org/tests/client/http/200_gzip.xml > > > > IE "just does it". > [...] > Konqueror suggest saving or opening the file in an > external app, but the file saved or given to an external app is still > gzipped. Not in KDE 3.2: it decompresses automatically, so when you save or open with KWrite, it's just 200_gzip.xml. John
HTTP::Cookies and URI character encodings
I think there might be a problem with _normalize_path, from HTTP::Cookies. I'll explain what happens with my Python port, because I have no idea how Perl and unicode interact: a unicode URI got passed to my equivalent of _normalize_path() (a unicode string is a separate type from an ordinary byte-string in Python). That function complained because there were non-ASCII characters in the unicode string, and it refused to guess which encoding to use. The stated purpose of _normalize_path is to allow plain string-comparison of HTTP URI paths, but I don't understand a) how that's possible given that the URI character set isn't always known, and b) why it's necessary -- why not just compare without any normalization? The trouble is, RFC 2396 doesn't specify any URI character encoding, but allows %-escapes, which are defined in terms of octets. So, when you see a URI containing %-escaped chars, you have to know the original URI character encoding in order to work out what characters they represent. Unfortunately, I don't think that's always possible (is it?), so normalizing to "fully-escaped" form (as _normalize_path does) may involve assuming a different encoding than was used to partially escape the URI before HTTP::Cookies had anything to do with it. Escaping with inconsistent character encodings certainly seems bad. Am I correct? Why not just leave URIs un-normalized? If they must be normalized, how should unicode URIs (or non-ASCII ones, generally) get normalized? This is all very confusing, especially to an English speaker who never reads or writes anything but ASCII! John
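The ambiguity can be shown with Python 3's urllib.parse, which makes the encoding an explicit argument. The path below is an invented example; the point is only that the same characters escape to different octets under different encodings, and that unescaping with the wrong guess silently produces different characters.

```python
from urllib.parse import quote, unquote

path = "/caf\u00e9"  # '/café', a made-up path with one non-ASCII character

as_utf8 = quote(path, encoding="utf-8")
as_latin1 = quote(path, encoding="latin-1")
print(as_utf8)    # /caf%C3%A9
print(as_latin1)  # /caf%E9

# Unescaping with the wrong encoding gives a different string, not an error.
wrong = unquote(as_utf8, encoding="latin-1")
print(wrong == path)
```

Two paths that a server treats as identical can therefore compare unequal after "normalization" if the normalizer assumes a different encoding from the one used to produce the escapes.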
Re: Mechanize, Yahoo, and cookies
On Wed, 19 Nov 2003, John J Lee wrote: [...] > The Yahoo email login page is full of Javascript code doing complicated [...] BTW, as I must have said here before, the first thing everybody seems to do is to try to automate their Yahoo email account, so I'm sure there's lots of free pre-existing code around that already does this. John
Archive? [was: Re: Submiting a javascript...]
I was about to say "search the archives", but I can't find them. Surely they exist?? There are several places that have archives years out of date, and one with a couple of messages from 2003 and nothing else. Can the real libwww-perl archive stand up, please? On Sun, 16 Nov 2003, tv fw wrote: [...] > javacript [...] LWP doesn't handle JavaScript. Figure out what it does, then copy it using LWP, or get a browser to interpret it for you (eg. COM automation of MSIE -- eg. samie project seems to be a bunch of convenience functions layered on top of that, or use Java's httpunit, or Mozilla / XPCOM, or Konqueror / KParts or DCOP). John
Re: Mechanize, Yahoo, and cookies
On Tue, 18 Nov 2003, Brian Spiegel wrote: [...] > The launched browser, if the login was successful, should take me to my > inbox. However, I get a page stating that my browser doesn't allow cookies. > Has anyone attempted logins with Yahoo or any of these other services? Is > there something in their auth/cookie mechanism that needs special handling? The Yahoo email login page is full of Javascript code doing complicated stuff. Either read and understand it and copy what it does using LWP, or try something else: eg. automate MSIE. John
Re: cookies
On Sun, 16 Nov 2003, John J Lee wrote: > On Sat, 15 Nov 2003, allan juul wrote: [...] > > no - sorry,i didn't mean kill in that unix sense - i close the program > > with an exit or die or nothing more to do, then restart the program a > > bit later and at that point i have gotten a completely new cookie. > > I don't know what the problem is. Try sticking a print statement in the [...] For the record, the OP reports by email that the problem was ignore_discard (you need to pass that argument to the CookieJar constructor to tell it to save even session cookies). John
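For comparison, a sketch of the same pitfall using Python's standard http.cookiejar, where ignore_discard is an argument to save() and load() rather than to the constructor. The cookie name, value and domain are made up, and this shows the Python library's behaviour, not HTTP::Cookies itself.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def session_cookie(name, value, domain="example.com"):
    """Build a session cookie: no expiry time, marked to be discarded."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def names_after_restart(ignore_discard):
    """Save a jar holding one session cookie, reload it, list what survived."""
    path = os.path.join(tempfile.mkdtemp(), "cookie.lwp")
    jar = LWPCookieJar(path)
    jar.set_cookie(session_cookie("sid", "abc123"))
    jar.save(ignore_discard=ignore_discard)
    reloaded = LWPCookieJar(path)
    reloaded.load(ignore_discard=ignore_discard)
    return [c.name for c in reloaded]


print(names_after_restart(False))  # session cookie is dropped on save
print(names_after_restart(True))   # session cookie is kept
```

Without the flag the jar file is written but contains no session cookies, so a restarted program starts a fresh session and the server hands out a new cookie.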
Re: cookies
On Sat, 15 Nov 2003, allan juul wrote: > On Saturday, Nov 15, 2003, at 16:36 Europe/Copenhagen, John J Lee wrote: [...] > > How did you kill the process? If you kill -kill it in Unix, then Perl > > won't have a chance to run the code to save your cookies. > > > > If you shut down your program normally, do the cookies get persisted > > OK? > > no - sorry,i didn't mean kill in that unix sense - i close the program > with an exit or die or nothing more to do, then restart the program a > bit later and at that point i have gotten a completely new cookie. I don't know what the problem is. Try sticking a print statement in the DESTROY method of HTTP::Cookies to check it's actually getting called, and trace things through save(), as_string() to figure out what's going wrong. John
Whoever is subscrib'd from cathaybk.com.tw, please fix your subscription address
Somebody is subscribed with an old address, apparently. Every time I post here, I get this: -- Forwarded message -- Date: Sat, 15 Nov 2003 23:37:50 +0800 From: Postmaster <[EMAIL PROTECTED]> To: John J Lee <[EMAIL PROTECTED]> Subject: AutoReply Reminding Message: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> domain changed as @Cathaybk.com.tw Original Subject: Re: cookies Thanks for send us email. We've delivered your email to @cathaybk.com.tw! Reminding!! Our Email domain is changed as @cathaybk.com.tw~ Thanks
Re: cookies
On Sat, 15 Nov 2003, allan juul wrote: [...] > i have tried something like > > $robot->cookie_jar( { > autosave => 1, > file => 'cookie.lwp' > } > ); > > > then i have tried to print out the cookies i get before i kill the > robot process and after i re-start the robot and they are different. ? > eh, what am i doing wrong ? How did you kill the process? If you kill -kill it in Unix, then Perl won't have a chance to run the code to save your cookies. If you shut down your program normally, do the cookies get persisted OK? John
Re: SSL interface for HTTPS within LWP
On Wed, 12 Nov 2003, Haroon Rafique wrote: [...] > To have SSL capability you need either one of the following 2 modules on > your Windows 2003 Server machine: > > Crypt::SSLeay > IO::Socket::SSL or the stuff from Johnny Lee (no, not me, we just happen to have similar names), which only depends on MSIE, I think. John
Re: Help with a login in a site
On Sat, 8 Nov 2003, Alexandre Loureiro wrote: [...] > I´ve talked to support of the site and they gave some hints.. [...] > - Then I´m redirected to a Login.php Script that gets my information in a > LDAP Server If that really is the information you need, why not start and end right there? Use LDAP directly, and forget LWP. Far saner than messing about with JavaScript. http://perl-ldap.sourceforge.net/ [...] > Question nº 1 - How can I submit this page above to autenticate the session > to access the page i want ? ( It will redirect to the main page) Use HTML::Form. > Question nº 2 - After submit I need to maintain this session alive to enter > the page i want ? ( Like using LWP::ConnCache , if needed send me some > examples) A session is not the same as a connection. ConnCache caches HTTP connections -- this is an optimisation you don't care about (until you get it working, anyway!). Sessions are usually maintained by cookies and/or data passed in POST or GET requests. The former are set by HTTP headers (Set-Cookie) and/or JavaScript code. The latter comes from JavaScript code, and/or from form submission (including INPUT TYPE=HIDDEN controls). POST data is passed in the HTTP body. GET data is passed in the URL as a query string (like ?foo=bar&spam=eggs) John
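As an illustration of session data carried by form submission, here is a sketch that pulls INPUT TYPE=HIDDEN controls out of a page with Python's html.parser and builds the data to send back. The HTML, the field names and the user name are all invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputs(HTMLParser):
    """Collect name/value pairs of hidden INPUT controls."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "input" and (a.get("type") or "").lower() == "hidden":
            if a.get("name"):
                self.fields[a["name"]] = a.get("value") or ""


# Hypothetical login form carrying a session token in a hidden control.
page = (
    '<form action="/Login.php" method="post">'
    "<INPUT TYPE=HIDDEN NAME=session VALUE=xyz42>"
    '<input type="text" name="user">'
    "</form>"
)
parser = HiddenInputs()
parser.feed(page)

data = dict(parser.fields, user="alice")
print(urlencode(data))
```

The encoded string goes in the request body for a POST form, or after a "?" in the URL for a GET form; any cookies set along the way are handled separately by the cookie jar.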
Re: Mysterious Redirect
On Wed, 5 Nov 2003, David Busby wrote: [...] > works on another similar site. So the question is not how to handle this in > LWP but what other methods would these folks use for redirecting me that > will work in IE and Netscape but not LWP. Embedded script (JavaScript, usually) or Refresh redirect (either a normal Refresh: HTTP header or, commonly, one that sits in a META HTML tag). See recent posts here on both. > The traffic is over SSL > connection, how could I watch that data? Is there some clever way with > stunnel that I can "watch the wire"? How can I debug this s---? See third question here http://wwwsearch.sourceforge.net/bits/clientx.html John
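A sketch of spotting the META form of a Refresh redirect with Python's html.parser. The page content and target URL are invented, and the regular expression only covers the common unquoted "delay; URL=target" layout, which is an assumption about the page rather than a complete parser for the header.

```python
import re
from html.parser import HTMLParser


class MetaRefresh(HTMLParser):
    """Record the (delay, url) of the first META refresh found, if any."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag != "meta" or self.refresh is not None:
            return
        if (a.get("http-equiv") or "").lower() != "refresh":
            return
        m = re.match(r"\s*(\d+)\s*(?:;\s*url\s*=\s*(.+?))?\s*$",
                     a.get("content") or "", re.IGNORECASE)
        if m:
            self.refresh = (int(m.group(1)), m.group(2))


page = ('<html><head><meta http-equiv="Refresh" '
        'content="0; URL=http://example.com/next.asp"></head></html>')
p = MetaRefresh()
p.feed(page)
print(p.refresh)
```

A client that does not act on this tag simply stays on the intermediate page, which is one way a site can "work in IE and Netscape" but appear stuck from a script. Redirects done by embedded script cannot be found this way; the script has to be read.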
Re: I can´t make perl log into a server
On Sun, 2 Nov 2003, Alexandre Loureiro wrote: > I´m new to perl and using some books I´ve done some simple scripts to > make things easier. But now I´m having some problems login into a web > site ( secure using php login). I can log into the site but to navigate > futher i need a cookie to send some information to the site. > > It´s like an authentification, then it goes to empty page as follow: [...big ball of muddy JavaScript...] Bleck. What are you actually trying to do? If it's just a watchdog, well, OK. Good luck ;-) If it's functional testing, I think you're better off using a browser (automated from Perl). I imagine you'll be lucky if somebody here feels like hacking through that mess. :-) John
Re: How to handle an onLoad body attribute
...just to add: I wasn't implying that Rod *is* doing something he shouldn't. I have no knowledge about the site in question or the motivations of the people involved. John
Re: How to handle an onLoad body attribute
On Thu, 30 Oct 2003, Poly wrote: > This reminds me of a script that was supposed to post news articles > somewhere until the site decided to keep hackers and automatic scripts > out by inserting a session ID in an intermediary form... Most of the time this stuff isn't an attempt to keep people out. If it is, you probably shouldn't be trying to get around it. [...] > onLoad(). Now if they catch hacking their site, that where the fun is > right? [...] Rod, please *don't* "hack their site". John
Re: How to handle an onLoad body attribute
On Thu, 30 Oct 2003, Roderick A. Anderson wrote: [...] > There is a intermediate document being returned with an onLoad attribute > in the body tag to automagically submit the new form. >Needless to say this causes my script to fail and the as_string method > doesn't include the original form and data or the intermediate form and > data. (This making any sense?) Options: 0. Get the XML stuff working (recommended). 1. Figure out exactly what the script does and emulate it in your code. 2. Give up on libwww-perl and use browser automation. Browsers know about stuff like embedded script. You should be able to do this from any mainstream language, including Perl. Keywords: COM, MSIE, XPCOM, Mozilla, DCOP, Konqueror. 3. Use another language which has libraries that provide JavaScript interpretation. I imagine there's probably a Perl-Java bridge out there, for example, and the httpunit library can do this, though not fantastically well (mostly because the browser object model bindings are not fully implemented, and what is implemented is not a faithful copy of browser behaviour). John
Re: Help on LWP: college project on sms.ac
On Thu, 23 Oct 2003, Abhishek jain wrote: [...] > I am a B.Tech student and as a part of college project I have to make a > program that is able to send sms using www.sms.ac website.I tried to > make the project myself but I am having some cookies problem. The site > is not accepting the cookies generated by mine LWP pogram. I am sending [...] 1. Forget about CGI until you've got the thing working on the command line (or in your IDE, or whatever) 2. Have you actually checked what cookies are getting set and returned? Turn on HTTP::Cookies' debugging (IIRC, you have to ask LWP's centralised debug logging facility to do that), and it should tell you exactly what the server is sending, why it doesn't like any rejected cookies, and why it's not returning any cookies whose domain matches the site you're talking to. A sniffer like ethereal will also likely be useful, to see exactly what's going on (try sniffing both a browser and your program, and compare the two). HTH John
Re: The saga continues
On Fri, 17 Oct 2003, Roderick A. Anderson wrote: [...] > my $res = LWP::UserAgent->new->request($form->click); > > Are there any methods to search $res (which contains another form) to pull > out specific inputs that have been returned? [...] Just do another HTML::Form->parse() on the response data (or the response itself if you've got the latest LWP, IIRC), and call the find_input method on one of the forms that returns. I think there's a possible_values method, too, which you might find useful. Have you found the HTML::Form docs? John
Re: Building HTML document in memory
On Wed, 15 Oct 2003, Roderick A. Anderson wrote: > Back again. Getting more things solved but I can't find _anything_ on how > to build a HTML document in memory. All I really need is a method to POST > to a page with data I already have collected. I know the form inputs [...] HTTP POST does not require building an HTML document. I don't really understand what you wrote, so it's hard to guess what your problem is. John
Re: libwww-perl-5.71
On Wed, 15 Oct 2003, Gisle Aas wrote: [...] >2 [...] > Of the browsers I have here Mozilla displays "2" for the second value > while konqueror shows "x". I guess my question is what MSIE shows? 2! Damn. (For IE 5 -- has there been any standards-compliance effort with IE 6? I certainly doubt it on this particular point.) [...] > > > Female > > > > > > in the expected way. > > > > Officially, those s aren't allowed, are they? > > Yes. After the input you are in plain text context. The input tag is > implicitly empty. Oh, right. > > seen them 'in the wild', though :-( Probably I should strip tags even for > > OPTION element contents... > > But I think is different as it is a container. You can't > have here, but I have not tested what browsers do. Officially you can't, right, but we all know how this game works ;-). But as I said, my parser won't pick any up anyway, being event-driven, and IIRC your code is similar in that respect (albeit 'pull' rather than 'push', which is a nicer way of doing it). [...] > > I see. It can still be used for the list items of INPUT type=radio, or > > whatever, though (though, as I said, probably rarely). > > I'll deal with it if somebody complains :) Good plan. John
Re: libwww-perl-5.71
On Wed, 15 Oct 2003, John J Lee wrote: [...] > seen them 'in the wild', though :-( Probably I should strip tags even for > OPTION element contents... [...] Hmm, I guess both our parsers do that naturally, anyway. :-) John
Re: libwww-perl-5.71
On Wed, 14 Oct 2003, Gisle Aas wrote: > John J Lee <[EMAIL PROTECTED]> writes: [...] > I could potentially let there be multiple 'value_names' for a single > value, but I could also just let an explicit label override the option > as per spec. Is this something that real browsers implement? I'm not sure I understand what you're asking. Real browsers do implement the defaulting of OPTION values to OPTION element contents (well, I tested Konqueror just now, and it does), and the labels certainly do, of course. So, if you have 1 2 You see 1 and 2 as the labels in the browser's GUI, and the server gets sent foo=1, or whatever. Personally, I chose to have users explicitly say that items are specified by label if that's what they want -- otherwise, items are assumed to be specified by value. I can see that searching both sets of names for a match by default might be better in some ways, though. > The label just seems like a mechanism to get shorter labels when > is used and as such the labels are likely not to be unique. > That makes them bad names for selecting values. I don't know why the label attribute was introduced, but I implemented the feature because I ran into a page where the labels seemed less likely to change than the values. [...] > > I see. That's not a part of the HTML spec IIRC (unlike the case of > > OPTION), but I guess it could be useful. > > I think so. The problem is to know where to stop collecting text. I > implemented a new get_phrase method to HTML::TokeParser to get it to > do what I think makes sense. It will deal with stuff like: > > Female > > in the expected way. Officially, those s aren't allowed, are they? No surprise if you've seen them 'in the wild', though :-( Probably I should strip tags even for OPTION element contents... > > Of course, there can also be explicit LABEL elements, but I suspect > > people rarely use them, so probably not very useful. 
> > All the examples in the HTML4 spec use the label more like a prompt > text and as such it is more an alternative name for the input. The > following example is given: [...] I see. It can still be used for the list items of INPUT type=radio, or whatever, though (though, as I said, probably rarely). John
Re: libwww-perl-5.71
On Tue, 14 Oct 2003, John J Lee wrote: [...] > OK. Did you notice that both the value and the label of OPTION default to > the contents (eg. Female here), according to the HTML 4 spec? In my [...] Just to be clear, OPTION actually has a label attribute, unlike INPUT (which needs a special LABEL element if you want it to have a label). As a result, I guess OPTION labels are relatively common. John
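The defaulting rule discussed in this thread (an OPTION's value and label both fall back to the element contents) can be sketched like this. It is a Python illustration with the standard `html.parser`, not the code of HTML::Form or ClientForm; the `OptionCollector` class and the sample SELECT are assumptions. Because only text data is collected, any tags inside the OPTION contents are dropped as a side effect of the event-driven approach.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Record the effective value and label of each OPTION element."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._current = None  # the OPTION being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._finish()  # a new OPTION implicitly closes the previous one
            self._current = {"attrs": dict(attrs), "text": ""}

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._finish()

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def _finish(self):
        if self._current is None:
            return
        text = " ".join(self._current["text"].split())
        attrs = self._current["attrs"]
        self.options.append({
            "value": attrs.get("value", text),  # defaults to the contents
            "label": attrs.get("label", text),  # defaults to the contents
        })
        self._current = None


# Hypothetical markup: explicit values, an omitted end tag, and a label attribute.
html = (
    '<select name="sex">'
    '<option value="F">Female'
    '<option value="M">Male</option>'
    '<option label="foo">bar</option>'
    '</select>'
)
parser = OptionCollector()
parser.feed(html)
parser.close()
options = parser.options
```

In the last option the label is `foo` while the value is `bar`, which is exactly the case where matching "by label" and matching "by contents" give different answers.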
Re: libwww-perl-5.71
On Tue, 14 Oct 2003, Gisle Aas wrote: > John J Lee <[EMAIL PROTECTED]> writes: [...] > Yes. If a form contains: > > >Female >Male >Unknown > > > Then the values that this field might take becomes "F", "M" and "?", > while the value names are "Female", "Male" and "Unknown". With newer > version of HTML::Form you can use both to modify the values. The > statement [...] OK. Did you notice that both the value and the label of OPTION default to the contents (eg. Female here), according to the HTML 4 spec? In my Python module, I decided to allow people to set options 'by label'. I guess your scheme doesn't let you use the label (foo) here: bar while my scheme doesn't let you use the element contents (bar) in that same case. Hm... [...] > No. It's the same concept as for the select/option shown above. If > you have a form containing: > > Female > Male > Unknown > > then the values and value names for the sex field will be the same as > in the previous example and: > >$form->param(sex => "male") > > will just work. That is if you are not surprised by $form->("sex") > returning "M" even if you just set it to "male". I see. That's not a part of the HTML spec IIRC (unlike the case of OPTION), but I guess it could be useful. Of course, there can also be explicit LABEL elements, but I suspect people rarely use them, so probably not very useful. John
Re: libwww-perl-5.71
On Tue, 14 Oct 2003, Gisle Aas wrote: [...] > HTML::Form's dump now also print alternative value names. What does 'alternative value name' mean? Is this something to do with OPTION element contents (this bit) and labels? > HTML::Form will now pick up the phrase after a > or and use that as the name of the checked > value. [...] What does this mean? Something like this, maybe (no name attributes)? foo bar The browser I happen to be running right now (Konqueror) doesn't make those controls successful (but that says nothing about what other browsers do, of course :-( ). Or do you mean something else? John
Re: Getting returned from values???
On Mon, 13 Oct 2003, Roderick A. Anderson wrote: [...] > Well I figured WWW:Mechanize was the ticket but I am now up against the > wall. I can't figure out how to take the returned page and get just that > field's value. All the examples I've found of using Mech are looking for > non-form information and though I could use those techniques I thought > there has to be an easier way. Do I need to use CGI.pm (which I'm already > using in the script) and if so how do I get the Mech results into a CGI > instance? HTML::Form (part of LWP)? Not certain it works nicely with WWW::Mechanize, because I've never used that. I doubt any CGI code is going to be very useful to you. John
RE: Cant "download" a webpage whit the same content my browser do es.
On Fri, 10 Oct 2003, Thurn, Martin wrote: > What's probably happening is that you have cookies enabled, but you're not > sending any. You have to GET the cookie from the search FORM page, in order > to SEND the cookie back with your POST of the query. Probably right. Jonathan: remember that Mozilla doesn't save session cookies to disk (by definition of 'session cookies', really -- they only last as long as the browser is open). John
Re: [patch] Uninitialized value in HTTP/Cookies.pm
On Wed, 24 Sep 2003, Christophe Chisogne wrote: > Just the same patch as in my previous post, but with a more > correct mime type ;-) > > I feel like excite.com sends bad cookies. > "Set-Cookie: uu=i=213.193.180.194-1064396543121MJ;; ..." > the double ';' is preceded by 2 '=' in the same > 'name=value' string. I guess its not right syntax. The equals inside the cookie value is allowed -- taking the standard as the de-facto one set by Netscape and IE, since the written spec. is very poorly defined and, indeed, wrong. I'm afraid the truth is that, if IE and Mozilla like it, it's 'correct'. I haven't seen a double semicolon before, though. John
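A lenient parse of such a header might look like the sketch below: split the name from the value on the first '=' only, and skip empty segments such as the one produced by ';;'. This is an illustrative Python function, not the HTTP::Cookies logic; `parse_set_cookie` is a hypothetical name, and the `path=/` attribute stands in for the part elided in the quoted header.

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes), leniently."""
    parts = [part.strip() for part in header_value.split(";")]
    parts = [part for part in parts if part]  # drop empty segments (";;")
    if not parts:
        raise ValueError("empty Set-Cookie header")

    # Only the first '=' separates name from value; later ones belong to the value.
    name, _, value = parts[0].partition("=")

    attributes = {}
    for part in parts[1:]:
        key, _, attr_value = part.partition("=")
        attributes[key.strip().lower()] = attr_value.strip()
    return name.strip(), value.strip(), attributes


# The cookie from the report, with an assumed attribute in place of "...".
name, value, attributes = parse_set_cookie(
    "uu=i=213.193.180.194-1064396543121MJ;; path=/"
)
```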
Re: [PATCH] URI test failure on OS/2
On 19 Sep 2003, Gisle Aas wrote: [...] > The current behaviour is based on what made sense to me, not on how > stuff actually works in other apps on Windows. Anybody know a place > that describes the de-factor rules for file: URLs on Windows? [...] Probably a useless snippet: apparently both ':' and '|' are accepted by both Netscape 4 and IE (version 5 or 6, I guess). http://www.google.com/groups?threadm=87n0e1zqzh.fsf%40pobox.com John
Re: where is the submit button?
On Wed, 3 Sep 2003, wendy soros wrote: [...] > What I am trying to do is very similar to the > ABEBooks.com example in Burke (p.74): use the POST > method to submit some parameters to a form and save > the response to a local file. As done in the example, > I first got the name-value pairs of the form. Can you see the name-value pairs that will actually carry the data of interest back to the server (don't worry about the submit part)? I mean, can you see them from the HTML::Form interface? > The > problem I have is that I don't know how to submit the > form. In the webpage, there are two buttons "Search" > and "Clear Form", but I can't find the pair of name > and value for submission. That doesn't necessarily matter. > In the source code, the two > buttons are represented by two images: > "search_gray.gif" and "clearform_gray.gif", below is > the part of the source code I guess is relavent. [...] You have JavaScript code embedded in the HTML. Luckily, it seems from what you say that the JavaScript doesn't actually generate the name/value pairs, but only does the submission. You could look at the submitForm function to see exactly what it does (it might be in a separate HTML file, referenced in a src attribute -- it will likely have a .js extension). However, you'll probably find that all that submitForm function does is to validate the form values, so you could just try submitting the form as-is after filling it in with HTML::Form. IIRC, $form->submit(); gets you an HTTP::Request, which you then use in the normal way (I forget what that is :-). As somebody here says periodically, it'd be nice if LWP could interpret JS automatically. What happened to the last guy who tried that? He reported some progress, then disappeared. 
I have some somewhat-working Python code to do this, if any enthusiastic Perl hacker wants to use it as a starting point (it uses a spidermonkey JS/Python bridge -- very similar to Perl's Javascript module -- to bind JS to a pure-Python level 2 HTML DOM and assorted paraphernalia). John
Re: help with accessing lists with HTML::Form (with code sample)
On 25 Aug 2003, Gisle Aas wrote: > Mark Stosberg <[EMAIL PROTECTED]> writes: > > > I would like to see an extension to this part of the interface which > > allows one to treat single and multiple SELECT lists the same way. In > > the current situation calling the same command can result in dealing > > with either the SELECT tag, or the OPTION tag, which I find less useful > > and confusing. > > But it is a model that give a uniform behaviour and interface for all > the inputs. I think this is a good thing and that it should be enough > to write better documentation that explains the mapping between the > HTML tags/elements and the HTML::Form input objects. As you > discovered it is not one-to-one when it comes to . [...] > Since there is no representation of the tag itself there is > no object to put this method on. [...] FWIW, I found this part of the way HTML::Form works confusing too, and changed it in my port (well, quite a few other things have changed, so it's not a simple port any more). All 'controls' (in the HTML 4 terminology) are represented by single Control objects. Maybe the examples of the modified API here are of interest (in Python): http://wwwsearch.sourceforge.net/ClientForm/src/README-0_1_7b.html John
RE: Question about wildcarding when getting files with LWP
On Thu, 21 Aug 2003, Patrick Collins wrote: [...] > If you control the webserver you could try upgrading to Apache2 and > using mod_dav. The Webdav protocol allows you to get parsable directory > listings from which you can then download whatever files you choose. [...] And no doubt there's a module out there somewhere that will try to parse any old directory listing. John
Re: TreeBuilder cgi memory problems
On Thu, 7 Aug 2003, [EMAIL PROTECTED] wrote: > Having a potential TreeBuilder memory problem when using it to parse > through a large HTML table (> 2K rows) where the memory allocation grows to > about 20M on my server and never goes down even after finishing with the > HTML and TreeBuilder structures. The Perl script runs as a CGI and Apache > gives up after awhile with the following line in the error logs - "Out of > Memory !!" [...] 20 MB does seem a lot, but why would one expect the process memory usage to fall after parsing is complete? On most systems, memory used by a process and freed isn't returned to the system until the process exits. Sorry, no actual help... John
Re: how to handle the
On Tue, 5 Aug 2003, Andrea Tasso wrote: [...] > and lynx is short and with a
Re: Crypt-SSLeay on Win32: support for 128 bit X509 CA certificates
On Mon, 21 Jul 2003 [EMAIL PROTECTED] wrote: [...] > I'm setting the HTTPS_CA_DIR and HTTPS_CA_FILE environment > variables as described in the documentation. > - Something else strange - when I don't(!) set the two > environment variables, then I can access both sites(!!). The > warning "Client-SSL-Warning: Peer certificate not verified" > is however still being issued. Presumably because it interprets the lack of those environment vars as an implicit request not to attempt verification. Since you didn't *ask* for verification, it figures it can go ahead and access the site without verifying. Python's httplib *never* verifies, IIRC :-( ...OTOH, neither do humans <0.5 wink> -- few people take any notice of failed verifications. John
Re: Help needed
On Mon, 14 Jul 2003, Octavian Rasnita wrote: [...] > After downloading an HTML page, what modules can I use to read the cell 4 > from the fifth row of a table, if that table is placed in another table in > the cell 2 of the row 3? [...] http://theoryx5.uwinnipeg.ca/CPAN/data/HTML-TableExtract/HTML/TableExtract.html It *looks* easy to use (I've never used it, so don't believe me). It has some terribly clever declarative ways of specifying which bits of which table you want, but hopefully you can ignore those. :-) A quick CPAN search turned up these, which I've never looked at: HTML::TableParser HTML::TableContentParser John
Re: HTML parsing
On Thu, 10 Jul 2003, John J Lee wrote: [...] > Another way is to use libtidy (a new shared library-ized HTMLTidy, with [...] Actually, it's called tidylib, not libtidy: http://tidy.sourceforge.net/libintro.html John
Re: HTML parsing
On Wed, 9 Jul 2003, Reinier Post wrote: > On Tue, Jul 08, 2003 at 03:34:12PM +0100, Richard Lamb wrote: > > working out a means of stripping HTML tags (via the DOM interface, which > > [...] > I have only tried HTML::TreeBuilder (not DOM, but the same principle; > uses heuristic HTML parsing and patching that does some unwanted things) [...] Another way is to use libtidy (a new shared library-ized HTMLTidy, with Perl binding) and a (standard, rather than TreeBuilder) DOM Parser. I think libtidy can output both XHTML and HTML (there is such a thing as HTML DOM, remember). John
Re: a better description of the problem.
On Wed, 9 Jul 2003, Jonathan Daigle wrote: [...] > $inref->{dbh}->prepare(qq| SELECT * from affiliate_account|); # for [...] This looks like web server code. This list is for discussion of web client code. John
Re: user-agent supporting tables,vbscript, frames, etc....how?
On Wed, 9 Jul 2003, Terry wrote: [...] > If a page has frames, the main page gets returned, not > the frames. Another request has to be made or > something to get the contents of the frame. How do I > go about that? [...] The same way you grabbed the main page? Just parse out the URLs, and fetch them. Maybe the thing you're using or WWW::Mechanize has code to do that, I don't know. John
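"Parse out the URLs, and fetch them" can be sketched as below: collect the `src` of each FRAME (or IFRAME) and resolve it against the URL of the main page. This is a Python illustration, not WWW::Mechanize or HTTP::WebTest code; `FrameCollector`, the base URL and the frameset markup are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect absolute URLs of the frames referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Relative frame URLs are resolved against the main page's URL.
                self.urls.append(urljoin(self.base_url, src))


# Hypothetical main page, as already fetched by the client.
main_url = "http://www.example.com/app/index.html"
main_html = (
    '<frameset cols="20%,80%">'
    '<frame src="menu.html">'
    '<frame src="/content/main.html">'
    '</frameset>'
)
collector = FrameCollector(main_url)
collector.feed(main_html)
collector.close()
frame_urls = collector.urls
```

Each entry in `frame_urls` is then fetched in the same way as the main page was.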
Re: user-agent supporting tables,vbscript, frames, etc....how?
On Tue, 8 Jul 2003, Terry wrote: > I am using HTTP::WebTest to do site logon simulation. > However, there are browser checks on some websites. > For example, some require that the user-agent (client) > supports frames, tables, and vbscript, is there a way > to 'trick' the server into thinking my client can > support these things? Depends what you mean by 'some require that the user-agent... supports'. If you mean they won't even send you the frames, javascript etc. (does anybody still *require* VBScript?), then you probably just need to set the User-Agent header to something like "Mozilla/5.0". If you mean they do send you those things, but HTTP::WebTest doesn't know how to handle them, obviously that's harder. Ask about particular problems you're having. John
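Setting the User-Agent header is a one-liner in most HTTP clients. As a hedged sketch in Python's standard library (not HTTP::WebTest or LWP code; the `build_request` helper and the URL are made up), it might look like this:

```python
from urllib.request import Request


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a request that announces itself with the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("http://www.example.com/")
# urllib stores header names capitalised as "User-agent".
announced = request.get_header("User-agent")
```

Whether a given site's browser check is satisfied by "Mozilla/5.0" alone is something to verify against that site; some checks look for a fuller browser string.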
RE: Passing the same cookie and headers to a new site
On Mon, 23 Jun 2003, Alan Olegario wrote: > I tried checking what headers are being sent with ethereal, but it looks > like I can't get the info since it's going over https and being > encrypted. [...] There are several solutions to that. Look at the message I posted here a week or two ago for details. John
Re: Passing the same cookie and headers to a new site
On Sat, 21 Jun 2003, Matthew Darwin wrote: > The LWP behaviour looks like a security problem to me. > > For example, davin.ottawa.on.ca is not related to flora.ottawa.on.ca > So if one sets a cookie the other site can get it? > Very bad. > > Canadian domains are in the form ...ca > or ..ca or .ca Cookies are a security problem, not LWP's implementation of them. The behaviour you describe is a long-established part of the Netscape cookie protocol. If you want to have your cookies only sent back to your own domain, don't give any explicit Domain attribute in the Set-Cookie header. Even there, some browsers (MSIE 5) don't require an exact domain string-match (for example, a cookie set by www.foo.com can be returned to rhubarb.www.foo.com). In fact, IIRC, some browsers allow foo.co.uk to set a cookie for the entire .co.uk domain! Don't trust Netscape's 'standard' (cookie_spec.html) further than you can spit it: nobody has ever followed it, and nobody ever will. RFC 2965 (which LWP knows about) is much more clearly defined and better thought through, but hardly anybody uses it (neither IE nor Mozilla implements it -- nor RFC 2109 for that matter). The incentives just aren't there for it to come into widespread use. I've heard rumour that the European Union may pass legislation containing requirements for which P3P (which deals with third party cookies, amongst other things) is insufficient / inappropriate, and hence open a gap that RFC 2965 might fill, but I haven't tried to verify that there is/was any truth in that. There's also the fact that RFC 2965 has unresolved Netscape-protocol (old-style cookies) interoperability issues -- errata were being discussed, but that effort seems to have stalled in the last couple of months. John
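The behaviour under discussion comes from "tail matching": a cookie with an explicit Domain is returned to any host whose name ends with that domain. The Python sketch below is a deliberate simplification for illustration, not LWP's or any browser's actual rule set (real implementations add further checks, such as limits on how short the domain may be); `netscape_domain_match` is a hypothetical name.

```python
def netscape_domain_match(request_host, cookie_domain):
    """Simplified Netscape-style tail match of a host against a cookie domain."""
    host = request_host.lower()
    bare = cookie_domain.lower().lstrip(".")
    # Match the domain itself, or any host beneath it on a label boundary.
    return host == bare or host.endswith("." + bare)


# A cookie set with Domain=.ottawa.on.ca is sent to both unrelated sites.
to_flora = netscape_domain_match("flora.ottawa.on.ca", ".ottawa.on.ca")
to_davin = netscape_domain_match("davin.ottawa.on.ca", ".ottawa.on.ca")
# The loose host matching described for some browsers.
to_subhost = netscape_domain_match("rhubarb.www.foo.com", "www.foo.com")
# An unrelated host, or one that merely shares a suffix string, does not match.
to_other = netscape_domain_match("www.example.com", ".ottawa.on.ca")
to_lookalike = netscape_domain_match("badottawa.on.ca", ".ottawa.on.ca")
```

This is why omitting the Domain attribute is the safer choice: with no explicit domain there is nothing for other hosts to tail-match against.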
RE: Passing the same cookie and headers to a new site
On Fri, 20 Jun 2003, Alan Olegario wrote: [...] > HTTP::Cookies::extract_cookies: Set cookie SMSESSION => [cookie info] > HTTP::Cookies::extract_cookies: Set cookie FORMCRED => > HTTP::Cookies::extract_cookies: Set cookie EntFXSessionR => [cookie info] > HTTP::Cookies::extract_cookies: Set cookie LOGIN => 0 [...] > HTTP::Cookies::add_cookie_header: Checking testsite.somesite.com for cookies > HTTP::Cookies::add_cookie_header: Checking .somesite.com for cookies > HTTP::Cookies::add_cookie_header: - checking cookie path=/ > HTTP::Cookies::add_cookie_header: - checking cookie LOGIN=0 > HTTP::Cookies::add_cookie_header:it's a match > HTTP::Cookies::add_cookie_header: - checking cookie FORMCRED= > HTTP::Cookies::add_cookie_header:it's a match > HTTP::Cookies::add_cookie_header: - checking cookie EntFXSessionR=[same cookie info > as above] > HTTP::Cookies::add_cookie_header:it's a match > HTTP::Cookies::add_cookie_header: - checking cookie SMSESSION=[same cookie info as > above] > HTTP::Cookies::add_cookie_header:it's a match [...] Looks OK to me. LWP wants to send all your www.somesite.com cookies back to testsite.somesite.com. Have you checked the headers that are actually being sent (eg. ethereal)? Checking what your browser is sending and comparing with what LWP sends will probably quickly let you find the problem. If the Cookie header is there, standard answer: what other state are you forgetting about (Referer, for example)? John