Re: Credentials in LWP

Sean M. Burke Tue, 02 Jul 2002 14:41:03 -0700

At 09:53 2002-07-02 -0500, Kenny G. Dubuisson, Jr. wrote:
>[...]I don't understand what it means by NetLoc and Realm[...]


Try hitting http://www.unicode.org/mail-arch/unicode-ml and it'll
say "enter your username and password for
Unicode-MailList-Archives".  That "Unicode-MailList-Archives" string is the 
realm name.  NetLoc is the hostname plus colon plus the port number, by 
default ":80" -- in this case, "www.unicode.org:80".
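
If you'd rather compute the NetLoc than type it, here is a minimal sketch that derives it from a URL using only core Perl. The helper name netloc_for and the localhost URL are made up for illustration, and the pattern handles only plain http/https URLs (no user info, no IPv6 literals).

```perl
use strict;
use warnings;

# Build the "host:port" string for a URL, filling in the default
# port (80 for http, 443 for https) when the URL does not name one.
sub netloc_for {
    my ($url) = @_;
    my ($scheme, $host, $port) =
        $url =~ m{^(https?)://([^/:]+)(?::(\d+))?}i
        or return;
    $port ||= (lc($scheme) eq 'https') ? 443 : 80;
    return lc($host) . ":$port";
}

print netloc_for('http://www.unicode.org/mail-arch/unicode-ml'), "\n";
print netloc_for('http://localhost:8080/reports/'), "\n";
```

The first call gives the "www.unicode.org:80" string mentioned above.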

Here's an extract from chapter 11 of my new book, /Perl and LWP/
(<http://www.amazon.com/exec/obidos/ASIN/0596001789>)
which you might find useful and worth buying:



Authenticating via LWP

To add a username and password to a browser object's key ring, call the 
credentials method on a user agent object:

   $browser->credentials(
     'servername:portnumber',
     'realm-name',
     'username' => 'password'
   );

In most cases, the port number is 80, the default TCP/IP port for HTTP. For 
example:

   my $browser = LWP::UserAgent->new;
   $browser->agent('ReportsBot/1.01');
   $browser->credentials(
     'reports.mybazouki.com:80',
     'web_server_usage_reports',
     'plinky' => 'banjo123'
   );
   my $response = $browser->get(
     'http://reports.mybazouki.com/this_week/'
   );

One can call the credentials method any number of times, to add all the 
server-port-realm-username-password keys to the browser's key ring, 
regardless of whether they'll actually be needed. For example, you could 
read them all in from a datafile at startup:

   my $browser = LWP::UserAgent->new( );
   if(open(KEYS, "< keyring.dat")) {
     while(<KEYS>) {
       chomp;
       my @info = split "\t", $_, -1;
       $browser->credentials(@info) if @info == 4;
     }
     close(KEYS);
   }
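
The loop above implies a file format of one entry per line, with four tab-separated fields: server and port, realm, username, password. The following is a small sketch of that parsing step on its own, run against an in-memory sample rather than a real keyring.dat. The helper name parse_keyring is hypothetical, and the sample values are the made-up ones from the earlier example.

```perl
use strict;
use warnings;

# Parse keyring text: one entry per line, four tab-separated fields
# (netloc, realm, username, password). Lines with any other number
# of fields are skipped.
sub parse_keyring {
    my ($text) = @_;
    my @entries;
    for my $line (split /\n/, $text) {
        my @info = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        push @entries, \@info if @info == 4;
    }
    return @entries;
}

my $sample = join "\n",
    "reports.mybazouki.com:80\tweb_server_usage_reports\tplinky\tbanjo123",
    "this line has no tabs and is ignored",
    "";

for my $entry (parse_keyring($sample)) {
    my ($netloc, $realm, $user, $pass) = @$entry;
    printf "%s at %s (realm %s)\n", $user, $netloc, $realm;
    # With a real user agent: $browser->credentials(@$entry);
}
```

The -1 limit matters only for entries whose last field is empty (a blank password); without it, split would drop that field and the line would be rejected.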


Security

Clearly, storing lots of passwords in a plain text file is not terribly 
good security practice, but the obvious alternative is not much better: 
storing the same data in plain text in a Perl file. One could make a point 
of prompting the user for the information every time,* instead of storing 
it anywhere at all, but clearly this is useful only for interactive 
programs (as opposed to programs run by crontab, for example). In any 
case, HTTP Basic Authentication is not the height of security: the username 
and password are normally sent unencrypted. This and other security 
shortcomings with HTTP Basic Authentication are explained in greater detail 
in RFC 2617. See the Preface for information on where to get a copy of RFC 
2617.
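
To make "unencrypted" concrete: under RFC 2617, the Basic scheme sends the username and password joined by a colon and Base64-encoded, which is an encoding and not encryption. The sketch below builds such a header value by hand and then reverses it. The helper name basic_auth_header is hypothetical, and the credentials are the made-up ones from the earlier example; LWP builds this header itself, so this is for illustration only.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Basic authentication joins username and password with a colon and
# Base64-encodes the result. The empty second argument to
# encode_base64 suppresses the trailing newline.
sub basic_auth_header {
    my ($user, $pass) = @_;
    return 'Basic ' . encode_base64(join(':', $user, $pass), '');
}

my $header = basic_auth_header('plinky', 'banjo123');
print "Authorization: $header\n";

# Anyone who sees the header can reverse it; no key is involved.
my ($encoded) = $header =~ /^Basic (\S+)$/;
print "Recovered: ", decode_base64($encoded), "\n";
```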

* In fact, Ave Wrigley wrote a module to do exactly that. It's not part of 
the LWP distribution, but it's available in CPAN as LWP::AuthenAgent. The 
author describes it as "a simple subclass of LWP::UserAgent to allow the 
user to type in username/password information if required for authentication."



An HTTP Authentication Example: The Unicode Mailing Archive

Most password-protected sites (whether protected via HTTP Basic 
Authentication or otherwise) are that way because the sites' owners don't 
want just anyone to look at the content. And it would be a bit odd if I 
gave away such a username and password by mentioning it in this book! 
However, there is one well-known site whose content is password protected 
without being secret: the mailing list archive of the Unicode mailing lists.

In an effort to keep email-harvesting bots from finding the Unicode mailing 
list archive while spidering the Web for fresh email addresses, the 
Unicode.org sysadmins have put a password on that part of their site. But 
to allow people (actual not-bot humans) to access the site, the site 
administrators publicly state the username and password on an unprotected 
page, http://www.unicode.org/mail-arch/, which links to the protected part.

The main Unicode mailing list (called unicode) once in a while has a thread 
that is very interesting and well worth reading, but it's buried in a 
thousand other messages that are not even worth downloading, even in 
digest form. Luckily, this problem meets a tidy solution with LWP: I've 
written a short program that, on the first of every month, downloads the 
index of all the previous month's messages and reports the number of 
messages that have each topic as their subject.

The trick is that the web pages that list this information are password 
protected. Moreover, the URL for the index of last month's posts is 
different every month, but in a fairly obvious way. The URL for March 2002, 
for example, is:
   http://www.unicode.org/mail-arch/unicode-ml/y2002-m03/

Deducing the URL for the month that has just ended is simple enough:

   # To be run on the first of every month...
   use POSIX ('strftime');
   my $last_month = strftime("y%Y-m%m", localtime(time - 24 * 60 * 60));
   # Since today is the first, one day ago (24*60*60 seconds) is in
   # last month.
   my $url = "http://www.unicode.org/mail-arch/unicode-ml/$last_month/";
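
The one-day-ago trick also handles the year boundary. Here is a small sketch that wraps the same strftime call in a function and feeds it two fixed dates instead of the current time. The function name previous_month_dir is hypothetical, and Time::Local is used only to construct the test dates.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timelocal);

# Given an epoch time on the first of a month, return the archive
# directory name for the month that just ended.
sub previous_month_dir {
    my ($now) = @_;
    return strftime('y%Y-m%m', localtime($now - 24 * 60 * 60));
}

# Noon on 1 April 2002 and on 1 January 2002, local time.
# timelocal takes a zero-based month, so 3 is April and 0 is January.
my $april_first   = timelocal(0, 0, 12, 1, 3, 2002);
my $january_first = timelocal(0, 0, 12, 1, 0, 2002);

print previous_month_dir($april_first), "\n";    # y2002-m03
print previous_month_dir($january_first), "\n";  # y2001-m12
```

As with the original, the result is only "last month" when the function is given a time on the first day of a month.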

But getting the contents of that URL involves first providing the username 
and password and realm name. The Unicode web site doesn't publicly declare 
the realm name, because it's an irrelevant detail for users with 
interactive browsers, but we need to know it for our call to the credentials 
method. To find out the realm name, try accessing the URL in an interactive 
browser. The realm will be shown in the authentication dialog box, as shown 
in Figure 11-1.

In this case, it's "Unicode-MailList-Archives," which is all we needed to 
make our request.

   my $browser = LWP::UserAgent->new;
   $browser->credentials(
     'www.unicode.org:80', # Don't forget the ":80"!
     # This is no secret...
     'Unicode-MailList-Archives',
     'unicode-ml' => 'unicode'
   );
   print "Getting topics for last month, $last_month\n",
         " from $url\n";
   my $response = $browser->get($url);
   die "Error getting $url: ", $response->status_line
    if $response->is_error;

If this fails (if the Unicode site's admins have changed the username or 
password or even the realm name), the program dies with this error message:

   Error getting http://www.unicode.org/mail-arch/unicode-ml/y2002-m03/:
   401 Authorization Required at unicode_list001.pl line 21.
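
If it's the realm name that has changed, the 401 response itself normally says what the server now expects: the challenge arrives in the WWW-Authenticate header, which can be read with $response->header('WWW-Authenticate'). Below is a minimal sketch of pulling the realm out of such a value. The helper name realm_from_challenge is hypothetical, the sample string is an assumption about what this server sends, and the pattern covers only the simple quoted form of a Basic challenge.

```perl
use strict;
use warnings;

# Extract the realm from a Basic challenge such as
#   Basic realm="Unicode-MailList-Archives"
# Returns undef if the value is not a Basic challenge with a
# quoted realm.
sub realm_from_challenge {
    my ($challenge) = @_;
    return undef unless defined $challenge;
    return $challenge =~ /^\s*Basic\s+realm="([^"]*)"/i ? $1 : undef;
}

# Sample header value, assumed for illustration.
my $sample = 'Basic realm="Unicode-MailList-Archives"';
print realm_from_challenge($sample), "\n";
```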

But assuming the authorization data is correct, the page is retrieved as if 
it were a normal, unprotected page. From there, counting the topics and 
noting the absolute URL of the first message of each thread is a matter of 
extracting data from the HTML source and reporting it concisely.

   my(%posts, %first_url);
   while( ${ $response->content_ref }
     =~ m{<li><a href="(\d+\.html)"><strong>(.*?)</strong>}g
     # Like: <li><a href="0127.html"><strong>Klingon</strong>
   ) {
     my($url, $topic) = ($1,$2);
     # Strip any number of "Re:" prefixes.
     while( $topic =~ s/^Re:\s+//i ) {}
     ++$posts{$topic};
     use URI; # For absolutizing URLs...
     $first_url{$topic} ||= URI->new_abs($url, $response->base);
   }
   print "Topics:\n", reverse sort map # Most common first:
     sprintf("% 5s %s\n %s\n",
       $posts{$_}, $_, $first_url{$_}
     ), keys %posts;
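
The counting logic can be tried without touching the network. This is a sketch under assumptions: the function name count_topics is hypothetical, the HTML lines are invented in the format the pattern expects, and the report sorts numerically by count instead of relying on the padded-string sort used above.

```perl
use strict;
use warnings;

# Count posts per topic in index HTML, ignoring any number of
# leading "Re:" prefixes (in any letter case).
sub count_topics {
    my ($html) = @_;
    my %posts;
    while ($html =~ m{<li><a href="\d+\.html"><strong>(.*?)</strong>}g) {
        my $topic = $1;
        1 while $topic =~ s/^Re:\s+//i;
        ++$posts{$topic};
    }
    return %posts;
}

# Made-up index lines in the format the pattern expects.
my $html = join "\n",
    '<li><a href="0127.html"><strong>Klingon</strong>',
    '<li><a href="0128.html"><strong>Re: Klingon</strong>',
    '<li><a href="0129.html"><strong>RE: Re: Klingon</strong>',
    '<li><a href="0130.html"><strong>Unicode and end users</strong>';

my %posts = count_topics($html);
for my $topic (sort { $posts{$b} <=> $posts{$a} || $a cmp $b } keys %posts) {
    printf "%5d %s\n", $posts{$topic}, $topic;
}
```

Here the three "Klingon" variants collapse into one topic with a count of 3.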

Typical output starts out like this:

   Getting topics for last month, y2002-m02
   from http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/
   Topics:
   86 Unicode and Security
    http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0021.html
   47 ISO 3166 (country codes) Maintenance Agency Web pages move
    http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0390.html
   41 Unicode and end users
    http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0260.html
   27 Unicode Search Engines
    http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0360.html
   22 Smiles, faces, etc
    http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0275.html
   18 This spoofing and security thread
    http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0216.html
   16 Standard Conventions and euro
    http://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0418.html

This continues for a few pages.

[end extract]

--
Sean M. Burke    http://www.spinn.net/~sburke/
