Hi John,
Let me clean up the code a bit and then post it. If my understanding of the HTTP protocol
is correct, the client requests a resource and, if credentials are required, the server
returns a 401 with one or more WWW-Authenticate headers. If the client can authenticate, it
re-requests the page and supplies credentials. I'm guessing that the client could supply the
credentials on the first request, but that may be a bit too soon without knowing the proper
authentication type and/or realm.
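To make the 401 round trip concrete, here is a minimal sketch of the client side. It assumes the WWW-Authenticate values have already been read from the 401 response, that each header value carries exactly one challenge (servers may also pack several challenges into one comma-separated value, which this regex does not split), and that only the Basic scheme is answered. AuthChallengeSketch and its methods are hypothetical names, not existing Nutch classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: picks a challenge from a 401 response and answers it. */
public class AuthChallengeSketch {

    /** One parsed challenge: the scheme (e.g. "Basic") and its realm, or null if absent. */
    public static final class Challenge {
        public final String scheme;
        public final String realm;
        Challenge(String scheme, String realm) { this.scheme = scheme; this.realm = realm; }
    }

    // Leading token is the scheme; an optional realm="..." parameter may follow.
    private static final Pattern CHALLENGE =
        Pattern.compile("^\\s*(\\S+)(?:.*?\\brealm=\"([^\"]*)\")?", Pattern.CASE_INSENSITIVE);

    /** Parses each WWW-Authenticate value, assuming one challenge per value. */
    public static List<Challenge> parse(List<String> headerValues) {
        List<Challenge> result = new ArrayList<>();
        for (String value : headerValues) {
            Matcher m = CHALLENGE.matcher(value);
            if (m.find()) {
                result.add(new Challenge(m.group(1), m.group(2))); // group(2) is null without a realm
            }
        }
        return result;
    }

    /** Returns the first Basic challenge, or null if the server offered none. */
    public static Challenge pickBasic(List<Challenge> challenges) {
        for (Challenge c : challenges) {
            if (c.scheme.equalsIgnoreCase("Basic")) return c;
        }
        return null;
    }

    /** Builds the Authorization header value for the Basic scheme: base64("user:password"). */
    public static String basicAuthorization(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Two challenges, as a server might send in two separate WWW-Authenticate headers.
        List<String> headers = Arrays.asList("Negotiate", "Basic realm=\"Corporate Intranet\"");
        Challenge chosen = pickBasic(parse(headers));
        System.out.println(chosen.scheme + " / " + chosen.realm);        // Basic / Corporate Intranet
        System.out.println(basicAuthorization("alice", "secret"));       // Basic YWxpY2U6c2VjcmV0
    }
}
```

The realm returned by the parse step is what a realm-keyed credential lookup would use before the client re-requests the page with the Authorization header.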
I do think that the authentication information can be keyed by hosts, realms, and/or
combinations of the two, but I think that realm is probably the first we should target
(although we should build the code so that it can handle either). The benefit of a realm is
that it can cover multiple hosts, if an organization has set up their servers this way.
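One way to build the lookup so it can handle realms, hosts, or both is to try the most specific key first. The sketch below is a hypothetical CredentialStore (not an existing Nutch class) that checks host+realm, then realm alone, then host alone. It lower-cases host names because DNS names are case-insensitive, but leaves the realm as-is because the realm value is case-sensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical credential store: entries keyed by host+realm, realm, or host. */
public class CredentialStore {

    /** Username/password pair; plain fields for brevity. */
    public static final class Credentials {
        public final String user;
        public final String password;
        public Credentials(String user, String password) { this.user = user; this.password = password; }
    }

    private final Map<String, Credentials> byHostAndRealm = new HashMap<>();
    private final Map<String, Credentials> byRealm = new HashMap<>();
    private final Map<String, Credentials> byHost = new HashMap<>();

    private static String normHost(String host) {
        return host.toLowerCase(Locale.ROOT); // host names compare case-insensitively
    }

    private static String key(String host, String realm) {
        return normHost(host) + "\u0000" + realm; // NUL separator cannot appear in either part
    }

    public void addForHostAndRealm(String host, String realm, Credentials c) { byHostAndRealm.put(key(host, realm), c); }
    public void addForRealm(String realm, Credentials c) { byRealm.put(realm, c); }
    public void addForHost(String host, Credentials c) { byHost.put(normHost(host), c); }

    /** Most specific match wins: host+realm, then realm alone, then host alone; null if none. */
    public Credentials lookup(String host, String realm) {
        if (realm != null) {
            Credentials c = byHostAndRealm.get(key(host, realm));
            if (c != null) return c;
            c = byRealm.get(realm);
            if (c != null) return c;
        }
        return byHost.get(normHost(host));
    }

    public static void main(String[] args) {
        CredentialStore store = new CredentialStore();
        store.addForRealm("Corporate Intranet", new Credentials("alice", "secret"));
        store.addForHost("wiki.example.com", new Credentials("bob", "hunter2"));
        System.out.println(store.lookup("docs.example.com", "Corporate Intranet").user); // alice
        System.out.println(store.lookup("WIKI.example.com", "Wiki").user);               // bob
    }
}
```

A realm-only entry will be sent to any server that announces that realm name, so a production version would probably want to restrict realm matches to a configured set of hosts.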
I'll try to post the code today.
Matt
[EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
07/06/2004 11:59 PM
Please respond to dev

To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [Nutch-dev] Http Protocol

Matt,
On Fri, Jul 02, 2004 at 10:37:32PM -0700, Matt Tencati wrote:
> I've been interested in using Nutch in a corporate environment where most content requires
> authentication. I've begun implementing the changes required to include an
> HttpAuthentication set of interfaces and classes in order to support this (my initial plan
> is to key off realms). However, I have found an issue in the implementation of
> Content.java (and its subclasses) that may make this process less clean than it could be.
> The metadata information is stored via Properties, which implements Map and so allows only
> a single value for a given key. Authentication allows for multiple WWW-Authenticate headers
> so that the client can create a new request and choose any of the given challenges as the
> method to authenticate.
>
> I have reviewed the HTTP protocol (RFC 1945) and it does allow multiple headers with the
> same name, which makes me think that there may be other headers (now or in the future)
> that would require multiple values. I have created a class called MultipleProperties
> that handles this; however, it breaks the contract of the Map interface.
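One way to hold repeated headers without breaking the Map contract is to make the value type a list, i.e. a Map<String, List<String>>, which is the same shape java.net.URLConnection.getHeaderFields() returns. The sketch below is a hypothetical MultiValuedHeaders class, not the MultipleProperties class mentioned above; it uses a case-insensitive TreeMap because HTTP header names are case-insensitive.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical multi-valued header container that keeps the Map contract intact. */
public class MultiValuedHeaders {

    // HTTP header names are case-insensitive, so lookups ignore case.
    private final Map<String, List<String>> headers =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Appends a value, keeping any values already stored under the same name. */
    public void add(String name, String value) {
        headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Returns the first value (for single-valued headers such as Content-Type), or null. */
    public String getFirst(String name) {
        List<String> values = headers.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    /** Returns every value in arrival order, e.g. all WWW-Authenticate challenges. */
    public List<String> getAll(String name) {
        List<String> values = headers.get(name);
        return values == null ? Collections.<String>emptyList() : Collections.unmodifiableList(values);
    }

    /** Read-only view of the top-level map, shaped like URLConnection.getHeaderFields(). */
    public Map<String, List<String>> asMap() {
        return Collections.unmodifiableMap(headers);
    }

    public static void main(String[] args) {
        MultiValuedHeaders h = new MultiValuedHeaders();
        h.add("WWW-Authenticate", "Negotiate");
        h.add("www-authenticate", "Basic realm=\"Corporate Intranet\"");
        h.add("Content-Type", "text/html");
        System.out.println(h.getAll("WWW-Authenticate").size()); // 2
        System.out.println(h.getFirst("content-type"));          // text/html
    }
}
```

Because each key maps to exactly one List, every Map operation keeps its usual meaning; callers that only want one value use getFirst.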
Hi, Matt,
Could you post your code? It does not have to be a working patch.
It will be easier to discuss.
One other thing: is this HttpAuthentication information stored in the metadata
of every page? If so, might it be redundant for a large site? Should it be host-based
or realm-based instead?
John
-------------------------------------------------------
This SF.Net email sponsored by Black Hat Briefings & Training.
Attend Black Hat Briefings & Training, Las Vegas July 24-29 -
digital self defense, top technical experts, no vendor pitches,
unmatched networking opportunities. Visit www.blackhat.com
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers