I've been interested in using Nutch in a corporate environment where most content
requires authentication.  I've begun implementing the changes required to include an
HttpAuthentication set of interfaces and classes in order to support this (my initial
plan is to key off realms).  However, I have found an issue in the implementation of
Content.java (and subclasses) which may keep this process from being as clean as
possible.  The metadata is stored in a Properties object, which implements Map and
therefore allows only a single value for a given key.  HTTP authentication allows for
multiple WWW-Authenticate headers, so that the client can choose any of the given
challenges as the method to authenticate when it creates a new request.
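
As a rough sketch of what choosing among several challenges might look like: the
class name, method names and preference order below are made up for illustration and
are not existing Nutch code.  It also assumes each header value carries exactly one
challenge, which a real server is not obliged to do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ChallengeChooser {
    // Hypothetical preference order, most preferred first (an assumption, not from Nutch).
    private static final List<String> PREFERRED = Arrays.asList("digest", "basic");

    /** Returns the scheme token at the start of a challenge, lower-cased. */
    public static String schemeOf(String challenge) {
        String trimmed = challenge.trim();
        int space = trimmed.indexOf(' ');
        String scheme = space < 0 ? trimmed : trimmed.substring(0, space);
        return scheme.toLowerCase(Locale.ROOT);
    }

    /** Returns the header value with the most preferred scheme, or null if none is supported. */
    public static String choose(List<String> headerValues) {
        for (String wanted : PREFERRED) {
            for (String value : headerValues) {
                if (schemeOf(value).equals(wanted)) {
                    return value;
                }
            }
        }
        return null;
    }
}
```

The point is only that the caller needs every WWW-Authenticate value, not just the
last one stored, before it can make this choice.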

I have reviewed the HTTP protocol (RFC 1945) and it does allow for multiple headers
using the same name, which makes me think that there may be other headers (now or in
the future) that would require multiple values.  I have created a class called
MultipleProperties which will handle this; however, it breaks the contract of the Map
interface.
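
To make the idea concrete, here is a minimal sketch of a multi-valued header store.
It is not the class mentioned above; the name MultiValueHeaders and its methods are
hypothetical.  It sidesteps the Map contract problem by not implementing Map at all
and wrapping a Map of lists instead, which is only one possible design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MultiValueHeaders {
    // Header names are case-insensitive in HTTP, so keys are stored lower-cased.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    /** Appends a value for the header instead of replacing the previous one. */
    public void add(String name, String value) {
        values.computeIfAbsent(key(name), k -> new ArrayList<>()).add(value);
    }

    /** Returns the first value seen for the header, or null if it was never added. */
    public String getFirst(String name) {
        List<String> list = values.get(key(name));
        return list == null ? null : list.get(0);
    }

    /** Returns all values for the header in arrival order (empty if none). */
    public List<String> getAll(String name) {
        List<String> list = values.get(key(name));
        return list == null
            ? Collections.<String>emptyList()
            : Collections.unmodifiableList(list);
    }
}
```

Code that only ever wanted a single value could keep calling getFirst, while the
authentication code would call getAll("WWW-Authenticate").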

Is anyone else interested in this type of use of Nutch?  Should the implementation be
left as is, even though there may be headers that are currently being missed?

My initial code has successfully used the MultiProperties class to collect multiple
key/value pairs with the same key.  I have also authenticated using Basic
authentication at this point and plan to continue developing various authentication
schemes.
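
For reference, the Basic scheme itself is small: the Authorization header value is
the word "Basic" followed by the Base64 encoding of "user:password".  The sketch
below shows just that step; the class and method names are hypothetical, and the
choice of UTF-8 for the credentials is an assumption (servers differ on this).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {
    /** Builds the value of the Authorization header for the Basic scheme. */
    public static String authorizationValue(String user, String password) {
        String pair = user + ":" + password;
        // Assumption: credentials are encoded as UTF-8 before Base64.
        byte[] raw = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }
}
```

The returned string would be sent as the Authorization request header when the
fetch is retried after a 401 response.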

Before I submit anything I'll wait to hear responses.

Matt




_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
