Nick Lothian wrote:
You may also be able to extract some useful information from the character encoding (available in the charset parameter of the Content-Type header; see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17).
Obviously this won't always be useful, but encodings like Shift-JIS are
pretty good indicators of the language (Japanese in that case).
Good point, I need to dig into that more.
-- Sami Siren
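
For illustration only, here is a minimal Java sketch of the idea discussed above: read the charset parameter from a Content-Type header value and map it to a language hint. The class name CharsetLanguageHint, its method names, and the charset-to-language table are all hypothetical and are not part of Nutch; the table is deliberately incomplete and lists only encodings that are strongly tied to a single language.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, not part of Nutch: charset name -> language hint. */
public class CharsetLanguageHint {

    // Assumed, incomplete table. Language-neutral charsets such as UTF-8 or
    // ISO-8859-1 are left out on purpose because they say little about language.
    private static final Map<String, String> HINTS = Map.of(
        "shift_jis", "ja",
        "shift-jis", "ja",
        "euc-jp", "ja",
        "iso-2022-jp", "ja",
        "euc-kr", "ko",
        "gb2312", "zh",
        "big5", "zh",
        "koi8-r", "ru",
        "windows-1255", "he");

    /** Returns the lower-cased charset parameter of a Content-Type value, if any. */
    public static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        // parts[0] is the media type (e.g. "text/html"); parameters follow.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                if (!value.isEmpty()) {
                    return Optional.of(value.toLowerCase(Locale.ROOT));
                }
            }
        }
        return Optional.empty();
    }

    /** Returns a language code hint, or empty when the charset is absent or unknown. */
    public static Optional<String> guessLanguage(String contentType) {
        // Optional.map turns a null lookup result into an empty Optional.
        return charsetOf(contentType).map(HINTS::get);
    }
}
```

A caller could pass the raw header value, for example CharsetLanguageHint.guessLanguage("text/html; charset=Shift_JIS"), and treat an empty result as "no hint" rather than as evidence about the language. The result is best used as one signal alongside content-based detection, since servers often send a wrong or generic charset.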
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
