I am using the Nutch 0.9 parsing framework on its own.  I create a Content with
a contentType of text/plain; charset="windows-1251".  However, Content does not
preserve the charset part of the content type, so when the TextParser calls

String encoding = StringUtil.parseCharacterEncoding(content.getContentType());

it always gets null because the contentType no longer contains the charset 
string.

I see from the trunk that all this has changed quite a lot and I read about the 
changes, but I'm not sure if I'm doing something wrong or if it ever worked.

Can anyone confirm whether this is a known problem and whether there is a
simple known solution?  I could simply store the full contentType and add a new
method to get that, which is then used in TextParser, but is there a more
elegant solution?
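
To make the workaround concrete, here is a minimal standalone sketch of the
idea: keep the full content type string and extract the charset from it.  It
does not use any Nutch classes; the class name CharsetFromContentType and the
method parseCharset are hypothetical names made up for illustration, and the
parsing is deliberately simple (it does not handle spaces around the "=").

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: pulls the charset parameter
// out of a full content type string that was stored unmodified.
public class CharsetFromContentType {

    /** Returns the charset parameter value, or null if none is present. */
    public static String parseCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        // parts[0] is the media type; the rest are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip surrounding double quotes, as in charset="windows-1251".
                if (value.length() >= 2
                        && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String full = "text/plain; charset=\"windows-1251\"";
        System.out.println(parseCharset(full));          // prints windows-1251
        System.out.println(parseCharset("text/plain"));  // prints null
    }
}
```

The point is only that the charset survives if the original string is kept
somewhere the parser can reach it; where to keep it is the open question.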

Thanks
Antony

