Bug in Content+TextParser?

Bowesman Antony Wed, 14 May 2008 02:28:11 -0700

I am using the Nutch 0.9 parsing framework on its own.  I create a Content with 
a contentType of text/plain; charset="windows-1251".  However, Content does not 
preserve the charset part of the content type, so when the TextParser calls


String encoding = StringUtil.parseCharacterEncoding(content.getContentType());

it always gets null because the contentType no longer contains the charset 
string.
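
To illustrate the symptom, here is a minimal stand-alone sketch. Note that
parseCharset is a hypothetical helper written for this example, not Nutch's
StringUtil.parseCharacterEncoding; it only assumes the charset travels as a
"charset=" parameter after the media type.

```java
import java.util.Locale;

// Hypothetical stand-alone helper for illustration; not Nutch's StringUtil.
class CharsetSketch {

    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String parseCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Full value: the charset can be recovered.
        System.out.println(parseCharset("text/plain; charset=\"windows-1251\""));
        // Bare media type, which is what I appear to get back from Content.
        System.out.println(parseCharset("text/plain"));
    }
}
```

With the full value the charset can be recovered; once the parameter has been
stripped there is nothing left to parse, so the result is null.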

I see from the trunk that all this has changed quite a lot and I read about the 
changes, but I'm not sure if I'm doing something wrong or if it ever worked.

Can anyone confirm if this is a known problem and if there is a simple known 
solution?  I could simply store the full contentType and add a new method to 
get it, which would then be used in TextParser, but is there a more elegant 
solution?
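
A rough sketch of that workaround, using hypothetical names throughout
(ContentTypeHolder and getFullContentType are made up for illustration and
are not part of Nutch's Content class):

```java
// Hypothetical stand-in for illustration; not Nutch's Content class.
class ContentTypeHolder {
    private final String fullContentType; // as supplied, parameters included
    private final String baseContentType; // media type only

    ContentTypeHolder(String contentType) {
        this.fullContentType = contentType;
        int semi = contentType.indexOf(';');
        // Keep only the part before the first ';' as the bare media type.
        this.baseContentType =
            (semi < 0 ? contentType : contentType.substring(0, semi)).trim();
    }

    /** Existing-style accessor: media type without parameters. */
    String getContentType() {
        return baseContentType;
    }

    /** Proposed extra accessor: the original value, charset included. */
    String getFullContentType() {
        return fullContentType;
    }
}
```

The parser would then read the charset from the full value instead of the
bare media type.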

Thanks
Antony
