I am using the Nutch 0.9 parsing framework on its own. I create a Content with the contentType text/plain; charset="windows-1251". However, Content does not preserve the charset part of the content type, so when the TextParser calls

String encoding = StringUtil.parseCharacterEncoding(content.getContentType());

it always gets null, because the contentType no longer contains the charset string.

I see from the trunk that all this has changed quite a lot, and I have read about the changes, but I'm not sure whether I'm doing something wrong or whether this ever worked. Can anyone confirm whether this is a known problem and whether there is a simple known solution? I could simply store the full contentType and add a new method to return it, which TextParser would then use, but is there a more elegant solution?

Thanks,
Antony
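
To make the workaround concrete, here is a minimal, self-contained sketch of pulling the charset out of a full content type string, which is what the parser would need if the unmodified value were kept around. The class ContentTypeCharset and the method parseCharset are hypothetical names used only for illustration. This is not Nutch code, and it does not claim to match how StringUtil.parseCharacterEncoding is implemented.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: extracts the charset parameter
// from a full Content-Type value such as: text/plain; charset="windows-1251"
class ContentTypeCharset {

    /** Returns the charset parameter value, or null if none is present. */
    static String parseCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        // parts[0] is the media type; the remaining parts are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value parameter
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional surrounding double quotes.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? null : value;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseCharset("text/plain; charset=\"windows-1251\""));
        System.out.println(parseCharset("text/plain"));
    }
}
```

With the full content type the sketch yields windows-1251, and with the stripped value text/plain it yields null, which matches the behaviour described above once the charset part has been dropped.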