cache.jsp does not recognize encoding conversion from content different to UTF-8
--------------------------------------------------------------------------------

                 Key: NUTCH-946
                 URL: https://issues.apache.org/jira/browse/NUTCH-946
             Project: Nutch
          Issue Type: Bug
          Components: web gui
    Affects Versions: 1.2
         Environment: Server version: Apache Tomcat/6.0.29
                      Server built:   July 19 2010 1458
                      Server number:  6.0.0.29
                      OS Name:        Linux
                      OS Version:     2.6.18-128.7.1.el5
                      Architecture:   i386
                      JVM Version:    1.6.0_22-b04
                      JVM Vendor:     Sun Microsystems Inc.
            Reporter: Enrique Berlanga
            Priority: Minor

The cache view does not apply the encoding conversion needed to properly
display page content stored in a segment. The problem is that it searches for
the "CharEncodingForConversion" entry in the content metadata, but that entry
is stored in the parse metadata.

Here is the patch I've generated to fix it:

### Eclipse Workspace Patch 1.0
#P branch-1.2
Index: src/web/jsp/cached.jsp
===================================================================
--- src/web/jsp/cached.jsp	(revision 1027060)
+++ src/web/jsp/cached.jsp	(working copy)
@@ -39,17 +39,18 @@
     ResourceBundle.getBundle("org.nutch.jsp.cached", request.getLocale())
     .getLocale().getLanguage();
 
-  Metadata metaData = bean.getParseData(details).getContentMeta();
+  Metadata contentMetaData = bean.getParseData(details).getContentMeta();
+  Metadata parseMetaData = bean.getParseData(details).getParseMeta();
 
   String content = null;
-  String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
+  String contentType = (String) contentMetaData.get(Metadata.CONTENT_TYPE);
   if (contentType.startsWith("text/html")) {
     // FIXME : it's better to emit the original 'byte' sequence
     // with 'charset' set to the value of 'CharEncoding',
     // but I don't know how to emit 'byte sequence' in JSP.
     // out.getOutputStream().write(bean.getContent(details)) may work,
     // but I'm not sure.
-    String encoding = (String) metaData.get("CharEncodingForConversion");
+    String encoding = (String) parseMetaData.get("CharEncodingForConversion");
     if (encoding != null) {
       try {
         content = new String(bean.getContent(details), encoding);
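
To illustrate why the lookup location matters, here is a minimal standalone
sketch. It does not use the Nutch API: plain java.util.Map objects stand in
for the content and parse metadata, and the class name, the decode() helper,
and the UTF-8 fallback are assumptions made for this example only, not the
actual behaviour of cached.jsp.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class CachedEncodingSketch {

    // Hypothetical helper: decode raw bytes with the charset named under
    // "CharEncodingForConversion" in the given metadata. The UTF-8 fallback
    // is an assumption of this sketch.
    static String decode(byte[] raw, Map<String, String> meta) {
        String encoding = meta.get("CharEncodingForConversion");
        if (encoding != null) {
            try {
                return new String(raw, encoding);
            } catch (UnsupportedEncodingException e) {
                // Unknown charset name: fall through to the default below.
            }
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf\u00e9" stored as ISO-8859-1 bytes, i.e. not UTF-8.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Stand-ins for the two metadata objects; only the parse metadata
        // carries the conversion charset, as described in the report.
        Map<String, String> contentMeta = new HashMap<>();
        contentMeta.put("Content-Type", "text/html");
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put("CharEncodingForConversion", "ISO-8859-1");

        // Looking in the content metadata finds no charset, so the bytes are
        // decoded with the fallback and the accented character is lost.
        String fromContentMeta = decode(raw, contentMeta);
        // Looking in the parse metadata finds the charset and decodes cleanly.
        String fromParseMeta = decode(raw, parseMeta);

        System.out.println("content metadata lookup intact: "
                + "caf\u00e9".equals(fromContentMeta));
        System.out.println("parse metadata lookup intact:   "
                + "caf\u00e9".equals(fromParseMeta));
    }
}
```

In this sketch the only difference between the two calls is which map is
consulted, which mirrors the one-line change to the "encoding" lookup in the
patch.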