[ https://issues.apache.org/jira/browse/NUTCH-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003281#comment-13003281 ]
Nikos Mastropavlos commented on NUTCH-946:
------------------------------------------

Having tried this on some Greek websites with encoding Windows-1253, the correct meta name seems to be "Content-Encoding" instead of "CharEncodingForConversion". So, using the patch in the issue description and adding

  if (encoding == null) encoding = (String) parseMetaData.get("Content-Encoding");

right after the CharEncodingForConversion lookup seemed to do the trick for me.

> cache.jsp does not recognize encoding conversion from content different to UTF-8
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-946
>                 URL: https://issues.apache.org/jira/browse/NUTCH-946
>             Project: Nutch
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 1.2
>         Environment: Server version: Apache Tomcat/6.0.29
> Server built:   July 19 2010 1458
> Server number:  6.0.0.29
> OS Name:        Linux
> OS Version:     2.6.18-128.7.1.el5
> Architecture:   i386
> JVM Version:    1.6.0_22-b04
> JVM Vendor:     Sun Microsystems Inc.
>            Reporter: Enrique Berlanga
>            Priority: Minor
>         Attachments: cache-946.patch
>
>
> Cache view does not recognize the encoding conversion needed to properly show page content stored in a segment.
> The problem is that it searches for the "CharEncodingForConversion" meta in the content metadata, but it is stored in the parse metadata.
> Here is the patch I've generated for the fixed version:
> ### Eclipse Workspace Patch 1.0
> #P branch-1.2
> Index: src/web/jsp/cached.jsp
> ===================================================================
> --- src/web/jsp/cached.jsp	(revision 1027060)
> +++ src/web/jsp/cached.jsp	(working copy)
> @@ -39,17 +39,18 @@
>      ResourceBundle.getBundle("org.nutch.jsp.cached", request.getLocale())
>      .getLocale().getLanguage();
>
> -  Metadata metaData = bean.getParseData(details).getContentMeta();
> +  Metadata contentMetaData = bean.getParseData(details).getContentMeta();
> +  Metadata parseMetaData = bean.getParseData(details).getParseMeta();
>
>    String content = null;
> -  String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
> +  String contentType = (String) contentMetaData.get(Metadata.CONTENT_TYPE);
>    if (contentType.startsWith("text/html")) {
>      // FIXME : it's better to emit the original 'byte' sequence
>      // with 'charset' set to the value of 'CharEncoding',
>      // but I don't know how to emit 'byte sequence' in JSP.
>      // out.getOutputStream().write(bean.getContent(details)) may work,
>      // but I'm not sure.
> -    String encoding = (String) metaData.get("CharEncodingForConversion");
> +    String encoding = (String) parseMetaData.get("CharEncodingForConversion");
>      if (encoding != null) {
>        try {
>          content = new String(bean.getContent(details), encoding);
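Putting the patch and the comment together, the charset lookup reads "CharEncodingForConversion" from the parse metadata and falls back to "Content-Encoding" only when that key is missing. The Java sketch below shows that order outside of JSP. It is not the Nutch code: `CachedEncodingSketch`, `resolveEncoding` and `decode` are hypothetical names, a plain `Map<String, String>` stands in for Nutch's `Metadata`, and falling back to UTF-8 when no usable charset is found is an assumption of the sketch (the quoted patch is cut off before its error handling). It needs Java 7 or later.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not part of Nutch's cached.jsp.
public class CachedEncodingSketch {

    // A plain Map stands in for Nutch's parse metadata (assumption), so the
    // sketch runs without Nutch on the classpath.
    public static String resolveEncoding(Map<String, String> parseMeta) {
        // Key the NUTCH-946 patch reads from the parse metadata.
        String encoding = parseMeta.get("CharEncodingForConversion");
        // Fallback reported in the comment to work for Windows-1253 pages.
        if (encoding == null) {
            encoding = parseMeta.get("Content-Encoding");
        }
        return encoding;
    }

    // Decodes the raw page bytes with the resolved charset. Falling back to
    // UTF-8 when the charset is missing or unsupported is an assumption.
    public static String decode(byte[] raw, String encoding) {
        if (encoding != null) {
            try {
                return new String(raw, encoding);
            } catch (UnsupportedEncodingException e) {
                // Unknown charset name: fall through to UTF-8.
            }
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> parseMeta = new HashMap<String, String>();
        parseMeta.put("Content-Encoding", "windows-1253");

        // Greek word for Athens, written as Unicode escapes and encoded as Windows-1253.
        String athens = "\u0391\u03b8\u03ae\u03bd\u03b1";
        byte[] raw = athens.getBytes(Charset.forName("windows-1253"));

        String encoding = resolveEncoding(parseMeta);
        System.out.println(encoding);                              // windows-1253
        System.out.println(decode(raw, encoding).equals(athens));  // true
    }
}
```

Keeping "CharEncodingForConversion" first means pages where that key is present behave exactly as with the patch alone; the "Content-Encoding" fallback only applies when it is absent.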