Hi, I am using Nutch 0.9 to crawl some Chinese web sites and searching through the Nutch web portal, but the cached HTML copy displays incorrectly. I then ran "bin/nutch readseg -dump" to dump the segments: ParseText (UTF-8) displays correctly, but the Chinese characters in Content display incorrectly as '?'. The original HTML uses the gb2312 charset.
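To show the kind of symptom I mean, here is a minimal Python sketch run outside Nutch. It is only an illustration under my own assumptions, not a claim about what Nutch does internally: the sample string and the helper name to_ascii_lossy are made up for this example. It shows that GB2312 bytes decode fine with the right charset, but turn into '?' once the text is forced through a charset that cannot represent the characters.

```python
# Hypothetical sample text standing in for the page content (not from a real crawl).
original = "中文"

# The bytes as a GB2312-encoded page would deliver them.
raw = original.encode("gb2312")


def to_ascii_lossy(data: bytes, assumed_charset: str) -> str:
    """Decode bytes with assumed_charset, then force the text into ASCII.

    Characters that ASCII cannot represent are replaced with '?',
    which mimics the output I see in the Content dump.
    """
    text = data.decode(assumed_charset, errors="replace")
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    # Decoding with the right charset recovers the original characters.
    print(raw.decode("gb2312"))
    # Forcing the decoded text into ASCII yields one '?' per character.
    print(to_ascii_lossy(raw, "gb2312"))
    # Decoding with the wrong charset first yields one '?' per byte instead.
    print(to_ascii_lossy(raw, "latin-1"))
```

Whether the '?' in my dump comes from the stored bytes or only from the way the dump or the cached page is written out is exactly what I cannot tell.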
What is the possible cause, and how can I fix it?

Thanks in advance,
Xiong

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general