Hi folks,

There is an issue with the protocol-file plugin when fetching files that contain CJK characters in the file name (JIRA NUTCH-968).

After checking the code, I found that the problem is due to the encoding of the file name while fetching the directory. After changing a couple of lines as discussed in JIRA NUTCH-968, the issue is resolved. The issue is still open in JIRA, and the latest Nutch release does not include a fix yet.

I would like to discuss my solution further here on the list and submit the patch once it looks fine. Is anyone interested? I can elaborate on the fix.

Cheers,
Ye