Hi Rushikesh,

I don't have any experience with this specific plugin, but I have run across similar problems, with two possible causes:

1. The site may not properly declare which encoding it uses, and the browser simply guesses the correct one.
2. You may have run into https://issues.apache.org/jira/browse/NUTCH-1807. I solved a similar problem by setting the environment variable LC_ALL to en_US.UTF-8 for all Hadoop processes (more specifically, adding `export LC_ALL=en_US.UTF-8` to ~hadoop/.bashrc on all Hadoop machines solved the problem for me).
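
To illustrate the kind of mismatch both causes can lead to, here is a minimal Python sketch. It is not Nutch or plugin code, and the choice of ISO-8859-1 as the wrong charset is only an assumption for the example; the actual default on a given Hadoop machine may differ.

```python
# Illustration only: this is not Nutch or Bayan Group Extractor code.
text = "Infección"

# Assume the page is served as UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# Decoding those bytes with a single-byte charset (ISO-8859-1 is an assumed
# example of a wrong default) splits the two-byte "ó" into two characters.
garbled = utf8_bytes.decode("iso-8859-1")

# Decoding with the matching charset round-trips cleanly.
correct = utf8_bytes.decode("utf-8")

print(garbled)  # InfecciÃ³n
print(correct)  # Infección
```

If the crawled text looks like the first line of output, the bytes are probably fine and only the charset used to decode them is wrong, which is consistent with a locale fix such as LC_ALL helping.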

Yossi.

> -----Original Message-----
> From: Rushi [mailto:rushikeshmod...@gmail.com]
> Sent: 25 January 2018 16:32
> To: user@nutch.apache.org; Mark Vega <veg...@uci.edu>
> Subject: Bayan Group Extractor plugin for Nutch-Spanish Accent Character Issue
>
> Hello Everyone,
> I am having an issue while crawling the spanish website,some the accent
> characters are not converting properly.
> Here is an example Infección (wrong one)should be Infección (correct ).
>
> Note:This is with *Bayan Group Extractor plugin.* Is there any change that i
> need to make to convert correctly.
>
> --
> Regards
> Rushikesh M
> .Net Developer