Related issue? http://www.mail-archive.com/[email protected]/msg06135.html
[EMAIL PROTECTED] wrote:
> Hi all.
>
> I have a problem configuring nutch-default.xml. As I am in China, most FTP
> sites that I want to crawl are encoded in Chinese, but when Nutch crawls these
> FTP sites, it cannot detect the correct character encoding, and the parse
> results are garbled and useless. So I changed
>
> <property>
>   <name>parser.character.encoding.default</name>
>   <value>windows-1252</value>
> </property>
>
> to <value>gb2312</value> and got a very interesting result: Nutch can now
> crawl the files and directories in the root directory of Chinese FTP sites
> without any garbled characters, but it can NOT crawl any files in
> SUBdirectories; it just returns 404 Not Found.
> I know there must be something wrong in my config files, but how and where
> can I configure Nutch to crawl a Chinese FTP site?
> I've been working on this problem for half a month and have found no way to
> solve it. Could anyone help me?
>
> Thanks

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
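For reference, a minimal sketch of the same gb2312 setting placed in conf/nutch-site.xml instead of editing nutch-default.xml, assuming the usual Nutch convention that nutch-site.xml overrides the defaults. This only changes the fallback charset used when parsing; it is not known to explain or fix the 404s on subdirectories. The root element shown is the one used by Nutch 0.8 and later; 0.7.x releases used <nutch-conf> instead, so check it against your own nutch-default.xml.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: values here override conf/nutch-default.xml -->
<!-- Root element assumed for Nutch 0.8+; Nutch 0.7.x used <nutch-conf> -->
<configuration>
  <property>
    <!-- Fallback charset when a document declares none -->
    <name>parser.character.encoding.default</name>
    <value>gb2312</value>
  </property>
</configuration>
```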
