[ https://issues.apache.org/jira/browse/NUTCH-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Minyao Zhu updated NUTCH-807:
-----------------------------

    Summary: JSParseFilter produces malformed URL  (was: JSParseFilter produces weired URL)

> JSParseFilter produces malformed URL
> ------------------------------------
>
>                 Key: NUTCH-807
>                 URL: https://issues.apache.org/jira/browse/NUTCH-807
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0.0
>         Environment: Redhat 2.6.18-128.1.6.el5PAE i686 i686 i386 GNU/Linux
>            Reporter: Minyao Zhu
>
> This was found when crawling the site http://zhidao.baidu.com/ (a Chinese-language site).
> It appears that this page contains JavaScript that confuses JSParseFilter, which then
> produces URLs like this:
> http://zhidao.baidu.com/){if(A===46){baidu.hide(
> I am not sure of the impact or scope of this issue in general. For this specific site,
> the observation is that far fewer pages got crawled.
> Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.