[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025946#comment-13025946 ]
Gabriele Kahlout edited comment on NUTCH-990 at 4/27/11 6:51 PM:
-----------------------------------------------------------------

A test page:

<!DOCTYPE html>
<html>
  <head>
    <title></title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    artificial rucksack
  </body>
</html>

Even with a title it doesn't work.

      was (Author: simpatico):
    A test page:

<!DOCTYPE html>
<html>
  <head>
    <title></title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    artificial rucksack
  </body>
</html>

Maybe the problem is the missing title?

> protocol-httpclient fails with short pages
> ------------------------------------------
>
>                 Key: NUTCH-990
>                 URL: https://issues.apache.org/jira/browse/NUTCH-990
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>            Reporter: Gabriele Kahlout
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: hadoop.log
>
>
> Using protocol-http with HTML pages of only a few words works fine. But with
> protocol-httpclient the same pages disappear from the index, although they
> are still fetched.
> Such small files are useful for quick testing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira