<property>
<name>parser.caching.forbidden.policy</name>
<value>content</value>
<description>If a site (or a page) requests through its robot metatags
that it should not be shown as cached content, apply this policy.
Currently
three keywords are recognized: "none" ignores any "noarchive" directives.
"content" doesn't show the content, but shows summaries (snippets).
"all" doesn't show either content or summaries.</description>
</property>
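This property, defined in nutch-default.xml, controls how Nutch handles
<meta http-equiv="Pragma" content="no-cache" />. With the default value,
"content", Nutch still shows snippets for such pages but refuses to show the
cached copy, which is the message you are seeing.
If you do want to show cached copies of those pages, override the property in
your nutch-site.xml and set it to "none".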
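A minimal override for nutch-site.xml might look like the sketch below. It assumes your nutch-site.xml uses the usual <configuration> root element of Nutch's Hadoop-style config files. The value "none" is taken from the property description above, where it means "ignore any noarchive directives".

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default value ("content") from nutch-default.xml.
       "none" tells Nutch to ignore no-cache / noarchive directives,
       so cached copies can be displayed for those pages as well. -->
  <property>
    <name>parser.caching.forbidden.policy</name>
    <value>none</value>
  </property>
</configuration>
```

Because the policy is applied when pages are parsed, pages that were parsed before the change will probably have to be re-parsed and re-indexed before their cached copies show up.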
Justin
Bartosz Gadzimski wrote:
Yves Yu wrote:
Hi, all,
My Nutch shows the cached copy correctly for most pages, but not for some.
For those, it always says the following:
Display of this content was administratively prohibited by the webmaster.
You may visit the original page instead:
http://forum.laopdr.gov.la/forums/list.page.
Any idea why?
Thanks
Yves
Maybe it's because of this tag on the website:
<meta http-equiv="Pragma" content="no-cache" />