<property>
  <name>parser.caching.forbidden.policy</name>
  <value>content</value>
  <description>If a site (or a page) requests through its robot metatags
that it should not be shown as cached content, apply this policy. Currently
  three keywords are recognized: "none" ignores any "noarchive" directives.
  "content" doesn't show the content, but shows summaries (snippets).
  "all" doesn't show either content or summaries.</description>
</property>

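The property above, from nutch-default.xml, controls how Nutch deals with <meta http-equiv="Pragma" content="no-cache" />. If you do want to show cached copies of those pages, override that property in your nutch-site.xml (per the description above, the value "none" ignores such directives).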
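For illustration, an override in nutch-site.xml might look like the sketch below. The value "none" is taken from the property description above; the surrounding <configuration> element is the usual wrapper for Nutch site configuration, and I am assuming nothing else in your nutch-site.xml sets the same property. Since the property name starts with "parser.", it is presumably applied when pages are parsed, so pages that were already parsed may need to be parsed again before the change shows up.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: local overrides for values in nutch-default.xml -->
<configuration>
  <property>
    <name>parser.caching.forbidden.policy</name>
    <!-- "none" ignores any "noarchive" directives (see description above) -->
    <value>none</value>
  </property>
</configuration>
```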

Justin

Bartosz Gadzimski wrote:
Yves Yu wrote:
Hi, all,
My Nutch shows the cached copy correctly for most pages, but for some pages it does not.
For those it always shows the following:

Display of this content was administratively prohibited by the webmaster.
You may visit the original page instead:
http://forum.laopdr.gov.la/forums/list.page.

Any reasons?

Thanks
Yves

Maybe because of this tag on the website:

<meta http-equiv="Pragma" content="no-cache" />

