Hi, I am trying to use the creation date or modification time (i.e. the lastModified field) of HTML pages.
I found these similar postings, but they were barely helpful:

http://www.mail-archive.com/[email protected]/msg12884.html (2009)
http://www.mail-archive.com/[email protected]/msg09542.html (2007)
http://www.mail-archive.com/[email protected]/msg07300.html (2007)
http://www.mail-archive.com/[email protected]/msg09548.html (2007)
http://www.mail-archive.com/[email protected]/msg08668.html (2007)
http://www.mail-archive.com/[email protected]/msg01956.html (2005)

When I run Nutch 1.0 as an intranet crawl, lastModified is 0 everywhere, i.e. the modified time is 01:00:00 CET 1970, even though I have enabled index-more and query-more in nutch-site.xml.

So I wonder why Nutch 1.0 doesn't use the date or Last-modified tags from the following header sample:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<head>
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Expires" content="-1" />
<meta http-equiv="Last-modified" content="Sun, 3 May 2009 21:23:00 GMT" />
<meta name="date" content="2009-05-03" />
<title>some title</title>
...
<meta name="title" content="our organisation" />
<meta name="language" content="de" />
<meta name="subject" content="our topics" />
...
</script></head>

Any help is greatly appreciated.

Thanks,
MnT
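For reference, here is a minimal Java sketch (not Nutch code) of how the two date values in the header sample could be turned into the epoch-millisecond number a lastModified field holds. The class and method names are hypothetical, and the regex extraction assumes the attribute order used in the sample (http-equiv/name before content); a real parser would walk the DOM. It also illustrates why a lastModified of 0 shows up as 01:00:00 CET 1970: 0 ms is the Unix epoch, and Central European Time is one hour ahead of UTC in winter.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastModifiedSketch {

    // Hypothetical pattern for <meta http-equiv="Last-modified" content="...">, case-insensitive.
    // Assumes http-equiv comes before content, as in the sample header.
    private static final Pattern META_LAST_MODIFIED = Pattern.compile(
            "<meta\\s+http-equiv=\"last-modified\"\\s+content=\"([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical pattern for <meta name="date" content="YYYY-MM-DD">.
    private static final Pattern META_DATE = Pattern.compile(
            "<meta\\s+name=\"date\"\\s+content=\"([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    /** Parses an HTTP-style date such as "Sun, 3 May 2009 21:23:00 GMT" into epoch milliseconds. */
    static long parseHttpDate(String value) {
        return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant()
                .toEpochMilli();
    }

    /** Parses an ISO date such as "2009-05-03" as midnight UTC, in epoch milliseconds. */
    static long parseIsoDate(String value) {
        return LocalDate.parse(value).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    /** Prefers the Last-modified meta, falls back to the date meta, and returns 0 if neither is present. */
    static long lastModifiedFromHtml(String html) {
        Matcher m = META_LAST_MODIFIED.matcher(html);
        if (m.find()) {
            return parseHttpDate(m.group(1));
        }
        m = META_DATE.matcher(html);
        if (m.find()) {
            return parseIsoDate(m.group(1));
        }
        return 0L; // nothing found: 0 means "unknown", i.e. the Unix epoch
    }

    public static void main(String[] args) {
        String head = "<head>\n"
                + "<meta http-equiv=\"Last-modified\" content=\"Sun, 3 May 2009 21:23:00 GMT\" />\n"
                + "<meta name=\"date\" content=\"2009-05-03\" />\n"
                + "</head>";
        System.out.println(lastModifiedFromHtml(head));

        // A lastModified of 0 rendered in Central European time is 01:00 on 1 January 1970.
        ZonedDateTime epoch = Instant.ofEpochMilli(0L).atZone(ZoneId.of("Europe/Berlin"));
        System.out.println(epoch);
    }
}
```

Preferring the http-equiv value over name="date" is an arbitrary choice in this sketch; the date meta only carries day precision, so it is used as a fallback.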
