[ https://issues.apache.org/jira/browse/CONNECTORS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900307#comment-13900307 ]
Florian Schmedding commented on CONNECTORS-850:
-----------------------------------------------

Normally, the HTTP response header contains a Date value. It looks like this field is removed before the changes are computed. A few values from ingeststatus:

{noformat}
lastingest: 12 Feb 2014 18:27:45
firstingest: 12 Feb 2014 17:00:25 (doesn't match exactly the history entry above, same for another document)
lastoutputversion: 0+0++
lastversion: 0+-8+header-Accept-Ranges=bytes=+header-Connection=Keep-Alive=+header-Content-Length=7559=+header-Content-Type=application/xml=+header-ETag="142000000039b75-1d87-4f238a1156aaf"=+header-Keep-Alive=timeout\\=5, max\\=100=+header-Last-Modified=Wed, 12 Feb 2014 17:09:01 GMT=+header-Server=Apache/2.2.22 (Win32) PHP/5.4.5 mod_jk/1.2.37=+845393346261438975+.*+
changecount: 22
{noformat}

Leaving the Date header out of the version string would explain the fetches without ingests shown above. Hope this makes sense.

> Maximum interval in dynamic crawling
> ------------------------------------
>
>                 Key: CONNECTORS-850
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-850
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.4.1
>            Reporter: Florian Schmedding
>            Assignee: Karl Wright
>            Priority: Minor
>              Labels: features
>             Fix For: ManifoldCF 1.5
>
>
> Currently, the dynamic crawling method used for a continuous job extends the
> reseed and recrawl intervals when no changes are found in a checked document.
> However, it should be possible to restrict this extension to a maximum value
> in order to make sure that new documents are discovered within a certain
> interval.
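
For illustration only, the behaviour requested in the issue description (extend the interval while a document is unchanged, but never beyond a configured maximum) could be sketched roughly as follows. This is not ManifoldCF code: the class name, the method name, the doubling rule and the millisecond units are all assumptions made for the example, since the issue does not say how the framework actually grows the interval.

```java
// Hypothetical sketch, not taken from the ManifoldCF code base.
class IntervalSketch {

    /**
     * Returns the next recrawl interval in milliseconds.
     *
     * Assumed rule: after a check that found no change, the interval is
     * doubled, but it is never allowed to exceed maxMs. After a check that
     * did find a change, the interval falls back to baseMs.
     */
    static long nextInterval(long currentMs, boolean changed, long baseMs, long maxMs) {
        if (changed) {
            return baseMs;
        }
        // Compare against maxMs / 2 first so that the doubling cannot overflow.
        if (currentMs > maxMs / 2) {
            return maxMs;
        }
        return currentMs * 2;
    }

    public static void main(String[] args) {
        long base = 60_000L;     // assumed base interval: 1 minute
        long max = 3_600_000L;   // assumed maximum interval: 1 hour
        long interval = base;
        // Ten consecutive checks without a change.
        for (int i = 0; i < 10; i++) {
            interval = nextInterval(interval, false, base, max);
        }
        System.out.println(interval); // capped at the maximum: 3600000
    }
}
```

With a cap like this, an unchanged document is still rechecked at least once per maximum interval, which is what bounds the time until new documents are discovered.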