[ https://issues.apache.org/jira/browse/CONNECTORS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900198#comment-13900198 ]

Florian Schmedding commented on CONNECTORS-850:
-----------------------------------------------

What contributes to a document change - anything besides the content, e.g., HTTP header fields? The content was only changed at the time indicated by the "***" note. The document is served by an Apache HTTP server on localhost. I used a modified web crawler connector that recognizes links in a custom XML format (it parses the XML and extracts the links and a document ID, nothing else).

> Maximum interval in dynamic crawling
> ------------------------------------
>
>                 Key: CONNECTORS-850
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-850
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.4.1
>            Reporter: Florian Schmedding
>            Assignee: Karl Wright
>            Priority: Minor
>              Labels: features
>             Fix For: ManifoldCF 1.5
>
>
> Currently, the dynamic crawling method used for a continuous job extends the
> reseed and recrawl intervals when no changes are found in a checked document.
> However, it should be possible to restrict this extension to a maximum value
> in order to make sure that new documents are discovered within a certain
> interval.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)