[ https://issues.apache.org/jira/browse/CONNECTORS-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900307#comment-13900307 ]

Florian Schmedding commented on CONNECTORS-850:
-----------------------------------------------

Normally, the HTTP response headers include a Date value. It looks like this 
header is dropped before changes are computed, because Date is not among the 
headers recorded in lastversion. A few values from ingeststatus:
{noformat}
lastingest: 12 Feb 2014 18:27:45
firstingest: 12 Feb 2014 17:00:25 (does not exactly match the history entry above; the same is true for another document)
lastoutputversion: 0+0++
lastversion: 0+-8+header-Accept-Ranges=bytes=+header-Connection=Keep-Alive=+header-Content-Length=7559=+header-Content-Type=application/xml=+header-ETag="142000000039b75-1d87-4f238a1156aaf"=+header-Keep-Alive=timeout\\=5, max\\=100=+header-Last-Modified=Wed, 12 Feb 2014 17:09:01 GMT=+header-Server=Apache/2.2.22 (Win32) PHP/5.4.5 mod_jk/1.2.37=+845393346261438975+.*+
changecount: 22
{noformat}

Ignoring the Date header would explain the fetches without ingests listed 
above. Hope this makes sense.
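To illustrate why excluding Date matters, here is a minimal sketch of version-string change detection. It is not ManifoldCF's actual encoding: the function names, the "+"-joined format, the SHA-256 body hash and the VOLATILE_HEADERS set are all hypothetical. The idea is that if a header that changes on every response (such as Date) were part of the version string, every fetch would look like a change and trigger an ingest; leaving it out means a fetch only leads to an ingest when something meaningful differs.

```python
import hashlib

# Hypothetical: headers assumed to change on every response and therefore
# left out of the version string (this sketch only drops Date).
VOLATILE_HEADERS = {"date"}


def version_string(headers, body):
    """Build a simplified version string from response headers and body.

    Illustrative only; this is not ManifoldCF's real version-string format.
    """
    kept = sorted(
        (name, value)
        for name, value in headers.items()
        if name.lower() not in VOLATILE_HEADERS
    )
    header_part = "+".join(f"header-{n}={v}" for n, v in kept)
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{header_part}+{body_hash}"


def needs_ingest(previous_version, headers, body):
    """Return (ingest?, new_version); ingest only when the version changed."""
    current = version_string(headers, body)
    return current != previous_version, current


# Two fetches of the same document that differ only in the Date header.
body = b"<doc/>"
first = {
    "Date": "Wed, 12 Feb 2014 17:10:00 GMT",
    "ETag": '"abc"',
    "Content-Type": "application/xml",
}
second = dict(first, Date="Wed, 12 Feb 2014 18:27:45 GMT")

_, v1 = needs_ingest(None, first, body)
changed, v2 = needs_ingest(v1, second, body)
print(changed)  # False: the document was fetched, but nothing is re-ingested
```

Under this assumption, the second fetch is recorded as "fetched, unchanged", which matches the pattern of fetches without ingests described above.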

> Maximum interval in dynamic crawling
> ------------------------------------
>
>                 Key: CONNECTORS-850
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-850
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>    Affects Versions: ManifoldCF 1.4.1
>            Reporter: Florian Schmedding
>            Assignee: Karl Wright
>            Priority: Minor
>              Labels: features
>             Fix For: ManifoldCF 1.5
>
>
> Currently, the dynamic crawling method used for a continuous job extends the 
> reseed and recrawl intervals when no changes are found in a checked document. 
> However, it should be possible to restrict this extension to a maximum value 
> in order to make sure that new documents are discovered within a certain 
> interval.
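The requested feature can be sketched as a simple capped back-off. This assumes a plain multiplicative extension of the interval after each unchanged check; ManifoldCF's actual dynamic-crawl computation may differ, and the names next_recrawl_interval, base_ms, factor and max_ms are hypothetical. The same clamp would apply to the reseed interval.

```python
from typing import Optional


def next_recrawl_interval(current_ms: int, changed: bool, base_ms: int,
                          factor: float = 2.0,
                          max_ms: Optional[int] = None) -> int:
    """Hypothetical dynamic-recrawl rule with an optional upper bound.

    On a detected change, fall back to the base interval. Otherwise extend
    the interval by `factor`, clamped to `max_ms` when a maximum is set.
    """
    if changed:
        return base_ms
    extended = int(current_ms * factor)
    if max_ms is not None:
        extended = min(extended, max_ms)
    return extended


HOUR = 3_600_000  # one hour in milliseconds

# Four consecutive checks that find no change, with a 4-hour maximum.
interval = HOUR
history = []
for _ in range(4):
    interval = next_recrawl_interval(interval, changed=False,
                                     base_ms=HOUR, max_ms=4 * HOUR)
    history.append(interval // HOUR)
print(history)  # [2, 4, 4, 4]
```

Without max_ms the sequence would keep doubling (2, 4, 8, 16 hours), so a new document linked from an unchanged page could go undiscovered for an ever-growing period. The cap bounds that delay.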



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
