Donald Van den Driessche created CONNECTORS-1602:
----------------------------------------------------

             Summary: Continuous crawling doesn't recrawl everything
                 Key: CONNECTORS-1602
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1602
             Project: ManifoldCF
          Issue Type: Bug
          Components: Web connector
            Reporter: Donald Van den Driessche


When crawling a website in continuous crawling mode, we noticed that not all
documents are recrawled.

The site is quite extensive. We found that after a document/page is crawled,
it is assigned a recrawl timestamp somewhere between the recrawl interval and
the max recrawl interval.

However, if these recrawl times are reached while the first crawl is still
running, ManifoldCF starts recrawling those documents but seems to ignore the
rest of the website. We also see some documents recrawled 5 times while others
are not recrawled at all, apparently due to the same issue.

Could you shed some more light on how continuous crawling works?

Is it a suitable approach for crawling an extensive website?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)