Hi Julien,

This is a complex question and the framework behaves differently depending
on the connector model.  Please read:

https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs

Karl


On Wed, Oct 24, 2018 at 5:26 AM Julien Massiera <
julien.massi...@francelabs.com> wrote:

> Hi Karl,
>
> I am trying to understand the behavior of ManifoldCF during a re-crawl,
> and especially how missing documents are deleted and by which process.
>
> I am focusing on two repository connectors, the JCIFS one and the JDBC
> one. Here is what I understand so far:
>
> In the JCIFS connector, the addSeedDocuments method lists all the files
> found for each configured path. So it seems clear that any previously
> crawled file that is not listed by this method during a re-crawl
> should be deleted.
>
> In the JDBC connector, the addSeedDocuments method lists only the new or
> modified documents during a re-crawl (provided, of course, that the id
> query correctly uses the starttime and endtime variables). So here the
> two connectors differ: to delete missing documents, the previously
> crawled ones need to be 'checked' with the version query to detect the
> documents that must be removed.
>
> I am currently unable to tell what ManifoldCF actually does to handle
> documents that must be deleted, and whether any of the assumptions I
> laid out above are correct and/or used. I would also like to know
> which part of the code performs the delete process.
>
> Thanks for your help.
>
> --
> Julien MASSIERA
> Product Development Director
> France Labs – The Search Experts
> Join us at the Enterprise Search & Discovery Summit in Washington DC
> www.francelabs.com
>
>
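
For readers following the thread, the two assumptions in the quoted
question can be made concrete with a small sketch. This is not ManifoldCF
code: the class and method names below are hypothetical, and the sketch
says nothing about which strategy the framework actually applies to a
given connector. It only contrasts "the seeding step lists everything, so
unlisted documents are missing" with "the seeding step lists only changes,
so every known document must be re-checked".

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical illustration only: none of these names are ManifoldCF APIs.
class RecrawlSketch {

    // Assumption 1 (full listing): seeding returns every document that
    // still exists, so anything crawled before but absent from the new
    // listing is a deletion candidate.
    static Set<String> missingAfterFullListing(Set<String> previouslyCrawled,
                                               Set<String> seededNow) {
        Set<String> missing = new TreeSet<>(previouslyCrawled);
        missing.removeAll(seededNow);
        return missing;
    }

    // Assumption 2 (incremental listing): seeding returns only new or
    // modified ids, so absence from the listing proves nothing. Each
    // previously crawled id is re-checked; a null version means "gone".
    static Set<String> missingAfterVersionCheck(Set<String> previouslyCrawled,
                                                Function<String, String> versionLookup) {
        Set<String> missing = new TreeSet<>();
        for (String id : previouslyCrawled) {
            if (versionLookup.apply(id) == null) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> previous = Set.of("a", "b", "c");

        // Full listing no longer contains "b".
        System.out.println(missingAfterFullListing(previous, Set.of("a", "c"))); // prints [b]

        // Version lookup has no entry for "c".
        Map<String, String> versions = Map.of("a", "v2", "b", "v1");
        System.out.println(missingAfterVersionCheck(previous, versions::get)); // prints [c]
    }
}
```

In the second variant the cost grows with the number of previously
crawled documents rather than with the number of changes, which is the
trade-off the question hints at for the JDBC case.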
