Hi Julien,

This is a complex question, and the framework behaves differently depending on the connector model. Please read:
https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs

Karl

On Wed, Oct 24, 2018 at 5:26 AM Julien Massiera <julien.massi...@francelabs.com> wrote:

> Hi Karl,
>
> I am trying to understand the behavior of ManifoldCF during a re-crawl,
> and especially how missing documents are deleted and by which process.
>
> I am focusing on two repository connectors, the JCIFS one and the JDBC
> one. Here is what I understand so far:
>
> In the JCIFS connector, the addSeedDocuments method lists all the files
> found for each configured path. So it seems clear that any previously
> crawled files that have not been listed by this method during a re-crawl
> should be deleted.
>
> In the JDBC connector, the addSeedDocuments method only lists the new or
> modified documents during a re-crawl (if, of course, the id query is
> correctly using the starttime and endtime variables). So here, there is
> a difference between the two connectors. It means that to delete missing
> documents, the previously crawled ones need to be 'checked' with the
> version query to detect the documents that must be removed.
>
> I am currently unable to tell what ManifoldCF actually does to handle
> documents that must be deleted, or whether any of the assumptions I
> described above are correct and/or used. Also, I am really interested to
> know which part of the code performs the delete process.
>
> Thanks for your help.
>
> --
> Julien MASSIERA
> Product Development Director
> France Labs – The Search experts
> Join us at the Enterprise Search & Discovery Summit in Washington DC
> www.francelabs.com
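
The two assumptions described in the question above can be sketched as follows. This is only an illustration of the reasoning in Julien's message, not ManifoldCF code: the function names, the in-memory sets, and the `lookup_version` callback are hypothetical and do not correspond to any ManifoldCF API. Whether the framework actually behaves this way is exactly what the question asks.

```python
from typing import Callable, Iterable, Optional, Set


def deletions_with_full_listing(
    previously_crawled: Set[str], seeded_now: Iterable[str]
) -> Set[str]:
    """Assumed 'full listing' case (the JCIFS description in the question).

    Seeding lists every document that currently exists, so anything that was
    crawled before but is not listed now is treated as deleted.
    """
    return set(previously_crawled) - set(seeded_now)


def deletions_with_incremental_seeding(
    previously_crawled: Set[str],
    lookup_version: Callable[[str], Optional[str]],
) -> Set[str]:
    """Assumed 'incremental seeding' case (the JDBC description in the question).

    Seeding only reports new or modified documents, so absence from the seed
    list proves nothing. Each previously crawled document is re-checked;
    lookup_version (hypothetical) returns None when the document is gone.
    """
    return {
        doc_id for doc_id in previously_crawled if lookup_version(doc_id) is None
    }


# Full listing: b.txt was crawled earlier but is no longer listed.
deleted_full = deletions_with_full_listing(
    {"a.txt", "b.txt", "c.txt"}, ["a.txt", "c.txt", "d.txt"]
)  # {"b.txt"}

# Incremental seeding: row "2" no longer exists in the (simulated) table,
# so the version lookup returns None for it.
current_rows = {"1": "v2", "3": "v1", "4": "v1"}
deleted_incremental = deletions_with_incremental_seeding(
    {"1", "2", "3"}, current_rows.get
)  # {"2"}
```

In the first function the seed list alone is enough to decide deletions; in the second, every known document has to be visited again, which is the difference between the two connectors that the question points out.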