Hi, I am working on integrating ManifoldCF (MCF) with Alfresco CMS as the repository connector, using a CMIS query, and with Solr as the output channel where all the indexes are stored. This works fine, and I can search the documents in the Solr index.
Now, as part of the implementation, I am planning to introduce multiple repositories such as SharePoint, file systems, etc., so I will have three document repositories: Alfresco, SharePoint, and the file system. I am planning to have scheduled jobs that crawl each of these repositories at particular intervals, but I have the following concerns.

1. Although I am scheduling the jobs at frequent intervals, I want to make sure that MCF jobs pick up only content that has been newly added or updated. Say I have 100 docs during the current job run and 110 at the next job run: I want the job to process only the 10 new docs, not all 110.
2. As there are relatively few MCF tutorials available, I have no means of verifying that MCF jobs behave this way. I assume MCF is intelligent enough to do so, but I have no proof to substantiate it.
3. I want to know more about the MCF job schedule types: scan every document once / rescan documents dynamically. Similarly, I want to know more about the job invocation options: complete / minimal. I am sorry if these are newbie questions.
4. I am also considering some custom coding to ensure that only new or updated docs are eligible for processing, but I would have to work from the source code alone, as little documentation is available.
5. Is it wise to do custom coding in this case, or does MCF provide all these features OOTB?

I would appreciate any response.

Regards,
Lalit Jangra
