Hi Rafa,

You probably need to do a few things to get your connector working correctly. First, which connector model are you using? MODEL_ALL is the default, and it tells ManifoldCF that your seeding method supplies ALL matching documents, which is probably not what you want. You may want MODEL_ADD_CHANGE instead.

Second, make sure your connector properly handles the case where the previous seeding string is empty. The seeding string is reset to empty whenever someone changes a job's document specification. In that case, you should always seed as if from the beginning of time.

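To make those two points concrete, here is a minimal Java sketch. It is not ManifoldCF's API: `SeedingSketch`, `Txn`, `SeedResult`, `Model`, `seedFromTransactions` and `indexAfterRun` are hypothetical names. It assumes the seeding string holds the highest Alfresco transaction id seen so far, and it uses a deliberately simplified model of what the two connector models imply for the index.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these names are ManifoldCF APIs.
public class SeedingSketch {

    // One (hypothetical) Alfresco transaction entry touching a single document.
    public record Txn(long id, String docId, boolean deleted) {}

    // Documents to seed plus the seeding string to store for the next run.
    public record SeedResult(List<String> seeds, String newSeedVersion) {}

    // Simplified stand-ins for MODEL_ALL and MODEL_ADD_CHANGE.
    public enum Model { ALL, ADD_CHANGE }

    // Seeds every document touched by a transaction newer than lastSeedVersion.
    // Assumption: the seeding string is the highest transaction id already seen.
    // An empty (or null) seeding string means "start from the beginning of time".
    public static SeedResult seedFromTransactions(List<Txn> txns, String lastSeedVersion) {
        long lastTxn = (lastSeedVersion == null || lastSeedVersion.isEmpty())
                ? 0L
                : Long.parseLong(lastSeedVersion);
        Set<String> seeds = new LinkedHashSet<>(); // keeps order, drops duplicates
        long maxTxn = lastTxn;
        for (Txn t : txns) {
            if (t.id() > lastTxn) {
                // Deleted documents are seeded too, so processing can remove them.
                seeds.add(t.docId());
                maxTxn = Math.max(maxTxn, t.id());
            }
        }
        // With no new transactions, maxTxn stays equal to lastTxn.
        return new SeedResult(new ArrayList<>(seeds), Long.toString(maxTxn));
    }

    // Simplified model of the index after a run: under ALL, anything not seeded
    // is dropped; under ADD_CHANGE, previously indexed documents are kept.
    public static Set<String> indexAfterRun(Set<String> indexed, List<String> seeds,
                                            Set<String> deleted, Model model) {
        Set<String> result = new LinkedHashSet<>();
        if (model == Model.ADD_CHANGE) {
            result.addAll(indexed);
        }
        result.addAll(seeds);
        result.removeAll(deleted);
        return result;
    }

    public static void main(String[] args) {
        List<Txn> txns = List.of(new Txn(1, "a", false), new Txn(2, "b", false), new Txn(3, "b", true));
        SeedResult first = seedFromTransactions(txns, "");
        SeedResult second = seedFromTransactions(txns, first.newSeedVersion());
        System.out.println(first.seeds() + " " + first.newSeedVersion());   // [a, b] 3
        System.out.println(second.seeds() + " " + second.newSeedVersion()); // [] 3

        // Second run with no new transactions, starting from an index holding "a".
        Set<String> indexed = Set.of("a");
        System.out.println("ALL: " + indexAfterRun(indexed, second.seeds(), Set.of(), Model.ALL));               // ALL: []
        System.out.println("ADD_CHANGE: " + indexAfterRun(indexed, second.seeds(), Set.of(), Model.ADD_CHANGE)); // ADD_CHANGE: [a]
    }
}
```

In this toy run the second pass finds no new transactions and seeds nothing. Under the simplified ALL model that empty seed set empties the index, which matches the symptom described below, while ADD_CHANGE leaves the previously indexed document in place.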
I won't have a chance to review your code for a while because of other issues I'm currently looking at, but based on your description of the problem, you've probably chosen the wrong seeding model.

Thanks,
Karl

On Wed, Sep 17, 2014 at 10:41 AM, Rafa Haro <[email protected]> wrote:
> Hi folks,
>
> We have been working on an “unofficial” Alfresco connector that currently
> more or less works with Manifold 1.7. You can check the code here:
> https://github.com/rafaharo/alfresco-webscript-manifold-connector. The
> README.md file is out of date, so please ignore it. Basically, this
> connector uses a client that consumes a set of Alfresco webscripts for
> crawling content and metadata. Document seeding is based on Alfresco
> transactions: the connector keeps asking Alfresco for a fixed number of
> transactions until no new transactions are found. Among other things, the
> transaction info indicates whether a document has been deleted, so those
> documents are later marked for deletion while the documents are processed.
>
> In the first run, all available document identifiers are seeded. In
> subsequent runs, we intended to seed only the documents affected by new
> transactions (new documents, changes at any level, or deletions). But this
> is what is happening right now: for example, if there are no new
> transactions, no documents are seeded and the whole index is purged (all
> previously indexed documents are deleted).
>
> My question is: is this normal behavior? How can we avoid it? Is there
> any configuration option for the jobs? We have read about minimal and
> complete runs, but it is still not clear to us.
>
> Thanks a lot!
> Cheers,
> Rafa
