For any given job run, all documents that are added via addSeedDocuments()
should be processed.  There is no magic in the framework that somehow knows
that a document has been created vs. modified vs. deleted until
processDocuments() is called.  If your claim is that this contract is not
being honored, could you try changing your connector model to
MODEL_ADD_CHANGE, just temporarily, to see whether everything works
using that model?  If it does *not*, then clearly you've got some kind of
implementation problem at the addSeedDocuments() level, because most of
the Manifold connectors use that model.

If MODEL_ADD_CHANGE mostly works for you, then the next step is to figure
out why MODEL_ADD_CHANGE_DELETE is failing.
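A minimal sketch of the point above, in Java (the language ManifoldCF connectors are written in). Seeding only supplies identifiers. Whether a seeded identifier turns out to be new, modified, or deleted is decided only when it is processed, by comparing its current version string with the one recorded on the previous run. The class, the `Outcome` enum, and the two in-memory maps are hypothetical stand-ins, not ManifoldCF API. In a real connector the "deleted" branch is where `activities.deleteDocument()` would be called.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of what happens to one seeded document identifier.
 * Not the ManifoldCF API: the maps stand in for the repository and for the
 * version strings recorded on the previous run.
 */
class VersionCheckSketch {

    enum Outcome { INDEX, UNCHANGED, DELETE }

    // Version strings remembered from earlier processing (framework-side state).
    private final Map<String, String> indexedVersions = new HashMap<>();

    // The source system: document id -> current version string (e.g. a revision id).
    private final Map<String, String> repository;

    VersionCheckSketch(Map<String, String> repository) {
        this.repository = repository;
    }

    /** Only at this point is "added vs. changed vs. deleted" actually known. */
    Outcome process(String docId) {
        String current = repository.get(docId);
        if (current == null) {
            // Gone from the source: a real connector would call activities.deleteDocument() here.
            indexedVersions.remove(docId);
            return Outcome.DELETE;
        }
        if (current.equals(indexedVersions.get(docId))) {
            return Outcome.UNCHANGED; // same version string as last time: nothing to do
        }
        indexedVersions.put(docId, current); // new or modified: (re)index and remember version
        return Outcome.INDEX;
    }

    public static void main(String[] args) {
        Map<String, String> repo = new HashMap<>();
        repo.put("A", "v1");
        VersionCheckSketch sketch = new VersionCheckSketch(repo);
        System.out.println(sketch.process("A")); // INDEX (new document)
        System.out.println(sketch.process("A")); // UNCHANGED
        repo.put("A", "v2");
        System.out.println(sketch.process("A")); // INDEX (modified document)
        repo.remove("A");
        System.out.println(sketch.process("A")); // DELETE
    }
}
```

Every outcome requires the identifier to actually be processed. A seeded identifier that is never processed can be neither updated nor deleted, which is why "seeded but never processed" points to a broken contract.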

Karl


On Fri, May 24, 2019 at 5:06 PM Raman Gupta <rocketra...@gmail.com> wrote:

> On Fri, May 24, 2019 at 4:41 PM Karl Wright <daddy...@gmail.com> wrote:
> >
> > For ADD_CHANGE_DELETE, the contract for addSeedDocuments() basically says
> > that you have to include *at least* the documents that were changed,
> > added, or deleted since the previous stamp, and if no stamp is provided,
> > it should return ALL specified documents.  Are you doing that?
>
> Yes, the delta API gives us all the changed, added, and deleted
> documents, and those are exactly the ones that we are including.
>
> > If you are, the next thing to look at is the computation of the version
> > string.  The version string is what is used to figure out if a change
> > took place.  You need this IN ADDITION TO the addSeedDocuments() doing
> > the right thing.  For deleted documents, obviously the processDocuments()
> > should call the activities.deleteDocument() method.
>
> The version String is calculated by `processDocuments`. However, after
> `addSeedDocuments` is called once for document A version 1,
> `processDocuments` is never called again for that document, even
> though it has since been modified to document A version 2. Therefore,
> our connector never gets a chance to return the "version 2" string.
>
> > Does this sound like what your code is doing?
>
> Yes, as far as we can, given that `processDocuments` is only called
> once for any particular document identifier.
>
> > Karl
> >
> >
> > On Fri, May 24, 2019 at 4:25 PM Raman Gupta <rocketra...@gmail.com> wrote:
> >
> > > My team is creating a new repository connector. The source system has
> > > a delta API that lets us know of all new, modified, and deleted
> > > individual folders and documents since the last call to the API. Each
> > > call to the delta API provides the changes, as well as a token which
> > > can be provided on subsequent calls to get changes since that token
> > > was generated/returned.
> > >
> > > What is the best approach to building a repo connector to a system
> > > that has this type of delta API?
> > >
> > > Our first design was an implementation that specifies
> > > `MODEL_ADD_CHANGE_DELETE` and then:
> > >
> > > * In addSeedDocuments, on the initial call we seed every document in
> > > the source system. On subsequent calls, we use the delta API to seed
> > > every added, modified, or deleted file. We return the delta API token
> > > as the version value of addSeedDocuments, so that it can be used on
> > > subsequent calls.
> > >
> > > * In processDocuments, we do the usual thing for each document
> > > identifier.
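The seeding step of the first design can be sketched as follows, combined with the addSeedDocuments() contract Karl describes: with no stamp, seed everything; otherwise seed at least everything added, modified, or deleted since the stamp. `DeltaApi` and `DeltaResult` are hypothetical. The assumption that a null token makes the delta API return every document is also hypothetical. The `Consumer<String>` stands in for the seeding activity, and the returned string plays the role of the seed version handed back on the next call. This is not the real addSeedDocuments() signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical response from the source system's delta API. */
final class DeltaResult {
    final List<String> changedIds; // ids added, modified, or deleted since the token
    final String nextToken;        // token to pass on the following call

    DeltaResult(List<String> changedIds, String nextToken) {
        this.changedIds = changedIds;
        this.nextToken = nextToken;
    }
}

/** Hypothetical delta API client. Assumption: a null token returns every document. */
interface DeltaApi {
    DeltaResult delta(String token);
}

/** Sketch of a delta-driven seeding step; not the real addSeedDocuments() signature. */
class DeltaSeeder {
    private final DeltaApi api;

    DeltaSeeder(DeltaApi api) {
        this.api = api;
    }

    /**
     * No previous stamp: seed ALL documents. Otherwise seed at least every
     * document changed since the stamp, deleted ones included, so that
     * processing gets a chance to report the deletion. Returns the new stamp.
     */
    String seed(String lastSeedVersion, Consumer<String> addSeedDocument) {
        DeltaResult result = api.delta(lastSeedVersion);
        for (String id : result.changedIds) {
            addSeedDocument.accept(id);
        }
        return result.nextToken;
    }

    public static void main(String[] args) {
        // Fake source: the first call lists A and B; later calls report only A as changed.
        DeltaApi fake = token -> token == null
                ? new DeltaResult(Arrays.asList("A", "B"), "t1")
                : new DeltaResult(Arrays.asList("A"), "t2");
        DeltaSeeder seeder = new DeltaSeeder(fake);
        List<String> seeded = new ArrayList<>();
        String stamp = seeder.seed(null, seeded::add);
        System.out.println(seeded + " " + stamp); // [A, B] t1
        seeded.clear();
        stamp = seeder.seed(stamp, seeded::add);
        System.out.println(seeded + " " + stamp); // [A] t2
    }
}
```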
> > >
> > > On prototyping, this works for new docs, but "processDocuments" is
> > > never triggered for modified and deleted docs.
> > >
> > > A second design we are considering is to use
> > > MODEL_CHAINED_ADD_CHANGE_DELETE and have addSeedDocuments return only
> > > one "virtual" document, which represents the root of the remote repo.
> > >
> > > Then, in "processDocuments", the new "document" is used to determine
> > > all the child documents of that delta call, which are then added to
> > > the queue via `activities.addDocumentReference`. To force the "virtual
> > > seed" to trigger processDocuments again on the next call to
> > > `addSeedDocuments`, we do `activities.deleteDocument(virtualDocId)` as
> > > well.
> > >
> > > With this alternative design, the stage 1 seed effectively becomes a
> > > no-op, and is just used as a mechanism to trigger stage 2.
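Here is a rough sketch of stage 2 in the second design. Several pieces are hypothetical stand-ins: `Activities`, `ChangeSet`, `VIRTUAL_ROOT`, and `indexDocument()`. Only `addDocumentReference` and `deleteDocument` are names mentioned in the thread, and their real ManifoldCF signatures differ. The delta token is kept in a field here, but a real connector would need to persist it across job runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical delta API response: changed ids plus the token for the next call. */
final class ChangeSet {
    final List<String> changedIds;
    final String nextToken;

    ChangeSet(List<String> changedIds, String nextToken) {
        this.changedIds = changedIds;
        this.nextToken = nextToken;
    }
}

/** Simplified stand-in for the activities object; real ManifoldCF signatures differ. */
interface Activities {
    void addDocumentReference(String docId);
    void deleteDocument(String docId);
    void indexDocument(String docId); // hypothetical: "process a real document"
}

/** Sketch of stage 2 in the chained design with a single virtual seed. */
class VirtualRootProcessor {
    static final String VIRTUAL_ROOT = "__delta_root__"; // hypothetical virtual seed id

    private final Function<String, ChangeSet> deltaApi; // token (null = first sync) -> changes
    private String token; // assumption: a real connector must persist this between runs

    VirtualRootProcessor(Function<String, ChangeSet> deltaApi) {
        this.deltaApi = deltaApi;
    }

    String currentToken() {
        return token;
    }

    void processDocuments(List<String> docIds, Activities activities) {
        for (String id : docIds) {
            if (!VIRTUAL_ROOT.equals(id)) {
                activities.indexDocument(id); // an ordinary document queued earlier
                continue;
            }
            ChangeSet changes = deltaApi.apply(token);
            token = changes.nextToken;
            for (String child : changes.changedIds) {
                activities.addDocumentReference(child); // queue each changed document
            }
            // As proposed in the thread: delete the virtual seed so that the next
            // seeding pass re-adds it and triggers this branch again.
            activities.deleteDocument(VIRTUAL_ROOT);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Activities recorder = new Activities() {
            public void addDocumentReference(String docId) { log.add("ref:" + docId); }
            public void deleteDocument(String docId) { log.add("del:" + docId); }
            public void indexDocument(String docId) { log.add("idx:" + docId); }
        };
        VirtualRootProcessor p = new VirtualRootProcessor(
                t -> new ChangeSet(Arrays.asList("A", "B"), "t1"));
        p.processDocuments(Arrays.asList(VIRTUAL_ROOT, "A"), recorder);
        System.out.println(log); // [ref:A, ref:B, del:__delta_root__, idx:A]
    }
}
```

The sketch only illustrates the proposed control flow. It does not establish that deleting the virtual seed is the right way to drive the framework, which is the question Raman puts to the list.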
> > >
> > > Thoughts?
> > >
> > > Regards,
> > > Raman Gupta
> > >
>
