On Fri, May 24, 2019 at 4:41 PM Karl Wright <daddy...@gmail.com> wrote:
>
> For ADD_CHANGE_DELETE, the contract for addSeedDocuments() basically says
> that you have to include *at least* the documents that were changed, added,
> or deleted since the previous stamp, and if no stamp is provided, it should
> return ALL specified documents.  Are you doing that?

Yes, the delta API gives us all the changed, added, and deleted
documents, and those are exactly the ones that we are including.
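
For concreteness, here is a minimal sketch of that seeding logic. It is
not our actual connector code: `DeltaSource`, `Delta`, and `seed` are
hypothetical stand-ins, and the `Consumer<String>` takes the place of the
seeding activity that the framework passes to `addSeedDocuments`.

```java
import java.util.List;
import java.util.function.Consumer;

public class DeltaSeedingSketch {

    /** Hypothetical result of one call: the ids to seed plus the token for the next call. */
    static final class Delta {
        final List<String> changedIds;
        final String nextToken;

        Delta(List<String> changedIds, String nextToken) {
            this.changedIds = changedIds;
            this.nextToken = nextToken;
        }
    }

    /** Hypothetical client for the source system; not a ManifoldCF type. */
    interface DeltaSource {
        Delta allDocuments();              // full listing plus a fresh token
        Delta changesSince(String token);  // added, modified, and deleted ids since the token
    }

    /**
     * Mirrors the shape of addSeedDocuments: with no previous token, seed
     * every document; otherwise seed only what the delta API reports.
     * Returns the new token so it can be supplied on the next call.
     */
    static String seed(DeltaSource source, String lastToken, Consumer<String> seeder) {
        Delta delta = (lastToken == null)
                ? source.allDocuments()
                : source.changesSince(lastToken);
        for (String id : delta.changedIds) {
            seeder.accept(id);
        }
        return delta.nextToken;
    }
}
```

The returned token plays the role of the seed version string, so each
call only has to cover changes since the previous one.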

> If you are, the next thing to look at is the computation of the version
> string.  The version string is what is used to figure out if a change took
> place.  You need this IN ADDITION TO the addSeedDocuments() doing the right
> thing.  For deleted documents, obviously the processDocuments() should call
> the activities.deleteDocument() method.

The version string is calculated by `processDocuments`. However, after
`addSeedDocuments` has seeded document A at version 1 and
`processDocuments` has run for it once, `processDocuments` is never
called again for that document, even though it has since been modified
to version 2 and re-seeded. Therefore, our connector never gets a
chance to return the "version 2" string.
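
To make the expectation explicit, this is roughly the per-document
decision we would make if `processDocuments` were called again. It is a
sketch only: `VersionCheckSketch`, `Action`, and the in-memory map are
hypothetical, and the map stands in for whatever version bookkeeping the
framework does on our behalf.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionCheckSketch {

    enum Action { INGEST, SKIP, DELETE }

    /** Hypothetical stand-in for the last indexed version string per document id. */
    private final Map<String, String> indexedVersions = new HashMap<>();

    /**
     * Decide what to do with one document.
     *
     * @param id             document identifier
     * @param currentVersion version string computed from the source system,
     *                       or null if the document no longer exists there
     */
    Action process(String id, String currentVersion) {
        if (currentVersion == null) {
            // Gone from the source system: this is where deleteDocument() would be called.
            indexedVersions.remove(id);
            return Action.DELETE;
        }
        if (currentVersion.equals(indexedVersions.get(id))) {
            // Same version string as last time: nothing to re-index.
            return Action.SKIP;
        }
        // New document or changed version string: record it and re-index.
        indexedVersions.put(id, currentVersion);
        return Action.INGEST;
    }
}
```

With this logic, document A going from "v1" to "v2" would yield INGEST
twice, but only if the second call actually happens.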

> Does this sound like what your code is doing?

Yes, as far as we can go given the fact that `processDocuments` is
only called once for any particular document identifier.

> Karl
>
>
> On Fri, May 24, 2019 at 4:25 PM Raman Gupta <rocketra...@gmail.com> wrote:
>
> > My team is creating a new repository connector. The source system has
> > a delta API that lets us know of all new, modified, and deleted
> > individual folders and documents since the last call to the API. Each
> > call to the delta API provides the changes, as well as a token which
> > can be provided on subsequent calls to get changes since that token
> > was generated/returned.
> >
> > What is the best approach to building a repo connector to a system
> > that has this type of delta API?
> >
> > Our first design was an implementation that specifies
> > `MODEL_ADD_CHANGE_DELETE` and then:
> >
> > * In addSeedDocuments, on the initial call we seed every document in
> > the source system. On subsequent calls, we use the delta API to seed
> > every added, modified, or deleted file. We return the delta API token
> > as the version value of addSeedDocuments, so that it can be used on
> > subsequent calls.
> >
> > * In processDocuments, we do the usual thing for each document identifier.
> >
> > On prototyping, this works for new docs, but "processDocuments" is
> > never triggered for modified and deleted docs.
> >
> > A second design we are considering is to use
> > MODEL_CHAINED_ADD_CHANGE_DELETE and have addSeedDocuments return only
> > one "virtual" document, which represents the root of the remote repo.
> >
> > Then, in "processDocuments" the new "document" is used to determine
> > all the child documents of that delta call, which are then added to
> > the queue via `activities.addDocumentReference`. To force the "virtual
> > seed" to trigger processDocuments again on the next call to
> > `addSeedDocuments`, we do `activities.deleteDocument(virtualDocId)` as
> > well.
> >
> > With this alternative design, the stage 1 seed effectively becomes a
> > no-op, and is just used as a mechanism to trigger stage 2.
> >
> > Thoughts?
> >
> > Regards,
> > Raman Gupta
> >
