My application ingests documents that need to be broken up into
subdocuments.  We want the process to be atomic, so our initial
approach was to run it within a single CPF pipeline.

While this works fine for small documents, we have encountered larger
documents that time out because processing takes longer than the time
limit set for the task server.  Increasing the time limit works, but
it does not seem to be an optimal solution, since one example document
took over 1.5 hours to process into 60 subdocuments.  In addition, the
parent documents are sent to us by an external provider, and our
interface allows them to send an unlimited number of elements for
processing into subdocuments.  They will not change their data, and
there is no guarantee that any chosen time limit would be long enough
for processing to complete.

One solution could be to process each subdocument in a separate
transaction but write them all to a temporary collection.  If every
subdocument is processed successfully, they could be moved to the
destination collection in a single transaction; if any fails, all of
them would be deleted and an error logged.  A rough sketch of that
finalize step follows.
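
Something along these lines is what I have in mind for the final step.
This is untested, and the collection names, URIs, and expected count
are made up for illustration.  It assumes each subdocument is
processed by its own task (e.g. spawned with xdmp:spawn, so each runs
in its own transaction), that each task inserts its output into the
temporary collection, and that this finalize step only runs once all
tasks have completed or failed:

xquery version "1.0-ml";

(: Hypothetical finalize step.  Assumes every per-subdocument task has
   already inserted its result into $temp-collection.  The expected
   count is hard-coded here; in practice it would come from the parent
   document.  All names are illustrative. :)

declare variable $temp-collection := "/tmp/split/job-123";
declare variable $dest-collection := "/published";
declare variable $expected-count := 60;

let $uris :=
  for $doc in fn:collection($temp-collection)
  return xdmp:node-uri($doc)
return
  if (fn:count($uris) eq $expected-count) then
    (: success: promote all subdocuments in this one transaction :)
    for $uri in $uris
    return xdmp:document-set-collections($uri, $dest-collection)
  else
    (: failure: discard the partial results and log an error :)
    (
      for $uri in $uris return xdmp:document-delete($uri),
      xdmp:log(fn:concat("Split failed; deleted ", fn:count($uris),
                         " partial subdocuments"), "error")
    )

Since this module would run as a single update transaction, the
promote-or-delete decision should at least be atomic even though the
per-subdocument processing is not.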

Is this a reasonable approach to avoiding a single long-running
transaction?  Can you recommend alternatives?  Thanks.

Bob