My application ingests documents that need to be broken up into subdocuments. We want the process to be atomic, so our initial approach was to run it within a single CPF pipeline.
While this works fine for small documents, we have encountered larger documents that time out because processing takes longer than the time limit set for the task server. Increasing the time limit works, but it does not seem to be an optimal solution: one example document took over 1.5 hours to process into 60 subdocuments. In addition, the parent documents are sent to us by an external provider, and our interface allows them to send an unlimited number of elements for processing into subdocuments. They will not change their data, and there is no guarantee that any chosen time limit would be long enough for processing to complete.

One solution could be to process each subdocument in a separate transaction, but write them to a temporary collection. If all subdocuments are processed successfully, they could be moved to the destination collection in a single transaction. If any failed processing, all of them would be deleted and an error logged.

Is this a reasonable approach to avoiding a single long-running transaction? Can you recommend alternatives?

Thanks,
Bob
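To make the temp-collection idea concrete, here is a rough XQuery sketch of the finalize step, run in a single short transaction after all per-subdocument tasks have completed. The collection names, the failure-marker convention ($temp error documents in an "errors" collection), and the URI scheme are all assumptions for illustration, not code from our system:

```xquery
xquery version "1.0-ml";

(: Hypothetical collection names :)
let $temp := "temp-subdocs"
let $dest := "subdocs"
return
  (: Assumes failed tasks insert a marker document into "errors"
     inside the temp collection :)
  if (exists(fn:collection($temp)[xdmp:document-get-collections(xdmp:node-uri(.)) = "errors"]))
  then (
    (: At least one subdocument failed: discard everything and log :)
    xdmp:collection-delete($temp),
    xdmp:log("Subdocument processing failed; temp collection discarded")
  )
  else
    (: All subdocuments succeeded: move them to the destination
       collection atomically within this transaction :)
    for $doc in fn:collection($temp)
    let $uri := xdmp:node-uri($doc)
    return (
      xdmp:document-add-collections($uri, $dest),
      xdmp:document-remove-collections($uri, $temp)
    )
```

Since adding and removing collections on each document happens inside one transaction, readers would see either the complete set of subdocuments in the destination collection or none of them, which preserves the atomicity we wanted from the single-pipeline approach.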
_______________________________________________
General mailing list
General@developer.marklogic.com
http://xqzone.com/mailman/listinfo/general