When I load directories that each hold slightly fewer than 100,000 XML files
into a large MarkLogic instance, I often get timeout and transaction errors.
If I re-run a directory that produced those errors, I typically don't get
any errors.
So, I have a few questions:
* Can I prevent the errors from happening in the first place, e.g. by
tuning MarkLogic parameters or altering my use of mlcp?
* If I do get errors, what is the best way to get a report on the files
which failed, so I can retry just those ones? Is the best option for me to
write some code to pick out the errors from the log file? And, if so, am I
guaranteed to get all of the files reported?
Some Details
The command line template is
mlcp.sh import -username {1} -password {2} -host localhost -port {4}
-input_file_path {5} -output_uri_replace \"{6},'{7}'\"
Sometimes, the imports run just fine. However, often I get a large number
of SVC-EXTIME errors followed by an XDMP-NOTXN error. For example:
16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit exceeded
16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document 029ccd8ac3323658277ca28fead7a73d.0.xml in file:/mnt/ingestion/MarkLogicIngestion/smyles/todo/2014_0005.done/029ccd8ac3323658277ca28fead7a73d.0.xml
16/09/22 17:54:03 ERROR mapreduce.ContentWriter: SVC-EXTIME: Time limit exceeded
16/09/22 17:54:03 WARN mapreduce.ContentWriter: Failed document 02eb4562784255e249c4ec3ed472f9aa.1.xml in file:/mnt/ingestion/MarkLogicIngestion/smyles/todo/2014_0005.done/02eb4562784255e249c4ec3ed472f9aa.1.xml
16/09/22 17:54:04 INFO contentpump.LocalJobRunner: completed 33%
16/09/22 17:54:21 ERROR mapreduce.ContentWriter: XDMP-NOTXN: No transaction with identifier 9076269665213828952
So far, I'm just rerunning the entire directory again. Most of the time, it
ingests fine on the second attempt. However, I have thousands of these
directories to process. So, I would prefer to avoid getting the errors in
the first place. Failing that, I would like to capture the errors and just
retry the files which failed.
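
A minimal sketch of the log-scraping option, assuming the only signal for a
failed file is a "Failed document ... in file:..." WARN line like the ones
quoted above. The function name failed_paths and the regular expression are
illustrative, not part of mlcp, and the pattern is inferred from this one log
excerpt only. It tolerates log entries that are wrapped across lines.

```python
import re

# Assumed pattern, inferred solely from the WARN lines in the log excerpt;
# \s+ also matches newlines, so wrapped log entries are still recognised.
FAILED_RE = re.compile(r"Failed document\s+(\S+)\s+in\s+(file:\S+)")


def failed_paths(log_text):
    """Return the unique source paths reported as failed, in log order."""
    seen = set()
    paths = []
    for _name, uri in FAILED_RE.findall(log_text):
        path = uri[len("file:"):]  # strip the URI scheme to get a plain path
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths
```

Calling failed_paths on the captured mlcp output would give a list of paths
that could be copied or linked into a retry directory. It would only find
documents that get a "Failed document" line, so it does not settle whether
documents caught up in the XDMP-NOTXN error are reported individually.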
Any help much appreciated.
Regards,
Stuart