MLCP comes with source. Should be a small edit to catch the exception and keep going. Maybe enabled with a flag.
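Until such a patch exists, the wrapper-script workaround Geert describes in the quoted message below could look roughly like this. It is only a sketch under several assumptions that are not from the thread: the launcher is called `mlcp.sh`, it exits with a non-zero status when it aborts on a corrupt archive, and the zip paths contain no newlines. The names `ingest_zips`, `MLCP_CMD`, and `FAILED_LOG` are hypothetical.

```shell
#!/bin/sh
# Sketch: one mlcp run per zip file, so a corrupt archive only fails its own run.
# Assumptions: mlcp.sh is the launcher name, it exits non-zero when it aborts,
# and zip paths contain no newlines.

MLCP_CMD="${MLCP_CMD:-mlcp.sh}"               # hypothetical default launcher
FAILED_LOG="${FAILED_LOG:-failed-zips.log}"   # hypothetical log of failed zips

# Usage: ingest_zips ROOT_DIR [mlcp options shared by every run...]
ingest_zips() {
  root="$1"
  shift
  : > "$FAILED_LOG"   # start with an empty failure log
  find "$root" -type f -name '*.zip' | while IFS= read -r zip; do
    # </dev/null keeps mlcp from consuming the list of paths on stdin.
    if ! "$MLCP_CMD" import -input_file_path "$zip" \
         -input_compressed true "$@" </dev/null; then
      echo "$zip" >> "$FAILED_LOG"   # record the zip and move on
    fi
  done
  # Succeed only if no zip was recorded as failed.
  [ ! -s "$FAILED_LOG" ]
}
```

It would be called as `ingest_zips "$inputFilePath" -mode local -database "$dbName"` followed by the remaining options from the original command. Each zip pays the JVM start-up cost, so this is likely slower than a single run, but the failed archives end up listed in the log for later inspection.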
Sent from my iPhone

> On Jul 22, 2015, at 21:19, Geert Josten <geert.jos...@marklogic.com> wrote:
>
> Hi Kristina,
>
> I would have expected MLCP to skip corrupt files without crashing, but
> apparently not. Not perfect, but a way around could be to wrap MLCP in
> another script that loops over the zip files itself, and makes a new MLCP
> call for each zip. More difficult to do parallelization (i.e. likely slower),
> but at least it allows you to finish processing completely.
>
> Can you send me a small example of such a corrupt zip file off-list? I could
> use that to file a bug against MLCP internally.
>
> Cheers,
> Geert
>
> From: <general-boun...@developer.marklogic.com> on behalf of "Morales-Martin,
> Kristina" <kmorales-mar...@cas.org>
> Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Date: Tuesday, July 21, 2015 at 6:58 PM
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Subject: [MarkLogic Dev General] mlcp ability to skip corrupt zip files?
>
>
> Dear all,
>
> We are using the MarkLogic Content Pump to push content from many directories
> that have zip files that in turn contain .xml files.
> From the last communication with Geert, we are also using the transform option
> in order to ingest only xml content. This suggested filtering approach
> using a transform works.
>
> Unfortunately, when mlcp encounters a corrupt zip file (which we possibly can
> get from our sources),
> the process terminates. Is there an option to instruct mlcp to keep going,
> that is, to skip the corrupt zip file, and continue processing the large and
> deeply nested directories for the rest of the zip files? It looks like the
> -tolerate_errors option won’t work given that we need to use a transform to
> ingest only xml files,
> and that forces the batch size to 1.
>
> Please advise.
>
> We are using the following options:
> -input_file_path $inputFilePath \
> -mode local -input_compressed true \
> -output_uri_replace "(\/.+\/+)(?=.+\.zip),'/ourOverrideOfTheURIToRemoveTheLeadingNASPath/'" \
> -output_collections "$collections" \
> -database $dbName -output_permissions …
> -transform_module /ourNamespace/ourTransformModule.xqy \
> -transform_namespace "http://cas.org/..." \
> -xml_repair_level full \
>
> Thank you,
> Kristina Morales-Martin
> Sr. Technical Information Specialist, e-Content Operations
> CAS, a division of the American Chemical Society
> 2540 Olentangy River Road
> Columbus, OH 43202
> Phone: 614-447-3600, ext. 2322
> Fax: 614-447-3827
> www.cas.org
>
> Confidentiality Notice: This electronic message transmission, including any
> attachment(s), may contain confidential, proprietary, or privileged
> information from Chemical Abstracts Service (“CAS”), a division of the
> American Chemical Society (“ACS”). If you have received this transmission in
> error, be advised that any disclosure, copying, distribution, or use of the
> contents of this information is strictly prohibited. Please destroy all
> copies of the message and contact the sender immediately by either replying
> to this message or calling 614-447-3600.
>
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general