MLCP ships with its source, so it should be a small edit to catch the 
exception and keep going, perhaps enabled with a flag. 
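
To illustrate the catch-and-continue idea, here is a minimal sketch in Python. It is not MLCP code (MLCP itself is Java), and the function name find_readable_zips and the skip_corrupt flag are made up for this example. It walks a directory tree, tries to read each zip, and either skips a corrupt one or stops at it, depending on the flag.

```python
import zipfile
import zlib
from pathlib import Path


def find_readable_zips(root, skip_corrupt=True):
    """Split the .zip files under root into readable and corrupt ones.

    Hypothetical helper, not part of MLCP. With skip_corrupt=False the
    first corrupt archive re-raises its error and stops the walk.
    """
    readable, corrupt = [], []
    for path in sorted(Path(root).rglob("*.zip")):
        try:
            with zipfile.ZipFile(path) as archive:
                # testzip() reads every member and returns the name of the
                # first one that fails its check, or None if all are fine.
                bad_member = archive.testzip()
            if bad_member is not None:
                raise zipfile.BadZipFile(f"bad member {bad_member!r}")
        except (zipfile.BadZipFile, zlib.error, EOFError, OSError) as exc:
            if not skip_corrupt:
                raise
            # Record the failure and keep going with the next archive.
            corrupt.append((path, str(exc)))
            continue
        readable.append(path)
    return readable, corrupt
```

A pre-check like this could also run ahead of the import, so that only the archives in the readable list are handed to MLCP and the corrupt ones are logged for follow-up.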


> On Jul 22, 2015, at 21:19, Geert Josten <geert.jos...@marklogic.com> wrote:
> 
> Hi Kristina,
> 
> I would have expected MLCP to skip corrupt files without crashing, but 
> apparently it does not. It is not perfect, but a workaround could be to wrap 
> MLCP in another script that loops over the zip files itself and makes a new 
> MLCP call for each zip. That makes parallelization more difficult (i.e. it is 
> likely slower), but at least it allows you to finish processing completely.
> 
> Can you send me a small example of such a corrupt zip file off-list? I could 
> use that to file a bug against MLCP internally.
> 
> Cheers,
> Geert
> 
> From: <general-boun...@developer.marklogic.com> on behalf of "Morales-Martin, 
> Kristina" <kmorales-mar...@cas.org>
> Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Date: Tuesday, July 21, 2015 at 6:58 PM
> To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Subject: [MarkLogic Dev General] mlcp ability to skip corrupt zip files?
> 
>  
> Dear all,
>  
> We are using the MarkLogic Content Pump to push content from many directories 
> that have zip files that in turn contain .xml files.
> From the last communication with Geert, we are also using the transform option 
> in order to ingest only xml content.  This suggested filtering approach
> using a transform works. 
>  
> Unfortunately, when mlcp encounters a corrupt zip file (which we may 
> receive from our sources), the process terminates.  Is there an option to 
> instruct mlcp to keep going, that is, to skip the corrupt zip file and 
> continue processing the rest of the zip files in the large, deeply nested 
> directories?  It looks like the -tolerate_errors option won’t work, given 
> that we need to use a transform to ingest only xml files, and that forces 
> the batch size to 1.
>  
> Please advise.
>  
> We are using the following options:
> -input_file_path $inputFilePath \
> -mode local -input_compressed true \
> -output_uri_replace "(\/.+\/+)(?=.+\.zip),'/ourOverrideOfTheURIToRemoveTheLeadingNASPath/'" \
> -output_collections "$collections" \
> -database $dbName -output_permissions …
> -transform_module /ourNamespace/ourTransformModule.xqy  \
> -transform_namespace "http://cas.org/..." \
> -xml_repair_level full \
>  
> Thank you,
> Kristina Morales-Martin
> Sr. Technical Information Specialist, e-Content Operations
> CAS, a division of the American Chemical Society
> 2540 Olentangy River Road
> Columbus, OH 43202
> Phone: 614-447-3600, ext. 2322
> Fax: 614-447-3827
> www.cas.org
>  
> 
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> Manage your subscription at: 
> http://developer.marklogic.com/mailman/listinfo/general