Hi Carmeen,
1. Local mode is unrelated to the input_file_path option. The -mode option only determines whether MLCP runs the job on the host itself (local) or leverages a Hadoop cluster for processing (distributed). To my knowledge you can read any folder that is mounted on the host where you are running MLCP. If you can mount your AWS fileshare, then I'd say yes, you can process it.

2. Using a transform you can interpret, and influence, the document URI, but unfortunately you cannot influence output_collections dynamically. Instead, create a batch/shell script that loops over the folders, runs MLCP for each c/d folder separately, and passes those folder names as the output_collections parameter.

3. I am not aware of an option to exclude a subdirectory, but maybe the input_file_pattern option could be of some help? If not, do the same as in the previous answer and process the subdirectories individually, so you have better control over what is included and what is not.

4. I'm not sure I understand exactly what you mean, but input_file_pattern takes a regular expression, so you should be able to define alternatives. You could also use a transform to decide, and suppress files you don't want to import by returning an empty sequence in those cases.

Kind regards,
Geert

From: <[email protected]> on behalf of "Sindiya, Carmeen (LNG-CON)" <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Wednesday, December 23, 2015 at 10:01 AM
To: "[email protected]" <[email protected]>
Subject: [MarkLogic Dev General] MLCP queries

Hi,

I am very new to MarkLogic and MLCP, and I am learning to load documents with help from the MarkLogic community. Below are some questions I need to sort out; the answers will help me understand MLCP better.

1. I am running the MLCP command with mode set to 'local'. Is it possible to use an AWS fileshare as the input_file_path?
2. Is it possible to derive the collection names from the URI of the document? For example, for the URI //a/b/c/d/e.xml, I need the collection values c and d.
3. While loading documents for a given input file path, MLCP loads all documents under the folder, including its subdirectories, by default. How can I exclude the files in the subdirectories?
4. Is there an option to pass multiple values for filtering file formats?

Thanks,
Carmeen
________________________________
LexisNexis is a trading name of REED ELSEVIER (UK) LIMITED - Registered office - 1-3 STRAND, LONDON WC2N 5JR Registered in England - Company No. 02746621
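For question 2, the rule itself ("the two folder names directly above the file become collections") is easy to express. Since a transform can't set output_collections, this would run on the client side, for example to compute the value passed to -output_collections. A minimal sketch; the function name collections_from_uri and the depth parameter are hypothetical:

```python
def collections_from_uri(uri, depth=2):
    """Return the `depth` folder names directly above the file in `uri`.

    Hypothetical helper: '//a/b/c/d/e.xml' -> ['c', 'd'].
    """
    if depth <= 0:
        return []
    # Split on "/" and drop the empty segments produced by leading or doubled slashes.
    segments = [s for s in uri.split("/") if s]
    # The last segment is the file name; everything before it is a folder.
    folders = segments[:-1]
    return folders[-depth:]


if __name__ == "__main__":
    print(collections_from_uri("//a/b/c/d/e.xml"))
```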
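The suggested per-folder loop (answers 2 and 3) could look roughly like the Python sketch below instead of a batch/shell script. It assumes a Unix-like host where the launcher is mlcp.sh (mlcp.bat on Windows). The connection values (localhost, 8000, admin/admin) and the function names are placeholders, not recommendations. Only leaf folders are used, because MLCP also loads subdirectories of -input_file_path. As written, it just prints the commands (a dry run).

```python
import os
import shlex
import sys


def collections_for(folder, root):
    """Last two path components of `folder` relative to `root` (a/b/c/d -> ['c', 'd'])."""
    rel = os.path.relpath(folder, root)
    parts = [p for p in rel.split(os.sep) if p not in ("", ".")]
    return parts[-2:]


def build_mlcp_commands(root, host="localhost", port=8000,
                        username="admin", password="admin"):
    """Build one hypothetical 'mlcp.sh import' command per leaf folder under `root`."""
    commands = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk folders in a stable, alphabetical order
        # MLCP recurses into subdirectories of -input_file_path, so only leaf
        # folders are used here; files sitting in non-leaf folders are skipped.
        if dirnames or not filenames:
            continue
        commands.append([
            "mlcp.sh", "import",
            "-host", host, "-port", str(port),
            "-username", username, "-password", password,
            "-mode", "local",
            "-input_file_path", dirpath,
            "-output_collections", ",".join(collections_for(dirpath, root)),
        ])
    return commands


if __name__ == "__main__" and len(sys.argv) > 1:
    # Dry run: print one command per leaf folder under the given root.
    for command in build_mlcp_commands(sys.argv[1]):
        print(shlex.join(command))
```

Once the printed commands look right, each one can be executed with subprocess.run(command, check=True).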
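For question 4, a single regular expression with alternation can accept several file formats. The sketch below only checks a candidate pattern locally before it is passed to -input_file_pattern. It assumes the pattern has to match the whole file name, not the full path, so verify that against the MLCP documentation for your version. MLCP uses Java regular expressions, which behave the same as Python's for a simple pattern like this one.

```python
import re

# Hypothetical value for -input_file_pattern: accept .xml or .json files only.
PATTERN = r".*\.(xml|json)"


def would_import(file_name, pattern=PATTERN):
    """True if the whole file name matches the pattern (assumed matching rule)."""
    return re.fullmatch(pattern, file_name) is not None


if __name__ == "__main__":
    for name in ["e.xml", "data.json", "notes.txt", "e.xml.bak"]:
        print(name, would_import(name))
```

On the command line, quote the pattern so the shell leaves it alone, e.g. -input_file_pattern '.*\.(xml|json)'.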
