Hi Carmeen,

  1.  The mode option is not related to the input_file_path option; it 
controls whether MLCP leverages a Hadoop cluster for processing (distributed) 
or runs on the local host (local). To my knowledge, you can read any folder 
that is mounted on the host on which you are running MLCP. If you can mount 
your AWS fileshare, then I’d say yes, you can process it.
  2.  Using a transform you can interpret and influence the document URI, but 
unfortunately not set output_collections dynamically. Instead, create a 
batch/shell script that loops over the folders, runs MLCP for each c/d 
folder separately, and passes the folder names as the output_collections 
parameter.
  3.  I am not aware of an option to exclude a sub-dir, but maybe 
input_file_pattern could be of some help? If not, do the same as in the 
previous point and process subdirs individually, so you have better control 
of what is included and what is not.
  4.  Not sure I understand exactly what you mean, but input_file_pattern 
takes a regex, so you should be able to define alternatives. You could also 
use a transform to decide, and suppress files you don’t want to import by 
returning an empty sequence in those cases.
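
To make points 2 and 4 a bit more concrete, here is a rough sketch of such a 
wrapper script, written in Python rather than batch/shell. It only builds the 
command lines; it does not run MLCP. The function names, the layout (a base 
folder whose sub-folders two levels down are the c/d folders), and the 
xml/json alternatives in the pattern are assumptions for illustration. MLCP 
evaluates the pattern as a Java regex; this simple one behaves the same in 
Python.

```python
import re
import shlex
from pathlib import Path

# Assumed example pattern for -input_file_pattern: alternatives joined with "|".
FILE_PATTERN = r".*\.(xml|json)"


def cd_folders(base):
    """Return the directories exactly two levels below base (the c/d folders)."""
    return sorted(p for p in Path(base).glob("*/*") if p.is_dir())


def mlcp_command(folder, connection_args):
    """Build one mlcp import command for a single c/d folder.

    The last two path components (c and d) become the collections.
    connection_args is a list such as ["-host", "localhost", "-port", "8000"].
    """
    collections = ",".join(Path(folder).parts[-2:])
    return [
        "mlcp.sh", "import", "-mode", "local",
        *connection_args,
        "-input_file_path", str(folder),
        "-input_file_pattern", FILE_PATTERN,
        "-output_collections", collections,
    ]


def print_commands(base, connection_args):
    """Print one shell-quoted mlcp command line per c/d folder under base."""
    for folder in cd_folders(base):
        print(" ".join(shlex.quote(arg) for arg in mlcp_command(folder, connection_args)))
```

Calling print_commands with the folder that holds the c folders (a/b in the 
example URI) and your own connection options prints one command per c/d 
folder, which you can review before running. Note that MLCP still recurses 
into any sub-folders of each c/d folder.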

Kind regards,
Geert

From: [email protected] on behalf of "Sindiya, Carmeen (LNG-CON)" 
<[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Wednesday, December 23, 2015 at 10:01 AM
To: [email protected]
Subject: [MarkLogic Dev General] MLCP queries

Hi,

I am very new to MarkLogic and MLCP, and I am learning to load documents with 
the help of the MarkLogic community. Below are some questions I need to sort 
out. The answers will help me understand MLCP better.


1.       I am running the MLCP command with mode set to ‘local’. Is it 
possible to have the input_file_path point to an AWS fileshare?

2.       Is it possible to derive the collection name from the URI of the 
document?

For example, for a URI such as //a/b/c/d/e.xml, I need the collection values 
c and d.

3.       While loading documents for a given input file path, MLCP by default 
loads all the documents under the folder, including its subdirectories. How 
can I exclude the files in the subdirectories?

4.       Is there an option to pass multiple values for filtering file formats?

Thanks,
Carmeen
