> From what I can tell, that means you are not seeing a bug in mlcp. You simply
> have an assumption in your transform that is effectively a race condition.
> If your transform depends on a set of files being loaded before another set of
> files, you must completely load the first set first.
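As a rough illustration of "load the first set first", here is a minimal Python sketch that runs one mlcp import per directory, strictly in sequence. The flags are the ones from the command line quoted in this thread; the helper names and the directory names `first-set/` and `second-set/` are hypothetical. It also assumes that `mlcp.sh` exits with a non-zero status when an import fails, which I have not verified.

```python
import subprocess

# Connection and input flags copied from the command line quoted in this thread.
COMMON_ARGS = [
    "-database", "tx-claims",
    "-host", "marklogic", "-port", "8884",
    "-username", "XXX", "-password", "XXX",
    "-mode", "local",
    "-input_file_type", "aggregates",
    "-aggregate_record_element", "TRANSACTION",
]


def build_import_command(input_path, extra_args=(), mlcp="mlcp.sh"):
    """Build the argument list for a single mlcp import run (hypothetical helper)."""
    return [mlcp, "import", *COMMON_ARGS, "-input_file_path", input_path, *extra_args]


def load_in_order(input_paths, runner=subprocess.run):
    """Run one import per path, waiting for each to finish before starting the next."""
    for path in input_paths:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so later sets are not started if an earlier one fails
        # (assumption: mlcp.sh reports failure through its exit status).
        runner(build_import_command(path), check=True)


# Example call (directory names are placeholders, not from the thread):
# load_in_order(["first-set/", "second-set/"])
```

A clean exit does not by itself show that every record was committed, so it still seems worth comparing the counters that mlcp prints at the end of each run.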
I have written more advanced transforms in the past that can merge all dependencies whenever the last one arrives, but that's only necessary when you can't find a way to just load the first set first.

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com

On 7/5/2016 9:30 AM, Hans Hübner wrote:

Hi,

just to let you know: The problem that I had was entirely caused by the fact that I was loading files in parallel that depended on each other, by way of the loader transformation that I've posted. The mlcp percentage display is still confusing, though, as it apparently shows the percentage of the input data that was loaded into the database, not the number of records read from the input. That could be improved, I think, but it does not seem to be very important.

Thank you Indy and Geert for looking at this!

-Hans

On Sun, Jul 3, 2016 at 7:52 PM, Hans Hübner <hans.hueb...@lambdawerk.com> wrote:

Hi,

I'm trying to load a bunch of files into MarkLogic using mlcp, but for some reason, it seems that it skips some of the files.
I'm using a command line like this:

    mlcp.sh import \
        -database tx-claims \
        -host marklogic -port 8884 -username XXX -password XXX -mode local \
        -input_file_path 2015/277ca/ \
        -input_file_type aggregates -aggregate_record_element TRANSACTION \
        -transform_module /transform-in.xquery \
        -transform_function transform-response \
        -transform_namespace http://lambdawerk.com/tx-claims

The transform-response function looks like this:

    declare function tx-claims:transform-response(
      $content as map:map,
      $context as map:map
    ) as map:map*
    {
      let $doc := map:get($content, 'value')
      let $icn := $doc/TRANSACTION/LOOP2000D/LOOP2200D/TRN/TRN02/text()
      let $uri := concat('/responses/', $icn, '.xml')
      return (map:put($content, 'uri', $uri), $content)
    };

The mlcp output looks like this at the end:

    16/07/03 18:59:22 INFO contentpump.LocalJobRunner:  completed 76%
    16/07/03 18:59:28 INFO contentpump.LocalJobRunner:  completed 77%
    16/07/03 18:59:30 INFO contentpump.LocalJobRunner:  completed 78%
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 80%
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 81%
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
    16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec

After the load operation completes, nothing unusual is in the ErrorLog.txt file. However, when I look into the database, I find that some files are missing. When I load one of the missing files into the database explicitly (specifying its name as the -input_file_path argument), it is correctly loaded.
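To put a number on the gap in those counters, here is a small Python sketch that parses the ContentPumpStats lines from the log tail above and subtracts the committed count from the output count. The regular expression and helper name are my own and assume the log layout shown in this message. The sketch says nothing about why the records were not committed.

```python
import re

# Counter lines copied from the mlcp output shown above.
LOG_TAIL = """\
16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 81%
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec
"""

# Matches lines ending in "<UPPER_CASE_NAME>: <integer>"; progress and timing lines do not match.
COUNTER = re.compile(r"LocalJobRunner: ([A-Z_]+): (\d+)\s*$")


def parse_counters(log_text):
    """Return a dict mapping counter names to integer values (hypothetical helper)."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


counters = parse_counters(LOG_TAIL)
missing = counters["OUTPUT_RECORDS"] - counters["OUTPUT_RECORDS_COMMITTED"]
print(missing)  # 17279
```

So 17279 of the 1421471 output records are not counted as committed, even though OUTPUT_RECORDS_FAILED is 0.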
Now, the mlcp output looks kind of fishy to me in that it apparently loads the last 19% of the work in under one second. It seems that it is skipping a whole bunch of files. It also seems that some output records could not be written. The manual says that this could be caused by a server-side transformation, but our function does not seem to be at fault: when I load the missing file by specifying its file name, it is correctly loaded, so it seems to be something else.

I would greatly appreciate any ideas or advice.

Thanks!
Hans

--
LambdaWerk GmbH
Oranienburger Straße 87/89
10178 Berlin
Phone: +49 30 555 7335 0
Fax: +49 30 555 7335 99

HRB 169991 B Amtsgericht Charlottenburg
USt-ID: DE301399951
Geschäftsführer: Hans Hübner

http://lambdawerk.com/

_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general