>From what I can tell, that means you are not seeing a bug in mlcp.  You simply
>have an assumption in your transform that is effectively a race condition.
>If your transform depends on one set of files being loaded before another set
>of files, you must completely load the first set first.

I have written more advanced transforms in the past that can merge all 
dependencies whenever the last one arrives, but that's only necessary when you 
can't find a way to just load the first set first.
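
As a rough sketch of what "load the first set first" can look like from the
shell: the function below runs one mlcp import per directory, strictly one
after another, and stops if an import fails. The options are copied from the
command line quoted further down; the load_in_order name, the MLCP variable,
and the directory names in the usage line are made up for illustration, and it
assumes that mlcp.sh returns a non-zero exit status when an import fails.

```shell
#!/bin/sh
# Sketch only: run the imports sequentially so that a dependent set is never
# loaded in parallel with the set it depends on.
# MLCP is a hypothetical override hook; it defaults to mlcp.sh on the PATH.
MLCP="${MLCP:-mlcp.sh}"

# load_in_order DIR...  (hypothetical helper, not part of mlcp)
load_in_order() {
    for dir in "$@"; do
        # This import has to return before the loop moves on to the next
        # directory. Assumption: a failed import yields a non-zero status.
        "$MLCP" import \
            -database tx-claims \
            -host marklogic -port 8884 -username XXX -password XXX \
            -mode local \
            -input_file_path "$dir" \
            -input_file_type aggregates \
            -aggregate_record_element TRANSACTION \
            -transform_module /transform-in.xquery \
            -transform_function transform-response \
            -transform_namespace http://lambdawerk.com/tx-claims \
            || { echo "import of $dir failed, stopping" >&2; return 1; }
    done
}
```

Called as `load_in_order 2015/first-set/ 2015/second-set/` (hypothetical
paths), the second directory is only touched after the first import has
returned.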

Sam Mefford
Senior Engineer
MarkLogic Corporation
sam.meff...@marklogic.com
Cell: +1 801 706 9731
www.marklogic.com


On 7/5/2016 9:30 AM, Hans Hübner wrote:
Hi,

just to let you know: the problem I had was caused entirely by the fact that I
was loading files in parallel that depended on each other, by way of the loader
transformation that I posted.  The mlcp percentage display is still confusing,
though, as it apparently shows the percentage of the input data that has been
loaded into the database, not the number of records read from the input.  That
could be improved, I think, but it does not seem to be very important.

Thank you Indy and Geert for looking at this!
-Hans

On Sun, Jul 3, 2016 at 7:52 PM, Hans Hübner <hans.hueb...@lambdawerk.com> wrote:
Hi,

I'm trying to load a bunch of files into MarkLogic using mlcp, but for some 
reason, it seems that it skips some of the files.  I'm using a command line 
like this:

mlcp.sh import \
     -database tx-claims \
     -host marklogic -port 8884 -username XXX -password XXX -mode local \
     -input_file_path 2015/277ca/ \
     -input_file_type aggregates -aggregate_record_element TRANSACTION \
     -transform_module /transform-in.xquery \
     -transform_function transform-response \
     -transform_namespace http://lambdawerk.com/tx-claims

The transform-response function looks like this:

declare function tx-claims:transform-response(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $doc := map:get($content, 'value')
  let $icn := $doc/TRANSACTION/LOOP2000D/LOOP2200D/TRN/TRN02/text()
  let $uri := concat('/responses/', $icn, '.xml')
  return
    (map:put($content, 'uri', $uri),
     $content)
};

The mlcp output looks like this at the end:

16/07/03 18:59:22 INFO contentpump.LocalJobRunner:  completed 76%
16/07/03 18:59:28 INFO contentpump.LocalJobRunner:  completed 77%
16/07/03 18:59:30 INFO contentpump.LocalJobRunner:  completed 78%
16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 80%
16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 81%
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec

After the load operation completes, nothing unusual is in the ErrorLog.txt
file.  However, when I look into the database, I find that some files are
missing.  When I load one of the missing files into the database explicitly
(specifying its name as the -input_file_path argument), it is loaded correctly.

Now, the mlcp output looks kind of fishy to me in that it apparently loads the
last 19% of the work in under one second.  It seems that it is skipping a whole
bunch of files.  It also seems that some output records could not be written,
since OUTPUT_RECORDS_COMMITTED is lower than OUTPUT_RECORDS.  The manual says
that this could be caused by a server-side transformation, but our function
does not seem to be at fault: when I load the missing file by specifying its
file name, it is loaded correctly, so it seems to be something else.

I would greatly appreciate any ideas or advice.

Thanks!
Hans



_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general

