I’ve found my bug: a bad assumption about the data. While the URIs of all my
input documents are unique, their filenames are not, and I was using just the
filename as the basis for my task record URIs.
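This failure mode is easy to reproduce outside MarkLogic. The minimal Python sketch below is not the actual task code, which isn't shown here. It derives a record URI for each source URI and reports any record URI that more than one source would claim. The `/tasks/` prefix, the sample URIs, and both derivation functions are hypothetical.

The sketch relies on one assumption. When xdmp:document-insert runs in separate transactions, a second insert to an existing URI replaces the first document instead of raising an error. That would explain a shortfall in the document count with no exception in the log.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_uri_collisions(source_uris, derive_record_uri):
    """Group source URIs by the record URI derived from each one.

    Returns only the record URIs that more than one source maps to;
    each such group means one record overwriting another.
    """
    groups = defaultdict(list)
    for uri in source_uris:
        groups[derive_record_uri(uri)].append(uri)
    return {rec: srcs for rec, srcs in groups.items() if len(srcs) > 1}


def from_filename(uri):
    # Hypothetical scheme: record URI built from the filename only
    # (the faulty assumption, since filenames are not unique).
    return "/tasks/" + PurePosixPath(uri).name


def from_full_uri(uri):
    # Hypothetical scheme: record URI built from the whole source URI,
    # which is unique whenever the source URIs are.
    return "/tasks" + uri


if __name__ == "__main__":
    sources = [
        "/content/a/chapter1.xml",
        "/content/b/chapter1.xml",
        "/content/b/chapter2.xml",
    ]
    # Two sources collide on /tasks/chapter1.xml
    print(find_uri_collisions(sources, from_filename))
    # No collisions when the full source URI is used
    print(find_uri_collisions(sources, from_full_uri))
```

Running a check like this over the real input URIs before a 500K-item run would surface the duplicate filenames up front.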

I was in the process of posting my code, and that led me to verify that my
assumptions were correct (because I knew somebody would challenge them), and,
what do you know, they weren’t.

So the lesson for the day is: always double-check your assumptions about data.

But I also learned something about uncatchable exceptions, so that’s good too.

Thanks for everyone’s help.

Cheers,

Eliot 


--
Eliot Kimber
http://contrext.com
 


On 11/9/17, 10:11 AM, "[email protected] on behalf of 
Eliot Kimber" <[email protected] on behalf of 
[email protected]> wrote:

    I’m actually not doing anything with the HTTP response. I get the response
    but currently don’t examine it. (In fact, I have a FIXME in the code to add
    handling of non-200 response codes, but for now it’s basically fire and
    forget. The request to the remote server ultimately spawns a task on that
    server, so the only non-success response would be one where the
    xdmp:spawn() on the remote server failed, which is unlikely to happen under
    normal operating conditions.)
    
    I’m also careful to always turn off function mapping, which is as evil as
    evil can be.
    
    There were no relevant errors in ErrorLog.txt. The code is running on the
    task server, and I do see all my expected (success) messages there.
    
    Looking at the uncatchable exceptions article, the only possible issue
    would be failures during commit that result in uncatchable exceptions.
    
    Per the article, I’m now using eval() to do the document-insert, which
    should allow any exception from a commit-time failure to be caught.
    
    Otherwise, none of the conditions that you suggested could hold. Document
    URIs should be unique, because they reflect the URIs of the source items,
    each of which is the root node of its own document. There are no
    permissions in effect. I’m only creating a few hundred tasks in the task
    queue: each task processes 1,000 input items, so 500K items means 500
    tasks. And I’m not spawning in update mode (if that were the problem, it
    would fail for all attempts, not just a few of them).
    
    Cheers,
    
    E.
    ----
    Eliot Kimber
    
    On 11/9/17, 10:00 AM, "[email protected] on behalf of 
Will Thompson" <[email protected] on behalf of 
[email protected]> wrote:
    
        Eliot,
        
        When you make the remote HTTP call, are you using one of the
        xdmp:http-XYZ functions? Those functions return a payload describing
        the response condition and don't throw exceptions for most errors. Is
        it possible that an HTTP response error condition is not being
        handled, resulting in inserting an empty sequence instead of a
        document? In the default case where function mapping is turned on,
        inserting an empty sequence will result in xdmp:document-insert not
        being called at all. You could test whether that's happening by
        disabling function mapping, which would cause an exception to be
        raised instead.
        
        -Will
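Will's point about function mapping can be illustrated with a small Python analogy. This is not MarkLogic code and does not model XQuery's type system. `mapped_call`, `strict_call`, and `insert_doc` are hypothetical names. The analogy shows only one thing: a call mapped over an empty sequence runs zero times and raises nothing. A strict one-item call, by contrast, fails loudly.

```python
def mapped_call(fn, args):
    """Analogy for function mapping: a function expecting one item, when
    handed a sequence, is applied once per item. An empty sequence therefore
    produces zero calls and no error."""
    return [fn(item) for item in args]


def strict_call(fn, args):
    """Analogy for function mapping turned off: the argument must be exactly
    one item, so an empty sequence raises instead of being skipped."""
    items = list(args)
    if len(items) != 1:
        raise TypeError(f"expected exactly one item, got {len(items)}")
    return fn(items[0])


if __name__ == "__main__":
    inserted = []  # stands in for the database

    def insert_doc(doc):
        # Hypothetical stand-in for xdmp:document-insert.
        inserted.append(doc)

    # An unhandled HTTP error yields no document, i.e. an empty sequence.
    mapped_call(insert_doc, [])
    print(len(inserted))  # 0: the insert never ran and nothing was raised

    try:
        strict_call(insert_doc, [])
    except TypeError as err:
        print("raised:", err)
```

The strict variant corresponds to the debugging step Will suggests: once mapping is off, the silent skip turns into a visible error.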
        
        
        > On Nov 8, 2017, at 5:25 PM, Eliot Kimber <[email protected]> wrote:
        > 
        > Using MarkLogic 9:
        > 
        > I have a process that quickly creates a large number of small
        > documents, one for each item in a set of input items.
        > 
        > My code is basically:
        > 
        > 1. Log that I’m about to act on the input item
        > 2. Act on the input item (send the input item to a remote HTTP
        >    endpoint)
        > 3. Create a new doc reflecting the input item I just acted on
        > 
        > This code is within a try/catch and I log any exception, so I should
        > know if there are any exceptions during this process by examining
        > the log.
        > 
        > I’m processing about 500K input items, with the processing spread
        > over the 16 threads of my task server. So there are 16 tasks quickly
        > writing these docs concurrently.
        > 
        > I know the exact count of the input items and I get that count in
        > the log, so I know that I’m actually processing all the items I
        > should be.
        > 
        > However, if I subsequently count the documents created in step 3,
        > I’m short by about 1500, meaning that not all the docs got created.
        > That should not be able to happen unless there was an exception
        > between the log message and the document-insert() call, but I’m not
        > finding any exceptions or other errors reported in the log.
        > 
        > My question: is there anything that would cause docs to silently not
        > get created under this kind of heavy load? I would hope not, but I
        > just wanted to make sure.
        > 
        > I’m assuming this issue is my bug somewhere, but the code is pretty
        > simple and I’m not seeing any obvious way the documents could fail to
        > get created without a corresponding exception report.
        > 
        > Thanks,
        > 
        > Eliot
        > --
        > Eliot Kimber
        > http://contrext.com
        > 
        > _______________________________________________
        > General mailing list
        > [email protected]
        > Manage your subscription at: 
        > http://developer.marklogic.com/mailman/listinfo/general
        