<https://docs.marklogic.com/guide/java/data-movement>

The open-source Data Movement SDK that ships with v4 of the Java Client API 
<https://github.com/marklogic/java-client-api> was designed for bulk 
orchestration. It runs in an external Java process and doesn’t rely on the task 
server, CPF, or large transactions. It allows you to break up import (MarkLogic 
as sink), export (MarkLogic as source), or transformation (MarkLogic as source 
and sink) jobs into chunks and parallelize them over a cluster. It handles all 
of the tricky threading in Java and gives you nice callback interfaces for progress
and error handling. It will even handle failover events on the MarkLogic 
cluster over which it’s working.
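Below is a minimal plain-JDK sketch of the pattern described above: split a job into fixed-size batches, run the batches on a thread pool, and report each batch through success and failure callbacks. It is not the Data Movement SDK API. In practice you would use the SDK's own listeners (for example `WriteBatcher`'s `onBatchSuccess` and `onBatchFailure`). `BatchJob`, `process`, `onSuccess`, and `onFailure` are hypothetical names chosen for illustration, and the one-minute completion timeout is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical plain-JDK sketch of "split the job into batches, run them in
// parallel, report through callbacks". This is NOT the Data Movement SDK API.
class BatchJob<T> {
    private final int batchSize;
    private final int threadCount;
    private Consumer<List<T>> processor = batch -> { };
    private Consumer<List<T>> successListener = batch -> { };
    private BiConsumer<List<T>, Throwable> failureListener = (batch, err) -> { };

    BatchJob(int batchSize, int threadCount) {
        this.batchSize = batchSize;
        this.threadCount = threadCount;
    }

    // Fluent setters for the per-batch work and the two callbacks.
    BatchJob<T> process(Consumer<List<T>> p) { processor = p; return this; }
    BatchJob<T> onSuccess(Consumer<List<T>> c) { successListener = c; return this; }
    BatchJob<T> onFailure(BiConsumer<List<T>, Throwable> c) { failureListener = c; return this; }

    // Splits items into batches, runs each batch on a fixed thread pool, and
    // blocks until every batch has finished (or the arbitrary timeout expires).
    void run(List<T> items) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < items.size(); i += batchSize) {
            List<T> batch = new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size())));
            pool.submit(() -> {
                try {
                    processor.accept(batch);
                    successListener.accept(batch);    // progress callback
                } catch (RuntimeException e) {
                    failureListener.accept(batch, e); // error callback; other batches keep running
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class BatchJobDemo {
    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) items.add(i);
        // Callbacks run on pool threads, so the counters must be thread-safe.
        AtomicInteger ok = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();
        new BatchJob<Integer>(3, 2)
            .process(batch -> { if (batch.contains(4)) throw new IllegalStateException("bad item"); })
            .onSuccess(batch -> ok.addAndGet(batch.size()))
            .onFailure((batch, err) -> failed.addAndGet(batch.size()))
            .run(items);
        System.out.println(ok.get() + " succeeded, " + failed.get() + " failed");
    }
}
```

With batches of 3 over ten items, only the batch `[3, 4, 5]` fails, so the demo prints `7 succeeded, 3 failed`. A failed batch does not stop the others, which loosely mirrors the per-batch error handling the SDK offers.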

Justin

--
Justin Makeig
Senior Director, Product Management
MarkLogic
[email protected]


On Dec 7, 2017, at 10:34 AM, Will Thompson <[email protected]> wrote:

Hi Eliot,

I have found that trying to control a long batch process through a single 
long-running transaction is too much trouble because of the same issues you are 
having w.r.t. visibility of updates. I think it seems natural at first because 
XQuery is such a nice language, but the accumulation of workarounds for the 
transaction boundary issues can turn into a maintenance nightmare. I recently 
worked on a legacy project like that (a real cautionary tale!). It's much less 
of an uphill battle if you can break the long-running task into multiple 
shorter-running ones, either through CPF, triggers, scheduled tasks, etc. and 
avoid all that. I like to keep projects native when possible and (through lots 
of trial and error) have become accustomed to the constraints, but a lot of 
developers just factor this part out into Java or another language where they 
can execute a long-running controller type loop without any of those headaches.
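Here is a hedged sketch of that kind of external controller loop, applied to the pause use case from this thread. The loop runs in plain Java, outside any database transaction. It re-reads a shared `AtomicBoolean` before each submission, so a pause takes effect at the next iteration. `PausableController` and `submitFrom` are hypothetical names. In a real system `submit` would send a task to a remote server, and `pause()` would be called from another thread, for example the one handling the UI's pause request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical external controller: submits tasks one at a time and re-checks a
// shared pause flag before every submission.
class PausableController {
    // AtomicBoolean makes a pause set by another thread (e.g., a UI request
    // handler) visible to the loop on its next check.
    private final AtomicBoolean paused = new AtomicBoolean(false);

    void pause()  { paused.set(true); }
    void resume() { paused.set(false); }

    // Submits tasks from index `start` onward until the list is exhausted or a pause
    // is seen. Returns the index of the first task NOT submitted, so a later call
    // can resume from there.
    int submitFrom(List<String> tasks, int start, Consumer<String> submit) {
        int i = start;
        while (i < tasks.size() && !paused.get()) {
            submit.accept(tasks.get(i));
            i++;
        }
        return i;
    }
}

class PausableControllerDemo {
    public static void main(String[] args) {
        PausableController controller = new PausableController();
        List<String> tasks = List.of("t0", "t1", "t2", "t3", "t4", "t5");
        List<String> sent = new ArrayList<>();
        // Simulate a pause request arriving mid-loop, right after the third submission.
        int next = controller.submitFrom(tasks, 0, t -> {
            sent.add(t);
            if (sent.size() == 3) controller.pause();
        });
        System.out.println("paused at " + next + ", sent " + sent);
        controller.resume();
        next = controller.submitFrom(tasks, next, sent::add);
        System.out.println("finished at " + next + ", sent " + sent);
    }
}
```

The demo prints `paused at 3, sent [t0, t1, t2]` and then `finished at 6, sent [t0, t1, t2, t3, t4, t5]`. Returning the index of the first unsubmitted task lets a resume pick up exactly where the pause left off.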

-Will


On Dec 7, 2017, at 11:19 AM, Eliot Kimber <[email protected]> wrote:

I think I've solved my problem by once again being more careful about holding
elements in memory. By replacing global reads of my job doc with on-demand
reads through xdmp:eval(), I seem to have resolved my issue with changes to the
job doc not being seen within a single, separate transaction (e.g., my read
loop). I seem to be unable to let go of my procedural-language brain damage...

Still, it seems like having a general, cross-application field or shared memory 
mechanism would be useful for this type of application where one app (e.g., my 
Web UI) spawns tasks that do the work and need a way to dynamically communicate 
within the scope of a single long-running transaction. At least that's the way 
I would go about building this type of application in a different environment.

Cheers,

E.
--
Eliot Kimber
http://contrext.com


On 12/7/17, 10:48 AM, "[email protected] on behalf of Eliot Kimber"
<[email protected] on behalf of [email protected]> wrote:

  I don't think server fields are going to work because they are per
  application server and I have different application servers at work.

  There is an HTTP server that gets the pause/resume request, and then there
  are spawned tasks running on the Task Server that need to read the field.

  My experiments show that, per the docs, a field changed by one app is not
  seen by a different app.

  Cheers,

  Eliot
  --
  Eliot Kimber
  


  On 12/7/17, 10:13 AM, "[email protected] on behalf of Eliot Kimber"
  <[email protected] on behalf of [email protected]> wrote:

      I had not considered server fields--I'll check it out.

      Cheers,

      E.

      --
      Eliot Kimber
      


      On 12/7/17, 10:11 AM, "[email protected] on behalf of Erik Hennum"
      <[email protected] on behalf of [email protected]> wrote:

          Hi, Eliot:

          Have you considered a server field -- where any code that changes the
          status also updates the server field and the iterator checks the
          server field?

          The server fields are local to the host, so there's no concern about
          a separate iterator running on a different host.

          If multiple iterators run on the same host, each would need to
          distinguish its status by an id, which the iterator could generate
          randomly when it starts.
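In MarkLogic, this suggestion corresponds to xdmp:set-server-field() and xdmp:get-server-field(). As an illustration only, the following plain-Java analogue uses one shared, thread-safe map in the role of the host-local fields. Each iterator's status is stored under a key built from a random UUID generated when the iterator starts. `StatusRegistry`, the `iterator-status:` key prefix, and the status strings are hypothetical. The real fields have limited scope (per host, and per application server in Eliot's tests), and in the same way this map is only visible inside one JVM.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-process stand-in for host-local server fields: one shared map,
// with each iterator's status stored under a key derived from a random id.
class StatusRegistry {
    private final Map<String, String> fields = new ConcurrentHashMap<>();

    // Called by an iterator when it starts; the random id keeps iterators on the
    // same host from overwriting each other's status.
    String register() {
        String id = UUID.randomUUID().toString();
        fields.put(key(id), "running");
        return id;
    }

    // Called by any code that changes the status (e.g., a pause request handler).
    void setStatus(String id, String status) { fields.put(key(id), status); }

    // Checked by the iterator on each pass through its loop.
    String getStatus(String id) { return fields.get(key(id)); }

    private static String key(String id) { return "iterator-status:" + id; }
}

class StatusRegistryDemo {
    public static void main(String[] args) {
        StatusRegistry registry = new StatusRegistry();
        String a = registry.register();
        String b = registry.register();
        registry.setStatus(a, "paused");
        System.out.println(registry.getStatus(a) + " / " + registry.getStatus(b));
    }
}
```

The demo prints `paused / running`. Pausing one iterator leaves the other untouched because their keys differ.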


          Hoping that helps,


          Erik Hennum



          ________________________________________
          From: [email protected] <[email protected]> on behalf of Eliot Kimber <[email protected]>
          Sent: Thursday, December 7, 2017 7:48:44 AM
          To: MarkLogic Developer Discussion
          Subject: [MarkLogic Dev General] Best Approach to Manage "Flags" That
          Might Change Within a Single Transaction

          In the context of my remote processing management system, where my
          client server is sending many tasks to a set of remote servers through
          a set of spawned tasks running in parallel, I need to be able to pause
          the client so that it stops sending new tasks to the remote servers.

          So far I've been using a single document stored in ML as my mechanism
          for indicating that a job is in progress and capturing the job details
          (job ID, start time, servers in use, etc.). This has worked fine because
          the document was only updated at the start and end of the job.

          But for the pause/resume use case I need to have a flag that
          indicates that the job is paused and have other processes (e.g., my
          task-submission code) immediately respond to a change. For example, if
          I'm looping over 100 tasks to load up a remote task queue and the job is
          paused, I want that loop to end immediately.

          So basically, on every iteration of this loop I check the "is paused"
          status, which requires reading the job doc to see whether a @paused
          attribute is present (the @paused attribute captures the time the pause
          was requested and serves as the "is paused" flag). However, because the
          loop is a single transaction, it sees the same version of the job doc on
          every iteration, even if the doc has changed.
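The following plain-Java analogy (not MarkLogic code) makes the visibility problem concrete. It contrasts a loop that reads the job doc once, which is effectively what a loop confined to one transaction does, with a loop that re-reads it on every iteration. The second approach is what Eliot later reports fixed the issue, via on-demand reads through xdmp:eval(). `JobDoc`, its nullable `paused` timestamp (standing in for the @paused attribute), and the `AtomicReference` store are assumptions made for illustration.

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-in for the job document: immutable, with an
// optional "paused" timestamp that doubles as the is-paused flag.
final class JobDoc {
    final String jobId;
    final Instant paused; // null means "not paused"

    JobDoc(String jobId, Instant paused) { this.jobId = jobId; this.paused = paused; }

    boolean isPaused() { return paused != null; }
    JobDoc withPaused(Instant when) { return new JobDoc(jobId, when); }
}

class SnapshotVsFreshRead {
    // Reads the job doc ONCE before looping, like a loop that sees a single
    // version of the doc: later updates to the store are never noticed.
    static int runWithSnapshot(AtomicReference<JobDoc> store, int tasks, Runnable afterEachTask) {
        JobDoc snapshot = store.get();
        int sent = 0;
        for (int i = 0; i < tasks && !snapshot.isPaused(); i++) {
            sent++;
            afterEachTask.run();
        }
        return sent;
    }

    // Re-reads the job doc on every iteration, analogous to doing each read
    // in its own separate transaction.
    static int runWithFreshReads(AtomicReference<JobDoc> store, int tasks, Runnable afterEachTask) {
        int sent = 0;
        for (int i = 0; i < tasks && !store.get().isPaused(); i++) {
            sent++;
            afterEachTask.run();
        }
        return sent;
    }
}

class SnapshotDemo {
    public static void main(String[] args) {
        AtomicReference<JobDoc> store = new AtomicReference<>(new JobDoc("job-1", null));
        int[] count = {0};
        // Simulates another process pausing the job after the second task is sent.
        Runnable pauseAfterTwo = () -> {
            if (++count[0] == 2) store.updateAndGet(d -> d.withPaused(Instant.now()));
        };
        System.out.println(SnapshotVsFreshRead.runWithSnapshot(store, 5, pauseAfterTwo));
        store.set(new JobDoc("job-1", null)); // reset for the second run
        count[0] = 0;
        System.out.println(SnapshotVsFreshRead.runWithFreshReads(store, 5, pauseAfterTwo));
    }
}
```

The pause is written after the second task in both runs, but only the re-reading loop acts on it: the demo prints 5 and then 2.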

          I tried using xdmp:eval() to read the job doc, but that didn't seem to
          change the behavior.

          E.g., running this in Query Console:

                  return (er:is-job-paused(), er:pause-job(), er:is-job-paused())

          results in (false, false).

          So this isn't going to work.

          So my question: what's the best way to manage this kind of dynamic
          flag in ML?

          I could use file system files instead of docs in the database, which
          would avoid the ML transaction behavior, but that seems a little
          hackier than I'd like.

          What I'd really like is some kind of "shared memory" mechanism where
          I can set and reset variables at will across different modules running
          in parallel, but I haven't seen anything like that in my study of the
          ML API.

          Is there such a mechanism that I've missed?

          Or am I just thinking about the problem the wrong way?

          Thanks,

          Eliot

          --
          Eliot Kimber
          




          _______________________________________________
          General mailing list
          [email protected]
          Manage your subscription at:
          http://developer.marklogic.com/mailman/listinfo/general