Re: [PR] Oak-10874 | Add jmx function for bringing forward a delayed async lane to a latest checkpoint. [jackrabbit-oak]

2024-06-17 Thread via GitHub


nit0906 merged PR #1522:
URL: https://github.com/apache/jackrabbit-oak/pull/1522


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Oak-10874 | Add jmx function for bringing forward a delayed async lane to a latest checkpoint. [jackrabbit-oak]

2024-06-17 Thread via GitHub


nit0906 commented on code in PR #1522:
URL: https://github.com/apache/jackrabbit-oak/pull/1522#discussion_r1643657775


##
oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java:
##
@@ -1242,6 +1242,62 @@ public String getReferenceCheckpoint() {
 return referenceCp;
 }
 
+@Override
+public String forceIndexLaneCatchup(String confirmMessage) throws 
CommitFailedException {
+
+if (!"CONFIRM".equals(confirmMessage)) {
+String msg = "Please confirm that you want to force the lane 
catch-up by passing 'CONFIRM' as argument";
+log.warn(msg);
+return msg;
+}
+
+if (!this.isFailing()) {
+String msg = "The lane is not failing. This operation should 
only be performed if the lane is failing, it should first be allowed to catch 
up on its own.";
+log.warn(msg);
+return msg;
+}
+
+try {
+log.info("Running a forced catch-up for indexing lane [{}]. ", 
name);
+// First we need to abort and pause the running indexing task
+this.abortAndPause();
+log.info("Aborted and paused async indexing for lane [{}]", 
name);
+// Release lease for the paused lane
+this.releaseLeaseForPausedLane();
+log.info("Released lease for paused lane [{}]", name);
+String newReferenceCheckpoint = store.checkpoint(lifetime, 
Map.of(

Review Comment:
   done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Oak-10874 | Add jmx function for bringing forward a delayed async lane to a latest checkpoint. [jackrabbit-oak]

2024-06-14 Thread via GitHub


thomasmueller commented on code in PR #1522:
URL: https://github.com/apache/jackrabbit-oak/pull/1522#discussion_r1639552708


##
oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java:
##
@@ -1242,6 +1242,62 @@ public String getReferenceCheckpoint() {
 return referenceCp;
 }
 
+@Override
+public String forceIndexLaneCatchup(String confirmMessage) throws 
CommitFailedException {
+
+if (!"CONFIRM".equals(confirmMessage)) {
+String msg = "Please confirm that you want to force the lane 
catch-up by passing 'CONFIRM' as argument";
+log.warn(msg);
+return msg;
+}
+
+if (!this.isFailing()) {
+String msg = "The lane is not failing. This operation should 
only be performed if the lane is failing, it should first be allowed to catch 
up on its own.";
+log.warn(msg);
+return msg;
+}
+
+try {
+log.info("Running a forced catch-up for indexing lane [{}]. ", 
name);
+// First we need to abort and pause the running indexing task
+this.abortAndPause();

Review Comment:
   I wouldn't mind if this is just an additional preconditions... instead of 
doing it here, just verify it's done. But it's OK, no need to change it now.
   
   (the idea is: We don't want this feature is used a lot, so making it a 
little bit harder to use might help that it is used less often... The "CONFIRM" 
is a great example of that!... Do not make it easy to delete stuff, so that 
it's less likely that stuff is deleted by mistake.)



##
oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java:
##
@@ -1242,6 +1242,62 @@ public String getReferenceCheckpoint() {
 return referenceCp;
 }
 
+@Override
+public String forceIndexLaneCatchup(String confirmMessage) throws 
CommitFailedException {
+
+if (!"CONFIRM".equals(confirmMessage)) {
+String msg = "Please confirm that you want to force the lane 
catch-up by passing 'CONFIRM' as argument";
+log.warn(msg);
+return msg;
+}
+
+if (!this.isFailing()) {

Review Comment:
   In theory (not now, but maybe later) we could add an additional check: only 
allow if the indexing lane is behind more than 30 minutes, or something like 
that.
   
   But again, not needed right now.



##
oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java:
##
@@ -1242,6 +1242,62 @@ public String getReferenceCheckpoint() {
 return referenceCp;
 }
 
+@Override
+public String forceIndexLaneCatchup(String confirmMessage) throws 
CommitFailedException {
+
+if (!"CONFIRM".equals(confirmMessage)) {
+String msg = "Please confirm that you want to force the lane 
catch-up by passing 'CONFIRM' as argument";
+log.warn(msg);
+return msg;
+}
+
+if (!this.isFailing()) {
+String msg = "The lane is not failing. This operation should 
only be performed if the lane is failing, it should first be allowed to catch 
up on its own.";
+log.warn(msg);
+return msg;
+}
+
+try {
+log.info("Running a forced catch-up for indexing lane [{}]. ", 
name);
+// First we need to abort and pause the running indexing task
+this.abortAndPause();
+log.info("Aborted and paused async indexing for lane [{}]", 
name);
+// Release lease for the paused lane
+this.releaseLeaseForPausedLane();
+log.info("Released lease for paused lane [{}]", name);
+String newReferenceCheckpoint = store.checkpoint(lifetime, 
Map.of(

Review Comment:
   Ah, DEFAULT_LIFETIME is 1000 days! could you change that to 100 days please? 
That should still be plenty, but (hopefully) will result in slightly less 
"orphaned checkpoints".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Oak-10874 | Add jmx function for bringing forward a delayed async lane to a latest checkpoint. [jackrabbit-oak]

2024-06-14 Thread via GitHub


nit0906 commented on code in PR #1522:
URL: https://github.com/apache/jackrabbit-oak/pull/1522#discussion_r1639441493


##
oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java:
##
@@ -1242,6 +1242,61 @@ public String getReferenceCheckpoint() {
 return referenceCp;
 }
 
+@Override
+public String forceIndexLaneCatchup(String confirmMessage) throws 
CommitFailedException {
+
+if (!"CONFIRM".equals(confirmMessage)) {
+String msg = "Please confirm that you want to force the lane 
catch-up by passing 'CONFIRM' as argument";
+log.warn(msg);
+return msg;
+}
+
+if (!this.isFailing()) {
+String msg = "The lane is not failing. This operation should 
only be performed if the lane is failing, it should first be allowed to catch 
up on its own.";
+log.warn(msg);
+return msg;
+}
+
+try {
+log.info("Running a forced catch-up for indexing lane [{}]. ", 
name);
+// First we need to abort and pause the running indexing task
+this.abortAndPause();
+log.info("Aborted and paused async indexing for lane [{}]", 
name);
+// Release lease for the paused lane
+this.releaseLeaseForPausedLane();
+log.info("Released lease for paused lane [{}]", name);
+String newReferenceCheckpoint = store.checkpoint(lifetime, 
Map.of(
+"creator", AsyncIndexUpdate.class.getSimpleName(),
+"created", now(),
+"thread", Thread.currentThread().getName(),
+"name", name + "-forceModified"));
+String existingReferenceCheckpoint = this.referenceCp;
+log.info("Modifying the referred checkpoint for lane [{}] from 
{} to {}." +
+" This means that any content modifications between 
these checkpoints will not reflect in the indexes on this lane." +
+" Reindexing is needed to get this content indexed.", 
name, existingReferenceCheckpoint, newReferenceCheckpoint);
+NodeBuilder builder = store.getRoot().builder();
+builder.child(ASYNC).setProperty(name, newReferenceCheckpoint);
+this.referenceCp = newReferenceCheckpoint;
+mergeWithConcurrencyCheck(store, validatorProviders, builder, 
existingReferenceCheckpoint, null, name);
+// Remove the existing reference checkpoint
+if (store.release(existingReferenceCheckpoint)) {
+log.info("Old reference checkpoint {} removed or didn't 
exist", existingReferenceCheckpoint);
+} else {
+log.warn("Unable to remove old reference checkpoint {}. 
This can result in orphaned checkpoints and would need to be removed 
manually.", existingReferenceCheckpoint);
+}
+// Resume the paused lane;
+this.resume();
+log.info("Resumed async indexing for lane [{}]", name);
+return "Lane successfully forced to catch-up. New reference 
checkpoint is " + newReferenceCheckpoint + " . Please make sure to perform 
reindexing to get the diff content indexed.";
+} catch (Exception e) {
+log.error("Exception while trying to force update the indexing 
lane [{}]", name, e);
+if (this.isPaused()) {
+log.info("Resuming the lane [{}] as it was paused during 
the operation", name);

Review Comment:
   good catch. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Oak-10874 | Add jmx function for bringing forward a delayed async lane to a latest checkpoint. [jackrabbit-oak]

2024-06-14 Thread via GitHub


nfsantos commented on code in PR #1522:
URL: https://github.com/apache/jackrabbit-oak/pull/1522#discussion_r1639428734


##
oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java:
##
@@ -1242,6 +1242,61 @@ public String getReferenceCheckpoint() {
 return referenceCp;
 }
 
+@Override
+public String forceIndexLaneCatchup(String confirmMessage) throws 
CommitFailedException {
+
+if (!"CONFIRM".equals(confirmMessage)) {
+String msg = "Please confirm that you want to force the lane 
catch-up by passing 'CONFIRM' as argument";
+log.warn(msg);
+return msg;
+}
+
+if (!this.isFailing()) {
+String msg = "The lane is not failing. This operation should 
only be performed if the lane is failing, it should first be allowed to catch 
up on its own.";
+log.warn(msg);
+return msg;
+}
+
+try {
+log.info("Running a forced catch-up for indexing lane [{}]. ", 
name);
+// First we need to abort and pause the running indexing task
+this.abortAndPause();
+log.info("Aborted and paused async indexing for lane [{}]", 
name);
+// Release lease for the paused lane
+this.releaseLeaseForPausedLane();
+log.info("Released lease for paused lane [{}]", name);
+String newReferenceCheckpoint = store.checkpoint(lifetime, 
Map.of(
+"creator", AsyncIndexUpdate.class.getSimpleName(),
+"created", now(),
+"thread", Thread.currentThread().getName(),
+"name", name + "-forceModified"));
+String existingReferenceCheckpoint = this.referenceCp;
+log.info("Modifying the referred checkpoint for lane [{}] from 
{} to {}." +
+" This means that any content modifications between 
these checkpoints will not reflect in the indexes on this lane." +
+" Reindexing is needed to get this content indexed.", 
name, existingReferenceCheckpoint, newReferenceCheckpoint);
+NodeBuilder builder = store.getRoot().builder();
+builder.child(ASYNC).setProperty(name, newReferenceCheckpoint);
+this.referenceCp = newReferenceCheckpoint;
+mergeWithConcurrencyCheck(store, validatorProviders, builder, 
existingReferenceCheckpoint, null, name);
+// Remove the existing reference checkpoint
+if (store.release(existingReferenceCheckpoint)) {
+log.info("Old reference checkpoint {} removed or didn't 
exist", existingReferenceCheckpoint);
+} else {
+log.warn("Unable to remove old reference checkpoint {}. 
This can result in orphaned checkpoints and would need to be removed 
manually.", existingReferenceCheckpoint);
+}
+// Resume the paused lane;
+this.resume();
+log.info("Resumed async indexing for lane [{}]", name);
+return "Lane successfully forced to catch-up. New reference 
checkpoint is " + newReferenceCheckpoint + " . Please make sure to perform 
reindexing to get the diff content indexed.";
+} catch (Exception e) {
+log.error("Exception while trying to force update the indexing 
lane [{}]", name, e);
+if (this.isPaused()) {
+log.info("Resuming the lane [{}] as it was paused during 
the operation", name);

Review Comment:
   Should there be a call to resume here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Oak-10874 | Add jmx function for bringing forward a delayed async lane to a latest checkpoint. [jackrabbit-oak]

2024-06-14 Thread via GitHub


nfsantos commented on code in PR #1522:
URL: https://github.com/apache/jackrabbit-oak/pull/1522#discussion_r1639426483


##
oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java:
##
@@ -1242,6 +1242,61 @@ public String getReferenceCheckpoint() {
 return referenceCp;
 }
 
+@Override
+public String forceIndexLaneCatchup(String confirmMessage) throws 
CommitFailedException {
+
+if (!"CONFIRM".equals(confirmMessage)) {
+String msg = "Please confirm that you want to force the lane 
catch-up by passing 'CONFIRM' as argument";
+log.warn(msg);
+return msg;
+}
+
+if (!this.isFailing()) {
+String msg = "The lane is not failing. This operation should 
only be performed if the lane is failing, it should first be allowed to catch 
up on its own.";
+log.warn(msg);
+return msg;
+}
+
+try {
+log.info("Running a forced catch-up for indexing lane [{}]. ", 
name);
+// First we need to abort and pause the running indexing task
+this.abortAndPause();
+log.info("Aborted and paused async indexing for lane [{}]", 
name);
+// Release lease for the paused lane
+this.releaseLeaseForPausedLane();
+log.info("Released lease for paused lane [{}]", name);
+String newReferenceCheckpoint = store.checkpoint(lifetime, 
Map.of(
+"creator", AsyncIndexUpdate.class.getSimpleName(),
+"created", now(),
+"thread", Thread.currentThread().getName(),
+"name", name + "-forceModified"));
+String existingReferenceCheckpoint = this.referenceCp;
+log.info("Modifying the referred checkpoint for lane [{}] from 
{} to {}." +
+" This means that any content modifications between 
these checkpoints will not reflect in the indexes on this lane." +
+" Reindexing is needed to get this content indexed.", 
name, existingReferenceCheckpoint, newReferenceCheckpoint);
+NodeBuilder builder = store.getRoot().builder();
+builder.child(ASYNC).setProperty(name, newReferenceCheckpoint);
+this.referenceCp = newReferenceCheckpoint;
+mergeWithConcurrencyCheck(store, validatorProviders, builder, 
existingReferenceCheckpoint, null, name);
+// Remove the existing reference checkpoint
+if (store.release(existingReferenceCheckpoint)) {
+log.info("Old reference checkpoint {} removed or didn't 
exist", existingReferenceCheckpoint);
+} else {
+log.warn("Unable to remove old reference checkpoint {}. 
This can result in orphaned checkpoints and would need to be removed 
manually.", existingReferenceCheckpoint);
+}
+// Resume the paused lane;
+this.resume();
+log.info("Resumed async indexing for lane [{}]", name);
+return "Lane successfully forced to catch-up. New reference 
checkpoint is " + newReferenceCheckpoint + " . Please make sure to perform 
reindexing to get the diff content indexed.";
+} catch (Exception e) {
+log.error("Exception while trying to force update the indexing 
lane [{}]", name, e);
+if (this.isPaused()) {
+log.info("Resuming the lane [{}] as it was paused during 
the operation", name);
+}
+return "Unable to complete the force update due to" + 
e.getMessage() + "Please check logs for more details";

Review Comment:
   ```suggestion
   return "Unable to complete the force update due to " + 
e.getMessage() + ". Please check logs for more details";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Oak-10874 | Add jmx function for bringing forward a delayed async lane to a latest checkpoint. [jackrabbit-oak]

2024-06-14 Thread via GitHub


nfsantos commented on code in PR #1522:
URL: https://github.com/apache/jackrabbit-oak/pull/1522#discussion_r1639426483


##
oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/AsyncIndexUpdate.java:
##
@@ -1242,6 +1242,61 @@ public String getReferenceCheckpoint() {
 return referenceCp;
 }
 
+@Override
+public String forceIndexLaneCatchup(String confirmMessage) throws 
CommitFailedException {
+
+if (!"CONFIRM".equals(confirmMessage)) {
+String msg = "Please confirm that you want to force the lane 
catch-up by passing 'CONFIRM' as argument";
+log.warn(msg);
+return msg;
+}
+
+if (!this.isFailing()) {
+String msg = "The lane is not failing. This operation should 
only be performed if the lane is failing, it should first be allowed to catch 
up on its own.";
+log.warn(msg);
+return msg;
+}
+
+try {
+log.info("Running a forced catch-up for indexing lane [{}]. ", 
name);
+// First we need to abort and pause the running indexing task
+this.abortAndPause();
+log.info("Aborted and paused async indexing for lane [{}]", 
name);
+// Release lease for the paused lane
+this.releaseLeaseForPausedLane();
+log.info("Released lease for paused lane [{}]", name);
+String newReferenceCheckpoint = store.checkpoint(lifetime, 
Map.of(
+"creator", AsyncIndexUpdate.class.getSimpleName(),
+"created", now(),
+"thread", Thread.currentThread().getName(),
+"name", name + "-forceModified"));
+String existingReferenceCheckpoint = this.referenceCp;
+log.info("Modifying the referred checkpoint for lane [{}] from 
{} to {}." +
+" This means that any content modifications between 
these checkpoints will not reflect in the indexes on this lane." +
+" Reindexing is needed to get this content indexed.", 
name, existingReferenceCheckpoint, newReferenceCheckpoint);
+NodeBuilder builder = store.getRoot().builder();
+builder.child(ASYNC).setProperty(name, newReferenceCheckpoint);
+this.referenceCp = newReferenceCheckpoint;
+mergeWithConcurrencyCheck(store, validatorProviders, builder, 
existingReferenceCheckpoint, null, name);
+// Remove the existing reference checkpoint
+if (store.release(existingReferenceCheckpoint)) {
+log.info("Old reference checkpoint {} removed or didn't 
exist", existingReferenceCheckpoint);
+} else {
+log.warn("Unable to remove old reference checkpoint {}. 
This can result in orphaned checkpoints and would need to be removed 
manually.", existingReferenceCheckpoint);
+}
+// Resume the paused lane;
+this.resume();
+log.info("Resumed async indexing for lane [{}]", name);
+return "Lane successfully forced to catch-up. New reference 
checkpoint is " + newReferenceCheckpoint + " . Please make sure to perform 
reindexing to get the diff content indexed.";
+} catch (Exception e) {
+log.error("Exception while trying to force update the indexing 
lane [{}]", name, e);
+if (this.isPaused()) {
+log.info("Resuming the lane [{}] as it was paused during 
the operation", name);
+}
+return "Unable to complete the force update due to" + 
e.getMessage() + "Please check logs for more details";

Review Comment:
   ```suggestion
   return "Unable to complete the force update due to" + 
e.getMessage() + ". Please check logs for more details";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Oak-10874 | Add jmx function for bringing forward a delayed async lane to a latest checkpoint. [jackrabbit-oak]

2024-06-13 Thread via GitHub


nit0906 commented on PR #1522:
URL: https://github.com/apache/jackrabbit-oak/pull/1522#issuecomment-2166431495

   @nfsantos  - I took care of the grammar related issues in the logs.
   
   I will try and answer the other questions here - 
   
   > the forceIndexLaneCatchup is not thread safe. What would happen if it is 
called concurrently? Or if while it is being called, there are other calls 
being performed by JMX to this object? I know this is unlikely to happen, but 
maybe it's worth to make calls into this bean thread safe.
   
   The beans object is created while initialising the AsyncIndexUpdate object 
(which is a scheduled thread). It's taken care that the AsyncIndexUpdate will 
only run on author leader and cannot run concurrently (that is basically using 
the lease check mechanism - but we do clear the lease as part of this). Anyhow, 
the only actual resource that this bean modifies is the :async node's 
checkpoint detail which is committed using a mergeConcurrencyCheck (it 
basically will check before committing that the checkpoint for the lane is the 
same as the old referred checkpoint in this method - otherwise will throw a 
concurrent update exception and this will fail) - I suppose that should be 
enough for the use case ? 
   
   I will check if the clear lease is really needed in this operation or not - 
if not I will remove that too.
   
   
   
   > There is no good error handling in case some of the intermediate 
operations fail.
   
   The main point of failure can be during the commit merge - if it fails there 
- the lane is left in a paused state with the old reference checkpoint. Given 
that the lane was already failing - shouldn't be much of a problem.
   
   If there's a failure post the commit merge - it can be while deleting the 
old checkpoint (there's a log added for that failure). In this case we are left 
with an orphaned checkpoint (However this should not lead to an exception - so 
the flow will continue).
   
   I have still added a basic try catch, and resumed the lane in the catch (if 
it was left in a paused state). I think this should be sufficient - wdyt ? 
   
   
   > Just a few seconds, but if that is the case, then it could be a problem 
when doing all the steps in succession without waiting between them.
   
   That's a good point, the operations don't take immediate effect - since 
these setup the flags (paused, releaseLease etc), These are read during index 
update callback - so basically if I pause or abort a lane - it will take either 
reflect on the next async indexing run, or on an index update processed during 
the running indexing cycle - which would eventually throw an interrupt and 
abort the lane.
   
   Either way, what we need to make sure is that the new run that starts post 
this abort - starts with the new referred checkpoint, which would be the case 
here irrespective of the timing of these events - for instance, 
   
   Let's say we placed a request for pause and abort and then cleared the lease 
 - assuming that this actually didn't happen till now and we went ahead and 
changed the reference checkpoint on the :async node - If an async run is 
already in a place and has already picked up the older checkpoint - it will run 
till it gets interrupted , in which case when the lane will resume, it will 
pick up the new checkpoint (as desired) and if the async run has yet to pick up 
a checkpoint - it will pick up the new one - which is ok (even though this 
wouldn't be the case, since we are only running this on failing lanes under the 
assumption that they have been delayed for quite a while).
   
   Let me know if you think any of the above doesn't make sense. 
   
   I would also request @thomasmueller  to review the PR and my responses above 
once .
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: oak-dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org