abdullah alamoudi has posted comments on this change.

Change subject: [ASTERIXDB-1943][API][STO] Make rebalance idempotent.
......................................................................


Patch Set 12:

(15 comments)

https://asterix-gerrit.ics.uci.edu/#/c/1821/12/asterixdb/asterix-app/src/main/java/org/apache/asterix/api/http/server/RebalanceApiServlet.java
File 
asterixdb/asterix-app/src/main/java/org/apache/asterix/api/http/server/RebalanceApiServlet.java:

PS12, Line 74: Future
parametrize Future throughout this class?


PS12, Line 77: rebalanceFutureTerminated
I don't understand the need for this queue and the use of the countdownlatch?

Why not simply use the future to wait for completion?


PS12, Line 174:  sendResponse(out, jsonResponse, response, 
HttpResponseStatus.BAD_REQUEST,
              :                         "to rebalance a particular dataset, the 
parameter dataverseName must be given");
              :                 return;
Log bad requests?


PS12, Line 159: // Parses and check target nodes.
              :             if (nodes == null) {
              :                 sendResponse(out, jsonResponse, response, 
HttpResponseStatus.BAD_REQUEST, "nodes are not given");
              :                 return;
              :             }
              :             String nodesString = StringUtils.strip(nodes, 
"\"'").trim();
              :             String[] targetNodes = nodesString.split(",");
              :             if ("".equals(nodesString)) {
              :                 sendResponse(out, jsonResponse, response, 
HttpResponseStatus.BAD_REQUEST,
              :                         "target nodes should not be empty");
              :                 return;
              :             }
              : 
              :             // If a user gives parameter datasetName, she 
should give dataverseName as well.
              :             if (dataverseName == null && datasetName != null) {
              :                 sendResponse(out, jsonResponse, response, 
HttpResponseStatus.BAD_REQUEST,
              :                         "to rebalance a particular dataset, the 
parameter dataverseName must be given");
              :                 return;
              :             }
              : 
              :             // Does not allow rebalancing a metadata dataset.
              :             if (METADATA.equals(dataverseName)) {
              :                 sendResponse(out, jsonResponse, response, 
HttpResponseStatus.BAD_REQUEST,
              :                         "cannot rebalance a metadata dataset");
              :                 return;
              :             }
why not validate before submitting the task?


https://asterix-gerrit.ics.uci.edu/#/c/1821/12/asterixdb/asterix-app/src/main/java/org/apache/asterix/utils/RebalanceUtil.java
File 
asterixdb/asterix-app/src/main/java/org/apache/asterix/utils/RebalanceUtil.java:

PS12, Line 132: / TODO(yingyi): in case a crash happens, currently the system 
will either:
              :         // 1. (crash before metadata switch) think the 
rebalance is not done, and the target data files are leaked until
              :         // the next rebalance request.
              :         // 2. (crash after metadata switch) think the rebalance 
is done, and the source data files are leaked;
Create an issue?


PS12, Line 148: private interface Work 
why introduce a new interface?
why not simply use Callable<Void> ?


PS12, Line 167: throw interruptedException;
Log too when that happens. maybe along with the number of interrupted attempts?
maybe even with suppressing every subsequent interruption?

We need those to give us clues when things go bad...

Also add tests for this?


https://asterix-gerrit.ics.uci.edu/#/c/1821/12/asterixdb/asterix-app/src/test/java/org/apache/asterix/test/common/RebalanceCancellationTestExecutor.java
File 
asterixdb/asterix-app/src/test/java/org/apache/asterix/test/common/RebalanceCancellationTestExecutor.java:

PS12, Line 96: rc == 200 || rc == 404
the coverage of this test is weak and could miss regressions. Create tests that 
ensure both cases are covered.

Also cover the test case for the cancellation during the dataset switch!


https://asterix-gerrit.ics.uci.edu/#/c/1821/12/asterixdb/asterix-common/src/main/java/org/apache/asterix/common/context/DatasetLifecycleManager.java
File 
asterixdb/asterix-common/src/main/java/org/apache/asterix/common/context/DatasetLifecycleManager.java:

PS12, Line 274: No index found with resourceID " + resourceID
this error message should be fixed. preferably with an error code!


PS12, Line 280: finally {
              :             if (iInfo != null) {
              :                 iInfo.untouch();
              :             }
              :             if (dsr != null) {
              :                 dsr.untouch();
              :             }
Can we add a comment explaining in what scenario can each of these two cases 
happen?


https://asterix-gerrit.ics.uci.edu/#/c/1821/12/asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/PersistentLocalResourceRepository.java
File 
asterixdb/asterix-transactions/src/main/java/org/apache/asterix/transaction/management/resource/PersistentLocalResourceRepository.java:

PS12, Line 219:    // In case resourceFile.delete() gets interrupted -- the 
file may or may not
              :                 // exists, we should still invalidate.
              :                 // Since it's just a cache invalidation, it 
should not affect correctness.
              :                 resourceCache.invalidate(relativePath);
I think this whole method should be atomic. an easy way to do this is to make 
it un-interruptible. something like what you did in the rebalance servlet?

note that as per the comment, this could lead to a resource being deleted 
locally but not on replica since the code below will not be executed!


https://asterix-gerrit.ics.uci.edu/#/c/1821/12/hyracks-fullstack/hyracks/hyracks-storage-am-common/src/main/java/org/apache/hyracks/storage/am/common/api/IIndexDataflowHelper.java
File 
hyracks-fullstack/hyracks/hyracks-storage-am-common/src/main/java/org/apache/hyracks/storage/am/common/api/IIndexDataflowHelper.java:

PS12, Line 49: boolean failSilently
I think this interface shouldn't change and we can keep the failSilently in the 
drop index operator node pushable where we swallow the exception.


https://asterix-gerrit.ics.uci.edu/#/c/1821/12/hyracks-fullstack/hyracks/hyracks-storage-am-common/src/main/java/org/apache/hyracks/storage/am/common/dataflow/IndexDataflowHelper.java
File 
hyracks-fullstack/hyracks/hyracks-storage-am-common/src/main/java/org/apache/hyracks/storage/am/common/dataflow/IndexDataflowHelper.java:

This file will not need to change if we keep the failSilently flag in the index 
drop operator node pushable.


PS12, Line 61: (lr.getPath(
keeping the flag outside this, we can easily avoid this NPE too.


https://asterix-gerrit.ics.uci.edu/#/c/1821/12/hyracks-fullstack/hyracks/hyracks-storage-am-common/src/main/java/org/apache/hyracks/storage/am/common/dataflow/IndexDropOperatorNodePushable.java
File 
hyracks-fullstack/hyracks/hyracks-storage-am-common/src/main/java/org/apache/hyracks/storage/am/common/dataflow/IndexDropOperatorNodePushable.java:

PS12, Line 55: failSliently
instead of passing this in. let this method swallow the exception?


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/1821
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0d14a07978e106cd497cc35538fafef318b2fcf7
Gerrit-PatchSet: 12
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Yingyi Bu <buyin...@gmail.com>
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Michael Blow <mb...@apache.org>
Gerrit-Reviewer: Till Westmann <ti...@apache.org>
Gerrit-Reviewer: Yingyi Bu <buyin...@gmail.com>
Gerrit-Reviewer: abdullah alamoudi <bamou...@gmail.com>
Gerrit-HasComments: Yes

Reply via email to