keith-turner commented on code in PR #5170:
URL: https://github.com/apache/accumulo/pull/5170#discussion_r1885850738
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/log/TabletServerLogger.java:
##########
@@ -458,6 +458,26 @@ private void write(final Collection<CommitSession>
sessions, boolean mincFinish,
// the logs haven't changed.
final int finalCurrent = currentLogId;
if (!success) {
Review Comment:
There seem to be two cases were success is false. One is when exceptions
happen and the other is a race condition where the [logs change during a
write](https://github.com/apache/accumulo/blob/78cbdbe5560272028b93401b9f2f4f792c625798/server/tserver/src/main/java/org/apache/accumulo/tserver/log/TabletServerLogger.java#L432-L440).
For the case where logs change that could be somewhat normal and may create a
lot of zookeeper load. Like if there were 100 threads writing mutations and
the logs changes while they are doing that then all of those threads would go
do these zookeeper reads.
Could avoid checking on this race condition w/ another boolean that is only
set when an exception happens.
```java
if(!success){
if(sawException && (tserver.getLock() == null ||
!tserver.getLock().verifyLockAtSource())) {
....
}
}
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/log/TabletServerLogger.java:
##########
@@ -458,6 +458,26 @@ private void write(final Collection<CommitSession>
sessions, boolean mincFinish,
// the logs haven't changed.
final int finalCurrent = currentLogId;
if (!success) {
+
+ if (tserver.getLock() == null ||
!tserver.getLock().verifyLockAtSource()) {
Review Comment:
There could be a lot of threads running this code for writes in lots of
tserers. If there is temporary HDFS across the closter hiccup it could cause
all of those threads to jump on zookeeper. That is probably ok, so not sure we
should try to do anything about that. Might be worthwhile adding some debug or
greater logging when this check is run so that its known that it is happening.
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/log/TabletServerLogger.java:
##########
@@ -458,6 +458,26 @@ private void write(final Collection<CommitSession>
sessions, boolean mincFinish,
// the logs haven't changed.
final int finalCurrent = currentLogId;
if (!success) {
+
+ if (tserver.getLock() == null ||
!tserver.getLock().verifyLockAtSource()) {
+ // try to close the log, then Halt the VM
+ testLockAndRun(logIdLock, new TestCallWithWriteLock() {
+
+ @Override
+ boolean test() {
+ return true;
+ }
+
+ @Override
+ void withWriteLock() throws IOException {
+ close();
Review Comment:
This has the potential to block and prevent the thread from calling Halt.
Wondering if this should be done, maybe just skip it and halt. After we know
the lock is not held the sooner we lock the better.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]