[jira] [Updated] (KUDU-3082) tablets in "CONSENSUS_MISMATCH" state for a long time
[ https://issues.apache.org/jira/browse/KUDU-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang updated KUDU-3082: - Component/s: consensus

> tablets in "CONSENSUS_MISMATCH" state for a long time
> -----------------------------------------------------
>
>                 Key: KUDU-3082
>                 URL: https://issues.apache.org/jira/browse/KUDU-3082
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.10.1
>            Reporter: YifanZhang
>            Priority: Major
>
> Lately we found that a few tablets in one of our clusters are unhealthy; the ksck output is like:
>
> {code:java}
> Tablet Summary
> Tablet 7404240f458f462d92b6588d07583a52 of table '' is conflicted: 3 replicas' active configs disagree with the leader master's
>   7380d797d2ea49e88d71091802fb1c81 (kudu-ts26): RUNNING
>   d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> All reported replicas are:
>   A = 7380d797d2ea49e88d71091802fb1c81
>   B = d1952499f94a4e6087bee28466fcb09f
>   C = 47af52df1adc47e1903eb097e9c88f2e
>   D = 08beca5ed4d04003b6979bf8bac378d2
> The consensus matrix is:
>  Config source | Replicas    | Current term | Config index | Committed?
> ---------------+-------------+--------------+--------------+------------
>  master        | A  B  C*    |              |              | Yes
>  A             | A  B  C*    | 5            | -1           | Yes
>  B             | A  B  C     | 5            | -1           | Yes
>  C             | A  B  C* D~ | 5            | 54649        | No
> Tablet 6d9d3fb034314fa7bee9cfbf602bcdc8 of table '' is conflicted: 2 replicas' active configs disagree with the leader master's
>   d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
>   5a8aeadabdd140c29a09dabcae919b31 (kudu-ts21): RUNNING
> All reported replicas are:
>   A = d1952499f94a4e6087bee28466fcb09f
>   B = 47af52df1adc47e1903eb097e9c88f2e
>   C = 5a8aeadabdd140c29a09dabcae919b31
>   D = 14632cdbb0d04279bc772f64e06389f9
> The consensus matrix is:
>  Config source | Replicas    | Current term | Config index | Committed?
> ---------------+-------------+--------------+--------------+------------
>  master        | A  B*  C    |              |              | Yes
>  A             | A  B*  C    | 5            | 5            | Yes
>  B             | A  B*  C D~ | 5            | 96176        | No
>  C             | A  B*  C    | 5            | 5            | Yes
> Tablet bf1ec7d693b94632b099dc0928e76363 of table '' is conflicted: 1 replicas' active configs disagree with the leader master's
>   a9eaff3cf1ed483aae84954d649a (kudu-ts23): RUNNING
>   f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
> All reported replicas are:
>   A = a9eaff3cf1ed483aae84954d649a
>   B = f75df4a6b5ce404884313af5f906b392
>   C = 47af52df1adc47e1903eb097e9c88f2e
>   D = d1952499f94a4e6087bee28466fcb09f
> The consensus matrix is:
>  Config source | Replicas    | Current term | Config index | Committed?
> ---------------+-------------+--------------+--------------+------------
>  master        | A  B  C*    |              |              | Yes
>  A             | A  B  C*    | 1            | -1           | Yes
>  B             | A  B  C*    | 1            | -1           | Yes
>  C             | A  B  C* D~ | 1            | 2            | No
> Tablet 3190a310857e4c64997adb477131488a of table '' is conflicted: 3 replicas' active configs disagree with the leader master's
>   47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
>   f0f7b2f4b9d344e6929105f48365f38e (kudu-ts24): RUNNING
>   f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
> All reported replicas are:
>   A = 47af52df1adc47e1903eb097e9c88f2e
>   B = f0f7b2f4b9d344e6929105f48365f38e
>   C = f75df4a6b5ce404884313af5f906b392
>   D = d1952499f94a4e6087bee28466fcb09f
> The consensus matrix is:
>  Config source | Replicas    | Current term | Config index | Committed?
> ---------------+-------------+--------------+--------------+------------
>  master        | A*  B  C    |              |              | Yes
>  A             | A*  B  C D~ | 1            | 1991         | No
>  B             | A*  B  C    | 1            | 4            | Yes
>  C             | A*  B  C    | 1            | 4            | Yes{code}
> These tablets couldn't recover for a couple of days until we restarted kudu-ts27.
> I found many duplicated log messages on kudu-ts27 like:
> {code:java}
> I0314 04:38:41.511279 65731 raft_consensus.cc:937] T 7404240f458f462d92b6588d07583a52 P 47af52df1adc47e1903eb097e9c88f2e [term 3 LEADER]: attempt to promote peer 08beca5ed4d04003b69
[jira] [Created] (KUDU-3084) Multiple time sources with fallback behavior between them
Alexey Serbin created KUDU-3084: --- Summary: Multiple time sources with fallback behavior between them Key: KUDU-3084 URL: https://issues.apache.org/jira/browse/KUDU-3084 Project: Kudu Issue Type: Improvement Components: master, tserver Reporter: Alexey Serbin

[~tlipcon] suggested an alternative approach to configuring and selecting HybridClock's time source. Kudu servers could maintain multiple time sources and switch between them with fallback behavior. The default or preferred time source might be any of the existing ones (e.g., the built-in client), but when it's not available, another available time source is selected (e.g., {{system}} -- the NTP-synchronized local clock). Switching between time sources can be done:
* only upon startup/initialization
* upon startup/initialization and later during normal run time

The advantages are:
* easier deployment and configuration of Kudu clusters
* a simplified upgrade path from older releases using the {{system}} time source to newer releases using the {{builtin}} time source by default

There are downsides, though. Since the new way of maintaining time sources is more dynamic, it can:
* mask various configuration or network issues
* result in different time sources being used within the same Kudu cluster due to transient issues
* introduce extra startup delay

-- This message was sent by Atlassian Jira (v8.3.4#803005)
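The startup-time fallback described above can be sketched as an ordered probe over candidate sources. The source names ({{builtin}}, {{system}}) follow the ticket, but the availability probes here are hypothetical stand-ins, not Kudu's actual clock code:

```python
# Hedged sketch of fallback between time sources at startup: try each
# candidate in preference order and settle on the first one that responds.
from typing import Callable, Dict, List, Optional

def select_time_source(preference: List[str],
                       probes: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the first time source in `preference` whose availability probe
    succeeds, or None if none is usable (the server should then fail fast)."""
    for name in preference:
        probe = probes.get(name)
        if probe is not None and probe():
            return name
    return None

# Example: the preferred 'builtin' NTP client has no reachable servers,
# so selection falls back to the NTP-synchronized local clock ('system').
probes = {"builtin": lambda: False, "system": lambda: True}
print(select_time_source(["builtin", "system"], probes))  # system
```

Note this illustrates only the startup-time variant; re-selecting during normal runtime (the second option above) is what risks different servers in one cluster ending up on different sources.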
[jira] [Commented] (KUDU-2432) isolate race creating directory via dist_test.py
[ https://issues.apache.org/jira/browse/KUDU-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063083#comment-17063083 ] Todd Lipcon commented on KUDU-2432: --- I pushed a fix for this: https://github.com/cloudera/dist_test/pull/new/kudu-2432 Testing in prod :) > isolate race creating directory via dist_test.py > > > Key: KUDU-2432 > URL: https://issues.apache.org/jira/browse/KUDU-2432 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: Mike Percy >Priority: Major > Attachments: logs.txt > > > When running dist_test.py I have been getting a 1% failure rate due to the > following errors. > I am not sure if this is new or related to a single bad machine. > {code:java} > failed to download task files: WARNING 123 isolateserver(1484): Adding > unknown file 7cf0792d18a9dbef867c9bce0c681b3def0510b6 to cache > WARNING 126 isolateserver(1490): Added back 1 unknown files > INFO 135 tools(106): Profiling: Section Setup took 0.045 seconds > INFO 164 tools(106): Profiling: Section GetIsolateds took 0.029 seconds > INFO 167 tools(106): Profiling: Section GetRest took 0.003 seconds > INFO 175 isolateserver(1365): 1 ( 227022kb) added > INFO 176 isolateserver(1369): 1642 ( 3864634kb) current > INFO 176 isolateserver(1372): 0 ( 0kb) removed > INFO 176 isolateserver(1375): 45627408kb free > INFO 176 tools(106): Profiling: Section CleanupTrimming took 0.009 seconds > INFO 177 isolateserver(1365): 1 ( 227022kb) added > INFO 177 isolateserver(1369): 1642 ( 3864634kb) current > INFO 177 isolateserver(1372): 0 ( 0kb) removed > INFO 177 isolateserver(1375): 45627408kb free > INFO 178 tools(106): Profiling: Section CleanupTrimming took 0.001 seconds > INFO 178 isolateserver(381): Waiting for all threads to die... > INFO 178 isolateserver(390): Done. 
> Traceback (most recent call last):
>   File "/swarming.client/isolateserver.py", line 2211, in <module>
>     sys.exit(main(sys.argv[1:]))
>   File "/swarming.client/isolateserver.py", line 2204, in main
>     return dispatcher.execute(OptionParserIsolateServer(), args)
>   File "/swarming.client/third_party/depot_tools/subcommand.py", line 242, in execute
>     return command(parser, args[1:])
>   File "/swarming.client/isolateserver.py", line 2064, in CMDdownload
>     require_command=False)
>   File "/swarming.client/isolateserver.py", line 1827, in fetch_isolated
>     create_directories(outdir, bundle.files)
>   File "/swarming.client/isolateserver.py", line 212, in create_directories
>     os.mkdir(os.path.join(base_directory, d))
> OSError: [Errno 17] File exists: '/tmp/dist-test-task_gm4pM/build'
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-2432) isolate race creating directory via dist_test.py
[ https://issues.apache.org/jira/browse/KUDU-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063080#comment-17063080 ] Todd Lipcon commented on KUDU-2432: --- I looked into this a bit tonight since it's happening a lot lately. I sshed into one of the slaves that had had a failure and ran 'docker logs' on the dist-test slave container to get the full logs, and then grabbed the portion corresponding to a failed job. It looks like the issue is that a first attempt to download the files for the task failed with a "connection reset by peer" error. The retries seem to fail because the directory already exists from the first attempt. In other words, it's not a race, just broken retry logic. Will look at the code next.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
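Todd's diagnosis above is the classic non-idempotent-retry problem: the retry replays `os.mkdir()` on a directory the failed first attempt already created, raising EEXIST. The usual fix is to make directory creation tolerant of an already-existing directory; a hedged sketch of that pattern (illustrative of the general fix, not the actual dist_test/swarming patch):

```python
import errno
import os

def create_directory_idempotent(path):
    """Like os.mkdir(), but safe to replay on retry: an already-existing
    directory is not an error (any other OSError still propagates)."""
    try:
        os.mkdir(path)
    except OSError as e:
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise

# Calling it twice mimics a retried download task: no EEXIST on the replay.
import tempfile
base = tempfile.mkdtemp()           # hypothetical task work dir
target = os.path.join(base, "build")
create_directory_idempotent(target)
create_directory_idempotent(target)  # retry succeeds instead of crashing
print(os.path.isdir(target))  # True
```

In Python 3 the same effect is available directly via `os.makedirs(path, exist_ok=True)`.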
[jira] [Updated] (KUDU-2432) isolate race creating directory via dist_test.py
[ https://issues.apache.org/jira/browse/KUDU-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated KUDU-2432: -- Attachment: logs.txt

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3082) tablets in "CONSENSUS_MISMATCH" state for a long time
[ https://issues.apache.org/jira/browse/KUDU-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063045#comment-17063045 ] YifanZhang commented on KUDU-3082: -- Sorry I forgot to explain, the cluster version is 1.10.1.
[jira] [Updated] (KUDU-3082) tablets in "CONSENSUS_MISMATCH" state for a long time
[ https://issues.apache.org/jira/browse/KUDU-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang updated KUDU-3082: - Affects Version/s: 1.10.1
[jira] [Resolved] (KUDU-2928) built-in NTP client: tests to evaluate the behavior of the client
[ https://issues.apache.org/jira/browse/KUDU-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin resolved KUDU-2928. - Fix Version/s: 1.12.0 Resolution: Fixed Implemented with {{4aa0c7c0bc7d91af8be9a837b64f2a53fe31dd44}} > built-in NTP client: tests to evaluate the behavior of the client > - > > Key: KUDU-2928 > URL: https://issues.apache.org/jira/browse/KUDU-2928 > Project: Kudu > Issue Type: Sub-task > Components: clock, test >Affects Versions: 1.11.0 >Reporter: Alexey Serbin >Assignee: Alexey Serbin >Priority: Major > Labels: clock > Fix For: 1.12.0 > > > It's necessary to implement tests covering the behavior of the built-in NTP > client in various corner cases: > * A set of NTP servers which doesn't agree on time > * non-synchronized NTP server > * NTP server that loses track of its reference and becomes a false ticker > * etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3083) Kudu installation fails on ubuntu 19.10 & gcc
[ https://issues.apache.org/jira/browse/KUDU-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Syed Taha Nemat updated KUDU-3083: -- Description: Followed the steps mentioned in [https://kudu.apache.org/docs/installation.html#ubuntu_from_source]. The build-if-necessary.sh script fails on Ubuntu 19.10 with the gcc compiler. Possible workarounds: * have users install clang in the environment and run the cmake commands using clang * make all C code compliant with gcc. was: Followed the steps mentioned in [https://kudu.apache.org/docs/installation.html#ubuntu_from_source]. The build-if-necessary.sh script fails on Ubuntu 19.10 with the gcc compiler. Possible workarounds: * have users install clang in the environment * make all C code compliant with gcc.

> Kudu installation fails on ubuntu 19.10 & gcc
> ---------------------------------------------
>
>                 Key: KUDU-3083
>                 URL: https://issues.apache.org/jira/browse/KUDU-3083
>             Project: Kudu
>          Issue Type: Bug
>          Components: build
>         Environment: ubuntu 19.10, gcc
>            Reporter: Syed Taha Nemat
>            Priority: Major
>              Labels: installguide
>         Attachments: Screenshot from 2020-03-20 04-36-35.png
>
> Followed the steps mentioned in [https://kudu.apache.org/docs/installation.html#ubuntu_from_source].
> The build-if-necessary.sh script fails on Ubuntu 19.10 with the gcc compiler.
> Possible workarounds:
> * have users install clang in the environment and run the cmake commands using clang
> * make all C code compliant with gcc.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3083) Kudu installation fails on ubuntu 19.10 & gcc
Syed Taha Nemat created KUDU-3083: - Summary: Kudu installation fails on ubuntu 19.10 & gcc Key: KUDU-3083 URL: https://issues.apache.org/jira/browse/KUDU-3083 Project: Kudu Issue Type: Bug Components: build Environment: ubuntu 19.10, gcc Reporter: Syed Taha Nemat Attachments: Screenshot from 2020-03-20 04-36-35.png

Followed the steps mentioned in [https://kudu.apache.org/docs/installation.html#ubuntu_from_source]. The build-if-necessary.sh script fails on Ubuntu 19.10 with the gcc compiler. Possible workarounds:
* have users install clang in the environment
* make all C code compliant with gcc.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3082) tablets in "CONSENSUS_MISMATCH" state for a long time
[ https://issues.apache.org/jira/browse/KUDU-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062763#comment-17062763 ] Alexey Serbin commented on KUDU-3082: - [~zhangyifan27], what Kudu version is that?
[jira] [Updated] (KUDU-3067) Inexplict cloud detection for AWS and OpenStack based cloud by querying metadata
[ https://issues.apache.org/jira/browse/KUDU-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin updated KUDU-3067: Fix Version/s: 1.12.0 Resolution: Fixed Status: Resolved (was: In Review)

> Inexplict cloud detection for AWS and OpenStack based cloud by querying metadata
> --------------------------------------------------------------------------------
>
>                 Key: KUDU-3067
>                 URL: https://issues.apache.org/jira/browse/KUDU-3067
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: liusheng
>            Assignee: Alexey Serbin
>            Priority: Major
>             Fix For: 1.12.0
>
> The cloud detector checks which cloud provider the instance is running on (see [here|#L59-L93]). For AWS it queries the URL http://169.254.169.254/latest/meta-data/instance-id and checks the returned metadata to determine whether the instance is an AWS instance. That works for AWS alone, but OpenStack-based clouds serve the same EC2-compatible metadata, so the URL is accessible there as well and cannot distinguish AWS from OpenStack-based clouds. This causes an issue when running the "HybridClockTest.TimeSourceAutoSelection" test case: the test uses the above URL to detect the cloud the instance is running on and then tries to reach the cloud's dedicated NTP service. On AWS that service is at 169.254.169.123, but OpenStack-based clouds have no such dedicated NTP service, so the test fails on an OpenStack instance because the cloud detector assumes it is an AWS instance and tries to access 169.254.169.123.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3067) Inexplict cloud detection for AWS and OpenStack based cloud by querying metadata
[ https://issues.apache.org/jira/browse/KUDU-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062706#comment-17062706 ] ASF subversion and git services commented on KUDU-3067: --- Commit a80d7472110ae2349a82c2150aad61079969b337 in kudu's branch refs/heads/master from Alexey Serbin [ https://gitbox.apache.org/repos/asf?p=kudu.git;h=a80d747 ]

[util] KUDU-3067 add OpenStack metadata detector

This patch adds OpenStack metadata detector that works with OpenStack Nova metadata server (see [1] for details). In addition, this patch fixes the existing AWS detector to tell apart a true EC2 instance from a masquerading OpenStack one [2]. I couldn't get access to an OpenStack instance, but I asked the reporter of KUDU-3067 to test how it works and report back.

1. https://docs.openstack.org/nova/latest/user/metadata.html#metadata-service
2. https://docs.openstack.org/nova/latest/user/metadata.html#metadata-ec2-format

Change-Id: I84cc6d155ab1fbd7b401f5349d292f46fcac3a34
Reviewed-on: http://gerrit.cloudera.org:8080/15488
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo
Reviewed-by: liusheng

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3082) tablets in "CONSENSUS_MISMATCH" state for a long time
[ https://issues.apache.org/jira/browse/KUDU-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang updated KUDU-3082: - Description: Lately we found that a few tablets in one of our clusters are unhealthy; the ksck output looks like:
{code:java}
Tablet Summary
Tablet 7404240f458f462d92b6588d07583a52 of table '' is conflicted: 3 replicas' active configs disagree with the leader master's
  7380d797d2ea49e88d71091802fb1c81 (kudu-ts26): RUNNING
  d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
  47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
All reported replicas are:
  A = 7380d797d2ea49e88d71091802fb1c81
  B = d1952499f94a4e6087bee28466fcb09f
  C = 47af52df1adc47e1903eb097e9c88f2e
  D = 08beca5ed4d04003b6979bf8bac378d2
The consensus matrix is:
 Config source |   Replicas    | Current term | Config index | Committed?
---------------+---------------+--------------+--------------+------------
 master        | A   B   C*    |              |              | Yes
 A             | A   B   C*    | 5            | -1           | Yes
 B             | A   B   C     | 5            | -1           | Yes
 C             | A   B   C* D~ | 5            | 54649        | No

Tablet 6d9d3fb034314fa7bee9cfbf602bcdc8 of table '' is conflicted: 2 replicas' active configs disagree with the leader master's
  d1952499f94a4e6087bee28466fcb09f (kudu-ts25): RUNNING
  47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
  5a8aeadabdd140c29a09dabcae919b31 (kudu-ts21): RUNNING
All reported replicas are:
  A = d1952499f94a4e6087bee28466fcb09f
  B = 47af52df1adc47e1903eb097e9c88f2e
  C = 5a8aeadabdd140c29a09dabcae919b31
  D = 14632cdbb0d04279bc772f64e06389f9
The consensus matrix is:
 Config source |   Replicas    | Current term | Config index | Committed?
---------------+---------------+--------------+--------------+------------
 master        | A   B*  C     |              |              | Yes
 A             | A   B*  C     | 5            | 5            | Yes
 B             | A   B*  C D~  | 5            | 96176        | No
 C             | A   B*  C     | 5            | 5            | Yes

Tablet bf1ec7d693b94632b099dc0928e76363 of table '' is conflicted: 1 replicas' active configs disagree with the leader master's
  a9eaff3cf1ed483aae84954d649a (kudu-ts23): RUNNING
  f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
  47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
All reported replicas are:
  A = a9eaff3cf1ed483aae84954d649a
  B = f75df4a6b5ce404884313af5f906b392
  C = 47af52df1adc47e1903eb097e9c88f2e
  D = d1952499f94a4e6087bee28466fcb09f
The consensus matrix is:
 Config source |   Replicas    | Current term | Config index | Committed?
---------------+---------------+--------------+--------------+------------
 master        | A   B   C*    |              |              | Yes
 A             | A   B   C*    | 1            | -1           | Yes
 B             | A   B   C*    | 1            | -1           | Yes
 C             | A   B   C* D~ | 1            | 2            | No

Tablet 3190a310857e4c64997adb477131488a of table '' is conflicted: 3 replicas' active configs disagree with the leader master's
  47af52df1adc47e1903eb097e9c88f2e (kudu-ts27): RUNNING [LEADER]
  f0f7b2f4b9d344e6929105f48365f38e (kudu-ts24): RUNNING
  f75df4a6b5ce404884313af5f906b392 (kudu-ts19): RUNNING
All reported replicas are:
  A = 47af52df1adc47e1903eb097e9c88f2e
  B = f0f7b2f4b9d344e6929105f48365f38e
  C = f75df4a6b5ce404884313af5f906b392
  D = d1952499f94a4e6087bee28466fcb09f
The consensus matrix is:
 Config source |   Replicas    | Current term | Config index | Committed?
---------------+---------------+--------------+--------------+------------
 master        | A*  B   C     |              |              | Yes
 A             | A*  B   C D~  | 1            | 1991         | No
 B             | A*  B   C     | 1            | 4            | Yes
 C             | A*  B   C     | 1            | 4            | Yes
{code}
These tablets couldn't recover for a couple of days until we restarted kudu-ts27. I found many duplicated log messages on kudu-ts27 like:
{code:java}
I0314 04:38:41.511279 65731 raft_consensus.cc:937] T 7404240f458f462d92b6588d07583a52 P 47af52df1adc47e1903eb097e9c88f2e [term 3 LEADER]: attempt to promote peer 08beca5ed4d04003b6979bf8bac378d2: there is already a config change operation in progress. Unable to promote follower until it completes. Doing nothing.
I0314 04:38:41.751009 65453 raft_consensus.cc:937] T 6d9d3fb034314fa7bee9cfbf602bcdc8 P 47af52df1adc47e1903eb097e9c88f2e [term 5 LEADER]: attempt to promote peer 14632cdbb0d04279bc772f64e06389f9: there is already a config change operation in progress. Unable to promote follower until it completes. Doing nothing.
{code}
There seem to be some Raft config change operations that somehow cannot complete.
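The conflict that ksck reports above can be sketched as a simple config comparison. This is an illustrative Python simplification, not ksck's actual C++ implementation, and the function name is invented: a tablet is flagged with a consensus mismatch when some replica's active config (including any pending, uncommitted change — the `D~` non-voter in the matrices above) disagrees with the leader master's committed view.

```python
# Illustrative simplification of the check behind a CONSENSUS_MISMATCH
# verdict (not Kudu's actual code; mismatched_replicas is an invented name).

def mismatched_replicas(master_config, replica_configs):
    """Return the replicas whose active config differs from the master's view.

    master_config:   set of replica labels, e.g. {"A", "B", "C"}
    replica_configs: mapping from each reporting replica to its active config
    """
    return sorted(src for src, cfg in replica_configs.items()
                  if set(cfg) != set(master_config))

# Tablet 7404240f... from the report above: replica C carries a pending
# config that adds the non-voter D, which never commits, so C alone
# disagrees with the master's committed {A, B, C}.
reported = {
    "A": {"A", "B", "C"},
    "B": {"A", "B", "C"},
    "C": {"A", "B", "C", "D"},  # pending D~, Committed? = No
}
print(mismatched_replicas({"A", "B", "C"}, reported))  # -> ['C']
```

The repeated "attempt to promote peer ... there is already a config change operation in progress" log lines fit this picture: the leader keeps trying to promote the non-voter D but refuses while the earlier config change is still pending, and since that pending change never commits, the disagreement persists until the leader's tserver is restarted.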
[jira] [Created] (KUDU-3082) tablets in "CONSENSUS_MISMATCH" state for a long time
YifanZhang created KUDU-3082: Summary: tablets in "CONSENSUS_MISMATCH" state for a long time Key: KUDU-3082 URL: https://issues.apache.org/jira/browse/KUDU-3082 Project: Kudu Issue Type: Bug Reporter: YifanZhang