This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git


The following commit(s) were added to refs/heads/master by this push:
     new 366596389 KUDU-3527 Fix block manager test when using 64k container 
block alignment
366596389 is described below

commit 366596389b12df977a93408eebdc7dc9e30be3b3
Author: Zoltan Martonka <zmarto...@gmail.com>
AuthorDate: Mon Jan 29 16:07:49 2024 +0000

    KUDU-3527 Fix block manager test when using 64k container block alignment
    
    BlockManagerTest.TestMetadataOkayDespiteFailure might fail on system
    where we use 64k alignment for data blocks.
    
    Root cause:
        Currently  tablets fail to load if .metadata is missing but there
        is still a non-empty ".data" file. If FLAGS_env_inject_eio is set to 
greater
        than zero, then there is a chance that, when we delete the container 
file,
        we only delete the ".meta", but leave the ".data" file.
        So deleting containers with injected io errors is expected to
        sometime prevent the block manager from restarting properly.
    
        However container deletion almost never occurred in this test until we 
run
        it on the new RHEL 8.8 ARM with 64K page size.
    
    Why is it stable on x86_64:
        On x86_64 we usually use 4k block alignment. We write 6080 byte data 
into a
        block, which is padded to 8k. So in the current test settings we have 32
        blocks in a container when it becomes full
        (FLAGS_log_container_max_size = 256k). Later we delete exactly half of 
the
        500 blocks we wrote. The chance of deleting all 32 blocks in a container
        is very small, and even if it happens, it still has around 0.09 chance 
to
        become corrupted. It is a bit flaky, but it would fail less than once 
in a
        billion run.
        If you dramatically decrease the FLAGS_log_container_max_size flag, the 
test
        starts to occasionally fail on a traditional x86_64 machine too.
    
    Why is it unstable with 64k alignment:
        With 64k alignment (currently used on ARM RHEL 8.8 with 64k page size),
        there are 4 blocks in a full container file. We write 500 blocks, so we
        expect to have nearly 125 full files. If we delete exactly half of the
        blocks, we will make many (full) container file empty. Some of them 
might
        fail to be deleted properly leaving a lonely non-empty .data file 
without
        .metadata. On my RHEL machine the test fails 97-98% of the time for this
        exact reason.
    
    Solution:
        The test TestMetadataOkayDespiteFailure was supposed to test reloading 
the
        block manager with containers having deleted blocks, with some previous
        failed delete. It (probably) never tested the case when container 
deletion
        occurs.
        + Disabled container deletion, so the test scope remains the same as it 
was
          with smaller block alignments.
        + Add a new (currently disabled) test, to see how block manager handles 
the
        above described situation. Filed a JIRA issue to track the issue: 
KUDU-3528.
    
        The original issue is not ARM specific, and far from trivial to solve, 
and
        was always in the system.
    
    Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20
    Reviewed-on: http://gerrit.cloudera.org:8080/20725
    Reviewed-by: Ashwani Raina <ara...@cloudera.com>
    Reviewed-by: Alexey Serbin <ale...@apache.org>
    Tested-by: Alexey Serbin <ale...@apache.org>
---
 src/kudu/fs/block_manager-test.cc | 113 ++++++++++++++++++++++++++++++++------
 1 file changed, 97 insertions(+), 16 deletions(-)

diff --git a/src/kudu/fs/block_manager-test.cc 
b/src/kudu/fs/block_manager-test.cc
index efe3119a3..724d3d927 100644
--- a/src/kudu/fs/block_manager-test.cc
+++ b/src/kudu/fs/block_manager-test.cc
@@ -71,6 +71,7 @@ using std::unique_ptr;
 using std::vector;
 using strings::Substitute;
 
+DECLARE_bool(log_block_manager_delete_dead_container);
 DECLARE_double(log_container_live_metadata_before_compact_ratio);
 DECLARE_uint64(log_container_preallocate_bytes);
 DECLARE_uint64(log_container_max_size);
@@ -157,6 +158,18 @@ class BlockManagerTest : public KuduTest {
 
  protected:
 
+  Status CreateBlock(BlockId* out, const string& data, int repeat_data = 1) {
+    unique_ptr<WritableBlock> block;
+    RETURN_NOT_OK(bm_->CreateBlock(test_block_opts_, &block));
+    for (int i = 0; i < repeat_data; i++) {
+      RETURN_NOT_OK(block->Append(data));
+    }
+    RETURN_NOT_OK(block->Finalize());
+    RETURN_NOT_OK(block->Close());
+    *out = block->id();
+    return Status::OK();
+  }
+
   T* CreateBlockManager(const scoped_refptr<MetricEntity>& metric_entity,
                         const shared_ptr<MemTracker>& parent_mem_tracker) {
     if (!dd_manager_) {
@@ -923,6 +936,73 @@ TYPED_TEST(BlockManagerTest, TestDiskSpaceCheck) {
   }
 }
 
+// Regression test for KUDU-3528.
+// We should handle the following case:
+//   + A container becomes full.
+//   + All blocks in this container are deleted so we try to delete it
+//   + We delete the ".metadata" file, but fail to delete the ".data"
+//   + Block manager restarts (e.g  Kudu tserver restarts)
+// Current behaviour:
+//   + Block manager fails to start, because it can't handle a .data file whose
+//     corresponding .metadata file is not present.
+// Expected:
+//   + We should either do a more error resistant delete (see KUDU-3528) or
+//     make blockmanager handle the situation at start.
+// Keep test disabled until KUDU-3528 is fixed. Comments in the test might need
+// to be rephrased/revisited after KUDU-3528 is fixed.
+TYPED_TEST(BlockManagerTest, 
DISABLED_TestReloadBlockManagerDespiteFileDeleteFailure) {
+  // If you update the parameters, please also update the calculations in
+  // the comments.
+  constexpr int kBlocksWritten = 500;
+  constexpr int kMaxBlocksInContainer = 2;
+  FLAGS_log_block_manager_delete_dead_container = true;
+  const string kTestData = string(500, 'L');
+  uint64_t block_size = 0;
+  ASSERT_OK(this->env_->GetBlockSize(this->test_dir_, &block_size));
+  ASSERT_LE(kTestData.size(), block_size);
+
+  FLAGS_log_container_max_size = kMaxBlocksInContainer * block_size;
+  FLAGS_log_container_preallocate_bytes = FLAGS_log_container_max_size;
+
+  FLAGS_crash_on_eio = false;
+  // We don't want any error when writing, only when deleting.
+  FLAGS_env_inject_eio = 0.0;
+
+  vector<BlockId> ids(kBlocksWritten);
+  for (auto& id : ids) {
+    ASSERT_OK(this->CreateBlock(&id, kTestData));
+  }
+
+  {
+    google::FlagSaver saver;
+    // If we have a full container file, we want to be left with a .data 
without
+    // a .metadata file.
+    // let x = FLAGS_env_inject_eio
+    // The chance that we delete the
+    // .metadata but fail to delete the .data file is (1-x) * x
+    // best chance is 1/4 at x=0.5.
+    // We have approximately 250 containers, so the chance that we will be
+    // left with a .data file whose corresponding .metadata file is not present
+    // is
+    // [1 - (0.75)^250] ~ 0.99999999999999999999999999999994174.
+    FLAGS_env_inject_eio = 0.5;
+
+    vector<BlockId> deleted;
+    shared_ptr<BlockDeletionTransaction> deletion_transaction = 
this->bm_->NewDeletionTransaction();
+    for (const auto& id : ids) {
+      deletion_transaction->AddDeletedBlock(id);
+    }
+    ignore_result(deletion_transaction->CommitDeletedBlocks(&deleted));
+    LOG(INFO) << Substitute(
+        "Successfully deleted $0 blocks on $1 attempts", deleted.size(), 
ids.size());
+  }
+
+  ASSERT_OK(this->ReopenBlockManager(scoped_refptr<MetricEntity>(),
+                                     shared_ptr<MemTracker>(),
+                                     {GetTestDataDirectory()},
+                                     false /* create */));
+}
+
 // Regression test for KUDU-1793.
 TYPED_TEST(BlockManagerTest, TestMetadataOkayDespiteFailure) {
   const int kNumTries = 3;
@@ -936,6 +1016,22 @@ TYPED_TEST(BlockManagerTest, 
TestMetadataOkayDespiteFailure) {
   FLAGS_log_container_max_size = 256 * 1024;
   FLAGS_log_container_preallocate_bytes = 8 * 1024;
 
+  // Make sure no whole container file deletion is triggered!
+  // That case is tested by TestReloadBlockManagerDespiteFileDeleteFailure
+  // If we delete every block with 50% chance, there is a (1/2)^32 chance
+  // (we have 6080 byte blocks padded to 8k)
+  // that we clear a whole container with 4k alignment. It is even
+  // much lower per file than (1/2)^32 if you consider that we delete exactly
+  // half the blocks across all container files.
+  // If we use 64k block alignment (e.g possible on RHEL ARM 8.8 with 64K page
+  // size) we will have 4 blocks per container file, and we will have around
+  // ~125 files. It means we almost always delete some file
+  // (99.8% chance according to my test runs).
+  // So this flag only affects 64k block alignment in practice.
+  // (it would affect 32k alignment with a measurable chance too if we had such
+  // a system.)
+  FLAGS_log_block_manager_delete_dead_container = false;
+
   // Force some file operations to fail.
   FLAGS_crash_on_eio = false;
   FLAGS_env_inject_eio = 0.1;
@@ -944,21 +1040,6 @@ TYPED_TEST(BlockManagerTest, 
TestMetadataOkayDespiteFailure) {
   // errors may also crop up here.
   FLAGS_log_container_live_metadata_before_compact_ratio = 0.5;
 
-  // Creates a block with the given 'test_data', writing the result
-  // to 'out' on success.
-  auto create_a_block = [&](BlockId* out, const string& test_data) {
-    unique_ptr<WritableBlock> block;
-    RETURN_NOT_OK(this->bm_->CreateBlock(this->test_block_opts_, &block));
-    for (int i = 0; i < kNumAppends; i++) {
-      RETURN_NOT_OK(block->Append(test_data));
-    }
-
-    RETURN_NOT_OK(block->Finalize());
-    RETURN_NOT_OK(block->Close());
-    *out = block->id();
-    return Status::OK();
-  };
-
   // Reads a block given by 'id', comparing its contents. Note that
   // we need to compare with both kLongTestData and kShortTestData as we
   // do not know the blocks' content ahead.
@@ -993,7 +1074,7 @@ TYPED_TEST(BlockManagerTest, 
TestMetadataOkayDespiteFailure) {
       BlockId id;
       // Inject different size of data to better simulate
       // real world case.
-      Status s = create_a_block(&id, i % 2 ? kShortTestData : kLongTestData);
+      Status s = this->CreateBlock(&id, i % 2 ? kShortTestData : 
kLongTestData, kNumAppends);
 
       if (s.ok()) {
         InsertOrDie(&ids, id);

Reply via email to