limowang opened a new pull request, #2216:
URL: https://github.com/apache/incubator-pegasus/pull/2216

   [#2215](https://github.com/apache/incubator-pegasus/issues/2215)
   
   During rolling upgrades and duplication data migration, a replica may first 
learn the checkpoint from the primary replica and then load the partition. 
However, if the primary replica crashes while generating the checkpoint in 
`::dsn::error_code 
pegasus_server_impl::copy_checkpoint_to_dir_unsafe(const char *checkpoint_dir, 
int64_t *checkpoint_decree, bool flush_memtable)`, the final block of that 
function may never run:
   
   ```cpp
   std::vector<rocksdb::ColumnFamilyDescriptor> column_families(
       {{DATA_COLUMN_FAMILY_NAME, rocksdb::ColumnFamilyOptions()},
        {META_COLUMN_FAMILY_NAME, rocksdb::ColumnFamilyOptions()}});
   status = rocksdb::DB::OpenForReadOnly(
       rocksdb::DBOptions(), checkpoint_dir, column_families, &handles_opened, &snapshot_db);
   if (!status.ok()) {
       derror_replica(
           "OpenForReadOnly from {} failed, error = {}", checkpoint_dir, status.ToString());
       snapshot_db = nullptr;
       cleanup(true);
       return ::dsn::ERR_LOCAL_APP_FAILURE;
   }
   dcheck_eq_replica(handles_opened.size(), 2);
   dcheck_eq_replica(handles_opened[1]->GetName(), META_COLUMN_FAMILY_NAME);
   ```
   As a result, the `META_COLUMN_FAMILY_NAME` column family might be missing 
from the checkpoint. When potential follower replicas later request checkpoint 
data, the primary replica does not validate that the checkpoint is complete, 
so an incomplete checkpoint can be served and trigger the `!missing_meta_cf` 
assertion failure.
   
   Therefore, we can add checkpoint integrity validation in the primary replica 
before it sends checkpoint data, specifically in the `::dsn::error_code 
pegasus_server_impl::get_checkpoint(int64_t learn_start, const dsn::blob 
&learn_request, dsn::replication::learn_state &state)` function.
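
The integrity check described above could look roughly like the sketch below. It is not the code in this PR: the helper names and the two column family name values are assumptions made for illustration, and the list of present column families is passed in as a plain vector so the example stays self-contained. In a real check that list could be obtained from the checkpoint directory, for example with `rocksdb::DB::ListColumnFamilies`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Assumed values for illustration only; the real constants are
// DATA_COLUMN_FAMILY_NAME and META_COLUMN_FAMILY_NAME in pegasus_server_impl.
const std::string kDataColumnFamily = "default";
const std::string kMetaColumnFamily = "pegasus_meta_cf";

// Hypothetical helper: returns every required column family that is absent
// from `present` (the names found in a checkpoint directory).
std::vector<std::string> missing_column_families(const std::vector<std::string> &present)
{
    const std::vector<std::string> required = {kDataColumnFamily, kMetaColumnFamily};
    std::vector<std::string> missing;
    for (const auto &name : required) {
        if (std::find(present.begin(), present.end(), name) == present.end()) {
            missing.push_back(name);
        }
    }
    return missing;
}

// Hypothetical helper: a checkpoint is treated as complete only when both
// the data and the meta column families are present.
bool checkpoint_is_complete(const std::vector<std::string> &present)
{
    return missing_column_families(present).empty();
}
```

With a check of this shape, `get_checkpoint` could return an error for an incomplete checkpoint instead of handing it to a learner.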


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

