limowang opened a new pull request, #2216: URL: https://github.com/apache/incubator-pegasus/pull/2216
[#2215](https://github.com/apache/incubator-pegasus/issues/2215)

During rolling upgrades and duplication data migration, a replica sometimes first learns the checkpoint from the primary replica and then loads the partition. However, if the primary replica crashes while generating the checkpoint in `::dsn::error_code pegasus_server_impl::copy_checkpoint_to_dir_unsafe(const char *checkpoint_dir, int64_t *checkpoint_decree, bool flush_memtable)`, the final block of that function may never run:

```cpp
std::vector<rocksdb::ColumnFamilyDescriptor> column_families(
    {{DATA_COLUMN_FAMILY_NAME, rocksdb::ColumnFamilyOptions()},
     {META_COLUMN_FAMILY_NAME, rocksdb::ColumnFamilyOptions()}});
status = rocksdb::DB::OpenForReadOnly(
    rocksdb::DBOptions(), checkpoint_dir, column_families, &handles_opened, &snapshot_db);
if (!status.ok()) {
    derror_replica(
        "OpenForReadOnly from {} failed, error = {}", checkpoint_dir, status.ToString());
    snapshot_db = nullptr;
    cleanup(true);
    return ::dsn::ERR_LOCAL_APP_FAILURE;
}
dcheck_eq_replica(handles_opened.size(), 2);
dcheck_eq_replica(handles_opened[1]->GetName(), META_COLUMN_FAMILY_NAME);
```

As a result, the `META_COLUMN_FAMILY_NAME` column family might be missing from the checkpoint. When other potential follower replicas request checkpoint data, the primary replica does not validate that the checkpoint is complete, which can trigger the `!missing_meta_cf` assertion failure.

Therefore, we can add checkpoint integrity validation in the primary replica when it sends checkpoint data, specifically in `::dsn::error_code pegasus_server_impl::get_checkpoint(int64_t learn_start, const dsn::blob &learn_request, dsn::replication::learn_state &state)`.
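The integrity check proposed above could look roughly like the following. This is a minimal sketch, not the code in this PR: the helper names `missing_column_families` and `is_checkpoint_complete` are hypothetical, and the two column family name strings are assumed stand-ins for `DATA_COLUMN_FAMILY_NAME` and `META_COLUMN_FAMILY_NAME`. In the server, the list of names present in a checkpoint directory could presumably be obtained with `rocksdb::DB::ListColumnFamilies` before the check runs.

```cpp
#include <algorithm>
#include <initializer_list>
#include <string>
#include <vector>

// Assumed values for illustration only; the real constants are defined in Pegasus.
static const std::string kDataCfName = "default";
static const std::string kMetaCfName = "pegasus_meta_cf";

// Hypothetical helper: returns the required column families that are absent
// from `present` (the names found in a checkpoint directory).
std::vector<std::string> missing_column_families(const std::vector<std::string> &present)
{
    std::vector<std::string> missing;
    for (const auto &required : {kDataCfName, kMetaCfName}) {
        if (std::find(present.begin(), present.end(), required) == present.end()) {
            missing.push_back(required);
        }
    }
    return missing;
}

// Hypothetical helper: a checkpoint is treated as complete only when every
// required column family is present.
bool is_checkpoint_complete(const std::vector<std::string> &present)
{
    return missing_column_families(present).empty();
}
```

With a check of this shape, `get_checkpoint` could return an error for an incomplete checkpoint instead of handing it to a learner.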
