[ 
https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963856#comment-16963856
 ] 

YangSong commented on KUDU-2975:
--------------------------------

Thank you, let me summarize the implementation:

1. We need to add a new gflag, such as {{--fs_wal_dirs}}, to support spreading WALs 
across multiple directories. We should keep {{--fs_wal_dir}} for backwards 
compatibility. Users can choose either one of them.
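
A minimal sketch of how the two flags might be reconciled. The helper name ResolveWalRoots is hypothetical, and both the comma-separated list format and the rule that exactly one of the two flags may be set are assumptions on my side, not something decided in this issue:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Kudu code): picks the effective WAL roots from the
// existing --fs_wal_dir value and the proposed --fs_wal_dirs value.
// Assumption: exactly one of the two flags may be set; otherwise return false.
bool ResolveWalRoots(const std::string& fs_wal_dir,
                     const std::string& fs_wal_dirs,
                     std::vector<std::string>* roots) {
  roots->clear();
  if (fs_wal_dir.empty() == fs_wal_dirs.empty()) {
    return false;  // both flags set, or neither flag set
  }
  if (!fs_wal_dir.empty()) {
    roots->push_back(fs_wal_dir);  // legacy single-directory behavior
    return true;
  }
  // Assumption: the new flag holds a comma-separated list of directories.
  std::stringstream ss(fs_wal_dirs);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (!item.empty()) roots->push_back(item);
  }
  return !roots->empty();
}
```

With this shape, existing deployments that only set {{--fs_wal_dir}} would keep getting a single WAL root.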

2. The first time 'fs_manager' is initialized, it needs to generate an instance 
file per WAL directory. If the data directories (fs_data_dirs) are not provided, we 
use the write-ahead log directories (fs_wal_dirs) as data directories. If the 
metadata directory is not provided, we use the first WAL directory or the first 
data directory. If one of the WAL directories doesn't exist, report a fatal 
error. If some of the WAL directories have an 'instance' file but others do 
not, report a fatal error.
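
The startup check in point 2 could be sketched roughly as follows. WalRootState, WalRootsCheck and CheckWalRoots are hypothetical names used only for illustration, and treating an empty list of roots as fatal is my assumption:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one configured WAL root as seen at startup.
struct WalRootState {
  std::string path;
  bool exists;        // the directory is present on disk
  bool has_instance;  // the directory contains an 'instance' file
};

enum class WalRootsCheck {
  kCreateInstances,  // first start: no root has an instance file yet
  kOpenExisting,     // every root already has an instance file
  kFatal,            // a root is missing, or only some roots have the file
};

WalRootsCheck CheckWalRoots(const std::vector<WalRootState>& roots) {
  if (roots.empty()) return WalRootsCheck::kFatal;  // assumption
  std::size_t with_instance = 0;
  for (const WalRootState& root : roots) {
    if (!root.exists) return WalRootsCheck::kFatal;
    if (root.has_instance) ++with_instance;
  }
  if (with_instance == 0) return WalRootsCheck::kCreateInstances;
  if (with_instance == roots.size()) return WalRootsCheck::kOpenExisting;
  return WalRootsCheck::kFatal;  // mixed: some have 'instance', some do not
}
```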

3. Add a class WalDirManager, maybe like this:
{quote}class WalDirManager {
 public:
  // Create() and Open() build a manager over the canonicalized WAL roots.
  static Status Create(CanonicalizedRootsList wal_fs_roots,
                       std::unique_ptr<WalDirManager>* wal_manager);
  static Status Open(CanonicalizedRootsList wal_fs_roots,
                     std::unique_ptr<WalDirManager>* wal_manager);

  ~WalDirManager();

  void Shutdown();

  // Loads a tablet's WAL directory assignment from its metadata.
  Status LoadWalDirFromPB(const std::string& tablet_id, const WalDirPB& pb);

  std::set<std::string> FindTabletsByWALDir(const std::string& wal_dir) const;

  Status FindWalDirByTabletId(const std::string& tablet_id,
                              std::string* wal_dir) const;

  Status MarkWalDirsFailed(const std::string& error_message = "");

  void MarkWalDirFailed(const std::string& dir);

  bool IsWalDirFailed(const std::string& dir) const;

  std::set<std::string> GetFailedWalDirs() const;

  std::vector<std::string> GetWalDirs() const;

  std::string GetWalDirByUuid(const std::string& uuid) const;

  // Chooses a WAL directory for the given tablet.
  Status CreateWalDir(const std::string& tablet_id);

 private:
  explicit WalDirManager(CanonicalizedRootsList canonicalized_wal_roots);

  const CanonicalizedRootsList canonicalized_wal_fs_roots_;

  // Maps a WAL directory's uuid to the directory.
  typedef std::unordered_map<std::string, std::string> DirByUuidMap;
  DirByUuidMap dir_by_uuid_;

  // Maps a WAL directory to the ids of the tablets stored in it.
  typedef std::multimap<std::string, std::string> TabletsByDirMap;
  TabletsByDirMap tablets_by_dir_;

  typedef std::set<std::string> FailedWalDirSet;
  FailedWalDirSet failed_wal_dirs_;
};
{quote}
 
 * We need to update the "instance" file in each WAL dir when creating a new 
WalDirManager instance. Each WAL directory generates its own uuid and records it 
in the instance file.
 * The directory structure may look like this: 

 
{panel:title=Structure of one WAL directory}
  ----wal
  --------instance
  --------wals
  ------------tablet1_uuid
  ----------------index.0
  ----------------wal.0
  ------------tablet2_uuid
  ----------------index.0
  ----------------wal.0
{panel}
 
 * When creating the metadata for a tablet, we need to determine the WAL directory 
for the tablet and record the uuid of the chosen directory in the tablet's 
metadata via WalDirPB.
 * The WAL directory for a tablet is determined by calling 
"WalDirManager::CreateWalDir()". A simple approach is to track 
how many tablets there are in each WAL directory and select the directory with 
the fewest tablets each time.
 * When deleting a tablet, we need to delete the relevant entries in 
"TabletsByDirMap". For a tombstoned tablet, we also need to clear the WAL dir 
from the metadata.
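
The "fewest tablets" selection described above could look roughly like this. PickLeastLoadedWalDir is a hypothetical free function standing in for the choice made inside "WalDirManager::CreateWalDir()", and breaking ties by the configured directory order is an assumption:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: picks the WAL directory that currently holds the
// fewest tablets. 'tablets_by_dir' maps a WAL directory to the ids of the
// tablets stored in it, like TabletsByDirMap in the class above.
// Returns false if no WAL directories are configured.
bool PickLeastLoadedWalDir(
    const std::vector<std::string>& wal_dirs,
    const std::multimap<std::string, std::string>& tablets_by_dir,
    std::string* chosen) {
  const std::string* best = nullptr;
  std::size_t best_count = 0;
  for (const std::string& dir : wal_dirs) {
    const std::size_t count = tablets_by_dir.count(dir);
    // Strict '<' keeps the earliest directory when counts are equal.
    if (best == nullptr || count < best_count) {
      best = &dir;
      best_count = count;
    }
  }
  if (best == nullptr) return false;
  *chosen = *best;
  return true;
}
```

A real implementation would probably also skip directories that are marked failed; that is left out here to keep the sketch short.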

4. After we've passed the initial FsManager checks and started bootstrapping, if 
a tablet's metadata is missing the WAL directory information and the tablet 
is not tombstoned, we mark the tablet as failed. If the metadata is OK but the 
tablet has rowsets and its WAL is missing, we also mark the tablet as failed. 
(This covers a missing per-tablet directory such as "tablet1_uuid"; if the "wal" 
directory itself is missing, Kudu will crash while checking FsManager.) I ran a 
test with the latest Kudu version: when I removed some tablets' WALs and then 
restarted the tserver, the tserver started with an error like "Tablet failed to 
bootstrap: Illegal state:Found rowsets but no log segments could be found.". If 
the tserver was restarted immediately, the tablet was recovered by Raft. If we 
waited a few minutes before restarting the tserver, the tablet had already been 
recovered on another tserver, and the local tablet was tombstoned. 
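
The bootstrap rule in point 4 can be written as a small predicate. TabletWalState and ShouldMarkTabletFailed are hypothetical names, and never failing a tombstoned tablet on WAL grounds is my assumption (the text above only says the first check applies to tablets that are not tombstoned):

```cpp
// Hypothetical snapshot of what bootstrap knows about one tablet.
struct TabletWalState {
  bool tombstoned;            // tablet metadata says the tablet is tombstoned
  bool metadata_has_wal_dir;  // metadata records a WAL directory (WalDirPB)
  bool has_rowsets;           // tablet has rowsets on disk
  bool wal_segments_found;    // log segments exist in the recorded WAL dir
};

// Returns true if the tablet should be marked failed during bootstrap.
bool ShouldMarkTabletFailed(const TabletWalState& t) {
  if (t.tombstoned) return false;            // assumption: no WAL expected
  if (!t.metadata_has_wal_dir) return true;  // WAL dir info missing
  return t.has_rowsets && !t.wal_segments_found;  // rowsets but no WAL
}
```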

5. If a disk I/O error is reported while reading or writing a WAL 
file/directory, we can handle it similarly to data directory failures. We 
may need to change the function "FailTabletsInDataDir(string uuid)" 
to "FailTabletsInDir(DirType type, string uuid)", where "DirType" identifies 
whether the directory is a data directory or a WAL directory. 
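
A rough sketch of the lookup that a generalized "FailTabletsInDir" would need. Only the "DirType" idea comes from the point above; the TabletsToFail function, the TabletsByDirUuid type and the two separate indexes are hypothetical and only illustrate the dispatch on directory type:

```cpp
#include <map>
#include <set>
#include <string>

enum class DirType { kData, kWal };

// Hypothetical index: directory uuid -> ids of tablets stored in that directory.
typedef std::multimap<std::string, std::string> TabletsByDirUuid;

// Returns the ids of the tablets that should be failed when the directory
// identified by 'uuid' of the given type reports a disk failure.
std::set<std::string> TabletsToFail(DirType type, const std::string& uuid,
                                    const TabletsByDirUuid& tablets_by_data_dir,
                                    const TabletsByDirUuid& tablets_by_wal_dir) {
  // Pick the index that matches the kind of directory that failed.
  const TabletsByDirUuid& index =
      (type == DirType::kData) ? tablets_by_data_dir : tablets_by_wal_dir;
  std::set<std::string> result;
  auto range = index.equal_range(uuid);
  for (auto it = range.first; it != range.second; ++it) {
    result.insert(it->second);
  }
  return result;
}
```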

6. We also need to modify the relevant code about "--fs_wal_dir" in the tool.

Is this an accurate summary? There may be omissions or errors. This approach 
seems relatively simple and can solve the problem quickly.

> Spread WAL across multiple data directories
> -------------------------------------------
>
>                 Key: KUDU-2975
>                 URL: https://issues.apache.org/jira/browse/KUDU-2975
>             Project: Kudu
>          Issue Type: New Feature
>          Components: fs, tablet, tserver
>            Reporter: LiFu He
>            Priority: Major
>         Attachments: network.png, tserver-WARNING.png, util.png
>
>
> Recently, we deployed a new kudu cluster and every node has 12 SSDs. Then, we 
> created a big table and loaded data into it through flink.  We noticed that the 
> util of one SSD which is used to store WAL is 100% but others are free. So, 
> we suggest spreading WAL across multiple data directories.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
