[ https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963801#comment-16963801 ]
YangSong commented on KUDU-2975:
--------------------------------

Thank you, let me summarize the implementation:

# We need to add a new gflag, such as "--fs_wal_dirs", to support spreading the WAL across multiple directories. We should keep {{--fs_wal_dir}} for backwards compatibility. Users can choose one of them.
# The first time 'fs_manager' is initialized, it needs to generate an instance file per WAL directory. If the data directories (fs_data_dirs) are not provided, we use the write-ahead log directories (fs_wal_dirs) as data directories. If the metadata directory is not provided, we use the first WAL directory or the first data directory. If one of the WAL directories doesn't exist, report a fatal error. If some of the WAL directories have an 'instance' file but others do not, report a fatal error.
# Add a class WalDirManager, maybe like this:

class WalDirManager {
 public:
  static Status Create(CanonicalizedRootsList wal_fs_roots,
                       std::unique_ptr<WalDirManager>* wal_manager);
  static Status Open(CanonicalizedRootsList wal_fs_roots,
                     std::unique_ptr<WalDirManager>* wal_manager);
  ~WalDirManager();
  void Shutdown();

  // Lookups between tablets and the WAL directories that hold their logs.
  Status LoadWalDirFromPB(const std::string& tablet_id, const WalDirPB& pb);
  std::set<std::string> FindTabletsByWALDir(const std::string& wal_dir) const;
  Status FindWalDirByTabletId(const std::string& tablet_id,
                              std::string* wal_dir) const;

  // Failure tracking for WAL directories.
  Status MarkWalDirsFailed(const std::string& error_message = "");
  void MarkWalDirFailed(const std::string& dir);
  bool IsWalDirFailed(const std::string& dir) const;
  const std::set<std::string> GetFailedDataDirs() const;

  std::vector<std::string> GetWalDirs() const;
  std::string GetWalDirByUuid(std::string uuid) const;
  Status CreateWalDir(const std::string& tablet_id);

 private:
  WalDirManager(CanonicalizedRootsList canonicalized_wal_roots);

  const CanonicalizedRootsList canonicalized_wal_fs_roots_;

  typedef std::unordered_map<std::string, std::string> DirByUuidMap;
  DirByUuidMap dir_by_uuid_;

  typedef std::multimap<std::string, std::string> TabletsByDirMap;
  TabletsByDirMap tablets_by_dir_;

  typedef std::set<std::string> FailedWalDirSet;
  FailedWalDirSet failed_data_dirs_;
};

We need to update the "instance" file under each WAL dir when creating a new WalDirManager. Each WAL directory generates its own UUID and records it in the instance file.

The directory structure may be like this:

--wal
----instance

# adf
# asdfadf
# dasf

> Spread WAL across multiple data directories
> -------------------------------------------
>
>                 Key: KUDU-2975
>                 URL: https://issues.apache.org/jira/browse/KUDU-2975
>             Project: Kudu
>          Issue Type: New Feature
>          Components: fs, tablet, tserver
>            Reporter: LiFu He
>            Priority: Major
>         Attachments: network.png, tserver-WARNING.png, util.png
>
>
> Recently, we deployed a new Kudu cluster in which every node has 12 SSDs. Then, we
> created a big table and loaded data into it through Flink. We noticed that the
> utilization of the one SSD used to store the WAL was 100% while the others were idle. So,
> we suggest spreading the WAL across multiple data directories.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)