Re: [PATCH 0/2] nilfs2: improve inode allocation algorithm
Hi, On Sun, 12 Oct 2014 12:38:21 +0200, Andreas Rohner wrote: Hi, The algorithm simply makes sure that after a directory inode there are a certain number of free slots available and the search for file inodes is started at their parent directory. I haven't had the time yet to do a full scale performance test of it, but my simple preliminary tests have shown that the allocation of inodes takes a little bit longer and the lookup is a little bit faster. My simple test just creates 1500 directories and after that creates 10 files in each directory. So more testing is definitely necessary, but I wanted to get some feedback about the design first. Is my code a step in the right direction? Best regards, Andreas Rohner Andreas Rohner (2): nilfs2: support the allocation of whole blocks of meta data entries nilfs2: improve inode allocation algorithm fs/nilfs2/alloc.c | 161 fs/nilfs2/alloc.h | 18 +- fs/nilfs2/ifile.c | 63 ++-- fs/nilfs2/ifile.h | 6 +- fs/nilfs2/inode.c | 6 +- fs/nilfs2/segment.c | 5 +- 6 files changed, 235 insertions(+), 24 deletions(-) I don't know whether this patchset is going in the right direction. .. we should first measure how bad the original naive allocator is in comparison with an elaborately designed allocator like this. But, I will add some comments anyway: 1) You must not use sizeof(struct nilfs_inode) to get the inode size. The size of on-disk inodes is variable and you have to use NILFS_MDT(ifile)->mi_entry_size to ensure compatibility. To get ipb (= number of inodes per block), you should use NILFS_MDT(ifile)->mi_entries_per_block. Please remove nilfs_ifile_inodes_per_block(). It's redundant. 2) __nilfs_palloc_prepare_alloc_entry() The argument block_size is confusing. Usually we use it for the size of a disk block. Please use a proper word such as alignment_size. 3) nilfs_palloc_find_available_slot_align32() This function seems to be violating endian compatibility.
The order of two 32-bit words in a 64-bit word on little endian architectures differs from that on big endian architectures. Having three different implementations looks like overkill to me at this time. It should be removed unless it makes a significant difference. 4) nilfs_cpu_to_leul() Adding this macro is not preferable. It depends on endianness. Did you look for a generic macro which does the same thing? Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line unsubscribe linux-nilfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] nilfs2: improve inode allocation algorithm
On Mon, 13 Oct 2014 23:52:59 +0900 (JST), Ryusuke Konishi wrote: Hi, On Sun, 12 Oct 2014 12:38:21 +0200, Andreas Rohner wrote: Hi, The algorithm simply makes sure that after a directory inode there are a certain number of free slots available and the search for file inodes is started at their parent directory. I haven't had the time yet to do a full scale performance test of it, but my simple preliminary tests have shown that the allocation of inodes takes a little bit longer and the lookup is a little bit faster. My simple test just creates 1500 directories and after that creates 10 files in each directory. So more testing is definitely necessary, but I wanted to get some feedback about the design first. Is my code a step in the right direction? Best regards, Andreas Rohner Andreas Rohner (2): nilfs2: support the allocation of whole blocks of meta data entries nilfs2: improve inode allocation algorithm fs/nilfs2/alloc.c | 161 fs/nilfs2/alloc.h | 18 +- fs/nilfs2/ifile.c | 63 ++-- fs/nilfs2/ifile.h | 6 +- fs/nilfs2/inode.c | 6 +- fs/nilfs2/segment.c | 5 +- 6 files changed, 235 insertions(+), 24 deletions(-) I don't know whether this patchset is going in the right direction. .. we should first measure how bad the original naive allocator is in comparison with an elaborately designed allocator like this. But, I will add some comments anyway: 1) You must not use sizeof(struct nilfs_inode) to get the inode size. The size of on-disk inodes is variable and you have to use NILFS_MDT(ifile)->mi_entry_size to ensure compatibility. To get ipb (= number of inodes per block), you should use NILFS_MDT(ifile)->mi_entries_per_block. Please remove nilfs_ifile_inodes_per_block(). It's redundant. 2) __nilfs_palloc_prepare_alloc_entry() The argument block_size is confusing. Usually we use it for the size of a disk block. Please use a proper word such as alignment_size. 3) nilfs_palloc_find_available_slot_align32() This function seems to be violating endian compatibility.
The order of two 32-bit words in a 64-bit word on little endian architectures differs from that on big endian architectures. Sorry, the implementation looks correct (I misread earlier). Ignore this one. Regards, Ryusuke Konishi Having three different implementations looks like overkill to me at this time. It should be removed unless it makes a significant difference. 4) nilfs_cpu_to_leul() Adding this macro is not preferable. It depends on endianness. Did you look for a generic macro which does the same thing? Regards, Ryusuke Konishi
Re: [PATCH 0/2] nilfs2: improve inode allocation algorithm
On 2014-10-13 16:52, Ryusuke Konishi wrote: Hi, On Sun, 12 Oct 2014 12:38:21 +0200, Andreas Rohner wrote: Hi, The algorithm simply makes sure that after a directory inode there are a certain number of free slots available and the search for file inodes is started at their parent directory. I haven't had the time yet to do a full scale performance test of it, but my simple preliminary tests have shown that the allocation of inodes takes a little bit longer and the lookup is a little bit faster. My simple test just creates 1500 directories and after that creates 10 files in each directory. So more testing is definitely necessary, but I wanted to get some feedback about the design first. Is my code a step in the right direction? Best regards, Andreas Rohner Andreas Rohner (2): nilfs2: support the allocation of whole blocks of meta data entries nilfs2: improve inode allocation algorithm fs/nilfs2/alloc.c | 161 fs/nilfs2/alloc.h | 18 +- fs/nilfs2/ifile.c | 63 ++-- fs/nilfs2/ifile.h | 6 +- fs/nilfs2/inode.c | 6 +- fs/nilfs2/segment.c | 5 +- 6 files changed, 235 insertions(+), 24 deletions(-) I don't know whether this patchset is going in the right direction. .. we should first measure how bad the original naive allocator is in comparison with an elaborately designed allocator like this. But, I will add some comments anyway: I think the alignment creates a lot of overhead, because every directory uses up a whole block in the ifile. I could also create a simpler patch that only stores the last allocated inode number in struct nilfs_ifile_info and starts the search from there for the next allocation. Then I can test the three versions against each other in a large scale test. 1) You must not use sizeof(struct nilfs_inode) to get the inode size. The size of on-disk inodes is variable and you have to use NILFS_MDT(ifile)->mi_entry_size to ensure compatibility. To get ipb (= number of inodes per block), you should use NILFS_MDT(ifile)->mi_entries_per_block.
Please remove nilfs_ifile_inodes_per_block(). It's redundant. Agreed. 2) __nilfs_palloc_prepare_alloc_entry() The argument block_size is confusing. Usually we use it for the size of a disk block. Please use a proper word such as alignment_size. Yes, that's true, alignment_size sounds better. 3) nilfs_palloc_find_available_slot_align32() This function seems to be violating endian compatibility. The order of two 32-bit words in a 64-bit word on little endian architectures differs from that on big endian architectures. Having three different implementations looks like overkill to me at this time. It should be removed unless it makes a significant difference. 32 is the most common case (4096 block size and 128 inode size), so I thought it makes sense to optimize for it. But it is not necessary and it shouldn't make a big difference. 4) nilfs_cpu_to_leul() Adding this macro is not preferable. It depends on endianness. Did you look for a generic macro which does the same thing? There are only macros for specific bit lengths, as far as I know. But unsigned long varies between 32-bit and 64-bit systems. You could also implement it like this: #if BITS_PER_LONG == 64 #define nilfs_cpu_to_leul cpu_to_le64 #elif BITS_PER_LONG == 32 #define nilfs_cpu_to_leul cpu_to_le32 #else #error BITS_PER_LONG not defined #endif Best regards, Andreas Rohner
[PATCH 0/2] nilfs2: improve inode allocation algorithm
Hi, The algorithm simply makes sure that after a directory inode there are a certain number of free slots available and the search for file inodes is started at their parent directory. I haven't had the time yet to do a full scale performance test of it, but my simple preliminary tests have shown that the allocation of inodes takes a little bit longer and the lookup is a little bit faster. My simple test just creates 1500 directories and after that creates 10 files in each directory. So more testing is definitely necessary, but I wanted to get some feedback about the design first. Is my code a step in the right direction? Best regards, Andreas Rohner Andreas Rohner (2): nilfs2: support the allocation of whole blocks of meta data entries nilfs2: improve inode allocation algorithm fs/nilfs2/alloc.c | 161 fs/nilfs2/alloc.h | 18 +- fs/nilfs2/ifile.c | 63 ++-- fs/nilfs2/ifile.h | 6 +- fs/nilfs2/inode.c | 6 +- fs/nilfs2/segment.c | 5 +- 6 files changed, 235 insertions(+), 24 deletions(-) -- 2.1.2
[PATCH 2/2] nilfs2: improve inode allocation algorithm
The current inode allocation algorithm of NILFS2 does not use any information about the previous allocation or the parent directory of the inode. It simply searches for a free entry in the ifile, always starting from position 0. This patch introduces an improved allocation scheme. No changes to the on-disk format are necessary. First of all, the algorithm distinguishes between files and directories. File inodes start their search at the location of their parent directory, so that they will be allocated near their parent. If the file inode ends up in the same block as its parent, the common task of listing the directory contents (e.g.: ls) will be faster. Although there is no guarantee that any subsequent blocks after that will end up near the block with the directory on disk, there is still a possibility for performance improvement, because it can be expected that subsequent blocks are read ahead. Also the cleaner will write out blocks in the order of their file offset, so there is an increased likelihood that those blocks can be read in faster. Directory inodes are allocated aligned to block boundaries and with a certain number of empty slots following them. The empty slots increase the likelihood that files can be allocated in the same block as their parent. Furthermore the alignment improves performance, because unaligned locations do not have to be searched. The location of the last successful allocation of a directory is stored in memory and used as a starting point for the next allocation. This value is periodically reset to 0 to allow the algorithm to find previously deleted slots.
Signed-off-by: Andreas Rohner andreas.roh...@gmx.net --- fs/nilfs2/ifile.c | 63 - fs/nilfs2/ifile.h | 6 +++-- fs/nilfs2/inode.c | 6 +++-- fs/nilfs2/segment.c | 5 - 4 files changed, 69 insertions(+), 11 deletions(-) diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c index 6548c78..9fc83ce 100644 --- a/fs/nilfs2/ifile.c +++ b/fs/nilfs2/ifile.c @@ -33,20 +33,49 @@ * struct nilfs_ifile_info - on-memory private data of ifile * @mi: on-memory private data of metadata file * @palloc_cache: persistent object allocator cache of ifile + * @last_dir: ino of the last directory allocated */ struct nilfs_ifile_info { struct nilfs_mdt_info mi; struct nilfs_palloc_cache palloc_cache; + __u64 last_dir; }; static inline struct nilfs_ifile_info *NILFS_IFILE_I(struct inode *ifile) { return (struct nilfs_ifile_info *)NILFS_MDT(ifile); } +/** + * nilfs_ifile_last_dir_reset - set last_dir to 0 + * @ifile: ifile inode + * + * Description: The value of last_dir will increase with every new + * allocation of a directory, because it is used as the starting point of the + * search for a free entry in the ifile. It should be reset periodically to 0 + * (e.g.: every segctor timeout), so that previously deleted entries can be + * found. + */ +void nilfs_ifile_last_dir_reset(struct inode *ifile) +{ + NILFS_IFILE_I(ifile)->last_dir = 0; +} + +/** + * nilfs_ifile_inodes_per_block - get number of inodes in a block + * @ifile: ifile inode + * + * Return Value: The number of inodes that fit into a file system block.
+ */ +static inline int nilfs_ifile_inodes_per_block(struct inode *ifile) +{ + return (1 << ifile->i_blkbits) / sizeof(struct nilfs_inode); +} /** * nilfs_ifile_create_inode - create a new disk inode * @ifile: ifile inode + * @parent: inode number of the parent directory + * @mode: inode type of the newly created inode * @out_ino: pointer to a variable to store inode number * @out_bh: buffer_head contains newly allocated disk inode * @@ -62,17 +91,29 @@ static inline struct nilfs_ifile_info *NILFS_IFILE_I(struct inode *ifile) * * %-ENOSPC - No inode left. */ -int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino, +int nilfs_ifile_create_inode(struct inode *ifile, ino_t parent, +umode_t mode, ino_t *out_ino, struct buffer_head **out_bh) { struct nilfs_palloc_req req; - int ret; + int ret, ipb = nilfs_ifile_inodes_per_block(ifile); - req.pr_entry_nr = 0; /* 0 says find free inode from beginning of -a group. dull code!! */ req.pr_entry_bh = NULL; - ret = nilfs_palloc_prepare_alloc_entry(ifile, &req); + if (S_ISDIR(mode)) { + req.pr_entry_nr = NILFS_IFILE_I(ifile)->last_dir; + ret = __nilfs_palloc_prepare_alloc_entry(ifile, &req, ipb, + nilfs_palloc_entries_per_group(ifile) >> 2); + if (unlikely(ret == -ENOSPC)) { + /* fallback to normal allocation */ + req.pr_entry_nr = 0; + ret = nilfs_palloc_prepare_alloc_entry(ifile, &req); + } + } else { + req.pr_entry_nr = parent + 1; +
Re: improve inode allocation (was Re: [PATCH v2] nilfs2: improve the performance of fdatasync())
On 2014-09-23 18:35, Ryusuke Konishi wrote: On Tue, 23 Sep 2014 16:21:33 +0200, Andreas Rohner wrote: On 2014-09-23 14:47, Ryusuke Konishi wrote: By the way, if you are interested in improving this sort of bad implementation, please consider improving the inode allocator that we can see at nilfs_ifile_create_inode(). It always searches for a free inode from ino=0. It doesn't use the knowledge of the last allocated inode number (inumber) nor any locality of close-knit inodes such as a file and the directory that contains it. A simple strategy is to start finding a free inode from (inumber of the parent directory) + 1, but this may not work efficiently if the namespace has multiple active directories, and requires that inumbers of directories are suitably dispersed. On the other hand, it increases the number of disk reads and also increases the number of inode blocks to be written out if inodes are allocated too discretely. The optimal strategy may differ from that of other file systems because inode blocks are not allocated to static places in nilfs. For example, it may be better if we gather inodes of frequently accessed directories into the first valid inode block (on ifile) for nilfs. Sure I'll have a look at it, but this seems to be a hard problem. Since one inode has 128 bytes, a typical block of 4096 bytes contains 32 inodes. We could just allocate every directory inode into an empty block with 31 free slots. Then any subsequent file inode allocation would first search the 31 slots of the parent directory and if they are full, fall back to a search starting with ino 0. We can utilize several characteristics of metadata files for this problem: - It supports a read ahead feature. When ifile reads an inode block, we can expect that several subsequent blocks will be loaded to page cache in the background. - B-tree of NILFS is efficient to hold sparse blocks. This means that putting close-knit 32 * n inodes far from offset=0 is not so bad.
- ifile now can have private variables in nilfs_ifile_info (on-memory) struct. They are available to store context information of the allocator without compatibility issues. - We can also use the nilfs_inode_info struct of directories to store directory-based context of the allocator without losing compatibility. - The only caller of nilfs_ifile_create_inode() is nilfs_new_inode(), and this function knows the inode of the parent directory. Then the only problem is how to efficiently allocate the directories. We could do something similar to the Orlov allocator used by the ext2/3/4 file systems: 1. We spread first level directories. Every one gets a full bitmap block (or half a bitmap block) 2. For the other directories we will try to choose the bitmap block of the parent unless the number of free inodes is below a certain threshold. Within this bitmap block the directories should also spread out. File inodes will just start a linear search at the parent's inode if there is enough space left in the bitmap. This way if a directory has fewer than 32 files, all its inodes can be read in with one single block. If a directory has more than 32 files, its inodes will spill over into the slots of other directories. But I am not sure if this strategy would pay off. Yes, for small namespaces, the current implementation may be enough. We should first decide how we evaluate the effect of the algorithm. It may be the scalability of namespace. It will be very difficult to measure the time accurately. I would suggest simply counting the number of reads and writes on the device. This can be easily done: mkfs.nilfs2 /dev/sdb cat /proc/diskstats > rw_before.txt do_tests extract_kernel_sources ... find /mnt cat /proc/diskstats > rw_after.txt The algorithm with fewer writes and reads wins. I am still not convinced that all of this will pay off, but I will try a few things and see if it works.
br, Andreas Rohner
Re: improve inode allocation
On Wed, 24 Sep 2014 10:01:05 +0200, Andreas Rohner wrote: On 2014-09-23 18:35, Ryusuke Konishi wrote: On Tue, 23 Sep 2014 16:21:33 +0200, Andreas Rohner wrote: On 2014-09-23 14:47, Ryusuke Konishi wrote: By the way, if you are interested in improving this sort of bad implementation, please consider improving the inode allocator that we can see at nilfs_ifile_create_inode(). It always searches for a free inode from ino=0. It doesn't use the knowledge of the last allocated inode number (inumber) nor any locality of close-knit inodes such as a file and the directory that contains it. A simple strategy is to start finding a free inode from (inumber of the parent directory) + 1, but this may not work efficiently if the namespace has multiple active directories, and requires that inumbers of directories are suitably dispersed. On the other hand, it increases the number of disk reads and also increases the number of inode blocks to be written out if inodes are allocated too discretely. The optimal strategy may differ from that of other file systems because inode blocks are not allocated to static places in nilfs. For example, it may be better if we gather inodes of frequently accessed directories into the first valid inode block (on ifile) for nilfs. Sure I'll have a look at it, but this seems to be a hard problem. Since one inode has 128 bytes, a typical block of 4096 bytes contains 32 inodes. We could just allocate every directory inode into an empty block with 31 free slots. Then any subsequent file inode allocation would first search the 31 slots of the parent directory and if they are full, fall back to a search starting with ino 0. We can utilize several characteristics of metadata files for this problem: - It supports a read ahead feature. When ifile reads an inode block, we can expect that several subsequent blocks will be loaded to page cache in the background. - B-tree of NILFS is efficient to hold sparse blocks.
This means that putting close-knit 32 * n inodes far from offset=0 is not so bad. - ifile now can have private variables in nilfs_ifile_info (on-memory) struct. They are available to store context information of the allocator without compatibility issues. - We can also use the nilfs_inode_info struct of directories to store directory-based context of the allocator without losing compatibility. - The only caller of nilfs_ifile_create_inode() is nilfs_new_inode(), and this function knows the inode of the parent directory. Then the only problem is how to efficiently allocate the directories. We could do something similar to the Orlov allocator used by the ext2/3/4 file systems: 1. We spread first level directories. Every one gets a full bitmap block (or half a bitmap block) 2. For the other directories we will try to choose the bitmap block of the parent unless the number of free inodes is below a certain threshold. Within this bitmap block the directories should also spread out. In my understanding, the basic strategy of the Orlov allocator is to physically spread out subtrees over cylinder groups. This strategy is effective for ext2/ext3/ext4 to mitigate overheads which come from disk seeks. The strategy increases the locality of data and metadata and that of a parent directory and its child nodes, but the same thing isn't always true for nilfs because real block allocation of ifile and other files including directories is virtualized and doesn't reflect underlying physics (e.g. relation between LBA and seek time) as is. I think strategy 1 above doesn't make sense for nilfs, unlike ext2/3/4. File inodes will just start a linear search at the parent's inode if there is enough space left in the bitmap. This way if a directory has fewer than 32 files, all its inodes can be read in with one single block. If a directory has more than 32 files, its inodes will spill over into the slots of other directories. But I am not sure if this strategy would pay off.
Yes, for small namespaces, the current implementation may be enough. We should first decide how we evaluate the effect of the algorithm. It may be the scalability of namespace. It will be very difficult to measure the time accurately. I would suggest simply counting the number of reads and writes on the device. This can be easily done: mkfs.nilfs2 /dev/sdb cat /proc/diskstats > rw_before.txt do_tests extract_kernel_sources ... find /mnt cat /proc/diskstats > rw_after.txt The algorithm with fewer writes and reads wins. I am still not convinced that all of this will pay off, but I will try a few things and see if it works. How about measuring the following performance? (1) Latency of inode allocation and deletion in a file system which holds millions of inodes. (Can you estimate the cost of bitmap scan per block and its relation with the number of inodes?) (2) Time to traverse a real model directory tree such as a full UNIX
Re: improve inode allocation
On 2014-09-24 17:01, Ryusuke Konishi wrote: On Wed, 24 Sep 2014 10:01:05 +0200, Andreas Rohner wrote: On 2014-09-23 18:35, Ryusuke Konishi wrote: On Tue, 23 Sep 2014 16:21:33 +0200, Andreas Rohner wrote: On 2014-09-23 14:47, Ryusuke Konishi wrote: By the way, if you are interested in improving this sort of bad implementation, please consider improving the inode allocator that we can see at nilfs_ifile_create_inode(). It always searches for a free inode from ino=0. It doesn't use the knowledge of the last allocated inode number (inumber) nor any locality of close-knit inodes such as a file and the directory that contains it. A simple strategy is to start finding a free inode from (inumber of the parent directory) + 1, but this may not work efficiently if the namespace has multiple active directories, and requires that inumbers of directories are suitably dispersed. On the other hand, it increases the number of disk reads and also increases the number of inode blocks to be written out if inodes are allocated too discretely. The optimal strategy may differ from that of other file systems because inode blocks are not allocated to static places in nilfs. For example, it may be better if we gather inodes of frequently accessed directories into the first valid inode block (on ifile) for nilfs. Sure I'll have a look at it, but this seems to be a hard problem. Since one inode has 128 bytes, a typical block of 4096 bytes contains 32 inodes. We could just allocate every directory inode into an empty block with 31 free slots. Then any subsequent file inode allocation would first search the 31 slots of the parent directory and if they are full, fall back to a search starting with ino 0. We can utilize several characteristics of metadata files for this problem: - It supports a read ahead feature. When ifile reads an inode block, we can expect that several subsequent blocks will be loaded to page cache in the background. - B-tree of NILFS is efficient to hold sparse blocks.
This means that putting close-knit 32 * n inodes far from offset=0 is not so bad. - ifile now can have private variables in nilfs_ifile_info (on-memory) struct. They are available to store context information of the allocator without compatibility issues. - We can also use the nilfs_inode_info struct of directories to store directory-based context of the allocator without losing compatibility. - The only caller of nilfs_ifile_create_inode() is nilfs_new_inode(), and this function knows the inode of the parent directory. Then the only problem is how to efficiently allocate the directories. We could do something similar to the Orlov allocator used by the ext2/3/4 file systems: 1. We spread first level directories. Every one gets a full bitmap block (or half a bitmap block) 2. For the other directories we will try to choose the bitmap block of the parent unless the number of free inodes is below a certain threshold. Within this bitmap block the directories should also spread out. In my understanding, the basic strategy of the Orlov allocator is to physically spread out subtrees over cylinder groups. This strategy is effective for ext2/ext3/ext4 to mitigate overheads which come from disk seeks. The strategy increases the locality of data and metadata and that of a parent directory and its child nodes, but the same thing isn't always true for nilfs because real block allocation of ifile and other files including directories is virtualized and doesn't reflect underlying physics (e.g. relation between LBA and seek time) as is. I think strategy 1 above doesn't make sense for nilfs, unlike ext2/3/4. I know that it is a sparse file and the blocks can end up anywhere on disk, independent of the offset in the ifile. I just thought it may be a good idea to give top level directories more room to grow. But you are probably right and it makes no sense for nilfs... File inodes will just start a linear search at the parent's inode if there is enough space left in the bitmap.
This way if a directory has fewer than 32 files, all its inodes can be read in with one single block. If a directory has more than 32 files, its inodes will spill over into the slots of other directories. But I am not sure if this strategy would pay off. Yes, for small namespaces, the current implementation may be enough. We should first decide how we evaluate the effect of the algorithm. It may be the scalability of namespace. It will be very difficult to measure the time accurately. I would suggest simply counting the number of reads and writes on the device. This can be easily done: mkfs.nilfs2 /dev/sdb cat /proc/diskstats > rw_before.txt do_tests extract_kernel_sources ... find /mnt cat /proc/diskstats > rw_after.txt The algorithm with fewer writes and reads wins. I am still not convinced that all of this will pay off, but I will try a few things and see if it works. How about measuring the
improve inode allocation (was Re: [PATCH v2] nilfs2: improve the performance of fdatasync())
On Tue, 23 Sep 2014 16:21:33 +0200, Andreas Rohner wrote: On 2014-09-23 14:47, Ryusuke Konishi wrote: By the way, if you are interested in improving this sort of bad implementation, please consider improving the inode allocator that we can see at nilfs_ifile_create_inode(). It always searches for a free inode from ino=0. It doesn't use the knowledge of the last allocated inode number (inumber) nor any locality of close-knit inodes such as a file and the directory that contains it. A simple strategy is to start finding a free inode from (inumber of the parent directory) + 1, but this may not work efficiently if the namespace has multiple active directories, and requires that inumbers of directories are suitably dispersed. On the other hand, it increases the number of disk reads and also increases the number of inode blocks to be written out if inodes are allocated too discretely. The optimal strategy may differ from that of other file systems because inode blocks are not allocated to static places in nilfs. For example, it may be better if we gather inodes of frequently accessed directories into the first valid inode block (on ifile) for nilfs. Sure I'll have a look at it, but this seems to be a hard problem. Since one inode has 128 bytes, a typical block of 4096 bytes contains 32 inodes. We could just allocate every directory inode into an empty block with 31 free slots. Then any subsequent file inode allocation would first search the 31 slots of the parent directory and if they are full, fall back to a search starting with ino 0. We can utilize several characteristics of metadata files for this problem: - It supports a read ahead feature. When ifile reads an inode block, we can expect that several subsequent blocks will be loaded to page cache in the background. - B-tree of NILFS is efficient to hold sparse blocks. This means that putting close-knit 32 * n inodes far from offset=0 is not so bad. - ifile now can have private variables in nilfs_ifile_info (on-memory) struct.
They are available to store context information of the allocator without compatibility issues. - We can also use the nilfs_inode_info struct of directories to store directory-based context of the allocator without losing compatibility. - The only caller of nilfs_ifile_create_inode() is nilfs_new_inode(), and this function knows the inode of the parent directory. This way if a directory has fewer than 32 files, all its inodes can be read in with one single block. If a directory has more than 32 files, its inodes will spill over into the slots of other directories. But I am not sure if this strategy would pay off. Yes, for small namespaces, the current implementation may be enough. We should first decide how we evaluate the effect of the algorithm. It may be the scalability of namespace. Regards, Ryusuke Konishi