Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Gang, On 2017/11/28 16:32, Gang He wrote: > Hi Alex, > > >> Hi Gang, >> >> On 2017/11/28 15:38, Gang He wrote: >>> Hi Alex, >>> >>> >> Hi Gang, On 2017/11/28 13:33, Gang He wrote: > Hello Alex, > > >> Hi Gang, >> >> On 2017/11/27 17:46, Gang He wrote: >>> Add ocfs2_overwrite_io function, which is used to judge if >>> overwrite allocated blocks, otherwise, the write will bring extra >>> block allocation overhead. >>> >>> Signed-off-by: Gang He >>> --- >>> fs/ocfs2/extent_map.c | 67 >> +++ >>> fs/ocfs2/extent_map.h | 3 +++ >>> 2 files changed, 70 insertions(+) >>> >>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >>> index e4719e0..98bf325 100644 >>> --- a/fs/ocfs2/extent_map.c >>> +++ b/fs/ocfs2/extent_map.c >>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct >> fiemap_extent_info *fieinfo, >>> return ret; >>> } >>> >>> +/* Is IO overwriting allocated blocks? */ >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait) >>> +{ >>> + int ret = 0, is_last; >>> + u32 mapping_end, cpos; >>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> + struct buffer_head *di_bh = NULL; >>> + struct ocfs2_extent_rec rec; >>> + >>> + if (wait) >>> + ret = ocfs2_inode_lock(inode, &di_bh, 0); >>> + else >>> + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >>> + if (ret) >>> + goto out; >>> + >>> + if (wait) >>> + down_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + else { >>> + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >>> + ret = -EAGAIN; >>> + goto out_unlock1; >>> + } >>> + } >>> + >>> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >>> + ((map_start + map_len) <= i_size_read(inode))) >>> + goto out_unlock2; >>> + >>> + cpos = map_start >> osb->s_clustersize_bits; >>> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >>> + map_start + map_len); >>> + is_last = 0; >>> + while (cpos < mapping_end && !is_last) { >>> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >>> +NULL, &rec, &is_last); >>> + if (ret) { >>> + mlog_errno(ret); >>> + goto out_unlock2; >>> + } >>> + >>> + if (rec.e_blkno == 0ULL) >>> + break; >> I think here the blocks is not overwrite, because the hold is found and >> the >> blocks >> should be allocated. > If the rec.e_blkno == NULL, this means there is a hole. > The file hole means that these blocks are not allocated, it does not like unwritten block. > The unwritten blocks means that these blocks are allocated, but still > have not been unwritten. > If we break the loop when we find the hold, out of this function we will allocate the blocks in ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_g et_block() ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT'; >>> Yes, then we need to check if this is a overwrite before doing direct-io. >>> >> >> I mean here we should return 0 instead of break and we should immediately >> return -EAGAIN >> to upper apps, otherwise, some block allocation will be happen, which >> violates the >> semantics of 'IOCB_NOWAIT'. > Before we do a direct-io, I need to check if this is a overwrite allocated > blocks IO. > If not, we will return -EAGAIN in 'IOCB_NOWAIT' mode. this should not > trigger any block allocation. > I am not sure if we understand your concern totally. > Yes, your description is correct. So we should return 0 instead of break when we find the hold in ocfs2_overwrite_io(); > Thanks > Gang > >> >> Thanks, >> Alex >> BTW, should we consider the down_write() and ocfs2_inode_lock() in ocfs2_dio_wr_get_block() when the flag 'IOCB_NOWAIT' is set; >>> I think that we should not consider that layer lock, otherwise, the code >> change will become more and more complex and big. >>> I also refer to ext4 file system code change for this >> feature(728fbc0e10b7f3ce2ee043b32e3453fd5201c055), they did not do any >> change >> in that layer. >>> >> >> OK. >> >>> Thanks >>> Gang >>> >>> + >>> + if (rec.e_flags & OCFS2_E
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
On 17/11/28 16:54, Gang He wrote: > Hi Joseph, > > > >> >> On 17/11/28 15:24, Gang He wrote: >>> Hello Joseph, >>> >>> >> >>> On 17/11/28 11:35, Gang He wrote: > Hello Joseph, > > >> Hi Gang, >> >> On 17/11/27 17:46, Gang He wrote: >>> Add ocfs2_overwrite_io function, which is used to judge if >>> overwrite allocated blocks, otherwise, the write will bring extra >>> block allocation overhead. >>> >>> Signed-off-by: Gang He >>> --- >>> fs/ocfs2/extent_map.c | 67 >> +++ >>> fs/ocfs2/extent_map.h | 3 +++ >>> 2 files changed, 70 insertions(+) >>> >>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >>> index e4719e0..98bf325 100644 >>> --- a/fs/ocfs2/extent_map.c >>> +++ b/fs/ocfs2/extent_map.c >>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct >> fiemap_extent_info *fieinfo, >>> return ret; >>> } >>> >>> +/* Is IO overwriting allocated blocks? */ >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait) >>> +{ >>> + int ret = 0, is_last; >>> + u32 mapping_end, cpos; >>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> + struct buffer_head *di_bh = NULL; >>> + struct ocfs2_extent_rec rec; >>> + >>> + if (wait) >>> + ret = ocfs2_inode_lock(inode, &di_bh, 0); >>> + else >>> + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >>> + if (ret) >>> + goto out; >>> + >>> + if (wait) >>> + down_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + else { >>> + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >>> + ret = -EAGAIN; >>> + goto out_unlock1; >>> + } >>> + } >>> + >>> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >>> + ((map_start + map_len) <= i_size_read(inode))) >>> + goto out_unlock2; >>> + >>> + cpos = map_start >> osb->s_clustersize_bits; >>> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >>> + map_start + map_len); >>> + is_last = 0; >>> + while (cpos < mapping_end && !is_last) { >>> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >>> +NULL, &rec, &is_last); >>> + if (ret) { >>> + mlog_errno(ret); >>> + goto out_unlock2; >>> + } >>> + >>> + if (rec.e_blkno == 0ULL) >>> + break; >>> + >>> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >>> + break; >>> + >>> + cpos = le32_to_cpu(rec.e_cpos) + >>> + le16_to_cpu(rec.e_leaf_clusters); >>> + } >>> + >>> + if (cpos < mapping_end) >>> + ret = 1; >>> + >>> +out_unlock2: >>> + brelse(di_bh); >>> + >>> + up_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + >>> +out_unlock1: >> Should brelse(di_bh) be here? > If the code jumps to out_unlock1 directly, the di_bh pointer should be > NULL, >> it is not necessary to release. > Umm... No, once going out here, we have already taken inode lock. So di_bh should be released. >>> Sorry, you are right. >>> >> >>> + ocfs2_inode_unlock(inode, 0); >>> + >>> +out: >>> + return (ret ? 0 : 1); >> I don't think EAGAIN and other error code can be handled the same. We >> have to distinguish them. > Ok, I think we can add one line log to report the error in case the error > is >> not EAGAIN. > My point is, there is no need to try again in several cases, e.g. EROFS returned by ocfs2_get_clusters_nocache. >>> In this function ocfs2_overwrite_io() only can return True(1) or False(0), >> then I think we can only give a error print before return true/false. >>> It is not necessary to return another value, but should let the user know >> any possible error message. >>> This is because you just ignore the error and convert it to 0 or 1. >> But in your next patch, if !ocfs2_overwrite_io(), it will return EGAIN >> to upper layer and let it try again. >> But in some cases, e.g. EROFS, trying again is meaningless. That's why >> we can't simply return 0 or 1 here. Also we have to distinguish the >> error code in the next patch. > I think that we have to use the return value if we want to propagate the > errorno to the above. > I will
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Joseph, >>> > > On 17/11/28 15:24, Gang He wrote: >> Hello Joseph, >> >> > >> >>> >>> On 17/11/28 11:35, Gang He wrote: Hello Joseph, >>> > Hi Gang, > > On 17/11/27 17:46, Gang He wrote: >> Add ocfs2_overwrite_io function, which is used to judge if >> overwrite allocated blocks, otherwise, the write will bring extra >> block allocation overhead. >> >> Signed-off-by: Gang He >> --- >> fs/ocfs2/extent_map.c | 67 > +++ >> fs/ocfs2/extent_map.h | 3 +++ >> 2 files changed, 70 insertions(+) >> >> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >> index e4719e0..98bf325 100644 >> --- a/fs/ocfs2/extent_map.c >> +++ b/fs/ocfs2/extent_map.c >> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, >> return ret; >> } >> >> +/* Is IO overwriting allocated blocks? */ >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait) >> +{ >> +int ret = 0, is_last; >> +u32 mapping_end, cpos; >> +struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> +struct buffer_head *di_bh = NULL; >> +struct ocfs2_extent_rec rec; >> + >> +if (wait) >> +ret = ocfs2_inode_lock(inode, &di_bh, 0); >> +else >> +ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >> +if (ret) >> +goto out; >> + >> +if (wait) >> +down_read(&OCFS2_I(inode)->ip_alloc_sem); >> +else { >> +if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >> +ret = -EAGAIN; >> +goto out_unlock1; >> +} >> +} >> + >> +if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >> + ((map_start + map_len) <= i_size_read(inode))) >> +goto out_unlock2; >> + >> +cpos = map_start >> osb->s_clustersize_bits; >> +mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >> + map_start + map_len); >> +is_last = 0; >> +while (cpos < mapping_end && !is_last) { >> +ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >> + NULL, &rec, &is_last); >> +if (ret) { >> +mlog_errno(ret); >> +goto out_unlock2; >> +} >> + >> +if (rec.e_blkno == 0ULL) >> +break; >> + >> +if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >> +break; >> + >> +cpos = le32_to_cpu(rec.e_cpos) + >> +le16_to_cpu(rec.e_leaf_clusters); >> +} >> + >> +if (cpos < mapping_end) >> +ret = 1; >> + >> +out_unlock2: >> +brelse(di_bh); >> + >> +up_read(&OCFS2_I(inode)->ip_alloc_sem); >> + >> +out_unlock1: > Should brelse(di_bh) be here? If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, > >>> it is not necessary to release. >>> Umm... No, once going out here, we have already taken inode lock. So >>> di_bh should be released. >> Sorry, you are right. >> >>> > >> +ocfs2_inode_unlock(inode, 0); >> + >> +out: >> +return (ret ? 0 : 1); > I don't think EAGAIN and other error code can be handled the same. We > have to distinguish them. Ok, I think we can add one line log to report the error in case the error is > >>> not EAGAIN. >>> My point is, there is no need to try again in several cases, e.g. EROFS >>> returned by ocfs2_get_clusters_nocache. >> In this function ocfs2_overwrite_io() only can return True(1) or False(0), > then I think we can only give a error print before return true/false. >> It is not necessary to return another value, but should let the user know > any possible error message. >>This is because you just ignore the error and convert it to 0 or 1. > But in your next patch, if !ocfs2_overwrite_io(), it will return EGAIN > to upper layer and let it try again. > But in some cases, e.g. EROFS, trying again is meaningless. That's why > we can't simply return 0 or 1 here. Also we have to distinguish the > error code in the next patch. I think that we have to use the return value if we want to propagate the errorno to the above. I will change the return value meanings of ocfs2_overwrite_io() function. return 0 means this is a overwrite allocated block IO. return
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
On 17/11/28 15:24, Gang He wrote: > Hello Joseph, > > > >> >> On 17/11/28 11:35, Gang He wrote: >>> Hello Joseph, >>> >>> >> Hi Gang, On 17/11/27 17:46, Gang He wrote: > Add ocfs2_overwrite_io function, which is used to judge if > overwrite allocated blocks, otherwise, the write will bring extra > block allocation overhead. > > Signed-off-by: Gang He > --- > fs/ocfs2/extent_map.c | 67 +++ > fs/ocfs2/extent_map.h | 3 +++ > 2 files changed, 70 insertions(+) > > diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c > index e4719e0..98bf325 100644 > --- a/fs/ocfs2/extent_map.c > +++ b/fs/ocfs2/extent_map.c > @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, > return ret; > } > > +/* Is IO overwriting allocated blocks? */ > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait) > +{ > + int ret = 0, is_last; > + u32 mapping_end, cpos; > + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > + struct buffer_head *di_bh = NULL; > + struct ocfs2_extent_rec rec; > + > + if (wait) > + ret = ocfs2_inode_lock(inode, &di_bh, 0); > + else > + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); > + if (ret) > + goto out; > + > + if (wait) > + down_read(&OCFS2_I(inode)->ip_alloc_sem); > + else { > + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { > + ret = -EAGAIN; > + goto out_unlock1; > + } > + } > + > + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && > +((map_start + map_len) <= i_size_read(inode))) > + goto out_unlock2; > + > + cpos = map_start >> osb->s_clustersize_bits; > + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, > +map_start + map_len); > + is_last = 0; > + while (cpos < mapping_end && !is_last) { > + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, > + NULL, &rec, &is_last); > + if (ret) { > + mlog_errno(ret); > + goto out_unlock2; > + } > + > + if (rec.e_blkno == 0ULL) > + break; > + > + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) > + break; > + > + cpos = le32_to_cpu(rec.e_cpos) + > + le16_to_cpu(rec.e_leaf_clusters); > + } > + > + if (cpos < mapping_end) > + ret = 1; > + > +out_unlock2: > + brelse(di_bh); > + > + up_read(&OCFS2_I(inode)->ip_alloc_sem); > + > +out_unlock1: Should brelse(di_bh) be here? >>> If the code jumps to out_unlock1 directly, the di_bh pointer should be >>> NULL, >> it is not necessary to release. >>> >> Umm... No, once going out here, we have already taken inode lock. So >> di_bh should be released. > Sorry, you are right. > >> > + ocfs2_inode_unlock(inode, 0); > + > +out: > + return (ret ? 0 : 1); I don't think EAGAIN and other error code can be handled the same. We have to distinguish them. >>> Ok, I think we can add one line log to report the error in case the error >>> is >> not EAGAIN. >>> >> My point is, there is no need to try again in several cases, e.g. EROFS >> returned by ocfs2_get_clusters_nocache. > In this function ocfs2_overwrite_io() only can return True(1) or False(0), > then I think we can only give a error print before return true/false. > It is not necessary to return another value, but should let the user know any > possible error message. >This is because you just ignore the error and convert it to 0 or 1. But in your next patch, if !ocfs2_overwrite_io(), it will return EGAIN to upper layer and let it try again. But in some cases, e.g. EROFS, trying again is meaningless. That's why we can't simply return 0 or 1 here. Also we have to distinguish the error code in the next patch. > Thanks > Gang > >> Thanks, Joseph > +} > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence) > { > struct inode *inode = file->f_mapping->host; > diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h > index 67ea57d..fd9e86a 100644 > --- a/fs/ocfs2/extent_map.h > +++ b/fs/ocfs2/extent_map.h > @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, > u64 v_blkno, u64 *p_blkno, > int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >u64 map_start, u64 map_len); > > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_l
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Alex, >>> > Hi Gang, > > On 2017/11/28 15:38, Gang He wrote: >> Hi Alex, >> >> > >>> Hi Gang, >>> >>> On 2017/11/28 13:33, Gang He wrote: Hello Alex, >>> > Hi Gang, > > On 2017/11/27 17:46, Gang He wrote: >> Add ocfs2_overwrite_io function, which is used to judge if >> overwrite allocated blocks, otherwise, the write will bring extra >> block allocation overhead. >> >> Signed-off-by: Gang He >> --- >> fs/ocfs2/extent_map.c | 67 > +++ >> fs/ocfs2/extent_map.h | 3 +++ >> 2 files changed, 70 insertions(+) >> >> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >> index e4719e0..98bf325 100644 >> --- a/fs/ocfs2/extent_map.c >> +++ b/fs/ocfs2/extent_map.c >> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, >> return ret; >> } >> >> +/* Is IO overwriting allocated blocks? */ >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait) >> +{ >> +int ret = 0, is_last; >> +u32 mapping_end, cpos; >> +struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> +struct buffer_head *di_bh = NULL; >> +struct ocfs2_extent_rec rec; >> + >> +if (wait) >> +ret = ocfs2_inode_lock(inode, &di_bh, 0); >> +else >> +ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >> +if (ret) >> +goto out; >> + >> +if (wait) >> +down_read(&OCFS2_I(inode)->ip_alloc_sem); >> +else { >> +if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >> +ret = -EAGAIN; >> +goto out_unlock1; >> +} >> +} >> + >> +if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >> + ((map_start + map_len) <= i_size_read(inode))) >> +goto out_unlock2; >> + >> +cpos = map_start >> osb->s_clustersize_bits; >> +mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >> + map_start + map_len); >> +is_last = 0; >> +while (cpos < mapping_end && !is_last) { >> +ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >> + NULL, &rec, &is_last); >> +if (ret) { >> +mlog_errno(ret); >> +goto out_unlock2; >> +} >> + >> +if (rec.e_blkno == 0ULL) >> +break; > I think here the blocks is not overwrite, because the hold is found and > the > blocks > should be allocated. If the rec.e_blkno == NULL, this means there is a hole. The file hole means that these blocks are not allocated, it does not like >>> unwritten block. The unwritten blocks means that these blocks are allocated, but still have >>> not been unwritten. >>> If we break the loop when we find the hold, out of this function we will >>> allocate the blocks in >>> ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_g >>> et_block() >>> ->ocfs2_write_begin_nolock. Does this violate the semantics of >>> 'IOCB_NOWAIT'; >> Yes, then we need to check if this is a overwrite before doing direct-io. >> > > I mean here we should return 0 instead of break and we should immediately > return -EAGAIN > to upper apps, otherwise, some block allocation will be happen, which > violates the > semantics of 'IOCB_NOWAIT'. Before we do a direct-io, I need to check if this is a overwrite allocated blocks IO. If not, we will return -EAGAIN in 'IOCB_NOWAIT' mode. this should not trigger any block allocation. I am not sure if we understand your concern totally. Thanks Gang > > Thanks, > Alex > >>> >>> BTW, should we consider the down_write() and ocfs2_inode_lock() in >>> ocfs2_dio_wr_get_block() when >>> the flag 'IOCB_NOWAIT' is set; >> I think that we should not consider that layer lock, otherwise, the code > change will become more and more complex and big. >> I also refer to ext4 file system code change for this > feature(728fbc0e10b7f3ce2ee043b32e3453fd5201c055), they did not do any change > in that layer. >> > > OK. > >> Thanks >> Gang >> >>> >> + >> +if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >> +break; >> + >> +cpos = le32_to_cpu(rec.e_cpos) + >> +le16_to_cpu(rec.e_leaf_clusters); >> +} >> + >> +if (cpos < mapping_end) >> +ret = 1
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Gang, On 2017/11/28 15:38, Gang He wrote: > Hi Alex, > > >> Hi Gang, >> >> On 2017/11/28 13:33, Gang He wrote: >>> Hello Alex, >>> >>> >> Hi Gang, On 2017/11/27 17:46, Gang He wrote: > Add ocfs2_overwrite_io function, which is used to judge if > overwrite allocated blocks, otherwise, the write will bring extra > block allocation overhead. > > Signed-off-by: Gang He > --- > fs/ocfs2/extent_map.c | 67 +++ > fs/ocfs2/extent_map.h | 3 +++ > 2 files changed, 70 insertions(+) > > diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c > index e4719e0..98bf325 100644 > --- a/fs/ocfs2/extent_map.c > +++ b/fs/ocfs2/extent_map.c > @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, > return ret; > } > > +/* Is IO overwriting allocated blocks? */ > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait) > +{ > + int ret = 0, is_last; > + u32 mapping_end, cpos; > + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > + struct buffer_head *di_bh = NULL; > + struct ocfs2_extent_rec rec; > + > + if (wait) > + ret = ocfs2_inode_lock(inode, &di_bh, 0); > + else > + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); > + if (ret) > + goto out; > + > + if (wait) > + down_read(&OCFS2_I(inode)->ip_alloc_sem); > + else { > + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { > + ret = -EAGAIN; > + goto out_unlock1; > + } > + } > + > + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && > +((map_start + map_len) <= i_size_read(inode))) > + goto out_unlock2; > + > + cpos = map_start >> osb->s_clustersize_bits; > + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, > +map_start + map_len); > + is_last = 0; > + while (cpos < mapping_end && !is_last) { > + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, > + NULL, &rec, &is_last); > + if (ret) { > + mlog_errno(ret); > + goto out_unlock2; > + } > + > + if (rec.e_blkno == 0ULL) > + break; I think here the blocks is not overwrite, because the hold is found and the blocks should be allocated. >>> If the rec.e_blkno == NULL, this means there is a hole. >>> The file hole means that these blocks are not allocated, it does not like >> unwritten block. >>> The unwritten blocks means that these blocks are allocated, but still have >> not been unwritten. >>> >> If we break the loop when we find the hold, out of this function we will >> allocate the blocks in >> ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_g >> et_block() >> ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT'; > Yes, then we need to check if this is a overwrite before doing direct-io. > I mean here we should return 0 instead of break and we should immediately return -EAGAIN to upper apps, otherwise, some block allocation will be happen, which violates the semantics of 'IOCB_NOWAIT'. Thanks, Alex >> >> BTW, should we consider the down_write() and ocfs2_inode_lock() in >> ocfs2_dio_wr_get_block() when >> the flag 'IOCB_NOWAIT' is set; > I think that we should not consider that layer lock, otherwise, the code > change will become more and more complex and big. > I also refer to ext4 file system code change for this > feature(728fbc0e10b7f3ce2ee043b32e3453fd5201c055), they did not do any change > in that layer. > OK. > Thanks > Gang > >> > + > + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) > + break; > + > + cpos = le32_to_cpu(rec.e_cpos) + > + le16_to_cpu(rec.e_leaf_clusters); > + } > + > + if (cpos < mapping_end) > + ret = 1; > + > +out_unlock2: I think the 'out_up_read' is more readable than the 'out_unlock2' . >>> Ok, I will use more readable tag here. > + brelse(di_bh); > + > + up_read(&OCFS2_I(inode)->ip_alloc_sem); > + > +out_unlock1: We should release buffer head here. > + ocfs2_inode_unlock(inode, 0); > + > +out: > + return (ret ? 0 : 1); > +} > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence) > { > struct inode *inode = file->f_mapping->host; > diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h > index 67ea57d..fd9e86a 100644 > --- a/fs/ocfs2/extent_map.h > +++ b/fs/ocfs2/exten
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Alex, >>> > Hi Gang, > > On 2017/11/28 13:33, Gang He wrote: >> Hello Alex, >> >> > >>> Hi Gang, >>> >>> On 2017/11/27 17:46, Gang He wrote: Add ocfs2_overwrite_io function, which is used to judge if overwrite allocated blocks, otherwise, the write will bring extra block allocation overhead. Signed-off-by: Gang He --- fs/ocfs2/extent_map.c | 67 >>> +++ fs/ocfs2/extent_map.h | 3 +++ 2 files changed, 70 insertions(+) diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c index e4719e0..98bf325 100644 --- a/fs/ocfs2/extent_map.c +++ b/fs/ocfs2/extent_map.c @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct >>> fiemap_extent_info *fieinfo, return ret; } +/* Is IO overwriting allocated blocks? */ +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, + int wait) +{ + int ret = 0, is_last; + u32 mapping_end, cpos; + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); + struct buffer_head *di_bh = NULL; + struct ocfs2_extent_rec rec; + + if (wait) + ret = ocfs2_inode_lock(inode, &di_bh, 0); + else + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); + if (ret) + goto out; + + if (wait) + down_read(&OCFS2_I(inode)->ip_alloc_sem); + else { + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { + ret = -EAGAIN; + goto out_unlock1; + } + } + + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && + ((map_start + map_len) <= i_size_read(inode))) + goto out_unlock2; + + cpos = map_start >> osb->s_clustersize_bits; + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, + map_start + map_len); + is_last = 0; + while (cpos < mapping_end && !is_last) { + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, + NULL, &rec, &is_last); + if (ret) { + mlog_errno(ret); + goto out_unlock2; + } + + if (rec.e_blkno == 0ULL) + break; >>> I think here the blocks is not overwrite, because the hold is found and the >>> blocks >>> should be allocated. >> If the rec.e_blkno == NULL, this means there is a hole. >> The file hole means that these blocks are not allocated, it does not like > unwritten block. >> The unwritten blocks means that these blocks are allocated, but still have > not been unwritten. >> > If we break the loop when we find the hold, out of this function we will > allocate the blocks in > ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_g > et_block() > ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT'; Yes, then we need to check if this is a overwrite before doing direct-io. > > BTW, should we consider the down_write() and ocfs2_inode_lock() in > ocfs2_dio_wr_get_block() when > the flag 'IOCB_NOWAIT' is set; I think that we should not consider that layer lock, otherwise, the code change will become more and more complex and big. I also refer to ext4 file system code change for this feature(728fbc0e10b7f3ce2ee043b32e3453fd5201c055), they did not do any change in that layer. Thanks Gang > + + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) + break; + + cpos = le32_to_cpu(rec.e_cpos) + + le16_to_cpu(rec.e_leaf_clusters); + } + + if (cpos < mapping_end) + ret = 1; + +out_unlock2: >>> >>> I think the 'out_up_read' is more readable than the 'out_unlock2' . >> Ok, I will use more readable tag here. >>> + brelse(di_bh); + + up_read(&OCFS2_I(inode)->ip_alloc_sem); + +out_unlock1: >>> >>> We should release buffer head here. >>> + ocfs2_inode_unlock(inode, 0); + +out: + return (ret ? 0 : 1); +} + int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >>> whence) { struct inode *inode = file->f_mapping->host; diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h index 67ea57d..fd9e86a 100644 --- a/fs/ocfs2/extent_map.h +++ b/fs/ocfs2/extent_map.h @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 >>> v_blkno, u64 *p_blkno, int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 map_start, u64 map_len); +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, + int wait); + int o
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hello Joseph, >>> > > On 17/11/28 11:35, Gang He wrote: >> Hello Joseph, >> >> > >>> Hi Gang, >>> >>> On 17/11/27 17:46, Gang He wrote: Add ocfs2_overwrite_io function, which is used to judge if overwrite allocated blocks, otherwise, the write will bring extra block allocation overhead. Signed-off-by: Gang He --- fs/ocfs2/extent_map.c | 67 >>> +++ fs/ocfs2/extent_map.h | 3 +++ 2 files changed, 70 insertions(+) diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c index e4719e0..98bf325 100644 --- a/fs/ocfs2/extent_map.c +++ b/fs/ocfs2/extent_map.c @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct >>> fiemap_extent_info *fieinfo, return ret; } +/* Is IO overwriting allocated blocks? */ +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, + int wait) +{ + int ret = 0, is_last; + u32 mapping_end, cpos; + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); + struct buffer_head *di_bh = NULL; + struct ocfs2_extent_rec rec; + + if (wait) + ret = ocfs2_inode_lock(inode, &di_bh, 0); + else + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); + if (ret) + goto out; + + if (wait) + down_read(&OCFS2_I(inode)->ip_alloc_sem); + else { + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { + ret = -EAGAIN; + goto out_unlock1; + } + } + + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && + ((map_start + map_len) <= i_size_read(inode))) + goto out_unlock2; + + cpos = map_start >> osb->s_clustersize_bits; + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, + map_start + map_len); + is_last = 0; + while (cpos < mapping_end && !is_last) { + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, + NULL, &rec, &is_last); + if (ret) { + mlog_errno(ret); + goto out_unlock2; + } + + if (rec.e_blkno == 0ULL) + break; + + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) + break; + + cpos = le32_to_cpu(rec.e_cpos) + + le16_to_cpu(rec.e_leaf_clusters); + } + + if (cpos < mapping_end) + ret = 1; + +out_unlock2: + brelse(di_bh); + + up_read(&OCFS2_I(inode)->ip_alloc_sem); + +out_unlock1: >>> Should brelse(di_bh) be here? >> If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, > it is not necessary to release. >> > Umm... No, once going out here, we have already taken inode lock. So > di_bh should be released. Sorry, you are right. > >>> + ocfs2_inode_unlock(inode, 0); + +out: + return (ret ? 0 : 1); >>> I don't think EAGAIN and other error code can be handled the same. We >>> have to distinguish them. >> Ok, I think we can add one line log to report the error in case the error is > not EAGAIN. >> > My point is, there is no need to try again in several cases, e.g. EROFS > returned by ocfs2_get_clusters_nocache. In this function ocfs2_overwrite_io() only can return True(1) or False(0), then I think we can only give a error print before return true/false. It is not necessary to return another value, but should let the user know any possible error message. Thanks Gang > >>> >>> Thanks, >>> Joseph >>> +} + int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >>> whence) { struct inode *inode = file->f_mapping->host; diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h index 67ea57d..fd9e86a 100644 --- a/fs/ocfs2/extent_map.h +++ b/fs/ocfs2/extent_map.h @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 >>> v_blkno, u64 *p_blkno, int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 map_start, u64 map_len); +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, + int wait); + int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >>> origin); int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >> ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
On 2017/11/28 13:44, Gang He wrote: > Hi Changwei, > > >> Hi, >> Gang >> >> On 2017/11/27 17:48, Gang He wrote: >>> Add ocfs2_overwrite_io function, which is used to judge if >>> overwrite allocated blocks, otherwise, the write will bring extra >>> block allocation overhead. >>> >> >> Can you elaborate how this overhead is introduced? >> Forgive me, I don't figure it. > If the blocks have been allocated, we just write these block directly. > If these blocks have not been allocated, that means we need to allocate these > block firstly before write, > this allocation will bring the IO invoking be blocked, if the upper > application does not want take this kind of overhead, > he can pass a nowait flag to avoid and return immediately with a -EAGAIN > error. Thanks for your answer and contribution. This makes sense. Changwei > > Thanks > Gang > >> >> Thanks, >> Changwei >> >>> Signed-off-by: Gang He >>> --- >>>fs/ocfs2/extent_map.c | 67 >> +++ >>>fs/ocfs2/extent_map.h | 3 +++ >>>2 files changed, 70 insertions(+) >>> >>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >>> index e4719e0..98bf325 100644 >>> --- a/fs/ocfs2/extent_map.c >>> +++ b/fs/ocfs2/extent_map.c >>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct >> fiemap_extent_info *fieinfo, >>> return ret; >>>} >>> >>> +/* Is IO overwriting allocated blocks? */ >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait) >>> +{ >>> + int ret = 0, is_last; >>> + u32 mapping_end, cpos; >>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> + struct buffer_head *di_bh = NULL; >>> + struct ocfs2_extent_rec rec; >>> + >>> + if (wait) >>> + ret = ocfs2_inode_lock(inode, &di_bh, 0); >>> + else >>> + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >>> + if (ret) >>> + goto out; >>> + >>> + if (wait) >>> + down_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + else { >>> + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >>> + ret = -EAGAIN; >>> + goto out_unlock1; >>> + } >>> + } >>> + >>> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >>> + ((map_start + map_len) <= i_size_read(inode))) >>> + goto out_unlock2; >>> + >>> + cpos = map_start >> osb->s_clustersize_bits; >>> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >>> + map_start + map_len); >>> + is_last = 0; >>> + while (cpos < mapping_end && !is_last) { >>> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >>> +NULL, &rec, &is_last); >>> + if (ret) { >>> + mlog_errno(ret); >>> + goto out_unlock2; >>> + } >>> + >>> + if (rec.e_blkno == 0ULL) >>> + break; >>> + >>> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >>> + break; >>> + >>> + cpos = le32_to_cpu(rec.e_cpos) + >>> + le16_to_cpu(rec.e_leaf_clusters); >>> + } >>> + >>> + if (cpos < mapping_end) >>> + ret = 1; >>> + >>> +out_unlock2: >>> + brelse(di_bh); >>> + >>> + up_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + >>> +out_unlock1: >>> + ocfs2_inode_unlock(inode, 0); >>> + >>> +out: >>> + return (ret ? 0 : 1); >>> +} >>> + >>>int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >> whence) >>>{ >>> struct inode *inode = file->f_mapping->host; >>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >>> index 67ea57d..fd9e86a 100644 >>> --- a/fs/ocfs2/extent_map.h >>> +++ b/fs/ocfs2/extent_map.h >>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 >> v_blkno, u64 *p_blkno, >>>int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >>> u64 map_start, u64 map_len); >>> >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait); >>> + >>>int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >> origin); >>> >>>int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >>> > > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
On 17/11/28 11:35, Gang He wrote: > Hello Joseph, > > >> Hi Gang, >> >> On 17/11/27 17:46, Gang He wrote: >>> Add ocfs2_overwrite_io function, which is used to judge if >>> overwrite allocated blocks, otherwise, the write will bring extra >>> block allocation overhead. >>> >>> Signed-off-by: Gang He >>> --- >>> fs/ocfs2/extent_map.c | 67 >> +++ >>> fs/ocfs2/extent_map.h | 3 +++ >>> 2 files changed, 70 insertions(+) >>> >>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >>> index e4719e0..98bf325 100644 >>> --- a/fs/ocfs2/extent_map.c >>> +++ b/fs/ocfs2/extent_map.c >>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct >> fiemap_extent_info *fieinfo, >>> return ret; >>> } >>> >>> +/* Is IO overwriting allocated blocks? */ >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait) >>> +{ >>> + int ret = 0, is_last; >>> + u32 mapping_end, cpos; >>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> + struct buffer_head *di_bh = NULL; >>> + struct ocfs2_extent_rec rec; >>> + >>> + if (wait) >>> + ret = ocfs2_inode_lock(inode, &di_bh, 0); >>> + else >>> + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >>> + if (ret) >>> + goto out; >>> + >>> + if (wait) >>> + down_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + else { >>> + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >>> + ret = -EAGAIN; >>> + goto out_unlock1; >>> + } >>> + } >>> + >>> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >>> + ((map_start + map_len) <= i_size_read(inode))) >>> + goto out_unlock2; >>> + >>> + cpos = map_start >> osb->s_clustersize_bits; >>> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >>> + map_start + map_len); >>> + is_last = 0; >>> + while (cpos < mapping_end && !is_last) { >>> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >>> +NULL, &rec, &is_last); >>> + if (ret) { >>> + mlog_errno(ret); >>> + goto out_unlock2; >>> + } >>> + >>> + if (rec.e_blkno == 0ULL) >>> + break; >>> + >>> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >>> + break; >>> + >>> + cpos = le32_to_cpu(rec.e_cpos) + >>> + le16_to_cpu(rec.e_leaf_clusters); >>> + } >>> + >>> + if (cpos < mapping_end) >>> + ret = 1; >>> + >>> +out_unlock2: >>> + brelse(di_bh); >>> + >>> + up_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + >>> +out_unlock1: >> Should brelse(di_bh) be here? > If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, > it is not necessary to release. > Umm... No, once going out here, we have already taken inode lock. So di_bh should be released. >> >>> + ocfs2_inode_unlock(inode, 0); >>> + >>> +out: >>> + return (ret ? 0 : 1); >> I don't think EAGAIN and other error code can be handled the same. We >> have to distinguish them. > Ok, I think we can add one line log to report the error in case the error is > not EAGAIN. > My point is, there is no need to try again in several cases, e.g. EROFS returned by ocfs2_get_clusters_nocache. >> >> Thanks, >> Joseph >> >>> +} >>> + >>> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >> whence) >>> { >>> struct inode *inode = file->f_mapping->host; >>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >>> index 67ea57d..fd9e86a 100644 >>> --- a/fs/ocfs2/extent_map.h >>> +++ b/fs/ocfs2/extent_map.h >>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 >> v_blkno, u64 *p_blkno, >>> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >>> u64 map_start, u64 map_len); >>> >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait); >>> + >>> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >> origin); >>> >>> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >>> > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Gang, On 2017/11/28 13:33, Gang He wrote: > Hello Alex, > > >> Hi Gang, >> >> On 2017/11/27 17:46, Gang He wrote: >>> Add ocfs2_overwrite_io function, which is used to judge if >>> overwrite allocated blocks, otherwise, the write will bring extra >>> block allocation overhead. >>> >>> Signed-off-by: Gang He >>> --- >>> fs/ocfs2/extent_map.c | 67 >> +++ >>> fs/ocfs2/extent_map.h | 3 +++ >>> 2 files changed, 70 insertions(+) >>> >>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >>> index e4719e0..98bf325 100644 >>> --- a/fs/ocfs2/extent_map.c >>> +++ b/fs/ocfs2/extent_map.c >>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct >> fiemap_extent_info *fieinfo, >>> return ret; >>> } >>> >>> +/* Is IO overwriting allocated blocks? */ >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait) >>> +{ >>> + int ret = 0, is_last; >>> + u32 mapping_end, cpos; >>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> + struct buffer_head *di_bh = NULL; >>> + struct ocfs2_extent_rec rec; >>> + >>> + if (wait) >>> + ret = ocfs2_inode_lock(inode, &di_bh, 0); >>> + else >>> + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >>> + if (ret) >>> + goto out; >>> + >>> + if (wait) >>> + down_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + else { >>> + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >>> + ret = -EAGAIN; >>> + goto out_unlock1; >>> + } >>> + } >>> + >>> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >>> + ((map_start + map_len) <= i_size_read(inode))) >>> + goto out_unlock2; >>> + >>> + cpos = map_start >> osb->s_clustersize_bits; >>> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >>> + map_start + map_len); >>> + is_last = 0; >>> + while (cpos < mapping_end && !is_last) { >>> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >>> +NULL, &rec, &is_last); >>> + if (ret) { >>> + mlog_errno(ret); >>> + goto out_unlock2; >>> + } >>> + >>> + if (rec.e_blkno == 0ULL) >>> + break; >> I think here the blocks is not overwrite, because the hold is found and the >> blocks >> should be allocated. > If the rec.e_blkno == NULL, this means there is a hole. > The file hole means that these blocks are not allocated, it does not like > unwritten block. > The unwritten blocks means that these blocks are allocated, but still have > not been unwritten. > If we break the loop when we find the hold, out of this function we will allocate the blocks in ocfs2_file_write_iter()->..->ocfs2_direct_IO->__blockdev_direct_IO->..->ocfs2_dio_wr_get_block() ->ocfs2_write_begin_nolock. Does this violate the semantics of 'IOCB_NOWAIT'; BTW, should we consider the down_write() and ocfs2_inode_lock() in ocfs2_dio_wr_get_block() when the flag 'IOCB_NOWAIT' is set; >>> + >>> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >>> + break; >>> + >>> + cpos = le32_to_cpu(rec.e_cpos) + >>> + le16_to_cpu(rec.e_leaf_clusters); >>> + } >>> + >>> + if (cpos < mapping_end) >>> + ret = 1; >>> + >>> +out_unlock2: >> >> I think the 'out_up_read' is more readable than the 'out_unlock2' . > Ok, I will use more readable tag here. >> >>> + brelse(di_bh); >>> + >>> + up_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + >>> +out_unlock1: >> >> We should release buffer head here. >> >>> + ocfs2_inode_unlock(inode, 0); >>> + >>> +out: >>> + return (ret ? 0 : 1); >>> +} >>> + >>> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >> whence) >>> { >>> struct inode *inode = file->f_mapping->host; >>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >>> index 67ea57d..fd9e86a 100644 >>> --- a/fs/ocfs2/extent_map.h >>> +++ b/fs/ocfs2/extent_map.h >>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 >> v_blkno, u64 *p_blkno, >>> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >>> u64 map_start, u64 map_len); >>> >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait); >>> + >>> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >> origin); >>> >>> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >>> > > > . > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Changwei, >>> > Hi, > Gang > > On 2017/11/27 17:48, Gang He wrote: >> Add ocfs2_overwrite_io function, which is used to judge if >> overwrite allocated blocks, otherwise, the write will bring extra >> block allocation overhead. >> > > Can you elaborate how this overhead is introduced? > Forgive me, I don't figure it. If the blocks have been allocated, we just write these block directly. If these blocks have not been allocated, that means we need to allocate these block firstly before write, this allocation will bring the IO invoking be blocked, if the upper application does not want take this kind of overhead, he can pass a nowait flag to avoid and return immediately with a -EAGAIN error. Thanks Gang > > Thanks, > Changwei > >> Signed-off-by: Gang He >> --- >> fs/ocfs2/extent_map.c | 67 > +++ >> fs/ocfs2/extent_map.h | 3 +++ >> 2 files changed, 70 insertions(+) >> >> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >> index e4719e0..98bf325 100644 >> --- a/fs/ocfs2/extent_map.c >> +++ b/fs/ocfs2/extent_map.c >> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, >> return ret; >> } >> >> +/* Is IO overwriting allocated blocks? */ >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait) >> +{ >> +int ret = 0, is_last; >> +u32 mapping_end, cpos; >> +struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> +struct buffer_head *di_bh = NULL; >> +struct ocfs2_extent_rec rec; >> + >> +if (wait) >> +ret = ocfs2_inode_lock(inode, &di_bh, 0); >> +else >> +ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >> +if (ret) >> +goto out; >> + >> +if (wait) >> +down_read(&OCFS2_I(inode)->ip_alloc_sem); >> +else { >> +if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >> +ret = -EAGAIN; >> +goto out_unlock1; >> +} >> +} >> + >> +if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >> + ((map_start + map_len) <= i_size_read(inode))) >> +goto out_unlock2; >> + >> +cpos = map_start >> osb->s_clustersize_bits; >> +mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >> + map_start + map_len); >> +is_last = 0; >> +while (cpos < mapping_end && !is_last) { >> +ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >> + NULL, &rec, &is_last); >> +if (ret) { >> +mlog_errno(ret); >> +goto out_unlock2; >> +} >> + >> +if (rec.e_blkno == 0ULL) >> +break; >> + >> +if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >> +break; >> + >> +cpos = le32_to_cpu(rec.e_cpos) + >> +le16_to_cpu(rec.e_leaf_clusters); >> +} >> + >> +if (cpos < mapping_end) >> +ret = 1; >> + >> +out_unlock2: >> +brelse(di_bh); >> + >> +up_read(&OCFS2_I(inode)->ip_alloc_sem); >> + >> +out_unlock1: >> +ocfs2_inode_unlock(inode, 0); >> + >> +out: >> +return (ret ? 0 : 1); >> +} >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) >> { >> struct inode *inode = file->f_mapping->host; >> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >> index 67ea57d..fd9e86a 100644 >> --- a/fs/ocfs2/extent_map.h >> +++ b/fs/ocfs2/extent_map.h >> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, >> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >> u64 map_start, u64 map_len); >> >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait); >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); >> >> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >> ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hello Alex, >>> > Hi Gang, > > On 2017/11/27 17:46, Gang He wrote: >> Add ocfs2_overwrite_io function, which is used to judge if >> overwrite allocated blocks, otherwise, the write will bring extra >> block allocation overhead. >> >> Signed-off-by: Gang He >> --- >> fs/ocfs2/extent_map.c | 67 > +++ >> fs/ocfs2/extent_map.h | 3 +++ >> 2 files changed, 70 insertions(+) >> >> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >> index e4719e0..98bf325 100644 >> --- a/fs/ocfs2/extent_map.c >> +++ b/fs/ocfs2/extent_map.c >> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, >> return ret; >> } >> >> +/* Is IO overwriting allocated blocks? */ >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait) >> +{ >> +int ret = 0, is_last; >> +u32 mapping_end, cpos; >> +struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> +struct buffer_head *di_bh = NULL; >> +struct ocfs2_extent_rec rec; >> + >> +if (wait) >> +ret = ocfs2_inode_lock(inode, &di_bh, 0); >> +else >> +ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >> +if (ret) >> +goto out; >> + >> +if (wait) >> +down_read(&OCFS2_I(inode)->ip_alloc_sem); >> +else { >> +if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >> +ret = -EAGAIN; >> +goto out_unlock1; >> +} >> +} >> + >> +if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >> + ((map_start + map_len) <= i_size_read(inode))) >> +goto out_unlock2; >> + >> +cpos = map_start >> osb->s_clustersize_bits; >> +mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >> + map_start + map_len); >> +is_last = 0; >> +while (cpos < mapping_end && !is_last) { >> +ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >> + NULL, &rec, &is_last); >> +if (ret) { >> +mlog_errno(ret); >> +goto out_unlock2; >> +} >> + >> +if (rec.e_blkno == 0ULL) >> +break; > I think here the blocks is not overwrite, because the hold is found and the > blocks > should be allocated. If the rec.e_blkno == NULL, this means there is a hole. The file hole means that these blocks are not allocated, it does not like unwritten block. The unwritten blocks means that these blocks are allocated, but still have not been unwritten. >> + >> +if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >> +break; >> + >> +cpos = le32_to_cpu(rec.e_cpos) + >> +le16_to_cpu(rec.e_leaf_clusters); >> +} >> + >> +if (cpos < mapping_end) >> +ret = 1; >> + >> +out_unlock2: > > I think the 'out_up_read' is more readable than the 'out_unlock2' . Ok, I will use more readable tag here. > >> +brelse(di_bh); >> + >> +up_read(&OCFS2_I(inode)->ip_alloc_sem); >> + >> +out_unlock1: > > We should release buffer head here. > >> +ocfs2_inode_unlock(inode, 0); >> + >> +out: >> +return (ret ? 0 : 1); >> +} >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) >> { >> struct inode *inode = file->f_mapping->host; >> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >> index 67ea57d..fd9e86a 100644 >> --- a/fs/ocfs2/extent_map.h >> +++ b/fs/ocfs2/extent_map.h >> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, >> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >> u64 map_start, u64 map_len); >> >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait); >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); >> >> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >> ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
>>> > On 2017/11/28 9:52, piaojun wrote: >> Hi Gang, >> >> If ocfs2_overwrite_io is only called in 'nowait' scenarios, I wonder if >> we can discard 'int wait' just as ext4 does: >> >> static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len); > > Yes, Jun has a point. > It seems that ocfs2_overwrite_io is only involved in non-blocking aio > and no other code spot is calling ocfs2_overwrite_io with wait=1 passed. Ok, I will do this change. > >> >> thans, >> Jun >> >> On 2017/11/27 17:46, Gang He wrote: >>> Add ocfs2_overwrite_io function, which is used to judge if >>> overwrite allocated blocks, otherwise, the write will bring extra >>> block allocation overhead. >>> >>> Signed-off-by: Gang He >>> --- >>> fs/ocfs2/extent_map.c | 67 > +++ >>> fs/ocfs2/extent_map.h | 3 +++ >>> 2 files changed, 70 insertions(+) >>> >>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >>> index e4719e0..98bf325 100644 >>> --- a/fs/ocfs2/extent_map.c >>> +++ b/fs/ocfs2/extent_map.c >>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, >>> return ret; >>> } >>> >>> +/* Is IO overwriting allocated blocks? */ >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait) >>> +{ >>> + int ret = 0, is_last; >>> + u32 mapping_end, cpos; >>> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> + struct buffer_head *di_bh = NULL; >>> + struct ocfs2_extent_rec rec; >>> + >>> + if (wait) >>> + ret = ocfs2_inode_lock(inode, &di_bh, 0); >>> + else >>> + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >>> + if (ret) >>> + goto out; >>> + >>> + if (wait) >>> + down_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + else { >>> + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >>> + ret = -EAGAIN; > Here is a little strange, it seems that you don't care much about how > this function fails. Why evaluate _ret_ to -EAGAIN here and ignore it > later? > > Thanks, > Changwei > >>> + goto out_unlock1; >>> + } >>> + } >>> + >>> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >>> + ((map_start + map_len) <= i_size_read(inode))) >>> + goto out_unlock2; >>> + >>> + cpos = map_start >> osb->s_clustersize_bits; >>> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >>> + map_start + map_len); >>> + is_last = 0; >>> + while (cpos < mapping_end && !is_last) { >>> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >>> +NULL, &rec, &is_last); >>> + if (ret) { >>> + mlog_errno(ret); >>> + goto out_unlock2; >>> + } >>> + >>> + if (rec.e_blkno == 0ULL) >>> + break; >>> + >>> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >>> + break; >>> + >>> + cpos = le32_to_cpu(rec.e_cpos) + >>> + le16_to_cpu(rec.e_leaf_clusters); >>> + } >>> + >>> + if (cpos < mapping_end) >>> + ret = 1; >>> + >>> +out_unlock2: >>> + brelse(di_bh); >>> + >>> + up_read(&OCFS2_I(inode)->ip_alloc_sem); >>> + >>> +out_unlock1: >>> + ocfs2_inode_unlock(inode, 0); >>> + >>> +out: >>> + return (ret ? 0 : 1); >>> +} >>> + >>> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) >>> { >>> struct inode *inode = file->f_mapping->host; >>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >>> index 67ea57d..fd9e86a 100644 >>> --- a/fs/ocfs2/extent_map.h >>> +++ b/fs/ocfs2/extent_map.h >>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, >>> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >>> u64 map_start, u64 map_len); >>> >>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >>> + int wait); >>> + >>> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); >>> >>> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >>> >> >> ___ >> Ocfs2-devel mailing list >> Ocfs2-devel@oss.oracle.com >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel >> ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Jun, >>> > Hi Gang, > > If ocfs2_overwrite_io is only called in 'nowait' scenarios, I wonder if > we can discard 'int wait' just as ext4 does: > > static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len); Ok, it looks most people prefer to get rid of "wait" parameter. Thanks Gang > > thans, > Jun > > On 2017/11/27 17:46, Gang He wrote: >> Add ocfs2_overwrite_io function, which is used to judge if >> overwrite allocated blocks, otherwise, the write will bring extra >> block allocation overhead. >> >> Signed-off-by: Gang He >> --- >> fs/ocfs2/extent_map.c | 67 > +++ >> fs/ocfs2/extent_map.h | 3 +++ >> 2 files changed, 70 insertions(+) >> >> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >> index e4719e0..98bf325 100644 >> --- a/fs/ocfs2/extent_map.c >> +++ b/fs/ocfs2/extent_map.c >> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, >> return ret; >> } >> >> +/* Is IO overwriting allocated blocks? */ >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait) >> +{ >> +int ret = 0, is_last; >> +u32 mapping_end, cpos; >> +struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> +struct buffer_head *di_bh = NULL; >> +struct ocfs2_extent_rec rec; >> + >> +if (wait) >> +ret = ocfs2_inode_lock(inode, &di_bh, 0); >> +else >> +ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >> +if (ret) >> +goto out; >> + >> +if (wait) >> +down_read(&OCFS2_I(inode)->ip_alloc_sem); >> +else { >> +if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >> +ret = -EAGAIN; >> +goto out_unlock1; >> +} >> +} >> + >> +if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >> + ((map_start + map_len) <= i_size_read(inode))) >> +goto out_unlock2; >> + >> +cpos = map_start >> osb->s_clustersize_bits; >> +mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >> + map_start + map_len); >> +is_last = 0; >> +while (cpos < mapping_end && !is_last) { >> +ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >> + NULL, &rec, &is_last); >> +if (ret) { >> +mlog_errno(ret); >> +goto out_unlock2; >> +} >> + >> +if (rec.e_blkno == 0ULL) >> +break; >> + >> +if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >> +break; >> + >> +cpos = le32_to_cpu(rec.e_cpos) + >> +le16_to_cpu(rec.e_leaf_clusters); >> +} >> + >> +if (cpos < mapping_end) >> +ret = 1; >> + >> +out_unlock2: >> +brelse(di_bh); >> + >> +up_read(&OCFS2_I(inode)->ip_alloc_sem); >> + >> +out_unlock1: >> +ocfs2_inode_unlock(inode, 0); >> + >> +out: >> +return (ret ? 0 : 1); >> +} >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) >> { >> struct inode *inode = file->f_mapping->host; >> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >> index 67ea57d..fd9e86a 100644 >> --- a/fs/ocfs2/extent_map.h >> +++ b/fs/ocfs2/extent_map.h >> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, >> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >> u64 map_start, u64 map_len); >> >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait); >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); >> >> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >> ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hello Joseph, >>> > Hi Gang, > > On 17/11/27 17:46, Gang He wrote: >> Add ocfs2_overwrite_io function, which is used to judge if >> overwrite allocated blocks, otherwise, the write will bring extra >> block allocation overhead. >> >> Signed-off-by: Gang He >> --- >> fs/ocfs2/extent_map.c | 67 > +++ >> fs/ocfs2/extent_map.h | 3 +++ >> 2 files changed, 70 insertions(+) >> >> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >> index e4719e0..98bf325 100644 >> --- a/fs/ocfs2/extent_map.c >> +++ b/fs/ocfs2/extent_map.c >> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, >> return ret; >> } >> >> +/* Is IO overwriting allocated blocks? */ >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait) >> +{ >> +int ret = 0, is_last; >> +u32 mapping_end, cpos; >> +struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> +struct buffer_head *di_bh = NULL; >> +struct ocfs2_extent_rec rec; >> + >> +if (wait) >> +ret = ocfs2_inode_lock(inode, &di_bh, 0); >> +else >> +ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >> +if (ret) >> +goto out; >> + >> +if (wait) >> +down_read(&OCFS2_I(inode)->ip_alloc_sem); >> +else { >> +if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >> +ret = -EAGAIN; >> +goto out_unlock1; >> +} >> +} >> + >> +if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >> + ((map_start + map_len) <= i_size_read(inode))) >> +goto out_unlock2; >> + >> +cpos = map_start >> osb->s_clustersize_bits; >> +mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >> + map_start + map_len); >> +is_last = 0; >> +while (cpos < mapping_end && !is_last) { >> +ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >> + NULL, &rec, &is_last); >> +if (ret) { >> +mlog_errno(ret); >> +goto out_unlock2; >> +} >> + >> +if (rec.e_blkno == 0ULL) >> +break; >> + >> +if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >> +break; >> + >> +cpos = le32_to_cpu(rec.e_cpos) + >> +le16_to_cpu(rec.e_leaf_clusters); >> +} >> + >> +if (cpos < mapping_end) >> +ret = 1; >> + >> +out_unlock2: >> +brelse(di_bh); >> + >> +up_read(&OCFS2_I(inode)->ip_alloc_sem); >> + >> +out_unlock1: > Should brelse(di_bh) be here? If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, it is not necessary to release. > >> +ocfs2_inode_unlock(inode, 0); >> + >> +out: >> +return (ret ? 0 : 1); > I don't think EAGAIN and other error code can be handled the same. We > have to distinguish them. Ok, I think we can add one line log to report the error in case the error is not EAGAIN. > > Thanks, > Joseph > >> +} >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) >> { >> struct inode *inode = file->f_mapping->host; >> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >> index 67ea57d..fd9e86a 100644 >> --- a/fs/ocfs2/extent_map.h >> +++ b/fs/ocfs2/extent_map.h >> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, >> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >> u64 map_start, u64 map_len); >> >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait); >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); >> >> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >> ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi, Gang On 2017/11/27 17:48, Gang He wrote: > Add ocfs2_overwrite_io function, which is used to judge if > overwrite allocated blocks, otherwise, the write will bring extra > block allocation overhead. > Can you elaborate how this overhead is introduced? Forgive me, I don't figure it. Thanks, Changwei > Signed-off-by: Gang He > --- > fs/ocfs2/extent_map.c | 67 > +++ > fs/ocfs2/extent_map.h | 3 +++ > 2 files changed, 70 insertions(+) > > diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c > index e4719e0..98bf325 100644 > --- a/fs/ocfs2/extent_map.c > +++ b/fs/ocfs2/extent_map.c > @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, > return ret; > } > > +/* Is IO overwriting allocated blocks? */ > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait) > +{ > + int ret = 0, is_last; > + u32 mapping_end, cpos; > + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > + struct buffer_head *di_bh = NULL; > + struct ocfs2_extent_rec rec; > + > + if (wait) > + ret = ocfs2_inode_lock(inode, &di_bh, 0); > + else > + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); > + if (ret) > + goto out; > + > + if (wait) > + down_read(&OCFS2_I(inode)->ip_alloc_sem); > + else { > + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { > + ret = -EAGAIN; > + goto out_unlock1; > + } > + } > + > + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && > +((map_start + map_len) <= i_size_read(inode))) > + goto out_unlock2; > + > + cpos = map_start >> osb->s_clustersize_bits; > + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, > +map_start + map_len); > + is_last = 0; > + while (cpos < mapping_end && !is_last) { > + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, > + NULL, &rec, &is_last); > + if (ret) { > + mlog_errno(ret); > + goto out_unlock2; > + } > + > + if (rec.e_blkno == 0ULL) > + break; > + > + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) > + break; > + > + cpos = le32_to_cpu(rec.e_cpos) + > + le16_to_cpu(rec.e_leaf_clusters); > + } > + > + if (cpos < mapping_end) > + ret = 1; > + > +out_unlock2: > + brelse(di_bh); > + > + up_read(&OCFS2_I(inode)->ip_alloc_sem); > + > +out_unlock1: > + ocfs2_inode_unlock(inode, 0); > + > +out: > + return (ret ? 0 : 1); > +} > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) > { > struct inode *inode = file->f_mapping->host; > diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h > index 67ea57d..fd9e86a 100644 > --- a/fs/ocfs2/extent_map.h > +++ b/fs/ocfs2/extent_map.h > @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, > int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >u64 map_start, u64 map_len); > > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait); > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); > > int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Gang, On 2017/11/27 17:46, Gang He wrote: > Add ocfs2_overwrite_io function, which is used to judge if > overwrite allocated blocks, otherwise, the write will bring extra > block allocation overhead. > > Signed-off-by: Gang He > --- > fs/ocfs2/extent_map.c | 67 > +++ > fs/ocfs2/extent_map.h | 3 +++ > 2 files changed, 70 insertions(+) > > diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c > index e4719e0..98bf325 100644 > --- a/fs/ocfs2/extent_map.c > +++ b/fs/ocfs2/extent_map.c > @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, > return ret; > } > > +/* Is IO overwriting allocated blocks? */ > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait) > +{ > + int ret = 0, is_last; > + u32 mapping_end, cpos; > + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > + struct buffer_head *di_bh = NULL; > + struct ocfs2_extent_rec rec; > + > + if (wait) > + ret = ocfs2_inode_lock(inode, &di_bh, 0); > + else > + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); > + if (ret) > + goto out; > + > + if (wait) > + down_read(&OCFS2_I(inode)->ip_alloc_sem); > + else { > + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { > + ret = -EAGAIN; > + goto out_unlock1; > + } > + } > + > + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && > +((map_start + map_len) <= i_size_read(inode))) > + goto out_unlock2; > + > + cpos = map_start >> osb->s_clustersize_bits; > + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, > +map_start + map_len); > + is_last = 0; > + while (cpos < mapping_end && !is_last) { > + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, > + NULL, &rec, &is_last); > + if (ret) { > + mlog_errno(ret); > + goto out_unlock2; > + } > + > + if (rec.e_blkno == 0ULL) > + break; I think here the blocks is not overwrite, because the hold is found and the blocks should be allocated. > + > + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) > + break; > + > + cpos = le32_to_cpu(rec.e_cpos) + > + le16_to_cpu(rec.e_leaf_clusters); > + } > + > + if (cpos < mapping_end) > + ret = 1; > + > +out_unlock2: I think the 'out_up_read' is more readable than the 'out_unlock2' . > + brelse(di_bh); > + > + up_read(&OCFS2_I(inode)->ip_alloc_sem); > + > +out_unlock1: We should release buffer head here. > + ocfs2_inode_unlock(inode, 0); > + > +out: > + return (ret ? 0 : 1); > +} > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) > { > struct inode *inode = file->f_mapping->host; > diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h > index 67ea57d..fd9e86a 100644 > --- a/fs/ocfs2/extent_map.h > +++ b/fs/ocfs2/extent_map.h > @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, > int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >u64 map_start, u64 map_len); > > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait); > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); > > int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
On 2017/11/28 9:52, piaojun wrote: > Hi Gang, > > If ocfs2_overwrite_io is only called in 'nowait' scenarios, I wonder if > we can discard 'int wait' just as ext4 does: > > static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len); Yes, Jun has a point. It seems that ocfs2_overwrite_io is only involved in non-blocking aio and no other code spot is calling ocfs2_overwrite_io with wait=1 passed. > > thans, > Jun > > On 2017/11/27 17:46, Gang He wrote: >> Add ocfs2_overwrite_io function, which is used to judge if >> overwrite allocated blocks, otherwise, the write will bring extra >> block allocation overhead. >> >> Signed-off-by: Gang He >> --- >> fs/ocfs2/extent_map.c | 67 >> +++ >> fs/ocfs2/extent_map.h | 3 +++ >> 2 files changed, 70 insertions(+) >> >> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c >> index e4719e0..98bf325 100644 >> --- a/fs/ocfs2/extent_map.c >> +++ b/fs/ocfs2/extent_map.c >> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct >> fiemap_extent_info *fieinfo, >> return ret; >> } >> >> +/* Is IO overwriting allocated blocks? */ >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait) >> +{ >> +int ret = 0, is_last; >> +u32 mapping_end, cpos; >> +struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> +struct buffer_head *di_bh = NULL; >> +struct ocfs2_extent_rec rec; >> + >> +if (wait) >> +ret = ocfs2_inode_lock(inode, &di_bh, 0); >> +else >> +ret = ocfs2_try_inode_lock(inode, &di_bh, 0); >> +if (ret) >> +goto out; >> + >> +if (wait) >> +down_read(&OCFS2_I(inode)->ip_alloc_sem); >> +else { >> +if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { >> +ret = -EAGAIN; Here is a little strange, it seems that you don't care much about how this function fails. Why evaluate _ret_ to -EAGAIN here and ignore it later? Thanks, Changwei >> +goto out_unlock1; >> +} >> +} >> + >> +if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >> + ((map_start + map_len) <= i_size_read(inode))) >> +goto out_unlock2; >> + >> +cpos = map_start >> osb->s_clustersize_bits; >> +mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >> + map_start + map_len); >> +is_last = 0; >> +while (cpos < mapping_end && !is_last) { >> +ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >> + NULL, &rec, &is_last); >> +if (ret) { >> +mlog_errno(ret); >> +goto out_unlock2; >> +} >> + >> +if (rec.e_blkno == 0ULL) >> +break; >> + >> +if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >> +break; >> + >> +cpos = le32_to_cpu(rec.e_cpos) + >> +le16_to_cpu(rec.e_leaf_clusters); >> +} >> + >> +if (cpos < mapping_end) >> +ret = 1; >> + >> +out_unlock2: >> +brelse(di_bh); >> + >> +up_read(&OCFS2_I(inode)->ip_alloc_sem); >> + >> +out_unlock1: >> +ocfs2_inode_unlock(inode, 0); >> + >> +out: >> +return (ret ? 0 : 1); >> +} >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >> whence) >> { >> struct inode *inode = file->f_mapping->host; >> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h >> index 67ea57d..fd9e86a 100644 >> --- a/fs/ocfs2/extent_map.h >> +++ b/fs/ocfs2/extent_map.h >> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 >> v_blkno, u64 *p_blkno, >> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >> u64 map_start, u64 map_len); >> >> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, >> + int wait); >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int >> origin); >> >> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >> > > ___ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Gang, If ocfs2_overwrite_io is only called in 'nowait' scenarios, I wonder if we can discard 'int wait' just as ext4 does: static bool ext4_overwrite_io(struct inode *inode, loff_t pos, loff_t len); thans, Jun On 2017/11/27 17:46, Gang He wrote: > Add ocfs2_overwrite_io function, which is used to judge if > overwrite allocated blocks, otherwise, the write will bring extra > block allocation overhead. > > Signed-off-by: Gang He > --- > fs/ocfs2/extent_map.c | 67 > +++ > fs/ocfs2/extent_map.h | 3 +++ > 2 files changed, 70 insertions(+) > > diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c > index e4719e0..98bf325 100644 > --- a/fs/ocfs2/extent_map.c > +++ b/fs/ocfs2/extent_map.c > @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, > return ret; > } > > +/* Is IO overwriting allocated blocks? */ > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait) > +{ > + int ret = 0, is_last; > + u32 mapping_end, cpos; > + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > + struct buffer_head *di_bh = NULL; > + struct ocfs2_extent_rec rec; > + > + if (wait) > + ret = ocfs2_inode_lock(inode, &di_bh, 0); > + else > + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); > + if (ret) > + goto out; > + > + if (wait) > + down_read(&OCFS2_I(inode)->ip_alloc_sem); > + else { > + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { > + ret = -EAGAIN; > + goto out_unlock1; > + } > + } > + > + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && > +((map_start + map_len) <= i_size_read(inode))) > + goto out_unlock2; > + > + cpos = map_start >> osb->s_clustersize_bits; > + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, > +map_start + map_len); > + is_last = 0; > + while (cpos < mapping_end && !is_last) { > + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, > + NULL, &rec, &is_last); > + if (ret) { > + mlog_errno(ret); > + goto out_unlock2; > + } > + > + if (rec.e_blkno == 0ULL) > + break; > + > + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) > + break; > + > + cpos = le32_to_cpu(rec.e_cpos) + > + le16_to_cpu(rec.e_leaf_clusters); > + } > + > + if (cpos < mapping_end) > + ret = 1; > + > +out_unlock2: > + brelse(di_bh); > + > + up_read(&OCFS2_I(inode)->ip_alloc_sem); > + > +out_unlock1: > + ocfs2_inode_unlock(inode, 0); > + > +out: > + return (ret ? 0 : 1); > +} > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) > { > struct inode *inode = file->f_mapping->host; > diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h > index 67ea57d..fd9e86a 100644 > --- a/fs/ocfs2/extent_map.h > +++ b/fs/ocfs2/extent_map.h > @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, > int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >u64 map_start, u64 map_len); > > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait); > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); > > int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Hi Gang, On 17/11/27 17:46, Gang He wrote: > Add ocfs2_overwrite_io function, which is used to judge if > overwrite allocated blocks, otherwise, the write will bring extra > block allocation overhead. > > Signed-off-by: Gang He > --- > fs/ocfs2/extent_map.c | 67 > +++ > fs/ocfs2/extent_map.h | 3 +++ > 2 files changed, 70 insertions(+) > > diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c > index e4719e0..98bf325 100644 > --- a/fs/ocfs2/extent_map.c > +++ b/fs/ocfs2/extent_map.c > @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct > fiemap_extent_info *fieinfo, > return ret; > } > > +/* Is IO overwriting allocated blocks? */ > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait) > +{ > + int ret = 0, is_last; > + u32 mapping_end, cpos; > + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > + struct buffer_head *di_bh = NULL; > + struct ocfs2_extent_rec rec; > + > + if (wait) > + ret = ocfs2_inode_lock(inode, &di_bh, 0); > + else > + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); > + if (ret) > + goto out; > + > + if (wait) > + down_read(&OCFS2_I(inode)->ip_alloc_sem); > + else { > + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { > + ret = -EAGAIN; > + goto out_unlock1; > + } > + } > + > + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && > +((map_start + map_len) <= i_size_read(inode))) > + goto out_unlock2; > + > + cpos = map_start >> osb->s_clustersize_bits; > + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, > +map_start + map_len); > + is_last = 0; > + while (cpos < mapping_end && !is_last) { > + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, > + NULL, &rec, &is_last); > + if (ret) { > + mlog_errno(ret); > + goto out_unlock2; > + } > + > + if (rec.e_blkno == 0ULL) > + break; > + > + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) > + break; > + > + cpos = le32_to_cpu(rec.e_cpos) + > + le16_to_cpu(rec.e_leaf_clusters); > + } > + > + if (cpos < mapping_end) > + ret = 1; > + > +out_unlock2: > + brelse(di_bh); > + > + up_read(&OCFS2_I(inode)->ip_alloc_sem); > + > +out_unlock1: Should brelse(di_bh) be here? > + ocfs2_inode_unlock(inode, 0); > + > +out: > + return (ret ? 0 : 1); I don't think EAGAIN and other error code can be handled the same. We have to distinguish them. Thanks, Joseph > +} > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) > { > struct inode *inode = file->f_mapping->host; > diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h > index 67ea57d..fd9e86a 100644 > --- a/fs/ocfs2/extent_map.h > +++ b/fs/ocfs2/extent_map.h > @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 > v_blkno, u64 *p_blkno, > int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >u64 map_start, u64 map_len); > > +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, > +int wait); > + > int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); > > int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function
Add ocfs2_overwrite_io function, which is used to judge if overwrite allocated blocks, otherwise, the write will bring extra block allocation overhead. Signed-off-by: Gang He --- fs/ocfs2/extent_map.c | 67 +++ fs/ocfs2/extent_map.h | 3 +++ 2 files changed, 70 insertions(+) diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c index e4719e0..98bf325 100644 --- a/fs/ocfs2/extent_map.c +++ b/fs/ocfs2/extent_map.c @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, return ret; } +/* Is IO overwriting allocated blocks? */ +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, + int wait) +{ + int ret = 0, is_last; + u32 mapping_end, cpos; + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); + struct buffer_head *di_bh = NULL; + struct ocfs2_extent_rec rec; + + if (wait) + ret = ocfs2_inode_lock(inode, &di_bh, 0); + else + ret = ocfs2_try_inode_lock(inode, &di_bh, 0); + if (ret) + goto out; + + if (wait) + down_read(&OCFS2_I(inode)->ip_alloc_sem); + else { + if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) { + ret = -EAGAIN; + goto out_unlock1; + } + } + + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && + ((map_start + map_len) <= i_size_read(inode))) + goto out_unlock2; + + cpos = map_start >> osb->s_clustersize_bits; + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, + map_start + map_len); + is_last = 0; + while (cpos < mapping_end && !is_last) { + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, +NULL, &rec, &is_last); + if (ret) { + mlog_errno(ret); + goto out_unlock2; + } + + if (rec.e_blkno == 0ULL) + break; + + if (rec.e_flags & OCFS2_EXT_REFCOUNTED) + break; + + cpos = le32_to_cpu(rec.e_cpos) + + le16_to_cpu(rec.e_leaf_clusters); + } + + if (cpos < mapping_end) + ret = 1; + +out_unlock2: + brelse(di_bh); + + up_read(&OCFS2_I(inode)->ip_alloc_sem); + +out_unlock1: + ocfs2_inode_unlock(inode, 0); + +out: + return (ret ? 0 : 1); +} + int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence) { struct inode *inode = file->f_mapping->host; diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h index 67ea57d..fd9e86a 100644 --- a/fs/ocfs2/extent_map.h +++ b/fs/ocfs2/extent_map.h @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 v_blkno, u64 *p_blkno, int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 map_start, u64 map_len); +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len, + int wait); + int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin); int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, -- 1.8.5.6 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel