Re: [Ocfs2-devel] [patch 10/11] ocfs2: add ocfs2_overwrite_io()
Hi Alex, Sorry for replaying your mail so late. >>> > Hi Gang, > > On 2017/12/1 6:24, a...@linux-foundation.org wrote: >> From: Gang He>> Subject: ocfs2: add ocfs2_overwrite_io() >> >> Add ocfs2_overwrite_io(), which is used to judge if overwrite allocated >> blocks, otherwise, the write will bring extra block allocation overhead. >> >> [g...@suse.com: v2] >> Link: > https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_151194 > 4612-2D9629-2D3-2Dgit-2Dsend-2Demail-2Dghe-40suse.com=DwICAg=RoP1YumCXCgaWHvlZYR8P > Zh8Bv7qIrMUB65eapI_JnE=C7gAd4uDxlAvTdc0vmU6X8CMk6L2iDY8-HD0qT6Fo7Y=C4AZ7lGI2- > gfFtbla8puNvYIA3a9Cr_XOH6oyx94oGo=QhxdPyP9gdEJZTa2UhuSk0x7ENXyGXUOIBV_LyP8oZ > w= >> Link: > https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_151177 > 5987-2D841-2D3-2Dgit-2Dsend-2Demail-2Dghe-40suse.com=DwICAg=RoP1YumCXCgaWHvlZYR8PZ > h8Bv7qIrMUB65eapI_JnE=C7gAd4uDxlAvTdc0vmU6X8CMk6L2iDY8-HD0qT6Fo7Y=C4AZ7lGI2-g > fFtbla8puNvYIA3a9Cr_XOH6oyx94oGo=T9Ljku-nipBqCaeXWukY2hAwcHQL95gtlU7pS3l1fgs > = >> Signed-off-by: Gang He >> Cc: Mark Fasheh >> Cc: Joel Becker >> Cc: Junxiao Bi >> Cc: Joseph Qi >> Cc: Changwei Ge >> Signed-off-by: Andrew Morton >> --- >> >> fs/ocfs2/extent_map.c | 41 >> fs/ocfs2/extent_map.h |3 ++ >> 2 files changed, 44 insertions(+) >> >> diff -puN fs/ocfs2/extent_map.c~ocfs2-add-ocfs2_overwrite_io-function > fs/ocfs2/extent_map.c >> --- a/fs/ocfs2/extent_map.c~ocfs2-add-ocfs2_overwrite_io-function >> +++ a/fs/ocfs2/extent_map.c >> @@ -832,6 +832,47 @@ out: >> return ret; >> } >> >> +/* Is IO overwriting allocated blocks? */ >> +int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh, >> + u64 map_start, u64 map_len) >> +{ >> +int ret = 0, is_last; >> +u32 mapping_end, cpos; >> +struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> +struct ocfs2_extent_rec rec; >> + >> +if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && >> + ((map_start + map_len) <= i_size_read(inode))) >> +goto out; >> + > Should we return -EAGAIN directly when the condition > ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) && ((map_start + > map_len) > i_size_read(inode))) > is true? Yes, here the code can be improved like as below. e.g. if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) { if (ocfs2_size_fits_inline_data(di_bh, map_start + map_len)) goto out; else return -EAGAIN; } Thanks Gang > > Thanks, > Alex >> +cpos = map_start >> osb->s_clustersize_bits; >> +mapping_end = ocfs2_clusters_for_bytes(inode->i_sb, >> + map_start + map_len); >> +is_last = 0; >> +while (cpos < mapping_end && !is_last) { >> +ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos, >> + NULL, , _last); >> +if (ret) { >> +mlog_errno(ret); >> +goto out; >> +} >> + >> +if (rec.e_blkno == 0ULL) >> +break; >> + >> +if (rec.e_flags & OCFS2_EXT_REFCOUNTED) >> +break; >> + >> +cpos = le32_to_cpu(rec.e_cpos) + >> +le16_to_cpu(rec.e_leaf_clusters); >> +} >> + >> +if (cpos < mapping_end) >> +ret = -EAGAIN; >> +out: >> +return ret; >> +} >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > whence) >> { >> struct inode *inode = file->f_mapping->host; >> diff -puN fs/ocfs2/extent_map.h~ocfs2-add-ocfs2_overwrite_io-function > fs/ocfs2/extent_map.h >> --- a/fs/ocfs2/extent_map.h~ocfs2-add-ocfs2_overwrite_io-function >> +++ a/fs/ocfs2/extent_map.h >> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct i >> int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, >> u64 map_start, u64 map_len); >> >> +int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh, >> + u64 map_start, u64 map_len); >> + >> int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int > origin); >> >> int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster, >> _ >> >> ___ >> Ocfs2-devel mailing list >> Ocfs2-devel@oss.oracle.com >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel >> >> . >> ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [PATCH] ocfs2: don't merge rightmost extent block if it was locked
A crash issue was reported by John. The call trace follows: ocfs2_split_extent+0x1ad3/0x1b40 [ocfs2] ocfs2_change_extent_flag+0x33a/0x470 [ocfs2] ocfs2_mark_extent_written+0x172/0x220 [ocfs2] ocfs2_dio_end_io+0x62d/0x910 [ocfs2] dio_complete+0x19a/0x1a0 do_blockdev_direct_IO+0x19dd/0x1eb0 __blockdev_direct_IO+0x43/0x50 ocfs2_direct_IO+0x8f/0xa0 [ocfs2] generic_file_direct_write+0xb2/0x170 __generic_file_write_iter+0xc3/0x1b0 ocfs2_file_write_iter+0x4bb/0xca0 [ocfs2] __vfs_write+0xae/0xf0 vfs_write+0xb8/0x1b0 SyS_write+0x4f/0xb0 system_call_fastpath+0x16/0x75 The BUG code told that extent tree wants to grow but no metadata was reserved ahead of time. From my investigation into this issue, the root cause it that although enough metadata is reserved, rightmost extent is merged into left one due to a certain times of marking extent written. Because during marking extent written, we got many physically continuous extents. At last, an empty extent showed up and the rightmost path is removed from extent tree. In order to solve this issue, introduce a member named ::et_lock for extent tree. When ocfs2_lock_allocators is invoked and we indeed need to reserve metadata, set ::et_lock so that the rightmost path won't be removed during marking extents written. Also, this patch address the issue John reported that ::dw_zero_count is not calculated properly. After applying this patch, the issue John reported was gone. Thanks to the reproducer provided by John. And this patch has passed ocfs2-test suite running by New H3C Group. Reported-by: John LightseySigned-off-by: Changwei Ge --- fs/ocfs2/alloc.c| 4 +++- fs/ocfs2/alloc.h| 1 + fs/ocfs2/aops.c | 8 +--- fs/ocfs2/suballoc.c | 3 +++ 4 files changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c index ab5105f..160e393 100644 --- a/fs/ocfs2/alloc.c +++ b/fs/ocfs2/alloc.c @@ -444,6 +444,7 @@ static void __ocfs2_init_extent_tree(struct ocfs2_extent_tree *et, et->et_ops = ops; et->et_root_bh = bh; et->et_ci = ci; + et->et_lock = 0; et->et_root_journal_access = access; if (!obj) obj = (void *)bh->b_data; @@ -3606,7 +3607,8 @@ static int ocfs2_merge_rec_left(struct ocfs2_path *right_path, * it and we need to delete the right extent block. */ if (le16_to_cpu(right_rec->e_leaf_clusters) == 0 && - le16_to_cpu(el->l_next_free_rec) == 1) { + le16_to_cpu(el->l_next_free_rec) == 1 && + !et->et_lock) { /* extend credit for ocfs2_remove_rightmost_path */ ret = ocfs2_extend_rotate_transaction(handle, 0, handle->h_buffer_credits, diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h index 27b75cf..898671d 100644 --- a/fs/ocfs2/alloc.h +++ b/fs/ocfs2/alloc.h @@ -61,6 +61,7 @@ struct ocfs2_extent_tree { ocfs2_journal_access_func et_root_journal_access; void*et_object; unsigned intet_max_leaf_clusters; + int et_lock; }; /* diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index d151632..c72ce60 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -797,6 +797,7 @@ struct ocfs2_write_ctxt { struct ocfs2_cached_dealloc_ctxt w_dealloc; struct list_headw_unwritten_list; + unsigned intw_unwritten_count; }; void ocfs2_unlock_and_free_pages(struct page **pages, int num_pages) @@ -1386,6 +1387,7 @@ static int ocfs2_unwritten_check(struct inode *inode, desc->c_clear_unwritten = 0; list_add_tail(>ue_ip_node, >ip_unwritten_list); list_add_tail(>ue_node, >w_unwritten_list); + wc->w_unwritten_count++; new = NULL; unlock: spin_unlock(>ip_lock); @@ -1762,8 +1764,8 @@ int ocfs2_write_begin_nolock(struct address_space *mapping, */ ocfs2_init_dinode_extent_tree(, INODE_CACHE(inode), wc->w_di_bh); - ret = ocfs2_lock_allocators(inode, , - clusters_to_alloc, extents_to_split, + ret = ocfs2_lock_allocators(inode, , clusters_to_alloc, + 2*extents_to_split, _ac, _ac); if (ret) { mlog_errno(ret); @@ -2256,7 +2258,7 @@ static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock, ue->ue_phys = desc->c_phys; list_splice_tail_init(>w_unwritten_list, >dw_zero_list); - dwc->dw_zero_count++; + dwc->dw_zero_count += wc->w_unwritten_count;
Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix a potential deadlock in dlm_reset_mleres_owner()
Hi Joseph and Changwei, On 2017/12/22 11:17, Joseph Qi wrote: > > > On 17/12/22 10:34, Changwei Ge wrote: >> On 2017/12/21 14:36, alex chen wrote: >>> Hi Joseph, >>> >>> On 2017/12/21 9:30, Joseph Qi wrote: Hi Alex, On 17/12/21 08:55, alex chen wrote: > In dlm_reset_mleres_owner(), we will lock > dlm_lock_resource->spinlock after locking dlm_ctxt->master_lock, > which breaks the spinlock lock ordering: > dlm_domain_lock > struct dlm_ctxt->spinlock > struct dlm_lock_resource->spinlock > struct dlm_ctxt->master_lock > > Fix it by unlocking dlm_ctxt->master_lock before locking > dlm_lock_resource->spinlock and restarting to clean master list. > > Signed-off-by: Alex Chen> Reviewed-by: Jun Piao > --- > fs/ocfs2/dlm/dlmmaster.c | 14 ++ > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c > index 3e04279..d83ccdc 100644 > --- a/fs/ocfs2/dlm/dlmmaster.c > +++ b/fs/ocfs2/dlm/dlmmaster.c > @@ -3287,16 +3287,22 @@ static struct dlm_lock_resource > *dlm_reset_mleres_owner(struct dlm_ctxt *dlm, > { > struct dlm_lock_resource *res; > > + assert_spin_locked(>spinlock); > + assert_spin_locked(>master_lock); > + > /* Find the lockres associated to the mle and set its owner to > UNK */ > - res = __dlm_lookup_lockres(dlm, mle->mname, mle->mnamelen, > + res = __dlm_lookup_lockres_full(dlm, mle->mname, mle->mnamelen, > mle->mnamehash); > if (res) { > spin_unlock(>master_lock); > > - /* move lockres onto recovery list */ > spin_lock(>spinlock); > - dlm_set_lockres_owner(dlm, res, DLM_LOCK_RES_OWNER_UNKNOWN); > - dlm_move_lockres_to_recovery_list(dlm, res); > + if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) { > + /* move lockres onto recovery list */ > + dlm_set_lockres_owner(dlm, res, > DLM_LOCK_RES_OWNER_UNKNOWN); > + dlm_move_lockres_to_recovery_list(dlm, res); > + } > + I don't think this change is lock re-ordering *only*. It definitely changes the logic of resetting mle resource owner. Why do you detach mle from heartbeat if lock resource is in the process of dropping its mastery reference? And why we have to restart in this case? >>> I think if the lock resource is being purge we don't need to set its owner >>> to UNKNOWN and >>> it is the same as the original logic. We should drop the master lock if we >>> want to judge >>> if the state of the lock resource is DLM_LOCK_RES_DROPPING_REF. Once we >>> drop the master lock >>> we should restart to clean master list. >>> Here the mle is not useful and will be released, so we detach it from >>> heartbeat. >>> In fact, the mle has been detached in dlm_clean_migration_mle(). >> Hi Alex, >> >> Perhaps, you can just judge if lock resource is marked >> DLM_LOCK_RES_DROPPING_REF and if so directly >> return NULL with ::master_lock locked :) >> Umm... We can't do this with unlocking master lock first and then > re-taking it. It breaks the logic. > What my concern is the behavior change. e.g. Currently for lock resource > in the process of dropping mastery reference, we just ignore it and > continue with the next. But after this patch, we have to restart at the > beginning. Before this patch, we can continue when we find the lock resource is marked DLM_LOCK_RES_DROPPING_REF, that is because we use a wrong spin lock ordering. Once we drop the master lock we should restart to iterate over the master list, otherwise, we may miss some mles which are inserted during we are dropping the master lock. In this patch, we may increase the times to restart, but we can not avoid it for solving the deadlock. Thanks, Alex > >> Thanks, >> Changwei >> >>> >>> Thanks, >>> Alex Thanks, Joseph > spin_unlock(>spinlock); > dlm_lockres_put(res); > . >>> >>> >>> ___ >>> Ocfs2-devel mailing list >>> Ocfs2-devel@oss.oracle.com >>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel >>> >> > > . > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix a potential deadlock in dlm_reset_mleres_owner()
On 17/12/22 10:34, Changwei Ge wrote: > On 2017/12/21 14:36, alex chen wrote: >> Hi Joseph, >> >> On 2017/12/21 9:30, Joseph Qi wrote: >>> Hi Alex, >>> >>> On 17/12/21 08:55, alex chen wrote: In dlm_reset_mleres_owner(), we will lock dlm_lock_resource->spinlock after locking dlm_ctxt->master_lock, which breaks the spinlock lock ordering: dlm_domain_lock struct dlm_ctxt->spinlock struct dlm_lock_resource->spinlock struct dlm_ctxt->master_lock Fix it by unlocking dlm_ctxt->master_lock before locking dlm_lock_resource->spinlock and restarting to clean master list. Signed-off-by: Alex ChenReviewed-by: Jun Piao --- fs/ocfs2/dlm/dlmmaster.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index 3e04279..d83ccdc 100644 --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -3287,16 +3287,22 @@ static struct dlm_lock_resource *dlm_reset_mleres_owner(struct dlm_ctxt *dlm, { struct dlm_lock_resource *res; + assert_spin_locked(>spinlock); + assert_spin_locked(>master_lock); + /* Find the lockres associated to the mle and set its owner to UNK */ - res = __dlm_lookup_lockres(dlm, mle->mname, mle->mnamelen, + res = __dlm_lookup_lockres_full(dlm, mle->mname, mle->mnamelen, mle->mnamehash); if (res) { spin_unlock(>master_lock); - /* move lockres onto recovery list */ spin_lock(>spinlock); - dlm_set_lockres_owner(dlm, res, DLM_LOCK_RES_OWNER_UNKNOWN); - dlm_move_lockres_to_recovery_list(dlm, res); + if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) { + /* move lockres onto recovery list */ + dlm_set_lockres_owner(dlm, res, DLM_LOCK_RES_OWNER_UNKNOWN); + dlm_move_lockres_to_recovery_list(dlm, res); + } + >>> I don't think this change is lock re-ordering *only*. It definitely >>> changes the logic of resetting mle resource owner. >>> Why do you detach mle from heartbeat if lock resource is in the process >>> of dropping its mastery reference? And why we have to restart in this >>> case? >> I think if the lock resource is being purge we don't need to set its owner >> to UNKNOWN and >> it is the same as the original logic. We should drop the master lock if we >> want to judge >> if the state of the lock resource is DLM_LOCK_RES_DROPPING_REF. Once we drop >> the master lock >> we should restart to clean master list. >> Here the mle is not useful and will be released, so we detach it from >> heartbeat. >> In fact, the mle has been detached in dlm_clean_migration_mle(). > Hi Alex, > > Perhaps, you can just judge if lock resource is marked > DLM_LOCK_RES_DROPPING_REF and if so directly > return NULL with ::master_lock locked :) >Umm... We can't do this with unlocking master lock first and then re-taking it. It breaks the logic. What my concern is the behavior change. e.g. Currently for lock resource in the process of dropping mastery reference, we just ignore it and continue with the next. But after this patch, we have to restart at the beginning. > Thanks, > Changwei > >> >> Thanks, >> Alex >>> >>> Thanks, >>> Joseph >>> spin_unlock(>spinlock); dlm_lockres_put(res); >>> >>> . >>> >> >> >> ___ >> Ocfs2-devel mailing list >> Ocfs2-devel@oss.oracle.com >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel >> > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix a potential deadlock in dlm_reset_mleres_owner()
On 2017/12/21 14:36, alex chen wrote: > Hi Joseph, > > On 2017/12/21 9:30, Joseph Qi wrote: >> Hi Alex, >> >> On 17/12/21 08:55, alex chen wrote: >>> In dlm_reset_mleres_owner(), we will lock >>> dlm_lock_resource->spinlock after locking dlm_ctxt->master_lock, >>> which breaks the spinlock lock ordering: >>> dlm_domain_lock >>> struct dlm_ctxt->spinlock >>> struct dlm_lock_resource->spinlock >>> struct dlm_ctxt->master_lock >>> >>> Fix it by unlocking dlm_ctxt->master_lock before locking >>> dlm_lock_resource->spinlock and restarting to clean master list. >>> >>> Signed-off-by: Alex Chen>>> Reviewed-by: Jun Piao >>> --- >>> fs/ocfs2/dlm/dlmmaster.c | 14 ++ >>> 1 file changed, 10 insertions(+), 4 deletions(-) >>> >>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c >>> index 3e04279..d83ccdc 100644 >>> --- a/fs/ocfs2/dlm/dlmmaster.c >>> +++ b/fs/ocfs2/dlm/dlmmaster.c >>> @@ -3287,16 +3287,22 @@ static struct dlm_lock_resource >>> *dlm_reset_mleres_owner(struct dlm_ctxt *dlm, >>> { >>> struct dlm_lock_resource *res; >>> >>> + assert_spin_locked(>spinlock); >>> + assert_spin_locked(>master_lock); >>> + >>> /* Find the lockres associated to the mle and set its owner to UNK */ >>> - res = __dlm_lookup_lockres(dlm, mle->mname, mle->mnamelen, >>> + res = __dlm_lookup_lockres_full(dlm, mle->mname, mle->mnamelen, >>>mle->mnamehash); >>> if (res) { >>> spin_unlock(>master_lock); >>> >>> - /* move lockres onto recovery list */ >>> spin_lock(>spinlock); >>> - dlm_set_lockres_owner(dlm, res, DLM_LOCK_RES_OWNER_UNKNOWN); >>> - dlm_move_lockres_to_recovery_list(dlm, res); >>> + if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) { >>> + /* move lockres onto recovery list */ >>> + dlm_set_lockres_owner(dlm, res, >>> DLM_LOCK_RES_OWNER_UNKNOWN); >>> + dlm_move_lockres_to_recovery_list(dlm, res); >>> + } >>> + >> I don't think this change is lock re-ordering *only*. It definitely >> changes the logic of resetting mle resource owner. >> Why do you detach mle from heartbeat if lock resource is in the process >> of dropping its mastery reference? And why we have to restart in this >> case? > I think if the lock resource is being purge we don't need to set its owner to > UNKNOWN and > it is the same as the original logic. We should drop the master lock if we > want to judge > if the state of the lock resource is DLM_LOCK_RES_DROPPING_REF. Once we drop > the master lock > we should restart to clean master list. > Here the mle is not useful and will be released, so we detach it from > heartbeat. > In fact, the mle has been detached in dlm_clean_migration_mle(). Hi Alex, Perhaps, you can just judge if lock resource is marked DLM_LOCK_RES_DROPPING_REF and if so directly return NULL with ::master_lock locked :) Thanks, Changwei > > Thanks, > Alex >> >> Thanks, >> Joseph >> >>> spin_unlock(>spinlock); >>> dlm_lockres_put(res); >>> >> >> . >> > > > ___ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel > ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel