Re: [Ocfs2-devel] [PATCH] ocfs2: don't use iocb when EIOCBQUEUED returns

2018-05-08 Thread Joseph Qi
Hi Changwei,

I agree with Gang that currently we still haven't figured out why iocb
was freed. Though you fix won't bring any side effect, it looks like a
workaround.
That means, the freed iocb may still have risk in other place.

Thanks,
Joseph

On 18/5/8 23:23, Changwei Ge wrote:
> Hi Gang,
> 
> I don't think this patch is a workaround trick.
> 
> We do face the risk using freed iocb although it is actually indeed hard 
> to encounter, it still exists.
> 
> So I propose to fix it making ocfs2 more reliable.
> 
> 
> Moreover, this patch has been kept in -mm tree for one month. Can anyone 
> help review it with ack or nack? So I can do some improvement for it. :-)
> 
> 
> Thanks,
> 
> Changwei
> 
> 
> On 04/11/2018 10:51 AM, Gang He wrote:
>> Hi Changwei,
>>
>> The code change just works around the problem, but theoretically the IOCB 
>> object should not be freed before which is handled.
>> Anyway, if we can find the root cause behind via some way (e.g. inject delay 
>> in some place), the result is more perfect.
>>
>>
>> Thanks
>> Gang
>>
>>
>>> Hi Jun,
>>>
>>> On 2018/4/11 8:52, piaojun wrote:
 Hi Changwei,

 It looks like a code bug, and 'iocb' should not be freed at this place.
 Could this BUG reproduced easily?
>>> Actually, it's not easy to be reproduced since IO is much slower than CPU
>>> executing instructions. But the logic here is broken, we'd better fix this.
>>>
>>> Thanks,
>>> Changwei
>>>
 thanks,
 Jun

 On 2018/4/10 20:00, Changwei Ge wrote:
> When -EIOCBQUEUED returns, it means that aio_complete() will be called
> from dio_complete(), which is an asynchronous progress against write_iter.
> Generally, IO is a very slow progress than executing instruction, but we
> still can't take the risk to access a freed iocb.
>
> And we do face a BUG crash issue.
> >From crash tool, iocb is obviously freed already.
> crash> struct -x kiocb 881a350f5900
> struct kiocb {
> ki_filp = 0x881a350f5a80,
> ki_pos = 0x0,
> ki_complete = 0x0,
> private = 0x0,
> ki_flags = 0x0
> }
>
> And the backtrace shows:
> ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2]
> ? ocfs2_check_range_for_refcount+0x150/0x150 [ocfs2]
> aio_run_iocb+0x229/0x2f0
> ? try_to_wake_up+0x380/0x380
> do_io_submit+0x291/0x540
> ? syscall_trace_leave+0xad/0x130
> SyS_io_submit+0x10/0x20
> system_call_fastpath+0x16/0x75
>
> Signed-off-by: Changwei Ge 
> ---
>fs/ocfs2/file.c | 4 ++--
>1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 5d1784a..1393ff2 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2343,7 +2343,7 @@ static ssize_t ocfs2_file_write_iter(struct kiocb
>>> *iocb,
>
>   written = __generic_file_write_iter(iocb, from);
>   /* buffered aio wouldn't have proper lock coverage today */
> - BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT));
> + BUG_ON(written == -EIOCBQUEUED && !direct_io);
>
>   /*
>* deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a 
> ocfs2_dio_end_io
> @@ -2463,7 +2463,7 @@ static ssize_t ocfs2_file_read_iter(struct kiocb 
> *iocb,
>   trace_generic_file_aio_read_ret(ret);
>
>   /* buffered aio wouldn't have proper lock coverage today */
> - BUG_ON(ret == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT));
> + BUG_ON(ret == -EIOCBQUEUED && !direct_io);
>
>   /* see ocfs2_file_write_iter */
>   if (ret == -EIOCBQUEUED || !ocfs2_iocb_is_rw_locked(iocb)) {
>
>>> ___
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel@oss.oracle.com
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
>> ___
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: Correct the offset comments of the structure ocfs2_dir_block_trailer.

2018-05-08 Thread Joseph Qi
Umm... We always explicitly comment out the shift of 0x10, 0x20, ...,
IMO, we'd better move the comments to the correct place instead of
change it to something like 0x28.

Thanks,
Joseph

On 18/5/8 17:46, Guozhonghua wrote:
> Correct the offset comments of the structure ocfs2_dir_block_trailer.
> 
> Signed-off-by: guozhonghua 
> ---
>  fs/ocfs2/ocfs2_fs.h |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/ocfs2_fs.h b/fs/ocfs2/ocfs2_fs.h index 5bb4a89..14c60b0 
> 100644
> --- a/fs/ocfs2/ocfs2_fs.h
> +++ b/fs/ocfs2/ocfs2_fs.h
> @@ -808,10 +808,10 @@ struct ocfs2_dir_block_trailer {
>  /*10*/   __u8db_signature[8];/* Signature for 
> verification */
>   __le64  db_reserved2;
>   __le64  db_free_next;   /* Next block in list (unused) 
> */
> -/*20*/   __le64  db_blkno;   /* Offset on disk, in 
> blocks */
> +/*28*/   __le64  db_blkno;   /* Offset on disk, in 
> blocks */
>   __le64  db_parent_dinode;   /* dinode which owns me, in
>  blocks */
> -/*30*/   struct ocfs2_block_check db_check;  /* Error checking */
> +/*38*/   struct ocfs2_block_check db_check;  /* Error checking */
>  /*40*/
>  };
>  
> --
> 1.7.9.5
> 
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2: clean up redundant function declarations

2018-04-24 Thread Joseph Qi


On 18/4/24 21:10, Jia Guo wrote:
> The function ocfs2_extend_allocation has been deleted, clean up its
> declaration.
> 
Also change the static function name from __ocfs2_extend_allocation()
to ocfs2_extend_allocation() to be consistent with the corresponding
trace events as well as comments for ocfs2_lock_allocators().

Please add above to description as well. 

> Fixes: 964f14a0d350 ("ocfs2: clean up some dead code")
> 
Redundant blank line here.

> Signed-off-by: Jia Guo <guoji...@huawei.com>
> Reviewed-by: Jun Piao <piao...@huawei.com>

> ---
>  fs/ocfs2/file.c | 6 +++---
>  fs/ocfs2/file.h | 2 --
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 6ee94bc..2dfec6b 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -563,7 +563,7 @@ int ocfs2_add_inode_data(struct ocfs2_super *osb,
>   return ret;
>  }
> 
> -static int __ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
> +static int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
>u32 clusters_to_add, int mark_unwritten)
Parameters should be aligned with '('.
For example:
static int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
   u32 clusters_to_add, int mark_unwritten)

>  {
>   int status = 0;
> @@ -1035,7 +1035,7 @@ int ocfs2_extend_no_holes(struct inode *inode, struct 
> buffer_head *di_bh,
>   clusters_to_add -= oi->ip_clusters;
> 
>   if (clusters_to_add) {
> - ret = __ocfs2_extend_allocation(inode, oi->ip_clusters,
> + ret = ocfs2_extend_allocation(inode, oi->ip_clusters,
>   clusters_to_add, 0);
Also here.

>   if (ret) {
>   mlog_errno(ret);
> @@ -1493,7 +1493,7 @@ static int ocfs2_allocate_unwritten_extents(struct 
> inode *inode,
>   goto next;
>   }
> 
> - ret = __ocfs2_extend_allocation(inode, cpos, alloc_size, 1);
> + ret = ocfs2_extend_allocation(inode, cpos, alloc_size, 1);
>   if (ret) {
>   if (ret != -ENOSPC)
>   mlog_errno(ret);
> diff --git a/fs/ocfs2/file.h b/fs/ocfs2/file.h
> index 1fdc983..7eb7f03 100644
> --- a/fs/ocfs2/file.h
> +++ b/fs/ocfs2/file.h
> @@ -65,8 +65,6 @@ int ocfs2_extend_no_holes(struct inode *inode, struct 
> buffer_head *di_bh,
> u64 new_i_size, u64 zero_to);
>  int ocfs2_zero_extend(struct inode *inode, struct buffer_head *di_bh,
> loff_t zero_to);
> -int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
> - u32 clusters_to_add, int mark_unwritten);
>  int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
>  int ocfs2_getattr(const struct path *path, struct kstat *stat,
> u32 request_mask, unsigned int flags);
> 

With above concern addressed,
Acked-by: Joseph Qi <jiangqi...@gmail.com>

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: clean up redundant function declarations

2018-04-23 Thread Joseph Qi


On 18/4/21 19:41, Jia Guo wrote:
> The function ocfs2_extend_allocation has been deleted, clean up its
> declaration.
> 
I suggest we also change the static function name from
__ocfs2_extend_allocation() to ocfs2_extend_allocation() to be
consistent with the corresponding trace events as well as comments for
ocfs2_lock_allocators().

Thanks,
Joseph

> Fixes: 964f14a0d350 ("ocfs2: clean up some dead code")
> 
> Signed-off-by: Jia Guo 
> Reviewed-by: Jun Piao > ---
>  fs/ocfs2/file.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/ocfs2/file.h b/fs/ocfs2/file.h
> index 1fdc983..7eb7f03 100644
> --- a/fs/ocfs2/file.h
> +++ b/fs/ocfs2/file.h
> @@ -65,8 +65,6 @@ int ocfs2_extend_no_holes(struct inode *inode, struct 
> buffer_head *di_bh,
> u64 new_i_size, u64 zero_to);
>  int ocfs2_zero_extend(struct inode *inode, struct buffer_head *di_bh,
> loff_t zero_to);
> -int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
> - u32 clusters_to_add, int mark_unwritten);
>  int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
>  int ocfs2_getattr(const struct path *path, struct kstat *stat,
> u32 request_mask, unsigned int flags);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir

2018-04-11 Thread Joseph Qi


On 18/4/12 03:31, Ashish Samant wrote:
> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node could request it in EX mode
> from ocfs2_recover_orphans() which causes downconvert of the lock, on this
> node, to NL mode.
> 
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we dont take EX
> lock on the inode. This could potentially cause problems because we could
> be starting transaction, accessing journal and modifying metadata of the
> inode while holding NL lock and with another node holding EX lock on the
> inode.
> 
> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
> 
> Signed-off-by: Ashish Samant <ashish.sam...@oracle.com>
> 
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> V1->V2:
> Modify commit message to better reflect the problem in upstream kernel.
> ---
>  fs/ocfs2/refcounttree.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
> index ab156e3..1b1283f 100644
> --- a/fs/ocfs2/refcounttree.c
> +++ b/fs/ocfs2/refcounttree.c
> @@ -4250,10 +4250,11 @@ static int __ocfs2_reflink(struct dentry *old_dentry,
>  static int ocfs2_reflink(struct dentry *old_dentry, struct inode *dir,
>struct dentry *new_dentry, bool preserve)
>  {
> - int error;
> + int error, had_lock;
>   struct inode *inode = d_inode(old_dentry);
>   struct buffer_head *old_bh = NULL;
>   struct inode *new_orphan_inode = NULL;
> + struct ocfs2_lock_holder oh;
>  
>   if (!ocfs2_refcount_tree(OCFS2_SB(inode->i_sb)))
>   return -EOPNOTSUPP;
> @@ -4295,6 +4296,14 @@ static int ocfs2_reflink(struct dentry *old_dentry, 
> struct inode *dir,
>   goto out;
>   }
>  
> + had_lock = ocfs2_inode_lock_tracker(new_orphan_inode, NULL, 1,
> + );
> + if (had_lock < 0) {
> + error = had_lock;
> + mlog_errno(error);
> + goto out;
> + }
> +
>   /* If the security isn't preserved, we need to re-initialize them. */
>   if (!preserve) {
>   error = ocfs2_init_security_and_acl(dir, new_orphan_inode,
> @@ -4302,14 +4311,15 @@ static int ocfs2_reflink(struct dentry *old_dentry, 
> struct inode *dir,
>   if (error)
>   mlog_errno(error);
>   }
> -out:
>   if (!error) {
>   error = ocfs2_mv_orphaned_inode_to_new(dir, new_orphan_inode,
>  new_dentry);
>   if (error)
>   mlog_errno(error);
>   }
> + ocfs2_inode_unlock_tracker(new_orphan_inode, 1, , had_lock);
>  
> +out:
>   if (new_orphan_inode) {
>   /*
>* We need to open_unlock the inode no matter whether we
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: clean up unused variable in dlm_process_recovery_data

2018-04-03 Thread Joseph Qi


On 18/4/3 13:42, Changwei Ge wrote:
> Signed-off-by: Changwei Ge <ge.chang...@h3c.com>

Acked-by: Joseph Qi <jiangqi...@gmail.com>
> ---
>  fs/ocfs2/dlm/dlmrecovery.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index ec8f758..be6b067 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1807,7 +1807,6 @@ static int dlm_process_recovery_data(struct dlm_ctxt 
> *dlm,
>   int i, j, bad;
>   struct dlm_lock *lock;
>   u8 from = O2NM_MAX_NODES;
> - unsigned int added = 0;
>   __be64 c;
>  
>   mlog(0, "running %d locks for this lockres\n", mres->num_locks);
> @@ -1823,7 +1822,6 @@ static int dlm_process_recovery_data(struct dlm_ctxt 
> *dlm,
>   spin_lock(>spinlock);
>   dlm_lockres_set_refmap_bit(dlm, res, from);
>   spin_unlock(>spinlock);
> - added++;
>   break;
>   }
>   BUG_ON(ml->highest_blocked != LKM_IVMODE);
> @@ -1911,7 +1909,6 @@ static int dlm_process_recovery_data(struct dlm_ctxt 
> *dlm,
>   /* do not alter lock refcount.  switching lists. */
>   list_move_tail(>list, queue);
>   spin_unlock(>spinlock);
> - added++;
>  
>   mlog(0, "just reordered a local lock!\n");
>   continue;
> @@ -2037,7 +2034,6 @@ static int dlm_process_recovery_data(struct dlm_ctxt 
> *dlm,
>"setting refmap bit\n", dlm->name,
>res->lockname.len, res->lockname.name, ml->node);
>   dlm_lockres_set_refmap_bit(dlm, res, ml->node);
> - added++;
>   }
>   spin_unlock(>spinlock);
>   }
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: wait for dlm recovery done when migrating all lock resources

2018-04-02 Thread Joseph Qi


On 18/3/15 20:59, piaojun wrote:
> Wait for dlm recovery done when migrating all lock resources in case that
> new lock resource left after leaving dlm domain. And the left lock
> resource will cause other nodes BUG.
> 
>   NodeA   NodeBNodeC
> 
> umount:
>   dlm_unregister_domain()
> dlm_migrate_all_locks()
> 
>  NodeB down
> 
> do recovery for NodeB
> and collect a new lockres
> form other live nodes:
> 
>   dlm_do_recovery
> dlm_remaster_locks
>   dlm_request_all_locks:
> 
>   dlm_mig_lockres_handler
> dlm_new_lockres
>   __dlm_insert_lockres
> 
> at last NodeA become the
> master of the new lockres
> and leave domain:
>   dlm_leave_domain()
> 
>   mount:
> dlm_join_domain()
> 
>   touch file and request
>   for the owner of the new
>   lockres, but all the
>   other nodes said 'NO',
>   so NodeC decide to be
>   the owner, and send do
>   assert msg to other
>   nodes:
>   dlmlock()
> dlm_get_lock_resource()
>   dlm_do_assert_master()
> 
>   other nodes receive the msg
>   and found two masters exist.
>   at last cause BUG in
>   dlm_assert_master_handler()
>   -->BUG();
> 
> Fixes: bc9838c4d44a ("dlm: allow dlm do recovery during shutdown")
> 
Redundant blank line here.
But I've found Andrew has already fix this when adding to -mm tree.

Acked-by: Joseph Qi <jiangqi...@gmail.com>

> Signed-off-by: Jun Piao <piao...@huawei.com>
> Reviewed-by: Alex Chen <alex.c...@huawei.com>
> Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir

2018-04-02 Thread Joseph Qi


On 18/3/31 01:42, Ashish Samant wrote:
> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node might request it in PR mode
> which causes downconvert of the lock to PR mode.
> 
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we dont take EX
> lock on the inode. So effectively, we are doing this and accessing the
> journal for this inode while holding PR lock. While accessing the journal,
> we make
> 
> ci->ci_last_trans = journal->j_trans_id
> 
> At this point, if there is another downconvert request on this inode from
> another node (PR->NL), we will trip on the following condition in
> ocfs2_ci_checkpointed()
> 
> BUG_ON(lockres->l_level != DLM_LOCK_EX && !checkpointed);
> 
> because we hold the lock in PR mode and journal->j_trans_id is not greater
> than ci_last_trans for the inode.
> 
> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.
> 
> Signed-off-by: Ashish Samant <ashish.sam...@oracle.com>

Looks good.
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: don't evaluate buffer head to NULL managed by caller

2018-03-29 Thread Joseph Qi


On 18/3/30 10:17, Changwei Ge wrote:
 Since we assume caller has to pass either all NULL or all non-NULL,
 here we will only put bh internal allocated. Am I missing something?
>>> Thanks for your review.
>>> Yes, we will only put bh internally allocated.
>>> If bh is reserved in advance, we will not put it and re-assign it to NULL.
>>>
>> So this branch won't have risk, right?
> Sorry... I'm not sure if I understand you correctly.
> This branch will be walked through when previous part of bhs[] faces a read 
> failure in order to put bh allocated in ocfs2_read_blocks().
> And we assume all bh should be NULL or non-NULL, if new_bh is set, the back 
> part 
> should also be put to release those buffer heads.
> 
> If I made a mistake or misunderstand you, please let me know.


I'm saying that sb_getblk() will only be called if bh hasn't been
allocated yet. That means if it fails, the bh to be put can be
guaranteed internal allocated.
Also I don't think the WARN check is necessary as this is common path
and will bring additional cpu consumption. We can make it clear at
comments of ocfs2_read_blocks() that either all NULL or non-NULL bhs
is prerequisite for the caller. And then we make sure we won't put bh
that is allocated outside.

Thanks,
Joseph

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: don't evaluate buffer head to NULL managed by caller

2018-03-29 Thread Joseph Qi


On 18/3/30 09:31, Changwei Ge wrote:
> Hi Joseph,
> 
> On 2018/3/30 9:27, Joseph Qi wrote:
>>
>>
>> On 18/3/29 10:06, Changwei Ge wrote:
>>> ocfs2_read_blocks() is used to read several blocks from disk.
>>> Currently, the input argument *bhs* can be NULL or NOT. It depends on
>>> the caller's behavior. If the function fails in reading blocks from
>>> disk, the corresponding bh will be assigned to NULL and put.
>>>
>>> Obviously, above process for non-NULL input bh is not appropriate.
>>> Because the caller doesn't even know its bhs are put and re-assigned.
>>>
>>> If buffer head is managed by caller, ocfs2_read_blocks should not
>>> evaluate it to NULL. It will cause caller accessing illegal memory,
>>> thus crash.
>>>
>>> Signed-off-by: Changwei Ge <ge.chang...@h3c.com>
>>> ---
>>>   fs/ocfs2/buffer_head_io.c | 31 +--
>>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
>>> index d9ebe11..17329b6 100644
>>> --- a/fs/ocfs2/buffer_head_io.c
>>> +++ b/fs/ocfs2/buffer_head_io.c
>>> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, 
>>> u64 block, int nr,
>>> int i, ignore_cache = 0;
>>> struct buffer_head *bh;
>>> struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
>>> +   int new_bh = 0;
>>>   
>>> trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>>>   
>>> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, 
>>> u64 block, int nr,
>>> goto bail;
>>> }
>>>   
>>> +   /* Use below trick to check if all bhs are NULL or assigned.
>>> +* Basically, we hope all bhs are consistent so that we can
>>> +* handle exception easily.
>>> +*/
>>> +   new_bh = (bhs[0] == NULL);
>>> +   for (i = 1 ; i < nr ; i++) {
>>> +   if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
>>> +   WARN(1, "Not all bhs are consistent\n");
>>> +   break;
>>> +   }
>>> +   }
>>> +
>>> ocfs2_metadata_cache_io_lock(ci);
>>> for (i = 0 ; i < nr ; i++) {
>>> if (bhs[i] == NULL) {
>>> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, 
>>> u64 block, int nr,
>>> if (!(flags & OCFS2_BH_READAHEAD)) {
>>> if (status) {
>>> /* Clear the rest of the buffers on error */
>>> -   put_bh(bh);
>>> -   bhs[i] = NULL;
>>> +   if (new_bh) {
>>> +   put_bh(bh);
>>> +   bhs[i] = NULL;
>>> +   }
>>
>> Since we assume caller has to pass either all NULL or all non-NULL,
>> here we will only put bh internal allocated. Am I missing something?
> 
> Thanks for your review.
> Yes, we will only put bh internally allocated.
> If bh is reserved in advance, we will not put it and re-assign it to NULL.
> 

So this branch won't have risk, right?

Thanks,
Joseph

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: don't evaluate buffer head to NULL managed by caller

2018-03-29 Thread Joseph Qi


On 18/3/29 10:06, Changwei Ge wrote:
> ocfs2_read_blocks() is used to read several blocks from disk.
> Currently, the input argument *bhs* can be NULL or NOT. It depends on
> the caller's behavior. If the function fails in reading blocks from
> disk, the corresponding bh will be assigned to NULL and put.
> 
> Obviously, above process for non-NULL input bh is not appropriate.
> Because the caller doesn't even know its bhs are put and re-assigned.
> 
> If buffer head is managed by caller, ocfs2_read_blocks should not
> evaluate it to NULL. It will cause caller accessing illegal memory,
> thus crash.
> 
> Signed-off-by: Changwei Ge 
> ---
>  fs/ocfs2/buffer_head_io.c | 31 +--
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ocfs2/buffer_head_io.c b/fs/ocfs2/buffer_head_io.c
> index d9ebe11..17329b6 100644
> --- a/fs/ocfs2/buffer_head_io.c
> +++ b/fs/ocfs2/buffer_head_io.c
> @@ -188,6 +188,7 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>   int i, ignore_cache = 0;
>   struct buffer_head *bh;
>   struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
> + int new_bh = 0;
>  
>   trace_ocfs2_read_blocks_begin(ci, (unsigned long long)block, nr, flags);
>  
> @@ -213,6 +214,18 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>   goto bail;
>   }
>  
> + /* Use below trick to check if all bhs are NULL or assigned.
> +  * Basically, we hope all bhs are consistent so that we can
> +  * handle exception easily.
> +  */
> + new_bh = (bhs[0] == NULL);
> + for (i = 1 ; i < nr ; i++) {
> + if ((new_bh && bhs[i]) || (!new_bh && !bhs[i])) {
> + WARN(1, "Not all bhs are consistent\n");
> + break;
> + }
> + }
> +
>   ocfs2_metadata_cache_io_lock(ci);
>   for (i = 0 ; i < nr ; i++) {
>   if (bhs[i] == NULL) {
> @@ -324,8 +337,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>   if (!(flags & OCFS2_BH_READAHEAD)) {
>   if (status) {
>   /* Clear the rest of the buffers on error */
> - put_bh(bh);
> - bhs[i] = NULL;
> + if (new_bh) {
> + put_bh(bh);
> + bhs[i] = NULL;
> + }

Since we assume caller has to pass either all NULL or all non-NULL,
here we will only put bh internal allocated. Am I missing something?

Thanks,
Joseph

>   continue;
>   }
>   /* We know this can't have changed as we hold the
> @@ -342,8 +357,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>* for this bh as it's not marked locally
>* uptodate. */
>   status = -EIO;
> - put_bh(bh);
> - bhs[i] = NULL;
> + if (new_bh) {
> + put_bh(bh);
> + bhs[i] = NULL;
> + }
>   continue;
>   }
>  
> @@ -355,8 +372,10 @@ int ocfs2_read_blocks(struct ocfs2_caching_info *ci, u64 
> block, int nr,
>   clear_buffer_needs_validate(bh);
>   status = validate(sb, bh);
>   if (status) {
> - put_bh(bh);
> - bhs[i] = NULL;
> + if (new_bh) {
> + put_bh(bh);
> + bhs[i] = NULL;
> + }
>   continue;
>   }
>   }
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/o2hb: check len for bio_add_page() to avoid submitting incorrect bio

2018-03-28 Thread Joseph Qi


On 18/3/28 15:02, piaojun wrote:
> Hi Joseph,
> 
> On 2018/3/28 12:58, Joseph Qi wrote:
>>
>>
>> On 18/3/28 11:50, piaojun wrote:
>>> We need check len for bio_add_page() to make sure the bio has been set up
>>> correctly, otherwise we may submit incorrect data to device.
>>>
>>> Signed-off-by: Jun Piao <piao...@huawei.com>
>>> Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com>
>>> ---
>>>  fs/ocfs2/cluster/heartbeat.c | 11 ++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>> index ea8c551..43ad79f 100644
>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct 
>>> o2hb_region *reg,
>>>  current_page, vec_len, vec_start);
>>>
>>> len = bio_add_page(bio, page, vec_len, vec_start);
>>> -   if (len != vec_len) break;
>>> +   if (len != vec_len) {
>>> +   mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>>> +"page %p, len %d, vec_len %u, vec_start %u, "
>>> +"bi_sector %llu\n", current_page, page, len,
>>> +vec_len, vec_start,
>>> +(unsigned long long)bio->bi_iter.bi_sector);
>>> +   bio_put(bio);
>>> +   bio = ERR_PTR(-EFAULT);
>>
>> IMO, EFAULT is not an appropriate error code here.
>> If __bio_add_page returns 0, some are caused by bio checking failed.
>> Also I've noticed that several other callers just use ENOMEM, so I think
>> EINVAL or ENOMEM may be better.
> 
> __bio_add_page has been deleted in patch c66a14d07c13, and I notice that
> other callers always use -EFAULT or -EIO. I'm afraid we are not basing on
> the same kernel source.
> 

Oops... Yes, I was looking an old kernel...
EIO sounds reasonable, but I don't know why EFAULT since it means "Bad address".

Thanks,
Joseph

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/o2hb: check len for bio_add_page() to avoid submitting incorrect bio

2018-03-27 Thread Joseph Qi


On 18/3/28 11:50, piaojun wrote:
> We need check len for bio_add_page() to make sure the bio has been set up
> correctly, otherwise we may submit incorrect data to device.
> 
> Signed-off-by: Jun Piao 
> Reviewed-by: Yiwen Jiang 
> ---
>  fs/ocfs2/cluster/heartbeat.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index ea8c551..43ad79f 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region 
> *reg,
>current_page, vec_len, vec_start);
> 
>   len = bio_add_page(bio, page, vec_len, vec_start);
> - if (len != vec_len) break;
> + if (len != vec_len) {
> + mlog(ML_ERROR, "Adding page[%d] to bio failed, "
> +  "page %p, len %d, vec_len %u, vec_start %u, "
> +  "bi_sector %llu\n", current_page, page, len,
> +  vec_len, vec_start,
> +  (unsigned long long)bio->bi_iter.bi_sector);
> + bio_put(bio);
> + bio = ERR_PTR(-EFAULT);

IMO, EFAULT is not an appropriate error code here.
If __bio_add_page returns 0, some are caused by bio checking failed.
Also I've noticed that several other callers just use ENOMEM, so I think
EINVAL or ENOMEM may be better.

Thanks,
Joseph

> + return bio;
> + }
> 
>   cs += vec_len / (PAGE_SIZE/spp);
>   vec_start = 0;
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2/dlm: don't handle migrate lockres if already in shutdown

2018-03-04 Thread Joseph Qi


On 18/3/3 08:45, piaojun wrote:
> We should not handle migrate lockres if we are already in
> 'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
> dlm domain. At last other nodes will get stuck into infinite loop when
> requsting lock from us.
> 
> The problem is caused by concurrency umount between nodes. Before
> receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
> migrate target. So N2 will continue sending lockres to N1 even though N1
> has left domain.
> 
> N1 N2 (owner)
>touch file
> 
> access the file,
> and get pr lock
> 
>begin leave domain and
>pick up N1 as new owner
> 
> begin leave domain and
> migrate all lockres done
> 
>begin migrate lockres to N1
> 
> end leave domain, but
> the lockres left
> unexpectedly, because
> migrate task has passed
> 
> Signed-off-by: Jun Piao <piao...@huawei.com>
> Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com> ---
>  fs/ocfs2/dlm/dlmdomain.c   | 14 ++
>  fs/ocfs2/dlm/dlmdomain.h   |  1 +
>  fs/ocfs2/dlm/dlmrecovery.c |  9 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index e1fea14..3b7ec51 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -675,6 +675,20 @@ static void dlm_leave_domain(struct dlm_ctxt *dlm)
>   spin_unlock(>spinlock);
>  }
> 
> +int dlm_joined(struct dlm_ctxt *dlm)
This helper can be static inline and placed into header.

> +{
> + int ret = 0;
> +
> + spin_lock(_domain_lock);
> +
Delete blank line here.

> + if (dlm->dlm_state == DLM_CTXT_JOINED)
> + ret = 1;
> +
Also here.

Except the above concern, it looks good to me.
With they are fixed, feel free to add:

Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> + spin_unlock(_domain_lock);
> +
> + return ret;
> +}
> +
>  int dlm_shutting_down(struct dlm_ctxt *dlm)
>  {
>   int ret = 0;
> diff --git a/fs/ocfs2/dlm/dlmdomain.h b/fs/ocfs2/dlm/dlmdomain.h
> index fd6122a..2f7f60b 100644
> --- a/fs/ocfs2/dlm/dlmdomain.h
> +++ b/fs/ocfs2/dlm/dlmdomain.h
> @@ -28,6 +28,7 @@
>  extern spinlock_t dlm_domain_lock;
>  extern struct list_head dlm_domains;
> 
> +int dlm_joined(struct dlm_ctxt *dlm);
>  int dlm_shutting_down(struct dlm_ctxt *dlm);
>  void dlm_fire_domain_eviction_callbacks(struct dlm_ctxt *dlm,
>   int node_num);
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index ec8f758..505ab42 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1378,6 +1378,15 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 
> len, void *data,
>   if (!dlm_grab(dlm))
>   return -EINVAL;
> 
> + if (!dlm_joined(dlm)) {
> + mlog(ML_ERROR, "Domain %s not joined! "
> +   "lockres %.*s, master %u\n",
> +   dlm->name, mres->lockname_len,
> +   mres->lockname, mres->master);
> + dlm_put(dlm);
> + return -EINVAL;
> + }
> +
>   BUG_ON(!(mres->flags & (DLM_MRES_RECOVERY|DLM_MRES_MIGRATION)));
> 
>   real_master = mres->master;
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: clean unrelated comment

2018-02-23 Thread Joseph Qi


On 18/2/23 15:30, ge.chang...@h3c.com wrote:
> From: Changwei Ge <ge.chang...@h3c.com>
> 
> Obviously, the comment before dlm_do_local_recovery_cleanup() has
> nothing to do with it. So remove it.
> 
> Signed-off-by: Changwei Ge <ge.chang...@h3c.com>

It seems to say we can no longer trust the lvb in this case. Maybe it
should be placed somewhere to call dlm_revalidate_lvb()?

Anyway, this cleanup looks good to me.
Acked-by: Joseph Qi <jiangqi...@gmail.com>

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: keep the trace point consistent with the function name

2018-02-22 Thread Joseph Qi


On 18/2/14 12:52, Jia Guo wrote:
> keep the trace point consistent with the function name
> 
> Fixes: 3ef045c3d8ae ("ocfs2: switch to ->write_iter()")
> 
> Signed-off-by: Jia Guo <guoji...@huawei.com>
> Reviewed-by: Jun Piao <piao...@huawei.com>
> Reviewed-by: Yiwen Jiang <jiangyi...@huawei.com>
> Reviewed-by: Alex Chen <alex.c...@huawei.com>

Acked-by: Joseph Qi <jiangqi...@gmail.com>

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [Ocfs2-dev] BUG: deadlock with umount and ocfs2 workqueue triggered by ocfs2rec thread

2018-01-11 Thread Joseph Qi
Hi Changkuo,

You said s_umount was acquired by umount and ocfs2rec was blocked when
acquiring it. But you didn't describe why umount was blocked.

Thanks,
Joseph

On 18/1/12 11:43, Shichangkuo wrote:
> Hi all,
>   Now we are testing ocfs2 with 4.14 kernel, and we finding a deadlock with 
> umount and ocfs2 workqueue triggered by ocfs2rec thread. The stack as follows:
> journal recovery work:
> [] call_rwsem_down_read_failed+0x14/0x30
> [] ocfs2_finish_quota_recovery+0x62/0x450 [ocfs2]
> [] ocfs2_complete_recovery+0xc1/0x440 [ocfs2]
> [] process_one_work+0x130/0x350
> [] worker_thread+0x46/0x3b0
> [] kthread+0x101/0x140
> [] ret_from_fork+0x1f/0x30
> [] 0x
> 
> /bin/umount:
> [] flush_workqueue+0x104/0x3e0
> [] ocfs2_truncate_log_shutdown+0x3b/0xc0 [ocfs2]
> [] ocfs2_dismount_volume+0x8c/0x3d0 [ocfs2]
> [] ocfs2_put_super+0x31/0xa0 [ocfs2]
> [] generic_shutdown_super+0x6d/0x120
> [] kill_block_super+0x2d/0x60
> [] deactivate_locked_super+0x51/0x90
> [] cleanup_mnt+0x3b/0x70
> [] task_work_run+0x86/0xa0
> [] exit_to_usermode_loop+0x6d/0xa9
> [] do_syscall_64+0x11d/0x130
> [] entry_SYSCALL64_slow_path+0x25/0x25
> [] 0x
>   
> Function ocfs2_finish_quota_recovery try to get sb->s_umount, which was 
> already locked by umount thread, then get a deadlock.
> This issue was introduced by c3b004460d77bf3f980d877be539016f2df4df12 and 
> 5f530de63cfc6ca8571cbdf58af63fb166cc6517.
> I think we cannot use :: s_umount, but the mutex ::dqonoff_mutex was already 
> removed.
> Shall we add a new mutex?
> 
> Thanks
> Changkuo
> -
> 本邮件及其附件含有新华三技术有限公司的保密信息,仅限于发送给上面地址中列出
> 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
> 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
> 邮件!
> This e-mail and its attachments contain confidential information from New 
> H3C, which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2017-12-28 Thread Joseph Qi
 97224 S 0.333 6.017   0:01.33 corosync
> 
> ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o 
> ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d 
> /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
> Tests with "-b 4096 -C 32768"
> Thu Dec 28 15:04:12 CST 2017
> multi_mmap..Passed.
> Runtime 487 seconds.
> 
> Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
> Signed-off-by: Gang He <g...@suse.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> ---
>  fs/ocfs2/dlmglue.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 4689940..5193218 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2486,6 +2486,15 @@ int ocfs2_inode_lock_with_page(struct inode *inode,
>   ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
>   if (ret == -EAGAIN) {
>   unlock_page(page);
> + /*
> +  * If we can't get inode lock immediately, we should not return
> +  * directly here, since this will lead to a softlockup problem.
> +  * The method is to get a blocking lock and immediately unlock
> +  * before returning, this can avoid CPU resource waste due to
> +  * lots of retries, and benefits fairness in getting lock.
> +  */
> + if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
> + ocfs2_inode_unlock(inode, ex);
>   ret = AOP_TRUNCATED_PAGE;
>   }
>  
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: return -EROFS to upper if inode block is invalid

2017-12-25 Thread Joseph Qi


On 17/12/26 14:45, piaojun wrote:
> Hi Joseph,
> 
> On 2017/12/26 14:10, Joseph Qi wrote:
>>
>>
>> On 17/12/26 13:35, piaojun wrote:
>>> Hi Joseph,
>>>
>>> On 2017/12/26 11:05, Joseph Qi wrote:
>>>>
>>>>
>>>> On 17/12/26 10:11, piaojun wrote:
>>>>> If metadata is corrupted such as 'invalid inode block', we will get
>>>>> failed by calling 'mount()' as below:
>>>>>
>>>>> ocfs2_mount
>>>>>   ocfs2_initialize_super
>>>>> ocfs2_init_global_system_inodes : return -EINVAL if inode is NULL
>>>>>   ocfs2_get_system_file_inode
>>>>> _ocfs2_get_system_file_inode : return NULL if inode is errno
>>>> Do you mean inode is bad?
>>>>
>>> Here we have to face two abnormal cases:
>>> 1. inode is bad;
>>> 2. read inode from disk failed due to bad storage link.
>>>>>   ocfs2_iget
>>>>> ocfs2_read_locked_inode
>>>>>   ocfs2_validate_inode_block
>>>>>
>>>>> In this situation we need return -EROFS to upper application, so that
>>>>> user can fix it by fsck. And then mount again.
>>>>>
>>>>> Signed-off-by: Jun Piao <piao...@huawei.com>
>>>>> Reviewed-by: Alex Chen <alex.c...@huawei.com>
>>>>> ---
>>>>>  fs/ocfs2/super.c | 10 --
>>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>>>>> index 040bbb6..dea21a7 100644
>>>>> --- a/fs/ocfs2/super.c
>>>>> +++ b/fs/ocfs2/super.c
>>>>> @@ -474,7 +474,10 @@ static int ocfs2_init_global_system_inodes(struct 
>>>>> ocfs2_super *osb)
>>>>>   new = ocfs2_get_system_file_inode(osb, i, osb->slot_num);
>>>>>   if (!new) {
>>>>>   ocfs2_release_system_inodes(osb);
>>>>> - status = -EINVAL;
>>>>> + if (ocfs2_is_soft_readonly(osb))
>>>> I'm afraid that having bad inode doesn't means ocfs2 is readonly.
>>>> And the calling application is mount.ocfs2. So do you mean mount.ocfs2
>>>> have to handle EROFS like printing corresponding error log?
>>>>
>>> I agree that 'bad inode' also means other abnormal cases like
>>> 'bad storage link' or 'no memory', but we can distinguish that by
>>> ocfs2_is_soft_readonly(). I found that 'mount.ocfs2' did not
>>> distinguish any error type and just return 1 for all error cases. I
>>> wonder if we should return the exact errno for users?
>>> Soft readonly is an in-memory status. The case you described is just
>> trying to read inode and then check if it is bad. So where to set the
>> status before?
>>
> we set readonly status in the following process:
> ocfs2_validate_inode_block()
>   ocfs2_error
> ocfs2_handle_error
>   ocfs2_set_ro_flag(osb, 0);
> 
> I have a suggestion that we could distinguish readonly status in
> 'mount.ocfs2', and return -EROFS to users so that they can fix it.
IC. Please update this information to patch description as well.
And suggest just use ternary operator instead of if/else.
BTW, so mount.ocfs2 should be updated correspondingly, right?

Thanks,
Joseph
>>> thanks,
>>> Jun
>>>
>>>>> + status = -EROFS;
>>>>> + else
>>>>> + status = -EINVAL;
>>>>>   mlog_errno(status);
>>>>>   /* FIXME: Should ERROR_RO_FS */
>>>>>   mlog(ML_ERROR, "Unable to load system inode %d, "
>>>>> @@ -505,7 +508,10 @@ static int ocfs2_init_local_system_inodes(struct 
>>>>> ocfs2_super *osb)
>>>>>   new = ocfs2_get_system_file_inode(osb, i, osb->slot_num);
>>>>>   if (!new) {
>>>>>   ocfs2_release_system_inodes(osb);
>>>>> - status = -EINVAL;
>>>>> + if (ocfs2_is_soft_readonly(osb))
>>>>> + status = -EROFS;
>>>>> + else
>>>>> + status = -EINVAL;
>>>>>   mlog(ML_ERROR, "status=%d, sysfile=%d, slot=%d\n",
>>>>>status, i, osb->slot_num);
>>>>>   goto bail;
>>>>>
>>>> .
>>>>
>> .
>>

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: return -EROFS to upper if inode block is invalid

2017-12-25 Thread Joseph Qi


On 17/12/26 13:35, piaojun wrote:
> Hi Joseph,
> 
> On 2017/12/26 11:05, Joseph Qi wrote:
>>
>>
>> On 17/12/26 10:11, piaojun wrote:
>>> If metadata is corrupted such as 'invalid inode block', we will get
>>> failed by calling 'mount()' as below:
>>>
>>> ocfs2_mount
>>>   ocfs2_initialize_super
>>> ocfs2_init_global_system_inodes : return -EINVAL if inode is NULL
>>>   ocfs2_get_system_file_inode
>>> _ocfs2_get_system_file_inode : return NULL if inode is errno
>> Do you mean inode is bad?
>>
> Here we have to face two abnormal cases:
> 1. inode is bad;
> 2. read inode from disk failed due to bad storage link.
>>>   ocfs2_iget
>>> ocfs2_read_locked_inode
>>>   ocfs2_validate_inode_block
>>>
>>> In this situation we need return -EROFS to upper application, so that
>>> user can fix it by fsck. And then mount again.
>>>
>>> Signed-off-by: Jun Piao <piao...@huawei.com>
>>> Reviewed-by: Alex Chen <alex.c...@huawei.com>
>>> ---
>>>  fs/ocfs2/super.c | 10 --
>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>>> index 040bbb6..dea21a7 100644
>>> --- a/fs/ocfs2/super.c
>>> +++ b/fs/ocfs2/super.c
>>> @@ -474,7 +474,10 @@ static int ocfs2_init_global_system_inodes(struct 
>>> ocfs2_super *osb)
>>> new = ocfs2_get_system_file_inode(osb, i, osb->slot_num);
>>> if (!new) {
>>> ocfs2_release_system_inodes(osb);
>>> -   status = -EINVAL;
>>> +   if (ocfs2_is_soft_readonly(osb))
>> I'm afraid that having bad inode doesn't means ocfs2 is readonly.
>> And the calling application is mount.ocfs2. So do you mean mount.ocfs2
>> have to handle EROFS like printing corresponding error log?
>>
> I agree that 'bad inode' also means other abnormal cases like
> 'bad storage link' or 'no memory', but we can distinguish that by
> ocfs2_is_soft_readonly(). I found that 'mount.ocfs2' did not
> distinguish any error type and just return 1 for all error cases. I
> wonder if we should return the exact errno for users?
> Soft readonly is an in-memory status. The case you described is just
trying to read inode and then check if it is bad. So where to set the
status before?

> thanks,
> Jun
> 
>>> +   status = -EROFS;
>>> +   else
>>> +   status = -EINVAL;
>>> mlog_errno(status);
>>> /* FIXME: Should ERROR_RO_FS */
>>> mlog(ML_ERROR, "Unable to load system inode %d, "
>>> @@ -505,7 +508,10 @@ static int ocfs2_init_local_system_inodes(struct 
>>> ocfs2_super *osb)
>>> new = ocfs2_get_system_file_inode(osb, i, osb->slot_num);
>>> if (!new) {
>>> ocfs2_release_system_inodes(osb);
>>> -   status = -EINVAL;
>>> +   if (ocfs2_is_soft_readonly(osb))
>>> +   status = -EROFS;
>>> +   else
>>> +   status = -EINVAL;
>>> mlog(ML_ERROR, "status=%d, sysfile=%d, slot=%d\n",
>>>  status, i, osb->slot_num);
>>> goto bail;
>>>
>> .
>>

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: return -EROFS to upper if inode block is invalid

2017-12-25 Thread Joseph Qi


On 17/12/26 10:11, piaojun wrote:
> If metadata is corrupted such as 'invalid inode block', we will get
> failed by calling 'mount()' as below:
> 
> ocfs2_mount
>   ocfs2_initialize_super
> ocfs2_init_global_system_inodes : return -EINVAL if inode is NULL
>   ocfs2_get_system_file_inode
> _ocfs2_get_system_file_inode : return NULL if inode is errno
Do you mean inode is bad?

>   ocfs2_iget
> ocfs2_read_locked_inode
>   ocfs2_validate_inode_block
> 
> In this situation we need return -EROFS to upper application, so that
> user can fix it by fsck. And then mount again.
> 
> Signed-off-by: Jun Piao 
> Reviewed-by: Alex Chen 
> ---
>  fs/ocfs2/super.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 040bbb6..dea21a7 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -474,7 +474,10 @@ static int ocfs2_init_global_system_inodes(struct 
> ocfs2_super *osb)
>   new = ocfs2_get_system_file_inode(osb, i, osb->slot_num);
>   if (!new) {
>   ocfs2_release_system_inodes(osb);
> - status = -EINVAL;
> + if (ocfs2_is_soft_readonly(osb))
I'm afraid that having bad inode doesn't means ocfs2 is readonly.
And the calling application is mount.ocfs2. So do you mean mount.ocfs2
have to handle EROFS like printing corresponding error log?

> + status = -EROFS;
> + else
> + status = -EINVAL;
>   mlog_errno(status);
>   /* FIXME: Should ERROR_RO_FS */
>   mlog(ML_ERROR, "Unable to load system inode %d, "
> @@ -505,7 +508,10 @@ static int ocfs2_init_local_system_inodes(struct 
> ocfs2_super *osb)
>   new = ocfs2_get_system_file_inode(osb, i, osb->slot_num);
>   if (!new) {
>   ocfs2_release_system_inodes(osb);
> - status = -EINVAL;
> + if (ocfs2_is_soft_readonly(osb))
> + status = -EROFS;
> + else
> + status = -EINVAL;
>   mlog(ML_ERROR, "status=%d, sysfile=%d, slot=%d\n",
>status, i, osb->slot_num);
>   goto bail;
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix a potential deadlock in dlm_reset_mleres_owner()

2017-12-21 Thread Joseph Qi


On 17/12/22 10:34, Changwei Ge wrote:
> On 2017/12/21 14:36, alex chen wrote:
>> Hi Joseph,
>>
>> On 2017/12/21 9:30, Joseph Qi wrote:
>>> Hi Alex,
>>>
>>> On 17/12/21 08:55, alex chen wrote:
>>>> In dlm_reset_mleres_owner(), we will lock
>>>> dlm_lock_resource->spinlock after locking dlm_ctxt->master_lock,
>>>> which breaks the spinlock lock ordering:
>>>>   dlm_domain_lock
>>>>   struct dlm_ctxt->spinlock
>>>>   struct dlm_lock_resource->spinlock
>>>>   struct dlm_ctxt->master_lock
>>>>
>>>> Fix it by unlocking dlm_ctxt->master_lock before locking
>>>> dlm_lock_resource->spinlock and restarting to clean master list.
>>>>
>>>> Signed-off-by: Alex Chen <alex.c...@huawei.com>
>>>> Reviewed-by: Jun Piao <piao...@huawei.com>
>>>> ---
>>>>   fs/ocfs2/dlm/dlmmaster.c | 14 ++
>>>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
>>>> index 3e04279..d83ccdc 100644
>>>> --- a/fs/ocfs2/dlm/dlmmaster.c
>>>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>>>> @@ -3287,16 +3287,22 @@ static struct dlm_lock_resource 
>>>> *dlm_reset_mleres_owner(struct dlm_ctxt *dlm,
>>>>   {
>>>>struct dlm_lock_resource *res;
>>>>
>>>> +  assert_spin_locked(>spinlock);
>>>> +  assert_spin_locked(>master_lock);
>>>> +
>>>>/* Find the lockres associated to the mle and set its owner to UNK */
>>>> -  res = __dlm_lookup_lockres(dlm, mle->mname, mle->mnamelen,
>>>> +  res = __dlm_lookup_lockres_full(dlm, mle->mname, mle->mnamelen,
>>>>   mle->mnamehash);
>>>>if (res) {
>>>>spin_unlock(>master_lock);
>>>>
>>>> -  /* move lockres onto recovery list */
>>>>spin_lock(>spinlock);
>>>> -  dlm_set_lockres_owner(dlm, res, DLM_LOCK_RES_OWNER_UNKNOWN);
>>>> -  dlm_move_lockres_to_recovery_list(dlm, res);
>>>> +  if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) {
>>>> +  /* move lockres onto recovery list */
>>>> +  dlm_set_lockres_owner(dlm, res, 
>>>> DLM_LOCK_RES_OWNER_UNKNOWN);
>>>> +  dlm_move_lockres_to_recovery_list(dlm, res);
>>>> +  }
>>>> +
>>> I don't think this change is lock re-ordering *only*. It definitely
>>> changes the logic of resetting mle resource owner.
>>> Why do you detach mle from heartbeat if lock resource is in the process
>>> of dropping its mastery reference? And why we have to restart in this
>>> case?
>> I think if the lock resource is being purge we don't need to set its owner 
>> to UNKNOWN and
>> it is the same as the original logic. We should drop the master lock if we 
>> want to judge
>> if the state of the lock resource is DLM_LOCK_RES_DROPPING_REF. Once we drop 
>> the master lock
>> we should restart to clean master list.
>> Here the mle is not useful and will be released, so we detach it from 
>> heartbeat.
>> In fact, the mle has been detached in dlm_clean_migration_mle().
> Hi Alex,
> 
> Perhaps, you can just judge if lock resource is marked  
> DLM_LOCK_RES_DROPPING_REF and if so directly
> return NULL with ::master_lock locked :)
>Umm... We can't do this with unlocking master lock first and then
re-taking it. It breaks the logic.
What my concern is the behavior change. e.g. Currently for lock resource
in the process of dropping mastery reference, we just ignore it and
continue with the next. But after this patch, we have to restart at the
beginning.

> Thanks,
> Changwei
> 
>>
>> Thanks,
>> Alex
>>>
>>> Thanks,
>>> Joseph
>>>
>>>>spin_unlock(>spinlock);
>>>>dlm_lockres_put(res);
>>>>
>>>
>>> .
>>>
>>
>>
>> ___
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix a potential deadlock in dlm_reset_mleres_owner()

2017-12-20 Thread Joseph Qi
Hi Alex,

On 17/12/21 08:55, alex chen wrote:
> In dlm_reset_mleres_owner(), we will lock
> dlm_lock_resource->spinlock after locking dlm_ctxt->master_lock,
> which breaks the spinlock lock ordering:
>  dlm_domain_lock
>  struct dlm_ctxt->spinlock
>  struct dlm_lock_resource->spinlock
>  struct dlm_ctxt->master_lock
> 
> Fix it by unlocking dlm_ctxt->master_lock before locking
> dlm_lock_resource->spinlock and restarting to clean master list.
> 
> Signed-off-by: Alex Chen 
> Reviewed-by: Jun Piao 
> ---
>  fs/ocfs2/dlm/dlmmaster.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 3e04279..d83ccdc 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -3287,16 +3287,22 @@ static struct dlm_lock_resource 
> *dlm_reset_mleres_owner(struct dlm_ctxt *dlm,
>  {
>   struct dlm_lock_resource *res;
> 
> + assert_spin_locked(>spinlock);
> + assert_spin_locked(>master_lock);
> +
>   /* Find the lockres associated to the mle and set its owner to UNK */
> - res = __dlm_lookup_lockres(dlm, mle->mname, mle->mnamelen,
> + res = __dlm_lookup_lockres_full(dlm, mle->mname, mle->mnamelen,
>  mle->mnamehash);
>   if (res) {
>   spin_unlock(>master_lock);
> 
> - /* move lockres onto recovery list */
>   spin_lock(>spinlock);
> - dlm_set_lockres_owner(dlm, res, DLM_LOCK_RES_OWNER_UNKNOWN);
> - dlm_move_lockres_to_recovery_list(dlm, res);
> + if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) {
> + /* move lockres onto recovery list */
> + dlm_set_lockres_owner(dlm, res, 
> DLM_LOCK_RES_OWNER_UNKNOWN);
> + dlm_move_lockres_to_recovery_list(dlm, res);
> + }
> +
I don't think this change is lock re-ordering *only*. It definitely
changes the logic of resetting mle resource owner.
Why do you detach mle from heartbeat if lock resource is in the process
of dropping its mastery reference? And why we have to restart in this
case?

Thanks,
Joseph

>   spin_unlock(>spinlock);
>   dlm_lockres_put(res);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing

2017-12-18 Thread Joseph Qi

On 17/12/19 05:53, Andrew Morton wrote:
> On Mon, 18 Dec 2017 12:06:21 + Changwei Ge  wrote:
> 
>> Before ocfs2 supporting allocating clusters while doing append-dio, all 
>> append
>> dio will fall back to buffer io to allocate clusters firstly. Also, when it
>> steps on a file hole, it will fall back to buffer io, too. But for current
>> code, writing to file hole will leverage dio to allocate clusters. This is 
>> not
>> right, since whether append-io is enabled tells the capability whether ocfs2 
>> can
>> allocate space while doing dio.
>> So introduce file hole check function back into ocfs2.
>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will 
>> fall
>> back to buffer IO to allocate clusters.
>>
> 
> hm, that's a bit hard to understand.  Hopefully reviewers will know
> what's going on ;)> 
Also suggest rearrange the description:)

>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>>  return ret;
>>   }
>>   
>> +/*
>> + * Will look for holes and unwritten extents in the range starting at
>> + * pos for count bytes (inclusive).
>> + */
>> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>> +   size_t count)
>> +{
>> +int ret = 0;
>> +unsigned int extent_flags;
>> +u32 cpos, clusters, extent_len, phys_cpos;
>> +struct super_block *sb = inode->i_sb;
>> +
>> +cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
>> +clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
>> +
>> +while (clusters) {
>> +ret = ocfs2_get_clusters(inode, cpos, _cpos, _len,
>> + _flags);
>> +if (ret < 0) {
>> +mlog_errno(ret);
>> +goto out;
>> +}
>> +
>> +if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
>> +ret = 1;
>> +break;
>> +}
>> +
>> +if (extent_len > clusters)
>> +extent_len = clusters;
>> +
>> +clusters -= extent_len;
>> +cpos += extent_len;
>> +}
>> +out:
>> +return ret;
>> +}
> 
> A few thoughts:
> 
> - a function which does "check_foo" isn't well named.  Because the
>   reader cannot determine whether the return value means "foo is true"
>   or "foo is false".
> 
>   So a better name for this function is ocfs2_range_has_holes(), so
>   the reader immediately understand what its return value *means".
> 
>   Also a bool return value is more logical.
> 
> - Mixing "goto out" with "break" as above is a bit odd.
> 
> So...
> 
> 
> --- 
> a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
> +++ a/fs/ocfs2/aops.c
> @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
>   * Will look for holes and unwritten extents in the range starting at
>   * pos for count bytes (inclusive).
>   */
> -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> -size_t count)
> +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t 
> count)
>  {
> - int ret = 0;
> + bool ret = false;

I have a different opinion here. If we change return value from int to
bool, the error returned by ocfs2_get_clusters cannot be reflected. So
I'd prefer the original version.

Thanks,
Joseph

>   unsigned int extent_flags;
>   u32 cpos, clusters, extent_len, phys_cpos;
>   struct super_block *sb = inode->i_sb;
> @@ -2489,8 +2488,8 @@ static int ocfs2_check_range_for_holes(s
>   }
>  
>   if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
> - ret = 1;
> - break;
> + ret = true;
> + goto out;
>   }
>  
>   if (extent_len > clusters)
> _
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing

2017-12-18 Thread Joseph Qi


On 17/12/19 09:27, Andrew Morton wrote:
> On Tue, 19 Dec 2017 09:24:17 +0800 Joseph Qi <jiangqi...@gmail.com> wrote:
> 
>>> --- 
>>> a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
>>> +++ a/fs/ocfs2/aops.c
>>> @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
>>>   * Will look for holes and unwritten extents in the range starting at
>>>   * pos for count bytes (inclusive).
>>>   */
>>> -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
>>> -  size_t count)
>>> +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t 
>>> count)
>>>  {
>>> -   int ret = 0;
>>> +   bool ret = false;
>>
>> I have a different opinion here. If we change return value from int to
>> bool, the error returned by ocfs2_get_clusters cannot be reflected. So
>> I'd prefer the original version.
> 
> But that error code is not used?
> 
IMO, since ocfs2_get_clusters will get io involved, the caller shouldn't
ignore its error.

Something like:
ret = ocfs2_check_range_for_holes();
if (ret < 0) {
mlog_errno(ret);
goto out;
}

Then check if append dio feature has been enabled as well as write range
and hole.

Thanks,
Joseph

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix a potential deadlock in dlm_reset_mleres_owner()

2017-12-18 Thread Joseph Qi


On 17/12/18 18:22, alex chen wrote:
> In dlm_reset_mleres_owner(), we will lock
> dlm_lock_resource->spinlock after locking dlm_ctxt->master_lock,
> which breaks the spinlock lock ordering:
>  dlm_domain_lock
>  struct dlm_ctxt->spinlock
>  struct dlm_lock_resource->spinlock
>  struct dlm_ctxt->master_lock
> 
> Fix it by unlocking dlm_ctxt->master_lock before locking
> dlm_lock_resource->spinlock and restarting to clean master list.
> 
> Signed-off-by: Alex Chen 
> Reviewed-by: Jun Piao 
> ---
>  fs/ocfs2/dlm/dlmmaster.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 3e04279..0df939a 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -3287,14 +3287,23 @@ static struct dlm_lock_resource 
> *dlm_reset_mleres_owner(struct dlm_ctxt *dlm,
>  {
>   struct dlm_lock_resource *res;
> 
> + assert_spin_locked(>spinlock);
> + assert_spin_locked(>master_lock);
> +
>   /* Find the lockres associated to the mle and set its owner to UNK */
> - res = __dlm_lookup_lockres(dlm, mle->mname, mle->mnamelen,
> + res = __dlm_lookup_lockres_full(dlm, mle->mname, mle->mnamelen,
>  mle->mnamehash);
>   if (res) {
>   spin_unlock(>master_lock);
> 
> - /* move lockres onto recovery list */
>   spin_lock(>spinlock);
> + if (res->state & DLM_LOCK_RES_DROPPING_REF) {
> + spin_unlock(>spinlock);
> + dlm_lockres_put(res);
> + return NULL;
> + }

We can't just return NULL here. Please note that we have to:
return a valid lock resource with master_lock unlocked, or return NULL
with master_lock.
You patch will introduce unlocking master_lock twice.

Thanks,
Joseph

> +
> + /* move lockres onto recovery list */
>   dlm_set_lockres_owner(dlm, res, DLM_LOCK_RES_OWNER_UNKNOWN);
>   dlm_move_lockres_to_recovery_list(dlm, res);
>   spin_unlock(>spinlock);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v3] ocfs2: clean dead code in suballoc.c

2017-12-12 Thread Joseph Qi


On 17/12/13 12:54, Changwei Ge wrote:
> Stack variable fe is no longer used, so trim it to save some CPU cycles
> and stack space.
> 
> Signed-off-by: Changwei Ge <ge.chang...@h3c.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> ---
>   fs/ocfs2/suballoc.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 71f22c8fbffd..508422b0032f 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -2566,16 +2566,16 @@ static int _ocfs2_free_clusters(handle_t *handle,
>   int status;
>   u16 bg_start_bit;
>   u64 bg_blkno;
> - struct ocfs2_dinode *fe;
>   
>   /* You can't ever have a contiguous set of clusters
>* bigger than a block group bitmap so we never have to worry
>* about looping on them.
>* This is expensive. We can safely remove once this stuff has
>* gotten tested really well. */
> - BUG_ON(start_blk != ocfs2_clusters_to_blocks(bitmap_inode->i_sb, 
> ocfs2_blocks_to_clusters(bitmap_inode->i_sb, start_blk)));
> + BUG_ON(start_blk != ocfs2_clusters_to_blocks(bitmap_inode->i_sb,
> + ocfs2_blocks_to_clusters(bitmap_inode->i_sb,
> +  start_blk)));
>   
> - fe = (struct ocfs2_dinode *) bitmap_bh->b_data;
>   
>   ocfs2_block_to_cluster_group(bitmap_inode, start_blk, _blkno,
>_start_bit);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2: clean dead code in suballoc.c

2017-12-12 Thread Joseph Qi


On 17/12/13 11:51, Changwei Ge wrote:
> Stack variable fe is no longer used, so trim it to save some cpu cycles
> and stack space.
> 
> Signed-off-by: Changwei Ge 
> ---
>   fs/ocfs2/suballoc.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 71f22c8fbffd..508422b0032f 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -2566,16 +2566,16 @@ static int _ocfs2_free_clusters(handle_t *handle,
>   int status;
>   u16 bg_start_bit;
>   u64 bg_blkno;
> - struct ocfs2_dinode *fe;
> 
>   /* You can't ever have a contiguous set of clusters
>* bigger than a block group bitmap so we never have to worry
>* about looping on them.
>* This is expensive. We can safely remove once this stuff has
>* gotten tested really well. */
> - BUG_ON(start_blk != ocfs2_clusters_to_blocks(bitmap_inode->i_sb, 
> ocfs2_blocks_to_clusters(bitmap_inode->i_sb, start_blk)));

Still malformed. You have to disable auto wrap length in your editor.

Thanks,
Joseph

> + BUG_ON(start_blk != ocfs2_clusters_to_blocks(bitmap_inode->i_sb,
> + ocfs2_blocks_to_clusters(bitmap_inode->i_sb,
> +  start_blk)));
> 
> - fe = (struct ocfs2_dinode *) bitmap_bh->b_data;
> 
>   ocfs2_block_to_cluster_group(bitmap_inode, start_blk, _blkno,
>_start_bit);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/cluster: clean up unused function declaration in heartbeat.h

2017-12-12 Thread Joseph Qi
These have already been cleaned up in commit
98d6c09ec2899a9a601b16ec7ae31d54e6b100b9.

On 17/12/13 11:19, Changwei Ge wrote:
> Signed-off-by: Changwei Ge 
> ---
>   fs/ocfs2/cluster/heartbeat.h | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.h b/fs/ocfs2/cluster/heartbeat.h
> index 3ef5137dc362..a9e67efc0004 100644
> --- a/fs/ocfs2/cluster/heartbeat.h
> +++ b/fs/ocfs2/cluster/heartbeat.h
> @@ -79,10 +79,8 @@ void o2hb_fill_node_map(unsigned long *map,
>   unsigned bytes);
>   void o2hb_exit(void);
>   int o2hb_init(void);
> -int o2hb_check_node_heartbeating(u8 node_num);
>   int o2hb_check_node_heartbeating_no_sem(u8 node_num);
>   int o2hb_check_node_heartbeating_from_callback(u8 node_num);
> -int o2hb_check_local_node_heartbeating(void);
>   void o2hb_stop_all_regions(void);
>   int o2hb_get_all_regions(char *region_uuids, u8 numregions);
>   int o2hb_global_heartbeat_active(void);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: clean dead code in suballoc.c

2017-12-12 Thread Joseph Qi


On 17/12/13 11:04, Changwei Ge wrote:
> Stack variable fe is no longer used, so trim it to save some cpu cycles
> and stack space.
> 
> Signed-off-by: Changwei Ge 
> ---
>   fs/ocfs2/suballoc.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 71f22c8fbffd..a74108d22d47 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -2566,7 +2566,6 @@ static int _ocfs2_free_clusters(handle_t *handle,
>   int status;
>   u16 bg_start_bit;
>   u64 bg_blkno;
> - struct ocfs2_dinode *fe;
> 
>   /* You can't ever have a contiguous set of clusters
>* bigger than a block group bitmap so we never have to worry
> @@ -2575,8 +2574,6 @@ static int _ocfs2_free_clusters(handle_t *handle,
>* gotten tested really well. */
>   BUG_ON(start_blk != ocfs2_clusters_to_blocks(bitmap_inode->i_sb, 
> ocfs2_blocks_to_clusters(bitmap_inode->i_sb, start_blk)));
> 
Here is malformed. Suggest explicitly split the long line into two as
well.

Thanks,
Joseph
> - fe = (struct ocfs2_dinode *) bitmap_bh->b_data;
> -
>   ocfs2_block_to_cluster_group(bitmap_inode, start_blk, _blkno,
>_start_bit);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/cluster: neaten a member of o2net_msg_handler

2017-12-05 Thread Joseph Qi


On 17/12/5 13:47, Changwei Ge wrote:
> It's odd that o2net_msg_handler::nh_func_data is declared as type
> o2net_msg_handler_func*.
> So neaten it.
> 
> Signed-off-by: Changwei Ge <ge.chang...@h3c.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> ---
>   fs/ocfs2/cluster/tcp_internal.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/cluster/tcp_internal.h 
> b/fs/ocfs2/cluster/tcp_internal.h
> index b95e7df5b76a..0276f7f8d5e6 100644
> --- a/fs/ocfs2/cluster/tcp_internal.h
> +++ b/fs/ocfs2/cluster/tcp_internal.h
> @@ -196,7 +196,7 @@ struct o2net_msg_handler {
>   u32 nh_msg_type;
>   u32 nh_key;
>   o2net_msg_handler_func  *nh_func;
> - o2net_msg_handler_func  *nh_func_data;
> + void*nh_func_data;
>   o2net_post_msg_handler_func
>   *nh_post_func;
>   struct kref nh_kref;
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2: check the metadate alloc before marking extent written

2017-12-04 Thread Joseph Qi


On 17/12/5 11:29, alex chen wrote:
> We need to check the free number of the records in each loop to mark
> extent written, because the last extent block may be changed through
> many times marking extent written and the 'num_free_extents' also be
> changed. In the worst case, the 'num_free_extents' may become less
> than the beginning of the loop. So we should not estimate the free
> number of the records at the beginning of the loop to mark extent
> written.
> 
> Reported-by: John Lightsey <j...@nixnuts.net>
> Signed-off-by: Alex Chen <alex.c...@huawei.com>
> Reviewed-by: Jun Piao <piao...@huawei.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> ---
>  fs/ocfs2/aops.c | 77 
> +++--
>  1 file changed, 64 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index d151632..7e1659d 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2272,6 +2272,35 @@ static int ocfs2_dio_wr_get_block(struct inode *inode, 
> sector_t iblock,
>   return ret;
>  }
> 
> +static int ocfs2_dio_should_restart(struct ocfs2_extent_tree *et,
> + struct ocfs2_alloc_context *meta_ac, int max_rec_needed)
> +{
> + int status = 0, free_extents;
> +
> + free_extents = ocfs2_num_free_extents(et);
> + if (free_extents < 0) {
> + status = free_extents;
> + mlog_errno(status);
> + return status;
> + }
> +
> + /*
> +  * there are two cases which could cause us to EAGAIN in the
> +  * we-need-more-metadata case:
> +  * 1) we haven't reserved *any*
> +  * 2) we are so fragmented, we've needed to add metadata too
> +  *many times.
> +  */
> + if (free_extents < max_rec_needed) {
> + if (!meta_ac || (ocfs2_alloc_context_bits_left(meta_ac)
> + < ocfs2_extend_meta_needed(et->et_root_el)))
> + status = 1;
> + }
> +
> + return status;
> +}
> +
> +#define OCFS2_MAX_REC_NEEDED_SPLIT 2
>  static int ocfs2_dio_end_io_write(struct inode *inode,
> struct ocfs2_dio_write_ctxt *dwc,
> loff_t offset,
> @@ -2281,14 +2310,13 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>   struct ocfs2_extent_tree et;
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>   struct ocfs2_inode_info *oi = OCFS2_I(inode);
> - struct ocfs2_unwritten_extent *ue = NULL;
> + struct ocfs2_unwritten_extent *ue = NULL, *tmp_ue;
>   struct buffer_head *di_bh = NULL;
>   struct ocfs2_dinode *di;
> - struct ocfs2_alloc_context *data_ac = NULL;
>   struct ocfs2_alloc_context *meta_ac = NULL;
>   handle_t *handle = NULL;
>   loff_t end = offset + bytes;
> - int ret = 0, credits = 0, locked = 0;
> + int ret = 0, credits = 0, locked = 0, restart = 0;
> 
>   ocfs2_init_dealloc_ctxt();
> 
> @@ -2328,10 +2356,10 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> 
>   di = (struct ocfs2_dinode *)di_bh->b_data;
> 
> +restart_all:
>   ocfs2_init_dinode_extent_tree(, INODE_CACHE(inode), di_bh);
> -
> - ret = ocfs2_lock_allocators(inode, , 0, dwc->dw_zero_count*2,
> - _ac, _ac);
> + ret = ocfs2_lock_allocators(inode, , 0, OCFS2_MAX_REC_NEEDED_SPLIT,
> + NULL, _ac);
>   if (ret) {
>   mlog_errno(ret);
>   goto unlock;
> @@ -2343,7 +2371,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>   if (IS_ERR(handle)) {
>   ret = PTR_ERR(handle);
>   mlog_errno(ret);
> - goto unlock;
> + goto free_ac;
>   }
>   ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> OCFS2_JOURNAL_ACCESS_WRITE);
> @@ -2352,7 +2380,17 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>   goto commit;
>   }
> 
> - list_for_each_entry(ue, >dw_zero_list, ue_node) {
> + list_for_each_entry_safe(ue, tmp_ue, >dw_zero_list, ue_node) {
> + ret = ocfs2_dio_should_restart(, meta_ac,
> + OCFS2_MAX_REC_NEEDED_SPLIT * 2);
> + if (ret < 0) {
> + mlog_errno(ret);
> + break;
> + } else if (ret == 1) {
> + restart = 1;
> + break;
> + }
> +
>   ret = ocfs2_mark_extent_written(inode, , handle,
> 

Re: [Ocfs2-devel] [PATCH] ocfs2: check the metadate alloc before marking extent written

2017-12-04 Thread Joseph Qi


On 17/12/5 11:09, alex chen wrote:
> 
> 
> On 2017/12/5 10:45, Joseph Qi wrote:
>>
>>
>> On 17/12/5 10:36, alex chen wrote:
>>> Hi Joseph,
>>>
>>> Thanks for your reviews.
>>>
>>> On 2017/12/5 10:21, Joseph Qi wrote:
>>>>
>>>>
>>>> On 17/12/5 09:41, alex chen wrote:
>>>>> We need to check the free number of the records in each loop to mark
>>>>> extent written, because the last extent block may be changed through
>>>>> many times marking extent written and the 'num_free_extents' also be
>>>>> changed. In the worst case, the 'num_free_extents' may become less
>>>>> than the beginning of the loop. So we should not estimate the free
>>>>> number of the records at the beginning of the loop to mark extent
>>>>> written.
>>>>>
>>>>> Reported-by: John Lightsey <j...@nixnuts.net>
>>>>> Signed-off-by: Alex Chen <alex.c...@huawei.com>
>>>>> Reviewed-by: Jun Piao <piao...@huawei.com>
>>>>> ---
>>>>>  fs/ocfs2/aops.c | 76 
>>>>> -
>>>>>  1 file changed, 64 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>> index d151632..702aebc 100644
>>>>> --- a/fs/ocfs2/aops.c
>>>>> +++ b/fs/ocfs2/aops.c
>>>>> @@ -2272,6 +2272,35 @@ static int ocfs2_dio_wr_get_block(struct inode 
>>>>> *inode, sector_t iblock,
>>>>>   return ret;
>>>>>  }
>>>>>
>>>>> +static int ocfs2_dio_should_restart(struct ocfs2_extent_tree *et,
>>>>> + struct ocfs2_alloc_context *meta_ac, int max_rec_needed)
>>>>> +{
>>>>> + int status = 0, free_extents;
>>>>> +
>>>>> + free_extents = ocfs2_num_free_extents(et);
>>>>> + if (free_extents < 0) {
>>>>> + status = free_extents;
>>>>> + mlog_errno(status);
>>>>> + return status;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> +  * there are two cases which could cause us to EAGAIN in the
>>>>> +  * we-need-more-metadata case:
>>>>> +  * 1) we haven't reserved *any*
>>>>> +  * 2) we are so fragmented, we've needed to add metadata too
>>>>> +  *many times.
>>>>> +  */
>>>>> + if (free_extents < max_rec_needed) {
>>>>> + if (!meta_ac || (ocfs2_alloc_context_bits_left(meta_ac)
>>>>> + < ocfs2_extend_meta_needed(et->et_root_el)))
>>>>> + status = 1;
>>>>> + }
>>>>> +
>>>>> + return status;
>>>>> +}
>>>>> +
>>>>> +#define OCFS2_MAX_REC_NEEDED_SPLIT 2
>>>>>  static int ocfs2_dio_end_io_write(struct inode *inode,
>>>>> struct ocfs2_dio_write_ctxt *dwc,
>>>>> loff_t offset,
>>>>> @@ -2281,14 +2310,13 @@ static int ocfs2_dio_end_io_write(struct inode 
>>>>> *inode,
>>>>>   struct ocfs2_extent_tree et;
>>>>>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>   struct ocfs2_inode_info *oi = OCFS2_I(inode);
>>>>> - struct ocfs2_unwritten_extent *ue = NULL;
>>>>> + struct ocfs2_unwritten_extent *ue = NULL, *tmp_ue;
>>>>>   struct buffer_head *di_bh = NULL;
>>>>>   struct ocfs2_dinode *di;
>>>>> - struct ocfs2_alloc_context *data_ac = NULL;
>>>>>   struct ocfs2_alloc_context *meta_ac = NULL;
>>>>>   handle_t *handle = NULL;
>>>>>   loff_t end = offset + bytes;
>>>>> - int ret = 0, credits = 0, locked = 0;
>>>>> + int ret = 0, credits = 0, locked = 0, restart = 0;
>>>>>
>>>>>   ocfs2_init_dealloc_ctxt();
>>>>>
>>>>> @@ -2330,8 +2358,9 @@ static int ocfs2_dio_end_io_write(struct inode 
>>>>> *inode,
>>>>>
>>>> Should we restart here?
>>>
>>> OK. we should restart before initialized the inode extent tree.
>>>
>>>>
>>>>>   ocfs2_init_dinode_extent_tree(, INODE_CACHE(inode), di_bh);
>>>>>
>>>>> - ret = ocfs2_lock_allocator

Re: [Ocfs2-devel] [PATCH] ocfs2: check the metadate alloc before marking extent written

2017-12-04 Thread Joseph Qi


On 17/12/5 10:36, alex chen wrote:
> Hi Joseph,
> 
> Thanks for your reviews.
> 
> On 2017/12/5 10:21, Joseph Qi wrote:
>>
>>
>> On 17/12/5 09:41, alex chen wrote:
>>> We need to check the free number of the records in each loop to mark
>>> extent written, because the last extent block may be changed through
>>> many times marking extent written and the 'num_free_extents' also be
>>> changed. In the worst case, the 'num_free_extents' may become less
>>> than the beginning of the loop. So we should not estimate the free
>>> number of the records at the beginning of the loop to mark extent
>>> written.
>>>
>>> Reported-by: John Lightsey <j...@nixnuts.net>
>>> Signed-off-by: Alex Chen <alex.c...@huawei.com>
>>> Reviewed-by: Jun Piao <piao...@huawei.com>
>>> ---
>>>  fs/ocfs2/aops.c | 76 
>>> -
>>>  1 file changed, 64 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>> index d151632..702aebc 100644
>>> --- a/fs/ocfs2/aops.c
>>> +++ b/fs/ocfs2/aops.c
>>> @@ -2272,6 +2272,35 @@ static int ocfs2_dio_wr_get_block(struct inode 
>>> *inode, sector_t iblock,
>>> return ret;
>>>  }
>>>
>>> +static int ocfs2_dio_should_restart(struct ocfs2_extent_tree *et,
>>> +   struct ocfs2_alloc_context *meta_ac, int max_rec_needed)
>>> +{
>>> +   int status = 0, free_extents;
>>> +
>>> +   free_extents = ocfs2_num_free_extents(et);
>>> +   if (free_extents < 0) {
>>> +   status = free_extents;
>>> +   mlog_errno(status);
>>> +   return status;
>>> +   }
>>> +
>>> +   /*
>>> +* there are two cases which could cause us to EAGAIN in the
>>> +* we-need-more-metadata case:
>>> +* 1) we haven't reserved *any*
>>> +* 2) we are so fragmented, we've needed to add metadata too
>>> +*many times.
>>> +*/
>>> +   if (free_extents < max_rec_needed) {
>>> +   if (!meta_ac || (ocfs2_alloc_context_bits_left(meta_ac)
>>> +   < ocfs2_extend_meta_needed(et->et_root_el)))
>>> +   status = 1;
>>> +   }
>>> +
>>> +   return status;
>>> +}
>>> +
>>> +#define OCFS2_MAX_REC_NEEDED_SPLIT 2
>>>  static int ocfs2_dio_end_io_write(struct inode *inode,
>>>   struct ocfs2_dio_write_ctxt *dwc,
>>>   loff_t offset,
>>> @@ -2281,14 +2310,13 @@ static int ocfs2_dio_end_io_write(struct inode 
>>> *inode,
>>> struct ocfs2_extent_tree et;
>>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> struct ocfs2_inode_info *oi = OCFS2_I(inode);
>>> -   struct ocfs2_unwritten_extent *ue = NULL;
>>> +   struct ocfs2_unwritten_extent *ue = NULL, *tmp_ue;
>>> struct buffer_head *di_bh = NULL;
>>> struct ocfs2_dinode *di;
>>> -   struct ocfs2_alloc_context *data_ac = NULL;
>>> struct ocfs2_alloc_context *meta_ac = NULL;
>>> handle_t *handle = NULL;
>>> loff_t end = offset + bytes;
>>> -   int ret = 0, credits = 0, locked = 0;
>>> +   int ret = 0, credits = 0, locked = 0, restart = 0;
>>>
>>> ocfs2_init_dealloc_ctxt();
>>>
>>> @@ -2330,8 +2358,9 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>>
>> Should we restart here?
> 
> OK. we should restart before initialized the inode extent tree.
> 
>>
>>> ocfs2_init_dinode_extent_tree(, INODE_CACHE(inode), di_bh);
>>>
>>> -   ret = ocfs2_lock_allocators(inode, , 0, dwc->dw_zero_count*2,
>>> -   _ac, _ac);
>>> +restart_all:
>>> +   ret = ocfs2_lock_allocators(inode, , 0, OCFS2_MAX_REC_NEEDED_SPLIT,
>>> +   NULL, _ac);
>>> if (ret) {
>>> mlog_errno(ret);
>>> goto unlock;
>>> @@ -2343,7 +2372,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>>> if (IS_ERR(handle)) {
>>> ret = PTR_ERR(handle);
>>> mlog_errno(ret);
>>> -   goto unlock;
>>> +   goto free_ac;
>> I don't think it is a necessary change here.
> 
> we need to free the 'meta_ac' which a

Re: [Ocfs2-devel] [PATCH] ocfs2: check the metadate alloc before marking extent written

2017-12-04 Thread Joseph Qi


On 17/12/5 09:41, alex chen wrote:
> We need to check the free number of the records in each loop to mark
> extent written, because the last extent block may be changed through
> many times marking extent written and the 'num_free_extents' also be
> changed. In the worst case, the 'num_free_extents' may become less
> than the beginning of the loop. So we should not estimate the free
> number of the records at the beginning of the loop to mark extent
> written.
> 
> Reported-by: John Lightsey 
> Signed-off-by: Alex Chen 
> Reviewed-by: Jun Piao 
> ---
>  fs/ocfs2/aops.c | 76 
> -
>  1 file changed, 64 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index d151632..702aebc 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2272,6 +2272,35 @@ static int ocfs2_dio_wr_get_block(struct inode *inode, 
> sector_t iblock,
>   return ret;
>  }
> 
> +static int ocfs2_dio_should_restart(struct ocfs2_extent_tree *et,
> + struct ocfs2_alloc_context *meta_ac, int max_rec_needed)
> +{
> + int status = 0, free_extents;
> +
> + free_extents = ocfs2_num_free_extents(et);
> + if (free_extents < 0) {
> + status = free_extents;
> + mlog_errno(status);
> + return status;
> + }
> +
> + /*
> +  * there are two cases which could cause us to EAGAIN in the
> +  * we-need-more-metadata case:
> +  * 1) we haven't reserved *any*
> +  * 2) we are so fragmented, we've needed to add metadata too
> +  *many times.
> +  */
> + if (free_extents < max_rec_needed) {
> + if (!meta_ac || (ocfs2_alloc_context_bits_left(meta_ac)
> + < ocfs2_extend_meta_needed(et->et_root_el)))
> + status = 1;
> + }
> +
> + return status;
> +}
> +
> +#define OCFS2_MAX_REC_NEEDED_SPLIT 2
>  static int ocfs2_dio_end_io_write(struct inode *inode,
> struct ocfs2_dio_write_ctxt *dwc,
> loff_t offset,
> @@ -2281,14 +2310,13 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>   struct ocfs2_extent_tree et;
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>   struct ocfs2_inode_info *oi = OCFS2_I(inode);
> - struct ocfs2_unwritten_extent *ue = NULL;
> + struct ocfs2_unwritten_extent *ue = NULL, *tmp_ue;
>   struct buffer_head *di_bh = NULL;
>   struct ocfs2_dinode *di;
> - struct ocfs2_alloc_context *data_ac = NULL;
>   struct ocfs2_alloc_context *meta_ac = NULL;
>   handle_t *handle = NULL;
>   loff_t end = offset + bytes;
> - int ret = 0, credits = 0, locked = 0;
> + int ret = 0, credits = 0, locked = 0, restart = 0;
> 
>   ocfs2_init_dealloc_ctxt();
> 
> @@ -2330,8 +2358,9 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
> 
Should we restart here?

>   ocfs2_init_dinode_extent_tree(, INODE_CACHE(inode), di_bh);
> 
> - ret = ocfs2_lock_allocators(inode, , 0, dwc->dw_zero_count*2,
> - _ac, _ac);
> +restart_all:
> + ret = ocfs2_lock_allocators(inode, , 0, OCFS2_MAX_REC_NEEDED_SPLIT,
> + NULL, _ac);
>   if (ret) {
>   mlog_errno(ret);
>   goto unlock;
> @@ -2343,7 +2372,7 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>   if (IS_ERR(handle)) {
>   ret = PTR_ERR(handle);
>   mlog_errno(ret);
> - goto unlock;
> + goto free_ac;
I don't think it is a necessary change here.

>   }
>   ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh,
> OCFS2_JOURNAL_ACCESS_WRITE);
> @@ -2352,7 +2381,17 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>   goto commit;
>   }
> 
> - list_for_each_entry(ue, >dw_zero_list, ue_node) {
> + list_for_each_entry_safe(ue, tmp_ue, >dw_zero_list, ue_node) {
> + ret = ocfs2_dio_should_restart(, meta_ac,
> + OCFS2_MAX_REC_NEEDED_SPLIT * 2);
> + if (ret < 0) {
> + mlog_errno(ret);
> + break;
> + } else if (ret == 1) {
> + restart = 1;
> + break;
> + }
> +
>   ret = ocfs2_mark_extent_written(inode, , handle,
>   ue->ue_cpos, 1,
>   ue->ue_phys,
> @@ -2361,24 +2400,37 @@ static int ocfs2_dio_end_io_write(struct inode *inode,
>   mlog_errno(ret);
>   break;
>   }
> +
> + dwc->dw_zero_count--;
> + list_del_init(>ue_node);
> + spin_lock(>ip_lock);
> + list_del_init(>ue_ip_node);
> + 

Re: [Ocfs2-devel] [patch 02/11] ocfs2: give an obvious tip for mismatched cluster names

2017-11-30 Thread Joseph Qi
Acked-by: Joseph Qi <jiangqi...@gmail.com>

On 17/12/1 06:24, a...@linux-foundation.org wrote:
> From: Gang He <g...@suse.com>
> Subject: ocfs2: give an obvious tip for mismatched cluster names
> 
> Add an obvious error message, due to mismatched cluster names between
> on-disk and in the current cluster.  We can meet this case during OCFS2
> cluster migration.
> 
> If we can give the user an obvious tip for why they can not mount the file
> system after migration, they can quickly fix this mismatch problem. 
> 
> Second, also move printing ocfs2_fill_super() errno to the front of
> ocfs2_dismount_volume(), since ocfs2_dismount_volume() will also print
> its own message.
> 
> I looked through all the code of OCFS2 (include o2cb); there is not any
> place which returns this error.  In fact, the function calling path
> ocfs2_fill_super -> ocfs2_mount_volume -> ocfs2_dlm_init ->
> dlm_new_lockspace is a very specific one.  We can use this errno to give
> the user a more clear tip, since this case is a little common during
> cluster migration, but the customer can quickly get the failure cause if
> there is a error printed.  Also, I think it is not possible to add this
> errno in the o2cb path during ocfs2_dlm_init(), since the o2cb code has
> been stable for a long time.  
> 
> We only print this error tip when the user uses pcmk stack, since using
> the o2cb stack the user will not meet this error.
> 
> [g...@suse.com: v2]
>   Link: 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_1495419305-2D3780-2D1-2Dgit-2Dsend-2Demail-2Dghe-40suse.com=DwICaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=C7gAd4uDxlAvTdc0vmU6X8CMk6L2iDY8-HD0qT6Fo7Y=NM9dBrqizxz_iRGR3xJBDWe3ZXjRmdIFbsuME7pct2E=_ZX6R3EJr9aqEZxYr8VosRFGO6DGM6ijCMJJtUn-vkU=
> Link: 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_1495089336-2D19312-2D1-2Dgit-2Dsend-2Demail-2Dghe-40suse.com=DwICaQ=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE=C7gAd4uDxlAvTdc0vmU6X8CMk6L2iDY8-HD0qT6Fo7Y=NM9dBrqizxz_iRGR3xJBDWe3ZXjRmdIFbsuME7pct2E=RPH6JpO0mNpIPG31fCVyRNShmF6JUvnvGkN61uUU9ag=
> Signed-off-by: Gang He <g...@suse.com>
> Reviewed-by: Mark Fasheh <mfas...@versity.com>
> Cc: Joel Becker <jl...@evilplan.org>
> Cc: Junxiao Bi <junxiao...@oracle.com>
> Cc: Joseph Qi <jiangqi...@gmail.com>
> Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> ---
> 
>  fs/ocfs2/super.c |8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff -puN 
> fs/ocfs2/super.c~ocfs2-give-an-obvious-tip-for-dismatch-cluster-names 
> fs/ocfs2/super.c
> --- a/fs/ocfs2/super.c~ocfs2-give-an-obvious-tip-for-dismatch-cluster-names
> +++ a/fs/ocfs2/super.c
> @@ -1208,14 +1208,15 @@ static int ocfs2_fill_super(struct super
>  read_super_error:
>   brelse(bh);
>  
> + if (status)
> + mlog_errno(status);
> +
>   if (osb) {
>   atomic_set(>vol_state, VOLUME_DISABLED);
>   wake_up(>osb_mount_event);
>   ocfs2_dismount_volume(sb, 1);
>   }
>  
> - if (status)
> - mlog_errno(status);
>   return status;
>  }
>  
> @@ -1843,6 +1844,9 @@ static int ocfs2_mount_volume(struct sup
>   status = ocfs2_dlm_init(osb);
>   if (status < 0) {
>   mlog_errno(status);
> + if (status == -EBADR && ocfs2_userspace_stack(osb))
> + mlog(ML_ERROR, "couldn't mount because cluster name on"
> + " disk does not match the running cluster name.\n");
>   goto leave;
>   }
>  
> _
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function

2017-11-28 Thread Joseph Qi


On 17/11/28 16:54, Gang He wrote:
> Hi Joseph,
> 
> 

> 
>>
>> On 17/11/28 15:24, Gang He wrote:
>>> Hello Joseph,
>>>
>>>
>>
>>>

 On 17/11/28 11:35, Gang He wrote:
> Hello Joseph,
>
>

>> Hi Gang,
>>
>> On 17/11/27 17:46, Gang He wrote:
>>> Add ocfs2_overwrite_io function, which is used to judge if
>>> overwrite allocated blocks, otherwise, the write will bring extra
>>> block allocation overhead.
>>>
>>> Signed-off-by: Gang He 
>>> ---
>>>  fs/ocfs2/extent_map.c | 67 
>> +++
>>>  fs/ocfs2/extent_map.h |  3 +++
>>>  2 files changed, 70 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>> index e4719e0..98bf325 100644
>>> --- a/fs/ocfs2/extent_map.c
>>> +++ b/fs/ocfs2/extent_map.c
>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
>> fiemap_extent_info *fieinfo,
>>> return ret;
>>>  }
>>>  
>>> +/* Is IO overwriting allocated blocks? */
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +  int wait)
>>> +{
>>> +   int ret = 0, is_last;
>>> +   u32 mapping_end, cpos;
>>> +   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> +   struct buffer_head *di_bh = NULL;
>>> +   struct ocfs2_extent_rec rec;
>>> +
>>> +   if (wait)
>>> +   ret = ocfs2_inode_lock(inode, _bh, 0);
>>> +   else
>>> +   ret = ocfs2_try_inode_lock(inode, _bh, 0);
>>> +   if (ret)
>>> +   goto out;
>>> +
>>> +   if (wait)
>>> +   down_read(_I(inode)->ip_alloc_sem);
>>> +   else {
>>> +   if (!down_read_trylock(_I(inode)->ip_alloc_sem)) {
>>> +   ret = -EAGAIN;
>>> +   goto out_unlock1;
>>> +   }
>>> +   }
>>> +
>>> +   if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>> +  ((map_start + map_len) <= i_size_read(inode)))
>>> +   goto out_unlock2;
>>> +
>>> +   cpos = map_start >> osb->s_clustersize_bits;
>>> +   mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>> +  map_start + map_len);
>>> +   is_last = 0;
>>> +   while (cpos < mapping_end && !is_last) {
>>> +   ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>> +NULL, , _last);
>>> +   if (ret) {
>>> +   mlog_errno(ret);
>>> +   goto out_unlock2;
>>> +   }
>>> +
>>> +   if (rec.e_blkno == 0ULL)
>>> +   break;
>>> +
>>> +   if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>> +   break;
>>> +
>>> +   cpos = le32_to_cpu(rec.e_cpos) +
>>> +   le16_to_cpu(rec.e_leaf_clusters);
>>> +   }
>>> +
>>> +   if (cpos < mapping_end)
>>> +   ret = 1;
>>> +
>>> +out_unlock2:
>>> +   brelse(di_bh);
>>> +
>>> +   up_read(_I(inode)->ip_alloc_sem);
>>> +
>>> +out_unlock1:
>> Should brelse(di_bh) be here?
> If the code jumps to out_unlock1 directly, the di_bh pointer should be 
> NULL, 
>>
 it is not necessary to release.
>
 Umm... No, once going out here, we have already taken inode lock. So
 di_bh should be released.
>>> Sorry, you are right.
>>>

>>
>>> +   ocfs2_inode_unlock(inode, 0);
>>> +
>>> +out:
>>> +   return (ret ? 0 : 1);
>> I don't think EAGAIN and other error code can be handled the same. We
>> have to distinguish them.
> Ok, I think we can add one line log to report the error in case the error 
> is 
>>
 not EAGAIN. 
>
 My point is, there is no need to try again in several cases, e.g. EROFS
 returned by ocfs2_get_clusters_nocache.
>>> In this function ocfs2_overwrite_io() only can return True(1) or False(0), 
>> then I think we can only give a error print before return true/false.
>>> It is not necessary to return another value, but should let the user know 
>> any possible error message.
>>> This is because you just ignore the error and convert it to 0 or 1.
>> But in your next patch, if !ocfs2_overwrite_io(), it will return EGAIN
>> to upper layer and let it try again.
>> But in some cases, e.g. EROFS, trying again is meaningless. That's why
>> we can't simply return 0 or 1 here. Also we have to distinguish the
>> error code in the next patch.
> I think that we have to use the return value if we want to propagate the 
> errorno to the above.
> I will change the 

Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function

2017-11-28 Thread Joseph Qi


On 17/11/28 15:24, Gang He wrote:
> Hello Joseph,
> 
> 

> 
>>
>> On 17/11/28 11:35, Gang He wrote:
>>> Hello Joseph,
>>>
>>>
>>
 Hi Gang,

 On 17/11/27 17:46, Gang He wrote:
> Add ocfs2_overwrite_io function, which is used to judge if
> overwrite allocated blocks, otherwise, the write will bring extra
> block allocation overhead.
>
> Signed-off-by: Gang He 
> ---
>  fs/ocfs2/extent_map.c | 67 
 +++
>  fs/ocfs2/extent_map.h |  3 +++
>  2 files changed, 70 insertions(+)
>
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index e4719e0..98bf325 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
 fiemap_extent_info *fieinfo,
>   return ret;
>  }
>  
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +int wait)
> +{
> + int ret = 0, is_last;
> + u32 mapping_end, cpos;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> + struct buffer_head *di_bh = NULL;
> + struct ocfs2_extent_rec rec;
> +
> + if (wait)
> + ret = ocfs2_inode_lock(inode, _bh, 0);
> + else
> + ret = ocfs2_try_inode_lock(inode, _bh, 0);
> + if (ret)
> + goto out;
> +
> + if (wait)
> + down_read(_I(inode)->ip_alloc_sem);
> + else {
> + if (!down_read_trylock(_I(inode)->ip_alloc_sem)) {
> + ret = -EAGAIN;
> + goto out_unlock1;
> + }
> + }
> +
> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> +((map_start + map_len) <= i_size_read(inode)))
> + goto out_unlock2;
> +
> + cpos = map_start >> osb->s_clustersize_bits;
> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> +map_start + map_len);
> + is_last = 0;
> + while (cpos < mapping_end && !is_last) {
> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> +  NULL, , _last);
> + if (ret) {
> + mlog_errno(ret);
> + goto out_unlock2;
> + }
> +
> + if (rec.e_blkno == 0ULL)
> + break;
> +
> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> + break;
> +
> + cpos = le32_to_cpu(rec.e_cpos) +
> + le16_to_cpu(rec.e_leaf_clusters);
> + }
> +
> + if (cpos < mapping_end)
> + ret = 1;
> +
> +out_unlock2:
> + brelse(di_bh);
> +
> + up_read(_I(inode)->ip_alloc_sem);
> +
> +out_unlock1:
 Should brelse(di_bh) be here?
>>> If the code jumps to out_unlock1 directly, the di_bh pointer should be 
>>> NULL, 
>> it is not necessary to release.
>>>
>> Umm... No, once going out here, we have already taken inode lock. So
>> di_bh should be released.
> Sorry, you are right.
> 
>>

> + ocfs2_inode_unlock(inode, 0);
> +
> +out:
> + return (ret ? 0 : 1);
 I don't think EAGAIN and other error code can be handled the same. We
 have to distinguish them.
>>> Ok, I think we can add one line log to report the error in case the error 
>>> is 
>> not EAGAIN. 
>>>
>> My point is, there is no need to try again in several cases, e.g. EROFS
>> returned by ocfs2_get_clusters_nocache.
> In this function ocfs2_overwrite_io() only can return True(1) or False(0), 
> then I think we can only give a error print before return true/false.
> It is not necessary to return another value, but should let the user know any 
> possible error message.
>This is because you just ignore the error and convert it to 0 or 1.
But in your next patch, if !ocfs2_overwrite_io(), it will return EGAIN
to upper layer and let it try again.
But in some cases, e.g. EROFS, trying again is meaningless. That's why
we can't simply return 0 or 1 here. Also we have to distinguish the
error code in the next patch.

> Thanks
> Gang 
> 
>>

 Thanks,
 Joseph

> +}
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
 whence)
>  {
>   struct inode *inode = file->f_mapping->host;
> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
> index 67ea57d..fd9e86a 100644
> --- a/fs/ocfs2/extent_map.h
> +++ b/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, 
> u64 
 v_blkno, u64 *p_blkno,
>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>u64 map_start, u64 map_len);
>  
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> + 

Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function

2017-11-27 Thread Joseph Qi


On 17/11/28 11:35, Gang He wrote:
> Hello Joseph,
> 
> 

>> Hi Gang,
>>
>> On 17/11/27 17:46, Gang He wrote:
>>> Add ocfs2_overwrite_io function, which is used to judge if
>>> overwrite allocated blocks, otherwise, the write will bring extra
>>> block allocation overhead.
>>>
>>> Signed-off-by: Gang He 
>>> ---
>>>  fs/ocfs2/extent_map.c | 67 
>> +++
>>>  fs/ocfs2/extent_map.h |  3 +++
>>>  2 files changed, 70 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>> index e4719e0..98bf325 100644
>>> --- a/fs/ocfs2/extent_map.c
>>> +++ b/fs/ocfs2/extent_map.c
>>> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
>> fiemap_extent_info *fieinfo,
>>> return ret;
>>>  }
>>>  
>>> +/* Is IO overwriting allocated blocks? */
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +  int wait)
>>> +{
>>> +   int ret = 0, is_last;
>>> +   u32 mapping_end, cpos;
>>> +   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>> +   struct buffer_head *di_bh = NULL;
>>> +   struct ocfs2_extent_rec rec;
>>> +
>>> +   if (wait)
>>> +   ret = ocfs2_inode_lock(inode, _bh, 0);
>>> +   else
>>> +   ret = ocfs2_try_inode_lock(inode, _bh, 0);
>>> +   if (ret)
>>> +   goto out;
>>> +
>>> +   if (wait)
>>> +   down_read(_I(inode)->ip_alloc_sem);
>>> +   else {
>>> +   if (!down_read_trylock(_I(inode)->ip_alloc_sem)) {
>>> +   ret = -EAGAIN;
>>> +   goto out_unlock1;
>>> +   }
>>> +   }
>>> +
>>> +   if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
>>> +  ((map_start + map_len) <= i_size_read(inode)))
>>> +   goto out_unlock2;
>>> +
>>> +   cpos = map_start >> osb->s_clustersize_bits;
>>> +   mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>> +  map_start + map_len);
>>> +   is_last = 0;
>>> +   while (cpos < mapping_end && !is_last) {
>>> +   ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
>>> +NULL, , _last);
>>> +   if (ret) {
>>> +   mlog_errno(ret);
>>> +   goto out_unlock2;
>>> +   }
>>> +
>>> +   if (rec.e_blkno == 0ULL)
>>> +   break;
>>> +
>>> +   if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
>>> +   break;
>>> +
>>> +   cpos = le32_to_cpu(rec.e_cpos) +
>>> +   le16_to_cpu(rec.e_leaf_clusters);
>>> +   }
>>> +
>>> +   if (cpos < mapping_end)
>>> +   ret = 1;
>>> +
>>> +out_unlock2:
>>> +   brelse(di_bh);
>>> +
>>> +   up_read(_I(inode)->ip_alloc_sem);
>>> +
>>> +out_unlock1:
>> Should brelse(di_bh) be here?
> If the code jumps to out_unlock1 directly, the di_bh pointer should be NULL, 
> it is not necessary to release.
> 
Umm... No, once going out here, we have already taken inode lock. So
di_bh should be released.

>>
>>> +   ocfs2_inode_unlock(inode, 0);
>>> +
>>> +out:
>>> +   return (ret ? 0 : 1);
>> I don't think EAGAIN and other error code can be handled the same. We
>> have to distinguish them.
> Ok, I think we can add one line log to report the error in case the error is 
> not EAGAIN. 
> 
My point is, there is no need to try again in several cases, e.g. EROFS
returned by ocfs2_get_clusters_nocache.

>>
>> Thanks,
>> Joseph
>>
>>> +}
>>> +
>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>> whence)
>>>  {
>>> struct inode *inode = file->f_mapping->host;
>>> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
>>> index 67ea57d..fd9e86a 100644
>>> --- a/fs/ocfs2/extent_map.h
>>> +++ b/fs/ocfs2/extent_map.h
>>> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 
>> v_blkno, u64 *p_blkno,
>>>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>  u64 map_start, u64 map_len);
>>>  
>>> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
>>> +  int wait);
>>> +
>>>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
>> origin);
>>>  
>>>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
>>>
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/3] ocfs2: add ocfs2_overwrite_io function

2017-11-27 Thread Joseph Qi
Hi Gang,

On 17/11/27 17:46, Gang He wrote:
> Add ocfs2_overwrite_io function, which is used to judge if
> overwrite allocated blocks, otherwise, the write will bring extra
> block allocation overhead.
> 
> Signed-off-by: Gang He 
> ---
>  fs/ocfs2/extent_map.c | 67 
> +++
>  fs/ocfs2/extent_map.h |  3 +++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
> index e4719e0..98bf325 100644
> --- a/fs/ocfs2/extent_map.c
> +++ b/fs/ocfs2/extent_map.c
> @@ -832,6 +832,73 @@ int ocfs2_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>   return ret;
>  }
>  
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +int wait)
> +{
> + int ret = 0, is_last;
> + u32 mapping_end, cpos;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> + struct buffer_head *di_bh = NULL;
> + struct ocfs2_extent_rec rec;
> +
> + if (wait)
> + ret = ocfs2_inode_lock(inode, _bh, 0);
> + else
> + ret = ocfs2_try_inode_lock(inode, _bh, 0);
> + if (ret)
> + goto out;
> +
> + if (wait)
> + down_read(_I(inode)->ip_alloc_sem);
> + else {
> + if (!down_read_trylock(_I(inode)->ip_alloc_sem)) {
> + ret = -EAGAIN;
> + goto out_unlock1;
> + }
> + }
> +
> + if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> +((map_start + map_len) <= i_size_read(inode)))
> + goto out_unlock2;
> +
> + cpos = map_start >> osb->s_clustersize_bits;
> + mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> +map_start + map_len);
> + is_last = 0;
> + while (cpos < mapping_end && !is_last) {
> + ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> +  NULL, , _last);
> + if (ret) {
> + mlog_errno(ret);
> + goto out_unlock2;
> + }
> +
> + if (rec.e_blkno == 0ULL)
> + break;
> +
> + if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> + break;
> +
> + cpos = le32_to_cpu(rec.e_cpos) +
> + le16_to_cpu(rec.e_leaf_clusters);
> + }
> +
> + if (cpos < mapping_end)
> + ret = 1;
> +
> +out_unlock2:
> + brelse(di_bh);
> +
> + up_read(_I(inode)->ip_alloc_sem);
> +
> +out_unlock1:
Should brelse(di_bh) be here?

> + ocfs2_inode_unlock(inode, 0);
> +
> +out:
> + return (ret ? 0 : 1);
I don't think EAGAIN and other error code can be handled the same. We
have to distinguish them.

Thanks,
Joseph

> +}
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
> whence)
>  {
>   struct inode *inode = file->f_mapping->host;
> diff --git a/fs/ocfs2/extent_map.h b/fs/ocfs2/extent_map.h
> index 67ea57d..fd9e86a 100644
> --- a/fs/ocfs2/extent_map.h
> +++ b/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct inode *inode, u64 
> v_blkno, u64 *p_blkno,
>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>u64 map_start, u64 map_len);
>  
> +int ocfs2_overwrite_io(struct inode *inode, u64 map_start, u64 map_len,
> +int wait);
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int 
> origin);
>  
>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2/dlm: get mle inuse only when it is initialized

2017-11-14 Thread Joseph Qi


On 17/11/14 17:15, Changwei Ge wrote:
> Sorry for previous patch's format.
> So I resend it.
> 
> When dlm_add_migration_mle returns -EEXIST, previously input mle will
> not be initialized. So we can't use its associated dlm object.
> And we truly don't need this mle for already launched migration
> progress, since oldmle has taken this role.
> 
> Thanks,
> Changwei
> 
> Signed-off-by: Changwei Ge <ge.chang...@h3c.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> ---
>   fs/ocfs2/dlm/dlmmaster.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 3e04279446e8..9c3e0f13ca87 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -2616,7 +2616,9 @@ static int dlm_migrate_lockres(struct dlm_ctxt *dlm,
>* otherwise the assert_master from the new
>* master will destroy this.
>*/
> - dlm_get_mle_inuse(mle);
> + if (ret != -EEXIST)
> + dlm_get_mle_inuse(mle);
> +
>   spin_unlock(>master_lock);
>   spin_unlock(>spinlock);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] Adding new node to an online OCFS2 cluster

2017-11-09 Thread Joseph Qi
Hi Cédric,
I guess it is because your shared disk is formatted with only 2 slots.
You can check it using debugfs.ocfs2.
If it is, you may update slot count using tunefs.ocfs2 -N .
Hope it will help you.

Thanks,
Joseph

On 17/11/9 18:56, BASSAGET Cédric wrote:
> Hello,
> As I did not get help on users mailing list, I allow myself to post my
> question here. Sorry if it's not the right place, but I can't find any
> documentation.
> 
> I'm currently running an OCFS2 cluster on two hosts.
> My ocfs2 partition is an iscsi disk mounted on both hosts debian1 and
> debian2
> 
> I need to add a third host (debian3) to this cluster.
> I've set up  iscsi (& multipath), but now I'm looking for the correct way
> to add the ocfs2 partition to my existing cluster.
> 
> root@debian1:~# cat /etc/ocfs2/cluster.conf
> node:
> ip_port = 
> ip_address = 192.168.0.11
> number = 0
> name = debian1
> cluster = ocfs2
> node:
> ip_port = 
> ip_address = 192.168.0.12
> number = 1
> name = debian2
> cluster = ocfs2
> cluster:
> node_count = 2
> name = ocfs2
> 
> Same config on debian2.
> 
> I try to add the new node to the cluster, running this command on nodes
> debian1 et debian2 :
> 
> root@debian1:~# o2cb_ctl -C -i -n debian3 -t node -a number=2 -a
> ip_address=192.168.0.13 -a ip_port= -a cluster=ocfs2
> 
> my new node is added to the config file, and node_count is incremented to 3.
> 
> then I do a "systemctl reload ocfs2" on debian1 and debian2.
> I copy the cluster.conf file on debian3 and restart ocfs2 on it.
> 
> and in dmesg, I get :
> [  145.582469] o2net: Connected to node debian2 (num 1) at 192.168.0.12:
> [  145.582978] o2net: Connected to node debian1 (num 0) at 192.168.0.11:
> [  149.631921] o2dlm: Joining domain D2A1F74976E04C55B625CDC8DAC1D5E5
> [  149.631922] (
> [  149.631923] 0
> [  149.631924] 1
> [  149.631924] 2
> [  149.631925] ) 3 nodes
> [  149.634255] (mount.ocfs2,1627,0):ocfs2_find_slot:490 *ERROR: no free
> slots available!*
> [  149.634784] (mount.ocfs2,1627,0):ocfs2_mount_volume:1859 ERROR: status =
> -22
> [  153.738044] o2dlm: Leaving domain D2A1F74976E04C55B625CDC8DAC1D5E5
> [  153.738153] ocfs2: Unmounting device (254,0) on (node 2)
> [  153.738174] (mount.ocfs2,1627,0):ocfs2_fill_super:1218 ERROR: status =
> -22
> [  155.657295] o2net: No longer connected to node debian1 (num 0) at
> 192.168.0.11:
> [  155.657387] o2net: No longer connected to node debian2 (num 1) at
> 192.168.0.12:
> 
> 
> Can anyone tell me how I can add a slot, so my new node can work ?
> 
> Thanks for your help.
> Regards,
> Cédric
> 
> 
> 
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2/cluster: unlock the o2hb_live_lock before the o2nm_depend_item()

2017-11-06 Thread Joseph Qi
Hi Alex,

For local heartbeat mode, I understand that if the region is not found,
o2hb_live_lock won't be released. And if the region is found, it doesn't
have to iterate the next region. So I agree with your fix in this case.

But I still don't get how to make sure the safe iteration in case of
global heartbeat mode.

Thanks,
Joseph

On 17/11/2 14:00, alex chen wrote:
> In the following situation, the down_write() will be called under
> the spin_lock(), which may lead a soft lockup:
> o2hb_region_inc_user
>  spin_lock(_live_lock)
>   o2hb_region_pin
>o2nm_depend_item
> configfs_depend_item
>  inode_lock
>   down_write
>   -->here may sleep and reschedule
> 
> So we should unlock the o2hb_live_lock before the o2nm_depend_item(), and
> get item reference in advance to prevent the region to be released.
> 
> Signed-off-by: Alex Chen 
> Reviewed-by: Yiwen Jiang 
> Reviewed-by: Jun Piao 
> ---
>  fs/ocfs2/cluster/heartbeat.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index d020604..07b2fdc 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -2399,8 +2399,15 @@ static int o2hb_region_pin(const char *region_uuid)
>   if (reg->hr_item_pinned || reg->hr_item_dropped)
>   goto skip_pin;
> 
> + config_item_get(>hr_item);
> + spin_unlock(_live_lock);
> +
>   /* Ignore ENOENT only for local hb (userdlm domain) */
>   ret = o2nm_depend_item(>hr_item);
> +
> + config_item_put(>hr_item);
> + spin_lock(_live_lock);
> +
>   if (!ret) {
>   mlog(ML_CLUSTER, "Pin region %s\n", uuid);
>   reg->hr_item_pinned = 1;
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/cluster: unlock the o2hb_live_lock before the o2nm_depend_item()

2017-10-31 Thread Joseph Qi
Hi Alex,

On 17/10/31 20:41, alex chen wrote:
> In the following situation, the down_write() will be called under
> the spin_lock(), which may lead a soft lockup:
> o2hb_region_inc_user
>  spin_lock(_live_lock)
>   o2hb_region_pin
>o2nm_depend_item
> configfs_depend_item
>  inode_lock
>   down_write
>   -->here may sleep and reschedule
> 
> So we should unlock the o2hb_live_lock before the o2nm_depend_item(), and
> get item reference in advance to prevent the region to be released.
> 
> Signed-off-by: Alex Chen 
> Reviewed-by: Yiwen Jiang 
> Reviewed-by: Jun Piao 
> ---
>  fs/ocfs2/cluster/heartbeat.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index d020604..f1142a9 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -2399,6 +2399,9 @@ static int o2hb_region_pin(const char *region_uuid)
>   if (reg->hr_item_pinned || reg->hr_item_dropped)
>   goto skip_pin;
> 
> + config_item_get(>hr_item);
> + spin_unlock(_live_lock);
> +
If unlock here, the iteration of o2hb_all_regions is no longer safe.

Thanks,
Joseph

>   /* Ignore ENOENT only for local hb (userdlm domain) */
>   ret = o2nm_depend_item(>hr_item);
>   if (!ret) {
> @@ -2410,9 +2413,14 @@ static int o2hb_region_pin(const char *region_uuid)
>   else {
>   mlog(ML_ERROR, "Pin region %s fails with %d\n",
>uuid, ret);
> + config_item_put(>hr_item);
> + spin_lock(_live_lock);
>   break;
>   }
>   }
> +
> + config_item_put(>hr_item);
> + spin_lock(_live_lock);
>  skip_pin:
>   if (found)
>   break;
> -- 1.9.5.msysgit.1
> 
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: should wait dio before inode lock in ocfs2_setattr()

2017-10-30 Thread Joseph Qi

On 17/10/27 15:57, alex chen wrote:
> we should wait dio requests to finish before inode lock in
> ocfs2_setattr(), otherwise the following deadlock will be happened:
> process 1  process 2process 3
> truncate file 'A'  end_io of writing file 'A'   receiving the bast 
> messages
> ocfs2_setattr
>  ocfs2_inode_lock_tracker
>   ocfs2_inode_lock_full
>  inode_dio_wait
>   __inode_dio_wait
>   -->waiting for all dio
>   requests finish
> dlm_proxy_ast_handler
>  dlm_do_local_bast
>   ocfs2_blocking_ast
>
> ocfs2_generic_handle_bast
> set 
> OCFS2_LOCK_BLOCKED flag
> dio_end_io
>  dio_bio_end_aio
>   dio_complete
>ocfs2_dio_end_io
> ocfs2_dio_end_io_write
>  ocfs2_inode_lock
>   __ocfs2_cluster_lock
>ocfs2_wait_for_mask
>-->waiting for OCFS2_LOCK_BLOCKED
>flag to be cleared, that is waiting
>for 'process 1' unlocking the inode lock
>inode_dio_end
>-->here dec the i_dio_count, but will never
>be called, so a deadlock happened.
> 
> Signed-off-by: Alex Chen <alex.c...@huawei.com>
> Reviewed-by: Jun Piao <piao...@huawei.com>
> 
Redundant blank line here.

> ---
>  fs/ocfs2/file.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index 6e41fc8..50e09a6 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1161,6 +1161,13 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   }
>   size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
>   if (size_change) {
> +
I'd prefer no blank line here. And comment it like:
/*
 * xxx
 */

Other looks good to me. After you fix above, feel free to add:
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> + /* here we should wait dio to finish before inode lock
> +  * to avoid a deadlock between ocfs2_setattr() and
> +  * ocfs2_dio_end_io_write()
> +  */
> + inode_dio_wait(inode);
> +
>   status = ocfs2_rw_lock(inode, 1);
>   if (status < 0) {
>   mlog_errno(status);
> @@ -1200,8 +1207,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   if (status)
>   goto bail_unlock;
> 
> - inode_dio_wait(inode);
> -
>   if (i_size_read(inode) >= attr->ia_size) {
>   if (ocfs2_should_order_data(inode)) {
>   status = ocfs2_begin_ordered_truncate(inode,
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v3] ocfs2: the ip_alloc_sem should be taken in ocfs2_get_block()

2017-10-24 Thread Joseph Qi


On 17/10/24 20:46, alex chen wrote:
> The ip_alloc_sem should be taken in ocfs2_get_block() when reading file
> in DIRECT mode to prevent concurrent access to extent tree with
> ocfs2_dio_end_io_write(), which may cause BUGON in the following situation:
> 
> read file 'A'  end_io of writing file 'A'
> vfs_read
>  __vfs_read
>   ocfs2_file_read_iter
>generic_file_read_iter
> ocfs2_direct_IO
>  __blockdev_direct_IO
>   do_blockdev_direct_IO
>do_direct_IO
> get_more_blocks
>  ocfs2_get_block
>   ocfs2_extent_map_get_blocks
>ocfs2_get_clusters
> ocfs2_get_clusters_nocache()
>  ocfs2_search_extent_list
>   return the index of record which
>   contains the v_cluster, that is
>   v_cluster > rec[i]->e_cpos.
> ocfs2_dio_end_io
>  ocfs2_dio_end_io_write
>   
> down_write(>ip_alloc_sem);
>   ocfs2_mark_extent_written
>ocfs2_change_extent_flag
> ocfs2_split_extent
>  ...
>  --> modify the 
> rec[i]->e_cpos, resulting
>  in v_cluster < 
> rec[i]->e_cpos.
>  BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))
> 
> Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct 
> io")
> 
> Signed-off-by: Alex Chen <alex.c...@huawei.com>
> Reviewed-by: Jun Piao <piao...@huawei.com>
> Acked-by: Changwei Ge <ge.chang...@h3c.com>>
I don't think we have to rename ocfs2_dio_get_block. Anyway it doesn't
matter.
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>
 
> ---
>  fs/ocfs2/aops.c | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 88a31e9..d151632 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -134,6 +134,19 @@ static int ocfs2_symlink_get_block(struct inode *inode, 
> sector_t iblock,
>   return err;
>  }
> 
> +static int ocfs2_lock_get_block(struct inode *inode, sector_t iblock,
> + struct buffer_head *bh_result, int create)
> +{
> + int ret = 0;
> + struct ocfs2_inode_info *oi = OCFS2_I(inode);
> +
> + down_read(>ip_alloc_sem);
> + ret = ocfs2_get_block(inode, iblock, bh_result, create);
> + up_read(>ip_alloc_sem);
> +
> + return ret;
> +}
> +
>  int ocfs2_get_block(struct inode *inode, sector_t iblock,
>   struct buffer_head *bh_result, int create)
>  {
> @@ -2128,7 +2141,7 @@ static void ocfs2_dio_free_write_ctx(struct inode 
> *inode,
>   * called like this: dio->get_blocks(dio->inode, fs_startblk,
>   *   fs_count, map_bh, dio->rw == WRITE);
>   */
> -static int ocfs2_dio_get_block(struct inode *inode, sector_t iblock,
> +static int ocfs2_dio_wr_get_block(struct inode *inode, sector_t iblock,
>  struct buffer_head *bh_result, int create)
>  {
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> @@ -2154,12 +2167,9 @@ static int ocfs2_dio_get_block(struct inode *inode, 
> sector_t iblock,
>* while file size will be changed.
>*/
>   if (pos + total_len <= i_size_read(inode)) {
> - down_read(>ip_alloc_sem);
> - /* This is the fast path for re-write. */
> - ret = ocfs2_get_block(inode, iblock, bh_result, create);
> -
> - up_read(>ip_alloc_sem);
> 
> + /* This is the fast path for re-write. */
> + ret = ocfs2_lock_get_block(inode, iblock, bh_result, create);
>   if (buffer_mapped(bh_result) &&
>   !buffer_new(bh_result) &&
>   ret == 0)
> @@ -2424,9 +2434,9 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, 
> struct iov_iter *iter)
>   return 0;
> 
>   if (iov_iter_rw(iter) == READ)
> - get_block = ocfs2_get_block;
> + get_block = ocfs2_lock_get_block;
>   else
> - get_block = ocfs2_dio_get_block;
> + get_block = ocfs2_dio_wr_get_block;
> 
>   return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
>   iter, get_block,
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] a puzzle about is_global_system_inode function

2017-10-24 Thread Joseph Qi
GLOBAL_INODE_ALLOC_SYSTEM_INODE is used for system files inode
allocation, you can refer to ocfs2-tools for details.

Thanks,
Joseph

On 17/10/24 18:39, Larry Chen wrote:
> Hi all,
> 
> Function is_global_system_inode checks whether the type is
> in the range [OCFS2_FIRST_ONLINE_SYSTEM_INODE , 
> OCFS2_LAST_GLOBAL_SYSTEM_INODE ].
> But why the range does not include GLOBAL_INODE_ALLOC_SYSTEM_INODE ??
> 
> enum {
>    
>     GLOBAL_INODE_ALLOC_SYSTEM_INODE,
>      SLOT_MAP_SYSTEM_INODE,
> #define OCFS2_FIRST_ONLINE_SYSTEM_INODE SLOT_MAP_SYSTEM_INODE
>      HEARTBEAT_SYSTEM_INODE,
>      GLOBAL_BITMAP_SYSTEM_INODE,
>      USER_QUOTA_SYSTEM_INODE,
>      GROUP_QUOTA_SYSTEM_INODE,
> #define OCFS2_LAST_GLOBAL_SYSTEM_INODE GROUP_QUOTA_SYSTEM_INODE
>      
> }
> 
> Thanks
> Larry Chen
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent

2017-10-23 Thread Joseph Qi


On 17/10/24 10:50, alex chen wrote:
> The subsystem.su_mutex is required while accessing the item->ci_parent,
> otherwise, NULL pointer dereference to the item->ci_parent will be
> triggered in the following situation:
> add node delete node
> sys_write
>  vfs_write
>   configfs_write_file
>o2nm_node_store
> o2nm_node_local_write
>  do_rmdir
>   vfs_rmdir
>configfs_rmdir
> mutex_lock(>su_mutex);
> unlink_obj
>  item->ci_group = NULL;
>  item->ci_parent = NULL;  
>to_o2nm_cluster_from_node
> node->nd_item.ci_parent->ci_parent
> BUG since of NULL pointer dereference to nd_item.ci_parent
> 
> Moreover, the o2nm_cluster also should be protected by the subsystem.su_mutex.
> 
> Signed-off-by: Alex Chen <alex.c...@huawei.com>
> Reviewed-by: Jun Piao <piao...@huawei.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> 
> ---
>  fs/ocfs2/cluster/nodemanager.c | 63 
> --
>  1 file changed, 55 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
> index b17d180..c204ac9b 100644
> --- a/fs/ocfs2/cluster/nodemanager.c
> +++ b/fs/ocfs2/cluster/nodemanager.c
> @@ -40,6 +40,9 @@
>   "panic",/* O2NM_FENCE_PANIC */
>  };
> 
> +static inline void o2nm_lock_subsystem(void);
> +static inline void o2nm_unlock_subsystem(void);
> +
>  struct o2nm_node *o2nm_get_node_by_num(u8 node_num)
>  {
>   struct o2nm_node *node = NULL;
> @@ -181,7 +184,10 @@ static struct o2nm_cluster 
> *to_o2nm_cluster_from_node(struct o2nm_node *node)
>  {
>   /* through the first node_set .parent
>* mycluster/nodes/mynode == o2nm_cluster->o2nm_node_group->o2nm_node */
> - return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent);
> + if (node->nd_item.ci_parent)
> + return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent);
> + else
> + return NULL;
>  }
> 
>  enum {
> @@ -194,7 +200,7 @@ static ssize_t o2nm_node_num_store(struct config_item 
> *item, const char *page,
>  size_t count)
>  {
>   struct o2nm_node *node = to_o2nm_node(item);
> - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node);
> + struct o2nm_cluster *cluster;
>   unsigned long tmp;
>   char *p = (char *)page;
>   int ret = 0;
> @@ -214,6 +220,13 @@ static ssize_t o2nm_node_num_store(struct config_item 
> *item, const char *page,
>   !test_bit(O2NM_NODE_ATTR_PORT, >nd_set_attributes))
>   return -EINVAL; /* XXX */
> 
> + o2nm_lock_subsystem();
> + cluster = to_o2nm_cluster_from_node(node);
> + if (!cluster) {
> + o2nm_unlock_subsystem();
> + return -EINVAL;
> + }
> +
>   write_lock(>cl_nodes_lock);
>   if (cluster->cl_nodes[tmp])
>   ret = -EEXIST;
> @@ -226,6 +239,8 @@ static ssize_t o2nm_node_num_store(struct config_item 
> *item, const char *page,
>   set_bit(tmp, cluster->cl_nodes_bitmap);
>   }
>   write_unlock(>cl_nodes_lock);
> + o2nm_unlock_subsystem();
> +
>   if (ret)
>   return ret;
> 
> @@ -269,7 +284,7 @@ static ssize_t o2nm_node_ipv4_address_store(struct 
> config_item *item,
>   size_t count)
>  {
>   struct o2nm_node *node = to_o2nm_node(item);
> - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node);
> + struct o2nm_cluster *cluster;
>   int ret, i;
>   struct rb_node **p, *parent;
>   unsigned int octets[4];
> @@ -286,6 +301,13 @@ static ssize_t o2nm_node_ipv4_address_store(struct 
> config_item *item,
>   be32_add_cpu(_addr, octets[i] << (i * 8));
>   }
> 
> + o2nm_lock_subsystem();
> + cluster = to_o2nm_cluster_from_node(node);
> + if (!cluster) {
> + o2nm_unlock_subsystem();
> + return -EINVAL;
> + }
> +
>   ret = 0;
>   write_lock(>cl_nodes_lock);
>   if (o2nm_node_ip_tree_lookup(cluster, ipv4_addr, , ))
> @@ -298,6 +320,8 @@ static ssize_t o2nm_node_ipv4_address_store(struct 
> config_item *item,
>   rb_insert_color(>nd_ip_node, >cl_node_ip_tree);
>   }
>   write_unlock(>cl_nodes_lock);
> + o2nm

Re: [Ocfs2-devel] [PATCH] ocfs2: fix cluster hang after a node dies

2017-10-22 Thread Joseph Qi


On 17/10/17 14:48, Changwei Ge wrote:
> When a node dies, other live nodes have to choose a new master
> for an existed lock resource mastered by the dead node.
> 
> As for ocfs2/dlm implementation, this is done by function -
> dlm_move_lockres_to_recovery_list which marks those lock rsources
> as DLM_LOCK_RES_RECOVERING and manages them via a list from which
> DLM changes lock resource's master later.
> 
> So without invoking dlm_move_lockres_to_recovery_list, no master will
> be choosed after dlm recovery accomplishment since no lock resource can
> be found through ::resource list.
> 
> What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for
> lock resources mastered a dead node, it will break up synchronization
> among nodes.
> 
> So invoke dlm_move_lockres_to_recovery_list again.
> 
> Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery
> lockres when recovery master goes down")'
> 
A typo here, it should be:
Fixes: ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when 
recovery master goes down")
Also we'd better Cc stable as well.

Others look good to me.
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> Reported-by: Vitaly Mayatskih <v.mayats...@gmail.com>
> Signed-off-by: Changwei Ge <ge.chang...@h3c.com>
> ---
>   fs/ocfs2/dlm/dlmrecovery.c |1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 74407c6..ec8f758 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanup(struct 
> dlm_ctxt *dlm, u8 dead_node)
>   dlm_lockres_put(res);
>   continue;
>   }
> + dlm_move_lockres_to_recovery_list(dlm, res);
>   } else if (res->owner == dlm->node_num) {
>   dlm_free_dead_locks(dlm, res, dead_node);
>   __dlm_lockres_calc_usage(dlm, res);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: the ip_alloc_sem should be taken in ocfs2_get_block()

2017-10-22 Thread Joseph Qi


On 17/10/20 17:03, alex chen wrote:
> The ip_alloc_sem should be taken in ocfs2_get_block() when reading file
> in DIRECT mode to prevent concurrent access to extent tree with
> ocfs2_dio_end_io_write(), which may cause BUGON in
> ocfs2_get_clusters_nocache()->BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))
> 
> Signed-off-by: Alex Chen 
> Reviewed-by: Jun Piao 
> 
> ---
>  fs/ocfs2/aops.c | 21 +++--
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index 88a31e9..5cb939f 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -134,6 +134,19 @@ static int ocfs2_symlink_get_block(struct inode *inode, 
> sector_t iblock,
>   return err;
>  }
> 
> +static int ocfs2_get_block_lock(struct inode *inode, sector_t iblock,
> + struct buffer_head *bh_result, int create)
> +{
> + int ret;
> + struct ocfs2_inode_info *oi = OCFS2_I(inode);
> +
> + down_read(>ip_alloc_sem);
> + ret = ocfs2_get_block(inode, iblock, bh_result, create);
> + up_read(>ip_alloc_sem);
> +
> + return ret;
> +}
> +
>  int ocfs2_get_block(struct inode *inode, sector_t iblock,
>   struct buffer_head *bh_result, int create)
>  {
> @@ -2154,12 +2167,8 @@ static int ocfs2_dio_get_block(struct inode *inode, 
> sector_t iblock,
>* while file size will be changed.
>*/
>   if (pos + total_len <= i_size_read(inode)) {
> - down_read(>ip_alloc_sem);
>   /* This is the fast path for re-write. */
> - ret = ocfs2_get_block(inode, iblock, bh_result, create);
> -
> - up_read(>ip_alloc_sem);
> -
> + ret = ocfs2_get_block_lock(inode, iblock, bh_result, create);
>   if (buffer_mapped(bh_result) &&
>   !buffer_new(bh_result) &&
>   ret == 0)
> @@ -2424,7 +2433,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, 
> struct iov_iter *iter)
>   return 0;
> 
>   if (iov_iter_rw(iter) == READ)
> - get_block = ocfs2_get_block;
> + get_block = ocfs2_get_block_lock;
ocfs2_lock_get_block may be better.

Thanks,
Joseph

>   else
>   get_block = ocfs2_dio_get_block;
> 
> -- 1.9.5.msysgit.1
> 
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent

2017-10-22 Thread Joseph Qi
Hi Alex,

On 17/10/20 16:27, alex chen wrote:
> The subsystem.su_mutex is required while accessing the item->ci_parent,
> otherwise, NULL pointer dereference to the item->ci_parent will be
> triggered in the following situation:
> add node delete node
> sys_write
>  vfs_write
>   configfs_write_file
>o2nm_node_store
> o2nm_node_local_write
>  do_rmdir
>   vfs_rmdir
>configfs_rmdir
> mutex_lock(>su_mutex);
> unlink_obj
>  item->ci_group = NULL;
>  item->ci_parent = NULL;  
>to_o2nm_cluster_from_node
> node->nd_item.ci_parent->ci_parent
> BUG since of NULL pointer dereference to nd_item.ci_parent
> 
> Moreover, the o2nm_cluster also should be protected by the subsystem.su_mutex.
> 
Looks good to me. One suggestion is, we'd better add some blank lines
for code readability.

> Signed-off-by: Alex Chen 
> Reviewed-by: Jun Piao 
> 
> ---
>  fs/ocfs2/cluster/nodemanager.c | 58 
> ++
>  1 file changed, 48 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c
> index b17d180..9b1859a 100644
> --- a/fs/ocfs2/cluster/nodemanager.c
> +++ b/fs/ocfs2/cluster/nodemanager.c
> @@ -39,6 +39,8 @@
>   "reset",/* O2NM_FENCE_RESET */
>   "panic",/* O2NM_FENCE_PANIC */
>  };
> +static inline void o2nm_lock_subsystem(void);
> +static inline void o2nm_unlock_subsystem(void);
> 
>  struct o2nm_node *o2nm_get_node_by_num(u8 node_num)
>  {
> @@ -181,7 +183,10 @@ static struct o2nm_cluster 
> *to_o2nm_cluster_from_node(struct o2nm_node *node)
>  {
>   /* through the first node_set .parent
>* mycluster/nodes/mynode == o2nm_cluster->o2nm_node_group->o2nm_node */
> - return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent);
> + if (node->nd_item.ci_parent)
> + return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent);
> + else
> + return NULL;
>  }
> 
>  enum {
> @@ -194,7 +199,7 @@ static ssize_t o2nm_node_num_store(struct config_item 
> *item, const char *page,
>  size_t count)
>  {
>   struct o2nm_node *node = to_o2nm_node(item);
> - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node);
> + struct o2nm_cluster *cluster;
>   unsigned long tmp;
>   char *p = (char *)page;
>   int ret = 0;
> @@ -213,6 +218,12 @@ static ssize_t o2nm_node_num_store(struct config_item 
> *item, const char *page,
>   if (!test_bit(O2NM_NODE_ATTR_ADDRESS, >nd_set_attributes) ||
>   !test_bit(O2NM_NODE_ATTR_PORT, >nd_set_attributes))
>   return -EINVAL; /* XXX */
<<< A blank line here may be better.

> + o2nm_lock_subsystem();
> + cluster = to_o2nm_cluster_from_node(node);
> + if (!cluster) {
> + o2nm_unlock_subsystem();
> + return -EINVAL;
> + }
> 
>   write_lock(>cl_nodes_lock);
>   if (cluster->cl_nodes[tmp])
> @@ -226,6 +237,7 @@ static ssize_t o2nm_node_num_store(struct config_item 
> *item, const char *page,
>   set_bit(tmp, cluster->cl_nodes_bitmap);
>   }
>   write_unlock(>cl_nodes_lock);
> + o2nm_unlock_subsystem();
>   if (ret)
>   return ret;
> 
> @@ -269,7 +281,7 @@ static ssize_t o2nm_node_ipv4_address_store(struct 
> config_item *item,
>   size_t count)
>  {
>   struct o2nm_node *node = to_o2nm_node(item);
> - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node);
> + struct o2nm_cluster *cluster;
>   int ret, i;
>   struct rb_node **p, *parent;
>   unsigned int octets[4];
> @@ -285,7 +297,12 @@ static ssize_t o2nm_node_ipv4_address_store(struct 
> config_item *item,
>   return -ERANGE;
>   be32_add_cpu(_addr, octets[i] << (i * 8));
>   }
> -
<<< Also here.

> + o2nm_lock_subsystem();
> + cluster = to_o2nm_cluster_from_node(node);
> + if (!cluster) {
> + o2nm_unlock_subsystem();
> + return -EINVAL;
> + }
>   ret = 0;
>   write_lock(>cl_nodes_lock);
>   if (o2nm_node_ip_tree_lookup(cluster, ipv4_addr, , ))
> @@ -298,6 +315,7 @@ static ssize_t o2nm_node_ipv4_address_store(struct 
> config_item *item,
>   rb_insert_color(>nd_ip_node, >cl_node_ip_tree);
>   }
>   write_unlock(>cl_nodes_lock);
> + o2nm_unlock_subsystem();
>   if (ret)
>   return ret;
> 
> @@ -315,7 +333,7 @@ static ssize_t o2nm_node_local_store(struct config_item 
> *item, const char *page,
>size_t count)
>  {
>   struct o2nm_node *node = to_o2nm_node(item);
> -  

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: cleanup unused func declaration and assignment

2017-10-13 Thread Joseph Qi


On 17/10/13 15:01, piaojun wrote:
> Signed-off-by: Jun Piao <piao...@huawei.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> ---
>  fs/ocfs2/alloc.c | 2 --
>  fs/ocfs2/cluster/heartbeat.h | 2 --
>  2 files changed, 4 deletions(-)
> 
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> index a177eae..31a416d 100644
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -3585,8 +3585,6 @@ static int ocfs2_merge_rec_left(struct ocfs2_path 
> *right_path,
>* The easy case - we can just plop the record right in.
>*/
>   *left_rec = *split_rec;
> -
> - has_empty_extent = 0;
>   } else
>   le16_add_cpu(_rec->e_leaf_clusters, split_clusters);
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.h b/fs/ocfs2/cluster/heartbeat.h
> index 3ef5137..a9e67ef 100644
> --- a/fs/ocfs2/cluster/heartbeat.h
> +++ b/fs/ocfs2/cluster/heartbeat.h
> @@ -79,10 +79,8 @@ void o2hb_fill_node_map(unsigned long *map,
>   unsigned bytes);
>  void o2hb_exit(void);
>  int o2hb_init(void);
> -int o2hb_check_node_heartbeating(u8 node_num);
>  int o2hb_check_node_heartbeating_no_sem(u8 node_num);
>  int o2hb_check_node_heartbeating_from_callback(u8 node_num);
> -int o2hb_check_local_node_heartbeating(void);
>  void o2hb_stop_all_regions(void);
>  int o2hb_get_all_regions(char *region_uuids, u8 numregions);
>  int o2hb_global_heartbeat_active(void);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 1/2] ocfs2: no need flush workqueue before destroying it

2017-10-13 Thread Joseph Qi


On 17/10/13 15:00, piaojun wrote:
> destroy_workqueue() will do flushing work for us.
> 
> Signed-off-by: Jun Piao <piao...@huawei.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> ---
>  fs/ocfs2/dlm/dlmdomain.c | 1 -
>  fs/ocfs2/dlmfs/dlmfs.c   | 1 -
>  fs/ocfs2/super.c | 4 +---
>  3 files changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index a2b19fb..e1fea14 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -394,7 +394,6 @@ int dlm_domain_fully_joined(struct dlm_ctxt *dlm)
>  static void dlm_destroy_dlm_worker(struct dlm_ctxt *dlm)
>  {
>   if (dlm->dlm_worker) {
> - flush_workqueue(dlm->dlm_worker);
>   destroy_workqueue(dlm->dlm_worker);
>   dlm->dlm_worker = NULL;
>   }
> diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
> index 9ab9e18..edce7b5 100644
> --- a/fs/ocfs2/dlmfs/dlmfs.c
> +++ b/fs/ocfs2/dlmfs/dlmfs.c
> @@ -670,7 +670,6 @@ static void __exit exit_dlmfs_fs(void)
>  {
>   unregister_filesystem(_fs_type);
> 
> - flush_workqueue(user_dlm_worker);
>   destroy_workqueue(user_dlm_worker);
> 
>   /*
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 8073349..040bbb6 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -2521,10 +2521,8 @@ static void ocfs2_delete_osb(struct ocfs2_super *osb)
>   /* This function assumes that the caller has the main osb resource */
> 
>   /* ocfs2_initializer_super have already created this workqueue */
> - if (osb->ocfs2_wq) {
> - flush_workqueue(osb->ocfs2_wq);
> + if (osb->ocfs2_wq)
>   destroy_workqueue(osb->ocfs2_wq);
> - }
> 
>   ocfs2_free_slot_info(osb);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: no need flush workqueue before destroying it

2017-10-12 Thread Joseph Qi


On 17/10/13 11:04, piaojun wrote:
> 1. delete redundant flush_workqueue();
> 2. delete some unused func declaration and assignment.
> 
Either looks good to me. But I do suggest separate these into two
patches since one is code optimizing and the other is pure cleanup.

Thanks,
Joseph

> Signed-off-by: Jun Piao 
> ---
>  fs/ocfs2/alloc.c | 2 --
>  fs/ocfs2/cluster/heartbeat.h | 2 --
>  fs/ocfs2/dlm/dlmdomain.c | 1 -
>  fs/ocfs2/dlmfs/dlmfs.c   | 1 -
>  fs/ocfs2/super.c | 4 +---
>  5 files changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> index a177eae..31a416d 100644
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -3585,8 +3585,6 @@ static int ocfs2_merge_rec_left(struct ocfs2_path 
> *right_path,
>* The easy case - we can just plop the record right in.
>*/
>   *left_rec = *split_rec;
> -
> - has_empty_extent = 0;
>   } else
>   le16_add_cpu(_rec->e_leaf_clusters, split_clusters);
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.h b/fs/ocfs2/cluster/heartbeat.h
> index 3ef5137..a9e67ef 100644
> --- a/fs/ocfs2/cluster/heartbeat.h
> +++ b/fs/ocfs2/cluster/heartbeat.h
> @@ -79,10 +79,8 @@ void o2hb_fill_node_map(unsigned long *map,
>   unsigned bytes);
>  void o2hb_exit(void);
>  int o2hb_init(void);
> -int o2hb_check_node_heartbeating(u8 node_num);
>  int o2hb_check_node_heartbeating_no_sem(u8 node_num);
>  int o2hb_check_node_heartbeating_from_callback(u8 node_num);
> -int o2hb_check_local_node_heartbeating(void);
>  void o2hb_stop_all_regions(void);
>  int o2hb_get_all_regions(char *region_uuids, u8 numregions);
>  int o2hb_global_heartbeat_active(void);
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index a2b19fb..e1fea14 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -394,7 +394,6 @@ int dlm_domain_fully_joined(struct dlm_ctxt *dlm)
>  static void dlm_destroy_dlm_worker(struct dlm_ctxt *dlm)
>  {
>   if (dlm->dlm_worker) {
> - flush_workqueue(dlm->dlm_worker);
>   destroy_workqueue(dlm->dlm_worker);
>   dlm->dlm_worker = NULL;
>   }
> diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
> index 9ab9e18..edce7b5 100644
> --- a/fs/ocfs2/dlmfs/dlmfs.c
> +++ b/fs/ocfs2/dlmfs/dlmfs.c
> @@ -670,7 +670,6 @@ static void __exit exit_dlmfs_fs(void)
>  {
>   unregister_filesystem(_fs_type);
> 
> - flush_workqueue(user_dlm_worker);
>   destroy_workqueue(user_dlm_worker);
> 
>   /*
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 8073349..040bbb6 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -2521,10 +2521,8 @@ static void ocfs2_delete_osb(struct ocfs2_super *osb)
>   /* This function assumes that the caller has the main osb resource */
> 
>   /* ocfs2_initializer_super have already created this workqueue */
> - if (osb->ocfs2_wq) {
> - flush_workqueue(osb->ocfs2_wq);
> + if (osb->ocfs2_wq)
>   destroy_workqueue(osb->ocfs2_wq);
> - }
> 
>   ocfs2_free_slot_info(osb);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fstrim: Fix start offset of first cluster group during fstrim

2017-10-12 Thread Joseph Qi


On 17/10/13 08:45, Junxiao Bi wrote:
> On 10/13/2017 03:12 AM, Ashish Samant wrote:
>> The first cluster group descriptor is not stored at the start of the group
>> but at an offset from the start. We need to take this into account while
>> doing fstrim on the first cluster group. Otherwise we will wrongly start
>> fstrim a few blocks after the desired start block and the range can cross
>> over into the next cluster group and zero out the group descriptor there.
>> This can cause filesytem corruption that cannot be fixed by fsck.
>>
>> Signed-off-by: Ashish Samant <ashish.sam...@oracle.com>
>> Cc: sta...@vger.kernel.org
> Looks good.
> 
> Reviewed-by: Junxiao Bi <junxiao...@oracle.com>
> 
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>
>> ---
>>  fs/ocfs2/alloc.c | 24 ++--
>>  1 file changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>> index a177eae..addd7c5 100644
>> --- a/fs/ocfs2/alloc.c
>> +++ b/fs/ocfs2/alloc.c
>> @@ -7304,13 +7304,24 @@ int ocfs2_truncate_inline(struct inode *inode, 
>> struct buffer_head *di_bh,
>>  
>>  static int ocfs2_trim_extent(struct super_block *sb,
>>   struct ocfs2_group_desc *gd,
>> - u32 start, u32 count)
>> + u64 group, u32 start, u32 count)
>>  {
>>  u64 discard, bcount;
>> +struct ocfs2_super *osb = OCFS2_SB(sb);
>>  
>>  bcount = ocfs2_clusters_to_blocks(sb, count);
>> -discard = le64_to_cpu(gd->bg_blkno) +
>> -ocfs2_clusters_to_blocks(sb, start);
>> +discard = ocfs2_clusters_to_blocks(sb, start);
>> +
>> +/*
>> + * For the first cluster group, the gd->bg_blkno is not at the start
>> + * of the group, but at an offset from the start. If we add it while
>> + * calculating discard for first group, we will wrongly start fstrim a
>> + * few blocks after the desried start block and the range can cross
>> + * over into the next cluster group. So, add it only if this is not
>> + * the first cluster group.
>> + */
>> +if (group != osb->first_cluster_group_blkno)
>> +discard += le64_to_cpu(gd->bg_blkno);
>>  
>>  trace_ocfs2_trim_extent(sb, (unsigned long long)discard, bcount);
>>  
>> @@ -7318,7 +7329,7 @@ static int ocfs2_trim_extent(struct super_block *sb,
>>  }
>>  
>>  static int ocfs2_trim_group(struct super_block *sb,
>> -struct ocfs2_group_desc *gd,
>> +struct ocfs2_group_desc *gd, u64 group,
>>  u32 start, u32 max, u32 minbits)
>>  {
>>  int ret = 0, count = 0, next;
>> @@ -7337,7 +7348,7 @@ static int ocfs2_trim_group(struct super_block *sb,
>>  next = ocfs2_find_next_bit(bitmap, max, start);
>>  
>>  if ((next - start) >= minbits) {
>> -ret = ocfs2_trim_extent(sb, gd,
>> +ret = ocfs2_trim_extent(sb, gd, group,
>>  start, next - start);
>>  if (ret < 0) {
>>  mlog_errno(ret);
>> @@ -7435,7 +7446,8 @@ int ocfs2_trim_fs(struct super_block *sb, struct 
>> fstrim_range *range)
>>  }
>>  
>>  gd = (struct ocfs2_group_desc *)gd_bh->b_data;
>> -cnt = ocfs2_trim_group(sb, gd, first_bit, last_bit, minlen);
>> +cnt = ocfs2_trim_group(sb, gd, group,
>> +   first_bit, last_bit, minlen);
>>  brelse(gd_bh);
>>  gd_bh = NULL;
>>  if (cnt < 0) {
>>
> 
> 
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch 1/2] ocfs2/dlm: protect 'tracking_list' by 'track_lock'

2017-09-26 Thread Joseph Qi


On 17/9/26 08:39, piaojun wrote:
> 
> 
> On 2017/9/25 18:35, Joseph Qi wrote:
>>
>>
>> On 17/9/23 11:39, piaojun wrote:
>>> 'dlm->tracking_list' need to be protected by 'dlm->track_lock'.
>>>
>>> Signed-off-by: Jun Piao <piao...@huawei.com>
>>> Reviewed-by: Alex Chen <alex.c...@huawei.com>
>>> ---
>>>  fs/ocfs2/dlm/dlmdomain.c | 7 ++-
>>>  fs/ocfs2/dlm/dlmmaster.c | 4 ++--
>>>  2 files changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
>>> index a2b19fb..b118525 100644
>>> --- a/fs/ocfs2/dlm/dlmdomain.c
>>> +++ b/fs/ocfs2/dlm/dlmdomain.c
>>> @@ -726,12 +726,17 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
>>> }
>>>
>>> /* This list should be empty. If not, print remaining lockres */
>>> +   spin_lock(>track_lock);
>>> if (!list_empty(>tracking_list)) {
>>> mlog(ML_ERROR, "Following lockres' are still on the "
>>>  "tracking list:\n");
>>> -   list_for_each_entry(res, >tracking_list, tracking)
>>> +   list_for_each_entry(res, >tracking_list, tracking) 
>>> {
>>> +   spin_unlock(>track_lock);
>>
>> Um... If we unlock here, the iterator still has chance to be corrupted.
>>
>> Thanks,
>> Joseph
>>
> 
> we don't need care much about the corrupted 'tracking_list' because we
> have already picked up 'res' from 'tracking_list'. then we will get
> 'track_lock' again to prevent 'tracking_list' from being corrupted. but
> I'd better make sure that 'res' is not NULL before printing, just like:
> 
> list_for_each_entry(res, >tracking_list, tracking) {
>   spin_unlock(>track_lock);
>   if (res)
>   dlm_print_one_lock_resource(res);
>   spin_lock(>track_lock);
> }
> 
> Thanks
> Jun

IIUC, your intent to add track lock here is to protect tracking list
when iterate the loop, right? I am saying that if unlock track lock
here, the loop is still unsafe.
Checking res here is meaningless. Maybe list_for_each_entry_safe
could work here.
BTW, how this race case happens? The above code is during umount,
what is the other flow?

Thanks,
Joseph

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch 2/2] ocfs2: clean up some unused func declaration

2017-09-25 Thread Joseph Qi
Cc Andrew as well.

On 17/9/23 11:41, piaojun wrote:
> Signed-off-by: Jun Piao <piao...@huawei.com>
> Reviewed-by: Alex Chen <alex.c...@huawei.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

> ---
>  fs/ocfs2/buffer_head_io.h| 3 ---
>  fs/ocfs2/cluster/heartbeat.h | 2 --
>  2 files changed, 5 deletions(-)
> 
> diff --git a/fs/ocfs2/buffer_head_io.h b/fs/ocfs2/buffer_head_io.h
> index b97bcc6..b1bb70c 100644
> --- a/fs/ocfs2/buffer_head_io.h
> +++ b/fs/ocfs2/buffer_head_io.h
> @@ -28,9 +28,6 @@
> 
>  #include 
> 
> -void ocfs2_end_buffer_io_sync(struct buffer_head *bh,
> -  int uptodate);
> -
>  int ocfs2_write_block(struct ocfs2_super  *osb,
> struct buffer_head  *bh,
> struct ocfs2_caching_info   *ci);
> diff --git a/fs/ocfs2/cluster/heartbeat.h b/fs/ocfs2/cluster/heartbeat.h
> index 3ef5137..a9e67ef 100644
> --- a/fs/ocfs2/cluster/heartbeat.h
> +++ b/fs/ocfs2/cluster/heartbeat.h
> @@ -79,10 +79,8 @@ void o2hb_fill_node_map(unsigned long *map,
>   unsigned bytes);
>  void o2hb_exit(void);
>  int o2hb_init(void);
> -int o2hb_check_node_heartbeating(u8 node_num);
>  int o2hb_check_node_heartbeating_no_sem(u8 node_num);
>  int o2hb_check_node_heartbeating_from_callback(u8 node_num);
> -int o2hb_check_local_node_heartbeating(void);
>  void o2hb_stop_all_regions(void);
>  int o2hb_get_all_regions(char *region_uuids, u8 numregions);
>  int o2hb_global_heartbeat_active(void);
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch 1/2] ocfs2/dlm: protect 'tracking_list' by 'track_lock'

2017-09-25 Thread Joseph Qi


On 17/9/23 11:39, piaojun wrote:
> 'dlm->tracking_list' need to be protected by 'dlm->track_lock'.
> 
> Signed-off-by: Jun Piao 
> Reviewed-by: Alex Chen 
> ---
>  fs/ocfs2/dlm/dlmdomain.c | 7 ++-
>  fs/ocfs2/dlm/dlmmaster.c | 4 ++--
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
> index a2b19fb..b118525 100644
> --- a/fs/ocfs2/dlm/dlmdomain.c
> +++ b/fs/ocfs2/dlm/dlmdomain.c
> @@ -726,12 +726,17 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm)
>   }
> 
>   /* This list should be empty. If not, print remaining lockres */
> + spin_lock(>track_lock);
>   if (!list_empty(>tracking_list)) {
>   mlog(ML_ERROR, "Following lockres' are still on the "
>"tracking list:\n");
> - list_for_each_entry(res, >tracking_list, tracking)
> + list_for_each_entry(res, >tracking_list, tracking) 
> {
> + spin_unlock(>track_lock);

Um... If we unlock here, the iterator still has chance to be corrupted.

Thanks,
Joseph

>   dlm_print_one_lock_resource(res);
> + spin_lock(>track_lock);
> + }
>   }
> + spin_unlock(>track_lock);
> 
>   dlm_mark_domain_leaving(dlm);
>   dlm_leave_domain(dlm);
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 3e04279..44e7d18 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -589,9 +589,9 @@ static void dlm_init_lockres(struct dlm_ctxt *dlm,
> 
>   res->last_used = 0;
> 
> - spin_lock(>spinlock);
> + spin_lock(>track_lock);
>   list_add_tail(>tracking, >tracking_list);
> - spin_unlock(>spinlock);
> + spin_unlock(>track_lock);
> 
>   memset(res->lvb, 0, DLM_LVB_LEN);
>   memset(res->refmap, 0, sizeof(res->refmap));
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: re-queue AST or BAST if sending is failed to improve the reliability

2017-08-22 Thread Joseph Qi


On 17/8/23 10:23, Junxiao Bi wrote:
> On 08/10/2017 06:49 PM, Changwei Ge wrote:
>> Hi Joseph,
>>
>>
>> On 2017/8/10 17:53, Joseph Qi wrote:
>>> Hi Changwei,
>>>
>>> On 17/8/9 23:24, ge changwei wrote:
>>>> Hi
>>>>
>>>>
>>>> On 2017/8/9 下午7:32, Joseph Qi wrote:
>>>>> Hi,
>>>>>
>>>>> On 17/8/7 15:13, Changwei Ge wrote:
>>>>>> Hi,
>>>>>>
>>>>>> In current code, while flushing AST, we don't handle an exception that
>>>>>> sending AST or BAST is failed.
>>>>>> But it is indeed possible that AST or BAST is lost due to some kind of
>>>>>> networks fault.
>>>>>>
>>>>> Could you please describe this issue more clearly? It is better analyze
>>>>> issue along with the error message and the status of related nodes.
>>>>> IMO, if network is down, one of the two nodes will be fenced. So what's
>>>>> your case here?
>>>>>
>>>>> Thanks,
>>>>> Joseph
>>>> I have posted the status of related lock resource in my preceding email. 
>>>> Please check them out.
>>>>
>>>> Moreover, network is not down forever even not longer than threshold  to 
>>>> be fenced.
>>>> So no node will be fenced.
>>>>
>>>> This issue happens in terrible network environment. Some messages may be 
>>>> abandoned by switch due to various conditions.
>>>> And even frequent and fast link up and down will also cause this issue.
>>>>
>>>> In a nutshell,  re-queuing AST and BAST is crucial when link between 
>>>> nodes recover quickly. It prevents cluster from hanging.
>>>> So you mean the tcp packet is lost due to connection reset? IIRC,
>> Yes, it's something like that exception which I think is deserved to be
>> fixed within OCFS2.
>>> Junxiao has posted a patchset to fix this issue.
>>> If you are using the way of re-queuing, how to make sure the original
>>> message is *truly* lost and the same ast/bast won't be sent twice?
>> With regards to TCP layer, if it returns error to OCFS2, packets must
>> not be sent successfully. So no node will obtain such an AST or BAST.
> Right, but not only AST/BAST, other messages pending in tcp queue will
> also lost if tcp return error to ocfs2, this can also caused hung.
> Besides, your fix may introduce duplicated ast/bast message Joseph
> mentioned.
> Ocfs2 depends tcp a lot, it can't work well if tcp return error to it.
> To fix it, maybe ocfs2 should maintain its own message queue and ack
> messages while not depend on TCP.>
Agree. Or we can add a sequence to distinguish duplicate message. Under
this, we can simply resend message if fails.

Thanks,
Joseph
 
> Thanks,
> Junxiao.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: re-queue AST or BAST if sending is failed to improve the reliability

2017-08-22 Thread Joseph Qi
Hi Mark,

On 17/8/23 04:49, Mark Fasheh wrote:
> On Tue, Aug 8, 2017 at 5:56 AM, Changwei Ge  wrote:
 It will improve the reliability a lot.
>>> Can you detail your testing? Code-wise this looks fine to me but as
>>> you note, this is a pretty hard to hit corner case so it'd be nice to
>>> hear that you were able to exercise it.
>>>
>>> Thanks,
>>>--Mark
>> Hi Mark,
>>
>> My test is quite simple to perform.
>> Test environment includes 7 hosts. Ethernet devices in 6 of them are
>> down and then up repetitively.
>> After several rounds of up and down. Some file operation hangs.
>>
>> Through debugfs.ocfs2 tool involved in NODE 2 which was the owner of
>> lock resource 'O111503',
>> it told that:
>>
>> debugfs: dlm_locks O111503
>> Lockres: O111503   Owner: 2State: 0x0
>> Last Used: 0  ASTs Reserved: 0Inflight: 0Migration Pending: No
>> Refs: 4Locks: 2On Lists: None
>> Reference Map: 3
>>  Lock-Queue  Node  Level  Conv  Cookie   Refs  AST  BAST
>> Pending-Action
>>  Granted 2 PR -12:53 2 No   NoNone
>>  Granted 3 PR -13:48 2 No   NoNone
>>
>> That meant NODE 2 had granted NODE 3 and the AST had been transited to
>> NODE 3.
>>
>> Meanwhile, through debugfs.ocfs2 tool involved in NODE 3,
>> it told that:
>> debugfs: dlm_locks O111503
>> Lockres: O111503   Owner: 2State: 0x0
>> Last Used: 0  ASTs Reserved: 0Inflight: 0Migration Pending: No
>> Refs: 3Locks: 1On Lists: None
>> Reference Map:
>>  Lock-Queue  Node  Level  Conv  Cookie   Refs  AST  BAST
>> Pending-Action
>>  Blocked 3 PR -13:48 2 No   NoNone
>>
>> That meant NODE 3 didn't ever receive any AST to move local lock from
>> blocked list to grant list.
>>
>> This consequence  makes sense, since AST sending is failed which can be
>> seen in kernel log.
>>
>> As for BAST, it is more or less the same.
>>
>> Thanks
>> Changwei
> 
> 
> Thanks for the testing details. I think you got Andrew's e-mail wrong
> so I'm CC'ing him now. It might be a good idea to re-send the patch
> with the right CC's - add some of your testing details to the log.

IMO, network error occurs cannot make sure that target node hasn't
received the message. A complete message round includes:
1. sending to the target node;
2. get response from the target node.

So if network error happens on phase 2, re-queue the message will
cause ast/bast to be sent twice. I'm afraid this cannot be handled
currently.

If I understand wrong, please point out.

Thanks,
Joseph

> You're free to use my
> 
> Reviewed-by: Mark Fasheh 
> 
> as well.
> 
> Thanks again,
>--Mark
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: re-queue AST or BAST if sending is failed to improve the reliability

2017-08-10 Thread Joseph Qi
Hi Changwei,

On 17/8/9 23:24, ge changwei wrote:
> Hi
> 
> 
> On 2017/8/9 下午7:32, Joseph Qi wrote:
>> Hi,
>>
>> On 17/8/7 15:13, Changwei Ge wrote:
>>> Hi,
>>>
>>> In current code, while flushing AST, we don't handle an exception that
>>> sending AST or BAST is failed.
>>> But it is indeed possible that AST or BAST is lost due to some kind of
>>> networks fault.
>>>
>> Could you please describe this issue more clearly? It is better analyze
>> issue along with the error message and the status of related nodes.
>> IMO, if network is down, one of the two nodes will be fenced. So what's
>> your case here?
>>
>> Thanks,
>> Joseph
> 
> I have posted the status of related lock resource in my preceding email. 
> Please check them out.
> 
> Moreover, network is not down forever even not longer than threshold  to 
> be fenced.
> So no node will be fenced.
> 
> This issue happens in terrible network environment. Some messages may be 
> abandoned by switch due to various conditions.
> And even frequent and fast link up and down will also cause this issue.
> 
> In a nutshell,  re-queuing AST and BAST is crucial when link between 
> nodes recover quickly. It prevents cluster from hanging.
>So you mean the tcp packet is lost due to connection reset? IIRC,
Junxiao has posted a patchset to fix this issue.
If you are using the way of re-queuing, how to make sure the original
message is *truly* lost and the same ast/bast won't be sent twice?

Thanks,
Joseph
 
> Thanks,
> Changwei
>>> If above exception happens, the requesting node will never obtain an AST
>>> back, hence, it will never acquire the lock or abort current locking.
>>>
>>> With this patch, I'd like to fix this issue by re-queuing the AST or
>>> BAST if sending is failed due to networks fault.
>>>
>>> And the re-queuing AST or BAST will be dropped if the requesting node is
>>> dead!
>>>
>>> It will improve the reliability a lot.
>>>
>>>
>>> Thanks.
>>>
>>> Changwei.
>> ___
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: re-queue AST or BAST if sending is failed to improve the reliability

2017-08-09 Thread Joseph Qi
Hi,

On 17/8/7 15:13, Changwei Ge wrote:
> Hi,
> 
> In current code, while flushing AST, we don't handle an exception that
> sending AST or BAST is failed.
> But it is indeed possible that AST or BAST is lost due to some kind of
> networks fault.
> 
Could you please describe this issue more clearly? It is better analyze
issue along with the error message and the status of related nodes.
IMO, if network is down, one of the two nodes will be fenced. So what's
your case here?

Thanks,
Joseph

> If above exception happens, the requesting node will never obtain an AST
> back, hence, it will never acquire the lock or abort current locking.
> 
> With this patch, I'd like to fix this issue by re-queuing the AST or
> BAST if sending is failed due to networks fault.
> 
> And the re-queuing AST or BAST will be dropped if the requesting node is
> dead!
> 
> It will improve the reliability a lot.
> 
> 
> Thanks.
> 
> Changwei.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: free 'dummy_sc' in sc_fop_release() in case of memory leak

2017-06-26 Thread Joseph Qi


On 17/6/25 20:46, piaojun wrote:
> 'sd->dbg_sock' is malloc in sc_common_open(), but not freed at the end
> of sc_fop_release().
> 
> Signed-off-by: Jun Piao <piao...@huawei.com>
Looks good.
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

Thanks,
Joseph

> ---
>  fs/ocfs2/cluster/netdebug.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ocfs2/cluster/netdebug.c b/fs/ocfs2/cluster/netdebug.c
> index 564c504..74a21f6 100644
> --- a/fs/ocfs2/cluster/netdebug.c
> +++ b/fs/ocfs2/cluster/netdebug.c
> @@ -426,6 +426,7 @@ static int sc_fop_release(struct inode *inode, struct 
> file *file)
>   struct o2net_sock_container *dummy_sc = sd->dbg_sock;
> 
>   o2net_debug_del_sc(dummy_sc);
> + kfree(dummy_sc);
>   return seq_release_private(inode, file);
>  }
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix deadlock caused by recursive locking in xattr

2017-06-22 Thread Joseph Qi
Looks good.
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

Thanks,
Joseph

On 17/6/22 09:47, Eric Ren wrote:
> Another deadlock path caused by recursive locking is reported.
> This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2:
> take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths
> have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when
> taking inode lock at vfs entry points"). Yes, we intend to fix this
> kind of case in incremental way, because it's hard to find out all
> possible paths at once.
> 
> This one can be reproduced like this. On node1, cp a large file from
> home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl.
> Both nodes will hang up there. The backtraces:
> 
> On node1:
> [] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
> [] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
> [] ocfs2_write_begin+0x43/0x1a0 [ocfs2]
> [] generic_perform_write+0xa9/0x180
> [] __generic_file_write_iter+0x1aa/0x1d0
> [] ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
> [] __vfs_write+0xc3/0x130
> [] vfs_write+0xb1/0x1a0
> [] SyS_write+0x46/0xa0
> 
> On node2:
> [] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
> [] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
> [] ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
> [] ocfs2_set_acl+0x22d/0x260 [ocfs2]
> [] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
> [] set_posix_acl+0x75/0xb0
> [] posix_acl_xattr_set+0x49/0xa0
> [] __vfs_setxattr+0x69/0x80
> [] __vfs_setxattr_noperm+0x72/0x1a0
> [] vfs_setxattr+0xa7/0xb0
> [] setxattr+0x12d/0x190
> [] path_setxattr+0x9f/0xb0
> [] SyS_setxattr+0x14/0x20
> 
> Fixes this one by using ocfs2_inode_{lock|unlock}_tracker, which is
> exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking
> logic to avoid recursive cluster lock").
> 
> Changes since v1:
> - Revised git commit description style in commit log.
> 
> Reported-by: Thomas Voegtle <t...@lio96.de>
> Tested-by: Thomas Voegtle <t...@lio96.de>
> Signed-off-by: Eric Ren <z...@suse.com>
> ---
>  fs/ocfs2/dlmglue.c |  4 
>  fs/ocfs2/xattr.c   | 23 +--
>  2 files changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 3b7c937..4689940 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -2591,6 +2591,10 @@ void ocfs2_inode_unlock_tracker(struct inode *inode,
>   struct ocfs2_lock_res *lockres;
>  
>   lockres = _I(inode)->ip_inode_lockres;
> + /* had_lock means that the currect process already takes the cluster
> +  * lock previously. If had_lock is 1, we have nothing to do here, and
> +  * it will get unlocked where we got the lock.
> +  */
>   if (!had_lock) {
>   ocfs2_remove_holder(lockres, oh);
>   ocfs2_inode_unlock(inode, ex);
> diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
> index 3c5384d..f70c377 100644
> --- a/fs/ocfs2/xattr.c
> +++ b/fs/ocfs2/xattr.c
> @@ -1328,20 +1328,21 @@ static int ocfs2_xattr_get(struct inode *inode,
>  void *buffer,
>  size_t buffer_size)
>  {
> - int ret;
> + int ret, had_lock;
>   struct buffer_head *di_bh = NULL;
> + struct ocfs2_lock_holder oh;
>  
> - ret = ocfs2_inode_lock(inode, _bh, 0);
> - if (ret < 0) {
> - mlog_errno(ret);
> - return ret;
> + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, );
> + if (had_lock < 0) {
> + mlog_errno(had_lock);
> + return had_lock;
>   }
>   down_read(_I(inode)->ip_xattr_sem);
>   ret = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
>name, buffer, buffer_size);
>   up_read(_I(inode)->ip_xattr_sem);
>  
> - ocfs2_inode_unlock(inode, 0);
> + ocfs2_inode_unlock_tracker(inode, 0, , had_lock);
>  
>   brelse(di_bh);
>  
> @@ -3537,11 +3538,12 @@ int ocfs2_xattr_set(struct inode *inode,
>  {
>   struct buffer_head *di_bh = NULL;
>   struct ocfs2_dinode *di;
> - int ret, credits, ref_meta = 0, ref_credits = 0;
> + int ret, credits, had_lock, ref_meta = 0, ref_credits = 0;
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>   struct inode *tl_inode = osb->osb_tl_inode;
>   struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, NULL, };
>   struct ocfs2_refcount_tree *ref_tree = NULL;
> + struct ocfs2_lock_holder oh;
>  
>   struct ocfs2_xattr_info xi = {
>   .xi_name_index = name_index,
> @@ -3572,8 +3574,9 @@ int ocfs2_xattr_set(struct 

Re: [Ocfs2-devel] [PATCH] ocfs2: get rid of ocfs2_is_o2cb_active function

2017-05-22 Thread Joseph Qi


On 17/5/22 16:17, Gang He wrote:
> This patch is used to get rid of ocfs2_is_o2cb_active() function,
> Why? First, we had the similar functions to identify which cluster
> stack is being used via osb->osb_cluster_stack. Second, the current
> implementation of ocfs2_is_o2cb_active() function is not total safe,
> base on the design of stackglue, we need to get ocfs2_stack_lock lock
> before using ocfs2_stack related data structures, and that
> active_stack pointer can be NULL in case mount failure.
> 
> Signed-off-by: Gang He <g...@suse.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

Thanks,
Joseph
> ---
>  fs/ocfs2/dlmglue.c   | 2 +-
>  fs/ocfs2/stackglue.c | 6 --
>  fs/ocfs2/stackglue.h | 3 ---
>  3 files changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 3b7c937..a54196a 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -3409,7 +3409,7 @@ static int ocfs2_downconvert_lock(struct ocfs2_super 
> *osb,
>* we can recover correctly from node failure. Otherwise, we may get
>* invalid LVB in LKB, but without DLM_SBF_VALNOTVALID being set.
>*/
> - if (!ocfs2_is_o2cb_active() &&
> + if (ocfs2_userspace_stack(osb) &&
>   lockres->l_ops->flags & LOCK_TYPE_USES_LVB)
>   lvb = 1;
>  
> diff --git a/fs/ocfs2/stackglue.c b/fs/ocfs2/stackglue.c
> index 8203590..52c07346b 100644
> --- a/fs/ocfs2/stackglue.c
> +++ b/fs/ocfs2/stackglue.c
> @@ -48,12 +48,6 @@
>   */
>  static struct ocfs2_stack_plugin *active_stack;
>  
> -inline int ocfs2_is_o2cb_active(void)
> -{
> - return !strcmp(active_stack->sp_name, OCFS2_STACK_PLUGIN_O2CB);
> -}
> -EXPORT_SYMBOL_GPL(ocfs2_is_o2cb_active);
> -
>  static struct ocfs2_stack_plugin *ocfs2_stack_lookup(const char *name)
>  {
>   struct ocfs2_stack_plugin *p;
> diff --git a/fs/ocfs2/stackglue.h b/fs/ocfs2/stackglue.h
> index e3036e1..f2dce10 100644
> --- a/fs/ocfs2/stackglue.h
> +++ b/fs/ocfs2/stackglue.h
> @@ -298,9 +298,6 @@ int ocfs2_plock(struct ocfs2_cluster_connection *conn, 
> u64 ino,
>  int ocfs2_stack_glue_register(struct ocfs2_stack_plugin *plugin);
>  void ocfs2_stack_glue_unregister(struct ocfs2_stack_plugin *plugin);
>  
> -/* In ocfs2_downconvert_lock(), we need to know which stack we are using */
> -int ocfs2_is_o2cb_active(void);
> -
>  extern struct kset *ocfs2_kset;
>  
>  #endif  /* STACKGLUE_H */
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix a static checker warning

2017-05-22 Thread Joseph Qi


On 17/5/22 12:54, Gang He wrote:
> This patch will fix a static checker warning, this warning was
> caused by commit d56a8f32e4c662509ce50a37e78fa66c777977d3. after
> apply this patch, the error return value will not be NULL(zero).
> 
> Signed-off-by: Gang He <g...@suse.com>
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>

Thanks,
Joseph
> ---
>  fs/ocfs2/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 382401d..1a1e007 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -136,7 +136,7 @@ struct inode *ocfs2_ilookup(struct super_block *sb, u64 
> blkno)
>  struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>int sysfile_type)
>  {
> - int rc = 0;
> + int rc = -ESTALE;
>   struct inode *inode = NULL;
>   struct super_block *sb = osb->sb;
>   struct ocfs2_find_inode_args args;
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: give an obvious tip for dismatch cluster names

2017-05-18 Thread Joseph Qi
Hi Gang,

As you described, only fsdlm will return this error and fsdlm has
already print the same message. So why should we add it outside again?

Thanks,
Joseph

On 17/5/18 18:43, Gang He wrote:
> Hi Joseph,
> 
> 

>> Hi Gang,
>>
>> How can we confirm EBADR is only because cluster name mismatch?
>> Since the cluster stack may be o2cb(o2dlm) or user(fsdlm).
> I looked through all the code of OCFS2 (include o2cb), there is not any place 
> which returns this error.
> In fact, the function calling patch ocfs2_fill_super -> ocfs2_mount_volume -> 
> ocfs2_dlm_init -> dlm_new_lockspace
> is very specific path, we can use this errorno to give the uses a more clear 
> tip, 
> since this case looks like a little common during cluster migration, but the 
> customer can quickly
> get the failure cause if there is a error printing.
> Also, I think there is not possible to add this errorno in o2cb path during 
> ocfs2_dlm_init, since o2cb code has been stable for 
> a long time.   
> 
> Thanks
> Gang
> 
>>
>> Thanks,
>> Joseph
>>
>> On 17/5/18 14:35, Gang He wrote:
>>> This patch is used to add an obvious error message, due to
>>> dismatch cluster names between on-disk and in the current cluster.
>>> We can meet this case during OCFS2 cluster migration, if we can
>>> give the user an obvious tip for why they can not mount the file
>>> system after migration, they can quickly fix this dismatch problem.
>>> Second, also move printing ocfs2_fill_super() errno to the front
>>> of ocfs2_dismount_volume() function, since ocfs2_dismount_volume()
>>> will also print it's own message.
>>>
>>> Signed-off-by: Gang He 
>>> ---
>>>  fs/ocfs2/super.c | 8 ++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>>> index ca1646f..5575918 100644
>>> --- a/fs/ocfs2/super.c
>>> +++ b/fs/ocfs2/super.c
>>> @@ -1208,14 +1208,15 @@ static int ocfs2_fill_super(struct super_block *sb, 
>> void *data, int silent)
>>>  read_super_error:
>>> brelse(bh);
>>>  
>>> +   if (status)
>>> +   mlog_errno(status);
>>> +
>>> if (osb) {
>>> atomic_set(>vol_state, VOLUME_DISABLED);
>>> wake_up(>osb_mount_event);
>>> ocfs2_dismount_volume(sb, 1);
>>> }
>>>  
>>> -   if (status)
>>> -   mlog_errno(status);
>>> return status;
>>>  }
>>>  
>>> @@ -1843,6 +1844,9 @@ static int ocfs2_mount_volume(struct super_block *sb)
>>> status = ocfs2_dlm_init(osb);
>>> if (status < 0) {
>>> mlog_errno(status);
>>> +   if (status == -EBADR)
>>> +   mlog(ML_ERROR, "couldn't mount because cluster name on"
>>> +   " disk does not match the running cluster name.\n");
>>> goto leave;
>>> }
>>>  
>>>
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: give an obvious tip for dismatch cluster names

2017-05-18 Thread Joseph Qi
Hi Gang,

How can we confirm EBADR is only because cluster name mismatch?
Since the cluster stack may be o2cb(o2dlm) or user(fsdlm).

Thanks,
Joseph

On 17/5/18 14:35, Gang He wrote:
> This patch is used to add an obvious error message, due to
> dismatch cluster names between on-disk and in the current cluster.
> We can meet this case during OCFS2 cluster migration, if we can
> give the user an obvious tip for why they can not mount the file
> system after migration, they can quickly fix this dismatch problem.
> Second, also move printing ocfs2_fill_super() errno to the front
> of ocfs2_dismount_volume() function, since ocfs2_dismount_volume()
> will also print it's own message.
> 
> Signed-off-by: Gang He 
> ---
>  fs/ocfs2/super.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index ca1646f..5575918 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -1208,14 +1208,15 @@ static int ocfs2_fill_super(struct super_block *sb, 
> void *data, int silent)
>  read_super_error:
>   brelse(bh);
>  
> + if (status)
> + mlog_errno(status);
> +
>   if (osb) {
>   atomic_set(>vol_state, VOLUME_DISABLED);
>   wake_up(>osb_mount_event);
>   ocfs2_dismount_volume(sb, 1);
>   }
>  
> - if (status)
> - mlog_errno(status);
>   return status;
>  }
>  
> @@ -1843,6 +1844,9 @@ static int ocfs2_mount_volume(struct super_block *sb)
>   status = ocfs2_dlm_init(osb);
>   if (status < 0) {
>   mlog_errno(status);
> + if (status == -EBADR)
> + mlog(ML_ERROR, "couldn't mount because cluster name on"
> + " disk does not match the running cluster name.\n");
>   goto leave;
>   }
>  
> 

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: o2hb: revert hb threshold to keep compatible

2017-03-28 Thread Joseph Qi


On 17/3/29 09:07, Junxiao Bi wrote:
> On 03/29/2017 06:31 AM, Andrew Morton wrote:
>> On Tue, 28 Mar 2017 09:40:45 +0800 Junxiao Bi  wrote:
>>
>>> Configfs is the interface for ocfs2-tools to set configure to
>>> kernel. Change heartbeat dead threshold name in configfs will
>>> cause compatible issue, so revert it.
>>>
>>> Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store 
>>> methods")
>> I don't get it.  45b997737a80 was merged nearly two years ago, so isn't
>> it a bit late to fix compatibility issues?
>>
> This compatibility will not cause ocfs2 down, just some configure (hb
> dead threshold) lose effect. If someone want to use the new kernel, they
> should apply this fix.
The threshold configuration file has default value in kernel, so it will
only affect changing this value in user space.

Thanks,
Joseph
>
> Thanks,
> Junxiao.
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: o2hb: revert hb threshold to keep compatible

2017-03-27 Thread Joseph Qi
Acked-by: Joseph Qi <jiangqi...@gmail.com>

On 17/3/28 09:40, Junxiao Bi wrote:
> Configfs is the interface for ocfs2-tools to set configure to
> kernel. Change heartbeat dead threshold name in configfs will
> cause compatible issue, so revert it.
>
> Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store 
> methods")
> Signed-off-by: Junxiao Bi <junxiao...@oracle.com>
> ---
>   fs/ocfs2/cluster/heartbeat.c |8 
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index f6e871760f8d..0da0332725aa 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -2242,13 +2242,13 @@ static void o2hb_heartbeat_group_drop_item(struct 
> config_group *group,
>   spin_unlock(_live_lock);
>   }
>   
> -static ssize_t o2hb_heartbeat_group_threshold_show(struct config_item *item,
> +static ssize_t o2hb_heartbeat_group_dead_threshold_show(struct config_item 
> *item,
>   char *page)
>   {
>   return sprintf(page, "%u\n", o2hb_dead_threshold);
>   }
>   
> -static ssize_t o2hb_heartbeat_group_threshold_store(struct config_item *item,
> +static ssize_t o2hb_heartbeat_group_dead_threshold_store(struct config_item 
> *item,
>   const char *page, size_t count)
>   {
>   unsigned long tmp;
> @@ -2297,11 +2297,11 @@ static ssize_t o2hb_heartbeat_group_mode_store(struct 
> config_item *item,
>   
>   }
>   
> -CONFIGFS_ATTR(o2hb_heartbeat_group_, threshold);
> +CONFIGFS_ATTR(o2hb_heartbeat_group_, dead_threshold);
>   CONFIGFS_ATTR(o2hb_heartbeat_group_, mode);
>   
>   static struct configfs_attribute *o2hb_heartbeat_group_attrs[] = {
> - _heartbeat_group_attr_threshold,
> + _heartbeat_group_attr_dead_threshold,
>   _heartbeat_group_attr_mode,
>   NULL,
>   };


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-17 Thread Joseph Qi
On 17/1/17 15:55, Eric Ren wrote:
> Hi!
>
> On 01/17/2017 03:39 PM, Joseph Qi wrote:
>>
>> On 17/1/17 14:30, Eric Ren wrote:
>>> We are in the situation that we have to avoid recursive cluster 
>>> locking,
>>> but there is no way to check if a cluster lock has been taken by a
>>> precess already.
>>>
>>> Mostly, we can avoid recursive locking by writing code carefully.
>>> However, we found that it's very hard to handle the routines that
>>> are invoked directly by vfs code. For instance:
>>>
>>> const struct inode_operations ocfs2_file_iops = {
>>>  .permission = ocfs2_permission,
>>>  .get_acl= ocfs2_iop_get_acl,
>>>  .set_acl= ocfs2_iop_set_acl,
>>> };
>>>
>>> Both ocfs2_permission() and ocfs2_iop_get_acl() call 
>>> ocfs2_inode_lock(PR):
>>> do_sys_open
>>>   may_open
>>>inode_permission
>>> ocfs2_permission
>>>  ocfs2_inode_lock() <=== first time
>>>   generic_permission
>>>get_acl
>>> ocfs2_iop_get_acl
>>> ocfs2_inode_lock() <=== recursive one
>>>
>>> A deadlock will occur if a remote EX request comes in between two
>>> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>>>
>>> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
>>> BAST(ocfs2_generic_handle_bast) when downconvert is started
>>> on behalf of the remote EX lock request. Another hand, the recursive
>>> cluster lock (the second one) will be blocked in in 
>>> __ocfs2_cluster_lock()
>>> because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, 
>>> why?
>>> because there is no chance for the first cluster lock on this node 
>>> to be
>>> unlocked - we block ourselves in the code path.
>>>
>>> The idea to fix this issue is mostly taken from gfs2 code.
>>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>>> keep track of the processes' pid  who has taken the cluster lock
>>> of this lock resource;
>>> 2. introduce a new flag for ocfs2_inode_lock_full: 
>>> OCFS2_META_LOCK_GETBH;
>>> it means just getting back disk inode bh for us if we've got cluster 
>>> lock.
>>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>>> have got the cluster lock in the upper code path.
>>>
>>> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
>>> to solve the recursive locking issue cuased by the fact that vfs 
>>> routines
>>> can call into each other.
>>>
>>> The performance penalty of processing the holder list should only be 
>>> seen
>>> at a few cases where the tracking logic is used, such as get/set acl.
>>>
>>> You may ask what if the first time we got a PR lock, and the second 
>>> time
>>> we want a EX lock? fortunately, this case never happens in the real 
>>> world,
>>> as far as I can see, including permission check, 
>>> (get|set)_(acl|attr), and
>>> the gfs2 code also do so.
>>>
>>> Changes since v1:
>>> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
>>> process gets the cluster lock - suggested by: Joseph Qi 
>>> <jiangqi...@gmail.com>
>>> and Junxiao Bi <junxiao...@oracle.com>.
>>>
>>> - Change "struct ocfs2_holder" to a more meaningful name 
>>> "ocfs2_lock_holder",
>>> suggested by: Junxiao Bi.
>>>
>>> - Do not inline functions whose bodies are not in scope, changed by:
>>> Stephen Rothwell <s...@canb.auug.org.au>.
>>>
>>> Changes since v2:
>>> - Wrap the tracking logic code of recursive locking into functions,
>>> ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
>>> suggested by: Junxiao Bi.
>>>
>>> [s...@canb.auug.org.au remove some inlines]
>>> Signed-off-by: Eric Ren <z...@suse.com>
>>> ---
>>>   fs/ocfs2/dlmglue.c | 105 
>>> +++--
>>>   fs/ocfs2/dlmglue.h |  18 +
>>>   fs/ocfs2/ocfs2.h   |   1 +
>>>   3 files changed, 121 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 77d1632..c75b9e9 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -532,6 +532,

Re: [Ocfs2-devel] [PATCH v3 2/2] ocfs2: fix deadlock issue when taking inode lock at vfs entry points

2017-01-16 Thread Joseph Qi

On 17/1/17 14:30, Eric Ren wrote:
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in a deadlock, as the author "Tariq Saeed" realized shortly
> after the patch was merged. The discussion happened here
> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>
> The reason why taking cluster inode lock at vfs entry points opens up
> a self deadlock window, is explained in the previous patch of this
> series.
>
> So far, we have seen two different code paths that have this issue.
> 1. do_sys_open
>   may_open
>inode_permission
> ocfs2_permission
>  ocfs2_inode_lock() <=== take PR
>   generic_permission
>get_acl
> ocfs2_iop_get_acl
>  ocfs2_inode_lock() <=== take PR
> 2. fchmod|fchmodat
>  chmod_common
>   notify_change
>ocfs2_setattr <=== take EX
> posix_acl_chmod
>  get_acl
>   ocfs2_iop_get_acl <=== take PR
>  ocfs2_iop_set_acl <=== take EX
>
> Fixes them by adding the tracking logic (in the previous patch) for
> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
> ocfs2_setattr().
>
> Changes since v1:
> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
> and Junxiao Bi <junxiao...@oracle.com>.
>
> - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
> suggested by: Junxiao Bi.
>
> - Add debugging output at ocfs2_setattr() and ocfs2_permission() to
> catch exceptional cases, suggested by: Junxiao Bi.
>
> Changes since v2:
> - Use new wrappers of tracking logic code, suggested by: Junxiao Bi.
>
> Signed-off-by: Eric Ren <z...@suse.com>
Looks good.
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>
> ---
>   fs/ocfs2/acl.c  | 29 +
>   fs/ocfs2/file.c | 58 
> -
>   2 files changed, 58 insertions(+), 29 deletions(-)
>
> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
> index bed1fcb..dc22ba8 100644
> --- a/fs/ocfs2/acl.c
> +++ b/fs/ocfs2/acl.c
> @@ -283,16 +283,14 @@ int ocfs2_set_acl(handle_t *handle,
>   int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>   {
>   struct buffer_head *bh = NULL;
> - int status = 0;
> + int status, had_lock;
> + struct ocfs2_lock_holder oh;
>   
> - status = ocfs2_inode_lock(inode, , 1);
> - if (status < 0) {
> - if (status != -ENOENT)
> - mlog_errno(status);
> - return status;
> - }
> + had_lock = ocfs2_inode_lock_tracker(inode, , 1, );
> + if (had_lock < 0)
> + return had_lock;
>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
> - ocfs2_inode_unlock(inode, 1);
> + ocfs2_inode_unlock_tracker(inode, 1, , had_lock);
>   brelse(bh);
>   return status;
>   }
> @@ -302,21 +300,20 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
> *inode, int type)
>   struct ocfs2_super *osb;
>   struct buffer_head *di_bh = NULL;
>   struct posix_acl *acl;
> - int ret;
> + int had_lock;
> + struct ocfs2_lock_holder oh;
>   
>   osb = OCFS2_SB(inode->i_sb);
>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>   return NULL;
> - ret = ocfs2_inode_lock(inode, _bh, 0);
> - if (ret < 0) {
> - if (ret != -ENOENT)
> - mlog_errno(ret);
> - return ERR_PTR(ret);
> - }
> +
> + had_lock = ocfs2_inode_lock_tracker(inode, _bh, 0, );
> + if (had_lock < 0)
> + return ERR_PTR(had_lock);
>   
>   acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>   
> - ocfs2_inode_unlock(inode, 0);
> + ocfs2_inode_unlock_tracker(inode, 0, , had_lock);
>   brelse(di_bh);
>   return acl;
>   }
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index c488965..7b6a146 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1138,6 +1138,8 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   handle_t *handle = NULL;
>   struct dquot *transfer_to[MAXQUOTAS] = { };
>   int qtype;
> + int had_lock;
> + struct ocfs2_lock_holder oh;
>   
>   trace_ocfs2_setattr(inode, dentry,
>   (unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -1173,11 +1175,30 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> 

Re: [Ocfs2-devel] [PATCH v3 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-16 Thread Joseph Qi

On 17/1/17 14:30, Eric Ren wrote:
> We are in the situation that we have to avoid recursive cluster locking,
> but there is no way to check if a cluster lock has been taken by a
> precess already.
>
> Mostly, we can avoid recursive locking by writing code carefully.
> However, we found that it's very hard to handle the routines that
> are invoked directly by vfs code. For instance:
>
> const struct inode_operations ocfs2_file_iops = {
>  .permission = ocfs2_permission,
>  .get_acl= ocfs2_iop_get_acl,
>  .set_acl= ocfs2_iop_set_acl,
> };
>
> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
> do_sys_open
>   may_open
>inode_permission
> ocfs2_permission
>  ocfs2_inode_lock() <=== first time
>   generic_permission
>get_acl
> ocfs2_iop_get_acl
>   ocfs2_inode_lock() <=== recursive one
>
> A deadlock will occur if a remote EX request comes in between two
> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>
> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
> BAST(ocfs2_generic_handle_bast) when downconvert is started
> on behalf of the remote EX lock request. Another hand, the recursive
> cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
> because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
> because there is no chance for the first cluster lock on this node to be
> unlocked - we block ourselves in the code path.
>
> The idea to fix this issue is mostly taken from gfs2 code.
> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
> keep track of the processes' pid  who has taken the cluster lock
> of this lock resource;
> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
> it means just getting back disk inode bh for us if we've got cluster lock.
> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
> have got the cluster lock in the upper code path.
>
> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
> to solve the recursive locking issue cuased by the fact that vfs routines
> can call into each other.
>
> The performance penalty of processing the holder list should only be seen
> at a few cases where the tracking logic is used, such as get/set acl.
>
> You may ask what if the first time we got a PR lock, and the second time
> we want a EX lock? fortunately, this case never happens in the real world,
> as far as I can see, including permission check, (get|set)_(acl|attr), and
> the gfs2 code also do so.
>
> Changes since v1:
> - Let ocfs2_is_locked_by_me() just return true/false to indicate if the
> process gets the cluster lock - suggested by: Joseph Qi <jiangqi...@gmail.com>
> and Junxiao Bi <junxiao...@oracle.com>.
>
> - Change "struct ocfs2_holder" to a more meaningful name "ocfs2_lock_holder",
> suggested by: Junxiao Bi.
>
> - Do not inline functions whose bodies are not in scope, changed by:
> Stephen Rothwell <s...@canb.auug.org.au>.
>
> Changes since v2:
> - Wrap the tracking logic code of recursive locking into functions,
> ocfs2_inode_lock_tracker() and ocfs2_inode_unlock_tracker(),
> suggested by: Junxiao Bi.
>
> [s...@canb.auug.org.au remove some inlines]
> Signed-off-by: Eric Ren <z...@suse.com>
> ---
>   fs/ocfs2/dlmglue.c | 105 
> +++--
>   fs/ocfs2/dlmglue.h |  18 +
>   fs/ocfs2/ocfs2.h   |   1 +
>   3 files changed, 121 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 77d1632..c75b9e9 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>   init_waitqueue_head(>l_event);
>   INIT_LIST_HEAD(>l_blocked_list);
>   INIT_LIST_HEAD(>l_mask_waiters);
> + INIT_LIST_HEAD(>l_holders);
>   }
>   
>   void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
> @@ -749,6 +750,50 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>   res->l_flags = 0UL;
>   }
>   
> +/*
> + * Keep a list of processes who have interest in a lockres.
> + * Note: this is now only uesed for check recursive cluster locking.
> + */
> +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
> +struct ocfs2_lock_holder *oh)
> +{
> + INIT_LIST_HEAD(>oh_list);
> + oh->oh_owner_pid =  get_pid(task_pid(current));
Trim the redundant space here.
Others look good to me.
Reviewed-by: Joseph Qi <jiangqi...@gmail.com>
> +
> + sp

Re: [Ocfs2-devel] [PATCH v3] ocfs2/journal: fix umount hang after flushing journal failure

2017-01-15 Thread Joseph Qi


On 17/1/13 20:37, Eric Ren wrote:
> On 01/13/2017 10:52 AM, Changwei Ge wrote:
>> Hi Joseph,
>>
>> Do you think my last version of patch to fix umount hang after journal
>> flushing failure is OK?
>>
>> If so, I 'd like to ask Andrew's help to merge this patch into his test
>> tree.
>>
>>
>> Thanks,
>>
>> Br.
>>
>> Changwei
>
> The message above should not occur in a formal patch.  It should be 
> put in "cover-letter" if
> you want to say something to the other developers. See "git 
> format-patch --cover-letter".
>
>>
>>
>>
>>  From 686b52ee2f06395c53e36e2c7515c276dc7541fb Mon Sep 17 00:00:00 2001
>> From: Changwei Ge 
>> Date: Wed, 11 Jan 2017 09:05:35 +0800
>> Subject: [PATCH] fix umount hang after journal flushing failure
>
> The commit message is needed here! It should describe what's your 
> problem, how to reproduce it,
> and what's your solution, things like that.
>
>>
>> Signed-off-by: Changwei Ge 
>> ---
>>   fs/ocfs2/journal.c |   18 ++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>> index a244f14..5f3c862 100644
>> --- a/fs/ocfs2/journal.c
>> +++ b/fs/ocfs2/journal.c
>> @@ -2315,6 +2315,24 @@ static int ocfs2_commit_thread(void *arg)
>>   "commit_thread: %u transactions pending 
>> on "
>>   "shutdown\n",
>> atomic_read(>j_num_trans));
>> +
>> +   if (status < 0) {
>> +   mlog(ML_ERROR, "journal is already abort
>> and cannot be "
>> +"flushed any more. So ignore
>> the pending "
>> +"transactions to avoid blocking
>> ocfs2 unmount.\n");
>
> Can you find any example in the kernel source to print out message 
> like that?!
>
> I saw Joseph showed you the right way in previous email:
> "
>
> if (status < 0) {
>
>  mlog(ML_ERROR, "journal is already abort and cannot be "
>
>  "flushed any more. So ignore the pending "
>
>  "transactions to avoid blocking ocfs2 unmount.\n");
>
> "
> So, please be careful and learn from the kernel source and the right 
> way other developers do in
> their patch work. Otherwise, it's meaningless to waste others' time in 
> such basic issues.
>
>> +   /*
>> +* This may a litte hacky, however, no
>> chance
>> +* for ocfs2/journal to decrease this
>> variable
>> +* thourgh commit-thread. I have to 
>> do so to
>> +* avoid umount hang after journal 
>> flushing
>> +* failure. Since jounral has been
>> marked ABORT
>> +* within jbd2_journal_flush, commit
>> cache will
>> +* never do any real work to flush
>> journal to
>> +* disk.Set it to ZERO so that umount 
>> will
>> +* continue during shutting down journal
>> +*/
>> + atomic_set(>j_num_trans, 0);
> It's possible to corrupt data doing this way. Why not just crash the 
> kernel when jbd2 aborts?
> and let the other node to do the journal recovery. It's the strength 
> of cluster filesystem.
We shouldn't crash kernel directly, which will enlarge the impact of the
issue. For example, we have mount multiple volumes and only one has this
error occurred.
But I do agree with you that we have to let other nodes know the
abnormal exit and do the recovery, which can ensure the data
consistency.

Thanks,
Joseph
>
> Anyway, it's glad to see you guys making contributions!
>
> Thanks,
> Eric
>
>
>> +   }
>>  }
>>  }
>>
>> -- 
>> 1.7.9.5
>>
>> -
>>  
>>
>> 本邮件及其附件含有杭州华三通信技术有限公司的保密信息,仅限于发送给上面地址中列出 
>>
>> 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、 
>>
>> 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本 
>>
>> 邮件!
>> This e-mail and its attachments contain confidential information from 
>> H3C, which is
>> intended only for the person or entity whose address is listed above. 
>> Any use of the
>> information contained herein in any way (including, but not limited 
>> to, total or partial
>> disclosure, reproduction, or dissemination) by persons other than the 
>> intended
>> recipient(s) is prohibited. If you receive this e-mail in error, 
>> please notify the sender
>> by phone or email immediately and delete it!
>> ___
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-06 Thread Joseph Qi
On 17/1/6 17:13, Eric Ren wrote:
> Hi,
>
>>>
>>> Fixes them by adding the tracking logic (in the previous patch) for
>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>> ocfs2_setattr().
>> As described cases above, shall we just add the tracking logic 
>> only for set/get_acl()?
>
> The idea is to detect recursive locking on the running task stack. 
> Take case 1) for example if ocfs2_permisssion()
> is not changed:
>
> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>ocfs2_iop_get_acl <=== still take PR, because there is no lock 
> holder on the tracking list
 I mean we have no need to check if locked by me, just do inode lock 
 and add holder.
 This will make code more clean, IMO.
>>> Oh, sorry, I get your point this time. I think we need to check it 
>>> if there are more than one processes that hold
>>> PR lock on the same resource.  If I don't understand you correctly, 
>>> please tell me why you think it's not neccessary
>>> to check before getting lock?
>> The code logic can only check if it is locked by myself. In the case
> Why only...?
>> described above, ocfs2_permission is the first entry to take inode lock.
>> And even if check succeeds, it is a bug without unlock, but not the case
>> of recursive lock.
>
> By checking succeeds, you mean it's locked by me, right? If so, this flag
>   "arg_flags = OCFS2_META_LOCK_GETBH"
> will be passed down to ocfs2_inode_lock_full(), which gets back buffer 
> head of
> the disk inode for us if necessary, but doesn't take cluster locking 
> again. So, there is
> no need to unlock in such case.
I am trying to state my point more clearly...
The issue case you are trying to fix is:
Process A
take inode lock (phase1)
...
<<< race window (phase2, Process B)
...
take inode lock again (phase3)

Deadlock happens because Process B in phase2 and Process A in phase3
are waiting for each other.
So you are trying to fix it by making phase3 finish without really doing
__ocfs2_cluster_lock, then Process B can continue either.
Let us bear in mind that phase1 and phase3 are in the same context and
executed in order. That's why I think there is no need to check if locked
by myself in phase1.
If phase1 finds it is already locked by myself, that means the holder
is left by last operation without dec holder. That's why I think it is a bug
instead of a recursive lock case.

Thanks,
Joseph
>
> Thanks,
> Eric
>
>>
>> Thanks,
>> Joseph
>>>
>>> Thanks,
>>> Eric

 Thanks,
 Joseph
>
> Thanks for your review;-)
> Eric
>
>>
>> Thanks,
>> Joseph
>>>
>>> Signed-off-by: Eric Ren 
>>> ---
>>>   fs/ocfs2/acl.c  | 39 ++-
>>>   fs/ocfs2/file.c | 44 ++--
>>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>> index bed1fcb..c539890 100644
>>> --- a/fs/ocfs2/acl.c
>>> +++ b/fs/ocfs2/acl.c
>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, 
>>> struct posix_acl *acl, int type)
>>>   {
>>>   struct buffer_head *bh = NULL;
>>>   int status = 0;
>>> -
>>> -status = ocfs2_inode_lock(inode, , 1);
>>> +int arg_flags = 0, has_locked;
>>> +struct ocfs2_holder oh;
>>> +struct ocfs2_lock_res *lockres;
>>> +
>>> +lockres = _I(inode)->ip_inode_lockres;
>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>> +if (has_locked)
>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>> +status = ocfs2_inode_lock_full(inode, , 1, arg_flags);
>>>   if (status < 0) {
>>>   if (status != -ENOENT)
>>>   mlog_errno(status);
>>>   return status;
>>>   }
>>> +if (!has_locked)
>>> +ocfs2_add_holder(lockres, );
>>> +
>>>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, 
>>> NULL);
>>> -ocfs2_inode_unlock(inode, 1);
>>> +
>>> +if (!has_locked) {
>>> +ocfs2_remove_holder(lockres, );
>>> +ocfs2_inode_unlock(inode, 1);
>>> +}
>>>   brelse(bh);
>>> +
>>>   return status;
>>>   }
>>>   @@ -303,21 +318,35 @@ struct posix_acl 
>>> *ocfs2_iop_get_acl(struct inode *inode, int type)
>>>   struct buffer_head *di_bh = NULL;
>>>   struct posix_acl *acl;
>>>   int ret;
>>> +int arg_flags = 0, has_locked;
>>> +struct ocfs2_holder oh;
>>> +struct ocfs2_lock_res *lockres;
>>> osb = OCFS2_SB(inode->i_sb);
>>>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>>>   return NULL;
>>> -ret = ocfs2_inode_lock(inode, _bh, 0);
>>> +
>>> +lockres = _I(inode)->ip_inode_lockres;
>>> +

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-06 Thread Joseph Qi


On 17/1/6 16:21, Eric Ren wrote:
> On 01/06/2017 03:14 PM, Joseph Qi wrote:
>>
>>
>> On 17/1/6 14:56, Eric Ren wrote:
>>> On 01/06/2017 02:09 PM, Joseph Qi wrote:
>>>> Hi Eric,
>>>>
>>>>
>>>> On 17/1/5 23:31, Eric Ren wrote:
>>>>> Commit 743b5f1434f5 ("ocfs2: take inode lock in 
>>>>> ocfs2_iop_set/get_acl()")
>>>>> results in a deadlock, as the author "Tariq Saeed" realized shortly
>>>>> after the patch was merged. The discussion happened here
>>>>> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>>>>>  
>>>>>
>>>>>
>>>>> The reason why taking cluster inode lock at vfs entry points opens up
>>>>> a self deadlock window, is explained in the previous patch of this
>>>>> series.
>>>>>
>>>>> So far, we have seen two different code paths that have this issue.
>>>>> 1. do_sys_open
>>>>>   may_open
>>>>>inode_permission
>>>>> ocfs2_permission
>>>>>  ocfs2_inode_lock() <=== take PR
>>>>>   generic_permission
>>>>>get_acl
>>>>> ocfs2_iop_get_acl
>>>>>  ocfs2_inode_lock() <=== take PR
>>>>> 2. fchmod|fchmodat
>>>>>  chmod_common
>>>>>   notify_change
>>>>>ocfs2_setattr <=== take EX
>>>>> posix_acl_chmod
>>>>>  get_acl
>>>>>   ocfs2_iop_get_acl <=== take PR
>>>>>  ocfs2_iop_set_acl <=== take EX
>>>>>
>>>>> Fixes them by adding the tracking logic (in the previous patch) for
>>>>> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
>>>>> ocfs2_setattr().
>>>> As described cases above, shall we just add the tracking logic only 
>>>> for set/get_acl()?
>>>
>>> The idea is to detect recursive locking on the running task stack. 
>>> Take case 1) for example if ocfs2_permisssion()
>>> is not changed:
>>>
>>> ocfs2_permission() <=== take PR, ocfs2_holder is not added
>>>ocfs2_iop_get_acl <=== still take PR, because there is no lock 
>>> holder on the tracking list
>> I mean we have no need to check if locked by me, just do inode lock 
>> and add holder.
>> This will make code more clean, IMO.
> Oh, sorry, I get your point this time. I think we need to check it if 
> there are more than one processes that hold
> PR lock on the same resource.  If I don't understand you correctly, 
> please tell me why you think it's not neccessary
> to check before getting lock?
The code logic can only check if it is locked by myself. In the case
described above, ocfs2_permission is the first entry to take inode lock.
And even if check succeeds, it is a bug without unlock, but not the case
of recursive lock.

Thanks,
Joseph
>
> Thanks,
> Eric
>>
>> Thanks,
>> Joseph
>>>
>>> Thanks for your review;-)
>>> Eric
>>>
>>>>
>>>> Thanks,
>>>> Joseph
>>>>>
>>>>> Signed-off-by: Eric Ren <z...@suse.com>
>>>>> ---
>>>>>   fs/ocfs2/acl.c  | 39 ++-
>>>>>   fs/ocfs2/file.c | 44 ++--
>>>>>   2 files changed, 68 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
>>>>> index bed1fcb..c539890 100644
>>>>> --- a/fs/ocfs2/acl.c
>>>>> +++ b/fs/ocfs2/acl.c
>>>>> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, 
>>>>> struct posix_acl *acl, int type)
>>>>>   {
>>>>>   struct buffer_head *bh = NULL;
>>>>>   int status = 0;
>>>>> -
>>>>> -status = ocfs2_inode_lock(inode, , 1);
>>>>> +int arg_flags = 0, has_locked;
>>>>> +struct ocfs2_holder oh;
>>>>> +struct ocfs2_lock_res *lockres;
>>>>> +
>>>>> +lockres = _I(inode)->ip_inode_lockres;
>>>>> +has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
>>>>> +if (has_locked)
>>>>> +arg_flags = OCFS2_META_LOCK_GETBH;
>>>>> +statu

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-05 Thread Joseph Qi


On 17/1/6 15:03, Eric Ren wrote:
> On 01/06/2017 02:07 PM, Joseph Qi wrote:
>> Hi Eric,
>>
>>
>> On 17/1/5 23:31, Eric Ren wrote:
>>> We are in the situation that we have to avoid recursive cluster 
>>> locking,
>>> but there is no way to check if a cluster lock has been taken by a
>>> precess already.
>>>
>>> Mostly, we can avoid recursive locking by writing code carefully.
>>> However, we found that it's very hard to handle the routines that
>>> are invoked directly by vfs code. For instance:
>>>
>>> const struct inode_operations ocfs2_file_iops = {
>>>  .permission = ocfs2_permission,
>>>  .get_acl= ocfs2_iop_get_acl,
>>>  .set_acl= ocfs2_iop_set_acl,
>>> };
>>>
>>> Both ocfs2_permission() and ocfs2_iop_get_acl() call 
>>> ocfs2_inode_lock(PR):
>>> do_sys_open
>>>   may_open
>>>inode_permission
>>> ocfs2_permission
>>>  ocfs2_inode_lock() <=== first time
>>>   generic_permission
>>>get_acl
>>> ocfs2_iop_get_acl
>>> ocfs2_inode_lock() <=== recursive one
>>>
>>> A deadlock will occur if a remote EX request comes in between two
>>> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>>>
>>> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
>>> BAST(ocfs2_generic_handle_bast) when downconvert is started
>>> on behalf of the remote EX lock request. Another hand, the recursive
>>> cluster lock (the second one) will be blocked in in 
>>> __ocfs2_cluster_lock()
>>> because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, 
>>> why?
>>> because there is no chance for the first cluster lock on this node 
>>> to be
>>> unlocked - we block ourselves in the code path.
>>>
>>> The idea to fix this issue is mostly taken from gfs2 code.
>>> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
>>> keep track of the processes' pid  who has taken the cluster lock
>>> of this lock resource;
>>> 2. introduce a new flag for ocfs2_inode_lock_full: 
>>> OCFS2_META_LOCK_GETBH;
>>> it means just getting back disk inode bh for us if we've got cluster 
>>> lock.
>>> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
>>> have got the cluster lock in the upper code path.
>>>
>>> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
>>> to solve the recursive locking issue cuased by the fact that vfs 
>>> routines
>>> can call into each other.
>>>
>>> The performance penalty of processing the holder list should only be 
>>> seen
>>> at a few cases where the tracking logic is used, such as get/set acl.
>>>
>>> You may ask what if the first time we got a PR lock, and the second 
>>> time
>>> we want a EX lock? fortunately, this case never happens in the real 
>>> world,
>>> as far as I can see, including permission check, 
>>> (get|set)_(acl|attr), and
>>> the gfs2 code also do so.
>>>
>>> Signed-off-by: Eric Ren <z...@suse.com>
>>> ---
>>>   fs/ocfs2/dlmglue.c | 47 
>>> ---
>>>   fs/ocfs2/dlmglue.h | 18 ++
>>>   fs/ocfs2/ocfs2.h   |  1 +
>>>   3 files changed, 63 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 83d576f..500bda4 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct 
>>> ocfs2_lock_res *res)
>>>   init_waitqueue_head(>l_event);
>>>   INIT_LIST_HEAD(>l_blocked_list);
>>>   INIT_LIST_HEAD(>l_mask_waiters);
>>> +INIT_LIST_HEAD(>l_holders);
>>>   }
>>> void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
>>> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res 
>>> *res)
>>>   res->l_flags = 0UL;
>>>   }
>>>   +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
>>> +   struct ocfs2_holder *oh)
>>> +{
>>> +INIT_LIST_HEAD(>oh_list);
>>> +oh->oh_owner_pid =  get_pid(task_pid(current));
>>> +
>>> +spin_lock(>l_lock);
>>> +list_add_tail(>oh_list, >l_h

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix deadlocks when taking inode lock at vfs entry points

2017-01-05 Thread Joseph Qi
Hi Eric,


On 17/1/5 23:31, Eric Ren wrote:
> Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
> results in a deadlock, as the author "Tariq Saeed" realized shortly
> after the patch was merged. The discussion happened here
> (https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html).
>
> The reason why taking cluster inode lock at vfs entry points opens up
> a self deadlock window, is explained in the previous patch of this
> series.
>
> So far, we have seen two different code paths that have this issue.
> 1. do_sys_open
>   may_open
>inode_permission
> ocfs2_permission
>  ocfs2_inode_lock() <=== take PR
>   generic_permission
>get_acl
> ocfs2_iop_get_acl
>  ocfs2_inode_lock() <=== take PR
> 2. fchmod|fchmodat
>  chmod_common
>   notify_change
>ocfs2_setattr <=== take EX
> posix_acl_chmod
>  get_acl
>   ocfs2_iop_get_acl <=== take PR
>  ocfs2_iop_set_acl <=== take EX
>
> Fixes them by adding the tracking logic (in the previous patch) for
> these funcs above, ocfs2_permission(), ocfs2_iop_[set|get]_acl(),
> ocfs2_setattr().
As described cases above, shall we just add the tracking logic only for 
set/get_acl()?

Thanks,
Joseph
>
> Signed-off-by: Eric Ren 
> ---
>   fs/ocfs2/acl.c  | 39 ++-
>   fs/ocfs2/file.c | 44 ++--
>   2 files changed, 68 insertions(+), 15 deletions(-)
>
> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c
> index bed1fcb..c539890 100644
> --- a/fs/ocfs2/acl.c
> +++ b/fs/ocfs2/acl.c
> @@ -284,16 +284,31 @@ int ocfs2_iop_set_acl(struct inode *inode, struct 
> posix_acl *acl, int type)
>   {
>   struct buffer_head *bh = NULL;
>   int status = 0;
> -
> - status = ocfs2_inode_lock(inode, , 1);
> + int arg_flags = 0, has_locked;
> + struct ocfs2_holder oh;
> + struct ocfs2_lock_res *lockres;
> +
> + lockres = _I(inode)->ip_inode_lockres;
> + has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
> + if (has_locked)
> + arg_flags = OCFS2_META_LOCK_GETBH;
> + status = ocfs2_inode_lock_full(inode, , 1, arg_flags);
>   if (status < 0) {
>   if (status != -ENOENT)
>   mlog_errno(status);
>   return status;
>   }
> + if (!has_locked)
> + ocfs2_add_holder(lockres, );
> +
>   status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL);
> - ocfs2_inode_unlock(inode, 1);
> +
> + if (!has_locked) {
> + ocfs2_remove_holder(lockres, );
> + ocfs2_inode_unlock(inode, 1);
> + }
>   brelse(bh);
> +
>   return status;
>   }
>   
> @@ -303,21 +318,35 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode 
> *inode, int type)
>   struct buffer_head *di_bh = NULL;
>   struct posix_acl *acl;
>   int ret;
> + int arg_flags = 0, has_locked;
> + struct ocfs2_holder oh;
> + struct ocfs2_lock_res *lockres;
>   
>   osb = OCFS2_SB(inode->i_sb);
>   if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL))
>   return NULL;
> - ret = ocfs2_inode_lock(inode, _bh, 0);
> +
> + lockres = _I(inode)->ip_inode_lockres;
> + has_locked = (ocfs2_is_locked_by_me(lockres) != NULL);
> + if (has_locked)
> + arg_flags = OCFS2_META_LOCK_GETBH;
> + ret = ocfs2_inode_lock_full(inode, _bh, 0, arg_flags);
>   if (ret < 0) {
>   if (ret != -ENOENT)
>   mlog_errno(ret);
>   return ERR_PTR(ret);
>   }
> + if (!has_locked)
> + ocfs2_add_holder(lockres, );
>   
>   acl = ocfs2_get_acl_nolock(inode, type, di_bh);
>   
> - ocfs2_inode_unlock(inode, 0);
> + if (!has_locked) {
> + ocfs2_remove_holder(lockres, );
> + ocfs2_inode_unlock(inode, 0);
> + }
>   brelse(di_bh);
> +
>   return acl;
>   }
>   
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index c488965..62be75d 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1138,6 +1138,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   handle_t *handle = NULL;
>   struct dquot *transfer_to[MAXQUOTAS] = { };
>   int qtype;
> + int arg_flags = 0, had_lock;
> + struct ocfs2_holder oh;
> + struct ocfs2_lock_res *lockres;
>   
>   trace_ocfs2_setattr(inode, dentry,
>   (unsigned long long)OCFS2_I(inode)->ip_blkno,
> @@ -1173,13 +1176,20 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   }
>   }
>   
> - status = ocfs2_inode_lock(inode, , 1);
> + lockres = _I(inode)->ip_inode_lockres;
> + had_lock = (ocfs2_is_locked_by_me(lockres) != NULL);
> + if (had_lock)
> + arg_flags = OCFS2_META_LOCK_GETBH;
> + status = ocfs2_inode_lock_full(inode, , 1, arg_flags);
>   if 

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock

2017-01-05 Thread Joseph Qi
Hi Eric,


On 17/1/5 23:31, Eric Ren wrote:
> We are in the situation that we have to avoid recursive cluster locking,
> but there is no way to check if a cluster lock has been taken by a
> precess already.
>
> Mostly, we can avoid recursive locking by writing code carefully.
> However, we found that it's very hard to handle the routines that
> are invoked directly by vfs code. For instance:
>
> const struct inode_operations ocfs2_file_iops = {
>  .permission = ocfs2_permission,
>  .get_acl= ocfs2_iop_get_acl,
>  .set_acl= ocfs2_iop_set_acl,
> };
>
> Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR):
> do_sys_open
>   may_open
>inode_permission
> ocfs2_permission
>  ocfs2_inode_lock() <=== first time
>   generic_permission
>get_acl
> ocfs2_iop_get_acl
>   ocfs2_inode_lock() <=== recursive one
>
> A deadlock will occur if a remote EX request comes in between two
> of ocfs2_inode_lock(). Briefly describe how the deadlock is formed:
>
> On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in
> BAST(ocfs2_generic_handle_bast) when downconvert is started
> on behalf of the remote EX lock request. Another hand, the recursive
> cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock()
> because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why?
> because there is no chance for the first cluster lock on this node to be
> unlocked - we block ourselves in the code path.
>
> The idea to fix this issue is mostly taken from gfs2 code.
> 1. introduce a new field: struct ocfs2_lock_res.l_holders, to
> keep track of the processes' pid  who has taken the cluster lock
> of this lock resource;
> 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH;
> it means just getting back disk inode bh for us if we've got cluster lock.
> 3. export a helper: ocfs2_is_locked_by_me() is used to check if we
> have got the cluster lock in the upper code path.
>
> The tracking logic should be used by some of the ocfs2 vfs's callbacks,
> to solve the recursive locking issue cuased by the fact that vfs routines
> can call into each other.
>
> The performance penalty of processing the holder list should only be seen
> at a few cases where the tracking logic is used, such as get/set acl.
>
> You may ask what if the first time we got a PR lock, and the second time
> we want a EX lock? fortunately, this case never happens in the real world,
> as far as I can see, including permission check, (get|set)_(acl|attr), and
> the gfs2 code also do so.
>
> Signed-off-by: Eric Ren 
> ---
>   fs/ocfs2/dlmglue.c | 47 ---
>   fs/ocfs2/dlmglue.h | 18 ++
>   fs/ocfs2/ocfs2.h   |  1 +
>   3 files changed, 63 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 83d576f..500bda4 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res)
>   init_waitqueue_head(>l_event);
>   INIT_LIST_HEAD(>l_blocked_list);
>   INIT_LIST_HEAD(>l_mask_waiters);
> + INIT_LIST_HEAD(>l_holders);
>   }
>   
>   void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res,
> @@ -749,6 +750,45 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res)
>   res->l_flags = 0UL;
>   }
>   
> +inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres,
> +struct ocfs2_holder *oh)
> +{
> + INIT_LIST_HEAD(>oh_list);
> + oh->oh_owner_pid =  get_pid(task_pid(current));
> +
> + spin_lock(>l_lock);
> + list_add_tail(>oh_list, >l_holders);
> + spin_unlock(>l_lock);
> +}
> +
> +inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres,
> +struct ocfs2_holder *oh)
> +{
> + spin_lock(>l_lock);
> + list_del(>oh_list);
> + spin_unlock(>l_lock);
> +
> + put_pid(oh->oh_owner_pid);
> +}
> +
> +inline struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res 
> *lockres)
> +{
> + struct ocfs2_holder *oh;
> + struct pid *pid;
> +
> + /* look in the list of holders for one with the current task as owner */
> + spin_lock(>l_lock);
> + pid = task_pid(current);
> + list_for_each_entry(oh, >l_holders, oh_list) {
> + if (oh->oh_owner_pid == pid)
> + goto out;
> + }
> + oh = NULL;
> +out:
> + spin_unlock(>l_lock);
> + return oh;
> +}
Since this ocfs2_holder won't be used in the caller, I suggest just 
return a bool value here.
Something like:
spin_lock();
list_for_each_entry() {
 if (oh->oh_owner_pid == pid) {
 spin_unlock();
 return 1;
 }
}
spin_unlock();
return 0;

Thanks,
Joseph
> +
>   static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres,
>int level)
>   {
> @@ -2333,8 +2373,9 @@ int 

Re: [Ocfs2-devel] 答复: [PATCH] ocfs2/dlm: fix umount hang

2016-11-17 Thread Joseph Qi
Any clue to confirm the case?

I'm afraid your change will have side effects.

Thanks,

Joseph


On 16/11/17 17:04, Gechangwei wrote:
> Hi Joseph,
>
> I suppose it is because local heartbeat mode was applied in my test 
> environment and
> other nodes were still writing heartbeat to other LUNs but not the LUN 
> corresponding
> to 7DA412FEB1374366B0F3C70025EB14.
>
> Br.
> Changwei.
>
> -邮件原件-
> 发件人: Joseph Qi [mailto:jiangqi...@gmail.com]
> 发送时间: 2016年11月17日 15:00
> 收件人: gechangwei 12382 (CCPL); a...@linux-foundation.org
> 抄送: mfas...@versity.com; ocfs2-devel@oss.oracle.com
> 主题: Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: fix umount hang
>
> Hi Changwei,
>
> Why are the dead nodes still in live map, according to your dlm_state file?
>
> Thanks,
>
> Joseph
>
> On 16/11/17 14:03, Gechangwei wrote:
>> Hi
>>
>> During my recent test on OCFS2, an umount hang issue was found.
>> Below clues can help us to analyze this issue.
>>
>>   From the debug information, we can see some abnormal stats like only
>> node 1 is in DLM domain map, however, node 3 - 9 are still in MLE's node map 
>> and vote map.
>> The root cause of unchanging vote map I think is that HB events are detached 
>> too early!
>> That caused no chance of transforming from BLOCK MLE into MASTER MLE.
>> Thus NODE 1 can't master lock resource even other nodes are all dead.
>>
>> To fix this, I propose a patch.
>>
>>   From 3163fa7024d96f8d6e6ec2b37ad44e2cc969abd9 Mon Sep 17 00:00:00
>> 2001
>> From: gechangwei <ge.chang...@h3c.com>
>> Date: Thu, 17 Nov 2016 14:00:45 +0800
>> Subject: [PATCH] fix umount hang
>>
>> Signed-off-by: gechangwei <ge.chang...@h3c.com>
>> ---
>>fs/ocfs2/dlm/dlmmaster.c | 2 --
>>1 file changed, 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index
>> 6ea06f8..3c46882 100644
>> --- a/fs/ocfs2/dlm/dlmmaster.c
>> +++ b/fs/ocfs2/dlm/dlmmaster.c
>> @@ -3354,8 +3354,6 @@ static void dlm_clean_block_mle(struct dlm_ctxt *dlm,
>>   spin_unlock(>spinlock);
>>   wake_up(>wq);
>>
>> -   /* Do not need events any longer, so detach from heartbeat */
>> -   __dlm_mle_detach_hb_events(dlm, mle);
>>   __dlm_put_mle(mle);
>>   }
>>}
>> --
>> 2.5.1.windows.1
>>
>>
>> root@HXY-CVK110:~# grep P00 bbb
>> Lockres: P00   Owner: 255  State: 0x10 InProgress
>>
>> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB14
>> 37# cat dlm_state
>> Domain: 7DA412FEB1374366B0F3C70025EB1437  Key: 0x8ff804a1  Protocol:
>> 1.2 Thread Pid: 21679  Node: 1  State: JOINED Number of Joins: 1
>> Joining Node: 255 Domain Map: 1 Exit Domain Map:
>> Live Map: 1 2 3 4 5 6 7 8 9
>> Lock Resources: 29 (116)
>> MLEs: 1 (119)
>> Blocking: 1 (4)
>> Mastery: 0 (115)
>> Migration: 0 (0)
>> Lists: Dirty=Empty  Purge=Empty  PendingASTs=Empty  PendingBASTs=Empty
>> Purge Count: 0  Refs: 1 Dead Node: 255 Recovery Pid: 21680  Master:
>> 255  State: INACTIVE Recovery Map:
>> Recovery Node State:
>>
>>
>> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB14
>> 37# ls dlm_state  locking_state  mle_state  purge_list
>> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB14
>> 37# cat mle_state Dumping MLEs for Domain: 7DA412FEB1374366B0F3C70025EB1437
>> P00  BLK  mas=255 new=255 evt=0use=1 
>>   ref=  2
>> Maybe=
>> Vote=3 4 5 6 7 8 9
>> Response=
>> Node=3 4 5 6 7 8 9
>> --
>> ---
>> 本邮件及其附件含有杭州华三通信技术有限公司的保密信息,仅限于发送给上面地址中列出
>> 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
>> 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
>> 邮件!
>> This e-mail and its attachments contain confidential information from
>> H3C, which is intended only for the person or entity whose address is
>> listed above. Any use of the information contained herein in any way
>> (including, but not limited to, total or partial disclosure,
>> reproduction, or dissemination) by persons other than the intended
>> recipient(s) is prohibited. If you receive this e-mail in error,
>> please notify the sender by phone or email immediately and delete it!
>> ___
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: fix umount hang

2016-11-16 Thread Joseph Qi
Hi Changwei,

Why are the dead nodes still in live map, according to your dlm_state file?

Thanks,

Joseph

On 16/11/17 14:03, Gechangwei wrote:
> Hi
>
> During my recent test on OCFS2, an umount hang issue was found.
> Below clues can help us to analyze this issue.
>
>  From the debug information, we can see some abnormal stats like only node 1 
> is in DLM domain map, however, node 3 - 9 are still
> in MLE's node map and vote map.
> The root cause of unchanging vote map I think is that HB events are detached 
> too early!
> That caused no chance of transforming from BLOCK MLE into MASTER MLE. Thus 
> NODE 1 can't master lock resource even
> other nodes are all dead.
>
> To fix this, I propose a patch.
>
>  From 3163fa7024d96f8d6e6ec2b37ad44e2cc969abd9 Mon Sep 17 00:00:00 2001
> From: gechangwei 
> Date: Thu, 17 Nov 2016 14:00:45 +0800
> Subject: [PATCH] fix umount hang
>
> Signed-off-by: gechangwei 
> ---
>   fs/ocfs2/dlm/dlmmaster.c | 2 --
>   1 file changed, 2 deletions(-)
>
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 6ea06f8..3c46882 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -3354,8 +3354,6 @@ static void dlm_clean_block_mle(struct dlm_ctxt *dlm,
>  spin_unlock(>spinlock);
>  wake_up(>wq);
>
> -   /* Do not need events any longer, so detach from heartbeat */
> -   __dlm_mle_detach_hb_events(dlm, mle);
>  __dlm_put_mle(mle);
>  }
>   }
> --
> 2.5.1.windows.1
>
>
> root@HXY-CVK110:~# grep P00 bbb
> Lockres: P00   Owner: 255  State: 0x10 InProgress
>
> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# cat 
> dlm_state
> Domain: 7DA412FEB1374366B0F3C70025EB1437  Key: 0x8ff804a1  Protocol: 1.2
> Thread Pid: 21679  Node: 1  State: JOINED
> Number of Joins: 1  Joining Node: 255
> Domain Map: 1
> Exit Domain Map:
> Live Map: 1 2 3 4 5 6 7 8 9
> Lock Resources: 29 (116)
> MLEs: 1 (119)
>Blocking: 1 (4)
>Mastery: 0 (115)
>Migration: 0 (0)
> Lists: Dirty=Empty  Purge=Empty  PendingASTs=Empty  PendingBASTs=Empty
> Purge Count: 0  Refs: 1
> Dead Node: 255
> Recovery Pid: 21680  Master: 255  State: INACTIVE
> Recovery Map:
> Recovery Node State:
>
>
> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# ls
> dlm_state  locking_state  mle_state  purge_list
> root@HXY-CVK110:/sys/kernel/debug/o2dlm/7DA412FEB1374366B0F3C70025EB1437# cat 
> mle_state
> Dumping MLEs for Domain: 7DA412FEB1374366B0F3C70025EB1437
> P00  BLK  mas=255 new=255 evt=0use=1  
>  ref=  2
> Maybe=
> Vote=3 4 5 6 7 8 9
> Response=
> Node=3 4 5 6 7 8 9
> -
> 本邮件及其附件含有杭州华三通信技术有限公司的保密信息,仅限于发送给上面地址中列出
> 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
> 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
> 邮件!
> This e-mail and its attachments contain confidential information from H3C, 
> which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: clean up deadcode in dlm_master_request_handler()

2016-10-31 Thread Joseph Qi
On 2016/10/31 21:52, piaojun wrote:
> when 'dispatch_assert' is set, 'response' must be DLM_MASTER_RESP_YES,
> and 'res' won't be null, so execution can't reach these two branch.
> 
> Signed-off-by: Jun Piao <piao...@huawei.com>

Looks good to me.
Reviewed-by: Joseph Qi <joseph...@huawei.com>
> ---
>  fs/ocfs2/dlm/dlmmaster.c | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 3f828a1..9a72dd8 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -1644,12 +1644,6 @@ int dlm_master_request_handler(struct o2net_msg *msg, 
> u32 len, void *data,
>* dlm_assert_master_worker() isn't called, we drop it here.
>*/
>   if (dispatch_assert) {
> - if (response != DLM_MASTER_RESP_YES)
> - mlog(ML_ERROR, "invalid response %d\n", response);
> - if (!res) {
> - mlog(ML_ERROR, "bad lockres while trying to assert!\n");
> - BUG();
> - }
>   mlog(0, "%u is the owner of %.*s, cleaning everyone else\n",
>dlm->node_num, res->lockname.len, 
> res->lockname.name);
>   spin_lock(>spinlock);
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] What are the purposes of GLOBAL_INODE_ALLOC_SYSTEM_INODE and BAD_BLOCK_SYSTEM_INODE system file

2016-10-09 Thread Joseph Qi
Hi Gang,
GLOBAL_INODE_ALLOC_SYSTEM_INODE is used for system file inodes
allocation, you can refer mkfs.c for details.

Thanks,
Joseph

On 2016/10/9 16:47, Gang He wrote:
> Hello Guys,
> 
> If you use debugfs.ocfs2 to list system files for a ocfs2 file system, you 
> can find these two system files.
> sles12sp1-node1:/ # debugfs.ocfs2 /dev/sdb1
> debugfs.ocfs2 1.8.2
> debugfs: ls //
>  6   16   12  .
>  6   16   22  ..
>  7   24   10   1  bad_blocks << ==  
> BAD_BLOCK_SYSTEM_INODE
>  8   32   18   1  global_inode_alloc << ==  
> GLOBAL_INODE_ALLOC_SYSTEM_INODE
>   
> 
> But, What are the purposes of GLOBAL_INODE_ALLOC_SYSTEM_INODE and 
> BAD_BLOCK_SYSTEM_INODE system file?
> For BAD_BLOCK_SYSTEM_INODE system file, it looks to be used to store bad 
> blocks for a file system partition, but from the code, there is not any code 
> for this system file.
> For GLOBAL_INODE_ALLOC_SYSTEM_INODE system file, there is also not any code 
> for it, what is the purpose of this file ?
> 
> 
> Thanks
> Gang
> 
> 
> 
> 
> 
> .
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH] ocfs2: fix undefined struct variable in inode.h

2016-09-20 Thread Joseph Qi
The extern struct variable ocfs2_inode_cache is not defined. It meant to
use ocfs2_inode_cachep defined in super.c, I think. Fortunately it is
not used anywhere now, so no impact actually. Clean it up to fix this
mistake.

Signed-off-by: Joseph Qi <joseph...@huawei.com>
---
 fs/ocfs2/inode.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index 50cc550..5af68fc 100644
--- a/fs/ocfs2/inode.h
+++ b/fs/ocfs2/inode.h
@@ -123,8 +123,6 @@ static inline struct ocfs2_inode_info *OCFS2_I(struct inode 
*inode)
 #define INODE_JOURNAL(i) (OCFS2_I(i)->ip_flags & OCFS2_INODE_JOURNAL)
 #define SET_INODE_JOURNAL(i) (OCFS2_I(i)->ip_flags |= OCFS2_INODE_JOURNAL)

-extern struct kmem_cache *ocfs2_inode_cache;
-
 extern const struct address_space_operations ocfs2_aops;
 extern const struct ocfs2_caching_operations ocfs2_inode_caching_ops;

-- 
1.8.4.3



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH v2 RESEND] ocfs2: fix double unlock in case retry after free truncate log

2016-09-14 Thread Joseph Qi
If ocfs2_reserve_cluster_bitmap_bits fails with ENOSPC, it will try to
free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function. But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.
This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.

Changes since v1:
Use ret instead of status to avoid return value overwritten issue.

Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in
truncate log"
Signed-off-by: Joseph Qi <joseph...@huawei.com>
Signed-off-by: Jiufei Xue <xuejiu...@huawei.com>
---
 fs/ocfs2/suballoc.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index ea47120..6ad3533 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1199,14 +1199,24 @@ retry:
inode_unlock((*ac)->ac_inode);

ret = ocfs2_try_to_free_truncate_log(osb, bits_wanted);
-   if (ret == 1)
+   if (ret == 1) {
+   iput((*ac)->ac_inode);
+   (*ac)->ac_inode = NULL;
goto retry;
+   }

if (ret < 0)
mlog_errno(ret);

inode_lock((*ac)->ac_inode);
-   ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
+   ret = ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
+   if (ret < 0) {
+   mlog_errno(ret);
+   inode_unlock((*ac)->ac_inode);
+   iput((*ac)->ac_inode);
+   (*ac)->ac_inode = NULL;
+   goto bail;
+   }
}
if (status < 0) {
if (status != -ENOSPC)
-- 
1.8.4.3



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-14 Thread Joseph Qi
Okay, IC.
So we have to take care of all errors for ocfs2_write_begin_nolock.

On 2016/9/14 16:43, Eric Ren wrote:
> Hi Joseph,
> 
> On 09/14/2016 04:25 PM, Joseph Qi wrote:
>> Hi Eric,
>> Sorry for the delayed response.
>> I have got your explanation. So we have to unlock the page only in case
>> of retry, right?
>> If so, I think the unlock should be right before "goto try_again".
> No, the mmapped page should be unlocked as long as we cannot return 
> VM_FAULT_LOCKED
> to do_page_mkpage(). Otherwise, the deadlock will happen in do_page_mkpage(). 
> Please
> see the recent 2 mails;-)
> 
> Eric
>>
>> Thanks,
>> Joseph
>>
>> On 2016/9/14 16:04, Eric Ren wrote:
>>> Hi Joseph,
>>>>>> In ocfs2_write_begin_nolock(), we first grab the pages and then
>>>>>> allocate disk space for this write; ocfs2_try_to_free_truncate_log()
>>>>>> will be called if ENOSPC is turned; if we're lucky to get enough 
>>>>>> clusters,
>>>>>> which is usually the case, we start over again. But in 
>>>>>> ocfs2_free_write_ctxt()
>>>>>> the target page isn't unlocked, so we will deadlock when trying to grab
>>>>>> the target page again.
>>>>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>>>>> w_target_locked is set to true, and then will be unlocked by
>>>>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>>>>> So I'm not getting the case "page isn't unlock". Could you please explain
>>>>> it in more detail?
>>>> Thanks for review;-) Follow up the calling chain:
>>>>
>>>> ocfs2_free_write_ctxt()
>>>>   ->ocfs2_unlock_pages()
>>>>
>>>> in ocfs2_unlock_pages 
>>>> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
>>>> can see the code just put_page(target_page), but not unlock it.
>>> Did this answer your question?
>>>
>>> Thanks,
>>> Eric
>>>> Yeah, I will think this a bit more like:
>>>> why not unlock the target_page there? Is there other potential problems if 
>>>> the "ret" is not "-ENOSPC" but
>>>> other possible error code?
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>>> Thanks,
>>>>> Joseph
>>>>>
>>>>>> Fix this issue by unlocking the target page after we fail to allocate
>>>>>> enough space at the first time.
>>>>>>
>>>>>> Jan Kara helps me clear out the JBD2 part, and suggest the hint for root 
>>>>>> cause.
>>>>>>
>>>>>> Signed-off-by: Eric Ren <z...@suse.com>
>>>>>> ---
>>>>>>fs/ocfs2/aops.c | 7 +++
>>>>>>1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>>>>>> index 98d3654..78d1d67 100644
>>>>>> --- a/fs/ocfs2/aops.c
>>>>>> +++ b/fs/ocfs2/aops.c
>>>>>> @@ -1860,6 +1860,13 @@ out:
>>>>>> */
>>>>>>try_free = 0;
>>>>>>+/*
>>>>>> + * Unlock mmap_page because the page has been locked when we
>>>>>> + * are here.
>>>>>> + */
>>>>>> +if (mmap_page)
>>>>>> +unlock_page(mmap_page);
>>>>>> +
>>>>>>ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
>>>>>>if (ret1 == 1)
>>>>>>goto try_again;
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> ___
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel@oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>
>>
>>
> 
> 
> .
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH v2] ocfs2: fix double unlock in case retry after free truncate log

2016-09-14 Thread Joseph Qi
If ocfs2_reserve_cluster_bitmap_bits fails with ENOSPC, it will try to
free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function. But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.
This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.

Changes since v1:
Use ret instead of status to avoid return value overwritten issue.

Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in
truncate log"
Signed-off-by: Jospeh Qi 
Signed-off-by: Jiufei Xue 
---
 fs/ocfs2/suballoc.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index ea47120..6ad3533 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1199,14 +1199,24 @@ retry:
inode_unlock((*ac)->ac_inode);

ret = ocfs2_try_to_free_truncate_log(osb, bits_wanted);
-   if (ret == 1)
+   if (ret == 1) {
+   iput((*ac)->ac_inode);
+   (*ac)->ac_inode = NULL;
goto retry;
+   }

if (ret < 0)
mlog_errno(ret);

inode_lock((*ac)->ac_inode);
-   ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
+   ret = ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
+   if (ret < 0) {
+   mlog_errno(ret);
+   inode_unlock((*ac)->ac_inode);
+   iput((*ac)->ac_inode);
+   (*ac)->ac_inode = NULL;
+   goto bail;
+   }
}
if (status < 0) {
if (status != -ENOSPC)
-- 
1.8.4.3


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

2016-09-14 Thread Joseph Qi
Hi Eric,
Sorry for the delayed response.
I have got your explanation. So we have to unlock the page only in case
of retry, right?
If so, I think the unlock should be right before "goto try_again".

Thanks,
Joseph

On 2016/9/14 16:04, Eric Ren wrote:
> Hi Joseph,
 In ocfs2_write_begin_nolock(), we first grab the pages and then
 allocate disk space for this write; ocfs2_try_to_free_truncate_log()
 will be called if ENOSPC is turned; if we're lucky to get enough clusters,
 which is usually the case, we start over again. But in 
 ocfs2_free_write_ctxt()
 the target page isn't unlocked, so we will deadlock when trying to grab
 the target page again.
>>> IMO, in ocfs2_grab_pages_for_write, mmap_page is mapping to w_pages and
>>> w_target_locked is set to true, and then will be unlocked by
>>> ocfs2_unlock_pages in ocfs2_free_write_ctxt.
>>> So I'm not getting the case "page isn't unlock". Could you please explain
>>> it in more detail?
>> Thanks for review;-) Follow up the calling chain:
>>
>> ocfs2_free_write_ctxt()
>>  ->ocfs2_unlock_pages()
>>
>> in ocfs2_unlock_pages 
>> (https://github.com/torvalds/linux/blob/master/fs/ocfs2/aops.c#L793), we
>> can see the code just put_page(target_page), but not unlock it.
> Did this answer your question?
> 
> Thanks,
> Eric
>>
>> Yeah, I will think this a bit more like:
>> why not unlock the target_page there? Is there other potential problems if 
>> the "ret" is not "-ENOSPC" but
>> other possible error code?
>>
>> Thanks,
>> Eric
>>
>>>
>>> Thanks,
>>> Joseph
>>>
 Fix this issue by unlocking the target page after we fail to allocate
 enough space at the first time.

 Jan Kara helps me clear out the JBD2 part, and suggest the hint for root 
 cause.

 Signed-off-by: Eric Ren 
 ---
   fs/ocfs2/aops.c | 7 +++
   1 file changed, 7 insertions(+)

 diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
 index 98d3654..78d1d67 100644
 --- a/fs/ocfs2/aops.c
 +++ b/fs/ocfs2/aops.c
 @@ -1860,6 +1860,13 @@ out:
*/
   try_free = 0;
   +/*
 + * Unlock mmap_page because the page has been locked when we
 + * are here.
 + */
 +if (mmap_page)
 +unlock_page(mmap_page);
 +
   ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
   if (ret1 == 1)
   goto try_again;

>>>
>>>
>>
>>
>>
>>
>> ___
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: fix double unlock in case retry after free truncate log

2016-09-14 Thread Joseph Qi
Hi Eric,

On 2016/9/14 15:57, Eric Ren wrote:
> Hello Joseph,
> 
> Thanks for fixing up this.
> 
> On 09/14/2016 12:15 PM, Joseph Qi wrote:
>> If ocfs2_reserve_cluster_bitmap_bits fails with ENOSPC, it will try to
>> free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
>> will lock/unlock global bitmap inode, we have to unlock it before
>> calling this function. But when retry reserve and it fails with no
> You mean the retry succeeds by "retry reserve", right? I fail to understand 
> in which situation
> the retry will fail to get global bitmap inode lock. Because I didn't see 
> this problem when I
> tested my patch, could you explain a bit more?
> 
> Eric
Before retry it has inode unlocked, but ac inode is still valid. And
if inode lock fails this time, it will goto bail and do inode unlock
again.

Thanks,
Joseph

>> global bitmap inode lock taken, it will unlock again in error handling
>> branch and BUG.
>> This issue also exists if no need retry and then ocfs2_inode_lock fails.
>> So fix it.
>>
>> Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in
>> truncate log"
>> Signed-off-by: Jospeh Qi <joseph...@huawei.com>
>> Signed-off-by: Jiufei Xue <xuejiu...@huawei.com>
>> ---
>>   fs/ocfs2/suballoc.c | 13 +++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
>> index ea47120..041453b 100644
>> --- a/fs/ocfs2/suballoc.c
>> +++ b/fs/ocfs2/suballoc.c
>> @@ -1199,14 +1199,23 @@ retry:
>>   inode_unlock((*ac)->ac_inode);
>>
>>   ret = ocfs2_try_to_free_truncate_log(osb, bits_wanted);
>> -if (ret == 1)
>> +if (ret == 1) {
>> +iput((*ac)->ac_inode);
>> +(*ac)->ac_inode = NULL;
>>   goto retry;
>> +}
>>
>>   if (ret < 0)
>>   mlog_errno(ret);
>>
>>   inode_lock((*ac)->ac_inode);
>> -ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
>> +status = ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
>> +if (status < 0) {
>> +inode_unlock((*ac)->ac_inode);
>> +iput((*ac)->ac_inode);
>> +(*ac)->ac_inode = NULL;
>> +goto bail;
>> +}
>>   }
>>   if (status < 0) {
>>   if (status != -ENOSPC)
> 
> 
> 
> .
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH] ocfs2: fix double unlock in case retry after free truncate log

2016-09-13 Thread Joseph Qi
If ocfs2_reserve_cluster_bitmap_bits fails with ENOSPC, it will try to
free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function. But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.
This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.

Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in
truncate log"
Signed-off-by: Jospeh Qi 
Signed-off-by: Jiufei Xue 
---
 fs/ocfs2/suballoc.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index ea47120..041453b 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1199,14 +1199,23 @@ retry:
inode_unlock((*ac)->ac_inode);

ret = ocfs2_try_to_free_truncate_log(osb, bits_wanted);
-   if (ret == 1)
+   if (ret == 1) {
+   iput((*ac)->ac_inode);
+   (*ac)->ac_inode = NULL;
goto retry;
+   }

if (ret < 0)
mlog_errno(ret);

inode_lock((*ac)->ac_inode);
-   ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
+   status = ocfs2_inode_lock((*ac)->ac_inode, NULL, 1);
+   if (status < 0) {
+   inode_unlock((*ac)->ac_inode);
+   iput((*ac)->ac_inode);
+   (*ac)->ac_inode = NULL;
+   goto bail;
+   }
}
if (status < 0) {
if (status != -ENOSPC)
-- 
1.8.4.3


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: oldmle should be put while -EEXIST returned, and the new mle should not be get once at that time.

2016-09-13 Thread Joseph Qi
NAK. This has already been fixed by commit 32e493265b2b ("ocfs2/dlm:
do not insert a new mle when another process is already migrating").
Please submit patch based on the latest kernel.

Thanks,
Joseph

On 2016/9/14 10:30, Guozhonghua wrote:
> 
> In the function dlm_migrate_lockres, while calling dlm_add_migration_mle, and 
> the ret is --EEXIST.
> At this time, the oldmle should be put one time for it had been get once in 
> dlm_find_mle.
> And the new mle should not get once for it had not been initialized before 
> goto fail.
> 
> Signed-off-by: Guozhonghua 
> 
> --- ocfs2.orig/dlm/dlmmaster.c  2016-09-13 15:18:13.602684325 +0800
> +++ ocfs2/dlm/dlmmaster.c   2016-09-14 10:15:10.496873879 +0800
> @@ -2573,8 +2573,6 @@ static int dlm_is_lockres_migrateable(st
>  /*
>   * DLM_MIGRATE_LOCKRES
>   */
> -
> -
>  static int dlm_migrate_lockres(struct dlm_ctxt *dlm,
>struct dlm_lock_resource *res, u8 target)
>  {
> @@ -2621,20 +2619,26 @@ static int dlm_migrate_lockres(struct dl
> spin_lock(>master_lock);
> ret = dlm_add_migration_mle(dlm, res, mle, , name,
> namelen, target, dlm->node_num);
> +   if (ret == -EEXIST) {
> +   if(oldmle)
> +   __dlm_put_mle(oldmle);
> +
> +   spin_unlock(>master_lock);
> +   spin_unlock(>spinlock);
> +   mlog(0, "another process is already migrating it\n");
> +   goto fail;
> +   }
> +
> /* get an extra reference on the mle.
>  * otherwise the assert_master from the new
>  * master will destroy this.
>  */
> dlm_get_mle_inuse(mle);
> +   mle_added = 1;
> +
> spin_unlock(>master_lock);
> spin_unlock(>spinlock);
> 
> -   if (ret == -EEXIST) {
> -   mlog(0, "another process is already migrating it\n");
> -   goto fail;
> -   }
> -   mle_added = 1;
> -
> /*
>  * set the MIGRATING flag and flush asts
>  * if we fail after this we need to re-dirty the lockres
> 
> -
> 本邮件及其附件含有杭州华三通信技术有限公司的保密信息,仅限于发送给上面地址中列出
> 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、
> 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本
> 邮件!
> This e-mail and its attachments contain confidential information from H3C, 
> which is
> intended only for the person or entity whose address is listed above. Any use 
> of the
> information contained herein in any way (including, but not limited to, total 
> or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please 
> notify the sender
> by phone or email immediately and delete it!
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: fix trans extend while free cached blocks

2016-09-12 Thread Joseph Qi
Thanks Junxiao.
Reviewed-by: Joseph Qi <joseph...@huawei.com>

On 2016/9/12 18:03, Junxiao Bi wrote:
> Root cause of this issue is the same with the one fixed by last patch,
> but this time credits for allocator inode and group descriptor may not
> be consumed before trans extend.
> 
> The following error was caught.
> 
> [  685.240276] WARNING: CPU: 0 PID: 2037 at fs/jbd2/transaction.c:269 
> start_this_handle+0x4c3/0x510 [jbd2]()
> [  685.240294] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss 
> sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager 
> ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 
> nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter 
> ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i 
> libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr 
> ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront 
> fb_sys_fops sysimgblt sysfillrect syscopyarea xen_netfront parport_pc parport 
> pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy 
> pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
> [  685.240296] CPU: 0 PID: 2037 Comm: rm Tainted: GW   
> 4.1.12-37.6.3.el6uek.bug24573128v2.x86_64 #2
> [  685.240296] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
> [  685.240298]  010d 88007ac3f808 816bc5bc 
> 010d
> [  685.240300]   88007ac3f848 81081475 
> 88007ac3f828
> [  685.240301]  880037bbf000 880037688210 0095 
> 0050
> [  685.240301] Call Trace:
> [  685.240305]  [] dump_stack+0x48/0x5c
> [  685.240308]  [] warn_slowpath_common+0x95/0xe0
> [  685.240310]  [] warn_slowpath_null+0x1a/0x20
> [  685.240313]  [] start_this_handle+0x4c3/0x510 [jbd2]
> [  685.240317]  [] ? __jbd2_log_start_commit+0xe5/0xf0 
> [jbd2]
> [  685.240319]  [] ? __wake_up+0x53/0x70
> [  685.240322]  [] jbd2__journal_restart+0x161/0x1b0 [jbd2]
> [  685.240325]  [] jbd2_journal_restart+0x13/0x20 [jbd2]
> [  685.240340]  [] ocfs2_extend_trans+0x74/0x220 [ocfs2]
> [  685.240347]  [] ocfs2_free_cached_blocks+0x16b/0x4e0 
> [ocfs2]
> [  685.240349]  [] ? internal_add_timer+0x91/0xc0
> [  685.240356]  [] ocfs2_run_deallocs+0x70/0x270 [ocfs2]
> [  685.240363]  [] ocfs2_commit_truncate+0x474/0x6f0 [ocfs2]
> [  685.240374]  [] ? 
> ocfs2_xattr_tree_et_ops+0x60/0xfffe8c00 [ocfs2]
> [  685.240384]  [] ? ocfs2_journal_access_eb+0x20/0x20 
> [ocfs2]
> [  685.240385]  [] ? __sb_end_write+0x33/0x70
> [  685.240394]  [] ocfs2_truncate_for_delete+0xbd/0x380 
> [ocfs2]
> [  685.240402]  [] ? ocfs2_query_inode_wipe+0xf4/0x320 
> [ocfs2]
> [  685.240409]  [] ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
> [  685.240415]  [] ? ocfs2_query_inode_wipe+0xf4/0x320 
> [ocfs2]
> [  685.240422]  [] ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
> [  685.240424]  [] ? __inode_wait_for_writeback+0x69/0xc0
> [  685.240437]  [] ? 
> __PRETTY_FUNCTION__.112282+0x20/0xb500 [ocfs2]
> [  685.240444]  [] ocfs2_evict_inode+0x28/0x60 [ocfs2]
> [  685.240445]  [] evict+0xab/0x1a0
> [  685.240456]  [] ? 
> __PRETTY_FUNCTION__.112282+0x20/0xb500 [ocfs2]
> [  685.240457]  [] iput_final+0xf6/0x190
> [  685.240458]  [] iput+0xc8/0xe0
> [  685.240460]  [] do_unlinkat+0x1b7/0x310
> [  685.240462]  [] ? __audit_syscall_entry+0xac/0x110
> [  685.240464]  [] ? do_audit_syscall_entry+0x6c/0x70
> [  685.240465]  [] ? syscall_trace_enter_phase1+0x153/0x180
> [  685.240467]  [] SyS_unlinkat+0x22/0x40
> [  685.240468]  [] system_call_fastpath+0x12/0x71
> [  685.240469] ---[ end trace a62437cb060baa71 ]---
> [  685.240470] JBD2: rm wants too many credits (149 > 128)
> 
> Signed-off-by: Junxiao Bi <junxiao...@oracle.com>
> ---
>  fs/ocfs2/alloc.c |   27 +--
>  1 file changed, 9 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> index 51128789a661..f165f867f332 100644
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -6404,43 +6404,34 @@ static int ocfs2_free_cached_blocks(struct 
> ocfs2_super *osb,
>   goto out_mutex;
>   }
>  
> - handle = ocfs2_start_trans(osb, OCFS2_SUBALLOC_FREE);
> - if (IS_ERR(handle)) {
> - ret = PTR_ERR(handle);
> - mlog_errno(ret);
> - goto out_unlock;
> - }
> -
>   while (head) {
>   if (head->free_bg)
>   bg_blkno = head->free_bg;
>   else
>   bg_blkno = ocfs2_which_suballoc_group(head->

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2: fix trans extend while flush truncate log

2016-09-12 Thread Joseph Qi
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -5922,7 +5922,6 @@ bail:
>  }
>  
>  static int ocfs2_replay_truncate_records(struct ocfs2_super *osb,
> -  handle_t *handle,
>struct inode *data_alloc_inode,
>struct buffer_head *data_alloc_bh)
>  {
> @@ -5935,11 +5934,19 @@ static int ocfs2_replay_truncate_records(struct 
> ocfs2_super *osb,
>   struct ocfs2_truncate_log *tl;
>   struct inode *tl_inode = osb->osb_tl_inode;
>   struct buffer_head *tl_bh = osb->osb_tl_bh;
> + handle_t *handle;
>  
>   di = (struct ocfs2_dinode *) tl_bh->b_data;
>   tl = >id2.i_dealloc;
>   i = le16_to_cpu(tl->tl_used) - 1;
>   while (i >= 0) {
> + handle = ocfs2_start_trans(osb, 
> OCFS2_TRUNCATE_LOG_FLUSH_ONE_REC);
> + if (IS_ERR(handle)) {
> + status = PTR_ERR(handle);
> + mlog_errno(status);
> + goto bail;
> + }
> +
>   /* Caller has given us at least enough credits to
>* update the truncate log dinode */
So we do not need this comment any more, am I right?
Anyway it looks good to me.

Reviewed-by: Joseph Qi <joseph...@huawei.com>

>   status = ocfs2_journal_access_di(handle, INODE_CACHE(tl_inode), 
> tl_bh,
> @@ -5974,12 +5981,7 @@ static int ocfs2_replay_truncate_records(struct 
> ocfs2_super *osb,
>   }
>   }
>  
> - status = ocfs2_extend_trans(handle,
> - OCFS2_TRUNCATE_LOG_FLUSH_ONE_REC);
> - if (status < 0) {
> - mlog_errno(status);
> - goto bail;
> - }
> + ocfs2_commit_trans(osb, handle);
>   i--;
>   }
>  
> @@ -5994,7 +5996,6 @@ int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
>  {
>   int status;
>   unsigned int num_to_flush;
> - handle_t *handle;
>   struct inode *tl_inode = osb->osb_tl_inode;
>   struct inode *data_alloc_inode = NULL;
>   struct buffer_head *tl_bh = osb->osb_tl_bh;
> @@ -6038,21 +6039,11 @@ int __ocfs2_flush_truncate_log(struct ocfs2_super 
> *osb)
>   goto out_mutex;
>   }
>  
> - handle = ocfs2_start_trans(osb, OCFS2_TRUNCATE_LOG_FLUSH_ONE_REC);
> - if (IS_ERR(handle)) {
> - status = PTR_ERR(handle);
> - mlog_errno(status);
> - goto out_unlock;
> - }
> -
> - status = ocfs2_replay_truncate_records(osb, handle, data_alloc_inode,
> + status = ocfs2_replay_truncate_records(osb, data_alloc_inode,
>  data_alloc_bh);
>   if (status < 0)
>   mlog_errno(status);
>  
> - ocfs2_commit_trans(osb, handle);
> -
> -out_unlock:
>   brelse(data_alloc_bh);
>   ocfs2_inode_unlock(data_alloc_inode, 1);
>  
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [Ocfs2-test-devel] OCFS2 test report for linux vanilla kernel V4.7.0

2016-08-21 Thread Joseph Qi
Hi Eric,
Thanks very much for your efforts.
I am interested in the failed inline testcase. Could you please send out
the testcase as well as detailed log? I'll do analysis when I have time.

Thanks,
Joseph

On 2016/8/22 10:42, Eric Ren wrote:
> Hi,
> 
> 
> The test report below is agaist vanilla kernel v4.7.0. Some highlights:
> 
> 1.  As you can see from logs attached, pcmk stack is used with 
> "blocksize=4096, clustersize=32768";
> 
> 2. "inline" testcase on multiple nodes failed 3/3 times so far; seems to be a 
> regression issue;
> 
> 3. Two cases are skipped:
> o lvb_torture:  libo2dlm [1] doesn't correctly support LVB operations for 
> fs/dlm;
> 
>   Actually, ocfs2 tools (mkfs and fsck) even don't need LVB operations, 
> so it is not worth touching
> 
>  libo2dlm right now, I think;
> 
> o filecheck: is not scheduled this time;
> 
> 4. "discontig test" is missing now:-/
> 
> 
> If anyone is intersted in  the detailed test logs, I can upload somewhere;-) 
> will schedule test for V4.8.rc2 soon.
> 
> [1] https://oss.oracle.com/pipermail/ocfs2-tools-devel/2008-May/000761.html
> 
> Eric
> 
> run-dev-test
>   *BUILD UNSTABLE*
> Project:  run-dev-test
> Date of build:Sat, 20 Aug 2016 20:11:19 +0800
> Build duration:   19 hr
> Build cause:  Started by upstream project "zren-testing/ocfs2-dlm-dev" build 
> number 16
> Build description: linux vanilla kernel V4.7.0
> Built on: HA-232
> 
> 
>   Health Report
> 
> W Description Score
>   Build stability: 2 out of the last 5 builds failed. 60
>   Test Result: 1 test failing out of a total of 29 tests. 96
> 
> 
>   Tests Reports
> 
> 
> Package   Failed  Passed  Skipped Total
> MultipleNodes 1   8   1   *10*
> MultipleNodes.inline.inline-test
> SingleNode0   18  1   *19*
> 
> 
> 
> 
> ___
> Ocfs2-test-devel mailing list
> ocfs2-test-de...@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-test-devel
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] About ocfs2 file sytem fragmentation tool

2016-08-11 Thread Joseph Qi
Hi Gang,
I think you'd better analyze in which ENOSPC returns.
Once case is though free space looks enough, it still cannot fulfill
the requirement because of requesting contiguous space.

Thanks,
Joseph

On 2016/8/11 17:27, Gang He wrote:
> Hello Joseph,
> 
> Thank for your good suggestion.
> By the way, did you encounter the issue, the user could not write with the 
> error message "No space left on device", although there was enough fs space 
> available? the user just wondered if the problem was related to file system 
> fragmentation (it looks like the bug, which was fixed by Eric some week ago). 
> 
> Thanks
> Gang
> 
> 

>> Hi Gang,
>> We can also get some information from
>> "debugfs.ocfs2 -R 'stat //global_bitmap' "
>> But unfortunately there is no summary information such as fragmentation
>> ratio.
>> We have encountered a problem that once volume usage exceeds 95%, create
>> a new big file will consume much longer time, which is because of each
>> gd in chains has little contiguous clusters.
>>
>> Thanks,
>> Joseph
>>
>>
>> On 2016/8/11 14:03, Gang He wrote:
>>> Hello Guys,
>>>
>>> Our customer is asking one question, how to detect a ocfs2 file system 
>> fragmentation status.
>>> Current, I can think of ways for detecting fragmentation as below,
>>> 1) o2info --freefrag N /dev/vdb3
>>> this command can give some information for how many free chunks (based on 
>> size) are here,
>>> but it can not give any conclusive information, e.g. the file system is 
>> fragmented, or not.  fragmentation ratio.
>>> 2) debugfs.ocfs2 -R "frag /fio1/test1"  /dev/vdb3
>>> this command can give some information only for one file, how to get the 
>> whole volume information, e.g. the file system fragmentation ratio.
>>>
>>> So, anybody can give some suggestion for ocfs2 file system fragmentation 
>> tool, it is very appreciated.
>>> For example, there is any more better way to detect the file system 
>> fragmentation ratio.
>>> The further question, if there are some free blocks in the file system, but 
>> the user cannot create a file, it is also related to fragmentation problem?
>>> The use have to use "discontig-bg" feature to overcome this problem?
>>>
>>>
>>> Thanks
>>> Gang
>>>
>>>
>>> ___
>>> Ocfs2-devel mailing list
>>> Ocfs2-devel@oss.oracle.com 
>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel 
>>>
>>>
> 
> 
> .
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] About ocfs2 file sytem fragmentation tool

2016-08-11 Thread Joseph Qi
Hi Gang,
We can also get some information from
"debugfs.ocfs2 -R 'stat //global_bitmap' "
But unfortunately there is no summary information such as fragmentation
ratio.
We have encountered a problem that once volume usage exceeds 95%, create
a new big file will consume much longer time, which is because of each
gd in chains has little contiguous clusters.

Thanks,
Joseph


On 2016/8/11 14:03, Gang He wrote:
> Hello Guys,
> 
> Our customer is asking one question, how to detect a ocfs2 file system 
> fragmentation status.
> Current, I can think of ways for detecting fragmentation as below,
> 1) o2info --freefrag N /dev/vdb3
> this command can give some information for how many free chunks (based on 
> size) are here,
> but it can not give any conclusive information, e.g. the file system is 
> fragmented, or not.  fragmentation ratio.
> 2) debugfs.ocfs2 -R "frag /fio1/test1"  /dev/vdb3
> this command can give some information only for one file, how to get the 
> whole volume information, e.g. the file system fragmentation ratio.
> 
> So, anybody can give some suggestion for ocfs2 file system fragmentation 
> tool, it is very appreciated.
> For example, there is any more better way to detect the file system 
> fragmentation ratio.
> The further question, if there are some free blocks in the file system, but 
> the user cannot create a file, it is also related to fragmentation problem?
> The use have to use "discontig-bg" feature to overcome this problem?
> 
> 
> Thanks
> Gang
> 
> 
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH v3] ocfs2: retry on ENOSPC if sufficient space in truncate log

2016-07-11 Thread Joseph Qi
On 2016/7/9 10:32, Eric Ren wrote:
> The testcase "mmaptruncate" in ocfs2 test suite always fails with
> ENOSPC error on small volume (say less than 10G). This testcase
> repeatedly performs "extend" and "truncate" on a file. Continuously,
> it truncates the file to 1/2 of the size, and then extends to 100% of
> the size. The main bitmap will quickly run out of space because the
> "truncate" code prevent truncate log from being flushed by
> ocfs2_schedule_truncate_log_flush(osb, 1), while truncate log may
> have cached lots of clusters.
> 
> So retry to allocate after flushing truncate log when ENOSPC is
> returned. And we cannot reuse the deleted blocks before the transaction
> committed. Fortunately, we already have a function to do this -
> ocfs2_try_to_free_truncate_log(). Just need to remove the "static"
> modifier and put it into the right place.
> 
> The "unlock"/"lock" code isn't elegant, but looks no better option.
> 
> v3:
> 1. Also need to lock allocator inode when "= 0" is returned from
> ocfs2_schedule_truncate_log_flush(), which means no space really.
> -- spotted by Joseph Qi
> 
> v2:
> 1. Lock allocator inode again if ocfs2_schedule_truncate_log_flush()
> fails. -- spotted by Joseph Qi <joseph...@huawei.com>
> 
> Signed-off-by: Eric Ren <z...@suse.com>

Reviewed-by: Joseph Qi <joseph...@huawei.com>
> ---
>  fs/ocfs2/alloc.c| 37 +
>  fs/ocfs2/alloc.h|  2 ++
>  fs/ocfs2/aops.c | 37 -
>  fs/ocfs2/suballoc.c | 20 +++-
>  4 files changed, 58 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> index 460c0ce..7dabbc3 100644
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -6106,6 +6106,43 @@ void ocfs2_schedule_truncate_log_flush(struct 
> ocfs2_super *osb,
>   }
>  }
>  
> +/*
> + * Try to flush truncate logs if we can free enough clusters from it.
> + * As for return value, "< 0" means error, "0" no space and "1" means
> + * we have freed enough spaces and let the caller try to allocate again.
> + */
> +int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
> + unsigned int needed)
> +{
> + tid_t target;
> + int ret = 0;
> + unsigned int truncated_clusters;
> +
> + inode_lock(osb->osb_tl_inode);
> + truncated_clusters = osb->truncated_clusters;
> + inode_unlock(osb->osb_tl_inode);
> +
> + /*
> +  * Check whether we can succeed in allocating if we free
> +  * the truncate log.
> +  */
> + if (truncated_clusters < needed)
> + goto out;
> +
> + ret = ocfs2_flush_truncate_log(osb);
> + if (ret) {
> + mlog_errno(ret);
> + goto out;
> + }
> +
> + if (jbd2_journal_start_commit(osb->journal->j_journal, )) {
> + jbd2_log_wait_commit(osb->journal->j_journal, target);
> + ret = 1;
> + }
> +out:
> + return ret;
> +}
> +
>  static int ocfs2_get_truncate_log_info(struct ocfs2_super *osb,
>  int slot_num,
>  struct inode **tl_inode,
> diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
> index f3dc1b0..4a5152e 100644
> --- a/fs/ocfs2/alloc.h
> +++ b/fs/ocfs2/alloc.h
> @@ -188,6 +188,8 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
> u64 start_blk,
> unsigned int num_clusters);
>  int __ocfs2_flush_truncate_log(struct ocfs2_super *osb);
> +int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
> +unsigned int needed);
>  
>  /*
>   * Process local structure which describes the block unlinks done
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index c034edf..1802aef 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -1645,43 +1645,6 @@ static int ocfs2_zero_tail(struct inode *inode, struct 
> buffer_head *di_bh,
>   return ret;
>  }
>  
> -/*
> - * Try to flush truncate logs if we can free enough clusters from it.
> - * As for return value, "< 0" means error, "0" no space and "1" means
> - * we have freed enough spaces and let the caller try to allocate again.
> - */
> -static int ocfs2_try_to_free_truncate_log(struct ocfs2_super *osb,
> -   unsigned int needed)
> -{
> - tid_t target;
> - int ret = 0;
> - unsigned int truncated_cl

Re: [Ocfs2-devel] ocfs2/dlm: continue to purge recovery lockres when recovery master goes down

2016-07-10 Thread Joseph Qi
On 2016/7/10 18:04, piaojun wrote:
> We found a dlm-blocked situation caused by continuous breakdown of
> recovery masters described below. To solve this problem, we should purge
> recovery lock once detecting recovery master goes down.
> 
> N3  N2   N1(reco master)
> go down
>  pick up recovery lock and
>  begin recoverying for N2
> 
>  go down
> 
> pick up recovery
> lock failed, then
> purge it:
> dlm_purge_lockres
>   ->DROPPING_REF is set
> 
> send deref to N1 failed,
> recovery lock is not purged
> 
> find N1 go down, begin
> recoverying for N1, but
> blocked in dlm_do_recovery
> as DROPPING_REF is set:
> dlm_do_recovery
>   ->dlm_pick_recovery_master
> ->dlmlock
>   ->dlm_get_lock_resource
> ->__dlm_wait_on_lockres_flags(tmpres,
>   DLM_LOCK_RES_DROPPING_REF);
> 
> Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes 
> down")
> Signed-off-by: Jun Piao 
Looks good to me, thanks.

Thanks,
Joseph

> ---
>  fs/ocfs2/dlm/dlmcommon.h   |  2 ++
>  fs/ocfs2/dlm/dlmmaster.c   | 38 +++---
>  fs/ocfs2/dlm/dlmrecovery.c | 30 ++
>  fs/ocfs2/dlm/dlmthread.c   | 46 
> ++
>  4 files changed, 73 insertions(+), 43 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmcommon.h b/fs/ocfs2/dlm/dlmcommon.h
> index 004f2cb..3e3e9ba8 100644
> --- a/fs/ocfs2/dlm/dlmcommon.h
> +++ b/fs/ocfs2/dlm/dlmcommon.h
> @@ -1004,6 +1004,8 @@ int dlm_finalize_reco_handler(struct o2net_msg *msg, 
> u32 len, void *data,
>  int dlm_do_master_requery(struct dlm_ctxt *dlm, struct dlm_lock_resource 
> *res,
> u8 nodenum, u8 *real_master);
>  
> +void __dlm_do_purge_lockres(struct dlm_ctxt *dlm,
> + struct dlm_lock_resource *res);
>  
>  int dlm_dispatch_assert_master(struct dlm_ctxt *dlm,
>  struct dlm_lock_resource *res,
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 8c84641..c0b560d 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -2425,51 +2425,19 @@ int dlm_deref_lockres_done_handler(struct o2net_msg 
> *msg, u32 len, void *data,
>   mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done "
>   "but it is already derefed!\n", dlm->name,
>   res->lockname.len, res->lockname.name, node);
> - dlm_lockres_put(res);
>   goto done;
>   }
> -
> - if (!list_empty(>purge)) {
> - mlog(0, "%s: Removing res %.*s from purgelist\n",
> - dlm->name, res->lockname.len, res->lockname.name);
> - list_del_init(>purge);
> - dlm_lockres_put(res);
> - dlm->purge_count--;
> - }
> -
> - if (!__dlm_lockres_unused(res)) {
> - mlog(ML_ERROR, "%s: res %.*s in use after deref\n",
> - dlm->name, res->lockname.len, res->lockname.name);
> - __dlm_print_one_lock_resource(res);
> - BUG();
> - }
> -
> - __dlm_unhash_lockres(dlm, res);
> -
> - spin_lock(>track_lock);
> - if (!list_empty(>tracking))
> - list_del_init(>tracking);
> - else {
> - mlog(ML_ERROR, "%s: Resource %.*s not on the Tracking list\n",
> -  dlm->name, res->lockname.len, res->lockname.name);
> - __dlm_print_one_lock_resource(res);
> - }
> - spin_unlock(>track_lock);
> -
> - /* lockres is not in the hash now. drop the flag and wake up
> -  * any processes waiting in dlm_get_lock_resource.
> -  */
> - res->state &= ~DLM_LOCK_RES_DROPPING_REF;
> + __dlm_do_purge_lockres(dlm, res);
>   spin_unlock(>spinlock);
>   wake_up(>wq);
>  
> - dlm_lockres_put(res);
> -
>   spin_unlock(>spinlock);
>  
>   ret = 0;
>  
>  done:
> + if (res)
> + dlm_lockres_put(res);
>   dlm_put(dlm);
>   return ret;
>  }
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index f6b3138..d926887 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -2343,6 +2343,7 @@ static void dlm_do_local_recovery_cleanup(struct 
> dlm_ctxt *dlm, u8 dead_node)
>   struct dlm_lock_resource *res;
>   int i;
>   struct hlist_head *bucket;
> + struct hlist_node *tmp;
>   struct dlm_lock *lock;
>  
>  
> @@ -2365,7 +2366,7 @@ static void dlm_do_local_recovery_cleanup(struct 
> dlm_ctxt *dlm, u8 dead_node)
>*/
>   for (i = 0; i < DLM_HASH_BUCKETS; i++) {
>   bucket = dlm_lockres_hash(dlm, i);
> - hlist_for_each_entry(res, bucket, hash_node) {
> + hlist_for_each_entry_safe(res, tmp, bucket, hash_node) {
>  

Re: [Ocfs2-devel] ocfs2/dlm: solve a BUG when deref failed in dlm_drop_lockres_ref

2016-07-10 Thread Joseph Qi
On 2016/7/10 18:03, piaojun wrote:
> We found a BUG situation that lockres is migrated during deref
> described below. To solve the BUG, we could purge lockres directly when
> other node says I did not have a ref. Additionally, we'd better purge
> lockres if master goes down, as no one will response deref done.
> 
> Node 1  Node 2(old master) Node3(new master)
> dlm_purge_lockres
> send deref to N2
> 
> leave domain
> migrate lockres to N3
>finish migration
>send do assert
>master to N1
> 
> receive do assert msg
> form N3, but can not
> find lockres because
> DROPPING_REF is set,
> so the owner is still
> N2.
> 
> receive deref from N1
> and response -EINVAL
> because lockres is migrated
> 
> BUG when receive -EINVAL
> in dlm_drop_lockres_ref
> 
> Fixes: 842b90b62461d ("ocfs2/dlm: return in progress if master can not clear 
> the refmap bit...")
> Signed-off-by: Jun Piao 
Use full patch title please.
Others looks well.

Thanks,
Joseph

> ---
>  fs/ocfs2/dlm/dlmmaster.c |  9 ++---
>  fs/ocfs2/dlm/dlmthread.c | 13 +++--
>  2 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index f72e7ae..8c84641 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -2276,9 +2276,12 @@ int dlm_drop_lockres_ref(struct dlm_ctxt *dlm, struct 
> dlm_lock_resource *res)
>   mlog(ML_ERROR, "%s: res %.*s, DEREF to node %u got %d\n",
>dlm->name, namelen, lockname, res->owner, r);
>   dlm_print_one_lock_resource(res);
> - BUG();
> - }
> - return ret ? ret : r;
> + if (r == -ENOMEM)
> + BUG();
> + } else
> + ret = r;
> +
> + return ret;
>  }
>  
>  int dlm_deref_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
> diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
> index 68d239b..ce39722 100644
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -175,6 +175,15 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
>res->lockname.len, res->lockname.name, master);
>  
>   if (!master) {
> + if (res->state & DLM_LOCK_RES_DROPPING_REF) {
> + mlog(ML_NOTICE, "%s: res %.*s already in "
> + "DLM_LOCK_RES_DROPPING_REF state\n",
> + dlm->name, res->lockname.len,
> + res->lockname.name);
> + spin_unlock(>spinlock);
> + return;
> + }
> +
>   res->state |= DLM_LOCK_RES_DROPPING_REF;
>   /* drop spinlock...  retake below */
>   spin_unlock(>spinlock);
> @@ -203,8 +212,8 @@ static void dlm_purge_lockres(struct dlm_ctxt *dlm,
>   dlm->purge_count--;
>   }
>  
> - if (!master && ret != 0) {
> - mlog(0, "%s: deref %.*s in progress or master goes down\n",
> + if (!master && ret == DLM_DEREF_RESPONSE_INPROG) {
> + mlog(0, "%s: deref %.*s in progress\n",
>   dlm->name, res->lockname.len, res->lockname.name);
>   spin_unlock(>spinlock);
>   return;
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF, is cleared before dlm_deref_lockres_done_handler

2016-07-10 Thread Joseph Qi
Hi Jun,

On 2016/7/10 18:01, piaojun wrote:
> We found a BUG situation in which DLM_LOCK_RES_DROPPING_REF is cleared
> unexpected that described below. To solve the bug, we disable the BUG_ON
> and purge lockres in dlm_do_local_recovery_cleanup.
> 
> Node 1   Node 2(master)
> dlm_purge_lockres
>  dlm_deref_lockres_handler
> 
>  DLM_LOCK_RES_SETREF_INPROG is set
>  response DLM_DEREF_RESPONSE_INPROG
> 
> receive DLM_DEREF_RESPONSE_INPROG
> stop puring in dlm_purge_lockres
> and wait for DLM_DEREF_RESPONSE_DONE
> 
>  dispatch dlm_deref_lockres_worker
>  response DLM_DEREF_RESPONSE_DONE
> 
> receive DLM_DEREF_RESPONSE_DONE and
> prepare to purge lockres
> 
>  Node 2 goes down
> 
> find Node2 down and do local
> clean up for Node2:
> dlm_do_local_recovery_cleanup
>   -> clear DLM_LOCK_RES_DROPPING_REF
> 
> when purging lockres, BUG_ON happens
> because DLM_LOCK_RES_DROPPING_REF is clear:
> dlm_deref_lockres_done_handler
>   ->BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
> 
> Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message")
> Signed-off-by: Jun Piao 
> ---
>  fs/ocfs2/dlm/dlmmaster.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 9aed6e2..f72e7ae 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -2416,7 +2416,16 @@ int dlm_deref_lockres_done_handler(struct o2net_msg 
> *msg, u32 len, void *data,
>   }
>  
>   spin_lock(>spinlock);
> - BUG_ON(!(res->state & DLM_LOCK_RES_DROPPING_REF));
> + if (!(res->state & DLM_LOCK_RES_DROPPING_REF)) {
> + spin_unlock(>spinlock);
> + spin_unlock(>spinlock);
> + mlog(ML_NOTICE, "%s:%.*s: node %u sends deref done "
> + "but it is already derefed!\n", dlm->name,
> + res->lockname.len, res->lockname.name, node);
> + dlm_lockres_put(res);
So we treat this case as normal?
If so, we'd better return 0 other than -EINVAL.

Thanks,
Joseph

> + goto done;
> + }
> +
>   if (!list_empty(>purge)) {
>   mlog(0, "%s: Removing res %.*s from purgelist\n",
>   dlm->name, res->lockname.len, res->lockname.name);
> @@ -2455,6 +2464,8 @@ int dlm_deref_lockres_done_handler(struct o2net_msg 
> *msg, u32 len, void *data,
>  
>   spin_unlock(>spinlock);
>  
> + ret = 0;
> +
>  done:
>   dlm_put(dlm);
>   return ret;
> 



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


  1   2   3   4   5   >