from:"Andrew Morton"

Re: [Ocfs2-devel] [PATCH] ocfs2: ocfs2_inode_lock_tracker does not distinguish lock level

2018-05-11 Thread Andrew Morton

On Fri, 11 May 2018 12:16:51 +0800 Larry Chen  wrote:

> > Nice changelog, but it gives no information about the severity of the
> > bug: how often does it hit and what is the end-user impact.
> >
> > This info is needed so that I and others can decide which kernel
> > version(s) need the patch, so please always include it when fixing a
> > bug, thanks.
> 
> Thanks for your review and feel sorry for not providing enough information.
> 
> For the status quo of ocfs2, without this patch, neither a bug nor end-user
> impact will be caused because the wrong logic is avoided.
> 
> But I'm afraid this generic interface, may be called by other
> developers in future and used in this situation.
> 
>      a process
> ocfs2_inode_lock_tracker(ex=0)
> ocfs2_inode_lock_tracker(ex=1)

OK, thanks.

> By the way, should I resend this patch with this info included?

I pasted the above into my copy of the changelog so we're good.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: ocfs2_inode_lock_tracker does not distinguish lock level

2018-05-10 Thread Andrew Morton

On Thu, 10 May 2018 13:32:30 +0800 Larry Chen  wrote:

> ocfs2_inode_lock_tracker as a variant of ocfs2_inode_lock,
> is used to prevent deadlock due to recursive lock acquisition.
> 
> But this function does not distinguish
> whether the requested level is EX or PR.
> 
> If a RP lock has been attained, this function
> will immediately return success afterwards even
> an EX lock is requested.
> 
> But actually the return value does not mean that
> the process got a EX lock, because ocfs2_inode_lock
> has not been called.
> 
> When taking lock levels into account, we face some different situations.
> 1. no lock is held
>In this case, just lock the inode and return 0
> 
> 2. We are holding a lock
>For this situation, things diverges into several cases
> 
>wanted holding  what to do
>ex ex  see 2.1 below
>ex pr  see 2.2 below
>pr ex  see 2.1 below
>pr pr  see 2.1 below
> 
>2.1 lock level that is been held is compatible
>with the wanted level, so no lock action will be tacken.
> 
>2.2 Otherwise, an upgrade is needed, but it is forbidden.
> 
> Reason why upgrade within a process is forbidden is that
> lock upgrade may cause dead lock. The following illustrate
> how it happens.
> 
> process 1 process 2
> ocfs2_inode_lock_tracker(ex=0)
><==   ocfs2_inode_lock_tracker(ex=1)
> 
> ocfs2_inode_lock_tracker(ex=1)
> 

Nice changelog, but it gives no information about the severity of the
bug: how often does it hit and what is the end-user impact.

This info is needed so that I and others can decide which kernel
version(s) need the patch, so please always include it when fixing a
bug, thanks.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH V2] ocfs2: Take inode cluster lock before moving reflinked inode from orphan dir

2018-05-01 Thread Andrew Morton

On Wed, 11 Apr 2018 12:31:47 -0700 Ashish Samant  
wrote:

> While reflinking an inode, we create a new inode in orphan directory, then
> take EX lock on it, reflink the original inode to orphan inode and release
> EX lock. Once the lock is released another node could request it in EX mode
> from ocfs2_recover_orphans() which causes downconvert of the lock, on this
> node, to NL mode.
> 
> Later we attempt to initialize security acl for the orphan inode and move
> it to the reflink destination. However, while doing this we dont take EX
> lock on the inode. This could potentially cause problems because we could
> be starting transaction, accessing journal and modifying metadata of the
> inode while holding NL lock and with another node holding EX lock on the
> inode.

Someone (eg, me) needs to decide which kernel version(s) need the
patch.  And all I have is "this could potentially cause problems".

Can you please be more specific about the end-user impact?  *Does* it
cause problems?  Is the problem sufficiently urgent to warrant a merge
into 4.17?  Into -stable kernels?

Please always include all the end-user-impact info when fixing bugs.

Thanks.

> Fix this by taking orphan inode cluster lock in EX mode before
> initializing security and moving orphan inode to reflink destination.
> Use the __tracker variant while taking inode lock to avoid recursive
> locking in the ocfs2_init_security_and_acl() call chain.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: don't evaluate buffer head to NULL managed by caller

2018-03-29 Thread Andrew Morton

On Thu, 29 Mar 2018 10:06:02 +0800 Changwei Ge  wrote:

> ocfs2_read_blocks() is used to read several blocks from disk.
> Currently, the input argument *bhs* can be NULL or NOT. It depends on
> the caller's behavior. If the function fails in reading blocks from
> disk, the corresponding bh will be assigned to NULL and put.
> 
> Obviously, above process for non-NULL input bh is not appropriate.
> Because the caller doesn't even know its bhs are put and re-assigned.
> 
> If buffer head is managed by caller, ocfs2_read_blocks should not
> evaluate it to NULL. It will cause caller accessing illegal memory,
> thus crash.

(What about ocfs2_read_blocks_sync()?)

Passing non-NULL entries in bhs[] looks like a weird thing to do.  Do
any callers actually do this?  And of they do, do they actually care
about the alteration of bhs[] if the call failed?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: don't handle migrate lockres if already in shutdown

2018-03-01 Thread Andrew Morton

On Thu, 1 Mar 2018 20:37:50 +0800 piaojun  wrote:

> Hi Changwei,
> 
> Thanks for your quick reply, please see my comments below.
> 
> On 2018/3/1 17:39, Changwei Ge wrote:
> > Hi Jun,
> > 
> > On 2018/3/1 17:27, piaojun wrote:
> >> We should not handle migrate lockres if we are already in
> >> 'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after
> >> leaving dlm domain. At last other nodes will get stuck into infinite
> >> loop when requsting lock from us.
> >>
> >>  N1 N2 (owner)
> >> touch file
> >>
> >> access the file,
> >> and get pr lock
> >>
> >> umount
> >>
> > 
> > Before migrating all lock resources, N1 should have already sent 
> > DLM_BEGIN_EXIT_DOMAIN_MSG in dlm_begin_exit_domain().
> > N2 will set ->exit_domain_map later.
> > So N2 can't take N1 as migration target.
> Before receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as
> the migrate target. So N2 will continue sending lockres to N1 even though
> N1 has left domain. Sorry for making you misunderstanding, I will give a
> more detailed description.
> 
> N1 N2 (owner)
>touch file
> 
> access the file,
> and get pr lock
> 
>begin leave domain and
>pick up N1 as new owner
> 
> begin leave domain and
> migrate all lockres done
> 
>begin migrate lockres to N1
> 
> end leave domain, but
> the lockres left
> unexpectedly, because
> migrate task has passed

If someone asked a question then this is a sign that the changelog was
missing details.  So please do send along a v2 with a more
comprehensive changelog.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2: make metadata estimation accurate and clear

2018-01-08 Thread Andrew Morton

On Mon, 8 Jan 2018 01:36:05 + Changwei Ge  wrote:

> Current code assume that ::w_unwritten_list always has only one item on.
> This is not right and hard to get understood.
> So improve how to count unwritten item.

I get a large number of patch rejects applying this (and patch 2) to
current mainline.  What kernel version are these against?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGE

2018-01-05 Thread Andrew Morton

  0   12140   7268   1456 S 0.333 0.355   0:00.34 haveged
 1282 root  rt   0  84 123108  97224 S 0.333 6.017   0:01.33 corosync

ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o
~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d
/dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared
Tests with "-b 4096 -C 32768"
Thu Dec 28 15:04:12 CST 2017
multi_mmap..Passed.
Runtime 487 seconds.

Link: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_1514447305-2D30814-2D1-2Dgit-2Dsend-2Demail-2Dghe-40suse.com&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=C7gAd4uDxlAvTdc0vmU6X8CMk6L2iDY8-HD0qT6Fo7Y&m=M21nNntO7tCMZO9CQ1nGLTIQwQ6xQ3PgJCJKmqD9z70&s=5WyCM97hXAgkY6xKX9EY5dBcNi7g1rmMGG0sYehl4fs&e=
Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock")
Signed-off-by: Gang He 
Reviewed-by: Eric Ren 
Acked-by: alex chen 
Acked-by: piaojun 
Cc: Mark Fasheh 
Cc: Joel Becker 
Cc: Junxiao Bi 
Cc: Joseph Qi 
Cc: Changwei Ge 
Signed-off-by: Andrew Morton 
---

 fs/ocfs2/dlmglue.c |9 +
 1 file changed, 9 insertions(+)

diff -puN 
fs/ocfs2/dlmglue.c~ocfs2-try-a-blocking-lock-before-return-aop_truncated_page 
fs/ocfs2/dlmglue.c
--- 
a/fs/ocfs2/dlmglue.c~ocfs2-try-a-blocking-lock-before-return-aop_truncated_page
+++ a/fs/ocfs2/dlmglue.c
@@ -2529,6 +2529,15 @@ int ocfs2_inode_lock_with_page(struct in
ret = ocfs2_inode_lock_full(inode, ret_bh, ex, OCFS2_LOCK_NONBLOCK);
if (ret == -EAGAIN) {
unlock_page(page);
+   /*
+* If we can't get inode lock immediately, we should not return
+* directly here, since this will lead to a softlockup problem.
+* The method is to get a blocking lock and immediately unlock
+* before returning, this can avoid CPU resource waste due to
+* lots of retries, and benefits fairness in getting lock.
+*/
+   if (ocfs2_inode_lock(inode, ret_bh, ex) == 0)
+   ocfs2_inode_unlock(inode, ex);
ret = AOP_TRUNCATED_PAGE;
}
 
_


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing

2017-12-18 Thread Andrew Morton

On Tue, 19 Dec 2017 09:24:17 +0800 Joseph Qi  wrote:

> > --- 
> > a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
> > +++ a/fs/ocfs2/aops.c
> > @@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
> >   * Will look for holes and unwritten extents in the range starting at
> >   * pos for count bytes (inclusive).
> >   */
> > -static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> > -  size_t count)
> > +static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t 
> > count)
> >  {
> > -   int ret = 0;
> > +   bool ret = false;
> 
> I have a different opinion here. If we change return value from int to
> bool, the error returned by ocfs2_get_clusters cannot be reflected. So
> I'd prefer the original version.

But that error code is not used?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing

2017-12-18 Thread Andrew Morton

On Mon, 18 Dec 2017 12:06:21 + Changwei Ge  wrote:

> Before ocfs2 supporting allocating clusters while doing append-dio, all append
> dio will fall back to buffer io to allocate clusters firstly. Also, when it
> steps on a file hole, it will fall back to buffer io, too. But for current
> code, writing to file hole will leverage dio to allocate clusters. This is not
> right, since whether append-io is enabled tells the capability whether ocfs2 
> can
> allocate space while doing dio.
> So introduce file hole check function back into ocfs2.
> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will 
> fall
> back to buffer IO to allocate clusters.
> 

hm, that's a bit hard to understand.  Hopefully reviewers will know
what's going on ;)

> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb,
>   return ret;
>   }
>   
> +/*
> + * Will look for holes and unwritten extents in the range starting at
> + * pos for count bytes (inclusive).
> + */
> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> +size_t count)
> +{
> + int ret = 0;
> + unsigned int extent_flags;
> + u32 cpos, clusters, extent_len, phys_cpos;
> + struct super_block *sb = inode->i_sb;
> +
> + cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
> + clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
> +
> + while (clusters) {
> + ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
> +  &extent_flags);
> + if (ret < 0) {
> + mlog_errno(ret);
> + goto out;
> + }
> +
> + if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
> + ret = 1;
> + break;
> + }
> +
> + if (extent_len > clusters)
> + extent_len = clusters;
> +
> + clusters -= extent_len;
> + cpos += extent_len;
> + }
> +out:
> + return ret;
> +}

A few thoughts:

- a function which does "check_foo" isn't well named.  Because the
  reader cannot determine whether the return value means "foo is true"
  or "foo is false".

  So a better name for this function is ocfs2_range_has_holes(), so
  the reader immediately understand what its return value *means".

  Also a bool return value is more logical.

- Mixing "goto out" with "break" as above is a bit odd.

So...


--- 
a/fs/ocfs2/aops.c~ocfs2-fall-back-to-buffer-io-when-append-dio-is-disabled-with-file-hole-existing-fix
+++ a/fs/ocfs2/aops.c
@@ -2469,10 +2469,9 @@ static int ocfs2_dio_end_io(struct kiocb
  * Will look for holes and unwritten extents in the range starting at
  * pos for count bytes (inclusive).
  */
-static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
-  size_t count)
+static bool ocfs2_range_has_holes(struct inode *inode, loff_t pos, size_t 
count)
 {
-   int ret = 0;
+   bool ret = false;
unsigned int extent_flags;
u32 cpos, clusters, extent_len, phys_cpos;
struct super_block *sb = inode->i_sb;
@@ -2489,8 +2488,8 @@ static int ocfs2_check_range_for_holes(s
}
 
if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
-   ret = 1;
-   break;
+   ret = true;
+   goto out;
}
 
if (extent_len > clusters)
_


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: using the OCFS2_XATTR_ROOT_SIZE macro in ocfs2_reflink_xattr_header()

2017-12-12 Thread Andrew Morton

On Mon, 11 Dec 2017 14:24:08 +0800 alex chen  wrote:

> Using the OCFS2_XATTR_ROOT_SIZE macro improves the readability of the code.
> 
> Signed-off-by: Alex Chen 
> Reviewed-by: Jun Piao 
> ---
>  fs/ocfs2/xattr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
> index 5fdf269..ca3b61a 100644
> --- a/fs/ocfs2/xattr.c
> +++ b/fs/ocfs2/xattr.c
> @@ -6415,7 +6415,7 @@ static int ocfs2_reflink_xattr_header(handle_t *handle,
>* and then insert the extents one by one.
>*/
>   if (xv->xr_list.l_tree_depth) {
> - memcpy(new_xv, &def_xv, sizeof(def_xv));
> + memcpy(new_xv, &def_xv, OCFS2_XATTR_ROOT_SIZE);
>   vb->vb_xv = new_xv;
>   vb->vb_bh = value_bh;
>   ocfs2_init_xattr_value_extent_tree(&data_et,

OK.

But what's wrong with

*new_xv = def_xv;

?

That gets typechecked and the compiler may be able to perform
some optimizations...


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: add trimfs dlm lock

2017-12-07 Thread Andrew Morton

On Thu,  7 Dec 2017 21:21:58 +0800 Gang He  wrote:

> As you know, ocfs2 has support trim the underlying disk via
> fstrim command. But there is a problem, ocfs2 is a shared storage
> cluster file system, if the user configures a scheduled fstrim
> job on each file system node, this will trigger multiple nodes
> trim a shared disk simultaneously, it is very wasteful for CPU
> and IO consumption.
> Then, we introduce a trimfs dlm lock, which will make only one
> fstrim command is running on the shared disk among the cluster,
> the other fstrim command should be returned with -EBUSY errno.

Newly returning -EBUSY sounds a bit rude.  And non-backward-compatible. 
Would it be better for the other fstrim callers to wait until the
operation has completed then return success?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] ocfs patches for review

2017-11-30 Thread Andrew Morton


I'm sitting on a bunch of ocfs2 patches which need additional review,
please.  I'll send them out now.  I have a couple of notes-to-self:


Subject: ocfs2: remove ocfs2_is_o2cb_active()
Subject: ocfs2: give an obvious tip for mismatched cluster names

  Joseph had issues with this one

Subject: ocfs2: move some definitions to header file
Subject: ocfs2: fix some small problems
Subject: ocfs2: add kobject for online file check
Subject: ocfs2: add duplicated ino number check
Subject: ocfs2: fix qs_holds may could not be zero

  Needs a new changelog and a fixed comment

Subject: ocfs2/dlm: wait for dlm recovery done when migrating all lockres
Subject: ocfs2: add ocfs2_try_rw_lock() and ocfs2_try_inode_lock()
Subject: ocfs2: add ocfs2_overwrite_io()
Subject: ocfs2: nowait aio support


Thanks.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: The goto is not useful in the function ocfs2_reserve_cluster_bitmap_bits, so remove it.

2017-11-15 Thread Andrew Morton

On Thu, 16 Nov 2017 00:42:09 + Changwei Ge  wrote:

> Hi Zhonghua,
> On 2017/11/15 20:04, Guozhonghua wrote:
> > The goto is not useful anymore, removed from the context.
> 
> Perhaps we can make this change-log more clear like:
> The bail declare is not necessary any more, so trim it. If code path 
> falls into error branch, ocfs2_reserve_cluster_bitmap_bits will return 
> in next following step, too.
> 
> And this title can be changed into 'ocfs2: clean up unnecessary bail 
> declare'
> 
> I suppose after that this patch will be neater.
> 
> Can you resend this patch?
> Moreover, I think we should also CC this patch to OCFS2 maintainers.

The patch is OK.  I reworked it as below.  It's so obvious that no changelog
is needed.


From: Guozhonghua 
Subject: ocfs2: remove unneeded goto in ocfs2_reserve_cluster_bitmap_bits()

Link: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_71604351584F6A4EBAE558C676F37CA4F3CDE3A9-40H3CMLB14-2DEX.srv.huawei-2D3com.com&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=C7gAd4uDxlAvTdc0vmU6X8CMk6L2iDY8-HD0qT6Fo7Y&m=lcqqp5PvLSjaJ5MT6uqUC9J6gAhx8KP5Fd62d4ii4BA&s=KqlZy8JoC39DDT5b_TM6C12Vtg7AZhEE0QsYSC9ZJ30&e=
Signed-off-by: guozhonghua 
Cc: Mark Fasheh 
Cc: Joel Becker 
Cc: Junxiao Bi 
Cc: Joseph Qi 
Signed-off-by: Andrew Morton 
---

 fs/ocfs2/suballoc.c |5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff -puN 
fs/ocfs2/suballoc.c~ocfs2-the-goto-is-not-useful-in-the-function-ocfs2_reserve_cluster_bitmap_bits-so-remove-it
 fs/ocfs2/suballoc.c
--- 
a/fs/ocfs2/suballoc.c~ocfs2-the-goto-is-not-useful-in-the-function-ocfs2_reserve_cluster_bitmap_bits-so-remove-it
+++ a/fs/ocfs2/suballoc.c
@@ -1147,12 +1147,9 @@ int ocfs2_reserve_cluster_bitmap_bits(st
 GLOBAL_BITMAP_SYSTEM_INODE,
 OCFS2_INVALID_SLOT, NULL,
 ALLOC_NEW_GROUP);
-   if (status < 0 && status != -ENOSPC) {
+   if (status < 0 && status != -ENOSPC)
mlog_errno(status);
-   goto bail;
-   }
 
-bail:
return status;
 }
 
_


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: should wait dio before inode lock in ocfs2_setattr()

2017-11-01 Thread Andrew Morton

On Tue, 31 Oct 2017 14:20:38 +0800 alex chen  wrote:

> we should wait dio requests to finish before inode lock in
> ocfs2_setattr(), otherwise the following deadlock will be happened:
> process 1  process 2process 3
> truncate file 'A'  end_io of writing file 'A'   receiving the bast 
> messages
> ocfs2_setattr
>  ocfs2_inode_lock_tracker
>   ocfs2_inode_lock_full
>  inode_dio_wait
>   __inode_dio_wait
>   -->waiting for all dio
>   requests finish
> dlm_proxy_ast_handler
>  dlm_do_local_bast
>   ocfs2_blocking_ast
>
> ocfs2_generic_handle_bast
> set 
> OCFS2_LOCK_BLOCKED flag
> dio_end_io
>  dio_bio_end_aio
>   dio_complete
>ocfs2_dio_end_io
> ocfs2_dio_end_io_write
>  ocfs2_inode_lock
>   __ocfs2_cluster_lock
>ocfs2_wait_for_mask
>-->waiting for OCFS2_LOCK_BLOCKED
>flag to be cleared, that is waiting
>for 'process 1' unlocking the inode lock
>inode_dio_end
>-->here dec the i_dio_count, but will never
>be called, so a deadlock happened.
> 

This sounds like something which should be backported into -stable
kernels.  Do you agree?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [patch] ocfs2: fix qs_holds may could not be zero

2017-10-17 Thread Andrew Morton

On Thu, 21 Sep 2017 02:09:33 + Zhangyang  wrote:

> In our test, We fond that , when the network down, qs->qs_holds could not be 
> reduce to zero, it will lead to the node can't do fence.
> 
> 
> 
> o2net_idle_timer -> o2quo_conn_err -> qs->qs_holds++, after 
> O2NET_QUORUM_DELAY_MS if qs_holds could be subtract to zero, it could do 
> make_decision.
> 
> But if there are many nodes, when one node network down which contains o2net 
> connections may not do o2net_idle_timer at the same time.
> 
> So when a o2net_node have done nn->nn_still_up, but the qs_holds is not zero. 
> because the other o2net_node have not done nn->nn_still_up.
> 
> So the first o2net_node will do o2net_idle_timer again, and the qs_holds 
> could be add again. And the qs_holds is global variable, so it formed a loop, 
> the node could not do o2quo_make_decision, because of qs_holds never be zero.
> 
> 
> 
> I alter the function o2quo_conn_err, take o2quo_set_hold under control of the 
> bit map qs_conn_bm.

I merged this, subject to review by the ocfs2 maintainers.

The changelog and the comment are really hard to understand.  Perhaps
one of the ocfs2 developers could suggest some more clear words to use?

Thanks.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] mmotm 2016-08-02-15-53 uploaded

2017-10-10 Thread Andrew Morton

On Tue, 10 Oct 2017 14:06:41 -0400 Vitaly Mayatskih  
wrote:

> * ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery
> -master-goes-down.patch
> 
> This one completely broke two node cluster use case: when one node dies,
> the other one either eventually crashes (~4.14-rc4) or locks up (pre-4.14).

Are you sure?

Are you able to confirm that reverting this patch (ee8f7fcbe638b07e8)
and only this patch fixes up current mainline kernels?

Are you able to supply more info on the crashes and lockups so that the
ocfs2 developers can understand the failures?

Thanks.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix deadlock caused by recursive locking in xattr

2017-06-22 Thread Andrew Morton

On Thu, 22 Jun 2017 14:10:38 +0800 Joseph Qi  wrote:

> Looks good.
> Reviewed-by: Joseph Qi 

Should this fix be backported into -stable kernels?

> On 17/6/22 09:47, Eric Ren wrote:
> > Another deadlock path caused by recursive locking is reported.
> > This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2:
> > take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths
> > have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when
> > taking inode lock at vfs entry points"). Yes, we intend to fix this
> > kind of case in incremental way, because it's hard to find out all
> > possible paths at once.
> > 
> > This one can be reproduced like this. On node1, cp a large file from
> > home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl.
> > Both nodes will hang up there. The backtraces:
> > 
> > On node1:
> > [] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
> > [] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
> > [] ocfs2_write_begin+0x43/0x1a0 [ocfs2]
> > [] generic_perform_write+0xa9/0x180
> > [] __generic_file_write_iter+0x1aa/0x1d0
> > [] ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
> > [] __vfs_write+0xc3/0x130
> > [] vfs_write+0xb1/0x1a0
> > [] SyS_write+0x46/0xa0
> > 
> > On node2:
> > [] __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
> > [] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
> > [] ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
> > [] ocfs2_set_acl+0x22d/0x260 [ocfs2]
> > [] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
> > [] set_posix_acl+0x75/0xb0
> > [] posix_acl_xattr_set+0x49/0xa0
> > [] __vfs_setxattr+0x69/0x80
> > [] __vfs_setxattr_noperm+0x72/0x1a0
> > [] vfs_setxattr+0xa7/0xb0
> > [] setxattr+0x12d/0x190
> > [] path_setxattr+0x9f/0xb0
> > [] SyS_setxattr+0x14/0x20
> > 
> > Fixes this one by using ocfs2_inode_{lock|unlock}_tracker, which is
> > exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking
> > logic to avoid recursive cluster lock").
> > 
> > Changes since v1:
> > - Revised git commit description style in commit log.
> > 
> > Reported-by: Thomas Voegtle 
> > Tested-by: Thomas Voegtle 
> > Signed-off-by: Eric Ren 
> > ---
> >  fs/ocfs2/dlmglue.c |  4 
> >  fs/ocfs2/xattr.c   | 23 +--
> >  2 files changed, 17 insertions(+), 10 deletions(-)
> > 
> > diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> > index 3b7c937..4689940 100644
> > --- a/fs/ocfs2/dlmglue.c
> > +++ b/fs/ocfs2/dlmglue.c
> > @@ -2591,6 +2591,10 @@ void ocfs2_inode_unlock_tracker(struct inode *inode,
> > struct ocfs2_lock_res *lockres;
> >  
> > lockres = &OCFS2_I(inode)->ip_inode_lockres;
> > +   /* had_lock means that the currect process already takes the cluster
> > +* lock previously. If had_lock is 1, we have nothing to do here, and
> > +* it will get unlocked where we got the lock.
> > +*/
> > if (!had_lock) {
> > ocfs2_remove_holder(lockres, oh);
> > ocfs2_inode_unlock(inode, ex);
> > diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
> > index 3c5384d..f70c377 100644
> > --- a/fs/ocfs2/xattr.c
> > +++ b/fs/ocfs2/xattr.c
> > @@ -1328,20 +1328,21 @@ static int ocfs2_xattr_get(struct inode *inode,
> >void *buffer,
> >size_t buffer_size)
> >  {
> > -   int ret;
> > +   int ret, had_lock;
> > struct buffer_head *di_bh = NULL;
> > +   struct ocfs2_lock_holder oh;
> >  
> > -   ret = ocfs2_inode_lock(inode, &di_bh, 0);
> > -   if (ret < 0) {
> > -   mlog_errno(ret);
> > -   return ret;
> > +   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 0, &oh);
> > +   if (had_lock < 0) {
> > +   mlog_errno(had_lock);
> > +   return had_lock;
> > }
> > down_read(&OCFS2_I(inode)->ip_xattr_sem);
> > ret = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
> >  name, buffer, buffer_size);
> > up_read(&OCFS2_I(inode)->ip_xattr_sem);
> >  
> > -   ocfs2_inode_unlock(inode, 0);
> > +   ocfs2_inode_unlock_tracker(inode, 0, &oh, had_lock);
> >  
> > brelse(di_bh);
> >  
> > @@ -3537,11 +3538,12 @@ int ocfs2_xattr_set(struct inode *inode,
> >  {
> > struct buffer_head *di_bh = NULL;
> > struct ocfs2_dinode *di;
> > -   int ret, credits, ref_meta = 0, ref_credits = 0;
> > +   int ret, credits, had_lock, ref_meta = 0, ref_credits = 0;
> > struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> > struct inode *tl_inode = osb->osb_tl_inode;
> > struct ocfs2_xattr_set_ctxt ctxt = { NULL, NULL, NULL, };
> > struct ocfs2_refcount_tree *ref_tree = NULL;
> > +   struct ocfs2_lock_holder oh;
> >  
> > struct ocfs2_xattr_info xi = {
> > .xi_name_index = name_index,
> > @@ -3572,8 +3574,9 @@ int ocfs2_xattr_set(struct inode *inode,
> > return -ENOMEM;
> > }
> >  
> > -   ret = ocfs2_inode_lock(inode, &di_bh, 1);
> > -   if (ret < 0) {
> > +   had_lock = ocfs2_inode_lock_tracker(inode, &di_bh, 1, &oh);
> > +   if (had_lock < 0) {
> > +

Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix a static checker warning

2017-05-23 Thread Andrew Morton

On Tue, 23 May 2017 13:17:14 +0800 Gang He  wrote:

> This patch will fix a static code checker warning, which looks 
> like below,
> fs/ocfs2/inode.c:179 ocfs2_iget()
> warn: passing zero to 'ERR_PTR'
> 
> this warning was caused by the 
> commit d56a8f32e4c6 ("ocfs2: check/fix inode block for online file check").
> after apply this patch, the error return value will not be NULL(zero).
> 
> Signed-off-by: Gang He 
> ---
>  fs/ocfs2/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 382401d..1a1e007 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -136,7 +136,7 @@ struct inode *ocfs2_ilookup(struct super_block *sb, u64 
> blkno)
>  struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>int sysfile_type)
>  {
> - int rc = 0;
> + int rc = -ESTALE;
>   struct inode *inode = NULL;
>   struct super_block *sb = osb->sb;
>   struct ocfs2_find_inode_args args;

hm, OK, thanks.  The resulting code still looks rather weird.  Could
someone who works in this area please take a look?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix a static checker warning

2017-05-22 Thread Andrew Morton

On Mon, 22 May 2017 12:54:37 +0800 Gang He  wrote:

> This patch will fix a static checker warning, this warning was
> caused by commit d56a8f32e4c662509ce50a37e78fa66c777977d3. after
> apply this patch, the error return value will not be NULL(zero).

When identifying another commit, please use the form

d56a8f32e4c6 ("ocfs2: check/fix inode block for online file check")


And when fixing warnings such as this, please quote the warning in full
and identify which checker tool generated that warning.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: o2hb: revert hb threshold to keep compatible

2017-03-28 Thread Andrew Morton

On Wed, 29 Mar 2017 09:07:08 +0800 Junxiao Bi  wrote:

> On 03/29/2017 06:31 AM, Andrew Morton wrote:
> > On Tue, 28 Mar 2017 09:40:45 +0800 Junxiao Bi  wrote:
> > 
> >> Configfs is the interface for ocfs2-tools to set configure to
> >> kernel. Change heartbeat dead threshold name in configfs will
> >> cause compatible issue, so revert it.
> >>
> >> Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store 
> >> methods")
> > 
> > I don't get it.  45b997737a80 was merged nearly two years ago, so isn't
> > it a bit late to fix compatibility issues?
> > 
> This compatibility will not cause ocfs2 down, just some configure (hb
> dead threshold) lose effect. If someone want to use the new kernel, they
> should apply this fix.

Well could someone please send a better changelog?  One which carefully
describes the present behaviour, what is wrong with it and how the
patch fixes it?

One reason for doing this is to permit effecitive patch review.

Another reason is to permit others to decide whether the patch should
be backported into -stable kernels.

Yet another reason is so that maintainers of other kernels can
determine whether this patch will fix behaviour which their users are
reporting.

Thanks.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: o2hb: revert hb threshold to keep compatible

2017-03-28 Thread Andrew Morton

On Tue, 28 Mar 2017 09:40:45 +0800 Junxiao Bi  wrote:

> Configfs is the interface for ocfs2-tools to set configure to
> kernel. Change heartbeat dead threshold name in configfs will
> cause compatible issue, so revert it.
> 
> Fixes: 45b997737a80 ("ocfs2/cluster: use per-attribute show and store 
> methods")

I don't get it.  45b997737a80 was merged nearly two years ago, so isn't
it a bit late to fix compatibility issues?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2/journal: fix umount hang after flushing journal failure

2017-01-09 Thread Andrew Morton

On Sat, 7 Jan 2017 12:01:06 + Gechangwei  wrote:

> Hi,
> 
> When journal flushing in ocfs2_commit_cache() fails, following umount 
> procedure at stage of shutting journal may be blocked due to non-zero 
> transactions number.
> 
> Once jbd2_journal_flush() fails, the journal will be marked as ABORT state 
> inner JBD2. There is no way to come back afterwards.
> 
> So there is no chance to set transactions number to zero, thus, shutting 
> journal may be blocked .
> 
> 
> ocfs2_commit_thread()
> 
> ocfs2_commit_cache()
> 
> jbd2_journal_flush() -> failure takes journal into ABORT state, thus, 
> transaction number will never set to zero
> 
> 
> The back trace is cited blow:
> 
> [] kthread_stop+0x4c/0x150
> [] ocfs2_journal_shutdown+0xa7/0x400 [ocfs2]
> [] ocfs2_dismount_volume+0xbe/0x4a0 [ocfs2]
> [] ocfs2_put_super+0x37/0xb0 [ocfs2]
> [] generic_shutdown_super+0x7e/0x110
> [] kill_block_super+0x30/0x80
> [] deactivate_locked_super+0x59/0x90
> [] deactivate_super+0x4e/0x70
> [] cleanup_mnt+0x43/0x90
> [] __cleanup_mnt+0x12/0x20
> [] task_work_run+0xb7/0xf0
> [] do_notify_resume+0x8c/0xa0
> [] int_signal+0x12/0x17
> [] 0x
> 
> Through crash debug tool, It can seen that j_num_trans field is set to 2.
> 
> struct ocfs2_journal {
>   j_state = OCFS2_JOURNAL_IN_SHUTDOWN,
>   j_journal = 0x8800b4926000,
>   j_inode = 0x880122aba298,
>   j_osb = 0x880093caa000,
>   j_bh = 0x8800a09df888,
>   j_num_trans = {
> counter = 0x2
>   },
>   j_lock = {
> {
>   rlock = {
> raw_lock = {
>   {
> 
> 
> To solve this issue, I propose a patch. Any comments will be welcomed.
> 
> Since journal has been marked as ABORT and flushing journal failure will free 
> all corresponding buffer heads, it will be safe to directly set transactions 
> number to zero.
> 
> 
> From 98f42f5f52851ed84eb372a3e09a413a30ea2664 Mon Sep 17 00:00:00 2001
> From: gechangwei 
> Date: Sat, 7 Jan 2017 19:48:13 +0800
> Subject: [PATCH] fix umount hang after flushing journal failure
> 
> Signed-off-by: gechangwei 
> ---
>  fs/ocfs2/journal.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
> index a244f14..dab2094 100644
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -326,6 +326,7 @@ static int ocfs2_commit_cache(struct ocfs2_super *osb)
>  status = jbd2_journal_flush(journal->j_journal);
>  jbd2_journal_unlock_updates(journal->j_journal);
>  if (status < 0) {
> +atomic_set(&journal->j_num_trans, 0);
>  up_write(&journal->j_trans_barrier);
>  mlog_errno(status);
>  goto finally;

This seems a little hacky, but I'll let the ocfs2 developers decide.

I do think there should be a code comment here, telling readers why
this operation is being performed.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix crash caused by stale lvb with fsdlm plugin

2016-12-20 Thread Andrew Morton

On Mon, 19 Dec 2016 17:55:06 +0800 Joseph Qi  wrote:

> > I'd like to see this quick and small fix to be merged at this moment, 
> > because this issue is little emergency for us.
> > Anyway, we can supersede this one easily if someone familiar with o2cb 
> > works out a patch for o2cb in the future.
> >
> > Does this sounds good to you?
> 
> Fine.
> Reviewed-by: Joseph Qi 

I'll add cc:stable to this one so the fix gets backported into older
kernels.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] MLE releases issue.

2016-11-18 Thread Andrew Morton

On Fri, 28 Oct 2016 07:14:20 + Gechangwei  wrote:

> Hi,
> During my test on OCFS2 suffering a storage failure, a crash issue was found.
> Below was the call trace when crashed.
> >From the call trace, we can see a MLE's reference count is going to be 
> >negative, which aroused a BUG_ON()
> 
> [143355.593258] Call Trace:
> [143355.593268]  [] dlm_put_mle_inuse+0x47/0x70 [ocfs2_dlm]
> [143355.593276]  [] dlm_get_lock_resource+0xac5/0x10d0 
> [ocfs2_dlm]
> [143355.593286]  [] ? ip_queue_xmit+0x14a/0x3d0
> [143355.593292]  [] ? kmem_cache_alloc+0x1e4/0x220
> [143355.593300]  [] ? dlm_wait_for_recovery+0x6c/0x190 
> [ocfs2_dlm]
> [143355.593311]  [] dlmlock+0x62d/0x16e0 [ocfs2_dlm]
> [143355.593316]  [] ? __alloc_skb+0x9b/0x2b0
> [143355.593323]  [] ? 0xc01f6000
> 
> 
> I think I probably have found the root cause of this issue. Please
> 
> **Node 1**  **Node 2**
> Storage 
> failure
> An assert master 
> message is sent to Node 1
> Treat Node2 as down
> Assert master handler
> Decrease MLE reference count
> Clean blocked MLE
> Decrease MLE reference count
> 
> 
> In the above scenario, both dlm_assert_master_handler and dlm_clean_block_mle 
> will decease MLE
> reference count, thus, in the following get_resouce procedure, the reference 
> count is going to be negative.
> 
> I propose a patch to solve this, please take review if you have any time.
> 

I don't think I've seen any discussion of this patch?  I'll queue it up
for testing in the meanwhile.


> ---
>  dlm/dlmmaster.c | 8 +++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/dlm/dlmmaster.c b/dlm/dlmmaster.c
> index b747854..0540414 100644
> --- a/dlm/dlmmaster.c
> +++ b/dlm/dlmmaster.c
> @@ -2020,7 +2020,7 @@ ok:
> 
> spin_lock(&mle->spinlock);
> if (mle->type == DLM_MLE_BLOCK || mle->type == 
> DLM_MLE_MIGRATION)
> -   extra_ref = 1;
> +   extra_ref = test_bit(assert->node_idx, 
> mle->maybe_map) ? 1 : 0;
> else {
> /* MASTER mle: if any bits set in the response map
>  * then the calling node needs to re-assert to clear
> @@ -3465,12 +3465,18 @@ static void dlm_clean_block_mle(struct dlm_ctxt *dlm,
> mlog(0, "mle found, but dead node %u would not have been "
>  "master\n", dead_node);
> spin_unlock(&mle->spinlock);
> +   } else if(mle->master != O2NM_MAX_NODES){
> +   mlog(ML_NOTICE, "mle found, master assert received, master 
> has "
> +"already set to %d.\n ", mle->master);
> +   spin_unlock(&mle->spinlock);
> } else {
> /* Must drop the refcount by one since the assert_master will
>  * never arrive. This may result in the mle being unlinked and
>  * freed, but there may still be a process waiting in the
>  * dlmlock path which is fine. */
> mlog(0, "node %u was expected master\n", dead_node);
> +   clear_bit(bit, mle->maybe_map);
> atomic_set(&mle->woken, 1);
> spin_unlock(&mle->spinlock);
> wake_up(&mle->wq);

There are quite a lot of issues here.

- The patch headers should be in `patch -p1' form.  So with
  "a/fs/ocfs2/dlm/dlmmaster.c", not "a/dlm/dlmmaster.c".

- Your email client makes a big mess: strange character encoding,
  tabs replaced with spaces, etc.  Please figure out how to send
  text/plain emails.  Mail a patch to yourself, check that it can still
  be applied.

- There are a few conding style issues.  Fixes:

--- a/fs/ocfs2/dlm/dlmmaster.c~mle-releases-issue-fix
+++ a/fs/ocfs2/dlm/dlmmaster.c
@@ -1935,7 +1935,7 @@ ok:
 
spin_lock(&mle->spinlock);
if (mle->type == DLM_MLE_BLOCK || mle->type == 
DLM_MLE_MIGRATION)
-   extra_ref = test_bit(assert->node_idx, mle->maybe_map) 
? 1 : 0;
+   extra_ref = test_bit(assert->node_idx, mle->maybe_map);
else {
/* MASTER mle: if any bits set in the response map
 * then the calling node needs to re-assert to clear
@@ -3338,7 +3338,7 @@ static void dlm_clean_block_mle(struct d
mlog(0, "mle found, but dead node %u would not have been "
 "master\n", dead_node);
spin_unlock(&mle->spinlock);
-   } else if(mle->master != O2NM_MAX_NODES){
+   } else if (mle->master != O2NM_MAX_NODES) {
mlog(ML_NOTICE, "mle found, master assert received, master has "
"already set to %d.\n ", mle->master);
spin_unlock(&mle->spinlock);
_


___
Ocfs2-devel m

Re: [Ocfs2-devel] [PATCH] ocfs2: Fix start offset to ocfs2_zero_range_for_truncate()

2016-09-14 Thread Andrew Morton

On Thu, 11 Aug 2016 16:12:27 -0700 Ashish Samant  
wrote:

> If we do fallocate with punch hole option on a reflink, with start offset
> on a cluster boundary and end offset somewhere in another cluster, we
> dont COW the first cluster starting at the start offset. But in this
> case, we were wrongly passing this cluster to
> ocfs2_zero_range_for_truncate() to zero out.
> 
> Fix this by skipping this cluster in such a scenario.

How serious is this bug?  It sounds like a data-corrupting error?  As
such, this is a high priority fix and it should be backported into the
-stable kernels?


Please always include such info when fixing bugs.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: free the mle while the res had one, to avoid mle memory leak.

2016-09-13 Thread Andrew Morton

On Tue, 13 Sep 2016 07:52:30 + Guozhonghua  wrote:

> In the function dlm_migrate_request_handler, while the ret is --EEXIST, the 
> mle should be freed, otherwise the memory will be leaked.
> 
> Signed-off-by: Guozhonghua 
> 
> --- ocfs2.orig/dlm/dlmmaster.c  2016-09-13 15:18:13.602684325 +0800
> +++ ocfs2/dlm/dlmmaster.c   2016-09-13 15:27:05.014675736 +0800
> @@ -3188,6 +3188,9 @@ int dlm_migrate_request_handler(struct o
> migrate->new_master,
> migrate->master);
> 
> +   if (ret < 0)
> +   kmem_cache_free(dlm_mle_cache, mle);
> +
> spin_unlock(&dlm->master_lock);
>  unlock:
> spin_unlock(&dlm->spinlock);

Looks OK to me.

I wonder if there's another bug here.  If dlm_add_migration_mle()
failed, is it correct to go ahead and detach `oldmle'?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] ocfs patches to review

2016-07-28 Thread Andrew Morton

It's that time again.  Just five patches this time, please.

ocfs2-insure-dlm-lockspace-is-created-by-kernel-module.patch
ocfs2-retry-on-enospc-if-sufficient-space-in-truncate-log.patch
ocfs2-dlm-disable-bug_on-when-dlm_lock_res_dropping_ref-is-cleared-before-dlm_deref_lockres_done_handler.patch
ocfs2-dlm-solve-a-bug-when-deref-failed-in-dlm_drop_lockres_ref.patch
ocfs2-dlm-continue-to-purge-recovery-lockres-when-recovery-master-goes-down.patch


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] ocfs2/dlm: continue to purge recovery lockres when recovery master goes down

2016-07-11 Thread Andrew Morton

On Sun, 10 Jul 2016 18:04:33 +0800 piaojun  wrote:

> We found a dlm-blocked situation caused by continuous breakdown of
> recovery masters described below. To solve this problem, we should purge
> recovery lock once detecting recovery master goes down.

I'm not sure which kernel version this patch is against, but I get
significant rejects when merging into 4.7-rc7.  Can you please redo it
against the latest Linus tree?

Thanks.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] ocfs2/dlm: continue to purge recovery lockres when recovery master goes down

2016-07-11 Thread Andrew Morton

On Mon, 11 Jul 2016 13:34:30 -0700 Andrew Morton  
wrote:

> On Sun, 10 Jul 2016 18:04:33 +0800 piaojun  wrote:
> 
> > We found a dlm-blocked situation caused by continuous breakdown of
> > recovery masters described below. To solve this problem, we should purge
> > recovery lock once detecting recovery master goes down.
> 
> I'm not sure which kernel version this patch is against, but I get
> significant rejects when merging into 4.7-rc7.  Can you please redo it
> against the latest Linus tree?

Oh, OK, it's probably due to changes in your earlier patches.  It helps
if you use sequence numbering:

[patch 1/3] ocfs2/dlm: disable BUG_ON when DLM_LOCK_RES_DROPPING_REF, is 
cleared before dlm_deref_lockres_done_handler
[patch 2/3] ocfs2/dlm: solve a BUG when deref failed in dlm_drop_lockres_ref
etc

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: improve recovery performance

2016-07-08 Thread Andrew Morton

On Thu,  7 Jul 2016 10:24:48 +0800 Junxiao Bi  wrote:

> Journal replay will be run when do recovery for a dead node,
> to avoid the stale cache impact, all blocks of dead node's
> journal inode were reload from disk. This hurts the performance,
> check whether one block is cached before reload it can improve
> a lot performance. In my test env, the time doing recovery was
> improved from 120s to 1s.

So since v1 you did this (unchangelogged bugfix!):

--- a/fs/ocfs2/journal.c~ocfs2-improve-recovery-performance-v2
+++ a/fs/ocfs2/journal.c
@@ -1194,6 +1194,7 @@ static int ocfs2_force_read_journal(stru
 
brelse(bh);
bh = NULL;
+   p_blkno++;
}
 
v_blkno += p_blocks;


I suppose this is a bit neater?

--- a/fs/ocfs2/journal.c~ocfs2-improve-recovery-performance-v2-fix
+++ a/fs/ocfs2/journal.c
@@ -1172,14 +1172,12 @@ static int ocfs2_force_read_journal(stru
goto bail;
}
 
-   for (i = 0; i < p_blocks; i++) {
+   for (i = 0; i < p_blocks; i++, p_blkno++) {
bh = __find_get_block(osb->sb->s_bdev, p_blkno,
osb->sb->s_blocksize);
/* block not cached. */
-   if (!bh) {
-   p_blkno++;
+   if (!bh)
continue;
-   }
 
brelse(bh);
bh = NULL;
@@ -1194,7 +1192,6 @@ static int ocfs2_force_read_journal(stru
 
brelse(bh);
bh = NULL;
-   p_blkno++;
}
 
v_blkno += p_blocks;
_


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: improve recovery performance

2016-06-23 Thread Andrew Morton

On Thu, 23 Jun 2016 09:17:53 +0800 Junxiao Bi  wrote:

> Hi Andrew,
> 
> Did you miss this patch to your tree?

I would have seen it eventually.  Explicitly cc'ing me on patches
helps, please.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: bump up o2cb network protocol version

2016-05-26 Thread Andrew Morton

On Thu, 26 May 2016 16:10:00 -0700 Mark Fasheh  wrote:

> On Thu, May 26, 2016 at 11:00:15AM +0800, Junxiao Bi wrote:
> > Two new messages are added to support negotiating hb timeout. Stopping
> > nodes talking old version to mount as they will cause the negotiation
> > fail.
> > 
> > Signed-off-by: Junxiao Bi 
> 
> Reviewed-by: Mark Fasheh 
> 
> Btw, you might want to send the whole series again as a whole so Andrew
> knows where this is coming from and that it should be bundled with the
> protocol changes themselves.
> 

It's these:

ocfs2-o2hb-add-negotiate-timer.patch
ocfs2-o2hb-add-nego_timeout-message.patch
ocfs2-o2hb-add-negotiate_approve-message.patch
ocfs2-o2hb-add-some-user-debug-log.patch
ocfs2-o2hb-dont-negotiate-if-last-hb-fail.patch
ocfs2-o2hb-fix-hb-hung-time.patch

from http://ozlabs.org/~akpm/mmotm/broken-out/, isn't it?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] ocfs2: o2hb: not fence self if storage down

2016-05-23 Thread Andrew Morton

On Wed,  2 Mar 2016 15:56:06 +0800 Junxiao Bi  wrote:

> 
> Hi Mark,
> 
> This serial of patches is to fix the issue that when storage down,
> all nodes will fence self due to write timeout.
> With this patch set, all nodes will keep going until storage back
> online, except if the following issue happens, then all nodes will
> do as before to fence self.
> 1. io error got
> 2. network between nodes down
> 3. nodes panic

Guys, can we please do a quick triple-check on this series?  I'd like
to unload them this week.  Thanks.

I'll send out the current version in a few secs.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] ocfs2: fix recursive locking deadlock

2016-05-09 Thread Andrew Morton

On Tue, 10 May 2016 14:10:43 +0800 Junxiao Bi  wrote:

> On 05/10/2016 01:58 PM, Andrew Morton wrote:
> > On Tue, 10 May 2016 12:53:41 +0800 Junxiao Bi  wrote:
> > 
> >> These two patches is to fix recursive locking deadlock issues. As we
> >> talked with Mark before, recursive locking is not allowed in ocfs2,
> >> so these two patches fixes the deadlock issue with reverting back
> >> patches to avoid recursive locking. Please review.
> > 
> > 2-3 years old so I guess it isn't a huge issue.  Did you consider
> > backporting into -stable kernels?
> > 
> We'd better do that. I reproduced every one in my test env.

OK, I added the cc:stable tags to the changelogs.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] ocfs2: fix recursive locking deadlock

2016-05-09 Thread Andrew Morton

On Tue, 10 May 2016 12:53:41 +0800 Junxiao Bi  wrote:

> These two patches is to fix recursive locking deadlock issues. As we
> talked with Mark before, recursive locking is not allowed in ocfs2,
> so these two patches fixes the deadlock issue with reverting back
> patches to avoid recursive locking. Please review.

2-3 years old so I guess it isn't a huge issue.  Did you consider
backporting into -stable kernels?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec

2016-03-25 Thread Andrew Morton

On Thu, 17 Mar 2016 17:11:27 +0800 piaojun  wrote:

> Clean up an unused variable 'wants_rotate' in ocfs2_truncate_rec.
> 
> Signed-off-by: Jun Piao 
> ---
>  alloc.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/alloc.c b/alloc.c
> index d002579..2b8b721 100644
> --- a/alloc.c
> +++ b/alloc.c

This should be

--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c

please.  `patch -p1' form.



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] ocfs2 patch queue

2016-03-23 Thread Andrew Morton


As usual I'm a bit late sending this out, sorry.

Here's what I have:

  ocfs2-add-ocfs2_write_type_t-type-to-identify-the-caller-of-write.patch
  ocfs2-use-c_new-to-indicate-newly-allocated-extents.patch
  ocfs2-test-target-page-before-change-it.patch
  ocfs2-do-not-change-i_size-in-write_end-for-direct-io.patch
  ocfs2-return-the-physical-address-in-ocfs2_write_cluster.patch
  ocfs2-record-unwritten-extents-when-populate-write-desc.patch
  #ocfs2-fix-sparse-file-data-ordering-issue-in-direct-io.patch: Joseph Qi has 
questions
  ocfs2-fix-sparse-file-data-ordering-issue-in-direct-io.patch
  ocfs2-code-clean-up-for-direct-io.patch
  #
  ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue.patch
  ocfs2-take-ip_alloc_sem-in-ocfs2_dio_get_block-ocfs2_dio_end_io_write.patch
  ocfs2-fix-disk-file-size-and-memory-file-size-mismatch.patch
  ocfs2-fix-a-deadlock-issue-in-ocfs2_dio_end_io_write.patch
  #
  #ocfs2-dlm-fix-race-between-convert-and-recovery.patch: review discussion
  ocfs2-dlm-fix-race-between-convert-and-recovery.patch
  ocfs2-dlm-fix-bug-in-dlm_move_lockres_to_recovery_list.patch
  
All the above have been reviewed by Junxiao Bi and I think I'll send
them in to Linus this week.

  
ocfs2-extend-transaction-for-ocfs2_remove_rightmost_path-and-ocfs2_update_edge_lengths-before-to-avoid-inconsistency-between-inode-and-et.patch
  
extend-enough-credits-for-freeing-one-truncate-record-while-replaying-truncate-records.patch

The first patch was commented on by Mark, no conclusion.  The second
has his reviewed-by.

  ocfs2-avoid-occurring-deadlock-by-changing-ocfs2_wq-from-global-to-local.patch

Reviewed by Joseph - I'll send it to Linus this week.

  ocfs2-solve-a-problem-of-crossing-the-boundary-in-updating-backups.patch

Reviewed by Joseph - I'll send it to Linus this week.

  ocfs2-o2hb-add-negotiate-timer.patch
  ocfs2-o2hb-add-nego_timeout-message.patch
  ocfs2-o2hb-add-negotiate_approve-message.patch
  ocfs2-o2hb-add-some-user-debug-log.patch
  ocfs2-o2hb-dont-negotiate-if-last-hb-fail.patch
  ocfs2-o2hb-fix-hb-hung-time.patch

Reviewed by Ryan.  Are we all OK with this patchset?

  
ocfs2-dlm-move-lock-to-the-tail-of-grant-queue-while-doing-in-place-convert.patch

It's a bugfix but I have no record of it being reviewed.


I'll send all these patches out for people to take another look at.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v4 2/5] ocfs2: sysfile interfaces for online file check

2016-03-21 Thread Andrew Morton

On Mon, 21 Mar 2016 16:38:02 -0700 Mark Fasheh  wrote:

> On Mon, Mar 21, 2016 at 04:05:47PM -0700, Andrew Morton wrote:
> > On Mon, 21 Mar 2016 15:57:23 -0700 Mark Fasheh  wrote:
> > 
> > > On Mon, Feb 29, 2016 at 01:17:59PM +0800, Gang He wrote:
> > > > Implement online file check sysfile interfaces, e.g.
> > > > how to create the related sysfile according to device name,
> > > > how to display/handle file check request from the sysfile.
> > > > 
> > > > Signed-off-by: Gang He 
> > > Reviewed-by: Mark Fasheh 
> > 
> > Thanks.  So all of
> > 
> > ocfs2-export-ocfs2_kset-for-online-file-check.patch
> > ocfs2-sysfile-interfaces-for-online-file-check.patch
> > ocfs2-sysfile-interfaces-for-online-file-check-v4.patch
> > ocfs2-create-remove-sysfile-for-online-file-check.patch
> > ocfs2-check-fix-inode-block-for-online-file-check.patch
> > ocfs2-check-fix-inode-block-for-online-file-check-v4.patch
> > ocfs2-add-feature-document-for-online-file-check.patch
> > 
> > have your reviewed-by.  I'll send them on to Linus.
> 
> I'm curious, are the '-V4' patches built on top of the non '-V4' versions?

yes, the -v4 patches bring the base patch from
whatever-version-i-originally-merged up to v4.

> Otherwise though this generally sounds good to me - I have reviewed the
> entire online fsck patch series.

Cool.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v4 2/5] ocfs2: sysfile interfaces for online file check

2016-03-21 Thread Andrew Morton

On Mon, 21 Mar 2016 15:57:23 -0700 Mark Fasheh  wrote:

> On Mon, Feb 29, 2016 at 01:17:59PM +0800, Gang He wrote:
> > Implement online file check sysfile interfaces, e.g.
> > how to create the related sysfile according to device name,
> > how to display/handle file check request from the sysfile.
> > 
> > Signed-off-by: Gang He 
> Reviewed-by: Mark Fasheh 

Thanks.  So all of

ocfs2-export-ocfs2_kset-for-online-file-check.patch
ocfs2-sysfile-interfaces-for-online-file-check.patch
ocfs2-sysfile-interfaces-for-online-file-check-v4.patch
ocfs2-create-remove-sysfile-for-online-file-check.patch
ocfs2-check-fix-inode-block-for-online-file-check.patch
ocfs2-check-fix-inode-block-for-online-file-check-v4.patch
ocfs2-add-feature-document-for-online-file-check.patch

have your reviewed-by.  I'll send them on to Linus.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 2/6] ocfs2: o2hb: add NEGO_TIMEOUT message

2016-01-21 Thread Andrew Morton

On Fri, 22 Jan 2016 13:12:26 +0800 Junxiao Bi  wrote:

> On 01/22/2016 07:47 AM, Andrew Morton wrote:
> > On Wed, 20 Jan 2016 11:13:35 +0800 Junxiao Bi  wrote:
> > 
> >> This message is sent to master node when non-master nodes's
> >> negotiate timer expired. Master node records these nodes in
> >> a bitmap which is used to do write timeout timer re-queue
> >> decision.
> >>
> >> ...
> >>
> >> +static int o2hb_nego_timeout_handler(struct o2net_msg *msg, u32 len, void 
> >> *data,
> >> +  void **ret_data)
> >> +{
> >> +  struct o2hb_region *reg = (struct o2hb_region *)data;
> > 
> > It's best not to typecast a void*.  It's unneeded clutter and the cast
> > can actually hide bugs - if someone changes `data' to a different type
> > or if there's a different "data" in scope, etc.
> There are many kinds of messages in ocfs2 and each one needs a different
> type of "data", so it is made type void*.

What I mean is to do this:

struct o2hb_region *reg = data;

and not

struct o2hb_region *reg = (struct o2hb_region *)data;

Because the typecast is unneeded and is actually harmful.  Imagine if someone
goofed and had `int data;': no warning, runtime failure.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 2/6] ocfs2: o2hb: add NEGO_TIMEOUT message

2016-01-21 Thread Andrew Morton

On Wed, 20 Jan 2016 11:13:35 +0800 Junxiao Bi  wrote:

> This message is sent to master node when non-master nodes's
> negotiate timer expired. Master node records these nodes in
> a bitmap which is used to do write timeout timer re-queue
> decision.
> 
> ...
>
> +static int o2hb_nego_timeout_handler(struct o2net_msg *msg, u32 len, void 
> *data,
> + void **ret_data)
> +{
> + struct o2hb_region *reg = (struct o2hb_region *)data;

It's best not to typecast a void*.  It's unneeded clutter and the cast
can actually hide bugs - if someone changes `data' to a different type
or if there's a different "data" in scope, etc.

> + struct o2hb_nego_msg *nego_msg;
>  
> + nego_msg = (struct o2hb_nego_msg *)msg->buf;
> + if (nego_msg->node_num < O2NM_MAX_NODES)
> + set_bit(nego_msg->node_num, reg->hr_nego_node_bitmap);
> + else
> + mlog(ML_ERROR, "got nego timeout message from bad node.\n");
> +
> + return 0;
>  }


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 1/6] ocfs2: o2hb: add negotiate timer

2016-01-21 Thread Andrew Morton

On Wed, 20 Jan 2016 11:13:34 +0800 Junxiao Bi  wrote:

> When storage down, all nodes will fence self due to write timeout.
> The negotiate timer is designed to avoid this, with it node will
> wait until storage up again.
> 
> Negotiate timer working in the following way:
> 
> 1. The timer expires before write timeout timer, its timeout is half
> of write timeout now. It is re-queued along with write timeout timer.
> If expires, it will send NEGO_TIMEOUT message to master node(node with
> lowest node number). This message does nothing but marks a bit in a
> bitmap recording which nodes are negotiating timeout on master node.
> 
> 2. If storage down, nodes will send this message to master node, then
> when master node finds its bitmap including all online nodes, it sends
> NEGO_APPROVL message to all nodes one by one, this message will re-queue
> write timeout timer and negotiate timer.
> For any node doesn't receive this message or meets some issue when
> handling this message, it will be fenced.
> If storage up at any time, o2hb_thread will run and re-queue all the
> timer, nothing will be affected by these two steps.
> 
> ...
>
> +static void o2hb_nego_timeout(struct work_struct *work)
> +{
> + struct o2hb_region *reg =
> + container_of(work, struct o2hb_region,
> +  hr_nego_timeout_work.work);

It's better to just do

struct o2hb_region *reg;

reg = container_of(work, struct o2hb_region, hr_nego_timeout_work.work);

and avoid the weird 80-column tricks.

> + unsigned long live_node_bitmap[BITS_TO_LONGS(O2NM_MAX_NODES)];

the bitmap.h interfaces might be nicer here.  Perhaps.  A little bit.

> + int master_node;
> +
> + o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap));
> + /* lowest node as master node to make negotiate decision. */
> + master_node = find_next_bit(live_node_bitmap, O2NM_MAX_NODES, 0);
> +
> + if (master_node == o2nm_this_node()) {
> + set_bit(master_node, reg->hr_nego_node_bitmap);
> + if (memcmp(reg->hr_nego_node_bitmap, live_node_bitmap,
> + sizeof(reg->hr_nego_node_bitmap))) {
> + /* check negotiate bitmap every second to do timeout
> +  * approve decision.
> +  */
> + schedule_delayed_work(®->hr_nego_timeout_work,
> + msecs_to_jiffies(1000));

One second is long enough to unmount the fs (and to run `rmmod
ocfs2'!).  Is there anything preventing the work from triggering in
these situations?

> +
> + return;
> + }
> +
> + /* approve negotiate timeout request. */
> + } else {
> + /* negotiate timeout with master node. */
> + }
> +
>  }


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: dlmglue: fix false deadlock caused by clearing UPCONVERT_FINISHING too early

2016-01-21 Thread Andrew Morton

On Thu, 21 Jan 2016 16:18:38 +0800 Junxiao Bi  wrote:

> On 01/21/2016 04:10 PM, Eric Ren wrote:
> > Hi Junxiao,
> > 
> > On Thu, Jan 21, 2016 at 03:10:20PM +0800, Junxiao Bi wrote: 
> >> Hi Eric,
> >>
> >> This patch should fix your issue.
> >> "NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock"
> > 
> > Thanks a lot for bringing up this patch! It hasn't been merged into 
> > mainline(
> > at least 4.4), right?
> Right, it is still in linux-next.

I'll be sending it to Linus today.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v3 1/4] ocfs2: export ocfs2_kset for online file check

2016-01-13 Thread Andrew Morton

On Wed, 13 Jan 2016 15:02:28 -0800 Mark Fasheh  wrote:

> On Fri, Dec 25, 2015 at 03:16:16PM +0800, Gang He wrote:
> > Export ocfs2_kset object from ocfs2_stackglue kernel module,
> > then online file check code will create the related sysfiles
> > under ocfs2_kset object.
> > 
> > Signed-off-by: Gang He 
> 
> Reviewed-by: Mark Fasheh 
> 
> This was easy to review, though you might consider adding a note that we're
> exporting this because it's built in ocfs2_stackglue.ko.

I've updated the changelog to reflect that.  (copy-n-paste is still in
my skillset).

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v3 0/4] Add online file check feature

2015-12-29 Thread Andrew Morton

On Tue, 29 Dec 2015 20:00:08 -0700 "Gang He"  wrote:

> > This feature should be documented, please.  That means all pseudo-file
> > locations, all inputs, all outputs, expected behaviour etc etc.  Enough
> > info so that our users can usefully and fully use this feature in the
> > minimum time.  Documentation/filesystems/ocfs2.txt is the place for that.
> I ever sent out a document which describes this feature, I will modify that 
> document sightly since sysfs interfaces have been adjusted,
> then send out to the list as a separated patch. Is it OK for you?

Sounds good, thanks.  The documentation is important because it permits
reviewers to understand the proposed interfaces, which need to be
nailed down before there's any point in looking at the implementation.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v3 0/4] Add online file check feature

2015-12-29 Thread Andrew Morton

On Fri, 25 Dec 2015 15:16:15 +0800 Gang He  wrote:

> When there are errors in the ocfs2 filesystem,
> they are usually accompanied by the inode number which caused the error.
> This inode number would be the input to fixing the file.
> One of these options could be considered:
> A file in the sys filesytem which would accept inode numbers.
> This could be used to communication back what has to be fixed or is fixed.
> You could write:
> $# echo "" > /sys/fs/ocfs2/devname/filecheck/check
> or
> $# echo "" > /sys/fs/ocfs2/devname/filecheck/fix
> 
> Compare with second version, I re-design filecheck sysfs interfaces, there
> are three sysfs files(check, fix and set) under filecheck directory(see 
> above),
> sysfs will accept only one argument . Second, I adjust some code in 
> ocfs2_filecheck_repair_inode_block() function according to upstream feedback,
> we cannot just add VALID_FL flag back as a inode block fix, then we will not 
> fix this field corruption currently until having a complete solution.
> Compare with first version, I use strncasecmp instead of double strncmp
> functions. Second, update the source file contribution vendor.

This feature should be documented, please.  That means all pseudo-file
locations, all inputs, all outputs, expected behaviour etc etc.  Enough
info so that our users can usefully and fully use this feature in the
minimum time.  Documentation/filesystems/ocfs2.txt is the place for that.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v3 1/4] ocfs2: export ocfs2_kset for online file check

2015-12-29 Thread Andrew Morton

On Fri, 25 Dec 2015 15:16:16 +0800 Gang He  wrote:

> Export ocfs2_kset object from ocfs2_stackglue kernel module,
> then online file check code will create the related sysfiles
> under ocfs2_kset object.
> 
> ...
>
> --- a/fs/ocfs2/stackglue.c
> +++ b/fs/ocfs2/stackglue.c
> @@ -629,7 +629,8 @@ static struct attribute_group ocfs2_attr_group = {
>   .attrs = ocfs2_attrs,
>  };
>  
> -static struct kset *ocfs2_kset;
> +struct kset *ocfs2_kset;
> +EXPORT_SYMBOL_GPL(ocfs2_kset);

The EXPORT_SYMBOL is only needed if this symbol is to be referred to
from a different module.  That isn't the case here - everything which
refers to ocfs2_kset is linked into ocfs2.ko, correct?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock

2015-12-22 Thread Andrew Morton

On Tue, 22 Dec 2015 12:14:22 -0800 Tariq Saeed  wrote:

> Hi,
> Looks like this fell into through the cracks. This is a very real bug
> encountered by Luminex Software and they tested the fix.
> Regards
> -Tariq
> 
> 
>  Forwarded Message 
> Subject:  [Ocfs2-devel] [PATCH] NFS hangs in __ocfs2_cluster_lock due to 
> race with ocfs2_unblock_lock

I added a cc:stable to this so it gets backported.



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: call ocfs2_abort when journal abort

2015-12-18 Thread Andrew Morton

On Fri, 18 Dec 2015 15:19:25 +0800 Ryan Ding  wrote:

> orabug: 22293201
> 
> journal can not recover from abort state, so we should take following action 
> to
> prevent file system from corruption:
> 
> 1. change to readonly filesystem when local mount. We can not afford further
>write, so change to RO state is reasonable.
> 
> 2. panic when cluster mount. Because we can not release lock resource in this
>state, other node will hung when it require a lock owned by this node. So
>panic and remaster is a reasonable choise.
> 
> ocfs2_abort() will do all the above work.
> 
> ...
>
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -30,7 +30,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include 
>  
> @@ -2265,7 +2264,7 @@ static int __ocfs2_wait_on_mount(struct ocfs2_super 
> *osb, int quota)
>  
>  static int ocfs2_commit_thread(void *arg)
>  {
> - int status;
> + int status = 0;
>   struct ocfs2_super *osb = arg;
>   struct ocfs2_journal *journal = osb->journal;
>  
> @@ -2279,22 +2278,18 @@ static int ocfs2_commit_thread(void *arg)
>   wait_event_interruptible(osb->checkpoint_event,
>atomic_read(&journal->j_num_trans)
>|| kthread_should_stop());
> + if (status < 0)
> + /* As we can not terminate by myself, just enter an
> +  * empty loop to wait for stop. */
> + continue;

This is a busy-wait loop, isn't it?  That's going to chew lots of CPU
and in some situations (eg, SMP=n, PREEMPT=n) it will lock up the
kernel because kjournald will never run.

>   status = ocfs2_commit_cache(osb);
> - if (status < 0) {
> - static unsigned long abort_warn_time;
> -
> - /* Warn about this once per minute */
> - if (printk_timed_ratelimit(&abort_warn_time, 60*HZ))
> - mlog(ML_ERROR, "status = %d, journal is "
> - "already aborted.\n", status);
> - /*
> -  * After ocfs2_commit_cache() fails, j_num_trans has a
> -  * non-zero value.  Sleep here to avoid a busy-wait
> -  * loop.
> -  */
> - msleep_interruptible(1000);
> - }
> + if (status < 0)
> + /* journal can not recover from abort state, there is
> +  * no need to keep commit cache. So we should either
> +  * change to readonly(local mount) or just panic
> +  * (cluster mount). */
> + ocfs2_abort(osb->sb, "Detected aborted journal");

Coding-style issues:

It would be more conventional to add braces for the comment:

if (status < 0) {
/* journal can not recover from abort state, there is
 * no need to keep commit cache. So we should either
 * change to readonly(local mount) or just panic
 * (cluster mount). */
ocfs2_abort(osb->sb, "Detected aborted journal");
}

And to lay out the comment like this:

/*
 * journal can not recover from abort state, there is
 * no need to keep commit cache. So we should either
 * change to readonly(local mount) or just panic
 * (cluster mount).
 */


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix SGID not inherited issue

2015-12-10 Thread Andrew Morton

On Wed, 9 Dec 2015 10:28:17 +0800 Junxiao Bi  wrote:

> Hi Andrew,
> 
> On 12/09/2015 05:22 AM, Andrew Morton wrote:
> > On Mon,  7 Dec 2015 12:09:06 +0800 Junxiao Bi  wrote:
> > 
> >> commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an issue,
> >> SGID of sub dir was not inherited from its parents dir. It is because SGID
> >> is set into "inode->i_mode" in ocfs2_get_init_inode(), but is overwritten
> >> by "mode" which don't have SGID set later.
> >>
> >> Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue")
> >> Signed-off-by: Junxiao Bi 
> >> Cc: 
> > 
> > 8f1eb48758aa is only in 4.4-rcX so I removed the cc:stable.
> If 8f1eb48758aa is merged into stable, but this patch not, there will be
> a regression issue in stable branch?

Ah, yes, right you are, thanks.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix SGID not inherited issue

2015-12-08 Thread Andrew Morton

On Mon,  7 Dec 2015 12:09:06 +0800 Junxiao Bi  wrote:

> commit 8f1eb48758aa ("ocfs2: fix umask ignored issue") introduced an issue,
> SGID of sub dir was not inherited from its parents dir. It is because SGID
> is set into "inode->i_mode" in ocfs2_get_init_inode(), but is overwritten
> by "mode" which don't have SGID set later.
> 
> Fixes: 8f1eb48758aa ("ocfs2: fix umask ignored issue")
> Signed-off-by: Junxiao Bi 
> Cc: 

8f1eb48758aa is only in 4.4-rcX so I removed the cc:stable.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix BUG due to uncleaned localalloc during mount

2015-12-01 Thread Andrew Morton

On Tue, 1 Dec 2015 16:02:55 +0800 Junxiao Bi  wrote:

> On 11/24/2015 09:38 PM, Joseph Qi wrote:
> > Tariq has reported a BUG before and posted a fix at:
> > https://oss.oracle.com/pipermail/ocfs2-devel/2015-April/010696.html
> > 
> > This is because during umount, localalloc shutdown relies on journal
> > shutdown. But during journal shutdown, it just stops commit thread
> > without checking its result. So it may happen that localalloc shutdown
> > uncleaned during I/O error and after that, journal then has been marked
> > clean if I/O restores.
> The above is a storage issue. In this condition, io error can even
> happen to journal commit, some transactions may have wrong data. Let fs
> go without a fsck may cause corruption.
> I am thinking whether we can fail the mount and mark the journal dirty
> again. Then we can do fsck to it withoug a fsck patch.

hm, was that an ack, a nack or a quack?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 1/4] ocfs2: fix ip_unaligned_aio deadlock with dio work queue

2015-11-23 Thread Andrew Morton

On Fri, 20 Nov 2015 16:23:16 +0800 Ryan Ding  wrote:

> In the current implementation of unaligned aio+dio, lock order behave as 
> follow:
> 
> in user process context:
>   -> call io_submit()
> -> get i_mutex
>   <== window1
>   -> get ip_unaligned_aio
> -> submit direct io to block device
> -> release i_mutex
>   -> io_submit() return
> 
> in dio work queue context(the work queue is created in __blockdev_direct_IO):
>   -> release ip_unaligned_aio
>   <== window2
> -> get i_mutex
>   -> clear unwritten flag & change i_size
> -> release i_mutex
> 
> There is a limitation to the thread number of dio work queue. 256 at default.
> If all 256 thread are in the above 'window2' stage, and there is a user 
> process
> in the 'window1' stage, the system will became deadlock. Since the user 
> process
> hold i_mutex to wait ip_unaligned_aio lock, while there is a direct bio hold
> ip_unaligned_aio mutex who is waiting for a dio work queue thread to be
> schedule. But all the dio work queue thread is waiting for i_mutex lock in
> 'window2'.
> 
> This case only happened in a test which send a large number(more than 256) of
> aio at one io_submit() call.
> 
> My design is to remove ip_unaligned_aio lock. Change it to a sync io instead.
> Just like ip_unaligned_aio lock, serialize the unaligned aio dio.

So this patch series is a bunch of fixes against your previous patch series:

ocfs2-add-ocfs2_write_type_t-type-to-identify-the-caller-of-write.patch
ocfs2-use-c_new-to-indicate-newly-allocated-extents.patch
ocfs2-test-target-page-before-change-it.patch
ocfs2-do-not-change-i_size-in-write_end-for-direct-io.patch
ocfs2-return-the-physical-address-in-ocfs2_write_cluster.patch
ocfs2-record-unwritten-extents-when-populate-write-desc.patch
ocfs2-fix-sparse-file-data-ordering-issue-in-direct-io.patch
ocfs2-code-clean-up-for-direct-io.patch

correct?

Those patches are languishing a bit, awaiting review/ack.  I'll send
everything out for a round of review soon...


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] simplify configfs attributes V2

2015-10-05 Thread Andrew Morton

On Sat,  3 Oct 2015 15:32:36 +0200 Christoph Hellwig  wrote:

> This series consolidates the code to implement configfs attributes
> by providing the ->show and ->store method in common code and using
> container_of in the methods to access the containing structure.
> 
> This reduces source and binary size of configfs consumers a lot.

There's a version of this patch series (I assume V1) in linux-next via
Nicholas's tree so my hands are tied.  I trust that Nicholas will
update things.

Or maybe it was a slipup - modifying usb, ocfs2 etc from the iscsi tree
is innovative ;)


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: Fill-in the unused portion of the block with zeros by dio_zero_block()

2015-09-14 Thread Andrew Morton

On Thu, 10 Sep 2015 09:53:11 +0800 ryding  wrote:

> Hi Yiwen,
> 
> I'm working on this issue. The patch will be send out soon. And the 
> issue that do not support file hole will be fixed too. I have proved it 
> can pass all ltp-aiodio test cases, and has better performance. ;-)

Yes, your "ocfs2: fill in the unused portion of the block with zeros by
dio_zero_block()" conflicts with this patch a little.

Does your patchset need alteration to fix this bug?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: Fill-in the unused portion of the block with zeros by dio_zero_block()

2015-09-14 Thread Andrew Morton

On Wed, 9 Sep 2015 09:55:16 +0800 jiangyiwen  wrote:

> A simplified test case is (this case from Ryan):
> 1) dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct;
> 2) truncate /mnt/hello -s 2097152
> file 'hello' is not exist before test. After this command,
> file 'hello' should be all zero. But 512~4096 is some random data.
> 
> Setting bh state to new when get a new block, if so,
> direct_io_worker()->dio_zero_block() will fill-in the unused portion
> of the block with zero.
> 
> ...
>
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -581,6 +581,7 @@ static int ocfs2_direct_IO_get_blocks(struct inode 
> *inode, sector_t iblock,
>   ret = -EIO;
>   goto bail;
>   }
> + set_buffer_new(bh_result);
>   }
> 

You're working against an old kernel.  I did this:

--- 
a/fs/ocfs2/aops.c~ocfs2-fill-in-the-unused-portion-of-the-block-with-zeros-by-dio_zero_block
+++ a/fs/ocfs2/aops.c
@@ -589,6 +589,7 @@ static int ocfs2_direct_IO_get_blocks(st
ret = -EIO;
goto bail;
}
+   set_buffer_new(bh_result);
up_write(&OCFS2_I(inode)->ip_alloc_sem);
}
 

Probably we could run set_buffer_new() after the up_write(), which
would decrease lok hold times a little bit.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2_direct_IO_write misses ocfs2_is_overwrite error code

2015-09-11 Thread Andrew Morton

On Mon, 7 Sep 2015 10:11:56 +0800 "Norton.Zhu"  wrote:

> If ocfs2_is_overwrite failed, ocfs2_direct_IO_write mays till return success 
> to the caller.
> 
> ...
>
> --- a/aops.c
> +++ b/aops.c
> @@ -847,6 +847,7 @@ static ssize_t ocfs2_direct_IO_write(struct kiocb *iocb,
>   is_overwrite = ocfs2_is_overwrite(osb, inode, offset);
>   if (is_overwrite < 0) {
>   mlog_errno(is_overwrite);
> + ret = is_overwrite;
>   ocfs2_inode_unlock(inode, 1);
>   goto clean_orphan;
>   }

Looks OK.

We do `goto clean_orphan' and if (orphan), the code then proceeds to
overwrite the error code in `ret'.  This is odd, and probably wrong -
it's usually best to return the first-encountered error.



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [patch 11/28] ocfs2: extend enough credits for freeing one truncate record while replaying truncate records

2015-09-01 Thread Andrew Morton

On Tue, 1 Sep 2015 10:54:03 -0700 Mark Fasheh  wrote:

> > > Why is this particular change here?
> > >   --Mark
> > > 
> > I think Joyce wants to get enough credits at first and then call
> > ocfs2_replay_truncate_records.
> 
> Oh ok I see. That's fine then, the patch otherwise looked good to me,
> thanks.
> 
> Reviewed-by: Mark Fasheh 

Cool.  And where are we at with
http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-extend-transaction-for-ocfs2_remove_rightmost_path-and-ocfs2_update_edge_lengths-before-to-avoid-inconsistency-between-inode-and-et.patch

?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [patch 05/28] ocfs2: acknowledge return value of ocfs2_error()

2015-08-31 Thread Andrew Morton

On Fri, 28 Aug 2015 16:20:43 -0700 Mark Fasheh  wrote:

> On Wed, Aug 26, 2015 at 03:11:32PM -0700, Andrew Morton wrote:
> > From: Goldwyn Rodrigues 
> > Subject: ocfs2: acknowledge return value of ocfs2_error()
> > 
> > Caveat: This may return -EROFS for a read case, which seems wrong.  This
> > is happening even without this patch series though.  Should we convert
> > EROFS to EIO?
> 
> I must be doing something wrong because in the kernels I've looked at,
> ocfs2_error() is a void function.

The preceding patch (ocfs2: add errors=continue) changed the
ocfs2_error() return type.



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] ocfs2 pending patch review

2015-08-26 Thread Andrew Morton

On Fri, 21 Aug 2015 14:39:55 -0700 Andrew Morton  
wrote:

> Guys, we're really getting behind with this and I'm sitting on a large
> number of rather old patches.
> 
> So can we please get down, review this code and make a decision one way
> or the other?
> 
> Here's the list of 24 patches I shall send out, along with a few notes:

oops, I forgot to send the patches.  Sorry for keeping you
all in suspense ;)

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: direct write will call ocfs2_rw_unlock() twice when doing aio+dio

2015-08-24 Thread Andrew Morton

On Mon, 24 Aug 2015 15:39:04 +0800 Junxiao Bi  wrote:

> On 08/24/2015 03:23 PM, Ryan Ding wrote:
> > Orabug: 21612107
> > 
> > Use wrong return value in ocfs2_file_write_iter(). This will cause
> > ocfs2_rw_unlock() be called both in write_iter & end_io, and trigger a 
> > BUG_ON.
> > 
> > This issue exist since commit 7da839c475894ea872ec909a5d2e83dddccff5be.
> Better say:
> This issue is introduced by commit 7da839c47589 ("ocfs2: use
> __generic_file_write_iter()") , or checkpatch will report a style error.
> Other looks good.
> 
> Reviewed-by: Junxiao Bi 
> 

Also we've recently adopted the convention of using the "Fixes:" tag:
Fixes: 7da839c47589 ("ocfs2: use __generic_file_write_iter()")

And as the patch fixes a regression we should add the "Cc:
" tag.

And we should the author of the bad patch for review (hi, Al).

I've made these changes and I'll get this patch into Linus this week,
for 4.2.


From: Ryan Ding 
Subject: ocfs2: direct write will call ocfs2_rw_unlock() twice when doing 
aio+dio

Orabug: 21612107

ocfs2_file_write_iter() is usng the wrong return value ('written').  This
will cause ocfs2_rw_unlock() be called both in write_iter & end_io,
triggering a BUG_ON.

This issue was introduced by commit 7da839c47589 ("ocfs2: use
__generic_file_write_iter()").

Fixes: 7da839c47589 ("ocfs2: use __generic_file_write_iter()")
Signed-off-by: Ryan Ding 
Reviewed-by: Junxiao Bi 
Cc: Al Viro 
Cc: Mark Fasheh 
Cc: Joel Becker 
Cc: 
Signed-off-by: Andrew Morton 
---

 fs/ocfs2/file.c |   28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff -puN 
fs/ocfs2/file.c~ocfs2-direct-write-will-call-ocfs2_rw_unlock-twice-when-doing-aiodio
 fs/ocfs2/file.c
--- 
a/fs/ocfs2/file.c~ocfs2-direct-write-will-call-ocfs2_rw_unlock-twice-when-doing-aiodio
+++ a/fs/ocfs2/file.c
@@ -2366,6 +2366,20 @@ relock:
/* buffered aio wouldn't have proper lock coverage today */
BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT));
 
+   /*
+* deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
+* function pointer which is called when o_direct io completes so that
+* it can unlock our rw lock.
+* Unfortunately there are error cases which call end_io and others
+* that don't.  so we don't have to unlock the rw_lock if either an
+* async dio is going to do it in the future or an end_io after an
+* error has already done it.
+*/
+   if ((written == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
+   rw_level = -1;
+   unaligned_dio = 0;
+   }
+
if (unlikely(written <= 0))
goto no_sync;
 
@@ -2390,20 +2404,6 @@ relock:
}
 
 no_sync:
-   /*
-* deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
-* function pointer which is called when o_direct io completes so that
-* it can unlock our rw lock.
-* Unfortunately there are error cases which call end_io and others
-* that don't.  so we don't have to unlock the rw_lock if either an
-* async dio is going to do it in the future or an end_io after an
-* error has already done it.
-*/
-   if ((ret == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
-   rw_level = -1;
-   unaligned_dio = 0;
-   }
-
if (unaligned_dio && ocfs2_iocb_is_unaligned_aio(iocb)) {
ocfs2_iocb_clear_unaligned_aio(iocb);
mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio);
_


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] ocfs2 pending patch review

2015-08-21 Thread Andrew Morton


Guys, we're really getting behind with this and I'm sitting on a large
number of rather old patches.

So can we please get down, review this code and make a decision one way
or the other?

Here's the list of 24 patches I shall send out, along with a few notes:


ocfs2-set-filesytem-read-only-when-ocfs2_delete_entry-failed.patch

ocfs2-trusted-xattr-missing-cap_sys_admin-check.patch
- breaks existing userspace?

ocfs2-flush-inode-data-to-disk-and-free-inode-when-i_count-becomes-zero.patch
add-errors=continue.patch
acknowledge-return-value-of-ocfs2_error.patch
clear-the-rest-of-the-buffers-on-error.patch

ocfs2-fix-a-tiny-case-that-inode-can-not-removed.patch

ocfs2-add-ip_alloc_sem-in-direct-io-to-protect-allocation-changes.patch

ocfs2-extend-transaction-for-ocfs2_remove_rightmost_path-and-ocfs2_update_edge_lengths-before-to-avoid-inconsistency-between-inode-and-et.patch
ocfs2-do-not-set-fs-read-only-if-rec-is-empty-while-committing-truncate.patch
extend-enough-credits-for-freeing-one-truncate-record-while-replaying-truncate-records.patch

ocfs2-optimize-error-handling-in-dlm_request_join.patch

resubmit-bug_onlockres-l_level-=-dlm_lock_ex-checkpointed-tripped-in-ocfs2_ci_checkpointed.patch
- Joseph Qi had a suggestion
resubmit-ocfs2_iop_set-get_acl-called-from-the-vfs-so-take-inode-lock-v2second-version.patch

ocfs2-fix-race-between-crashed-dio-and-rm.patch

ocfs2-use-64bit-variables-to-track-heartbeat-time.patch

ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch
- Zhangguanghui  raised an issue

ocfs2-avoid-access-invalid-address-when-read-o2dlm-debug-messages.patch
- Mark has issues

ocfs2-neaten-do_error-ocfs2_error-and-ocfs2_abort.patch

ocfs2-export-ocfs2_kset-for-online-file-check.patch
ocfs2-sysfile-interfaces-for-online-file-check.patch
ocfs2-create-remove-sysfile-for-online-file-check.patch
ocfs2-check-fix-inode-block-for-online-file-check.patch
ocfs2-add-feature-document-for-online-file-check.patch

Thanks.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix shift left overflow

2015-07-28 Thread Andrew Morton

On Wed, 29 Jul 2015 08:30:59 +0800 Joseph Qi  wrote:

> On 2015/7/29 4:51, Andrew Morton wrote:
> > On Mon, 27 Jul 2015 11:39:16 +0800 Joseph Qi  wrote:
> > 
> >> cluster pos is defined as u32, when calculate corresponding sector it
> >> should be converted to u64 first, otherwise it may overflow.
> > 
> > What are the runtime effects of this change?
> > 
> The issue is when using large volume, for example, 9T volume with 2T already
> used, frequently create small files with O_DIRECT and the IO is not cluster
> aligned, it may clear sectors in the wrong place.

You mean it corrupts the filesystem?

So this wants to be merged asap and backported as far as possible?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix shift left overflow

2015-07-28 Thread Andrew Morton

On Mon, 27 Jul 2015 11:39:16 +0800 Joseph Qi  wrote:

> cluster pos is defined as u32, when calculate corresponding sector it
> should be converted to u64 first, otherwise it may overflow.

What are the runtime effects of this change?

Please always include this info when fixing bugs so that others can
work out which kernel version(s) should be patched.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [patch -next] ocfs2: scheduling in atomic in ocfs2_filecheck_store()

2015-07-27 Thread Andrew Morton

On Mon, 27 Jul 2015 11:27:23 +0300 Dan Carpenter  
wrote:

> We're hold "spin_lock(&ent->fs_fcheck->fc_lock)" so the allocation has
> to be GFP_ATOMIC.
> 
> I changed the sizeof() because otherwise the line goes over the 80
> character limit and also the new way is prefered kernel style.
> 
> Fixes: e467fe5da718 ('ocfs2: sysfile interfaces for online file check')
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/fs/ocfs2/filecheck.c b/fs/ocfs2/filecheck.c
> index 3332af1..9613663 100644
> --- a/fs/ocfs2/filecheck.c
> +++ b/fs/ocfs2/filecheck.c
> @@ -544,7 +544,7 @@ static ssize_t ocfs2_filecheck_store(struct kobject *kobj,
>   BUG_ON(!ocfs2_filecheck_erase_entry(ent));
>   }
>  
> - entry = kmalloc(sizeof(struct ocfs2_filecheck_entry), GFP_NOFS);
> + entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
>   if (entry) {
>   entry->fe_ino = args.fa_ino;
>   entry->fe_type = args.fa_type;

GFP_NOFS is more reliable than GFP_ATOMIC, so we should retain that if
possible.  And it's pretty easy to do here.

Gang, please carefully review and test?


--- a/fs/ocfs2/filecheck.c~ocfs2-sysfile-interfaces-for-online-file-check-fix-2
+++ a/fs/ocfs2/filecheck.c
@@ -501,7 +501,7 @@ static ssize_t ocfs2_filecheck_store(str
const char *buf, size_t count)
 {
struct ocfs2_filecheck_args args;
-   struct ocfs2_filecheck_entry *entry = NULL;
+   struct ocfs2_filecheck_entry *entry;
struct ocfs2_filecheck_sysfs_entry *ent;
ssize_t ret = 0;
 
@@ -527,12 +527,19 @@ static ssize_t ocfs2_filecheck_store(str
return (!ret ? count : ret);
}
 
+   entry = kmalloc(sizeof(*entry), GFP_NOFS);
+   if (!entry) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
spin_lock(&ent->fs_fcheck->fc_lock);
if ((ent->fs_fcheck->fc_size >= ent->fs_fcheck->fc_max) &&
-   (ent->fs_fcheck->fc_done == 0)) {
-   mlog(ML_ERROR,
-   "Online file check queue(%u) is full\n",
-   ent->fs_fcheck->fc_max);
+   (ent->fs_fcheck->fc_done == 0)) {
+   mlog(ML_ERROR, "Online file check queue(%u) is full\n",
+   ent->fs_fcheck->fc_max);
+   kfree(entry);
+   entry = NULL;
ret = -EBUSY;
} else {
if ((ent->fs_fcheck->fc_size >= ent->fs_fcheck->fc_max) &&
@@ -544,26 +551,21 @@ static ssize_t ocfs2_filecheck_store(str
BUG_ON(!ocfs2_filecheck_erase_entry(ent));
}
 
-   entry = kmalloc(sizeof(struct ocfs2_filecheck_entry), GFP_NOFS);
-   if (entry) {
-   entry->fe_ino = args.fa_ino;
-   entry->fe_type = args.fa_type;
-   entry->fe_done = 0;
-   entry->fe_status = OCFS2_FILECHECK_ERR_INPROGRESS;
-   list_add_tail(&entry->fe_list,
-   &ent->fs_fcheck->fc_head);
-
-   ent->fs_fcheck->fc_size++;
-   ret = count;
-   } else {
-   ret = -ENOMEM;
-   }
+   entry->fe_ino = args.fa_ino;
+   entry->fe_type = args.fa_type;
+   entry->fe_done = 0;
+   entry->fe_status = OCFS2_FILECHECK_ERR_INPROGRESS;
+   list_add_tail(&entry->fe_list, &ent->fs_fcheck->fc_head);
+
+   ent->fs_fcheck->fc_size++;
+   ret = count;
}
spin_unlock(&ent->fs_fcheck->fc_lock);
 
if (entry)
ocfs2_filecheck_handle_entry(ent, entry);
 
+out:
ocfs2_filecheck_sysfs_put(ent);
return ret;
 }
_



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 0/4] ocfs2: add online file check feature

2015-07-09 Thread Andrew Morton

On Wed, 24 Jun 2015 13:24:17 +0800 Gang He  wrote:

> When there are errors in the ocfs2 filesystem,
> they are usually accompanied by the inode number which caused the error.
> This inode number would be the input to fixing the file.
> One of these options could be considered:
> A file in the sys filesytem which would accept inode numbers.
> This could be used to communication back what has to be fixed or is fixed.
> You could write:
> $# echo "CHECK " > /sys/fs/ocfs2/devname/filecheck
> or
> $# echo "FIX " > /sys/fs/ocfs2/devname/filecheck

This is a userspace feature so it should have user documentation. 
Documentation/filesystems/ocfs2.txt is presumably the appropriate
place.

And the availability of the complete user documentation will make the
review of this patchset more productive.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: do not BUG if jbd2_journal_dirty_metadata fails

2015-05-12 Thread Andrew Morton

On Tue, 12 May 2015 09:53:47 +0800 Joseph Qi  wrote:

> jbd2_journal_dirty_metadata may fail. Currently it cannot take care of
> non zero return value and just BUG in ocfs2_journal_dirty.
> This patch is aborting the handle and journal instead of BUG.
> 

This patch is internally inconsistent.

> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -775,7 +775,17 @@ void ocfs2_journal_dirty(handle_t *handle, struct 
> buffer_head *bh)

The "-775,7 +775,17" means "the 7 lines at line 775 get turned into 17
lines".  ie: 10 lines are added.

>   trace_ocfs2_journal_dirty((unsigned long long)bh->b_blocknr);
> 
>   status = jbd2_journal_dirty_metadata(handle, bh);
> - BUG_ON(status);
> + if (status) {
> + mlog_errno(status);
> + if (!is_handle_aborted(handle)) {
> + journal_t *journal = handle->h_transaction->t_journal;
> +
> + mlog(ML_ERROR, "jbd2_journal_dirty_metadata failed. "
> + "Aborting transaction and journal.");
> + handle->h_err = status;
> + jbd2_journal_abort_handle(handle);
> + jbd2_journal_abort(journal, status);
> + }
> + }
>  }

But the patch deletes 1 line and adds 12, for a net addition of 11 lines,
not 10.

So the patch doesn't apply.  Changing the "17" to "18" fixes that.


Whatever tool you used for generating this patch needs slapping.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix BUG in ocfs2_downconvert_thread_do_work

2015-05-06 Thread Andrew Morton

On Wed, 6 May 2015 19:12:54 +0800 Joseph Qi  wrote:

> BUG_ON(list_empty(&osb->blocked_lock_list)) in
> ocfs2_downconvert_thread_do_work can be triggered in the following case:
> ocfs2dc has firstly saved osb->blocked_lock_count to local varibale
> processed, and then processes the dentry lockres.
> During the dentry put, it calls iput and then deletes rw, inode and
> open lockres from blocked list in ocfs2_mark_lockres_freeing.
> And this casues the variable processed is not the number of blocked
> lockres to be processed and then triggers the BUG.
> 
> Signed-off-by: Joseph Qi 
> Cc: 

cc:stable means I'd like to merge this into 4.1-rcx.  Can we get some
acks, please?


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH RESEND] ocfs2: use retval instead of status for checking error

2015-05-02 Thread Andrew Morton

On Sun, 3 May 2015 12:02:39 +0900 DaeSeok Youn  wrote:

> 2015-04-24 10:45 GMT+09:00 Daeseok Youn :
> > The use of 'status' in __ocfs2_add_entry() can return wrong
> > value. Some functions' return value in __ocfs2_add_entry(),
> > i.e ocfs2_journal_access_di() is saved to 'status'.
> > But 'status' is not used in 'bail' label for returning result
> > of __ocfs2_add_entry().
> >
> > So use retval instead of status.
> >
> > Reviewed-by: Joseph Qi 
> > Signed-off-by: Daeseok Youn 
> > ---
> Andrew.
> 
> How is it going this patch, please check for me.

I merged this over a week ago and it is in linux-next:
http://ozlabs.org/~akpm/mmotm/broken-out/ocfs2-use-retval-instead-of-status-for-checking-error.patch

You were sent a commit email at the time.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH V3] ocfs2: return error while ocfs2_figure_merge_contig_type failing

2015-04-28 Thread Andrew Morton

On Sat, 25 Apr 2015 17:18:27 +0800 Xue jiufei  wrote:

> Now function ocfs2_figure_merge_contig_type() still return CONTIG_NONE
> when some error occurs which will cause unpredictable error.
> So return error while ocfs2_figure_merge_contig_type failing.
> 
> @@ -4336,8 +4336,11 @@ ocfs2_figure_merge_contig_type(struct 
> ocfs2_extent_tree *et,
>  
>   if (left_cpos != 0) {
>   left_path = ocfs2_new_path_from_path(path);
> - if (!left_path)
> + if (!left_path) {
> + status = -ENOMEM;
> + mlog_errno(status);
>   goto exit;
> + }
>  
>   status = ocfs2_find_path(et->et_ci, left_path,
>
>
> ...
>left_cpos);
> @@ -4392,8 +4395,11 @@ ocfs2_figure_merge_contig_type(struct 
> ocfs2_extent_tree *et,
>   goto free_left_path;
>  
>   right_path = ocfs2_new_path_from_path(path);
> - if (!right_path)
> + if (!right_path) {
> + status = -ENOMEM;
> + mlog_errno(status);
>   goto free_left_path;
> + }
>  
>   status = ocfs2_find_path(et->et_ci, right_path, right_cpos);
>   if (status)
> ...
>
> + ret = ocfs2_figure_merge_contig_type(et, path, el,
> +  split_index,
> +  split_rec,
> +  &ctxt);
> + if (ret) {
> + mlog_errno(ret);
> + goto out;
> + }

This guarantees that mlog_errno() will be called twice for a single error.

But I suppose that ocfs2 does this a lot.  I guess the place to fix it
is in mlog_errno(): teach it to filter out duplicates somehow.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: fix race between purge and get lock resource

2015-04-28 Thread Andrew Morton

On Sat, 25 Apr 2015 15:05:15 +0800 Joseph Qi  wrote:

> There is a race between purge and get lock resource, which will lead to
> ast unfinished and system hung. The case is described below:
> 
> mkdir  dlm_thread
> ---
> o2cb_dlm_lock|
> -> dlmlock   |
>   -> dlm_get_lock_resource   |
> -> __dlm_lookup_lockres_full |
>   -> spin_unlock(&dlm->spinlock) |
>  | dlm_run_purge_list
>  | -> dlm_purge_lockres
>  |   -> dlm_drop_lockres_ref
>  |   -> spin_lock(&dlm->spinlock)
>  |   -> spin_lock(&res->spinlock)
>  |   -> ~DLM_LOCK_RES_DROPPING_REF
>  |   -> spin_unlock(&res->spinlock)
>  |   -> spin_unlock(&dlm->spinlock)
>   -> spin_lock(&tmpres->spinlock)|
>   DLM_LOCK_RES_DROPPING_REF cleared |
>   -> spin_unlock(&tmpres->spinlock) |
>   return the purged lockres |
> 
> So after this, once ast comes, it will ingore the ast because the
> lockres cannot be found anymore. Thus the OCFS2_LOCK_BUSY won't be
> cleared and corresponding thread hangs.
> The &dlm->spinlock was hold when checking DLM_LOCK_RES_DROPPING_REF at
> the very begining. And commit 7b791d6856 (ocfs2/dlm: Fix race during
> lockres mastery) moved it up because of the possible wait.
> So take the &dlm->spinlock and introduce a new wait function to fix the
> race.
> 
> ...
>
> --- a/fs/ocfs2/dlm/dlmthread.c
> +++ b/fs/ocfs2/dlm/dlmthread.c
> @@ -77,6 +77,29 @@ repeat:
>   __set_current_state(TASK_RUNNING);
>  }
> 
> +void __dlm_wait_on_lockres_flags_new(struct dlm_ctxt *dlm,
> + struct dlm_lock_resource *res, int flags)
> +{
> + DECLARE_WAITQUEUE(wait, current);
> +
> + assert_spin_locked(&dlm->spinlock);
> + assert_spin_locked(&res->spinlock);

Not strictly needed - lockdep will catch this.   A minor thing.

> + add_wait_queue(&res->wq, &wait);
> +repeat:
> + set_current_state(TASK_UNINTERRUPTIBLE);
> + if (res->state & flags) {
> + spin_unlock(&res->spinlock);
> + spin_unlock(&dlm->spinlock);
> + schedule();
> + spin_lock(&dlm->spinlock);
> + spin_lock(&res->spinlock);
> + goto repeat;
> + }
> + remove_wait_queue(&res->wq, &wait);
> + __set_current_state(TASK_RUNNING);
> +}

This is pretty nasty.  Theoretically this could spin forever, if other
tasks are setting the flag in a suitably synchronized fashion.

Is there no clean approach?  A reorganization of the locking?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH V2] ocfs2: return error while ocfs2_figure_merge_contig_type failing

2015-04-24 Thread Andrew Morton

On Sat, 25 Apr 2015 10:15:31 +0800
Xue jiufei  wrote:

> Hi andrew,
> I did this against mainline linux-4.0.

That's extremely out of date ;)

> Do you mean that I should
> do patch against linux-next tree?

"current mainline" is current Linus git.  Or wait for 4.1-rc1.



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH next] ocfs2: Reduce object size of mlog uses

2015-04-23 Thread Andrew Morton

On Thu, 23 Apr 2015 16:04:18 -0700 Mark Fasheh  wrote:

> > This code needs some pretty serious rework and rethink, perhaps
> > involving a change to the emitted info.  I was hoping one of the ocfs2
> > developers would take the bait, but they're all in hiding.
> 
> If it functions the same and doesn't have a major performance change, I'm
> pretty sure it'll be fine. We sometimes ask customers to enable some of the
> debugging if they are having an issue. I would ask that it be tested
> on a live system - a local fs, no cluster or cluster config required.

Is there a simpleton's guide to testing ocfs2 on a local disk?  One
which assumes a starting point of "knows how to type".

A few paragraphs in Documentation/filesystems/ocfs2.txt would be great
- then we can point non-ocfs2 people at it when they muck with stuff.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH next] ocfs2: Reduce object size of mlog uses

2015-04-22 Thread Andrew Morton

On Fri, 17 Apr 2015 00:17:50 -0700 Joe Perches  wrote:

> Using a function for __mlog_printk instead of a macro
> reduces the object size of built-in.o more than 120KB, or
> ~10% overall (x86-64 defconfig with all ocfs2 options)
> 
> $ size fs/ocfs2/built-in.o*
>text  data bss dec hex filename
>  936255118071  134408 1188734  12237e fs/ocfs2/built-in.o.new
> 1064081118071  134408 1316560  1416d0 fs/ocfs2/built-in.o.old

It's a start.

> --- a/fs/ocfs2/cluster/masklog.c
> +++ b/fs/ocfs2/cluster/masklog.c
> @@ -64,6 +64,23 @@ static ssize_t mlog_mask_store(u64 mask, const char *buf, 
> size_t count)
>   return count;
>  }
>  
> +void __mlog_printk(const char *level, const char *func, int line,
> +const char *fmt, ...)
> +{
> + struct va_format vaf;
> + va_list args;
> +
> + va_start(args, fmt);
> +
> + vaf.fmt = fmt;
> + vaf.va = &args;
> +
> + printk("%s(%s,%u,%lu):%s:%d %pV",
> +level, current->comm, task_pid_nr(current), __mlog_cpu_guess,
> +func, line, &vaf);
> +
> + va_end(args);
> +}

Logging function-name and line-number was a bit weird.  I wonder if
anyone will mind if this is converted to file-n-line, as God intended. 
That will shrink rodata a bit, because number-of-files is a lot less
than number-of-functions.

>  struct mlog_attribute {
>   struct attribute attr;
>   u64 mask;
> diff --git a/fs/ocfs2/cluster/masklog.h b/fs/ocfs2/cluster/masklog.h
> index 7fdc25a..6036e6a 100644
> --- a/fs/ocfs2/cluster/masklog.h
> +++ b/fs/ocfs2/cluster/masklog.h
> @@ -168,7 +168,8 @@ extern struct mlog_bits mlog_and_bits, mlog_not_bits;
>   * scream.  just do this instead of trying to guess which we're building
>   * against.. *sigh*.
>   */
> -#define __mlog_cpu_guess ({  \
> +#define __mlog_cpu_guess \
> +({   \

While we're in there we should turn this into __mlog_cpu_guess().

Or, preferably, just zap the sorry thing and use
raw_smp_processor_id().

>   unsigned long _cpu = get_cpu(); \
>   put_cpu();  \
>   _cpu;   \
> @@ -178,21 +179,25 @@ extern struct mlog_bits mlog_and_bits, mlog_not_bits;
>   * before ##args is intentional. Otherwise, gcc 2.95 will eat the
>   * previous token if args expands to nothing.
>   */
> -#define __mlog_printk(level, fmt, args...)   \
> - printk(level "(%s,%u,%lu):%s:%d " fmt, current->comm,   \
> -task_pid_nr(current), __mlog_cpu_guess,  \
> -__PRETTY_FUNCTION__, __LINE__ , ##args)
> +__printf(4, 5)
> +void __mlog_printk(const char *level, const char *func, int line,
> +const char *fmt, ...);
>  
> -#define mlog(mask, fmt, args...) do {
> \
> +#define mlog(mask, fmt, ...) \
> +do { \
>   u64 __m = MLOG_MASK_PREFIX | (mask);\
>   if ((__m & ML_ALLOWED_BITS) &&  \
>   __mlog_test_u64(__m, mlog_and_bits) &&  \
>   !__mlog_test_u64(__m, mlog_not_bits)) { \
>   if (__m & ML_ERROR) \

All this goop can also be uninlined?

> - __mlog_printk(KERN_ERR, "ERROR: "fmt , ##args); \
> + __mlog_printk(KERN_ERR, __func__, __LINE__, \
> +   "ERROR: " fmt, ##__VA_ARGS__);\
>   else if (__m & ML_NOTICE)   \
> - __mlog_printk(KERN_NOTICE, fmt , ##args);   \
> - else __mlog_printk(KERN_INFO, fmt , ##args);\
> + __mlog_printk(KERN_NOTICE, __func__, __LINE__,  \
> +   fmt, ##__VA_ARGS__);  \
> + else\
> + __mlog_printk(KERN_INFO, __func__, __LINE__,\
> +   fmt, ##__VA_ARGS__);  \
>   }   \
>  } while (0)
>  

I guess this patch is a step on the way - a 10% shrink is decent.  But
I believe that with full uninlining of the ocfs2 logging code we can
shrink the filesystem's footprint by 50%.

This code needs some pretty serious rework and rethink, perhaps
involving a change to the emitted info.  I was hoping one of the ocfs2
developers would take the bait, but they're all in hiding.

If you feel like undertaking such a rotorooting then go wild - that should
wake 'em up ;)


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH V2] ocfs2: return error while ocfs2_figure_merge_contig_type failing

2015-04-22 Thread Andrew Morton

On Wed, 22 Apr 2015 08:36:10 +0800 Xue jiufei  wrote:

> Now function ocfs2_figure_merge_contig_type() still return CONTIG_NONE
> when some error occurs which will cause unpredictable error.
> So return error while ocfs2_figure_merge_contig_type failing.

Can you please redo this patch against current mainline?  The
conflicts are significant.

Thanks.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] ocfs2 patches for review

2015-04-07 Thread Andrew Morton


Nine patches for which I feel additional review/test/etc is needed,
please.

Some of these are somewhat old - if people think I should just drop
them then please say so.


ocfs2-trusted-xattr-missing-cap_sys_admin-check.patch
- breaks userspace?

ocfs2-flush-inode-data-to-disk-and-free-inode-when-i_count-becomes-zero.patch

add-errors=continue.patch
acknowledge-return-value-of-ocfs2_error.patch
clear-the-rest-of-the-buffers-on-error.patch

ocfs2-fix-a-tiny-case-that-inode-can-not-removed.patch

ocfs2-use-64bit-variables-to-track-heartbeat-time.patch

ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch
- Mark wants changes. Mark has list of notes

ocfs2-avoid-access-invalid-address-when-read-o2dlm-debug-messages.patch
- Mark has issues



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: check if the ocfs2 lock resource be initialized before calling ocfs2_dlm_lock

2015-04-01 Thread Andrew Morton

On Thu, 2 Apr 2015 11:14:38 +0800 alex chen  wrote:

> >>>
> >> I don't think this is fit for all.
> >> In many places it should do cleanup rather than just return the error
> >> code.
> > 
> > There are about 50 sites which can use this.
> > 
> 
> Can we define a new macro 'mlog_errno_return' as described below ?
> In addition, ocfs2 does
>   if (v)
>   mlog_errno(v);
>   return v;
> in some places. In order to deal with this situation we can judge if
> 'st' is not equal to zero before printing log.

Macros which hide control flow are evil, although in this case the
"return" in the name gives people info about what's happening.

But why bother?  This:

--- a/fs/ocfs2/cluster/masklog.h~ocfs2-make-mlog_errno-return-the-errno
+++ a/fs/ocfs2/cluster/masklog.h
@@ -196,13 +196,14 @@ extern struct mlog_bits mlog_and_bits, m
}   \
 } while (0)
 
-#define mlog_errno(st) do {\
+#define mlog_errno(st) ({  \
int _st = (st); \
if (_st != -ERESTARTSYS && _st != -EINTR && \
_st != AOP_TRUNCATED_PAGE && _st != -ENOSPC &&  \
_st != -EDQUOT) \
mlog(ML_ERROR, "status = %lld\n", (long long)_st);  \
-} while (0)
+   st; \
+})
 
 #define mlog_bug_on_msg(cond, fmt, args...) do {   \
if (cond) { \


is clean, idiomatic and works great.  And with no other change it
shrinks the fs object code by 6k.



Gad, mlog_errno() is a *huge* source of bloat.  Look:

--- a/fs/ocfs2/cluster/masklog.h~a
+++ a/fs/ocfs2/cluster/masklog.h
@@ -183,27 +183,9 @@ extern struct mlog_bits mlog_and_bits, m
   task_pid_nr(current), __mlog_cpu_guess,  \
   __PRETTY_FUNCTION__, __LINE__ , ##args)
 
-#define mlog(mask, fmt, args...) do {  \
-   u64 __m = MLOG_MASK_PREFIX | (mask);\
-   if ((__m & ML_ALLOWED_BITS) &&  \
-   __mlog_test_u64(__m, mlog_and_bits) &&  \
-   !__mlog_test_u64(__m, mlog_not_bits)) { \
-   if (__m & ML_ERROR) \
-   __mlog_printk(KERN_ERR, "ERROR: "fmt , ##args); \
-   else if (__m & ML_NOTICE)   \
-   __mlog_printk(KERN_NOTICE, fmt , ##args);   \
-   else __mlog_printk(KERN_INFO, fmt , ##args);\
-   }   \
-} while (0)
+#define mlog(mask, fmt, args...) do { } while (0)
 
-#define mlog_errno(st) ({  \
-   int _st = (st); \
-   if (_st != -ERESTARTSYS && _st != -EINTR && \
-   _st != AOP_TRUNCATED_PAGE && _st != -ENOSPC &&  \
-   _st != -EDQUOT) \
-   mlog(ML_ERROR, "status = %lld\n", (long long)_st);  \
-   st; \
-})
+#define mlog_errno(st) do { } while (0)
 
 #define mlog_bug_on_msg(cond, fmt, args...) do {   \
if (cond) { \


z:/usr/src/25> size fs/ocfs2/ocfs2.ko
   textdata bss dec hex filename
1140849   82767  832192 2055808  1f5e80 fs/ocfs2/ocfs2.ko-before
 675402   82767  226104  984273   f04d1 fs/ocfs2/ocfs2.ko


It almost doubles the size of the object code!

Someone, please put this thing on a diet.  It doesn't all need to be
inlined.  We could just do

extern void __mlog_errno(int st);
#define mlog_errno(st) __mlog_errno(st, __LINE__)

and save hundreds of kilobytes of text.  It'll be faster too, due to the
reduced instruction cache footprint.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: check if the ocfs2 lock resource be initialized before calling ocfs2_dlm_lock

2015-03-31 Thread Andrew Morton

On Wed, 1 Apr 2015 08:43:45 +0800 Joseph Qi  wrote:

> > From: Andrew Morton 
> > Subject: ocfs2: make mlog_errno return the errno
> > 
> > ocfs2 does
> > 
> > mlog_errno(v);
> > return v;
> > 
> > in many places.  Change mlog_errno() so we can do
> > 
> > return mlog_errno(v);
> > 
> I don't think this is fit for all.
> In many places it should do cleanup rather than just return the error
> code.

There are about 50 sites which can use this.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: check if the ocfs2 lock resource be initialized before calling ocfs2_dlm_lock

2015-03-31 Thread Andrew Morton

On Mon, 30 Mar 2015 11:22:13 +0800 alex chen  wrote:

> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1391,6 +1391,11 @@ static int __ocfs2_cluster_lock(struct ocfs2_super 
> *osb,
>   int noqueue_attempted = 0;
>   int dlm_locked = 0;
> 
> + if (!(lockres->l_flags & OCFS2_LOCK_INITIALIZED)) {
> + mlog_errno(-EINVAL);
> + return -EINVAL;
> +     }

hm.  How about we do this?


From: Andrew Morton 
Subject: ocfs2: make mlog_errno return the errno

ocfs2 does

mlog_errno(v);
return v;

in many places.  Change mlog_errno() so we can do

return mlog_errno(v);

For some weird reason this patch reduces the size of ocfs2 by 6k:

akpm3:/usr/src/25> size fs/ocfs2/ocfs2.ko
   textdata bss dec hex filename
1146613   82767  832192 2061572  1f7504 fs/ocfs2/ocfs2.ko-before
1140857   82767  832192 2055816  1f5e88 fs/ocfs2/ocfs2.ko-after

Cc: Mark Fasheh 
Cc: Joel Becker 
Cc: alex chen 
Signed-off-by: Andrew Morton 
---

 fs/ocfs2/cluster/masklog.h |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN fs/ocfs2/cluster/masklog.h~a fs/ocfs2/cluster/masklog.h
--- a/fs/ocfs2/cluster/masklog.h~a
+++ a/fs/ocfs2/cluster/masklog.h
@@ -196,13 +196,14 @@ extern struct mlog_bits mlog_and_bits, m
}   \
 } while (0)
 
-#define mlog_errno(st) do {\
+#define mlog_errno(st) ({  \
int _st = (st); \
if (_st != -ERESTARTSYS && _st != -EINTR && \
_st != AOP_TRUNCATED_PAGE && _st != -ENOSPC &&  \
_st != -EDQUOT) \
mlog(ML_ERROR, "status = %lld\n", (long long)_st);  \
-} while (0)
+   st; \
+})
 
 #define mlog_bug_on_msg(cond, fmt, args...) do {   \
if (cond) { \
_


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: Neaten do_error, ocfs2_error and ocfs2_abort

2015-03-27 Thread Andrew Morton

On Thu, 26 Mar 2015 20:07:08 -0700 Joe Perches  wrote:

> These uses sometimes do and sometimes don't have '\n' terminations.
> Make the uses consistently use '\n' terminations and remove the
> newline from the functions.

This is going to take a while to merge, as it's backed up behind a pile
of needs-more-reviewing ocfs2 patches.  Namely:

#ocfs2-trusted-xattr-missing-cap_sys_admin-check.patch: breaks userspace?
ocfs2-trusted-xattr-missing-cap_sys_admin-check.patch
#
ocfs2-flush-inode-data-to-disk-and-free-inode-when-i_count-becomes-zero.patch
#
add-errors=continue.patch
acknowledge-return-value-of-ocfs2_error.patch
clear-the-rest-of-the-buffers-on-error.patch
#
ocfs2-fix-a-tiny-case-that-inode-can-not-removed.patch
#
#ocfs2-use-64bit-variables-to-track-heartbeat-time.patch: acks?
ocfs2-use-64bit-variables-to-track-heartbeat-time.patch
#
#ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch:
 acks? Mark wants changes. Mark has list of notes
ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch
#
#ocfs2-avoid-access-invalid-address-when-read-o2dlm-debug-messages.patch: acks? 
Mark has issues
ocfs2-avoid-access-invalid-address-when-read-o2dlm-debug-messages.patch
ocfs2-avoid-access-invalid-address-when-read-o2dlm-debug-messages-v3.patch

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: trusted xattr missing CAP_SYS_ADMIN check

2015-03-24 Thread Andrew Morton

On Tue, 24 Mar 2015 11:46:32 -0400 Sanidhya Kashyap  
wrote:

> The trusted extended attributes are only visible to the process which hvae
> CAP_SYS_ADMIN capability but the check is missing in ocfs2 xattr_handler
> trusted list. The check is important because this will be used for 
> implementing
> mechanisms in the userspace for which other ordinary processes should not
> have access to.
> 
> ...
>
> --- a/fs/ocfs2/xattr.c
> +++ b/fs/ocfs2/xattr.c
> @@ -7326,6 +7326,9 @@ static size_t ocfs2_xattr_trusted_list(struct dentry 
> *dentry, char *list,
>   const size_t prefix_len = XATTR_TRUSTED_PREFIX_LEN;
>   const size_t total_len = prefix_len + name_len + 1;
>  
> + if (!capable(CAP_SYS_ADMIN))
> + return 0;
> +
>   if (list && total_len <= list_size) {
>   memcpy(list, XATTR_TRUSTED_PREFIX, prefix_len);
>   memcpy(list + prefix_len, name, name_len);

Ouch.  Won't this break existing userspace?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 1/4 V2] ocfs2: use retval instead of status for checking error

2015-03-19 Thread Andrew Morton

On Sat, 28 Feb 2015 08:48:40 +0900 Daeseok Youn  wrote:

> The use of 'status' in __ocfs2_add_entry() can return wrong
> status when some functions are failed.
> 
> If ocfs2_journal_access_db() in __ocfs2_add_entry() is failed,
> that status is saved to 'status' but return variable is 'retval'
> which is saved 'success' status. In case of this,  __ocfs2_add_entry()
> is failed but can be returned as 'success'.
> 
> So replace 'status' with 'retval'.
> 
> - mlog_errno(status);
> + if (retval) {
> + mlog_errno(retval);
>   goto bail;

and

bail:
if (retval)
mlog_errno(retval);

return retval;
}

so we'll clearly log the same error twice.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 1/3] ocfs2: remove unneeded variable 'status'

2015-02-26 Thread Andrew Morton

On Mon, 23 Feb 2015 19:38:10 +0900 Daeseok Youn  wrote:

> Use 'retval' instead of 'status'.
> 

The patch does a lot more than this.  It causes __ocfs2_add_entry to
propagate error codes which were previously dropped on the floor.

Please update the changelog to fully explain the functional changes and
to explain why they are desirable.

After the patch there is still one unchecked call to
ocfs2_journal_access_di() and one unchecked call to
ocfs2_journal_access_db().  Probably these are bugs.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2: remove useless if statement

2015-02-17 Thread Andrew Morton

On Tue, 17 Feb 2015 16:13:50 +0900 Daeseok Youn  wrote:

> The Local variable "i" in for loop is always less then
> O2CB_MAP_STABILIZE_COUNT.
> 
> Signed-off-by: Daeseok Youn 
> ---
>  fs/ocfs2/stack_o2cb.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/stack_o2cb.c b/fs/ocfs2/stack_o2cb.c
> index 1724d43..813d726 100644
> --- a/fs/ocfs2/stack_o2cb.c
> +++ b/fs/ocfs2/stack_o2cb.c
> @@ -295,8 +295,8 @@ static int o2cb_cluster_check(void)
>   set_bit(node_num, netmap);
>   if (!memcmp(hbmap, netmap, sizeof(hbmap)))
>   return 0;
> - if (i < O2CB_MAP_STABILIZE_COUNT)
> - msleep(1000);
> +
> + msleep(1000);
>   }

I assume the code was intended to do

if (i < O2CB_MAP_STABILIZE_COUNT - 1)
msleep(1000);

to avoid a pointless 1-second delay when the operation times out.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] ocfs2 patch review for 3.19

2015-02-04 Thread Andrew Morton


Thsi time I have only four patches which I consider need expert review,
please.

ocfs2-prune-the-dcache-before-deleting-the-dentry-of-directory.patch

o2dlm-fix-null-pointer-dereference-in-o2dlm_blocking_ast_wrapper.patch
(Joseph Qi had issues)

ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch
(Mark has list of things which need doing)

ocfs2-avoid-access-invalid-address-when-read-o2dlm-debug-messages.patch
(Mark has issues with it)

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH v2] ocfs2: fix warning 'ocfs2_orphan_del' uses dynamic stack allocation

2015-01-26 Thread Andrew Morton

On Mon, 26 Jan 2015 11:20:47 +0800 Joseph Qi  wrote:

> In ocfs2_orphan_del it uses dynamic stack allocation for orphan entry
> name. Fix it by using dynamic heap allocation.
> 
> ...
>
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -2298,18 +2298,22 @@ int ocfs2_orphan_del(struct ocfs2_super *osb,
>  {
>   int namelen = dio ? OCFS2_DIO_ORPHAN_PREFIX_LEN + OCFS2_ORPHAN_NAMELEN :
>   OCFS2_ORPHAN_NAMELEN;
> - char name[namelen + 1];
> + char *name;
>   struct ocfs2_dinode *orphan_fe;
>   int status = 0;
>   struct ocfs2_dir_lookup_result lookup = { NULL, };
> 
> + name = kmalloc(namelen + 1, GFP_NOFS);
> + if (!name)
> + goto leave;
> +
>   if (dio) {
>   status = snprintf(name, OCFS2_DIO_ORPHAN_PREFIX_LEN + 1, "%s",
>   OCFS2_DIO_ORPHAN_PREFIX);
>   if (status != OCFS2_DIO_ORPHAN_PREFIX_LEN) {
>   status = -EINVAL;
>   mlog_errno(status);
> - return status;
> + goto leave;
>   }
> 
>   status = ocfs2_blkno_stringify(OCFS2_I(inode)->ip_blkno,
> @@ -2357,6 +2361,7 @@ int ocfs2_orphan_del(struct ocfs2_super *osb,
>   ocfs2_journal_dirty(handle, orphan_dir_bh);
> 
>  leave:
> + kfree(name);
>   ocfs2_free_dir_lookup_result(&lookup);
> 
>   if (status)

I think I prefer my fix:

--- 
a/fs/ocfs2/namei.c~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir-fix
+++ a/fs/ocfs2/namei.c
@@ -2296,8 +2296,7 @@ int ocfs2_orphan_del(struct ocfs2_super
 struct buffer_head *orphan_dir_bh,
 bool dio)
 {
-   int namelen = dio ? OCFS2_DIO_ORPHAN_PREFIX_LEN + OCFS2_ORPHAN_NAMELEN :
-   OCFS2_ORPHAN_NAMELEN;
+   const int namelen = OCFS2_DIO_ORPHAN_PREFIX_LEN + OCFS2_ORPHAN_NAMELEN;
char name[namelen + 1];
struct ocfs2_dinode *orphan_fe;
int status = 0;

It means we use 20 bytes of stack all the time, instead of
sometimes-20, sometimes-16.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [mmotm:master 396/417] fs/ocfs2/namei.c:2365:1: warning: 'ocfs2_orphan_del' uses dynamic stack allocation

2015-01-26 Thread Andrew Morton

On Sat, 24 Jan 2015 11:49:02 +0800 kbuild test robot  
wrote:

> tree:   git://git.cmpxchg.org/linux-mmotm.git master
> head:   c64429bcc60a702f19f5cfdb5c39277863278a8c
> commit: 98bc024d7e86a52b7c6266f7bf3bac93626f002b [396/417] ocfs2: add 
> functions to add and remove inode in orphan dir
> config: s390-allmodconfig (attached as .config)
> reproduce:
>   wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout 98bc024d7e86a52b7c6266f7bf3bac93626f002b
>   # save the attached .config to linux build tree
>   make.cross ARCH=s390 
> 
> All warnings:
> 
>fs/ocfs2/namei.c: In function 'ocfs2_orphan_del':
> >> fs/ocfs2/namei.c:2365:1: warning: 'ocfs2_orphan_del' uses dynamic stack 
> >> allocation
> }
> ^
> 

OK, thanks.  I suppose we can just use the larger size - it's only 4 bytes.

--- 
a/fs/ocfs2/namei.c~ocfs2-add-functions-to-add-and-remove-inode-in-orphan-dir-fix
+++ a/fs/ocfs2/namei.c
@@ -2296,8 +2296,7 @@ int ocfs2_orphan_del(struct ocfs2_super
 struct buffer_head *orphan_dir_bh,
 bool dio)
 {
-   int namelen = dio ? OCFS2_DIO_ORPHAN_PREFIX_LEN + OCFS2_ORPHAN_NAMELEN :
-   OCFS2_ORPHAN_NAMELEN;
+   const int namelen = OCFS2_DIO_ORPHAN_PREFIX_LEN + OCFS2_ORPHAN_NAMELEN;
char name[namelen + 1];
struct ocfs2_dinode *orphan_fe;
int status = 0;


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 1/2] ocfs2: fix uninitialized variable access

2015-01-06 Thread Andrew Morton

On Thu, 25 Dec 2014 13:52:16 +0800 Junxiao Bi  wrote:

> Variable "why" is not yet initialized at line 615, fix it.
> 
> ...
>
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -569,7 +569,7 @@ static int __ocfs2_extend_allocation(struct inode *inode, 
> u32 logical_start,
>   handle_t *handle = NULL;
>   struct ocfs2_alloc_context *data_ac = NULL;
>   struct ocfs2_alloc_context *meta_ac = NULL;
> - enum ocfs2_alloc_restarted why;
> + enum ocfs2_alloc_restarted why = RESTART_NONE;
>   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>   struct ocfs2_extent_tree et;
>   int did_quota = 0;

Oh geeze, are you really sure about this?  __ocfs2_extend_allocation()
is as clear as mud.  What happens when ocfs2_add_inode_data() returns
-EAGAIN and leaves *reason_ret unwritten to?

What are the runtime effects of this bug?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: add a mount option journal_async_commit on ocfs2 filesystem

2015-01-06 Thread Andrew Morton

On Thu, 25 Dec 2014 10:53:05 +0800 alex chen  wrote:

> Add a mount option to support JBD2 feature:
> JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT. When this feature is opened,
> journal commit block can be written to disk without waiting for
> descriptor blocks, which can improve journal commit performance. This
> option will enable 'journal_checksum' internally.
> 
> Using the fs_mark benchmark, using journal_async_commit shows a 50%
> improvement, the files per second go up from 215.2 to 317.5.
> 
> test script:
> fs_mark  -d  /mnt/ocfs2/  -s  10240  -n  1000
> 
> default:
> FSUse%Count SizeFiles/sec App Overhead
>  0 100010240215.217878
> 
> with journal_async_commit option:
> FSUse%Count SizeFiles/sec App Overhead
>  0 100010240317.517881

For some reason this patch is a bit mangled and I had to apply the
first hunk by hand.

>  fs/ocfs2/ocfs2.h |  2 ++
>  fs/ocfs2/super.c | 17 +

Documentation/filesystems/ocfs2.txt needs an update.



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file

2015-01-05 Thread Andrew Morton

On Mon, 22 Dec 2014 09:52:14 +0100 Aron Szabo  wrote:

> 12/19/2014 11:15 PM keltez__ssel, Andrew Morton __rta:
> > On Fri, 19 Dec 2014 18:07:45 +0800 Xue jiufei  wrote:
> >
> >> In function ocfs2_link(), parent directory inode passed to function
> >> ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent
> >> of new_dentry not old_dentry. We should get old_dir from old_dentry
> >> and lookup old_dentry in old_dir in case another node remove the old 
> >> dentry.
> > What are the user-visible effects of this change?
> 
> Hi Andew!
> 
> Hard linking works again, when paths are relative with at least one 
> subdirectory. This is how the problem was reproducable:
> 
> # mkdir a
> # mkdir b
> # touch a/test
> # ln a/test b/test
> ln: failed to create hard link `b/test' => `a/test': No such file or 
> directory
> 
> However when creating links in the same dir, it worked well.
> 
> Now the link gets created.
> 
> Thanks for the quick fix Xue!

(top-posting untangled)

When you say "works again", you mean that we broke it?  This patch
fixes a regression?  If so, do we know what caused that regression?  Or
at least when it occurred?



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 1/1 linux-next] ocfs2: remove unnecessary sizeof(char)

2014-12-22 Thread Andrew Morton

On Mon, 22 Dec 2014 20:05:09 +0100 Fabian Frederick  wrote:

> sizeof(char) is always 1.
> 
> ...
>
> --- a/fs/ocfs2/journal.c
> +++ b/fs/ocfs2/journal.c
> @@ -129,8 +129,8 @@ int ocfs2_compute_replay_slots(struct ocfs2_super *osb)
>   if (osb->replay_map)
>   return 0;
>  
> - replay_map = kzalloc(sizeof(struct ocfs2_replay_map) +
> -  (osb->max_slots * sizeof(char)), GFP_KERNEL);
> + replay_map = kzalloc(sizeof(struct ocfs2_replay_map) + osb->max_slots,
> +  GFP_KERNEL);
>  
>   if (!replay_map) {
>   mlog_errno(-ENOMEM);

I dunno.  The code at present isn't particularly idiomatic, but it has
some documentation value and says "I know what I'm doing".

It would be better if it was

kzalloc(sizeof(struct ocfs2_replay_map) *
sizeof(struct ocfs2_replay_map.rm_replay_slots[0]), ...);

And it would be better if C permitted that ;)

kzalloc(sizeof(struct ocfs2_replay_map) *
sizeof((struct ocfs2_replay_map *)0)->rm_replay_slots[0]), ...);

yuk.


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file

2014-12-19 Thread Andrew Morton

On Fri, 19 Dec 2014 18:07:45 +0800 Xue jiufei  wrote:

> In function ocfs2_link(), parent directory inode passed to function
> ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent
> of new_dentry not old_dentry. We should get old_dir from old_dentry
> and lookup old_dentry in old_dir in case another node remove the old dentry.

What are the user-visible effects of this change?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [patch 03/15] ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()

2014-12-17 Thread Andrew Morton

transaction A)
ocfs2_journal_access_di
ocfs2_write_cluster_by_desc
  ocfs2_mark_extent_written
ocfs2_change_extent_flag
  ocfs2_split_extent
ocfs2_extend_rotate_transaction
  jbd2_journal_restart
  (t_updates-1,transaction B) t_updates==0
__jbd2_journal_refile_buffer

ocfs2_write_end
ocfs2_write_end_nolock
ocfs2_journal_dirty
jbd2_journal_dirty_metadata(bug)
   ocfs2_commit_trans

In ext4, I found that: jbd2_journal_get_write_access() called by

ext4_write_end.
ext4_write_begin
ext4_journal_start
__ext4_journal_start_sb
ext4_journal_check_start
jbd2__journal_start

ext4_write_end
ext4_mark_inode_dirty
ext4_reserve_inode_write
ext4_journal_get_write_access
jbd2_journal_get_write_access
ext4_mark_iloc_dirty
ext4_do_update_inode
ext4_handle_dirty_metadata
jbd2_journal_dirty_metadata


So I think we should put ocfs2_journal_access_di before
  ocfs2_journal_dirty in the ocfs2_write_end.  and it works well after my
  modification.

Signed-off-by: vicky 
Cc: Mark Fasheh 
Cc: Joel Becker 
Signed-off-by: Andrew Morton 
---

 fs/ocfs2/aops.c |   21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff -puN 
fs/ocfs2/aops.c~ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock
 fs/ocfs2/aops.c
--- 
a/fs/ocfs2/aops.c~ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock
+++ a/fs/ocfs2/aops.c
@@ -1822,16 +1822,6 @@ try_again:
if (ret)
goto out_commit;
}
-   /*
-* We don't want this to fail in ocfs2_write_end(), so do it
-* here.
-*/
-   ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
-   if (ret) {
-   mlog_errno(ret);
-   goto out_quota;
-   }
 
/*
 * Fill our page array first. That way we've grabbed enough so
@@ -1982,7 +1972,7 @@ int ocfs2_write_end_nolock(struct addres
   loff_t pos, unsigned len, unsigned copied,
   struct page *page, void *fsdata)
 {
-   int i;
+   int i, ret;
unsigned from, to, start = pos & (PAGE_CACHE_SIZE - 1);
struct inode *inode = mapping->host;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
@@ -2032,6 +2022,14 @@ int ocfs2_write_end_nolock(struct addres
}
}
 
+   ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
+ OCFS2_JOURNAL_ACCESS_WRITE);
+   if (ret) {
+   copied = ret;
+   mlog_errno(ret);
+   goto out;
+   }
+
 out_write_size:
pos += copied;
if (pos > i_size_read(inode)) {
@@ -2053,6 +2051,7 @@ out_write_size:
 */
ocfs2_unlock_pages(wc);
 
+out:
ocfs2_commit_trans(osb, handle);
 
ocfs2_run_deallocs(osb, &wc->w_dealloc);
_


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] ocfs patches for review

2014-12-15 Thread Andrew Morton


Another round of jolly patch reviewing, please.  A number of these
patches have been stalled for quite a long time.

I have the following notes:

o2dlm-fix-null-pointer-dereference-in-o2dlm_blocking_ast_wrapper.patch:
   - Joseph Qi had issues

ocfs2-free-inode-when-i_count-becomes-zero.patch:
   - Mark is finding a better way of doing this

ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch:
   - Mark requested changes



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] Remove filesize checks for sync I/O journal commit

2014-11-04 Thread Andrew Morton

On Tue, 4 Nov 2014 09:52:22 -0600 Goldwyn Rodrigues  wrote:

> Filesize is not a good indication that the file needs to be synced.
> An example where this breaks is:
>  1. Open the file in O_SYNC|O_RDWR
>  2. Read a small portion of the file (say 64 bytes)
>  3. Lseek to starting of the file
>  4. Write 64 bytes
> 
> If the node crashes, it is not written out to disk because this
> was not committed in the journal and the other node which reads
> the file after recovery reads stale data (even if the write on
> the other node was successful)
> 
> ...
>
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2381,9 +2381,7 @@ out_dio:
>   if (ret < 0)
>   written = ret;
>  
> - if (!ret && ((old_size != i_size_read(inode)) ||
> -  (old_clusters != OCFS2_I(inode)->ip_clusters) ||
> -  has_refcount)) {
> + if (!ret) {
>   ret = 
> jbd2_journal_force_commit(osb->journal->j_journal);
>   if (ret < 0)
>   written = ret;

Can we have a signed-off-by for this, please?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 2/2] ocfs2: Fix d_splice_alias() return code checking

2014-10-20 Thread Andrew Morton

On Sun, 19 Oct 2014 12:39:44 +0200 Richard Weinberger  wrote:

> d_splice_alias() can return a valid dentry, NULL or an ERR_PTR.
> Currently the code checks not for ERR_PTR and my oops in
> ocfs2_dentry_attach_lock().

It's unclear what the second sentence is trying to tell us.  The patch
fixes an oops?  If so, a copy of the trace would be useful, as would an
explanation of why it occurred.  If not, I'm all confused.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 4/7 v4] ocfs2: add and remove inode in orphan dir in ocfs2_direct_IO

2014-10-15 Thread Andrew Morton

On Wed, 15 Oct 2014 16:42:44 -0700 Andrew Morton  
wrote:

> On Sat, 11 Oct 2014 20:29:08 +0800 WeiWei Wang  wrote:
> 
> > Add the inode to orphan dir first, and then delete it once append
> > O_DIRECT finished.
> > This is to make sure block allocation and inode size are consistent.
> > 
> > ...
> >
> > +static ssize_t ocfs2_direct_IO_write(struct kiocb *iocb,
> > +   struct iov_iter *iter,
> > +   loff_t offset)
> > +{
> > +   ssize_t ret = 0;
> > +   int orphan = 0;
> > +   int is_overwrite = 0;
> > +   struct file *file = iocb->ki_filp;
> > +   struct inode *inode = file->f_path.dentry->d_inode->i_mapping->host;
> > +   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> > +   struct buffer_head *di_bh = NULL;
> > +   size_t count = iter->count;
> > +   journal_t *journal = osb->journal->j_journal;
> > +   u32 p_cpos = 0;
> > +   u32 v_cpos = ocfs2_clusters_for_bytes(osb->sb, offset);
> > +   u32 zero_len = offset % (1 << osb->s_clustersize_bits);
> 
> On i386 this generates a call to _moddi3, which doesn't exist.  You'll
> need to use do_div() or something similar.
> 
> > +   int cluster_align = offset % (1 << osb->s_clustersize_bits) ? 0 : 1;
> 
> Similar here.  Why not just do cluster_align = !!zero_len?
> 

Something like this...

--- 
a/fs/ocfs2/aops.c~ocfs2-add-and-remove-inode-in-orphan-dir-in-ocfs2_direct_io-fix
+++ a/fs/ocfs2/aops.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -640,13 +641,20 @@ static ssize_t ocfs2_direct_IO_write(str
journal_t *journal = osb->journal->j_journal;
u32 p_cpos = 0;
u32 v_cpos = ocfs2_clusters_for_bytes(osb->sb, offset);
-   u32 zero_len = offset % (1 << osb->s_clustersize_bits);
-   int cluster_align = offset % (1 << osb->s_clustersize_bits) ? 0 : 1;
+   u32 zero_len;
+   int cluster_align;
+   loff_t final_size = offset + count;
int append_write = offset >= i_size_read(inode) ? 1 : 0;
unsigned int num_clusters = 0;
unsigned int ext_flags = 0;
 
-   loff_t final_size = offset + count;
+   {
+   loff_t o = offset;
+
+   zero_len = do_div(o, 1 << osb->s_clustersize_bits);
+   cluster_align = !!zero_len;
+   }
+
/*
 * when final_size > inode->i_size, inode->i_size will be
 * updated after direct write, so add the inode to orphan
_


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 4/7 v4] ocfs2: add and remove inode in orphan dir in ocfs2_direct_IO

2014-10-15 Thread Andrew Morton

On Sat, 11 Oct 2014 20:29:08 +0800 WeiWei Wang  wrote:

> Add the inode to orphan dir first, and then delete it once append
> O_DIRECT finished.
> This is to make sure block allocation and inode size are consistent.
> 
> ...
>
> +static ssize_t ocfs2_direct_IO_write(struct kiocb *iocb,
> + struct iov_iter *iter,
> + loff_t offset)
> +{
> + ssize_t ret = 0;
> + int orphan = 0;
> + int is_overwrite = 0;
> + struct file *file = iocb->ki_filp;
> + struct inode *inode = file->f_path.dentry->d_inode->i_mapping->host;
> + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> + struct buffer_head *di_bh = NULL;
> + size_t count = iter->count;
> + journal_t *journal = osb->journal->j_journal;
> + u32 p_cpos = 0;
> + u32 v_cpos = ocfs2_clusters_for_bytes(osb->sb, offset);
> + u32 zero_len = offset % (1 << osb->s_clustersize_bits);

On i386 this generates a call to _moddi3, which doesn't exist.  You'll
need to use do_div() or something similar.

> + int cluster_align = offset % (1 << osb->s_clustersize_bits) ? 0 : 1;

Similar here.  Why not just do cluster_align = !!zero_len?

> + int append_write = offset >= i_size_read(inode) ? 1 : 0;
> + unsigned int num_clusters = 0;
> + unsigned int ext_flags = 0;
>
> ...
>

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: call dlm_lockres_put without resource spinlock

2014-10-01 Thread Andrew Morton

On Fri, 26 Sep 2014 16:41:39 +0800 alex chen  wrote:

> dlm_lockres_put should be called without &res->spinlock, otherwise a
> deadlock case may happen.
> 
> spin_lock(&res->spinlock)
> ...
> dlm_lockres_put
>   ->dlm_lockres_release
> ->dlm_print_one_lock_resource
>   ->spin_lock(&res->spinlock)
> 
> Signed-off-by: Alex Chen 
> Reviewed-by: Joseph Qi 
> ---
>  fs/ocfs2/dlm/dlmrecovery.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 45067fa..3365839 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -1710,9 +1710,12 @@ int dlm_master_requery_handler(struct o2net_msg *msg, 
> u32 len, void *data,
>   BUG();

This code does a GFP_ATOMIC allocation attempt and if that fails, it
goes BUG().

Guys, GFP_ATOMIC is unreliable.  This isn't production quality code :(



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [PATCH 0/7] ocfs2: allocate blocks in direct I/O write

2014-09-10 Thread Andrew Morton

On Wed, 10 Sep 2014 20:38:04 +0800 WeiWei Wang  wrote:

> hi all,
> In ocfs2 append I/O write and fill holes I/O write situation, blocks have not 
> been allocated yet, so the direct I/O write will fallback to buffer I/O write.
> Buffer I/O write the data to page cache first, then flush the page cache to 
> disk, this will consume some performance. In this patch, the direct I/O write
> doesn't not need to fallback to buffer I/O write any more because the 
> allocate blocks are enabled in direct I/O now.
> 

The entire point of the patchset is to improve performance, but the
changelog contains no performance measurements!  How do we know it's
worth considering?  Please include quantitative benchmarking results
in the changelog.



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

1 2 3 >

1 - 100 of 219 matches

Mail list logo