Re: Re: Re: EXT4 regression caused 4eec7

2013-05-13 Thread Zheng Liu
On Mon, May 13, 2013 at 05:17:27PM +0200, Jan Kara wrote:
> On Mon 13-05-13 21:56:43, Zheng Liu wrote:
> > On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> > > On Sun 12-05-13 13:04:59, EUNBONG SONG wrote:
> > > > 
> > > > 
> > > > >> Since at this point it's safer to rollback the change and we can
> > > > >> investigate more deeply how to fix it correctly for the next
> > > > >> development cycle, this is the patch which I'm testing.
> > > > 
> > > > >> - Ted
> > > > 
> > > > > Hello, I've tested with your patch. But the same problem was 
> > > > > reproduced.
> > > > > Currently, I'm trying to git bisect. If i done git bisect, i will let 
> > > > > you know.
> > > > 
> > > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() 
> > > > is caused by 
> > > > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head 
> > > > size."
> > > > I write just code patch which revert 
> > > > ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
> > > > I don't know the root cause. 
> > >   This is really strange. I've verified the code and all the places
> > > modifying b_jlist or b_modified are holding bh_state lock so we should be
> > > safe...
> > 
> > Hi Jan,
> > 
> > Could you please take a look at this mail [1].  I don't think we hold
> > bh_state lock there.
> > 
> > 1. http://www.spinics.net/lists/linux-ext4/msg38205.html
>   I'll also reply to that thread but: Yes, we don't hold bh_state lock when
> reading b_jlist in that one case (that's why I wrote 'modify' and not just
> 'access' in my previous email). But that's really harmless since we don't
> do any complex operations with b_jlist (only get & set) so we either see an
> old value or a new one. And that particular use is going away anyway later
> in my series.

I see.  Thanks for your explanation.

- Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Re: EXT4 regression caused 4eec7

2013-05-13 Thread Jan Kara
On Mon 13-05-13 21:56:43, Zheng Liu wrote:
> On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> > On Sun 12-05-13 13:04:59, EUNBONG SONG wrote:
> > > 
> > > 
> > > >> Since at this point it's safer to rollback the change and we can
> > > >> investigate more deeply how to fix it correctly for the next
> > > >> development cycle, this is the patch which I'm testing.
> > > 
> > > >> - Ted
> > > 
> > > > Hello, I've tested with your patch. But the same problem was reproduced.
> > > > Currently, I'm trying to git bisect. If i done git bisect, i will let 
> > > > you know.
> > > 
> > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is 
> > > caused by 
> > > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head 
> > > size."
> > > I write just code patch which revert 
> > > ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
> > > I don't know the root cause. 
> >   This is really strange. I've verified the code and all the places
> > modifying b_jlist or b_modified are holding bh_state lock so we should be
> > safe...
> 
> Hi Jan,
> 
> Could you please take a look at this mail [1].  I don't think we hold
> bh_state lock there.
> 
> 1. http://www.spinics.net/lists/linux-ext4/msg38205.html
  I'll also reply to that thread but: Yes, we don't hold bh_state lock when
reading b_jlist in that one case (that's why I wrote 'modify' and not just
'access' in my previous email). But that's really harmless since we don't
do any complex operations with b_jlist (only get & set) so we either see an
old value or a new one. And that particular use is going away anyway later
in my series.

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Re: EXT4 regression caused 4eec7

2013-05-13 Thread Zheng Liu
On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote:
> On Sun 12-05-13 13:04:59, EUNBONG SONG wrote:
> > 
> > 
> > >> Since at this point it's safer to rollback the change and we can
> > >> investigate more deeply how to fix it correctly for the next
> > >> development cycle, this is the patch which I'm testing.
> > 
> > >> - Ted
> > 
> > > Hello, I've tested with your patch. But the same problem was reproduced.
> > > Currently, I'm trying to git bisect. If i done git bisect, i will let you 
> > > know.
> > 
> > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is 
> > caused by 
> > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
> > I write just code patch which revert 
> > ae4647fb7654676fc44a97e86eb35f9f06b99f66 because
> > I don't know the root cause. 
>   This is really strange. I've verified the code and all the places
> modifying b_jlist or b_modified are holding bh_state lock so we should be
> safe...

Hi Jan,

Could you please take a look at this mail [1].  I don't think we hold
bh_state lock there.

1. http://www.spinics.net/lists/linux-ext4/msg38205.html

Thanks,
- Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Re: EXT4 regression caused 4eec7

2013-05-13 Thread Jan Kara
On Sun 12-05-13 13:04:59, EUNBONG SONG wrote:
> 
> 
> >> Since at this point it's safer to rollback the change and we can
> >> investigate more deeply how to fix it correctly for the next
> >> development cycle, this is the patch which I'm testing.
> 
> >> - Ted
> 
> > Hello, I've tested with your patch. But the same problem was reproduced.
> > Currently, I'm trying to git bisect. If i done git bisect, i will let you 
> > know.
> 
> Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is 
> caused by 
> ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
> I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 
> because
> I don't know the root cause. 
  This is really strange. I've verified the code and all the places
modifying b_jlist or b_modified are holding bh_state lock so we should be
safe...

Hum, but I remember I was debugging similar problems with bit fields in
btrfs on ia64 as well (see http://lwn.net/Articles/478657/). So I think what
has happened is that your compiler compiled bitfield access as 64-bit and
updates to b_jcount and b_jlist / b_modified crashed into one another. I
didn't hit it because my compiler was not so "clever".

Grumble. In this case I think bitfields are not worth the trouble with gcc.
It's a pitty we have to spend additional 8 bytes for every journal_head but
we'll survive... I'll send Ted a partial revert and add a comment so that
we won't repeat this mistake in future.

Honza
 
> Signed-off-by: Eunbong Song 
> ---
>  include/linux/journal-head.h |   11 +--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
> index 13a3da2..c18b46f 100644
> --- a/include/linux/journal-head.h
> +++ b/include/linux/journal-head.h
> @@ -31,14 +31,21 @@ struct journal_head {
>   /*
>* Journalling list for this buffer [jbd_lock_bh_state()]
>*/
> - unsigned b_jlist:4;
> + unsigned b_jlist;
>  
>   /*
>* This flag signals the buffer has been modified by
>* the currently running transaction
>* [jbd_lock_bh_state()]
>*/
> - unsigned b_modified:1;
> + unsigned b_modified;
> +
> + /*
> +  * This feild tracks the last transaction id in which this buffer
> +  * has been cowed
> +  * [jbd_lock_bh_state()]
> +  */
> + tid_t b_cow_tid;
>  
>   /*
>* Copy of the buffer data frozen for writing to the log.
> -- 
> 1.7.0.4
> 
> 
> Thanks. 
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Re: EXT4 regression caused 4eec7

2013-05-12 Thread Dmitry Monakhov
On Sun, 12 May 2013 13:05:00 + (GMT), EUNBONG SONG  
wrote:
> 
> 
> >> Since at this point it's safer to rollback the change and we can
> >> investigate more deeply how to fix it correctly for the next
> >> development cycle, this is the patch which I'm testing.
> 
> >> - Ted
> 
> > Hello, I've tested with your patch. But the same problem was reproduced.
> > Currently, I'm trying to git bisect. If i done git bisect, i will let you 
> > know.
> 
> Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is 
> caused by 
> ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
OK that explains both regressions ext4/jbd2 and 
ext3/jbd(https://lkml.org/lkml/2013/5/10/434)
Probably some race because b_jlist accessed w/o lock some places. 
BUT i still can not reproduce original issue :(
I've tried all types of disks near me (ssd,hdd,usb-dongle),
What disk type you use? Is it reproducible on x86?
Please post following info:
#fdisk -l  $HDD
#hdparm -t $HDD
#df -h 
#cat /proc/mounts

> I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 
> because
> I don't know the root cause. 
> 
> 
> Signed-off-by: Eunbong Song 
> ---
>  include/linux/journal-head.h |   11 +--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
> index 13a3da2..c18b46f 100644
> --- a/include/linux/journal-head.h
> +++ b/include/linux/journal-head.h
> @@ -31,14 +31,21 @@ struct journal_head {
>   /*
>* Journalling list for this buffer [jbd_lock_bh_state()]
>*/
> - unsigned b_jlist:4;
> + unsigned b_jlist;
>  
>   /*
>* This flag signals the buffer has been modified by
>* the currently running transaction
>* [jbd_lock_bh_state()]
>*/
> - unsigned b_modified:1;
> + unsigned b_modified;
> +
> + /*
> +  * This feild tracks the last transaction id in which this buffer
> +  * has been cowed
> +  * [jbd_lock_bh_state()]
> +  */
> + tid_t b_cow_tid;
>  
>   /*
>* Copy of the buffer data frozen for writing to the log.
> -- 
> 1.7.0.4
> 
> 
> Thanks. 
> N‹㎠咽r›y鉉šbXФ푤v^–)頻{.n+‰램Š{‘喩zX㎍›●}ž꼿z&j:+v‰㉭‘喩zZ+€+zf"hšˆ~††i鎬zwⅱ?™②&)刪f”^j푹ym
> …@Aa뛴0띠h’i
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Re: Re: EXT4 regression caused 4eec7

2013-05-12 Thread EUNBONG SONG


>> Since at this point it's safer to rollback the change and we can
>> investigate more deeply how to fix it correctly for the next
>> development cycle, this is the patch which I'm testing.

>> - Ted

> Hello, I've tested with your patch. But the same problem was reproduced.
> Currently, I'm trying to git bisect. If i done git bisect, i will let you 
> know.

Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is 
caused by 
ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size."
I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 
because
I don't know the root cause. 


Signed-off-by: Eunbong Song 
---
 include/linux/journal-head.h |   11 +--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h
index 13a3da2..c18b46f 100644
--- a/include/linux/journal-head.h
+++ b/include/linux/journal-head.h
@@ -31,14 +31,21 @@ struct journal_head {
/*
 * Journalling list for this buffer [jbd_lock_bh_state()]
 */
-   unsigned b_jlist:4;
+   unsigned b_jlist;
 
/*
 * This flag signals the buffer has been modified by
 * the currently running transaction
 * [jbd_lock_bh_state()]
 */
-   unsigned b_modified:1;
+   unsigned b_modified;
+
+   /*
+* This feild tracks the last transaction id in which this buffer
+* has been cowed
+* [jbd_lock_bh_state()]
+*/
+   tid_t b_cow_tid;
 
/*
 * Copy of the buffer data frozen for writing to the log.
-- 
1.7.0.4


Thanks. 
N떑꿩�r툤y鉉싕b쾊Ф푤v�^�)頻{.n�+돴쪐{콗喩zX㎍썳變}찠꼿쟺�&j:+v돣�쳭喩zZ+€�+zf"톒쉱�~넮녬i鎬z�췿ⅱ�?솳鈺�&�)刪f뷌^j푹y쬶끷@A첺뛴
0띠h��뭝

Re: Re: EXT4 regression caused 4eec7

2013-05-12 Thread EUNBONG SONG


> Since at this point it's safer to rollback the change and we can
> investigate more deeply how to fix it correctly for the next
> development cycle, this is the patch which I'm testing.

> - Ted

Hello, I've tested with your patch. But the same problem was reproduced.
Currently, I'm trying to git bisect. If i done git bisect, i will let you know.

Thanks.