Re: Re: Re: EXT4 regression caused 4eec7
On Mon, May 13, 2013 at 05:17:27PM +0200, Jan Kara wrote: > On Mon 13-05-13 21:56:43, Zheng Liu wrote: > > On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote: > > > On Sun 12-05-13 13:04:59, EUNBONG SONG wrote: > > > > > > > > > > > > >> Since at this point it's safer to rollback the change and we can > > > > >> investigate more deeply how to fix it correctly for the next > > > > >> development cycle, this is the patch which I'm testing. > > > > > > > > >> - Ted > > > > > > > > > Hello, I've tested with your patch. But the same problem was > > > > > reproduced. > > > > > Currently, I'm trying to git bisect. If i done git bisect, i will let > > > > > you know. > > > > > > > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() > > > > is caused by > > > > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head > > > > size." > > > > I write just code patch which revert > > > > ae4647fb7654676fc44a97e86eb35f9f06b99f66 because > > > > I don't know the root cause. > > > This is really strange. I've verified the code and all the places > > > modifying b_jlist or b_modified are holding bh_state lock so we should be > > > safe... > > > > Hi Jan, > > > > Could you please take a look at this mail [1]. I don't think we hold > > bh_state lock there. > > > > 1. http://www.spinics.net/lists/linux-ext4/msg38205.html > I'll also reply to that thread but: Yes, we don't hold bh_state lock when > reading b_jlist in that one case (that's why I wrote 'modify' and not just > 'access' in my previous email). But that's really harmless since we don't > do any complex operations with b_jlist (only get & set) so we either see an > old value or a new one. And that particular use is going away anyway later > in my series. I see. Thanks for your explanation. - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: Re: EXT4 regression caused 4eec7
On Mon 13-05-13 21:56:43, Zheng Liu wrote: > On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote: > > On Sun 12-05-13 13:04:59, EUNBONG SONG wrote: > > > > > > > > > >> Since at this point it's safer to rollback the change and we can > > > >> investigate more deeply how to fix it correctly for the next > > > >> development cycle, this is the patch which I'm testing. > > > > > > >> - Ted > > > > > > > Hello, I've tested with your patch. But the same problem was reproduced. > > > > Currently, I'm trying to git bisect. If i done git bisect, i will let > > > > you know. > > > > > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is > > > caused by > > > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head > > > size." > > > I write just code patch which revert > > > ae4647fb7654676fc44a97e86eb35f9f06b99f66 because > > > I don't know the root cause. > > This is really strange. I've verified the code and all the places > > modifying b_jlist or b_modified are holding bh_state lock so we should be > > safe... > > Hi Jan, > > Could you please take a look at this mail [1]. I don't think we hold > bh_state lock there. > > 1. http://www.spinics.net/lists/linux-ext4/msg38205.html I'll also reply to that thread but: Yes, we don't hold bh_state lock when reading b_jlist in that one case (that's why I wrote 'modify' and not just 'access' in my previous email). But that's really harmless since we don't do any complex operations with b_jlist (only get & set) so we either see an old value or a new one. And that particular use is going away anyway later in my series. Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: Re: EXT4 regression caused 4eec7
On Mon, May 13, 2013 at 03:18:09PM +0200, Jan Kara wrote: > On Sun 12-05-13 13:04:59, EUNBONG SONG wrote: > > > > > > >> Since at this point it's safer to rollback the change and we can > > >> investigate more deeply how to fix it correctly for the next > > >> development cycle, this is the patch which I'm testing. > > > > >> - Ted > > > > > Hello, I've tested with your patch. But the same problem was reproduced. > > > Currently, I'm trying to git bisect. If i done git bisect, i will let you > > > know. > > > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is > > caused by > > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size." > > I write just code patch which revert > > ae4647fb7654676fc44a97e86eb35f9f06b99f66 because > > I don't know the root cause. > This is really strange. I've verified the code and all the places > modifying b_jlist or b_modified are holding bh_state lock so we should be > safe... Hi Jan, Could you please take a look at this mail [1]. I don't think we hold bh_state lock there. 1. http://www.spinics.net/lists/linux-ext4/msg38205.html Thanks, - Zheng -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: Re: EXT4 regression caused 4eec7
On Sun 12-05-13 13:04:59, EUNBONG SONG wrote: > > > >> Since at this point it's safer to rollback the change and we can > >> investigate more deeply how to fix it correctly for the next > >> development cycle, this is the patch which I'm testing. > > >> - Ted > > > Hello, I've tested with your patch. But the same problem was reproduced. > > Currently, I'm trying to git bisect. If i done git bisect, i will let you > > know. > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is > caused by > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size." > I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 > because > I don't know the root cause. This is really strange. I've verified the code and all the places modifying b_jlist or b_modified are holding bh_state lock so we should be safe... Hum, but I remember I was debugging similar problems with bit fields in btrfs on ia64 as well (see http://lwn.net/Articles/478657/). So I think what has happened is that your compiler compiled bitfield access as 64-bit and updates to b_jcount and b_jlist / b_modified crashed into one another. I didn't hit it because my compiler was not so "clever". Grumble. In this case I think bitfields are not worth the trouble with gcc. It's a pitty we have to spend additional 8 bytes for every journal_head but we'll survive... I'll send Ted a partial revert and add a comment so that we won't repeat this mistake in future. Honza > Signed-off-by: Eunbong Song > --- > include/linux/journal-head.h | 11 +-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h > index 13a3da2..c18b46f 100644 > --- a/include/linux/journal-head.h > +++ b/include/linux/journal-head.h > @@ -31,14 +31,21 @@ struct journal_head { > /* >* Journalling list for this buffer [jbd_lock_bh_state()] >*/ > - unsigned b_jlist:4; > + unsigned b_jlist; > > /* >* This flag signals the buffer has been modified by >* the currently running transaction >* [jbd_lock_bh_state()] >*/ > - unsigned b_modified:1; > + unsigned b_modified; > + > + /* > + * This feild tracks the last transaction id in which this buffer > + * has been cowed > + * [jbd_lock_bh_state()] > + */ > + tid_t b_cow_tid; > > /* >* Copy of the buffer data frozen for writing to the log. > -- > 1.7.0.4 > > > Thanks. -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: Re: EXT4 regression caused 4eec7
On Sun, 12 May 2013 13:05:00 + (GMT), EUNBONG SONG wrote: > > > >> Since at this point it's safer to rollback the change and we can > >> investigate more deeply how to fix it correctly for the next > >> development cycle, this is the patch which I'm testing. > > >> - Ted > > > Hello, I've tested with your patch. But the same problem was reproduced. > > Currently, I'm trying to git bisect. If i done git bisect, i will let you > > know. > > Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is > caused by > ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size." OK that explains both regressions ext4/jbd2 and ext3/jbd(https://lkml.org/lkml/2013/5/10/434) Probably some race because b_jlist accessed w/o lock some places. BUT i still can not reproduce original issue :( I've tried all types of disks near me (ssd,hdd,usb-dongle), What disk type you use? Is it reproducible on x86? Please post following info: #fdisk -l $HDD #hdparm -t $HDD #df -h #cat /proc/mounts > I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 > because > I don't know the root cause. > > > Signed-off-by: Eunbong Song > --- > include/linux/journal-head.h | 11 +-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h > index 13a3da2..c18b46f 100644 > --- a/include/linux/journal-head.h > +++ b/include/linux/journal-head.h > @@ -31,14 +31,21 @@ struct journal_head { > /* >* Journalling list for this buffer [jbd_lock_bh_state()] >*/ > - unsigned b_jlist:4; > + unsigned b_jlist; > > /* >* This flag signals the buffer has been modified by >* the currently running transaction >* [jbd_lock_bh_state()] >*/ > - unsigned b_modified:1; > + unsigned b_modified; > + > + /* > + * This feild tracks the last transaction id in which this buffer > + * has been cowed > + * [jbd_lock_bh_state()] > + */ > + tid_t b_cow_tid; > > /* >* Copy of the buffer data frozen for writing to the log. > -- > 1.7.0.4 > > > Thanks. > N㎠咽ry鉉bXФ푤v^)頻{.n+램{喩zX㎍●}꼿z&j:+v㉭喩zZ++zf"h~i鎬zwⅱ?②&)刪f^j푹ym > @Aa뛴0띠hi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: Re: EXT4 regression caused 4eec7
>> Since at this point it's safer to rollback the change and we can >> investigate more deeply how to fix it correctly for the next >> development cycle, this is the patch which I'm testing. >> - Ted > Hello, I've tested with your patch. But the same problem was reproduced. > Currently, I'm trying to git bisect. If i done git bisect, i will let you > know. Hi, I've done git bisect. and panic at jbd2_journal_put_journal_head() is caused by ae4647fb7654676fc44a97e86eb35f9f06b99f66: "jbd2: reduce journal_head size." I write just code patch which revert ae4647fb7654676fc44a97e86eb35f9f06b99f66 because I don't know the root cause. Signed-off-by: Eunbong Song --- include/linux/journal-head.h | 11 +-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/linux/journal-head.h b/include/linux/journal-head.h index 13a3da2..c18b46f 100644 --- a/include/linux/journal-head.h +++ b/include/linux/journal-head.h @@ -31,14 +31,21 @@ struct journal_head { /* * Journalling list for this buffer [jbd_lock_bh_state()] */ - unsigned b_jlist:4; + unsigned b_jlist; /* * This flag signals the buffer has been modified by * the currently running transaction * [jbd_lock_bh_state()] */ - unsigned b_modified:1; + unsigned b_modified; + + /* +* This feild tracks the last transaction id in which this buffer +* has been cowed +* [jbd_lock_bh_state()] +*/ + tid_t b_cow_tid; /* * Copy of the buffer data frozen for writing to the log. -- 1.7.0.4 Thanks. N떑꿩�r툤y鉉싕b쾊Ф푤v�^�)頻{.n�+돴쪐{콗喩zX㎍썳變}찠꼿쟺�&j:+v돣�쳭喩zZ+�+zf"톒쉱�~넮녬i鎬z�췿ⅱ�?솳鈺�&�)刪f뷌^j푹y쬶끷@A첺뛴 0띠h��뭝
Re: Re: EXT4 regression caused 4eec7
> Since at this point it's safer to rollback the change and we can > investigate more deeply how to fix it correctly for the next > development cycle, this is the patch which I'm testing. > - Ted Hello, I've tested with your patch. But the same problem was reproduced. Currently, I'm trying to git bisect. If i done git bisect, i will let you know. Thanks.