On Mon, 22 Sep 2014, Anton Altaparmakov wrote: > Hi Hugh, > > On 22 Sep 2014, at 05:43, Hugh Dickins <hu...@google.com> wrote: > > On Mon, 22 Sep 2014, Anton Altaparmakov wrote: > >> Any code that uses __getblk() and thus bread(), breadahead(), sb_bread(), > >> sb_breadahead(), sb_getblk(), and calls it using a 64-bit block on a > >> 32-bit arch (where "long" is 32-bit) causes an inifinite loop in > >> __getblk_slow() with an infinite stream of errors logged to dmesg like > >> this: > >> > >> __find_get_block_slow() failed. block=6740375944, b_blocknr=2445408648 > >> b_state=0x00000020, b_size=512 > >> device sda1 blocksize: 512 > >> > >> Note how in hex block is 0x191C1F988 and b_blocknr is 0x91C1F988 i.e. the > >> top 32-bits are missing (in this case the 0x1 at the top). > >> > >> This is because grow_dev_page() was broken in commit 676ce6d5ca30: "block: > >> replace __getblk_slow misfix by grow_dev_page fix" by Hugh Dickins so that > >> it now has a 32-bit overflow due to shifting the block value to the right > >> so it fits in 32-bits and storing the result in pgoff_t variable which is > >> 32-bit and then passing the pgoff_t variable left-shifted as the block > >> number which causes the top bits to get lost as the pgoff_t is not type > >> cast to sector_t / 64-bit before the shift. > >> > >> This patch fixes this issue by type casting "index" to sector_t before > >> doing the left shift. > >> > >> Note this is not a theoretical bug but has been seen in the field on a > >> 4TiB hard drive with logical sector size 512 bytes. > >> > >> This patch has been verified to fix the infinite loop problem on 3.17-rc5 > >> kernel using a 4TB disk image mounted using "-o loop". Without this patch > >> doing a "find /nt" where /nt is an NTFS volume causes the inifinite loop > >> 100% reproducibly whilst with the patch it works fine as expected. > >> > >> Signed-off-by: Anton Altaparmakov <ai...@cantab.net> > >> Cc: sta...@vger.kernel.org # 3.6 3.8 3.10 3.12 3.14 3.16 > > > > Acked-by: Hugh Dickins <hu...@google.com> > > > > Yes indeed, that's bad, and entirely my fault (though it took me a while > > to see how the "block = index << sizebits" which was there before was okay, > > Ouch. I failed to see that (I admit I did not pay too much attention to the > original code - I just used "git blame" to find out which commit put that > code in place - my bad). It was never ok! I went back to 2.6.12-rc2 > (original commit to git Linus' kernel repository) and that is also affected. > That implies this is the first time anyone has used a 4TiB disk with 512 byte > sectors with NTFS where Windows had previously created a file/directory with > an attribute list attribute in it. - That is the only metadata type we use > sb_bread() for and the disk image from the customer does contain it and I > verified that is where the inifinite loop comes from.
Ow, that line needs some truncating itself :) You generously interpret my words as seeing that for myself. No, thank you for following up: I had persuaded myself when writing before, that index would be promoted from pgoff_t to sector_t before shifting in the original assignment, but not when I passed it as arg. I've resorted to writing a proggy to check, and it looks like I was earlier confused, and you are right, and that code was "always" wrong. Not much use of 4TiB disks on 32-bit kernels I suppose. > > So it appears sb_bread() and friends have always been broken on 32-bit > architectures when accessing blocks outside the range 2^32 - 1 and it appears > google finds lots of occurrences of such infinite loops being reported but > the fixes have always been to not make the calls in the first place. No-one > seems to have recognized that there is a genuine problem here before. > > Still my patch stands correct and should be applied to all kernel versions > that have your patch and older kernels should have an equivalent patch > applied to fix them, too for people who run very old kernels. Yes, though I'm now uncertain whether your patch is a bugfix or an enhancement. Definitely a bugfix, given the infinite loops. But the oldest code ("index = block >> sizebits; block = index << sizebits;") is so clearly trying to truncate block, that I wonder what departing from that might be letting us in for. I see Andrew did 2.6.19's intervening e5657933863f ("grow_buffers() infinite loop fix"): let's hope that he has a clearer head in the morning than I have now - there's a chance that he might think it safer to extend that check to exclude your case, than take your patch. I hope not, that would not please you; but right now I'm ruling myself out of grasping the issues here! Hugh > > > but my passing "index << sizebits" as arg to function very much not okay). > > Thank you, Anton. > > You are welcome. > > > But I see my commit was marked for stable 3.0 3.2 3.4 3.5: so your fix > > is needed in 3.2 and 3.4 longterm also (the others now beyond life). > > You are quite right. It needs to go back to all those kernels, too. Thank > you! > > Best regards, > > Anton > > > Hugh > > > >> --- > >> > >> Linus, can you please apply this? Alternatively, Andrew, can you please > >> pick this up and send it to Linus? > >> > >> It would be good it it can be applied for 3.17 as well as to all stable > >> kernels >= 3.6 as they are all broken! > >> > >> Thanks a lot in advance! > >> > >> Best regards, > >> > >> Anton > >> -- > >> Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) > >> Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK > >> Linux NTFS maintainer, http://www.linux-ntfs.org/ > >> > >> diff --git a/fs/buffer.c b/fs/buffer.c > >> index 8f05111..3588a80 100644 > >> --- a/fs/buffer.c > >> +++ b/fs/buffer.c > >> @@ -1022,7 +1022,8 @@ grow_dev_page(struct block_device *bdev, sector_t > >> block, > >> bh = page_buffers(page); > >> if (bh->b_size == size) { > >> end_block = init_page_buffers(page, bdev, > >> - index << sizebits, size); > >> + (sector_t)index << sizebits, > >> + size); > >> goto done; > >> } > >> if (!try_to_free_buffers(page)) > >> @@ -1043,7 +1044,8 @@ grow_dev_page(struct block_device *bdev, sector_t > >> block, > >> */ > >> spin_lock(&inode->i_mapping->private_lock); > >> link_dev_buffers(page, bh); > >> - end_block = init_page_buffers(page, bdev, index << sizebits, size); > >> + end_block = init_page_buffers(page, bdev, (sector_t)index << sizebits, > >> + size); > >> spin_unlock(&inode->i_mapping->private_lock); > >> done: > >> ret = (block < end_block) ? 1 : -ENXIO; > > -- > Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @) > University of Cambridge Information Services, Roger Needham Building > 7 JJ Thomson Avenue, Cambridge, CB3 0RB, UK > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/