Hello,

I want to bring to your attention a problem with the XFS module of grub2, since I was told that you might not always pay attention to the bug tracker. I've reported the bug at savannah.gnu.org as well as on the Debian BTS, so I'm pasting the relevant snippets from the latter for your convenience:
From: Alex Malinovich <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: Bug#436943: another confirmation
Date: Tue, 12 Feb 2008 05:06:15 -0800

On Tue, 2008-02-12 at 13:19 +0100, Robert Millan wrote:
--snip--
> On Tue, Feb 12, 2008 at 02:14:55AM -0800, Alex Malinovich wrote:
> >
> > Oddly, trying to do "ls (hd0,1)/" works just fine. Yet when running "ls
> > (hd0,1)/boot" gives that same "out of partition error", even
> > though /boot is NOT on a separate partition.
>
> Alex, what is your filesystem in (hd0,1)/ ?
>
> Can you identify the problem if you do:
>
>   set debug=all
>   ls (hd0,1)/boot

The fs on hd0,1 is xfs. I just did an fsck on it and the fs itself is
fine. Since it's hard to paste a good amount of code that happens at
boot, here's what I get when running grub-emu from the console with the
above commands. I'll do a reboot later and verify that I get the same
errors. If I see anything different on a regular boot I'll send in a
follow-up email.

One thing that might be potentially useful, when just doing the ls
without the debug=all, I actually get a little bit of output prior to
the out of partition error. In this particular case the output is:

grub> ls (hd0,1)/boot
?^^ ?^^ ^Q^_
error: out of partition
grub>

Not sure what those extra characters are about, but they are consistent
across multiple runs of grub-emu.

So, running ls after setting debug=all I get:

grub> ls (hd0,1)/boot
--snip--
/home/rmh/hacking/grub/svn/upload/grub2-1.96+20080210/kern/disk.c:364: Reading `hd0,1'...
/home/rmh/hacking/grub/svn/upload/grub2-1.96+20080210/kern/disk.c:371: Read out of range: sector 0xffffffffef400000 (out of partition).
/home/rmh/hacking/grub/svn/upload/grub2-1.96+20080210/kern/disk.c:364: of range: sector 0xffffffffef400000 (out of partition).
--snip--

repeating a few hundred times. There's some other scattered output but
it's very hard to make out.
I do see some lines about detecting and opening the xfs filesystem
early in the log.


From: Niels Boehm <[EMAIL PROTECTED]>
To: Debian Bug Tracking System <[EMAIL PROTECTED]>
Subject: grub-pc: xfs.mod reads some directories incorrectly
Date: Sat, 28 Jun 2008 08:15:33 +0200

Package: grub-pc
Version: 1.96+20080512-1
Followup-For: Bug #436943

Hi,

grub2 fails for me in the aforementioned manner. It is unable to read
anything from /boot or /boot/grub which are both on my root partition
with an xfs file system.

Trying to ls some directories, I find that some read without problem
(apparently ones containing only a few entries, like / /mnt /media
/lib64 for example) and others produce garbled output and the "out of
partition" error (ones with many entries, like /boot /boot/grub /etc
/bin for example).

I checked the root fs with /usr/sbin/xfs_check, but it looks alright.
And if I remember correctly, I created the root fs not long ago, so it
should have quite recent data structures. The log being version 2
confirms that:

# xfs_info /
meta-data=/dev/root              isize=256    agcount=4, agsize=64510 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=258040, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=1200, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0


From: Niels Boehm <[EMAIL PROTECTED]>
To: Debian Bug Tracking System <[EMAIL PROTECTED]>
Subject: grub-pc: missing mapping from fs-block-no. to disk-block-no. in xfs.c
Date: Sat, 28 Jun 2008 14:50:36 +0200

Package: grub-pc
Followup-For: Bug #436943

Okay, I hunted the problem down myself. It's a missing mapping from the
file system block numbering scheme ((agno << agbits) | block_in_ag) to
the on-partition block numbering (agno * agsize + block_in_ag) in the
grub_xfs_read_block() function.

It would affect all users who have a partition with more than one
allocation group with an agsize which is not a power of 2.
The problem arises when grub encounters files with blocks not on ag#0
and directories which are extent lists not stored on ag#0.

I changed the offending file like this:

---- CUT HERE ----
--- grub2-1.96+20080512/fs/xfs.c 2008-02-02 15:15:31.000000000 +0100
+++ xfs.c_Niels 2008-06-28 12:40:39.487565975 +0200
@@ -162,4 +162,8 @@
   (grub_be_to_cpu64 (ino) >> GRUB_XFS_INO_AGBITS (data))
 
+#define GRUB_XFS_FSB_TO_BLOCK(data, fsb) \
+  (((fsb) >> (data)->sblock.log2_agblk) * (data)->agsize \
+ + ((fsb) & ((1 << (data)->sblock.log2_agblk) - 1)))
+
 #define GRUB_XFS_EXTENT_OFFSET(exts,ex) \
   ((grub_be_to_cpu32 (exts[ex][0]) & ~(1 << 31)) << 23 \
@@ -309,5 +313,5 @@
     grub_free (leaf);
 
-  return ret;
+  return GRUB_XFS_FSB_TO_BLOCK(node->data, ret);
 }
---- CUT HERE ----

The patch works fine for me, but I can't tell if I missed any
intricacies, since I'm not into grub development.


From: [EMAIL PROTECTED]
To: Robert Millan <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: Bug#436943: grub-pc: xfs.mod reads some directories incorrectly
Date: Sun, 29 Jun 2008 18:36:08 +0200

On Sunday 29 June 2008, Robert Millan wrote:
> The version you're using (1.96+20080512-1) is a bit old. Could you check
> if this problem is reproducible with the sid one? Or, since I notice you
> sent this upstream (thanks!), with latest CVS.

It's the same with 1.96+20080626-1.

I also had a look at the source of the CVS version, and it looks like
the mapping is still missing there, but I'm not sure. (It's a bit
strange that they didn't notice it when they added uuid detection to
xfs.c; maybe they happen to have agsizes that are a power of 2.)

I'd prefer to stick to the normal packages, since I don't really feel
at home with package maintenance stuff.

Regards,
Niels Böhm

_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/grub-devel