Hello Dell folks, I'm Guilherme Piccoli from Canonical. First of all, apologies for the out-of-nowhere communication. We've been investigating an issue that seems to date back a long time, and we eventually narrowed it down to what appears to be a Dell BIOS bug. Note that I'm also looping in the x86 kernel mailing list and grub-devel, just to archive this discussion in public lists and help others who may run into the same issue in the future.
Since I don't have contacts among Dell representatives, I've gathered a list of people from Dell who contributed to the kernel in the last 2 years. Maybe one of you could point me to a proper contact/channel to discuss this issue; if not, I'm sorry for the noise.

Let me detail the problem we're observing. Note that all of this is about legacy BIOS mode, not UEFI.

After creating a HW RAID on a Dell PowerEdge R730 (RAID5, 8T total), GRUB fails to load its modules and drops to "rescue mode". After a lot of investigation, we narrowed the issue down to a bad return value from the BIOS for int 13h, service 48h [0], which is how GRUB collects disk size information. To double-check that, I booted Linux in 16-bit real mode and observed that the EDD module [1] gets the same wrong value for total sectors: both GRUB and kernel EDD return 0xFFFFFFFF. The correct value would be 0x3A3600000, according to the SCSI READ CAPACITY(16) command (tested with the sg_readcap tool). The P.S. section below has the outputs collected by GRUB instrumentation, kernel EDD and sg_readcap.

There are some workarounds, like placing a smaller partition _before_ the rootfs in the disk layout to hold the GRUB modules and the linux/initrd images; in that case the BIOS seems to respond to the int 13h/48h service with proper values. But this issue dates back a while [2][3], so I'm seeking a proper discussion with Dell firmware engineers to understand whether it could be fixed, or at least to understand the root cause of this limitation.

Thanks in advance,
Guilherme

[0] https://en.wikipedia.org/wiki/INT_13H#INT_13h_AH=48h:_Extended_Read_Drive_Parameters
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/edd.c
[2] https://askubuntu.com/q/867047
[3] https://askubuntu.com/q/416418

P.S.
GRUB debug output [dump of struct grub_biosdisk_drp in grub_biosdisk_get_diskinfo_real() function]:
size=1e, flags=9
cyl=0, heads=0, sec=0
bytesp_s=200, total=ffffffff,

kernel EDD output:
[ 0.741378] edd[0]->total_secs=ffffffff

sg_readcap output:
$ sg_readcap /dev/sdb
READ CAPACITY (10) indicates device capacity too large
now trying 16 byte cdb variant
Read Capacity results:
Protection: prot_en=0, p_type=0, p_i_exponent=0
Logical block provisioning: lbpme=0, lbprz=0
Last logical block address=15625879551 (0x3a35fffff), Number of logical blocks=15625879552
[...]

_______________________________________________
Grub-devel mailing list
Grub-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/grub-devel
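For anyone decoding the dumps quoted in this thread, here is a small sketch that uses only Python's standard `struct` module. It unpacks the fixed 30-byte (size=0x1e) part of the int 13h/48h result buffer and the READ CAPACITY(16) parameter data, then compares the two sector counts. It assumes the EDD buffer layout documented for int 13h AH=48h, where total sectors is a 64-bit little-endian field at offset 0x10. The field names, the helper functions `parse_edd`, `parse_read_capacity16` and `looks_clamped`, the placeholder `dpte_ptr` value, and the 512-byte block length fed into the READ CAPACITY(16) buffer are all illustrative assumptions. The buffers are rebuilt from the values quoted above, not captured from the machine.

```python
import struct

# First 30 bytes (size=0x1e) of the INT 13h AH=48h result buffer, packed and
# little-endian. Field names are illustrative labels, not GRUB/kernel identifiers.
EDD_FMT = "<HHIIIQHI"
EDD_FIELDS = ("size", "flags", "cylinders", "heads", "sectors_per_track",
              "total_sectors",   # offset 0x10, 64-bit sector count
              "bytes_per_sector", "dpte_ptr")


def parse_edd(buf: bytes) -> dict:
    """Decode the fixed part of an EDD drive-parameter buffer."""
    values = struct.unpack_from(EDD_FMT, buf)
    return dict(zip(EDD_FIELDS, values))


def parse_read_capacity16(buf: bytes) -> tuple[int, int]:
    """Return (number_of_blocks, block_length) from READ CAPACITY(16) data.

    Bytes 0-7 hold the last LBA and bytes 8-11 the block length, big-endian.
    """
    last_lba, block_len = struct.unpack_from(">QI", buf)
    return last_lba + 1, block_len


def looks_clamped(edd_sectors: int, scsi_blocks: int) -> bool:
    """Heuristic: EDD reports the 32-bit maximum for a disk SCSI says is larger."""
    return edd_sectors == 0xFFFFFFFF and scsi_blocks > 0xFFFFFFFF


# Rebuilt from the GRUB dump (size=1e, flags=9, bytesp_s=200, total=ffffffff);
# dpte_ptr is not in the dump, so 0 is a placeholder.
edd_raw = struct.pack(EDD_FMT, 0x1E, 0x9, 0, 0, 0, 0xFFFFFFFF, 0x200, 0)
# Rebuilt from sg_readcap's last LBA; the 512-byte block length is an assumption.
rc16_raw = struct.pack(">QI", 0x3A35FFFFF, 512) + bytes(20)  # 32-byte response

edd = parse_edd(edd_raw)
blocks, block_len = parse_read_capacity16(rc16_raw)

print(f"EDD total sectors: {edd['total_sectors']:#x} "
      f"({edd['total_sectors'] * edd['bytes_per_sector']} bytes)")
print(f"READ CAPACITY(16): {blocks:#x} blocks ({blocks * block_len} bytes)")
print("EDD value looks clamped to 32 bits:",
      looks_clamped(edd["total_sectors"], blocks))
```

With these inputs, EDD describes a disk of 0xffffffff sectors (2199023255040 bytes, just under 2 TiB), while READ CAPACITY(16) gives 0x3a3600000 blocks (8000450330624 bytes, about 8.0 TB). The `looks_clamped` check only flags the pattern seen here. It says nothing about the root cause inside the firmware.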