Re: PCI arbiter crash on last qemu image
On 13/9/20 at 15:54, Samuel Thibault wrote:
> Hello,
>
> Joan Lledó, on Sun 13 Sep 2020 08:38:48 +0200, wrote:
>> On 10/9/20 at 0:29, Samuel Thibault wrote:
>>> Now fixed in libpciaccess 0.16-1+hurd.6 and upstream.
>>
>> Then should I merge jlledom-pciaccess-map into master?
>>
>> http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=jlledom-pciaccess-map
>
> This looks so indeed.

Done, I've also removed my branch.
Re: PCI arbiter crash on last qemu image
Hello,

Joan Lledó, on Sun 13 Sep 2020 08:38:48 +0200, wrote:
> On 10/9/20 at 0:29, Samuel Thibault wrote:
> > Now fixed in libpciaccess 0.16-1+hurd.6 and upstream.
>
> Then should I merge jlledom-pciaccess-map into master?
>
> http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=jlledom-pciaccess-map

This looks so indeed.

Thanks,
Samuel
Re: PCI arbiter crash on last qemu image
On 10/9/20 at 0:29, Samuel Thibault wrote:
> Now fixed in libpciaccess 0.16-1+hurd.6 and upstream.

Then should I merge jlledom-pciaccess-map into master?

http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=jlledom-pciaccess-map
Re: PCI arbiter crash on last qemu image
Samuel Thibault, on Wed 09 Sep 2020 23:45:48 +0200, wrote:
> Damien Zammit, on Mon 07 Sep 2020 11:16:13 +1000, wrote:
> > On 6/9/20 11:17 pm, Samuel Thibault wrote:
> > > One issue remains, however: Xorg's vesa driver produces
> > >
> > > [1669282.478] (II) VESA(0): initializing int10
> > > [1669282.478] (EE) VESA(0): Cannot read int vect
> > >
> > > which comes from hw/xfree86/int10/generic.c xf86ExtendedInitInt10:
> > >
> > >     if (!sysMem)
> > >         pci_device_map_legacy(pInt->dev, V_BIOS, BIOS_SIZE + SYS_BIOS - V_BIOS,
> > >                               PCI_DEV_MAP_FLAG_WRITABLE, &sysMem);
> >
> > It appears that it's trying to map more than BIOS_SIZE, which probably
> > fails under the current scheme.
> > Perhaps we need to make overreads return zeroes instead of failure?
>
> No, that part of the code works fine.
>
> Looking closer, I noticed that in
>
>     if (!readIntVec(pInt->dev, base, LOW_PAGE_SIZE)) {
>
> LOW_PAGE_SIZE is 0x600, that's what is passed as len in
>
>     if (pci_device_map_legacy(dev, 0, len, 0, &map))
>
> So I guess that the problem is that gnu's version of map_dev_mem is
> missing rounding up the len like mmap does.

That was not the only error: there was also a bogus anywhere parameter in
the vm_map call, and a bogus size parameter in the device_map call. Now
fixed in libpciaccess 0.16-1+hurd.6 and upstream.

Samuel
Re: PCI arbiter crash on last qemu image
Damien Zammit, on Mon 07 Sep 2020 11:16:13 +1000, wrote:
> On 6/9/20 11:17 pm, Samuel Thibault wrote:
> > One issue remains, however: Xorg's vesa driver produces
> >
> > [1669282.478] (II) VESA(0): initializing int10
> > [1669282.478] (EE) VESA(0): Cannot read int vect
> >
> > which comes from hw/xfree86/int10/generic.c xf86ExtendedInitInt10:
> >
> >     if (!sysMem)
> >         pci_device_map_legacy(pInt->dev, V_BIOS, BIOS_SIZE + SYS_BIOS - V_BIOS,
> >                               PCI_DEV_MAP_FLAG_WRITABLE, &sysMem);
>
> It appears that it's trying to map more than BIOS_SIZE, which probably fails
> under the current scheme.
> Perhaps we need to make overreads return zeroes instead of failure?

No, that part of the code works fine.

Looking closer, I noticed that in

    if (!readIntVec(pInt->dev, base, LOW_PAGE_SIZE)) {

LOW_PAGE_SIZE is 0x600, that's what is passed as len in

    if (pci_device_map_legacy(dev, 0, len, 0, &map))

So I guess that the problem is that gnu's version of map_dev_mem is
missing rounding up the len like mmap does.

Samuel
Re: PCI arbiter crash on last qemu image
On 6/9/20 11:17 pm, Samuel Thibault wrote:
>> I have uploaded libpciaccess_0.16-1+hurd.5 with the latest upstream
>> version.

Thanks!

> One issue remains, however: Xorg's vesa driver produces
>
> [1669282.478] (II) VESA(0): initializing int10
> [1669282.478] (EE) VESA(0): Cannot read int vect
>
> which comes from hw/xfree86/int10/generic.c xf86ExtendedInitInt10:
>
>     if (!sysMem)
>         pci_device_map_legacy(pInt->dev, V_BIOS, BIOS_SIZE + SYS_BIOS - V_BIOS,
>                               PCI_DEV_MAP_FLAG_WRITABLE, &sysMem);

It appears that it's trying to map more than BIOS_SIZE, which probably fails
under the current scheme.
Perhaps we need to make overreads return zeroes instead of failure?
Purely guessing here...

Damien
Re: PCI arbiter crash on last qemu image
Samuel Thibault, on Sun 06 Sep 2020 15:17:51 +0200, wrote:
> Samuel Thibault, on Sun 06 Sep 2020 15:14:27 +0200, wrote:
> > Thanks for working on this!
> >
> > I have uploaded libpciaccess_0.16-1+hurd.5 with the latest upstream
> > version.
>
> One issue remains, however: Xorg's vesa driver produces
>
> [1669282.478] (II) VESA(0): initializing int10
> [1669282.478] (EE) VESA(0): Cannot read int vect
>
> which comes from hw/xfree86/int10/generic.c xf86ExtendedInitInt10:
>
>     if (!sysMem)
>         pci_device_map_legacy(pInt->dev, V_BIOS, BIOS_SIZE + SYS_BIOS - V_BIOS,
>                               PCI_DEV_MAP_FLAG_WRITABLE, &sysMem);
>     INTPriv(pInt)->sysMem = sysMem;
>
>     if (!readIntVec(pInt->dev, base, LOW_PAGE_SIZE)) {
>         xf86DrvMsg(pScrn->scrnIndex, X_ERROR, "Cannot read int vect\n");
>         goto error1;
>     }
>
> where
>
>     #define V_BIOS    0xC0000
>     #define BIOS_SIZE 0x10000
>     #define SYS_BIOS  0xF0000
>
>     static Bool
>     readIntVec(struct pci_device *dev, unsigned char *buf, int len)
>     {
>         void *map;
>
>         if (pci_device_map_legacy(dev, 0, len, 0, &map))
>             return FALSE;
>
>         memcpy(buf, map, len);
>         pci_device_unmap_legacy(dev, map, len);
>
>         return TRUE;
>     }
>
> Linux uses its own implementation, so it could very well be a bug in the
> code above, please check according to what you have understood of the
> libpciaccess mapping practices.

(or possibly it's simply a pci-arbiter bug?)

Samuel
Re: PCI arbiter crash on last qemu image
Samuel Thibault, on Sun 06 Sep 2020 15:14:27 +0200, wrote:
> Thanks for working on this!
>
> I have uploaded libpciaccess_0.16-1+hurd.5 with the latest upstream
> version.

One issue remains, however: Xorg's vesa driver produces

[1669282.478] (II) VESA(0): initializing int10
[1669282.478] (EE) VESA(0): Cannot read int vect

which comes from hw/xfree86/int10/generic.c xf86ExtendedInitInt10:

    if (!sysMem)
        pci_device_map_legacy(pInt->dev, V_BIOS, BIOS_SIZE + SYS_BIOS - V_BIOS,
                              PCI_DEV_MAP_FLAG_WRITABLE, &sysMem);
    INTPriv(pInt)->sysMem = sysMem;

    if (!readIntVec(pInt->dev, base, LOW_PAGE_SIZE)) {
        xf86DrvMsg(pScrn->scrnIndex, X_ERROR, "Cannot read int vect\n");
        goto error1;
    }

where

    #define V_BIOS    0xC0000
    #define BIOS_SIZE 0x10000
    #define SYS_BIOS  0xF0000

    static Bool
    readIntVec(struct pci_device *dev, unsigned char *buf, int len)
    {
        void *map;

        if (pci_device_map_legacy(dev, 0, len, 0, &map))
            return FALSE;

        memcpy(buf, map, len);
        pci_device_unmap_legacy(dev, map, len);

        return TRUE;
    }

Linux uses its own implementation, so it could very well be a bug in the
code above, please check according to what you have understood of the
libpciaccess mapping practices.

Samuel
Re: PCI arbiter crash on last qemu image
Hello,

Thanks for working on this!

I have uploaded libpciaccess_0.16-1+hurd.5 with the latest upstream
version.

Samuel
Re: PCI arbiter crash on last qemu image
Hi,

On 26/8/20 at 11:13, Damien Zammit wrote:
> If you think everything is okay with this, I will squash the last patch and
> submit patches upstream.

Yes, it's OK for me.
Re: PCI arbiter crash on last qemu image
Hi,

On 23/8/20 8:47 pm, Joan Lledó wrote:
> http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=jlledom-pciaccess-map

Thanks for doing this, I tried it locally and fixed two bugs in my
libpciaccess patches:

diff --git a/src/x86_pci.c b/src/x86_pci.c
index 1614729..1e70f35 100644
--- a/src/x86_pci.c
+++ b/src/x86_pci.c
@@ -275,6 +275,7 @@ map_dev_mem(void **dest, size_t mem_offset, size_t mem_size, int write)
         return errno;
     }
 
+    close(memfd);
     return 0;
 #endif
 }
@@ -505,7 +506,7 @@ pci_nfuncs(struct pci_device *dev, uint8_t *nfuncs)
 static error_t
 pci_device_x86_read_rom(struct pci_device *dev, void *buffer)
 {
-    void *bios;
+    void *bios = NULL;
     struct pci_device_private *d = (struct pci_device_private *)dev;
     int err;

> Also in map_dev_mem(), it seems to be some problem when mapping the rom.
> I tried to read the rom with hexdump:
>
> hexdump -Cn 16 /servers/bus/pci//00/03/0/rom

This command now works (every time).

> In pci_device_x86_read_rom() the memory is mapped and unmapped for each
> read. I wonder if it's correct to unmap with munmap() something mapped
> with vm_map()

I think it's okay to munmap memory mapped with vm_map, but I forgot to
initialise the rom memory pointer to NULL.

See http://git.zammit.org/libpciaccess.git/log/?h=rumpdisk-upstream again
for the updated branch.

If you think everything is okay with this, I will squash the last patch and
submit patches upstream.

Thanks,
Damien
Re: PCI arbiter crash on last qemu image
Hi,

I made my changes on the arbiter and it works fine; you can check my code at

http://git.savannah.gnu.org/cgit/hurd/hurd.git/log/?h=jlledom-pciaccess-map

On the other hand, I found a couple of issues in your patch.

In map_dev_mem():

+    memfd = open("/dev/mem", flags | O_CLOEXEC);
+    if (memfd == -1)
+        return errno;
+
+    *dest = mmap(NULL, mem_size, prot, MAP_SHARED, memfd, mem_offset);
+    if (*dest == MAP_FAILED) {
+        close(memfd);
+        *dest = NULL;
+        return errno;
+    }
+
+    return 0;

here close() is only called when the map fails; it should also be called
before returning 0, when the map succeeds.

Also in map_dev_mem(), there seems to be some problem when mapping the rom.
I tried to read the rom with hexdump:

hexdump -Cn 16 /servers/bus/pci//00/03/0/rom

When running this command, it sometimes returns all zeroes and other times
returns the correct values. I checked it with the debugger and found that
it is the call to vm_map that does not always set *dest correctly. You can
check out my branch and try yourself.

In pci_device_x86_read_rom() the memory is mapped and unmapped for each
read. I wonder if it's correct to unmap with munmap() something mapped
with vm_map().

On 22/8/20 at 15:10, Damien Zammit wrote:
> Hi Joan,
>
> I found another probe() call in hurd_pci.c that should not be there.
> (So I dropped a second incorrect patch.)
> Can you please confirm this final branch looks correct?
>
> http://git.zammit.org/libpciaccess.git/log/?h=rumpdisk-upstream
>
> Thanks,
> Damien
Re: PCI arbiter crash on last qemu image
Hi,

On 22/8/20 at 15:10, Damien Zammit wrote:
> Hi Joan,
>
> I found another probe() call in hurd_pci.c that should not be there.
> (So I dropped a second incorrect patch.)
> Can you please confirm this final branch looks correct?
>
> http://git.zammit.org/libpciaccess.git/log/?h=rumpdisk-upstream

Yes, it looks correct to me. Since neither the hurd nor the x86 backend
probes on create() or maps memory on probe(), I'd say that's OK.

Please give me some days to write the arbiter part and test it over your
version of libpciaccess to ensure it works before sending patches upstream.
Re: PCI arbiter crash on last qemu image
Hi Joan,

I found another probe() call in hurd_pci.c that should not be there.
(So I dropped a second incorrect patch.)
Can you please confirm this final branch looks correct?

http://git.zammit.org/libpciaccess.git/log/?h=rumpdisk-upstream

Thanks,
Damien
Re: PCI arbiter crash on last qemu image
On 22/8/20 8:38 pm, Joan Lledó wrote:
> However, I think the problem here is the x86 backend, not the common
> interface. If we take a look at all other backends we'll see that:
>
> 1.- Neither of them calls its probe() from its create(). So it's the
> client who must call pci_device_probe(); it's our arbiter who should do
> it and it doesn't.

OK, so we need to add this patch upstream:

---
 src/x86_pci.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/x86_pci.c b/src/x86_pci.c
index ac598b1..1614729 100644
--- a/src/x86_pci.c
+++ b/src/x86_pci.c
@@ -814,10 +814,6 @@ pci_system_x86_scan_bus (uint8_t bus)
         d->base.device_class = reg >> 8;
 
-        err = pci_device_x86_probe (&d->base);
-        if (err)
-            return err;
-
         pci_sys->devices = devices;
         pci_sys->num_devices++;
-- 
2.25.1

> 2.- Neither of them maps memory from its probe(). So again it's the
> client who must call either pci_device_map_range() or
> pci_device_map_region(), and again it's our arbiter who is not doing it.

And so my original patch for this part was correct:

---
 src/x86_pci.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/src/x86_pci.c b/src/x86_pci.c
index 0d9f2e7..ac598b1 100644
--- a/src/x86_pci.c
+++ b/src/x86_pci.c
@@ -630,9 +630,6 @@ pci_device_x86_region_probe (struct pci_device *dev, int reg_num)
             if (err)
                 return err;
         }
-
-        /* Clear the map pointer */
-        dev->regions[reg_num].memory = 0;
     }
     else if (dev->regions[reg_num].size > 0)
     {
@@ -649,15 +646,11 @@ pci_device_x86_region_probe (struct pci_device *dev, int reg_num)
             if (err)
                 return err;
         }
-
-        /* Map the region in our space */
-        if ( (err = map_dev_mem(&dev->regions[reg_num].memory,
-                                dev->regions[reg_num].base_addr,
-                                dev->regions[reg_num].size,
-                                1)) )
-            return err;
     }
 
+    /* Clear the map pointer */
+    dev->regions[reg_num].memory = 0;
+
     return 0;
-- 
2.25.1

> This is due to our transition from having a libpciaccess copy of the x86
> backend embedded in the arbiter to using libpciaccess. When it was
> embedded, I modified it to probe and map everything during the bus scan,
> but the original code in libpciaccess[1] didn't do it. So we are not
> using libpciaccess properly: we modified libpciaccess to fit us
> instead of the other way around.

I see now. Sorry, I think I added the probe() in create()!

> Would you like to fix the backend while I fix the arbiter?

Can you confirm I have it correct this time? If so I will alter my merge
request upstream.

Thanks,
Damien
Re: PCI arbiter crash on last qemu image
Hi,

> I have removed my latest patch from my upstream merge request and replaced it
> with a patch that fixes the problem:

I took a look at your patch.

> mappings[devp->num_mappings].flags = map_flags;
> mappings[devp->num_mappings].memory = NULL;
>
> -    if (dev->regions[region].memory == NULL) {
> -        err = (*pci_sys->methods->map_range)(dev,
> -                                             &mappings[devp->num_mappings]);
> -    }
> +    err = (*pci_sys->methods->map_range)(dev, &mappings[devp->num_mappings]);
>
> if (err == 0) {
>     *addr = mappings[devp->num_mappings].memory;

I've spent some time studying memory management in libpciaccess and haven't
found a reason to make range mappings and region mappings mutually
exclusive. This completely disables range mappings in the x86 backend,
since a region mapping for each region is done during the bus scan.

However, I think the problem here is the x86 backend, not the common
interface. If we take a look at all other backends we'll see that:

1.- Neither of them calls its probe() from its create(). So it's the
client who must call pci_device_probe(); it's our arbiter who should do it
and it doesn't.

2.- Neither of them maps memory from its probe(). So again it's the client
who must call either pci_device_map_range() or pci_device_map_region(), and
again it's our arbiter who is not doing it.

This is due to our transition from having a libpciaccess copy of the x86
backend embedded in the arbiter to using libpciaccess. When it was
embedded, I modified it to probe and map everything during the bus scan,
but the original code in libpciaccess[1] didn't do it. So we are not using
libpciaccess properly, and we modified libpciaccess to fit us instead of
the other way around. It's the commit in [2] that introduced the problem.
And that affected all clients who use the x86 backend and map memory,
though it seems we are the only ones, or at least the first ones, to worry
about it.

I don't think modifying the common interface of libpciaccess is the
solution, because that would affect all clients, not only those using the
x86 backend. I don't see why range and region mappings are mutually
exclusive, but they are and clients assume that. So what we should do
instead is modify the x86 backend to behave like the others and adapt the
arbiter to use libpciaccess correctly.

Would you like to fix the backend while I fix the arbiter?

--
[1] https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/blob/a167bd6474522a709ff3cbb00476c0e4309cb66f/src/x86_pci.c
[2] https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/commit/95fbfeeacfd054de1037d6a10dee03b2b2cbc290

On 22/8/20 at 8:35, Damien Zammit wrote:
> Joan,
>
> On 18/8/20 6:51 am, Joan Lledó wrote:
>> On 17/8/20 at 1:51, Damien Zammit wrote:
>>> Perhaps a better way to fix the mapping problem I encountered
>>> is by removing the check for previous mappings when trying to map regions,
>
> I have removed my latest patch from my upstream merge request and replaced it
> with a patch that fixes the problem:
>
> https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/merge_requests/12/commits
>
> Samuel, would you be able to fix the debian package of libpciaccess based on
> the above?
> (ie, remove one patch and add one patch).
>
> I now get the following with my version of pciaccess (and rumpdisk still
> works with x86 method):
>
> root@zamhurd:~# hexdump -C -n16 /servers/bus/pci//00/02/0/region0
> 00000000  45 02 14 18 00 00 00 00  83 02 00 00 00 00 00 00  |E...............|
> 00000010
> root@zamhurd:~#
>
> Damien
Re: PCI arbiter crash on last qemu image
Joan,

On 18/8/20 6:51 am, Joan Lledó wrote:
> On 17/8/20 at 1:51, Damien Zammit wrote:
>> Perhaps a better way to fix the mapping problem I encountered
>> is by removing the check for previous mappings when trying to map regions,

I have removed my latest patch from my upstream merge request and replaced
it with a patch that fixes the problem:

https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/merge_requests/12/commits

Samuel, would you be able to fix the debian package of libpciaccess based
on the above?
(ie, remove one patch and add one patch).

I now get the following with my version of pciaccess (and rumpdisk still
works with the x86 method):

root@zamhurd:~# hexdump -C -n16 /servers/bus/pci//00/02/0/region0
00000000  45 02 14 18 00 00 00 00  83 02 00 00 00 00 00 00  |E...............|
00000010
root@zamhurd:~#

Damien
Re: PCI arbiter crash on last qemu image
On 18/8/20 6:51 am, Joan Lledó wrote:
> On 17/8/20 at 1:51, Damien Zammit wrote:
>> Perhaps a better way to fix the mapping problem I encountered
>> is by removing the check for previous mappings when trying to map regions,
>
> I could check the pointer before reading from it at func_files.c, and if
> it happens to be null, then call the libpciaccess mapping function from
> there, i.e. map the memory on first access instead of doing it during
> startup. So there's no need to make any changes in your patch; do you
> think that'd work?

That would probably work, but I don't want to break other usages of
pciaccess by including my change upstream: doing it your way would mean
every other use case for pciaccess would need to do the same thing as you
are suggesting.

Perhaps we can just remove the check in pciaccess instead after all, as
that would be compatible with existing code.

Damien
Re: PCI arbiter crash on last qemu image
On 17/8/20 at 1:51, Damien Zammit wrote:
> It's probably due to this patch:

It's surely because of that.

> Perhaps a better way to fix the mapping problem I encountered
> is by removing the check for previous mappings when trying to map regions,

I could check the pointer before reading from it at func_files.c, and if it
happens to be null, then call the libpciaccess mapping function from there,
i.e. map the memory on first access instead of doing it during startup. So
there's no need to make any changes in your patch. Do you think that'd work?
Re: PCI arbiter crash on last qemu image
Hi there,

On 17/8/20 1:04 am, Joan Lledó wrote:
> I found the same issue; investigating a bit more, I found that in
> func_files.c:201[1] the value of region->memory is 0x0, so reading from
> there raises a segfault. That pointer should be filled in libpciaccess,
> at x86_pci.c:601[2], during startup, but for some reason it seems it
> doesn't happen. Regrettably I don't have the time to go further right now.
>
> Could it be some issue with /dev/mem?

It's probably due to this patch:

https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/merge_requests/12/diffs?commit_id=952f4553f91da3f3eb8615a8d53a6da7f7bc0080

(We are not using master of pciaccess.)

I removed the mapping and set it to NULL because there was a bug that
caused the initial mapping within "probe" to prevent further mappings of
the same region later on.

Perhaps a better way to fix the mapping problem I encountered is by
removing the check for previous mappings when trying to map regions, but
no one has commented on my PR upstream (yet).

Damien
Re: PCI arbiter crash on last qemu image
On 16/8/20 at 4:46, Damien Zammit wrote:
> Hi there,
>
> On 15/8/20 9:49 pm, Joan Lledó wrote:
>> I downloaded and tried the last qemu image "debian-hurd-20200731.img".
>> When I try to read the memory-mapped content of region files in the
>> arbiter, it crashes and shows the message "Real-time signal 0".
>
> I am also getting this on my latest hurd system that I have been working on.
>
> I ran gdb on the pci-arbiter pid and put breakpoints on S_pci_conf_read
> and S_pci_dev_get_regions, but they seemed to have no effect; when I
> continue and then run the hexdump, I get no useful backtrace. Could it be
> a recursion problem with stack overflow?
>
> Thread 1 received signal ?, Unknown signal.
> memcpy () at ../sysdeps/i386/i686/memcpy.S:71
> 71      ../sysdeps/i386/i686/memcpy.S: No such file or directory.
> (gdb) bt
> #0  memcpy () at ../sysdeps/i386/i686/memcpy.S:71
> #1  0x08059588 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> Any ideas?
>
> Damien

I found the same issue; investigating a bit more, I found that in
func_files.c:201[1] the value of region->memory is 0x0, so reading from
there raises a segfault. That pointer should be filled in libpciaccess, at
x86_pci.c:601[2], during startup, but for some reason it seems it doesn't
happen. Regrettably I don't have the time to go further right now.

Could it be some issue with /dev/mem?

---
[1] http://git.savannah.gnu.org/cgit/hurd/hurd.git/tree/pci-arbiter/func_files.c#n201
[2] https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/blob/master/src/x86_pci.c#L601
Re: PCI arbiter crash on last qemu image
Hi there,

On 15/8/20 9:49 pm, Joan Lledó wrote:
> I downloaded and tried the last qemu image "debian-hurd-20200731.img".
> When I try to read the memory-mapped content of region files in the
> arbiter, it crashes and shows the message "Real-time signal 0".

I am also getting this on my latest hurd system that I have been working on.

I ran gdb on the pci-arbiter pid and put breakpoints on S_pci_conf_read and
S_pci_dev_get_regions, but they seemed to have no effect; when I continue
and then run the hexdump, I get no useful backtrace. Could it be a
recursion problem with stack overflow?

Thread 1 received signal ?, Unknown signal.
memcpy () at ../sysdeps/i386/i686/memcpy.S:71
71      ../sysdeps/i386/i686/memcpy.S: No such file or directory.
(gdb) bt
#0  memcpy () at ../sysdeps/i386/i686/memcpy.S:71
#1  0x08059588 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Any ideas?

Damien
PCI arbiter crash on last qemu image
Hello,

I downloaded and tried the last qemu image "debian-hurd-20200731.img".
When I try to read the memory-mapped content of region files in the
arbiter, it crashes and shows the message "Real-time signal 0".

This happens when executing:

hexdump -Cn 256 /servers/bus/pci//00/02/0/region0

Region files for IO ports seem to work fine; it only fails with /dev/mem
mapped regions.

Any ideas on why?