Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 19:31 +, Christoph Hellwig wrote: > On Sun, Jan 07, 2007 at 03:18:45AM +1100, Rusty Russell wrote: > > PS. drm_memory.h has a "drm_follow_page": this forces us to uninline > > various page tables ops. Can this use follow_page() somehow, or do we > > need an "__follow_page" export for this case? > > Not if avoidable. And it seems avoidable as drm really should be using > vmalloc_to_page. Untested patch below: Thanks Christoph, that patch looks great to me! I didn't know about vmalloc_to_page... Want to produce a signed-off version? Thanks, Rusty. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Even better we can actualy avid most of the page table walks completely. First there is a number of places that can never have the vmalloc case an may use ioremap/iounmap directly. Secondly drm_core_ioremap/ drm_core_ioremapfree already have the right drm_map to check wich kind of mapping we have - we just need to use it instead of discarding that information! The only leaves direct drm_ioremapfree in i810_dma.c and i830_dma.c which I don't quite manage to follow. Maybe Dave has an idea how to get rid of those aswell. I've applied two patches to the DRM git http://gitweb.freedesktop.org/?p=mesa/drm.git;a=summary They need a fair bit of testing, I've tested them no i810, but I need to test them on a few other boards before I'd consider putting them in -mm.. Dave. -- David Airlie, Software Engineer http://www.skynet.ie/~airlied / airlied at skynet.ie Linux kernel - DRI, VAX / pam_smb / ILUG - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
* Christoph Hellwig <[EMAIL PROTECTED]> wrote: > On Sat, Jan 06, 2007 at 07:31:52PM +, Christoph Hellwig wrote: > > On Sun, Jan 07, 2007 at 03:18:45AM +1100, Rusty Russell wrote: > > > PS. drm_memory.h has a "drm_follow_page": this forces us to uninline > > > various page tables ops. Can this use follow_page() somehow, or do we > > > need an "__follow_page" export for this case? > > > > Not if avoidable. And it seems avoidable as drm really should be using > > vmalloc_to_page. Untested patch below: > > Even better we can actualy avid most of the page table walks > completely. agreed. I think there's an important side-observation here as well: having inlined functions uninlined and exported puts them under a lot more scrutiny. Hence individual exports instead of the global paravirt_ops export is a big plus. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, Jan 06, 2007 at 07:31:52PM +, Christoph Hellwig wrote: > On Sun, Jan 07, 2007 at 03:18:45AM +1100, Rusty Russell wrote: > > PS. drm_memory.h has a "drm_follow_page": this forces us to uninline > > various page tables ops. Can this use follow_page() somehow, or do we > > need an "__follow_page" export for this case? > > Not if avoidable. And it seems avoidable as drm really should be using > vmalloc_to_page. Untested patch below: Even better we can actualy avid most of the page table walks completely. First there is a number of places that can never have the vmalloc case an may use ioremap/iounmap directly. Secondly drm_core_ioremap/ drm_core_ioremapfree already have the right drm_map to check wich kind of mapping we have - we just need to use it instead of discarding that information! The only leaves direct drm_ioremapfree in i810_dma.c and i830_dma.c which I don't quite manage to follow. Maybe Dave has an idea how to get rid of those aswell. Index: linux-2.6/drivers/char/drm/drm_bufs.c === --- linux-2.6.orig/drivers/char/drm/drm_bufs.c 2007-01-07 14:07:18.0 +0100 +++ linux-2.6/drivers/char/drm/drm_bufs.c 2007-01-07 14:09:40.0 +0100 @@ -178,7 +178,7 @@ } } if (map->type == _DRM_REGISTERS) - map->handle = drm_ioremap(map->offset, map->size, dev); + map->handle = ioremap(map->offset, map->size); break; case _DRM_SHM: @@ -238,7 +238,7 @@ list = drm_alloc(sizeof(*list), DRM_MEM_MAPS); if (!list) { if (map->type == _DRM_REGISTERS) - drm_ioremapfree(map->handle, map->size, dev); + iounmap(map->handle); drm_free(map, sizeof(*map), DRM_MEM_MAPS); return -EINVAL; } @@ -255,7 +255,7 @@ ret = drm_map_handle(dev, >hash, user_token, 0); if (ret) { if (map->type == _DRM_REGISTERS) - drm_ioremapfree(map->handle, map->size, dev); + iounmap(map->handle); drm_free(map, sizeof(*map), DRM_MEM_MAPS); drm_free(list, sizeof(*list), DRM_MEM_MAPS); mutex_unlock(>struct_mutex); @@ -362,7 +362,7 @@ switch (map->type) { case _DRM_REGISTERS: - drm_ioremapfree(map->handle, map->size, dev); + iounmap(map->handle); /* FALLTHROUGH */ case _DRM_FRAME_BUFFER: if (drm_core_has_MTRR(dev) && map->mtrr >= 0) { Index: linux-2.6/drivers/char/drm/drm_vm.c === --- linux-2.6.orig/drivers/char/drm/drm_vm.c2007-01-07 14:07:18.0 +0100 +++ linux-2.6/drivers/char/drm/drm_vm.c 2007-01-07 14:10:50.0 +0100 @@ -227,7 +227,12 @@ map->size); DRM_DEBUG("mtrr_del = %d\n", retcode); } - drm_ioremapfree(map->handle, map->size, dev); + /* +* XXX(hch) autmatic mapping/unmapping only happens for +* _DRM_REGISTERS in all other places. Should we have +* this check here aswell? +*/ + iounmap(map->handle); break; case _DRM_SHM: vfree(map->handle); Index: linux-2.6/drivers/char/drm/drmP.h === --- linux-2.6.orig/drivers/char/drm/drmP.h 2007-01-07 14:13:48.0 +0100 +++ linux-2.6/drivers/char/drm/drmP.h 2007-01-07 14:15:32.0 +0100 @@ -1059,27 +1059,8 @@ extern int drm_mm_init(drm_mm_t *mm, unsigned long start, unsigned long size); extern void drm_mm_takedown(drm_mm_t *mm); -/* Inline replacements for DRM_IOREMAP macros */ -static __inline__ void drm_core_ioremap(struct drm_map *map, - struct drm_device *dev) -{ - map->handle = drm_ioremap(map->offset, map->size, dev); -} - -#if 0 -static __inline__ void drm_core_ioremap_nocache(struct drm_map *map, - struct drm_device *dev) -{ - map->handle = drm_ioremap_nocache(map->offset, map->size, dev); -} -#endif /* 0 */ - -static __inline__ void drm_core_ioremapfree(struct drm_map *map, - struct drm_device *dev) -{ - if (map->handle && map->size) - drm_ioremapfree(map->handle, map->size, dev); -} +extern void drm_core_ioremap(struct drm_map *map, struct drm_device *dev); +extern void drm_core_ioremapfree(struct drm_map
Re: [patch] paravirt: isolate module ops
Rusty Russell wrote: On Sat, 2007-01-06 at 12:55 -0800, Zachary Amsden wrote: Rusty Russell wrote: +int paravirt_write_msr(unsigned int msr, u64 val); If binary modules using debug registers makes us nervous, the reprogramming MSRs is also similarly bad. Yes, but this is simply from experience. Several modules wrote msrs (you can take out the EXPORT_SYMBOL and find them quite quickly). Several in tree, GPL'd modules did so. I'm not aware of out of tree modules that do that, and if they do, they are misbehaving. Reprogramming MTRR memory regions under the kernel's nose is not a proper way to behave, and this is the most benign use I can think of for write access to MSRs. If this really breaks any code out there, then there should be a proper, controlled API to do this so the kernel can manage it. +void raw_safe_halt(void); +void halt(void); These shouldn't be done by modules, ever. Only the scheduler should decide to halt. Several modules implement alternate halt loops. I guess being in a module for specific CPUs makes sense... Yes, but halting is a behavior that can easily introduce critical, grind to a halt problems because of race conditions. I have no problems having modules implement alternative halt loops. I think there is a serious debuggability issue with binary modules invoking halt directly. Zach - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Rusty Russell wrote: On Sat, 2007-01-06 at 12:55 -0800, Zachary Amsden wrote: Rusty Russell wrote: +int paravirt_write_msr(unsigned int msr, u64 val); If binary modules using debug registers makes us nervous, the reprogramming MSRs is also similarly bad. Yes, but this is simply from experience. Several modules wrote msrs (you can take out the EXPORT_SYMBOL and find them quite quickly). Several in tree, GPL'd modules did so. I'm not aware of out of tree modules that do that, and if they do, they are misbehaving. Reprogramming MTRR memory regions under the kernel's nose is not a proper way to behave, and this is the most benign use I can think of for write access to MSRs. If this really breaks any code out there, then there should be a proper, controlled API to do this so the kernel can manage it. +void raw_safe_halt(void); +void halt(void); These shouldn't be done by modules, ever. Only the scheduler should decide to halt. Several modules implement alternate halt loops. I guess being in a module for specific CPUs makes sense... Yes, but halting is a behavior that can easily introduce critical, grind to a halt problems because of race conditions. I have no problems having modules implement alternative halt loops. I think there is a serious debuggability issue with binary modules invoking halt directly. Zach - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, Jan 06, 2007 at 07:31:52PM +, Christoph Hellwig wrote: On Sun, Jan 07, 2007 at 03:18:45AM +1100, Rusty Russell wrote: PS. drm_memory.h has a drm_follow_page: this forces us to uninline various page tables ops. Can this use follow_page() somehow, or do we need an __follow_page export for this case? Not if avoidable. And it seems avoidable as drm really should be using vmalloc_to_page. Untested patch below: Even better we can actualy avid most of the page table walks completely. First there is a number of places that can never have the vmalloc case an may use ioremap/iounmap directly. Secondly drm_core_ioremap/ drm_core_ioremapfree already have the right drm_map to check wich kind of mapping we have - we just need to use it instead of discarding that information! The only leaves direct drm_ioremapfree in i810_dma.c and i830_dma.c which I don't quite manage to follow. Maybe Dave has an idea how to get rid of those aswell. Index: linux-2.6/drivers/char/drm/drm_bufs.c === --- linux-2.6.orig/drivers/char/drm/drm_bufs.c 2007-01-07 14:07:18.0 +0100 +++ linux-2.6/drivers/char/drm/drm_bufs.c 2007-01-07 14:09:40.0 +0100 @@ -178,7 +178,7 @@ } } if (map-type == _DRM_REGISTERS) - map-handle = drm_ioremap(map-offset, map-size, dev); + map-handle = ioremap(map-offset, map-size); break; case _DRM_SHM: @@ -238,7 +238,7 @@ list = drm_alloc(sizeof(*list), DRM_MEM_MAPS); if (!list) { if (map-type == _DRM_REGISTERS) - drm_ioremapfree(map-handle, map-size, dev); + iounmap(map-handle); drm_free(map, sizeof(*map), DRM_MEM_MAPS); return -EINVAL; } @@ -255,7 +255,7 @@ ret = drm_map_handle(dev, list-hash, user_token, 0); if (ret) { if (map-type == _DRM_REGISTERS) - drm_ioremapfree(map-handle, map-size, dev); + iounmap(map-handle); drm_free(map, sizeof(*map), DRM_MEM_MAPS); drm_free(list, sizeof(*list), DRM_MEM_MAPS); mutex_unlock(dev-struct_mutex); @@ -362,7 +362,7 @@ switch (map-type) { case _DRM_REGISTERS: - drm_ioremapfree(map-handle, map-size, dev); + iounmap(map-handle); /* FALLTHROUGH */ case _DRM_FRAME_BUFFER: if (drm_core_has_MTRR(dev) map-mtrr = 0) { Index: linux-2.6/drivers/char/drm/drm_vm.c === --- linux-2.6.orig/drivers/char/drm/drm_vm.c2007-01-07 14:07:18.0 +0100 +++ linux-2.6/drivers/char/drm/drm_vm.c 2007-01-07 14:10:50.0 +0100 @@ -227,7 +227,12 @@ map-size); DRM_DEBUG(mtrr_del = %d\n, retcode); } - drm_ioremapfree(map-handle, map-size, dev); + /* +* XXX(hch) autmatic mapping/unmapping only happens for +* _DRM_REGISTERS in all other places. Should we have +* this check here aswell? +*/ + iounmap(map-handle); break; case _DRM_SHM: vfree(map-handle); Index: linux-2.6/drivers/char/drm/drmP.h === --- linux-2.6.orig/drivers/char/drm/drmP.h 2007-01-07 14:13:48.0 +0100 +++ linux-2.6/drivers/char/drm/drmP.h 2007-01-07 14:15:32.0 +0100 @@ -1059,27 +1059,8 @@ extern int drm_mm_init(drm_mm_t *mm, unsigned long start, unsigned long size); extern void drm_mm_takedown(drm_mm_t *mm); -/* Inline replacements for DRM_IOREMAP macros */ -static __inline__ void drm_core_ioremap(struct drm_map *map, - struct drm_device *dev) -{ - map-handle = drm_ioremap(map-offset, map-size, dev); -} - -#if 0 -static __inline__ void drm_core_ioremap_nocache(struct drm_map *map, - struct drm_device *dev) -{ - map-handle = drm_ioremap_nocache(map-offset, map-size, dev); -} -#endif /* 0 */ - -static __inline__ void drm_core_ioremapfree(struct drm_map *map, - struct drm_device *dev) -{ - if (map-handle map-size) - drm_ioremapfree(map-handle, map-size, dev); -} +extern void drm_core_ioremap(struct drm_map *map, struct drm_device *dev); +extern void drm_core_ioremapfree(struct drm_map *map, struct drm_device *dev); static
Re: [patch] paravirt: isolate module ops
* Christoph Hellwig [EMAIL PROTECTED] wrote: On Sat, Jan 06, 2007 at 07:31:52PM +, Christoph Hellwig wrote: On Sun, Jan 07, 2007 at 03:18:45AM +1100, Rusty Russell wrote: PS. drm_memory.h has a drm_follow_page: this forces us to uninline various page tables ops. Can this use follow_page() somehow, or do we need an __follow_page export for this case? Not if avoidable. And it seems avoidable as drm really should be using vmalloc_to_page. Untested patch below: Even better we can actualy avid most of the page table walks completely. agreed. I think there's an important side-observation here as well: having inlined functions uninlined and exported puts them under a lot more scrutiny. Hence individual exports instead of the global paravirt_ops export is a big plus. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Even better we can actualy avid most of the page table walks completely. First there is a number of places that can never have the vmalloc case an may use ioremap/iounmap directly. Secondly drm_core_ioremap/ drm_core_ioremapfree already have the right drm_map to check wich kind of mapping we have - we just need to use it instead of discarding that information! The only leaves direct drm_ioremapfree in i810_dma.c and i830_dma.c which I don't quite manage to follow. Maybe Dave has an idea how to get rid of those aswell. I've applied two patches to the DRM git http://gitweb.freedesktop.org/?p=mesa/drm.git;a=summary They need a fair bit of testing, I've tested them no i810, but I need to test them on a few other boards before I'd consider putting them in -mm.. Dave. -- David Airlie, Software Engineer http://www.skynet.ie/~airlied / airlied at skynet.ie Linux kernel - DRI, VAX / pam_smb / ILUG - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 19:31 +, Christoph Hellwig wrote: On Sun, Jan 07, 2007 at 03:18:45AM +1100, Rusty Russell wrote: PS. drm_memory.h has a drm_follow_page: this forces us to uninline various page tables ops. Can this use follow_page() somehow, or do we need an __follow_page export for this case? Not if avoidable. And it seems avoidable as drm really should be using vmalloc_to_page. Untested patch below: Thanks Christoph, that patch looks great to me! I didn't know about vmalloc_to_page... Want to produce a signed-off version? Thanks, Rusty. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 12:55 -0800, Zachary Amsden wrote: > Rusty Russell wrote: > > > > +int paravirt_write_msr(unsigned int msr, u64 val); > > If binary modules using debug registers makes us nervous, the > reprogramming MSRs is also similarly bad. Yes, but this is simply from experience. Several modules wrote msrs (you can take out the EXPORT_SYMBOL and find them quite quickly). > > +void raw_safe_halt(void); > > +void halt(void); > > These shouldn't be done by modules, ever. Only the scheduler should > decide to halt. Several modules implement alternate halt loops. I guess being in a module for specific CPUs makes sense... Cheers! Rusty. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Rusty Russell wrote: +int paravirt_write_msr(unsigned int msr, u64 val); If binary modules using debug registers makes us nervous, the reprogramming MSRs is also similarly bad. +void raw_safe_halt(void); +void halt(void); These shouldn't be done by modules, ever. Only the scheduler should decide to halt. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sun, Jan 07, 2007 at 03:18:45AM +1100, Rusty Russell wrote: > PS. drm_memory.h has a "drm_follow_page": this forces us to uninline > various page tables ops. Can this use follow_page() somehow, or do we > need an "__follow_page" export for this case? Not if avoidable. And it seems avoidable as drm really should be using vmalloc_to_page. Untested patch below: Index: linux-2.6/drivers/char/drm/drm_memory.c === --- linux-2.6.orig/drivers/char/drm/drm_memory.c2007-01-06 20:21:07.0 +0100 +++ linux-2.6/drivers/char/drm/drm_memory.c 2007-01-06 20:29:03.0 +0100 @@ -211,6 +211,23 @@ } #endif /* 0 */ +static int is_agp_mapping(void *pt, unsigned long size, drm_device_t *dev) +{ + unsigned long addr = (unsigned long)pt; + + if (addr >= VMALLOC_START && addr < VMALLOC_END) { + unsigned long phys; + drm_map_t *map; + + phys = page_to_phys(vmalloc_to_page(pt)) + offset_in_page(pt); + map = drm_lookup_map(phys, size, dev); + if (map && map->type == _DRM_AGP) + return 1; + } + + return 0; +} + void drm_ioremapfree(void *pt, unsigned long size, drm_device_t * dev) { @@ -219,21 +236,11 @@ * routines for handling mappings in the AGP space. Hopefully this can be done in * a future revision of the interface... */ - if (drm_core_has_AGP(dev) && dev->agp && dev->agp->cant_use_aperture - && ((unsigned long)pt >= VMALLOC_START - && (unsigned long)pt < VMALLOC_END)) { - unsigned long offset; - drm_map_t *map; - - offset = drm_follow_page(pt) | ((unsigned long)pt & ~PAGE_MASK); - map = drm_lookup_map(offset, size, dev); - if (map && map->type == _DRM_AGP) { - vunmap(pt); - return; - } - } - - iounmap(pt); + if (drm_core_has_AGP(dev) && dev->agp && dev->agp->cant_use_aperture && + is_agp_mapping(pt, size, dev)) + vunmap(pt); + else + iounmap(pt); } EXPORT_SYMBOL(drm_ioremapfree); Index: linux-2.6/drivers/char/drm/drm_memory.h === --- linux-2.6.orig/drivers/char/drm/drm_memory.h2007-01-06 20:21:07.0 +0100 +++ linux-2.6/drivers/char/drm/drm_memory.h 2007-01-06 20:26:50.0 +0100 @@ -55,23 +55,6 @@ # define PAGE_AGP PAGE_KERNEL # endif #endif - -static inline unsigned long drm_follow_page(void *vaddr) -{ - pgd_t *pgd = pgd_offset_k((unsigned long)vaddr); - pud_t *pud = pud_offset(pgd, (unsigned long)vaddr); - pmd_t *pmd = pmd_offset(pud, (unsigned long)vaddr); - pte_t *ptep = pte_offset_kernel(pmd, (unsigned long)vaddr); - return pte_pfn(*ptep) << PAGE_SHIFT; -} - -#else /* __OS_HAS_AGP */ - -static inline unsigned long drm_follow_page(void *vaddr) -{ - return 0; -} - #endif void *drm_ioremap(unsigned long offset, unsigned long size, - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
* Rusty Russell <[EMAIL PROTECTED]> wrote: > I moved drm_follow_page into the core, to avoid having to wrap the > various pte ops. Unlining kernel_fpu_end and using that in the RAID6 > code would remove the need to export clts/read_cr0/write_cr0 too. yeah, let's do that too. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 08:08 +0100, Ingo Molnar wrote: > btw., your patch does not apply to current -git - could you please > rebase this patch to the head of your queue so that upstream can pick it > up? OK, here it is against rc3-git4. Name: don't export paravirt_ops structure, do individual functions Wrap the paravirt_ops members we want to export in wrapper functions. Since we binary-patch the critical ones, this doesn't make a speed impact. I moved drm_follow_page into the core, to avoid having to wrap the various pte ops. Unlining kernel_fpu_end and using that in the RAID6 code would remove the need to export clts/read_cr0/write_cr0 too. Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/dontdiff --minimal linux-2.6.20-rc3-git4/arch/i386/kernel/paravirt.c working-2.6.20-rc3-git4/arch/i386/kernel/paravirt.c --- linux-2.6.20-rc3-git4/arch/i386/kernel/paravirt.c 2007-01-07 03:41:32.0 +1100 +++ working-2.6.20-rc3-git4/arch/i386/kernel/paravirt.c 2007-01-07 04:21:59.0 +1100 @@ -482,6 +482,123 @@ static int __init print_banner(void) } core_initcall(print_banner); +unsigned long paravirt_save_flags(void) +{ + return paravirt_ops.save_fl(); +} +EXPORT_SYMBOL(paravirt_save_flags); + +void paravirt_restore_flags(unsigned long flags) +{ + paravirt_ops.restore_fl(flags); +} +EXPORT_SYMBOL(paravirt_restore_flags); + +void paravirt_irq_disable(void) +{ + paravirt_ops.irq_disable(); +} +EXPORT_SYMBOL(paravirt_irq_disable); + +void paravirt_irq_enable(void) +{ + paravirt_ops.irq_enable(); +} +EXPORT_SYMBOL(paravirt_irq_enable); + +void paravirt_io_delay(void) +{ + paravirt_ops.io_delay(); +} +EXPORT_SYMBOL(paravirt_io_delay); + +void paravirt_const_udelay(unsigned long loops) +{ + paravirt_ops.const_udelay(loops); +} +EXPORT_SYMBOL(paravirt_const_udelay); + +u64 paravirt_read_msr(unsigned int msr, int *err) +{ + return paravirt_ops.read_msr(msr, err); +} +EXPORT_SYMBOL(paravirt_read_msr); + +int paravirt_write_msr(unsigned int msr, u64 val) +{ + return paravirt_ops.write_msr(msr, val); +} +EXPORT_SYMBOL(paravirt_write_msr); + +u64 paravirt_read_tsc(void) +{ + return paravirt_ops.read_tsc(); +} +EXPORT_SYMBOL(paravirt_read_tsc); + +int paravirt_enabled(void) +{ + return paravirt_ops.paravirt_enabled; +} +EXPORT_SYMBOL(paravirt_enabled); + +void clts(void) +{ + paravirt_ops.clts(); +} +EXPORT_SYMBOL(clts); + +unsigned long read_cr0(void) +{ + return paravirt_ops.read_cr0(); +} +EXPORT_SYMBOL_GPL(read_cr0); + +void write_cr0(unsigned long cr0) +{ + paravirt_ops.write_cr0(cr0); +} +EXPORT_SYMBOL_GPL(write_cr0); + +void wbinvd(void) +{ + paravirt_ops.wbinvd(); +} +EXPORT_SYMBOL(wbinvd); + +void raw_safe_halt(void) +{ + paravirt_ops.safe_halt(); +} +EXPORT_SYMBOL_GPL(raw_safe_halt); + +void halt(void) +{ + paravirt_ops.safe_halt(); +} +EXPORT_SYMBOL_GPL(halt); + +#ifdef CONFIG_X86_LOCAL_APIC +void apic_write(unsigned long reg, unsigned long v) +{ + paravirt_ops.apic_write(reg,v); +} +EXPORT_SYMBOL_GPL(apic_write); + +unsigned long apic_read(unsigned long reg) +{ + return paravirt_ops.apic_read(reg); +} +EXPORT_SYMBOL_GPL(apic_read); +#endif + +void __cpuid(unsigned int *eax, unsigned int *ebx, +unsigned int *ecx, unsigned int *edx) +{ + paravirt_ops.cpuid(eax, ebx, ecx, edx); +} +EXPORT_SYMBOL(__cpuid); + /* We simply declare start_kernel to be the paravirt probe of last resort. */ paravirt_probe(start_kernel); @@ -566,4 +683,3 @@ struct paravirt_ops paravirt_ops = { .irq_enable_sysexit = native_irq_enable_sysexit, .iret = native_iret, }; -EXPORT_SYMBOL(paravirt_ops); diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/dontdiff --minimal linux-2.6.20-rc3-git4/drivers/char/drm/drm_memory.h working-2.6.20-rc3-git4/drivers/char/drm/drm_memory.h --- linux-2.6.20-rc3-git4/drivers/char/drm/drm_memory.h 2006-09-22 15:36:13.0 +1000 +++ working-2.6.20-rc3-git4/drivers/char/drm/drm_memory.h 2007-01-07 04:19:07.0 +1100 @@ -58,11 +58,7 @@ static inline unsigned long drm_follow_page(void *vaddr) { - pgd_t *pgd = pgd_offset_k((unsigned long)vaddr); - pud_t *pud = pud_offset(pgd, (unsigned long)vaddr); - pmd_t *pmd = pmd_offset(pud, (unsigned long)vaddr); - pte_t *ptep = pte_offset_kernel(pmd, (unsigned long)vaddr); - return pte_pfn(*ptep) << PAGE_SHIFT; + return __follow_page(vaddr); } #else /* __OS_HAS_AGP */ diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/dontdiff --minimal linux-2.6.20-rc3-git4/include/asm-i386/delay.h working-2.6.20-rc3-git4/include/asm-i386/delay.h --- linux-2.6.20-rc3-git4/include/asm-i386/delay.h 2007-01-07 03:42:32.0 +1100 +++ working-2.6.20-rc3-git4/include/asm-i386/delay.h2007-01-07 04:08:46.0 +1100 @@
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 08:14 +0100, Ingo Molnar wrote: > * Rusty Russell <[EMAIL PROTECTED]> wrote: > > > +EXPORT_SYMBOL(clts); > > +EXPORT_SYMBOL(read_cr0); > > +EXPORT_SYMBOL(write_cr0); > > mark these a _GPL export. Perhaps even mark the symbol deprecated, to be > unexported once we fix raid6. OK... > > +EXPORT_SYMBOL(wbinvd); > > +EXPORT_SYMBOL(raw_safe_halt); > > +EXPORT_SYMBOL(halt); > > +EXPORT_SYMBOL_GPL(apic_write); > > +EXPORT_SYMBOL_GPL(apic_read); > > these should be _GPL too. If any module uses it and breaks a user's box > we need that big licensing hint to be able to debug them ... OK. I GPLed the ones which I thought were really obscure, but I was trying to follow existing policy of not _GPL'ing existing symbols. Cheers, Rusty. PS. drm_memory.h has a "drm_follow_page": this forces us to uninline various page tables ops. Can this use follow_page() somehow, or do we need an "__follow_page" export for this case? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 01:50 -0800, Zachary Amsden wrote: > Ingo Molnar wrote: > > * Rusty Russell <[EMAIL PROTECTED]> wrote: > > > > > >> +EXPORT_SYMBOL(clts); > >> +EXPORT_SYMBOL(read_cr0); > >> +EXPORT_SYMBOL(write_cr0); > >> > > > > mark these a _GPL export. Perhaps even mark the symbol deprecated, to be > > unexported once we fix raid6. > > > > read / write cr0 must not be GPL, since kernel_fpu_end uses them but kernel_fpu_begin() is _GPL already so this is just symmetry... -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Ingo Molnar wrote: * Rusty Russell <[EMAIL PROTECTED]> wrote: +EXPORT_SYMBOL(clts); +EXPORT_SYMBOL(read_cr0); +EXPORT_SYMBOL(write_cr0); mark these a _GPL export. Perhaps even mark the symbol deprecated, to be unexported once we fix raid6. read / write cr0 must not be GPL, since kernel_fpu_end uses them and is inline. clts I don't think matters. +EXPORT_SYMBOL(wbinvd); +EXPORT_SYMBOL(raw_safe_halt); +EXPORT_SYMBOL(halt); +EXPORT_SYMBOL_GPL(apic_write); +EXPORT_SYMBOL_GPL(apic_read); these should be _GPL too. If any module uses it and breaks a user's box we need that big licensing hint to be able to debug them ... Perhaps also, MSRs are too dangerous for binary modules to be messing with. Agree on halt - but wbinvd can theoretically be used for device mapped memory consistency. Zach - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Ingo Molnar wrote: * Rusty Russell [EMAIL PROTECTED] wrote: +EXPORT_SYMBOL(clts); +EXPORT_SYMBOL(read_cr0); +EXPORT_SYMBOL(write_cr0); mark these a _GPL export. Perhaps even mark the symbol deprecated, to be unexported once we fix raid6. read / write cr0 must not be GPL, since kernel_fpu_end uses them and is inline. clts I don't think matters. +EXPORT_SYMBOL(wbinvd); +EXPORT_SYMBOL(raw_safe_halt); +EXPORT_SYMBOL(halt); +EXPORT_SYMBOL_GPL(apic_write); +EXPORT_SYMBOL_GPL(apic_read); these should be _GPL too. If any module uses it and breaks a user's box we need that big licensing hint to be able to debug them ... Perhaps also, MSRs are too dangerous for binary modules to be messing with. Agree on halt - but wbinvd can theoretically be used for device mapped memory consistency. Zach - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 01:50 -0800, Zachary Amsden wrote: Ingo Molnar wrote: * Rusty Russell [EMAIL PROTECTED] wrote: +EXPORT_SYMBOL(clts); +EXPORT_SYMBOL(read_cr0); +EXPORT_SYMBOL(write_cr0); mark these a _GPL export. Perhaps even mark the symbol deprecated, to be unexported once we fix raid6. read / write cr0 must not be GPL, since kernel_fpu_end uses them but kernel_fpu_begin() is _GPL already so this is just symmetry... -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 08:14 +0100, Ingo Molnar wrote: * Rusty Russell [EMAIL PROTECTED] wrote: +EXPORT_SYMBOL(clts); +EXPORT_SYMBOL(read_cr0); +EXPORT_SYMBOL(write_cr0); mark these a _GPL export. Perhaps even mark the symbol deprecated, to be unexported once we fix raid6. OK... +EXPORT_SYMBOL(wbinvd); +EXPORT_SYMBOL(raw_safe_halt); +EXPORT_SYMBOL(halt); +EXPORT_SYMBOL_GPL(apic_write); +EXPORT_SYMBOL_GPL(apic_read); these should be _GPL too. If any module uses it and breaks a user's box we need that big licensing hint to be able to debug them ... OK. I GPLed the ones which I thought were really obscure, but I was trying to follow existing policy of not _GPL'ing existing symbols. Cheers, Rusty. PS. drm_memory.h has a drm_follow_page: this forces us to uninline various page tables ops. Can this use follow_page() somehow, or do we need an __follow_page export for this case? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 08:08 +0100, Ingo Molnar wrote: btw., your patch does not apply to current -git - could you please rebase this patch to the head of your queue so that upstream can pick it up? OK, here it is against rc3-git4. Name: don't export paravirt_ops structure, do individual functions Wrap the paravirt_ops members we want to export in wrapper functions. Since we binary-patch the critical ones, this doesn't make a speed impact. I moved drm_follow_page into the core, to avoid having to wrap the various pte ops. Unlining kernel_fpu_end and using that in the RAID6 code would remove the need to export clts/read_cr0/write_cr0 too. Signed-off-by: Rusty Russell [EMAIL PROTECTED] diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/dontdiff --minimal linux-2.6.20-rc3-git4/arch/i386/kernel/paravirt.c working-2.6.20-rc3-git4/arch/i386/kernel/paravirt.c --- linux-2.6.20-rc3-git4/arch/i386/kernel/paravirt.c 2007-01-07 03:41:32.0 +1100 +++ working-2.6.20-rc3-git4/arch/i386/kernel/paravirt.c 2007-01-07 04:21:59.0 +1100 @@ -482,6 +482,123 @@ static int __init print_banner(void) } core_initcall(print_banner); +unsigned long paravirt_save_flags(void) +{ + return paravirt_ops.save_fl(); +} +EXPORT_SYMBOL(paravirt_save_flags); + +void paravirt_restore_flags(unsigned long flags) +{ + paravirt_ops.restore_fl(flags); +} +EXPORT_SYMBOL(paravirt_restore_flags); + +void paravirt_irq_disable(void) +{ + paravirt_ops.irq_disable(); +} +EXPORT_SYMBOL(paravirt_irq_disable); + +void paravirt_irq_enable(void) +{ + paravirt_ops.irq_enable(); +} +EXPORT_SYMBOL(paravirt_irq_enable); + +void paravirt_io_delay(void) +{ + paravirt_ops.io_delay(); +} +EXPORT_SYMBOL(paravirt_io_delay); + +void paravirt_const_udelay(unsigned long loops) +{ + paravirt_ops.const_udelay(loops); +} +EXPORT_SYMBOL(paravirt_const_udelay); + +u64 paravirt_read_msr(unsigned int msr, int *err) +{ + return paravirt_ops.read_msr(msr, err); +} +EXPORT_SYMBOL(paravirt_read_msr); + +int paravirt_write_msr(unsigned int msr, u64 val) +{ + return paravirt_ops.write_msr(msr, val); +} +EXPORT_SYMBOL(paravirt_write_msr); + +u64 paravirt_read_tsc(void) +{ + return paravirt_ops.read_tsc(); +} +EXPORT_SYMBOL(paravirt_read_tsc); + +int paravirt_enabled(void) +{ + return paravirt_ops.paravirt_enabled; +} +EXPORT_SYMBOL(paravirt_enabled); + +void clts(void) +{ + paravirt_ops.clts(); +} +EXPORT_SYMBOL(clts); + +unsigned long read_cr0(void) +{ + return paravirt_ops.read_cr0(); +} +EXPORT_SYMBOL_GPL(read_cr0); + +void write_cr0(unsigned long cr0) +{ + paravirt_ops.write_cr0(cr0); +} +EXPORT_SYMBOL_GPL(write_cr0); + +void wbinvd(void) +{ + paravirt_ops.wbinvd(); +} +EXPORT_SYMBOL(wbinvd); + +void raw_safe_halt(void) +{ + paravirt_ops.safe_halt(); +} +EXPORT_SYMBOL_GPL(raw_safe_halt); + +void halt(void) +{ + paravirt_ops.safe_halt(); +} +EXPORT_SYMBOL_GPL(halt); + +#ifdef CONFIG_X86_LOCAL_APIC +void apic_write(unsigned long reg, unsigned long v) +{ + paravirt_ops.apic_write(reg,v); +} +EXPORT_SYMBOL_GPL(apic_write); + +unsigned long apic_read(unsigned long reg) +{ + return paravirt_ops.apic_read(reg); +} +EXPORT_SYMBOL_GPL(apic_read); +#endif + +void __cpuid(unsigned int *eax, unsigned int *ebx, +unsigned int *ecx, unsigned int *edx) +{ + paravirt_ops.cpuid(eax, ebx, ecx, edx); +} +EXPORT_SYMBOL(__cpuid); + /* We simply declare start_kernel to be the paravirt probe of last resort. */ paravirt_probe(start_kernel); @@ -566,4 +683,3 @@ struct paravirt_ops paravirt_ops = { .irq_enable_sysexit = native_irq_enable_sysexit, .iret = native_iret, }; -EXPORT_SYMBOL(paravirt_ops); diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/dontdiff --minimal linux-2.6.20-rc3-git4/drivers/char/drm/drm_memory.h working-2.6.20-rc3-git4/drivers/char/drm/drm_memory.h --- linux-2.6.20-rc3-git4/drivers/char/drm/drm_memory.h 2006-09-22 15:36:13.0 +1000 +++ working-2.6.20-rc3-git4/drivers/char/drm/drm_memory.h 2007-01-07 04:19:07.0 +1100 @@ -58,11 +58,7 @@ static inline unsigned long drm_follow_page(void *vaddr) { - pgd_t *pgd = pgd_offset_k((unsigned long)vaddr); - pud_t *pud = pud_offset(pgd, (unsigned long)vaddr); - pmd_t *pmd = pmd_offset(pud, (unsigned long)vaddr); - pte_t *ptep = pte_offset_kernel(pmd, (unsigned long)vaddr); - return pte_pfn(*ptep) PAGE_SHIFT; + return __follow_page(vaddr); } #else /* __OS_HAS_AGP */ diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/dontdiff --minimal linux-2.6.20-rc3-git4/include/asm-i386/delay.h working-2.6.20-rc3-git4/include/asm-i386/delay.h --- linux-2.6.20-rc3-git4/include/asm-i386/delay.h 2007-01-07 03:42:32.0 +1100 +++ working-2.6.20-rc3-git4/include/asm-i386/delay.h2007-01-07 04:08:46.0 +1100 @@ -17,9
Re: [patch] paravirt: isolate module ops
* Rusty Russell [EMAIL PROTECTED] wrote: I moved drm_follow_page into the core, to avoid having to wrap the various pte ops. Unlining kernel_fpu_end and using that in the RAID6 code would remove the need to export clts/read_cr0/write_cr0 too. yeah, let's do that too. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sun, Jan 07, 2007 at 03:18:45AM +1100, Rusty Russell wrote: PS. drm_memory.h has a drm_follow_page: this forces us to uninline various page tables ops. Can this use follow_page() somehow, or do we need an __follow_page export for this case? Not if avoidable. And it seems avoidable as drm really should be using vmalloc_to_page. Untested patch below: Index: linux-2.6/drivers/char/drm/drm_memory.c === --- linux-2.6.orig/drivers/char/drm/drm_memory.c2007-01-06 20:21:07.0 +0100 +++ linux-2.6/drivers/char/drm/drm_memory.c 2007-01-06 20:29:03.0 +0100 @@ -211,6 +211,23 @@ } #endif /* 0 */ +static int is_agp_mapping(void *pt, unsigned long size, drm_device_t *dev) +{ + unsigned long addr = (unsigned long)pt; + + if (addr = VMALLOC_START addr VMALLOC_END) { + unsigned long phys; + drm_map_t *map; + + phys = page_to_phys(vmalloc_to_page(pt)) + offset_in_page(pt); + map = drm_lookup_map(phys, size, dev); + if (map map-type == _DRM_AGP) + return 1; + } + + return 0; +} + void drm_ioremapfree(void *pt, unsigned long size, drm_device_t * dev) { @@ -219,21 +236,11 @@ * routines for handling mappings in the AGP space. Hopefully this can be done in * a future revision of the interface... */ - if (drm_core_has_AGP(dev) dev-agp dev-agp-cant_use_aperture -((unsigned long)pt = VMALLOC_START -(unsigned long)pt VMALLOC_END)) { - unsigned long offset; - drm_map_t *map; - - offset = drm_follow_page(pt) | ((unsigned long)pt ~PAGE_MASK); - map = drm_lookup_map(offset, size, dev); - if (map map-type == _DRM_AGP) { - vunmap(pt); - return; - } - } - - iounmap(pt); + if (drm_core_has_AGP(dev) dev-agp dev-agp-cant_use_aperture + is_agp_mapping(pt, size, dev)) + vunmap(pt); + else + iounmap(pt); } EXPORT_SYMBOL(drm_ioremapfree); Index: linux-2.6/drivers/char/drm/drm_memory.h === --- linux-2.6.orig/drivers/char/drm/drm_memory.h2007-01-06 20:21:07.0 +0100 +++ linux-2.6/drivers/char/drm/drm_memory.h 2007-01-06 20:26:50.0 +0100 @@ -55,23 +55,6 @@ # define PAGE_AGP PAGE_KERNEL # endif #endif - -static inline unsigned long drm_follow_page(void *vaddr) -{ - pgd_t *pgd = pgd_offset_k((unsigned long)vaddr); - pud_t *pud = pud_offset(pgd, (unsigned long)vaddr); - pmd_t *pmd = pmd_offset(pud, (unsigned long)vaddr); - pte_t *ptep = pte_offset_kernel(pmd, (unsigned long)vaddr); - return pte_pfn(*ptep) PAGE_SHIFT; -} - -#else /* __OS_HAS_AGP */ - -static inline unsigned long drm_follow_page(void *vaddr) -{ - return 0; -} - #endif void *drm_ioremap(unsigned long offset, unsigned long size, - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Rusty Russell wrote: +int paravirt_write_msr(unsigned int msr, u64 val); If binary modules using debug registers makes us nervous, the reprogramming MSRs is also similarly bad. +void raw_safe_halt(void); +void halt(void); These shouldn't be done by modules, ever. Only the scheduler should decide to halt. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Sat, 2007-01-06 at 12:55 -0800, Zachary Amsden wrote: Rusty Russell wrote: +int paravirt_write_msr(unsigned int msr, u64 val); If binary modules using debug registers makes us nervous, the reprogramming MSRs is also similarly bad. Yes, but this is simply from experience. Several modules wrote msrs (you can take out the EXPORT_SYMBOL and find them quite quickly). +void raw_safe_halt(void); +void halt(void); These shouldn't be done by modules, ever. Only the scheduler should decide to halt. Several modules implement alternate halt loops. I guess being in a module for specific CPUs makes sense... Cheers! Rusty. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
* Rusty Russell <[EMAIL PROTECTED]> wrote: > +EXPORT_SYMBOL(clts); > +EXPORT_SYMBOL(read_cr0); > +EXPORT_SYMBOL(write_cr0); mark these a _GPL export. Perhaps even mark the symbol deprecated, to be unexported once we fix raid6. > +EXPORT_SYMBOL(wbinvd); > +EXPORT_SYMBOL(raw_safe_halt); > +EXPORT_SYMBOL(halt); > +EXPORT_SYMBOL_GPL(apic_write); > +EXPORT_SYMBOL_GPL(apic_read); these should be _GPL too. If any module uses it and breaks a user's box we need that big licensing hint to be able to debug them ... Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > this doesnt do the most crutial step: the removal of the paravirt_ops > export. [...] ah, you removed it already ... it hid at the very last line of the patch chunk. Good :) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
* Rusty Russell <[EMAIL PROTECTED]> wrote: > diff -r 48f31ae5d7b5 arch/i386/kernel/paravirt.c > --- a/arch/i386/kernel/paravirt.c Sat Jan 06 10:32:24 2007 +1100 > +++ b/arch/i386/kernel/paravirt.c Sat Jan 06 17:23:12 2007 +1100 > @@ -596,6 +596,154 @@ static int __init print_banner(void) > return 0; > } > core_initcall(print_banner); > + > +unsigned long paravirt_save_flags(void) > +{ > + return paravirt_ops.save_fl(); > +} > +EXPORT_SYMBOL(paravirt_save_flags); ok, i like this one too - i agree that it's better than mine because it isolates on a per-API level not on a per-lowlevel-paravirt-op level. But this doesnt do the most crutial step: the removal of the paravirt_ops export. Without that the module build test is pointless. btw., your patch does not apply to current -git - could you please rebase this patch to the head of your queue so that upstream can pick it up? Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Fri, 2007-01-05 at 16:31 -0800, Zachary Amsden wrote: > Ingo Molnar wrote: > > Subject: [patch] paravirt: isolate module ops > > From: Ingo Molnar <[EMAIL PROTECTED]> > > > > only export those operations to modules that have been available to them > > historically: irq disable/enable, io-delay, udelay, etc. > > > > this isolates that functionality from other paravirtualization > > functionality that modules have no business messing with. > > > > boot and build tested with CONFIG_PARAVIRT=y. > > > > Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> > > --- > > arch/i386/kernel/paravirt.c | 41 +++ > > include/asm-i386/delay.h|4 +- > > include/asm-i386/paravirt.h | 65 > > ++-- > > 3 files changed, 75 insertions(+), 35 deletions(-) > > > > Index: linux/arch/i386/kernel/paravirt.c > > === > > --- linux.orig/arch/i386/kernel/paravirt.c > > +++ linux/arch/i386/kernel/paravirt.c > > @@ -492,6 +492,7 @@ struct paravirt_ops paravirt_ops = { > > > > .patch = native_patch, > > .banner = default_banner, > > + > > .arch_setup = native_nop, > > .memory_setup = machine_specific_memory_setup, > > .get_wallclock = native_get_wallclock, > > @@ -566,4 +567,42 @@ struct paravirt_ops paravirt_ops = { > > .irq_enable_sysexit = native_irq_enable_sysexit, > > .iret = native_iret, > > }; > > -EXPORT_SYMBOL(paravirt_ops); > > + > > +/* > > + * These are exported to modules: > > + */ > > +struct paravirt_ops paravirt_mod_ops = { > > + .name = "bare hardware", > > + .paravirt_enabled = 0, > > + .kernel_rpl = 0, > > + > > + .patch = native_patch, > > > > I don't think you want to leave that one... patching the kernel isn't > something modules should be doing. Yeah, this patch is terrible, but I know why Ingo did it; it's my fault for not finishing my version yesterday (but allmodconfig takes a while to build and every change to paravirt.h rebuilds the universe...). So here's my variant, compile-tested with "make allmodconfig". Exports individual functions, some of which can be reduced over time as the modules involved are whacked into shape. Note: it reduces the patching space by 1 byte (direct jumps vs indirect jumps). If this a problem for any paravirt_ops backend in practice, we can add noops... Cheers, Rusty. PS. May break with other configurations... Name: don't export paravirt_ops structure, do individual functions Some of the more obscure ones are only used by one or two modules. We can almost certainly reduce them with more work. Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> diff -r 48f31ae5d7b5 arch/i386/kernel/paravirt.c --- a/arch/i386/kernel/paravirt.c Sat Jan 06 10:32:24 2007 +1100 +++ b/arch/i386/kernel/paravirt.c Sat Jan 06 17:23:12 2007 +1100 @@ -596,6 +596,154 @@ static int __init print_banner(void) return 0; } core_initcall(print_banner); + +unsigned long paravirt_save_flags(void) +{ + return paravirt_ops.save_fl(); +} +EXPORT_SYMBOL(paravirt_save_flags); + +void paravirt_restore_flags(unsigned long flags) +{ + paravirt_ops.restore_fl(flags); +} +EXPORT_SYMBOL(paravirt_restore_flags); + +void paravirt_irq_disable(void) +{ + paravirt_ops.irq_disable(); +} +EXPORT_SYMBOL(paravirt_irq_disable); + +void paravirt_irq_enable(void) +{ + paravirt_ops.irq_enable(); +} +EXPORT_SYMBOL(paravirt_irq_enable); + +void paravirt_io_delay(void) +{ + paravirt_ops.io_delay(); +} +EXPORT_SYMBOL(paravirt_io_delay); + +void paravirt_const_udelay(unsigned long loops) +{ + paravirt_ops.const_udelay(loops); +} +EXPORT_SYMBOL(paravirt_const_udelay); + +u64 paravirt_read_msr(unsigned int msr, int *err) +{ + return paravirt_ops.read_msr(msr, err); +} +EXPORT_SYMBOL(paravirt_read_msr); + +int paravirt_write_msr(unsigned int msr, u64 val) +{ + return paravirt_ops.write_msr(msr, val); +} +EXPORT_SYMBOL(paravirt_write_msr); + +u64 paravirt_read_tsc(void) +{ + return paravirt_ops.read_tsc(); +} +EXPORT_SYMBOL(paravirt_read_tsc); + +int paravirt_enabled(void) +{ + return paravirt_ops.paravirt_enabled; +} +EXPORT_SYMBOL(paravirt_enabled); + +void clts(void) +{ + paravirt_ops.clts(); +} +EXPORT_SYMBOL(clts); + +unsigned long read_cr0(void) +{ + return paravirt_ops.read_cr0(); +} +EXPORT_SYMBOL(read_cr0); + +void write_cr0(unsigned long cr0) +{ + paravirt_ops.write_cr0(cr0); +} +EXPORT_SYMBOL(write_cr0); + +void wbinvd(void) +{ + paravirt_ops.wbinvd(); +} +EXPORT_SYMBOL(wbinvd); + +void raw_safe_halt(void) +{ +
Re: [patch] paravirt: isolate module ops
On Fri, 2007-01-05 at 18:07 -0800, Arjan van de Ven wrote: > > > > I would suggest a slightly different carving. For one, no TLB flushes. > > If you can't modify PTEs, why do you need to have TLB flushes? And I > > would allow CR0 read / write for code which saves and restores FPU state > > no that is abstracted away by kernel_fpu_begin/end. Modules have no > business doing that themselves Sure, but it'll take some time to fix the raid modules (which are the ones which abuse this). I'm testing a patch now, I'll send the clts removal patch on top of that once it's done. Rusty. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Arjan van de Ven wrote: I would suggest a slightly different carving. For one, no TLB flushes. If you can't modify PTEs, why do you need to have TLB flushes? And I would allow CR0 read / write for code which saves and restores FPU state no that is abstracted away by kernel_fpu_begin/end. Modules have no business doing that themselves As long as they don't rely on inlines for that... checking and kernel_fpu_end is inline and uses stts(), which requires CR0 read / write. One can easily imagine binary modules which do use the fpu, and these were not broken before, so breaking them now seems the wrong thing to do. I agree on debug registers - anything touching them is way too shady. And there is no reason modules should be doing raw page table operations, they should use proper mm functions and leave the page details to the mm layer, which doesn't do these things inline. Basically, it is just the things that do get inlined that I think we should worry about. If you all feel strongly that this should be fixed in 2.6.20, perhaps the best thing to do is in fact EXPORT_SYMBOL_GPL(paravirt_ops), and we can queue up a patch in -mm which will export those paravirt_ops required inline by modules for 2.6.21. Otherwise, I think there will be too many rejects against the paravirt code in Andrew's tree. Zach - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
> > I would suggest a slightly different carving. For one, no TLB flushes. > If you can't modify PTEs, why do you need to have TLB flushes? And I > would allow CR0 read / write for code which saves and restores FPU state no that is abstracted away by kernel_fpu_begin/end. Modules have no business doing that themselves > - possibly even debug register access, although any code which touches > DRs could be doing something sneaky. I'm on the fence about that one. lets not allow it at all -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Ingo Molnar wrote: Subject: [patch] paravirt: isolate module ops From: Ingo Molnar <[EMAIL PROTECTED]> only export those operations to modules that have been available to them historically: irq disable/enable, io-delay, udelay, etc. this isolates that functionality from other paravirtualization functionality that modules have no business messing with. boot and build tested with CONFIG_PARAVIRT=y. I would suggest a slightly different carving. For one, no TLB flushes. If you can't modify PTEs, why do you need to have TLB flushes? And I would allow CR0 read / write for code which saves and restores FPU state - possibly even debug register access, although any code which touches DRs could be doing something sneaky. I'm on the fence about that one. Here is a partially tested patch against the -mm tree. Let me know what you think of this slightly different approach. diff -r 3242f865ce8e arch/i386/kernel/paravirt.c --- a/arch/i386/kernel/paravirt.c Thu Jan 04 20:17:02 2007 -0800 +++ b/arch/i386/kernel/paravirt.c Fri Jan 05 16:48:25 2007 -0800 @@ -492,6 +492,7 @@ struct paravirt_ops paravirt_ops = { .patch = native_patch, .banner = default_banner, + .arch_setup = native_nop, .memory_setup = machine_specific_memory_setup, .get_wallclock = native_get_wallclock, @@ -577,4 +578,42 @@ struct paravirt_ops paravirt_ops = { .startup_ipi_hook = (void *)native_nop, }; -EXPORT_SYMBOL(paravirt_ops); + +/* + * These are exported to modules: + */ +struct paravirt_mod_ops paravirt_mod_ops = { + .name = "bare hardware", + .paravirt_enabled = 0, + .kernel_rpl = 0, + + .patch = native_patch, + .banner = default_banner, + + .save_fl = native_save_fl, + .restore_fl = native_restore_fl, + .irq_disable = native_irq_disable, + .irq_enable = native_irq_enable, + + .cpuid = native_cpuid, + + .read_msr = native_read_msr, + .write_msr = native_write_msr, + + .read_tsc = native_read_tsc, + .read_pmc = native_read_pmc, + + .io_delay = native_io_delay, + .const_udelay = __const_udelay, + +#ifdef CONFIG_X86_LOCAL_APIC + .apic_write = native_apic_write, + .apic_write_atomic = native_apic_write_atomic, + .apic_read = native_apic_read, +#endif + + .flush_tlb_user = native_flush_tlb, + .flush_tlb_kernel = native_flush_tlb_global, + .flush_tlb_single = native_flush_tlb_single, +}; +EXPORT_SYMBOL(paravirt_mod_ops); diff -r 3242f865ce8e include/asm-i386/delay.h --- a/include/asm-i386/delay.h Thu Jan 04 20:17:02 2007 -0800 +++ b/include/asm-i386/delay.h Fri Jan 05 16:11:43 2007 -0800 @@ -17,9 +17,9 @@ extern void __delay(unsigned long loops) extern void __delay(unsigned long loops); #if defined(CONFIG_PARAVIRT) && !defined(USE_REAL_TIME_DELAY) -#define udelay(n) paravirt_ops.const_udelay((n) * 0x10c7ul) +#define udelay(n) paravirt_mod_ops.const_udelay((n) * 0x10c7ul) -#define ndelay(n) paravirt_ops.const_udelay((n) * 5ul) +#define ndelay(n) paravirt_mod_ops.const_udelay((n) * 5ul) #else /* !PARAVIRT || USE_REAL_TIME_DELAY */ diff -r 3242f865ce8e include/asm-i386/paravirt.h --- a/include/asm-i386/paravirt.h Thu Jan 04 20:17:02 2007 -0800 +++ b/include/asm-i386/paravirt.h Fri Jan 05 16:51:10 2007 -0800 @@ -29,12 +29,52 @@ struct Xgt_desc_struct; struct Xgt_desc_struct; struct tss_struct; struct mm_struct; -struct paravirt_ops +struct paravirt_mod_ops { unsigned int kernel_rpl; int paravirt_enabled; const char *name; + /* All the function pointers here are declared as "fastcall" + so that we get a specific register-based calling + convention. This makes it easier to implement inline + assembler replacements. */ + + void (fastcall *cpuid)(unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx); + + /* CLTS / cr0 may be used for math state save / restore */ + void (fastcall *clts)(void); + unsigned long (fastcall *read_cr0)(void); + void (fastcall *write_cr0)(unsigned long); + + unsigned long (fastcall *save_fl)(void); + void (fastcall *restore_fl)(unsigned long); + void (fastcall *irq_disable)(void); + void (fastcall *irq_enable)(void); + + /* err = 0/-EFAULT. wrmsr returns 0/-EFAULT. */ + u64 (fastcall *read_msr)(unsigned int msr, int *err); + int (fastcall *write_msr)(unsigned int msr, u64 val); + + u64 (fastcall *read_tsc)(void); + u64 (fastcall *read_pmc)(void); + + void (fastcall *io_delay)(void); + void (*const_udelay)(unsigned long loops); + + /* May be needed by device drivers to flush cache */ + void (fastcall *wbinvd)(void); + +#ifdef CONFIG_X86_LOCAL_APIC + void (fastcall *apic_write)(unsigned long reg, unsigned long v); + void (fastcall *ap
Re: [patch] paravirt: isolate module ops
Ingo Molnar wrote: Subject: [patch] paravirt: isolate module ops From: Ingo Molnar <[EMAIL PROTECTED]> only export those operations to modules that have been available to them historically: irq disable/enable, io-delay, udelay, etc. this isolates that functionality from other paravirtualization functionality that modules have no business messing with. boot and build tested with CONFIG_PARAVIRT=y. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/i386/kernel/paravirt.c | 41 +++ include/asm-i386/delay.h|4 +- include/asm-i386/paravirt.h | 65 ++-- 3 files changed, 75 insertions(+), 35 deletions(-) Index: linux/arch/i386/kernel/paravirt.c === --- linux.orig/arch/i386/kernel/paravirt.c +++ linux/arch/i386/kernel/paravirt.c @@ -492,6 +492,7 @@ struct paravirt_ops paravirt_ops = { .patch = native_patch, .banner = default_banner, + .arch_setup = native_nop, .memory_setup = machine_specific_memory_setup, .get_wallclock = native_get_wallclock, @@ -566,4 +567,42 @@ struct paravirt_ops paravirt_ops = { .irq_enable_sysexit = native_irq_enable_sysexit, .iret = native_iret, }; -EXPORT_SYMBOL(paravirt_ops); + +/* + * These are exported to modules: + */ +struct paravirt_ops paravirt_mod_ops = { + .name = "bare hardware", + .paravirt_enabled = 0, + .kernel_rpl = 0, + + .patch = native_patch, I don't think you want to leave that one... patching the kernel isn't something modules should be doing. Perhaps a separate structure is better for module ops - this seems error prone to calling the wrong one of paravirt_ops vs. paravirt_mod_ops, also error prone to set them up in the code which sets paravirt_ops for each hypervisor. Although that does require more dissection, it does make sure everyone gets it right, and makes the interface very explicitly clear in the header file, which is nice. If you feel strongly about this, I suggest you push it upstream now, not into Andrew's tree (it doesn't apply cleanly there, and breaks the VMI patches which are in there). But this is no problem, I can fix that up easily once you get it in. Zach - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] paravirt: isolate module ops
Subject: [patch] paravirt: isolate module ops From: Ingo Molnar <[EMAIL PROTECTED]> only export those operations to modules that have been available to them historically: irq disable/enable, io-delay, udelay, etc. this isolates that functionality from other paravirtualization functionality that modules have no business messing with. boot and build tested with CONFIG_PARAVIRT=y. Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- arch/i386/kernel/paravirt.c | 41 +++ include/asm-i386/delay.h|4 +- include/asm-i386/paravirt.h | 65 ++-- 3 files changed, 75 insertions(+), 35 deletions(-) Index: linux/arch/i386/kernel/paravirt.c === --- linux.orig/arch/i386/kernel/paravirt.c +++ linux/arch/i386/kernel/paravirt.c @@ -492,6 +492,7 @@ struct paravirt_ops paravirt_ops = { .patch = native_patch, .banner = default_banner, + .arch_setup = native_nop, .memory_setup = machine_specific_memory_setup, .get_wallclock = native_get_wallclock, @@ -566,4 +567,42 @@ struct paravirt_ops paravirt_ops = { .irq_enable_sysexit = native_irq_enable_sysexit, .iret = native_iret, }; -EXPORT_SYMBOL(paravirt_ops); + +/* + * These are exported to modules: + */ +struct paravirt_ops paravirt_mod_ops = { + .name = "bare hardware", + .paravirt_enabled = 0, + .kernel_rpl = 0, + + .patch = native_patch, + .banner = default_banner, + + .save_fl = native_save_fl, + .restore_fl = native_restore_fl, + .irq_disable = native_irq_disable, + .irq_enable = native_irq_enable, + + .cpuid = native_cpuid, + + .read_msr = native_read_msr, + .write_msr = native_write_msr, + + .read_tsc = native_read_tsc, + .read_pmc = native_read_pmc, + + .io_delay = native_io_delay, + .const_udelay = __const_udelay, + +#ifdef CONFIG_X86_LOCAL_APIC + .apic_write = native_apic_write, + .apic_write_atomic = native_apic_write_atomic, + .apic_read = native_apic_read, +#endif + + .flush_tlb_user = native_flush_tlb, + .flush_tlb_kernel = native_flush_tlb_global, + .flush_tlb_single = native_flush_tlb_single, +}; +EXPORT_SYMBOL(paravirt_mod_ops); Index: linux/include/asm-i386/delay.h === --- linux.orig/include/asm-i386/delay.h +++ linux/include/asm-i386/delay.h @@ -17,9 +17,9 @@ extern void __const_udelay(unsigned long extern void __delay(unsigned long loops); #if defined(CONFIG_PARAVIRT) && !defined(USE_REAL_TIME_DELAY) -#define udelay(n) paravirt_ops.const_udelay((n) * 0x10c7ul) +#define udelay(n) paravirt_mod_ops.const_udelay((n) * 0x10c7ul) -#define ndelay(n) paravirt_ops.const_udelay((n) * 5ul) +#define ndelay(n) paravirt_mod_ops.const_udelay((n) * 5ul) #else /* !PARAVIRT || USE_REAL_TIME_DELAY */ Index: linux/include/asm-i386/paravirt.h === --- linux.orig/include/asm-i386/paravirt.h +++ linux/include/asm-i386/paravirt.h @@ -151,8 +151,9 @@ struct paravirt_ops __attribute__((__section__(".paravirtprobe"))) = fn extern struct paravirt_ops paravirt_ops; +extern struct paravirt_ops paravirt_mod_ops; -#define paravirt_enabled() (paravirt_ops.paravirt_enabled) +#define paravirt_enabled() (paravirt_mod_ops.paravirt_enabled) static inline void load_esp0(struct tss_struct *tss, struct thread_struct *thread) @@ -180,7 +181,7 @@ static inline void do_time_init(void) static inline void __cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { - paravirt_ops.cpuid(eax, ebx, ecx, edx); + paravirt_mod_ops.cpuid(eax, ebx, ecx, edx); } /* @@ -219,52 +220,52 @@ static inline void halt(void) #define rdmsr(msr,val1,val2) do { \ int _err; \ - u64 _l = paravirt_ops.read_msr(msr,&_err); \ + u64 _l = paravirt_mod_ops.read_msr(msr,&_err); \ val1 = (u32)_l; \ val2 = _l >> 32;\ } while(0) #define wrmsr(msr,val1,val2) do { \ u64 _l = ((u64)(val2) << 32) | (val1); \ - paravirt_ops.write_msr((msr), _l); \ + paravirt_mod_ops.write_msr((msr), _l); \ } while(0) #define rdmsrl(msr,val) do { \ int _err; \ - val = paravirt_ops.read_msr((msr),&_err); \ + val = paravirt_mod_ops.read_msr((msr),&_err); \ } wh
[patch] paravirt: isolate module ops
Subject: [patch] paravirt: isolate module ops From: Ingo Molnar [EMAIL PROTECTED] only export those operations to modules that have been available to them historically: irq disable/enable, io-delay, udelay, etc. this isolates that functionality from other paravirtualization functionality that modules have no business messing with. boot and build tested with CONFIG_PARAVIRT=y. Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- arch/i386/kernel/paravirt.c | 41 +++ include/asm-i386/delay.h|4 +- include/asm-i386/paravirt.h | 65 ++-- 3 files changed, 75 insertions(+), 35 deletions(-) Index: linux/arch/i386/kernel/paravirt.c === --- linux.orig/arch/i386/kernel/paravirt.c +++ linux/arch/i386/kernel/paravirt.c @@ -492,6 +492,7 @@ struct paravirt_ops paravirt_ops = { .patch = native_patch, .banner = default_banner, + .arch_setup = native_nop, .memory_setup = machine_specific_memory_setup, .get_wallclock = native_get_wallclock, @@ -566,4 +567,42 @@ struct paravirt_ops paravirt_ops = { .irq_enable_sysexit = native_irq_enable_sysexit, .iret = native_iret, }; -EXPORT_SYMBOL(paravirt_ops); + +/* + * These are exported to modules: + */ +struct paravirt_ops paravirt_mod_ops = { + .name = bare hardware, + .paravirt_enabled = 0, + .kernel_rpl = 0, + + .patch = native_patch, + .banner = default_banner, + + .save_fl = native_save_fl, + .restore_fl = native_restore_fl, + .irq_disable = native_irq_disable, + .irq_enable = native_irq_enable, + + .cpuid = native_cpuid, + + .read_msr = native_read_msr, + .write_msr = native_write_msr, + + .read_tsc = native_read_tsc, + .read_pmc = native_read_pmc, + + .io_delay = native_io_delay, + .const_udelay = __const_udelay, + +#ifdef CONFIG_X86_LOCAL_APIC + .apic_write = native_apic_write, + .apic_write_atomic = native_apic_write_atomic, + .apic_read = native_apic_read, +#endif + + .flush_tlb_user = native_flush_tlb, + .flush_tlb_kernel = native_flush_tlb_global, + .flush_tlb_single = native_flush_tlb_single, +}; +EXPORT_SYMBOL(paravirt_mod_ops); Index: linux/include/asm-i386/delay.h === --- linux.orig/include/asm-i386/delay.h +++ linux/include/asm-i386/delay.h @@ -17,9 +17,9 @@ extern void __const_udelay(unsigned long extern void __delay(unsigned long loops); #if defined(CONFIG_PARAVIRT) !defined(USE_REAL_TIME_DELAY) -#define udelay(n) paravirt_ops.const_udelay((n) * 0x10c7ul) +#define udelay(n) paravirt_mod_ops.const_udelay((n) * 0x10c7ul) -#define ndelay(n) paravirt_ops.const_udelay((n) * 5ul) +#define ndelay(n) paravirt_mod_ops.const_udelay((n) * 5ul) #else /* !PARAVIRT || USE_REAL_TIME_DELAY */ Index: linux/include/asm-i386/paravirt.h === --- linux.orig/include/asm-i386/paravirt.h +++ linux/include/asm-i386/paravirt.h @@ -151,8 +151,9 @@ struct paravirt_ops __attribute__((__section__(.paravirtprobe))) = fn extern struct paravirt_ops paravirt_ops; +extern struct paravirt_ops paravirt_mod_ops; -#define paravirt_enabled() (paravirt_ops.paravirt_enabled) +#define paravirt_enabled() (paravirt_mod_ops.paravirt_enabled) static inline void load_esp0(struct tss_struct *tss, struct thread_struct *thread) @@ -180,7 +181,7 @@ static inline void do_time_init(void) static inline void __cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { - paravirt_ops.cpuid(eax, ebx, ecx, edx); + paravirt_mod_ops.cpuid(eax, ebx, ecx, edx); } /* @@ -219,52 +220,52 @@ static inline void halt(void) #define rdmsr(msr,val1,val2) do { \ int _err; \ - u64 _l = paravirt_ops.read_msr(msr,_err); \ + u64 _l = paravirt_mod_ops.read_msr(msr,_err); \ val1 = (u32)_l; \ val2 = _l 32;\ } while(0) #define wrmsr(msr,val1,val2) do { \ u64 _l = ((u64)(val2) 32) | (val1); \ - paravirt_ops.write_msr((msr), _l); \ + paravirt_mod_ops.write_msr((msr), _l); \ } while(0) #define rdmsrl(msr,val) do { \ int _err; \ - val = paravirt_ops.read_msr((msr),_err); \ + val = paravirt_mod_ops.read_msr((msr),_err); \ } while(0) -#define wrmsrl(msr,val) (paravirt_ops.write_msr((msr),(val))) +#define wrmsrl(msr
Re: [patch] paravirt: isolate module ops
Ingo Molnar wrote: Subject: [patch] paravirt: isolate module ops From: Ingo Molnar [EMAIL PROTECTED] only export those operations to modules that have been available to them historically: irq disable/enable, io-delay, udelay, etc. this isolates that functionality from other paravirtualization functionality that modules have no business messing with. boot and build tested with CONFIG_PARAVIRT=y. Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- arch/i386/kernel/paravirt.c | 41 +++ include/asm-i386/delay.h|4 +- include/asm-i386/paravirt.h | 65 ++-- 3 files changed, 75 insertions(+), 35 deletions(-) Index: linux/arch/i386/kernel/paravirt.c === --- linux.orig/arch/i386/kernel/paravirt.c +++ linux/arch/i386/kernel/paravirt.c @@ -492,6 +492,7 @@ struct paravirt_ops paravirt_ops = { .patch = native_patch, .banner = default_banner, + .arch_setup = native_nop, .memory_setup = machine_specific_memory_setup, .get_wallclock = native_get_wallclock, @@ -566,4 +567,42 @@ struct paravirt_ops paravirt_ops = { .irq_enable_sysexit = native_irq_enable_sysexit, .iret = native_iret, }; -EXPORT_SYMBOL(paravirt_ops); + +/* + * These are exported to modules: + */ +struct paravirt_ops paravirt_mod_ops = { + .name = bare hardware, + .paravirt_enabled = 0, + .kernel_rpl = 0, + + .patch = native_patch, I don't think you want to leave that one... patching the kernel isn't something modules should be doing. Perhaps a separate structure is better for module ops - this seems error prone to calling the wrong one of paravirt_ops vs. paravirt_mod_ops, also error prone to set them up in the code which sets paravirt_ops for each hypervisor. Although that does require more dissection, it does make sure everyone gets it right, and makes the interface very explicitly clear in the header file, which is nice. If you feel strongly about this, I suggest you push it upstream now, not into Andrew's tree (it doesn't apply cleanly there, and breaks the VMI patches which are in there). But this is no problem, I can fix that up easily once you get it in. Zach - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Ingo Molnar wrote: Subject: [patch] paravirt: isolate module ops From: Ingo Molnar [EMAIL PROTECTED] only export those operations to modules that have been available to them historically: irq disable/enable, io-delay, udelay, etc. this isolates that functionality from other paravirtualization functionality that modules have no business messing with. boot and build tested with CONFIG_PARAVIRT=y. I would suggest a slightly different carving. For one, no TLB flushes. If you can't modify PTEs, why do you need to have TLB flushes? And I would allow CR0 read / write for code which saves and restores FPU state - possibly even debug register access, although any code which touches DRs could be doing something sneaky. I'm on the fence about that one. Here is a partially tested patch against the -mm tree. Let me know what you think of this slightly different approach. diff -r 3242f865ce8e arch/i386/kernel/paravirt.c --- a/arch/i386/kernel/paravirt.c Thu Jan 04 20:17:02 2007 -0800 +++ b/arch/i386/kernel/paravirt.c Fri Jan 05 16:48:25 2007 -0800 @@ -492,6 +492,7 @@ struct paravirt_ops paravirt_ops = { .patch = native_patch, .banner = default_banner, + .arch_setup = native_nop, .memory_setup = machine_specific_memory_setup, .get_wallclock = native_get_wallclock, @@ -577,4 +578,42 @@ struct paravirt_ops paravirt_ops = { .startup_ipi_hook = (void *)native_nop, }; -EXPORT_SYMBOL(paravirt_ops); + +/* + * These are exported to modules: + */ +struct paravirt_mod_ops paravirt_mod_ops = { + .name = bare hardware, + .paravirt_enabled = 0, + .kernel_rpl = 0, + + .patch = native_patch, + .banner = default_banner, + + .save_fl = native_save_fl, + .restore_fl = native_restore_fl, + .irq_disable = native_irq_disable, + .irq_enable = native_irq_enable, + + .cpuid = native_cpuid, + + .read_msr = native_read_msr, + .write_msr = native_write_msr, + + .read_tsc = native_read_tsc, + .read_pmc = native_read_pmc, + + .io_delay = native_io_delay, + .const_udelay = __const_udelay, + +#ifdef CONFIG_X86_LOCAL_APIC + .apic_write = native_apic_write, + .apic_write_atomic = native_apic_write_atomic, + .apic_read = native_apic_read, +#endif + + .flush_tlb_user = native_flush_tlb, + .flush_tlb_kernel = native_flush_tlb_global, + .flush_tlb_single = native_flush_tlb_single, +}; +EXPORT_SYMBOL(paravirt_mod_ops); diff -r 3242f865ce8e include/asm-i386/delay.h --- a/include/asm-i386/delay.h Thu Jan 04 20:17:02 2007 -0800 +++ b/include/asm-i386/delay.h Fri Jan 05 16:11:43 2007 -0800 @@ -17,9 +17,9 @@ extern void __delay(unsigned long loops) extern void __delay(unsigned long loops); #if defined(CONFIG_PARAVIRT) !defined(USE_REAL_TIME_DELAY) -#define udelay(n) paravirt_ops.const_udelay((n) * 0x10c7ul) +#define udelay(n) paravirt_mod_ops.const_udelay((n) * 0x10c7ul) -#define ndelay(n) paravirt_ops.const_udelay((n) * 5ul) +#define ndelay(n) paravirt_mod_ops.const_udelay((n) * 5ul) #else /* !PARAVIRT || USE_REAL_TIME_DELAY */ diff -r 3242f865ce8e include/asm-i386/paravirt.h --- a/include/asm-i386/paravirt.h Thu Jan 04 20:17:02 2007 -0800 +++ b/include/asm-i386/paravirt.h Fri Jan 05 16:51:10 2007 -0800 @@ -29,12 +29,52 @@ struct Xgt_desc_struct; struct Xgt_desc_struct; struct tss_struct; struct mm_struct; -struct paravirt_ops +struct paravirt_mod_ops { unsigned int kernel_rpl; int paravirt_enabled; const char *name; + /* All the function pointers here are declared as fastcall + so that we get a specific register-based calling + convention. This makes it easier to implement inline + assembler replacements. */ + + void (fastcall *cpuid)(unsigned int *eax, unsigned int *ebx, + unsigned int *ecx, unsigned int *edx); + + /* CLTS / cr0 may be used for math state save / restore */ + void (fastcall *clts)(void); + unsigned long (fastcall *read_cr0)(void); + void (fastcall *write_cr0)(unsigned long); + + unsigned long (fastcall *save_fl)(void); + void (fastcall *restore_fl)(unsigned long); + void (fastcall *irq_disable)(void); + void (fastcall *irq_enable)(void); + + /* err = 0/-EFAULT. wrmsr returns 0/-EFAULT. */ + u64 (fastcall *read_msr)(unsigned int msr, int *err); + int (fastcall *write_msr)(unsigned int msr, u64 val); + + u64 (fastcall *read_tsc)(void); + u64 (fastcall *read_pmc)(void); + + void (fastcall *io_delay)(void); + void (*const_udelay)(unsigned long loops); + + /* May be needed by device drivers to flush cache */ + void (fastcall *wbinvd)(void); + +#ifdef CONFIG_X86_LOCAL_APIC + void (fastcall *apic_write)(unsigned long reg, unsigned long v); + void (fastcall *apic_write_atomic)(unsigned long reg, unsigned
Re: [patch] paravirt: isolate module ops
I would suggest a slightly different carving. For one, no TLB flushes. If you can't modify PTEs, why do you need to have TLB flushes? And I would allow CR0 read / write for code which saves and restores FPU state no that is abstracted away by kernel_fpu_begin/end. Modules have no business doing that themselves - possibly even debug register access, although any code which touches DRs could be doing something sneaky. I'm on the fence about that one. lets not allow it at all -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
Arjan van de Ven wrote: I would suggest a slightly different carving. For one, no TLB flushes. If you can't modify PTEs, why do you need to have TLB flushes? And I would allow CR0 read / write for code which saves and restores FPU state no that is abstracted away by kernel_fpu_begin/end. Modules have no business doing that themselves As long as they don't rely on inlines for that... checking and kernel_fpu_end is inline and uses stts(), which requires CR0 read / write. One can easily imagine binary modules which do use the fpu, and these were not broken before, so breaking them now seems the wrong thing to do. I agree on debug registers - anything touching them is way too shady. And there is no reason modules should be doing raw page table operations, they should use proper mm functions and leave the page details to the mm layer, which doesn't do these things inline. Basically, it is just the things that do get inlined that I think we should worry about. If you all feel strongly that this should be fixed in 2.6.20, perhaps the best thing to do is in fact EXPORT_SYMBOL_GPL(paravirt_ops), and we can queue up a patch in -mm which will export those paravirt_ops required inline by modules for 2.6.21. Otherwise, I think there will be too many rejects against the paravirt code in Andrew's tree. Zach - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Fri, 2007-01-05 at 18:07 -0800, Arjan van de Ven wrote: I would suggest a slightly different carving. For one, no TLB flushes. If you can't modify PTEs, why do you need to have TLB flushes? And I would allow CR0 read / write for code which saves and restores FPU state no that is abstracted away by kernel_fpu_begin/end. Modules have no business doing that themselves Sure, but it'll take some time to fix the raid modules (which are the ones which abuse this). I'm testing a patch now, I'll send the clts removal patch on top of that once it's done. Rusty. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
On Fri, 2007-01-05 at 16:31 -0800, Zachary Amsden wrote: Ingo Molnar wrote: Subject: [patch] paravirt: isolate module ops From: Ingo Molnar [EMAIL PROTECTED] only export those operations to modules that have been available to them historically: irq disable/enable, io-delay, udelay, etc. this isolates that functionality from other paravirtualization functionality that modules have no business messing with. boot and build tested with CONFIG_PARAVIRT=y. Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- arch/i386/kernel/paravirt.c | 41 +++ include/asm-i386/delay.h|4 +- include/asm-i386/paravirt.h | 65 ++-- 3 files changed, 75 insertions(+), 35 deletions(-) Index: linux/arch/i386/kernel/paravirt.c === --- linux.orig/arch/i386/kernel/paravirt.c +++ linux/arch/i386/kernel/paravirt.c @@ -492,6 +492,7 @@ struct paravirt_ops paravirt_ops = { .patch = native_patch, .banner = default_banner, + .arch_setup = native_nop, .memory_setup = machine_specific_memory_setup, .get_wallclock = native_get_wallclock, @@ -566,4 +567,42 @@ struct paravirt_ops paravirt_ops = { .irq_enable_sysexit = native_irq_enable_sysexit, .iret = native_iret, }; -EXPORT_SYMBOL(paravirt_ops); + +/* + * These are exported to modules: + */ +struct paravirt_ops paravirt_mod_ops = { + .name = bare hardware, + .paravirt_enabled = 0, + .kernel_rpl = 0, + + .patch = native_patch, I don't think you want to leave that one... patching the kernel isn't something modules should be doing. Yeah, this patch is terrible, but I know why Ingo did it; it's my fault for not finishing my version yesterday (but allmodconfig takes a while to build and every change to paravirt.h rebuilds the universe...). So here's my variant, compile-tested with make allmodconfig. Exports individual functions, some of which can be reduced over time as the modules involved are whacked into shape. Note: it reduces the patching space by 1 byte (direct jumps vs indirect jumps). If this a problem for any paravirt_ops backend in practice, we can add noops... Cheers, Rusty. PS. May break with other configurations... Name: don't export paravirt_ops structure, do individual functions Some of the more obscure ones are only used by one or two modules. We can almost certainly reduce them with more work. Signed-off-by: Rusty Russell [EMAIL PROTECTED] diff -r 48f31ae5d7b5 arch/i386/kernel/paravirt.c --- a/arch/i386/kernel/paravirt.c Sat Jan 06 10:32:24 2007 +1100 +++ b/arch/i386/kernel/paravirt.c Sat Jan 06 17:23:12 2007 +1100 @@ -596,6 +596,154 @@ static int __init print_banner(void) return 0; } core_initcall(print_banner); + +unsigned long paravirt_save_flags(void) +{ + return paravirt_ops.save_fl(); +} +EXPORT_SYMBOL(paravirt_save_flags); + +void paravirt_restore_flags(unsigned long flags) +{ + paravirt_ops.restore_fl(flags); +} +EXPORT_SYMBOL(paravirt_restore_flags); + +void paravirt_irq_disable(void) +{ + paravirt_ops.irq_disable(); +} +EXPORT_SYMBOL(paravirt_irq_disable); + +void paravirt_irq_enable(void) +{ + paravirt_ops.irq_enable(); +} +EXPORT_SYMBOL(paravirt_irq_enable); + +void paravirt_io_delay(void) +{ + paravirt_ops.io_delay(); +} +EXPORT_SYMBOL(paravirt_io_delay); + +void paravirt_const_udelay(unsigned long loops) +{ + paravirt_ops.const_udelay(loops); +} +EXPORT_SYMBOL(paravirt_const_udelay); + +u64 paravirt_read_msr(unsigned int msr, int *err) +{ + return paravirt_ops.read_msr(msr, err); +} +EXPORT_SYMBOL(paravirt_read_msr); + +int paravirt_write_msr(unsigned int msr, u64 val) +{ + return paravirt_ops.write_msr(msr, val); +} +EXPORT_SYMBOL(paravirt_write_msr); + +u64 paravirt_read_tsc(void) +{ + return paravirt_ops.read_tsc(); +} +EXPORT_SYMBOL(paravirt_read_tsc); + +int paravirt_enabled(void) +{ + return paravirt_ops.paravirt_enabled; +} +EXPORT_SYMBOL(paravirt_enabled); + +void clts(void) +{ + paravirt_ops.clts(); +} +EXPORT_SYMBOL(clts); + +unsigned long read_cr0(void) +{ + return paravirt_ops.read_cr0(); +} +EXPORT_SYMBOL(read_cr0); + +void write_cr0(unsigned long cr0) +{ + paravirt_ops.write_cr0(cr0); +} +EXPORT_SYMBOL(write_cr0); + +void wbinvd(void) +{ + paravirt_ops.wbinvd(); +} +EXPORT_SYMBOL(wbinvd); + +void raw_safe_halt(void) +{ + paravirt_ops.safe_halt(); +} +EXPORT_SYMBOL(raw_safe_halt); + +void halt(void) +{ + paravirt_ops.safe_halt(); +} +EXPORT_SYMBOL(halt); + +#ifdef CONFIG_X86_LOCAL_APIC +void apic_write(unsigned long reg, unsigned long v) +{ + paravirt_ops.apic_write(reg,v); +} +EXPORT_SYMBOL_GPL(apic_write); + +unsigned long apic_read(unsigned long reg) +{ + return paravirt_ops.apic_read(reg); +} +EXPORT_SYMBOL_GPL(apic_read); +#endif + +void
Re: [patch] paravirt: isolate module ops
* Rusty Russell [EMAIL PROTECTED] wrote: diff -r 48f31ae5d7b5 arch/i386/kernel/paravirt.c --- a/arch/i386/kernel/paravirt.c Sat Jan 06 10:32:24 2007 +1100 +++ b/arch/i386/kernel/paravirt.c Sat Jan 06 17:23:12 2007 +1100 @@ -596,6 +596,154 @@ static int __init print_banner(void) return 0; } core_initcall(print_banner); + +unsigned long paravirt_save_flags(void) +{ + return paravirt_ops.save_fl(); +} +EXPORT_SYMBOL(paravirt_save_flags); ok, i like this one too - i agree that it's better than mine because it isolates on a per-API level not on a per-lowlevel-paravirt-op level. But this doesnt do the most crutial step: the removal of the paravirt_ops export. Without that the module build test is pointless. btw., your patch does not apply to current -git - could you please rebase this patch to the head of your queue so that upstream can pick it up? Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
* Ingo Molnar [EMAIL PROTECTED] wrote: this doesnt do the most crutial step: the removal of the paravirt_ops export. [...] ah, you removed it already ... it hid at the very last line of the patch chunk. Good :) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] paravirt: isolate module ops
* Rusty Russell [EMAIL PROTECTED] wrote: +EXPORT_SYMBOL(clts); +EXPORT_SYMBOL(read_cr0); +EXPORT_SYMBOL(write_cr0); mark these a _GPL export. Perhaps even mark the symbol deprecated, to be unexported once we fix raid6. +EXPORT_SYMBOL(wbinvd); +EXPORT_SYMBOL(raw_safe_halt); +EXPORT_SYMBOL(halt); +EXPORT_SYMBOL_GPL(apic_write); +EXPORT_SYMBOL_GPL(apic_read); these should be _GPL too. If any module uses it and breaks a user's box we need that big licensing hint to be able to debug them ... Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/