Query regarding 2.6.33.5 RT[Ingo's] and Non-RT performance
Hello All,

I created a very simple program that runs a tight loop at a higher priority than normal tasks. Under the same test environment I ran this program on both the non-RT and the RT 2.6.33.5 kernel. To my surprise, the non-RT kernel performs better than the RT kernel: the non-RT kernel took 3 sec and 366156 usec, while the RT kernel took about 3 sec and 418011 usec.

Can someone please explain why the non-RT kernel performs better than the RT kernel? On the face of the test results, I feel RT has more overhead. Is there any configuration that I could change to bring down the overhead?

Processor:

processor : 0
cpu       : 7448
clock     : 996.00MHz
revision  : 2.2 (pvr 8004 0202)
bogomips  : 83.10

processor : 1
cpu       : 7448
clock     : 996.00MHz
revision  : 2.2 (pvr 8004 0202)
bogomips  : 83.10

CFS optimization:
--
# cat /proc/sys/kernel/sched_rt_runtime_us
100
# cat /proc/sys/kernel/sched_rt_period_us
100
# cat /proc/sys/kernel/sched_compat_yield
1

Test Program:
-
main()
{
    int sched_rr_min, sched_rr_max;
    struct sched_param scheduling_parameters;
    struct timeval tv, late_tv;
    suseconds_t usec_diff, avg_usec = 0;
    time_t sec_diff, avg_sec = 0;
    int i;
    long count = 1;

    sched_rr_min = sched_get_priority_min(SCHED_RR);
    sched_rr_max = sched_get_priority_max(SCHED_RR);
    scheduling_parameters.sched_priority = sched_rr_min + 4;
    /* Run the process with the given priority */
    sched_setscheduler(0, SCHED_RR, &scheduling_parameters);

    for (i = 0; i < 150; i++) {
        gettimeofday(&tv, NULL);
        while (count > 0) {
            //printf(".");
            count++;
        }
        gettimeofday(&late_tv, NULL);
        count = 1;
        sec_diff = (late_tv.tv_sec - tv.tv_sec);
        avg_sec += sec_diff;
        usec_diff = ((late_tv.tv_usec > tv.tv_usec) ?
                     (late_tv.tv_usec - tv.tv_usec) :
                     (tv.tv_usec - late_tv.tv_usec));
        avg_usec += usec_diff;
        printf("Iteration #%d sec %x usec %x\n", i, (sec_diff), (usec_diff));
    }
    printf("Average of #%d sec %x usec %x\n", i, (avg_sec / i), (avg_usec) / i);
}

Partial result of non-RT kernel:
---
Iteration #140 sec 3 usec 3aef8
Iteration #141 sec 3 usec 3aefe
Iteration #142 sec 3 usec 3aee4
*Iteration #143 sec 4 usec b935b [Why is there this periodic bump?? Scheduler at work??]*
Iteration #144 sec 3 usec 3aef2
Iteration #145 sec 3 usec 3aef0
Iteration #146 sec 3 usec 3aef4
*Iteration #147 sec 4 usec b934b*
Iteration #148 sec 3 usec 3aeed
Iteration #149 sec 3 usec 3aef9

Partial result of RT kernel:
---
Iteration #135 sec 3 usec 47328
*Iteration #136 sec 4 usec ac4fd*
Iteration #137 sec 3 usec 48b0b
Iteration #138 sec 3 usec 4738c
Iteration #139 sec 4 usec ac4d5
Iteration #140 sec 3 usec 483cb
Iteration #141 sec 3 usec 48500
*Iteration #142 sec 4 usec acc49*
Iteration #143 sec 3 usec 47c1f
Iteration #144 sec 3 usec 478c2
Iteration #145 sec 3 usec 47e48
Iteration #146 sec 4 usec ac9b5
Iteration #147 sec 3 usec 48de4
Iteration #148 sec 3 usec 46fbe
Iteration #149 sec 4 usec ac52e
Average of #150 sec 3 usec 660db

Thanks,
Mani

--
Thanks,
Manik
Think twice about a tree before you take a printout
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
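A note on the "periodic bump" in the listings above: the test program takes the absolute difference of the two tv_usec fields and never borrows from tv_sec. When the end timestamp has the smaller tv_usec, the iteration is printed with one second too many and a mirrored microsecond value. For iteration #143 of the non-RT run (sec 4, usec 0xb935b = 758619), the elapsed time would then be 4 s minus 758619 us, that is 3 s 241381 us, in line with its neighbours (0x3aee4 = 241380 us). So the bump looks like a reporting artifact and not scheduler activity, though that is an inference from the posted numbers. A minimal sketch of an interval calculation that handles the borrow follows; elapsed_usec is a hypothetical helper, not part of the posted program.

```c
#include <stdint.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not in the original post): elapsed time from
 * 'start' to 'end' in microseconds. Seconds and microseconds are
 * combined into one signed value, so a smaller end.tv_usec borrows
 * from the seconds field automatically.
 */
static int64_t elapsed_usec(struct timeval start, struct timeval end)
{
    int64_t sec  = (int64_t)end.tv_sec  - (int64_t)start.tv_sec;
    int64_t usec = (int64_t)end.tv_usec - (int64_t)start.tv_usec;

    return sec * 1000000 + usec;
}
```

Averaging the single microsecond total over all iterations also avoids the skew that the separate sec/usec averages pick up from the same borrow problem.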
Looking for a tutorial on the use of the new of_??? init functions
Hello,

Is there a tutorial or a HOWTO somewhere explaining how to use the new of_platform_xxx() and other of_xxx() functions in driver init code? It looks like a very nice way to write drivers in Linux 2.6, but a little help would be welcome.

Regards,
Christophe
[PATCH] fs-enet/mac-fec: Restore multicast and promiscuous settings during restart
Signed-off-by: Wolfgang Ocker w...@reccoware.de
---
 drivers/net/fs_enet/mac-fec.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fs_enet/mac-fec.c b/drivers/net/fs_enet/mac-fec.c
index 7ca1642..05f4bb1 100644
--- a/drivers/net/fs_enet/mac-fec.c
+++ b/drivers/net/fs_enet/mac-fec.c
@@ -344,6 +344,9 @@ static void restart(struct net_device *dev)
 	FW(fecp, imask, FEC_ENET_TXF | FEC_ENET_TXB |
 	   FEC_ENET_RXF | FEC_ENET_RXB);
 
+	/* Restore multicast and promiscuous settings */
+	set_multicast_list(dev);
+
 	/*
 	 * And last, enable the transmit and receive processing.
 	 */
-- 
1.7.2.1
Re: How to use mpc8xxx_gpio.c device driver
On Wed, Aug 11, 2010 at 9:45 PM, MJ embd mj.e...@gmail.com wrote:
> u can directly access GPIO registers in kernel, by ioremap of GPIO
> memory mapped registers.
> you might need to check - muxing of gpio
>
> -mj

Hi MJ,

Thanks for the reply. I tried memory mapping but it fails. Here is my code:

#include <linux/module.h>
#include <linux/errno.h>	/* error codes */
#include <linux/mm.h>

void __iomem *ioaddr = NULL;

static __init int sample_module_init(void)
{
	ioaddr = ioremap(0xFF400C00, 0x24);
	if (ioaddr == NULL) {
		printk(KERN_WARNING "ioremap failed\n");
	}
	printk(KERN_WARNING "ioremap successed\n");
	printk(KERN_WARNING "GP1DIR = %u\n", ioread32(ioaddr));
	return 0;
}

static __exit void sample_module_exit(void)
{
	iounmap(ioaddr);
}

MODULE_LICENSE("GPL");
module_init(sample_module_init);
module_exit(sample_module_exit);

As per the MPC8377ERDB data sheet, the default IMMRBAR address is 0xFF40_ and the offset of GPIO1 is 0C00. Each GPIO has programmable registers that occupy 24 bytes of memory-mapped space, so I mapped 24 bytes (0x18) starting from address 0xFF40_0C00. But when I tried to read the values from the mapped memory I got the following errors. Is there something I am missing? Any help with reference to the MPC8377ERDB board will be highly appreciated.

# tftp -l ~/immrbar.ko -r immrbar.ko -g 10.20.50.70
# insmod ./immrbar.ko
[ 717.825241] ioremap successed
[ 717.849215] Machine check in kernel mode.
[ 717.853220] Caused by (from SRR1=41000): Transfer error ack signal
[ 717.859405] Oops: Machine check, sig: 7 [#1]
[ 717.863668] MPC837x RDB
[ 717.866106] Modules linked in: immrbar(+)
[ 717.870119] NIP: 0900 LR: d1034054 CTR: c0014d50
[ 717.875079] REGS: cf895d00 TRAP: 0200 Not tainted (2.6.28.9)
[ 717.880992] MSR: 00041000 ME CR: 2482 XER: 2000
[ 717.886578] TASK = cf8e8640[647] 'insmod' THREAD: cf894000
[ 717.891882] GPR00: d103404c cf895db0 cf8e8640 23d5 c01e 04f4 0002
[ 717.900265] GPR08: 0001 c0383f3c 23d5 c0014d50 4c72ff56 10019100 1007 77e0 1007ea98
[ 717.908650] GPR16: 10077834 100a 100a 100a bfaf4828 1009 f23c 1cfc
[ 717.917034] GPR24: 1d00 1d24 10012008 c03650e8 d1034000 1001 2018 d103
[ 717.925598] NIP [0900] 0x900
[ 717.928828] LR [d1034054] sample_module_init+0x54/0xc0 [immrbar]
[ 717.934828] Call Trace:
[ 717.937273] [cf895db0] [d103404c] sample_module_init+0x4c/0xc0 [immrbar] (unreliable)
[ 717.945115] [cf895dc0] [c00038a0] do_one_initcall+0x64/0x18c
[ 717.950780] [cf895f20] [c004d7b8] sys_init_module+0xac/0x19c
[ 717.956441] [cf895f40] [c00122f0] ret_from_syscall+0x0/0x38
[ 717.962013] --- Exception: c01 at 0x48043f6c
[ 717.962017] LR = 0x19cc
[ 717.969407] Instruction dump:
[ 717.972370] XX XX
[ 717.980140] 7d5043a6 XX XX
[ 717.987919] ---[ end trace a47be794e2873cef ]---

Thanks in advance,
Ravi Gupta
Running out of SDHCI quirk space (Re: [PATCH 1/3 v2] sdhci: Add auto CMD12 support for eSDHC driver)
On Tue, Aug 03, 2010 at 04:43:46PM -0700, Andrew Morton wrote:
> On Tue, 3 Aug 2010 11:11:10 +0800
> Roy Zang tie-fei.z...@freescale.com wrote:
>
> > --- a/drivers/mmc/host/sdhci.h
> > +++ b/drivers/mmc/host/sdhci.h
> > @@ -240,6 +240,8 @@ struct sdhci_host {
> >  #define SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN		(1<<25)
> >  /* Controller cannot support End Attribute in NOP ADMA descriptor */
> >  #define SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC		(1<<26)
> > +/* Controller uses Auto CMD12 command to stop the transfer */
> > +#define SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12		(1<<27)
>
> This becomes 1<<29 in my tree. We're about to run out. What happens then?

I've been wondering for a while now if many of the quirks should be hidden behind function pointers. While we could of course extend the quirk space, I think that's kinda missing the point that quirks are being used too liberally.

Take SDHCI_QUIRK_SINGLE_POWER_WRITE in drivers/mmc/host/sdhci.c:sdhci_set_power(). Really, that quirk should probably be hidden inside a set_power() function in the sdhci_ops structure.

I'm gonna have a go at trying to remove some of the quirks that don't make sense being quirks. I'll post the series when I'm done.

Does anyone think that this approach is crazy?
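The proposal above can be pictured with a small userspace model. This is only a sketch of the idea, not the real sdhci code: the struct and function names are made up, and the behaviour attributed to the quirk (the generic path clears the power register before writing, the odd controller writes once) is an assumption based on the quirk's name.

```c
#include <stddef.h>

/* Userspace model only; none of these names are real sdhci.h definitions. */
struct host;

struct host_ops {
    /* Optional override; NULL means "use the generic sequence". */
    void (*set_power)(struct host *h, unsigned short power);
};

struct host {
    const struct host_ops *ops;
    int reg_writes;              /* counts simulated register writes */
};

/* Generic path (assumed): clear the power register, then write the value. */
static void generic_set_power(struct host *h, unsigned short power)
{
    (void)power;
    h->reg_writes += 2;
}

/* Controller-specific path (assumed): a single write of the new value. */
static void single_write_set_power(struct host *h, unsigned short power)
{
    (void)power;
    h->reg_writes += 1;
}

/* Core code calls the override when a driver supplies one, so no quirk
 * bit is needed to select the behaviour. */
static void host_set_power(struct host *h, unsigned short power)
{
    if (h->ops && h->ops->set_power)
        h->ops->set_power(h, power);
    else
        generic_set_power(h, power);
}
```

The trade-off is that each odd controller carries a small function instead of consuming one of the remaining bits in the quirk word.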
Flash Programmer Problem in Code Warrior
Hi,

I am trying to program NOR flash (M29DW323DT) on an MPC8321 board. I have imported the details of the flash into FPDeviceConfig.xml. When I run Program/Verify flash, it takes a very long time (hours), and I could not figure out why. Kindly let me know how to troubleshoot this.

--
Thanks and Regards
Naresh Reddy S.
Noida, 9873240342
Re: How to use mpc8xxx_gpio.c device driver
On Thu, Aug 12, 2010 at 03:55:49PM +0530, Ravi Gupta wrote:
> As per the MPC8377ERDB data sheet default IMMRBAR address is 0xFF40_ and
> offset of GPIO1 is 0C00 and each GPIO has programmable registers that
> occupy 24 bytes of memory-mapped space, so I mapped from 24bytes (0x18)
> starting from 0xFF40_0C00 address. But when I tried to read the values
> from the mapped memory I get the following errors. Is there something I am
> missing. Any help with reference to MPC8377ERDB board will be highly
> appreciable.

[snip code and oops log]

Looking at the device tree for this board, it appears U-Boot remaps the IMMR registers to 0xe000. They are no longer accessible at 0xff40.

I would recommend studying arch/powerpc/boot/dts/mpc8377_rdb.dts in the Linux source code. That describes the device layout on your board after U-Boot has run.

A wonderful tool for testing devices from userspace is busybox devmem. It allows you to poke any physical address with any value. The output of busybox devmem --help should get you started. As a quick example, busybox devmem 0xec00 w 0x1 will write the 32-bit value 0x1 to address 0xec00.

I would also recommend using the built-in Linux GPIO API. It works, you just need to figure out how to use it. It will be much easier to get your code upstream if you use the provided APIs.

The Documentation/gpio.txt file should help you in understanding the in-kernel Linux GPIO API. I'm afraid I don't have much experience other than accessing it via sysfs from userspace.

Ira
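For readers who want to do from their own program what devmem does, the sketch below reads one 32-bit register by mapping /dev/mem. It is a rough illustration under several assumptions: the kernel exposes /dev/mem for the range in question, the caller runs as root, and the physical address is taken from the board's device tree (none is hard-coded here). The names split_phys and read_reg32 are made up for this example, and a plain volatile load does not give the ordering guarantees of the kernel's own I/O accessors.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64   /* physical addresses above 2 GB on 32-bit */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Split a physical address into the page-aligned base that mmap() needs
 * and the byte offset of the register inside that page. 'page_size'
 * must be a power of two. */
static void split_phys(uint64_t phys, uint64_t page_size,
                       uint64_t *base, uint64_t *offset)
{
    *base = phys & ~(page_size - 1);
    *offset = phys - *base;
}

/* Read one 32-bit register through /dev/mem. Returns 0 on success and
 * -1 on failure. 'phys' must be 4-byte aligned. */
static int read_reg32(uint64_t phys, uint32_t *value)
{
    long page = sysconf(_SC_PAGESIZE);
    uint64_t base, offset;
    void *map;
    int fd;

    if (page <= 0)
        return -1;
    split_phys(phys, (uint64_t)page, &base, &offset);

    fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0)
        return -1;
    map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, (off_t)base);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    *value = *(volatile uint32_t *)((char *)map + offset);
    munmap(map, (size_t)page);
    return 0;
}
```

This is for poking at hardware during bring-up only; for anything that should go upstream, the in-kernel GPIO API recommended above remains the better route.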
Re: Query regarding 2.6.33.5 RT[Ingo's] and Non-RT performance
On 08/11/2010 06:18 PM, Manikandan Ramachandran wrote:
> Hello All,
>
> I created a very simple program which has higher priority than normal
> tasks and runs a tight loop. Under same test environment I ran this
> program on both non-rt and rt 2.6.33.5 kernel. To my surprise I see that
> performance of non-RT kernel is better than RT. non-RT kernel took 3 sec
> and 366156 usec while RT kernel took about 3 sec and 418011 usec. Can
> someone please explain why the performance of non-rt kernel is better than
> rt kernel? From the face of the test result, I feel RT has more overhead.
> Is there any configuration that I could do to bring down the overhead?

Your surprise is due to your definition of performance.

The purpose of the -rt kernels is to reduce the kernel latency. This is important for servicing hardware. Normal users find -rt useful for audio/video applications. Engineering and scientific users find -rt beneficial for servicing hardware like sensors or control systems.

If you are just trying to run calculations as fast as you can in user space, you'd be better off using the non-rt variants.

--
Jeff Angielski
The PTR Group
www.theptrgroup.com
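To make the distinction concrete: a throughput test like the tight loop measures how much work completes per unit of time, while a latency test measures how late a task wakes up relative to when it asked to run. The sketch below is a rough illustration of the second kind, assuming POSIX clock_nanosleep() with CLOCK_MONOTONIC is available. The function names are made up for this example, and it is far simpler than a real latency benchmark.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from 'a' to 'b'. */
static int64_t ts_diff_ns(struct timespec a, struct timespec b)
{
    return ((int64_t)b.tv_sec - (int64_t)a.tv_sec) * 1000000000LL
         + ((int64_t)b.tv_nsec - (int64_t)a.tv_nsec);
}

/*
 * Sleep until an absolute deadline 'period_ns' after the previous one,
 * 'loops' times, and return the worst observed wakeup lateness in
 * nanoseconds, or -1 if a clock call fails.
 */
static int64_t worst_wakeup_latency_ns(long period_ns, int loops)
{
    struct timespec next, now;
    int64_t worst = 0;
    int i;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (i = 0; i < loops; i++) {
        int64_t late;

        /* Advance the deadline and normalise the nanosecond field. */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        late = ts_diff_ns(next, now);   /* how long after the deadline we ran */
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Run under SCHED_RR or SCHED_FIFO with background load, the worst-case figure is the kind of number an -rt kernel is meant to improve, even if the tight-loop timing gets slightly worse.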
Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
On Mon, 09 Aug 2010 12:53:00 -0500 Nathan Fontenot nf...@austin.ibm.com wrote:

> This set of patches de-couples the idea that there is a single
> directory in sysfs for each memory section. The intent of the
> patches is to reduce the number of sysfs directories created to
> resolve a boot-time performance issue. On very large systems
> boot time are getting very long (as seen on powerpc hardware)
> due to the enormous number of sysfs directories being created.
> On a system with 1 TB of memory we create ~63,000 directories.
> For even larger systems boot times are being measured in hours.

And those hours are mainly due to this problem, I assume.

> This set of patches allows for each directory created in sysfs
> to cover more than one memory section. The default behavior for
> sysfs directory creation is the same, in that each directory
> represents a single memory section. A new file 'end_phys_index'
> in each directory contains the physical_id of the last memory
> section covered by the directory so that users can easily
> determine the memory section range of a directory.

What you're proposing appears to be a non-back-compatible userspace-visible change. This is a big issue!

It's not an unresolvable issue, as this is a must-fix problem. But you should tell us what your proposal is to prevent breakage of existing installations. A Kconfig option would be good, but a boot-time kernel command line option which selects the new format would be much better.

However you didn't mention this issue at all, and it's the most important one.

> Updates for version 5 of the patchset include the following:
>
> Patch 4/8 Add mutex for add/remove of memory blocks
> - Define the mutex using DEFINE_MUTEX macro.
>
> Patch 8/8 Update memory-hotplug documentation
> - Add information concerning memory holes in phys_index..end_phys_index.

And you forgot to tell us how long those machines boot with the patchset applied, which is the entire point of the patchset!
Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections
On Thu, 2010-08-12 at 12:08 -0700, Andrew Morton wrote:
> > This set of patches allows for each directory created in sysfs
> > to cover more than one memory section. The default behavior for
> > sysfs directory creation is the same, in that each directory
> > represents a single memory section. A new file 'end_phys_index'
> > in each directory contains the physical_id of the last memory
> > section covered by the directory so that users can easily
> > determine the memory section range of a directory.
>
> What you're proposing appears to be a non-back-compatible
> userspace-visible change. This is a big issue!

Nathan, one thought to get around this at the moment would be to bump up the size that we export in /sys/devices/system/memory/block_size_bytes. I think you have already done most of the hard work to accomplish this.

You can still add the end_phys_index stuff. But, for now, it would always be equal to start_phys_index.

-- Dave
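As a back-of-the-envelope check of what a larger block_size_bytes would buy, the sketch below counts the directories needed for a given amount of memory. The 16 MiB section size used in the note after the code is an assumption (the thread does not state it), and memory_dirs is a made-up helper name.

```c
#include <stdint.h>

/* Number of sysfs memory directories needed to cover 'total_bytes' when
 * each directory spans 'block_bytes', rounded up. */
static uint64_t memory_dirs(uint64_t total_bytes, uint64_t block_bytes)
{
    return (total_bytes + block_bytes - 1) / block_bytes;
}
```

With 1 TiB of memory and an assumed 16 MiB per directory this gives 65,536 directories, the same order as the ~63,000 quoted earlier in this thread. Exporting 256 MiB blocks instead would cut that to 4,096.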
Question about dma_direct_ops in PowerPC.
We have a board with a PCI device driver that calls pci_dma_sync_single_for_device. This driver used to work on Linux kernel 2.6.25. We ported the driver to Linux kernel 2.6.32, and it doesn't work anymore. The following call trace shows why the PCI driver won't work in kernel 2.6.32.

1. In include/asm-generic/pci-dma-compat.h
   pci_dma_sync_single_for_device calls for dma_sync_single_for_cpu

2. In include/asm-generic/dma-mapping-common.h
   dma_sync_single_for_cpu calls for ops->sync_single_for_cpu

3. In arch/powerpc/kernel/dma.c

struct dma_map_ops dma_direct_ops = {
	.alloc_coherent	= dma_direct_alloc_coherent,
	.free_coherent	= dma_direct_free_coherent,
	.map_sg		= dma_direct_map_sg,
	.unmap_sg	= dma_direct_unmap_sg,
	.dma_supported	= dma_direct_dma_supported,
	.map_page	= dma_direct_map_page,
	.unmap_page	= dma_direct_unmap_page,
#ifdef CONFIG_NOT_COHERENT_CACHE
	.sync_single_range_for_cpu	= dma_direct_sync_single_range,
	.sync_single_range_for_device	= dma_direct_sync_single_range,
	.sync_sg_for_cpu		= dma_direct_sync_sg,
	.sync_sg_for_device		= dma_direct_sync_sg,
#endif
};

There is no op defined for sync_single_for_cpu, so pci_dma_sync_single_for_device is a no-op.

However, Linux kernel 2.6.35.1 from kernel.org has .sync_single_for_cpu in dma_direct_ops, in arch/powerpc/kernel/dma.c:

#ifdef CONFIG_NOT_COHERENT_CACHE
	.sync_single_for_cpu	= dma_direct_sync_single,
	.sync_single_for_device	= dma_direct_sync_single,
	.sync_sg_for_cpu	= dma_direct_sync_sg,
	.sync_sg_for_device	= dma_direct_sync_sg,
#endif

We won't move to Linux kernel 2.6.35 anytime soon.

My questions:
1. Is there any side effect of adding .sync_single_for_cpu to dma_direct_ops in 2.6.32?
2. What will be the future development here?

Best regards,
Thanks,
Fushen
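The failure mode described above can be reproduced in a few lines of userspace C. This is a model, not kernel code: it assumes the generic wrapper only calls a callback when the pointer is set, which is what the "no-op" observation implies, and all names here (ops_model, count_call, model_sync_single_for_cpu) are invented for the illustration.

```c
#include <stddef.h>

/* Userspace model of an ops table with optional callbacks. */
struct ops_model {
    void (*sync_single_for_cpu)(int *calls);
    void (*sync_single_range_for_cpu)(int *calls);
};

/* Stand-in for a real cache-sync routine; it only counts invocations. */
static void count_call(int *calls)
{
    (*calls)++;
}

/* Models a wrapper that silently skips a callback left as NULL. */
static void model_sync_single_for_cpu(const struct ops_model *ops, int *calls)
{
    if (ops->sync_single_for_cpu)
        ops->sync_single_for_cpu(calls);
}
```

A table that fills in only the _range_ member, like the 2.6.32 listing, never reaches the sync routine through this wrapper; a table that sets sync_single_for_cpu, like the 2.6.35.1 listing, does.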
Re: Query regarding 2.6.33.5 RT[Ingo's] and Non-RT performance
On Thu, Aug 12, 2010 at 12:53 PM, Jeff Angielski j...@theptrgroup.com wrote:
> On 08/11/2010 06:18 PM, Manikandan Ramachandran wrote:
> > Hello All,
> >
> > I created a very simple program which has higher priority than normal
> > tasks and runs a tight loop. Under same test environment I ran this
> > program on both non-rt and rt 2.6.33.5 kernel. To my surprise I see that
> > performance of non-RT kernel is better than RT. non-RT kernel took 3 sec
> > and 366156 usec while RT kernel took about 3 sec and 418011 usec. Can
> > someone please explain why the performance of non-rt kernel is better
> > than rt kernel? From the face of the test result, I feel RT has more
> > overhead. Is there any configuration that I could do to bring down the
> > overhead?
>
> Your surprise is due to your definition of performance.
>
> The purpose of the -rt kernels is to reduce the kernel latency. This is
> important for servicing hardware. Normal users find the -rt useful for
> audio/video applications. Engineering and scientific users find the -rt
> beneficial for servicing hardware like sensors or control systems.
>
> If you are just trying to run calculations as fast as you can in user
> space, you'd be better off using the non-rt variants.
>
> --
> Jeff Angielski
> The PTR Group
> www.theptrgroup.com

True, in most cases non-rt will have better performance/throughput, while rt's major goal is to have better latency for high-priority tasks. It is also true that the rt kernel will have more overhead.

xianghua
[PATCH] powerpc: Add support for popcnt instructions
POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step this patch does all the work out of line, but it would be nice to implement them as inlines with an out of line fallback.

The performance issue with hweight was noticed when disabling SMT on a large (192 thread) POWER7 box. The patch improves that testcase by about 8%.

Signed-off-by: Anton Blanchard an...@samba.org
---

Index: powerpc.git/arch/powerpc/include/asm/cputable.h
===
--- powerpc.git.orig/arch/powerpc/include/asm/cputable.h	2010-08-13 11:19:42.691991439 +1000
+++ powerpc.git/arch/powerpc/include/asm/cputable.h	2010-08-13 11:24:55.510741618 +1000
@@ -199,6 +199,8 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_UNALIGNED_LD_STD	LONG_ASM_CONST(0x0080)
 #define CPU_FTR_ASYM_SMT		LONG_ASM_CONST(0x0100)
 #define CPU_FTR_STCX_CHECKS_ADDRESS	LONG_ASM_CONST(0x0200)
+#define CPU_FTR_POPCNTB			LONG_ASM_CONST(0x0400)
+#define CPU_FTR_POPCNTD			LONG_ASM_CONST(0x0800)
 
 #ifndef __ASSEMBLY__
 
@@ -403,21 +405,22 @@ extern const char *powerpc_base_platform
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
-	    CPU_FTR_PURR | CPU_FTR_STCX_CHECKS_ADDRESS)
+	    CPU_FTR_PURR | CPU_FTR_STCX_CHECKS_ADDRESS | \
+	    CPU_FTR_POPCNTB)
 #define CPU_FTRS_POWER6 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
 	    CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
 	    CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \
-	    CPU_FTR_STCX_CHECKS_ADDRESS)
+	    CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB)
 #define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_MMCRA | CPU_FTR_SMT | \
 	    CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
 	    CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
 	    CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYM_SMT | \
-	    CPU_FTR_STCX_CHECKS_ADDRESS)
+	    CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD)
 #define CPU_FTRS_CELL	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
 	    CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 	    CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
Index: powerpc.git/arch/powerpc/lib/Makefile
===
--- powerpc.git.orig/arch/powerpc/lib/Makefile	2010-08-13 11:19:43.653241065 +1000
+++ powerpc.git/arch/powerpc/lib/Makefile	2010-08-13 11:19:45.930743841 +1000
@@ -18,7 +18,7 @@ obj-$(CONFIG_HAS_IOMEM)	+= devres.o
 
 obj-$(CONFIG_PPC64)	+= copypage_64.o copyuser_64.o \
 			   memcpy_64.o usercopy_64.o mem_64.o string.o \
-			   checksum_wrappers_64.o
+			   checksum_wrappers_64.o hweight_64.o
 obj-$(CONFIG_XMON)	+= sstep.o ldstfp.o
 obj-$(CONFIG_KPROBES)	+= sstep.o ldstfp.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= sstep.o ldstfp.o
Index: powerpc.git/arch/powerpc/lib/hweight_64.S
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ powerpc.git/arch/powerpc/lib/hweight_64.S	2010-08-13 11:19:45.940741462 +1000
@@ -0,0 +1,110 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010
+ *
+ * Author: Anton Blanchard an...@au.ibm.com
+ */
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+
+/* Note: This code relies on -mminimal-toc */
+
+_GLOBAL(__arch_hweight8)
+BEGIN_FTR_SECTION
+	b .__sw_hweight8
+	nop
+	nop
+FTR_SECTION_ELSE
+	popcntb	r3,r3
+	clrldi	r3,r3,64-8
+	blr
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
+
+_GLOBAL(__arch_hweight16)
+BEGIN_FTR_SECTION
+	b .__sw_hweight16
+	nop
+	nop
+	nop
+	nop
+FTR_SECTION_ELSE
+  BEGIN_FTR_SECTION_NESTED(50)
+
Re: [PATCH] powerpc: Add support for popcnt instructions
On Fri, 2010-08-13 at 12:28 +1000, Anton Blanchard wrote:
> POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step
> this patch does all the work out of line, but it would be nice to implement
> them as inlines with an out of line fallback.
>
> The performance issue with hweight was noticed when disabling SMT on a large
> (192 thread) POWER7 box. The patch improves that testcase by about 8%.

Especially from modules it will suck big time. If kept out of line they should probably be linked-in with each module, but I'd rather have them inlined.

Cheers,
Ben.

[snip patch]
Re: [PATCH] powerpc: Add support for popcnt instructions
Hi,

> Especially from modules it will suck big time. If kept out of line they
> should probably be linked-in with each module, but I'd rather have them
> inlined.

Inlining would be good, but this is as far as I can take this for now. If someone else is interested go for it :)

Anton
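For anyone picking this up, the general shape of "inline with a software fallback" can be sketched in plain C. This is not the kernel implementation and not what the patch does: the patch selects between popcnt instructions and the software routine at runtime through CPU feature sections, whereas the sketch below chooses at compile time and relies on the GCC/Clang builtin __builtin_popcountll. The names sw_hweight64 and hweight64 are used here only for illustration.

```c
#include <stdint.h>

/* Portable bit-parallel population count, usable as a software fallback. */
static inline unsigned int sw_hweight64(uint64_t w)
{
    w -= (w >> 1) & 0x5555555555555555ULL;            /* 2-bit sums  */
    w = (w & 0x3333333333333333ULL)
      + ((w >> 2) & 0x3333333333333333ULL);           /* 4-bit sums  */
    w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;       /* 8-bit sums  */
    return (unsigned int)((w * 0x0101010101010101ULL) >> 56);
}

/* Inline wrapper: use the compiler builtin when available, else fall
 * back to the software routine. */
static inline unsigned int hweight64(uint64_t w)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned int)__builtin_popcountll(w);
#else
    return sw_hweight64(w);
#endif
}
```

Whether the builtin turns into a single popcntd depends on the target flags given to the compiler, which is why a kernel that must run on several CPU generations needs the runtime patching approach instead.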