Query regarding 2.6.335 RT[Ingo's] and Non-RT performance

2010-08-12 Thread Manikandan Ramachandran
Hello All,

I created a very simple program that runs at a higher priority than normal
tasks and executes a tight loop. Under the same test environment I ran this
program on both the non-RT and RT 2.6.33.5 kernels. To my surprise, the
non-RT kernel performs better than RT: the non-RT kernel took 3 sec and
366156 usec, while the RT kernel took about 3 sec and 418011 usec. Can someone
please explain why the non-RT kernel performs better than the RT
kernel? On the face of the test results, RT seems to have more overhead. Is
there any configuration I could change to bring down the overhead?

Processor:

processor   : 0
cpu : 7448
clock   : 996.00MHz
revision: 2.2 (pvr 8004 0202)
bogomips: 83.10
processor   : 1
cpu : 7448
clock   : 996.00MHz
revision: 2.2 (pvr 8004 0202)
bogomips: 83.10

CFS optimization:
--
# cat /proc/sys/kernel/sched_rt_runtime_us
100
# cat /proc/sys/kernel/sched_rt_period_us
100
# cat /proc/sys/kernel/sched_compat_yield
1

Test Program:
-

#include <sched.h>
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    int sched_rr_min, sched_rr_max;
    struct sched_param scheduling_parameters;
    struct timeval tv, late_tv;
    suseconds_t usec_diff, avg_usec = 0;
    time_t sec_diff, avg_sec = 0;
    int i;
    long count = 1;

    sched_rr_min = sched_get_priority_min(SCHED_RR);
    sched_rr_max = sched_get_priority_max(SCHED_RR);
    scheduling_parameters.sched_priority = sched_rr_min + 4;
    /* Run the process with the given priority */
    sched_setscheduler(0, SCHED_RR, &scheduling_parameters);

    for (i = 0; i < 150; i++) {
        gettimeofday(&tv, NULL);
        /* Busy loop: relies on count wrapping around to a negative value */
        while (count > 0) {
            //printf(".");
            count++;
        }
        gettimeofday(&late_tv, NULL);
        count = 1;
        sec_diff = (late_tv.tv_sec - tv.tv_sec);
        avg_sec += sec_diff;
        /* Absolute usec difference; nothing is borrowed from sec_diff */
        usec_diff = ((late_tv.tv_usec > tv.tv_usec) ?
                     (late_tv.tv_usec - tv.tv_usec) :
                     (tv.tv_usec - late_tv.tv_usec));
        avg_usec += usec_diff;
        printf("Iteration #%d sec %x usec %x\n", i,
               (unsigned int)sec_diff, (unsigned int)usec_diff);
    }
    printf("Average of #%d sec %x usec %x\n", i,
           (unsigned int)(avg_sec / i), (unsigned int)(avg_usec / i));
    return 0;
}

Partial Result of non-rt kernel:
---

Iteration #140 sec 3 usec 3aef8
Iteration #141 sec 3 usec 3aefe
Iteration #142 sec 3 usec 3aee4
*Iteration #143 sec 4 usec b935b*  [Why is there this periodic bump?? Scheduler at work??]
Iteration #144 sec 3 usec 3aef2
Iteration #145 sec 3 usec 3aef0
Iteration #146 sec 3 usec 3aef4
*Iteration #147 sec 4 usec b934b*
Iteration #148 sec 3 usec 3aeed
Iteration #149 sec 3 usec 3aef9

 Partial Result of rt kernel:
---
Iteration #135 sec 3 usec 47328
*Iteration #136 sec 4 usec ac4fd*
Iteration #137 sec 3 usec 48b0b
Iteration #138 sec 3 usec 4738c
Iteration #139 sec 4 usec ac4d5
Iteration #140 sec 3 usec 483cb
Iteration #141 sec 3 usec 48500
*Iteration #142 sec 4 usec acc49*
Iteration #143 sec 3 usec 47c1f
Iteration #144 sec 3 usec 478c2
Iteration #145 sec 3 usec 47e48
Iteration #146 sec 4 usec ac9b5
Iteration #147 sec 3 usec 48de4
Iteration #148 sec 3 usec 46fbe
Iteration #149 sec 4 usec ac52e
Average of #150 sec 3 usec 660db

Thanks,
Mani


-- 
Thanks,
Manik

Think twice about a tree before you take a printout
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Looking for a tutorial on the use of the new of_??? init functions

2010-08-12 Thread LEROY Christophe

Hello,

Is there a tutorial or HOWTO somewhere explaining how to use the
new of_platform_xxx() and other of_xxx() functions in driver init
code? It looks like a very nice way to write drivers in Linux 2.6,
but a little help would be welcome.


Regards
Christophe


[PATCH] fs-enet/mac-fec: Restore multicast and promiscuous settings during restart

2010-08-12 Thread Wolfgang Ocker
Signed-off-by: Wolfgang Ocker w...@reccoware.de
---
 drivers/net/fs_enet/mac-fec.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/fs_enet/mac-fec.c b/drivers/net/fs_enet/mac-fec.c
index 7ca1642..05f4bb1 100644
--- a/drivers/net/fs_enet/mac-fec.c
+++ b/drivers/net/fs_enet/mac-fec.c
@@ -344,6 +344,9 @@ static void restart(struct net_device *dev)
FW(fecp, imask, FEC_ENET_TXF | FEC_ENET_TXB |
   FEC_ENET_RXF | FEC_ENET_RXB);
 
+   /* Restore multicast and promiscuous settings */
+   set_multicast_list(dev);
+
/*
 * And last, enable the transmit and receive processing.
 */
-- 
1.7.2.1



Re: How to use mpc8xxx_gpio.c device driver

2010-08-12 Thread Ravi Gupta
On Wed, Aug 11, 2010 at 9:45 PM, MJ embd mj.e...@gmail.com wrote:

 u can directly access GPIO registers in kernel, by ioremap of GPIO
 memory mapped registers.
 you might need to check
 - muxing of gpio

 -mj


Hi MJ,

Thanks for the reply.
I tried memory mapping but it fails, here is my code :

#include <linux/module.h>
#include <linux/errno.h>    /* error codes */
#include <linux/mm.h>
#include <linux/io.h>       /* ioremap, ioread32, iounmap */

void __iomem *ioaddr = NULL;

static __init int sample_module_init(void)
{
    ioaddr = ioremap(0xFF400C00, 0x24);
    if (ioaddr == NULL) {
        printk(KERN_WARNING "ioremap failed\n");
        return -ENOMEM;     /* do not touch an unmapped address */
    }
    printk(KERN_WARNING "ioremap successed\n");
    printk(KERN_WARNING "GP1DIR = %u\n", ioread32(ioaddr));
    return 0;
}

static __exit void sample_module_exit(void)
{
    iounmap(ioaddr);
}

MODULE_LICENSE("GPL");
module_init(sample_module_init);
module_exit(sample_module_exit);

As per the MPC8377ERDB data sheet, the default IMMRBAR address is 0xFF40_0000,
the offset of GPIO1 is 0C00, and each GPIO has programmable registers that
occupy 24 bytes of memory-mapped space, so I mapped 24 bytes (0x18) starting
at address 0xFF40_0C00. But when I try to read values from the mapped memory
I get the following errors. Is there something I am missing?
Any help with reference to the MPC8377ERDB board would be highly appreciated.

# tftp -l ~/immrbar.ko -r immrbar.ko -g 10.20.50.70
# insmod ./immrbar.ko
[  717.825241] ioremap successed
[  717.849215] Machine check in kernel mode.
[  717.853220] Caused by (from SRR1=41000): Transfer error ack signal
[  717.859405] Oops: Machine check, sig: 7 [#1]
[  717.863668] MPC837x RDB
[  717.866106] Modules linked in: immrbar(+)
[  717.870119] NIP: 0900 LR: d1034054 CTR: c0014d50
[  717.875079] REGS: cf895d00 TRAP: 0200   Not tainted  (2.6.28.9)
[  717.880992] MSR: 00041000 ME  CR: 2482  XER: 2000
[  717.886578] TASK = cf8e8640[647] 'insmod' THREAD: cf894000
[  717.891882] GPR00: d103404c cf895db0 cf8e8640  23d5 
c01e
04f4 0002
[  717.900265] GPR08: 0001 c0383f3c 23d5 c0014d50 4c72ff56 10019100
1007
77e0 1007ea98
[  717.908650] GPR16: 10077834 100a 100a 100a bfaf4828 
1009
f23c 1cfc
[  717.917034] GPR24: 1d00 1d24 10012008 c03650e8  d1034000
1001
2018 d103
[  717.925598] NIP [0900] 0x900
[  717.928828] LR [d1034054] sample_module_init+0x54/0xc0 [immrbar]
[  717.934828] Call Trace:
[  717.937273] [cf895db0] [d103404c] sample_module_init+0x4c/0xc0 [immrbar]
(unr
eliable)
[  717.945115] [cf895dc0] [c00038a0] do_one_initcall+0x64/0x18c
[  717.950780] [cf895f20] [c004d7b8] sys_init_module+0xac/0x19c
[  717.956441] [cf895f40] [c00122f0] ret_from_syscall+0x0/0x38
[  717.962013] --- Exception: c01 at 0x48043f6c
[  717.962017] LR = 0x19cc
[  717.969407] Instruction dump:
[  717.972370]      
 XX
XX
[  717.980140]     7d5043a6 
 XX
XX
[  717.987919] ---[ end trace a47be794e2873cef ]---

Thanks in advance
Ravi Gupta

Running out of SDHCI quirk space (Re: [PATCH 1/3 v2] sdhci: Add auto CMD12 support for eSDHC driver)

2010-08-12 Thread Matt Fleming
On Tue, Aug 03, 2010 at 04:43:46PM -0700, Andrew Morton wrote:
 On Tue, 3 Aug 2010 11:11:10 +0800
 Roy Zang tie-fei.z...@freescale.com wrote:
 
  --- a/drivers/mmc/host/sdhci.h
  +++ b/drivers/mmc/host/sdhci.h
  @@ -240,6 +240,8 @@ struct sdhci_host {
   #define SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN  (1<<25)
   /* Controller cannot support End Attribute in NOP ADMA descriptor */
   #define SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC  (1<<26)
  +/* Controller uses Auto CMD12 command to stop the transfer */
  +#define SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 (1<<27)
 
 This becomes 1<<29 in my tree.
 
 We're about to run out.  What happens then?

I've been wondering for a while now if many of the quirks should be
hidden behind function pointers. While we could of course extend the
quirk space, I think that's kinda missing the point that quirks are
being used too liberally. Take SDHCI_QUIRK_SINGLE_POWER_WRITE in
drivers/mmc/host/sdhci.c:sdhci_set_power(). Really, that quirk should
probably be hidden inside a set_power() function in the sdhci_ops
structure.
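
To make the idea concrete, here is a minimal userspace sketch, not
actual sdhci code. It assumes the quirk only decides whether the power
register is cleared before the new value is written. demo_host,
demo_ops, reg_write and the DEMO_ flag are hypothetical names invented
for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUIRK_SINGLE_POWER_WRITE (1u << 0)   /* hypothetical flag */

struct demo_host;

struct demo_ops {
    /* Optional hook: a controller that needs special handling sets it. */
    void (*set_power)(struct demo_host *host, unsigned char pwr);
};

struct demo_host {
    unsigned int quirks;
    const struct demo_ops *ops;
    unsigned char power_reg;   /* stand-in for the power control register */
    int writes;                /* number of register writes performed */
};

static void reg_write(struct demo_host *host, unsigned char val)
{
    host->power_reg = val;
    host->writes++;
}

/* Quirk style: the core code branches on a per-controller flag. */
static void set_power_quirk(struct demo_host *host, unsigned char pwr)
{
    assert(host != NULL);
    if (!(host->quirks & DEMO_QUIRK_SINGLE_POWER_WRITE))
        reg_write(host, 0);    /* generic path clears the register first */
    reg_write(host, pwr);
}

/* Ops style: the core code stays generic, odd controllers override it. */
static void set_power_ops(struct demo_host *host, unsigned char pwr)
{
    assert(host != NULL);
    if (host->ops && host->ops->set_power) {
        host->ops->set_power(host, pwr);
        return;
    }
    reg_write(host, 0);
    reg_write(host, pwr);
}

/* Controller-specific hook that does a single write. */
static void single_write_set_power(struct demo_host *host, unsigned char pwr)
{
    reg_write(host, pwr);
}

static const struct demo_ops single_write_ops = {
    .set_power = single_write_set_power,
};
```

Both styles end up doing one register write for the odd controller; the
difference is that the ops version needs no new bit in the quirk space.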

I'm gonna have a go at trying to remove some of the quirks that don't
make sense being quirks. I'll post the series when I'm done.

Does anyone think that this approach is crazy?


Flash Programmer Problem in Code Warrior

2010-08-12 Thread Naresh Reddy Sankapelly
Hi,
I am trying to program NOR flash (M29DW323DT) on an MPC8321 board. I have
imported the details of the flash into FPDeviceConfig.xml. When I run
Program/Verify flash, it takes a very long time (hours). I could
not figure out the reason for that. Kindly let me know how to troubleshoot
this.

-- 
Thanks and Regards
Naresh Reddy S.
Noida, 9873240342

Re: How to use mpc8xxx_gpio.c device driver

2010-08-12 Thread Ira W. Snyder
On Thu, Aug 12, 2010 at 03:55:49PM +0530, Ravi Gupta wrote:
 
 As per the MPC8377ERDB data sheet default IMMRBAR address is 0xFF40_0000 and
 offset of GPIO1 is 0C00 and each GPIO has programmable registers that occupy
 24 bytes of memory-mapped space, so I mapped from 24bytes (0x18) starting
 from 0xFF40_0C00 address. But when I tried to read the values from the
 mapped memory I get the following errors. Is there something I am missing.
 Any help with reference to MPC8377ERDB board will be highly appreciable.
 
 

Looking at the device tree for this board, it appears U-Boot remaps the
IMMR registers to 0xe0000000. They are no longer accessible at
0xff400000.

I would recommend studying arch/powerpc/boot/dts/mpc8377_rdb.dts in the
Linux source code. That describes the device layout on your board after
U-Boot has run.

A wonderful tool for testing devices from userspace is busybox devmem.
It allows you to poke any physical address with any value. The output of
busybox devmem --help should get you started. As a quick example,
busybox devmem 0xe0000c00 w 0x1 will write the 32-bit value 0x1 to
address 0xe0000c00.
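
For anyone curious what such a poke involves, below is a rough sketch of
the usual /dev/mem approach: map the page that contains the physical
address, then store through a volatile pointer. This is not the busybox
source. The helper names are made up, and the example address 0xe0000c00
assumes an IMMR base of 0xe0000000 plus the GPIO1 offset 0xc00.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Start of the page containing phys; page_size must be a power of two. */
static uint64_t page_base(uint64_t phys, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return phys & ~(page_size - 1);
}

/* Byte offset of phys inside its page. */
static uint64_t page_offset(uint64_t phys, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return phys & (page_size - 1);
}

/*
 * Write one 32-bit value to a physical address through /dev/mem.
 * Needs root. On a 32-bit system, build with -D_FILE_OFFSET_BITS=64
 * so that addresses above 2 GB fit in off_t.
 * Returns 0 on success, -1 on failure.
 */
static int devmem_write32(uint64_t phys, uint32_t value)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile uint32_t *reg;
    void *map;
    int fd;

    if (page <= 0)
        return -1;

    fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
               fd, (off_t)page_base(phys, (uint64_t)page));
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    reg = (volatile uint32_t *)((char *)map +
                                page_offset(phys, (uint64_t)page));
    *reg = value;

    munmap(map, (size_t)page);
    close(fd);
    return 0;
}
```

Under those assumptions, devmem_write32(0xe0000c00, 0x1) would be the
equivalent of the busybox command line above.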

I would also recommend using the built-in Linux GPIO API. It works, you
just need to figure out how to use it. It will be much easier to get
your code upstream if you use the provided APIs.

The Documentation/gpio.txt file should help you in understanding the
in-kernel Linux GPIO API. I'm afraid I don't have much experience other
than accessing it via sysfs from userspace.
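
As a pointer for the sysfs route, here is a small userspace sketch. It
assumes the kernel was built with the GPIO sysfs interface
(/sys/class/gpio) and that you already know the global GPIO number for
your pin; the helper names are invented for this example and the number
you pass in is board specific.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/gpio/gpio<N>/<attr>" in buf; 0 on success, -1 if too long. */
static int gpio_attr_path(char *buf, size_t len, unsigned int gpio,
                          const char *attr)
{
    int n;

    assert(buf != NULL && attr != NULL);
    n = snprintf(buf, len, "/sys/class/gpio/gpio%u/%s", gpio, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string to a sysfs-style file; 0 on success, -1 on failure. */
static int write_string(const char *path, const char *value)
{
    size_t len = strlen(value);
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    if (fwrite(value, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Export a GPIO, configure it as an output and drive it high. */
static int gpio_set_high(unsigned int gpio)
{
    char path[64];
    char num[16];

    snprintf(num, sizeof(num), "%u", gpio);
    if (write_string("/sys/class/gpio/export", num) != 0)
        return -1;
    if (gpio_attr_path(path, sizeof(path), gpio, "direction") != 0 ||
        write_string(path, "out") != 0)
        return -1;
    if (gpio_attr_path(path, sizeof(path), gpio, "value") != 0 ||
        write_string(path, "1") != 0)
        return -1;
    return 0;
}
```

gpio_set_high() fails at the export step if the pin is already exported
or owned by a kernel driver, so treat it as a starting point only.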

Ira


Re: Query regarding 2.6.335 RT[Ingo's] and Non-RT performance

2010-08-12 Thread Jeff Angielski

On 08/11/2010 06:18 PM, Manikandan Ramachandran wrote:

Hello All,
 I created a very simple program which has higher priority than
normal tasks and runs a tight loop. Under same test environment I ran
this program on both non-rt and rt 2.6.33.5 kernel.  To my suprise I see
that performance of non-RT kernel is better than RT. non-RT kernel took
3 sec and 366156 usec while RT kernel took about 3 sec and 418011
usec.Can someone please explain why the performance of non-rt kernel is
better than rt kernel? From the face of the test result, I feel RT has
more overhead,Is there any configuration that I could do to bring down
the overhead?


Your surprise is due to your definition of performance.

The purpose of the -rt kernels is to reduce the kernel latency.  This is 
important for servicing hardware.  Normal users find -rt useful for 
audio/video applications.  Engineering and scientific users find -rt 
beneficial for servicing hardware like sensors or control systems.


If you are just trying to run calculations as fast as you can in user 
space, you'd be better off using the non-rt variants.



--
Jeff Angielski
The PTR Group
www.theptrgroup.com


Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections

2010-08-12 Thread Andrew Morton
On Mon, 09 Aug 2010 12:53:00 -0500
Nathan Fontenot nf...@austin.ibm.com wrote:

 This set of patches de-couples the idea that there is a single
 directory in sysfs for each memory section.  The intent of the
 patches is to reduce the number of sysfs directories created to
 resolve a boot-time performance issue.  On very large systems
 boot time are getting very long (as seen on powerpc hardware)
 due to the enormous number of sysfs directories being created.
 On a system with 1 TB of memory we create ~63,000 directories.
 For even larger systems boot times are being measured in hours.

And those hours are mainly due to this problem, I assume.

 This set of patches allows for each directory created in sysfs
 to cover more than one memory section.  The default behavior for
 sysfs directory creation is the same, in that each directory
 represents a single memory section.  A new file 'end_phys_index'
 in each directory contains the physical_id of the last memory
 section covered by the directory so that users can easily
 determine the memory section range of a directory.

What you're proposing appears to be a non-back-compatible
userspace-visible change.  This is a big issue!

It's not an unresolvable issue, as this is a must-fix problem.  But you
should tell us what your proposal is to prevent breakage of existing
installations.  A Kconfig option would be good, but a boot-time kernel
command line option which selects the new format would be much better.

However you didn't mention this issue at all, and it's the most
important one.


 Updates for version 5 of the patchset include the following:
 
 Patch 4/8 Add mutex for add/remove of memory blocks
 - Define the mutex using DEFINE_MUTEX macro.
 
 Patch 8/8 Update memory-hotplug documentation
 - Add information concerning memory holes in phys_index..end_phys_index.

And you forgot to tell us how long those machines boot with the
patchset applied, which is the entire point of the patchset!



Re: [PATCH 0/8] v5 De-couple sysfs memory directories from memory sections

2010-08-12 Thread Dave Hansen
On Thu, 2010-08-12 at 12:08 -0700, Andrew Morton wrote:
  This set of patches allows for each directory created in sysfs
  to cover more than one memory section.  The default behavior for
  sysfs directory creation is the same, in that each directory
  represents a single memory section.  A new file 'end_phys_index'
  in each directory contains the physical_id of the last memory
  section covered by the directory so that users can easily
  determine the memory section range of a directory.
 
 What you're proposing appears to be a non-back-compatible
 userspace-visible change.  This is a big issue! 

Nathan, one thought to get around this at the moment would be to bump up
the size that we export in /sys/devices/system/memory/block_size_bytes.
I think you have already done most of the hard work to accomplish
this.  

You can still add the end_phys_index stuff.  But, for now, it would
always be equal to start_phys_index.

-- Dave



Question about dma_direct_ops in PowerPC.

2010-08-12 Thread Fushen Chen
We have a board whose PCI device driver calls
pci_dma_sync_single_for_device.
This driver used to work with Linux kernel 2.6.25.

We ported the driver to Linux kernel 2.6.32, and it
doesn't work anymore.
The following call trace shows why the PCI driver won't work in kernel
2.6.32.
1. In include/asm-generic/pci-dma-compat.h
pci_dma_sync_single_for_device calls dma_sync_single_for_cpu
2. In include/asm-generic/dma-mapping-common.h
dma_sync_single_for_cpu calls ops->sync_single_for_cpu
3. In arch/powerpc/kernel/dma.c
struct dma_map_ops dma_direct_ops = {
.alloc_coherent = dma_direct_alloc_coherent,
.free_coherent  = dma_direct_free_coherent,
.map_sg = dma_direct_map_sg,
.unmap_sg   = dma_direct_unmap_sg,
.dma_supported  = dma_direct_dma_supported,
.map_page   = dma_direct_map_page,
.unmap_page = dma_direct_unmap_page,
#ifdef CONFIG_NOT_COHERENT_CACHE
.sync_single_range_for_cpu  = dma_direct_sync_single_range,
.sync_single_range_for_device   = dma_direct_sync_single_range,
.sync_sg_for_cpu= dma_direct_sync_sg,
.sync_sg_for_device = dma_direct_sync_sg,
#endif
};
There is no ops defined for sync_single_for_cpu.
The pci_dma_sync_single_for_device is a no-op.
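
To illustrate the failure mode, here is a simplified userspace model of
the dispatch (not the kernel code itself). It assumes the generic
wrapper only calls the hook when the pointer is non-NULL; all demo_
names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_map_ops, reduced to one hook. */
struct demo_dma_ops {
    void (*sync_single_for_cpu)(unsigned long addr, size_t size);
};

static int cache_ops_done;   /* counts simulated cache maintenance calls */

/* Stand-in for a real sync routine such as dma_direct_sync_single. */
static void demo_sync_single(unsigned long addr, size_t size)
{
    (void)addr;
    (void)size;
    cache_ops_done++;
}

/* Modeled on the generic wrapper: a missing hook is silently skipped. */
static void demo_dma_sync_single_for_cpu(const struct demo_dma_ops *ops,
                                         unsigned long addr, size_t size)
{
    assert(ops != NULL);
    if (ops->sync_single_for_cpu)
        ops->sync_single_for_cpu(addr, size);
}

/* Like the 2.6.32 table quoted above: the hook is left unset. */
static const struct demo_dma_ops demo_ops_without_hook = {
    .sync_single_for_cpu = NULL,
};

/* Like the 2.6.35.1 table: the hook is filled in. */
static const struct demo_dma_ops demo_ops_with_hook = {
    .sync_single_for_cpu = demo_sync_single,
};
```

With the first table the call returns without any cache maintenance,
which on a non-coherent platform would leave stale data visible to the
driver.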

However Linux kernel 2.6.35.1 from kernel.org has the  .sync_single_for_cpu
for dma_direct_ops.
in arch/powerpc/kernel/dma.c
#ifdef CONFIG_NOT_COHERENT_CACHE
.sync_single_for_cpu= dma_direct_sync_single,
.sync_single_for_device = dma_direct_sync_single,
.sync_sg_for_cpu= dma_direct_sync_sg,
.sync_sg_for_device = dma_direct_sync_sg,
#endif


We won't move to Linux kernel 2.6.35 anytime soon.
My questions:
1. Is there any side effect for adding .sync_single_for_cpu to
dma_direct_ops in 2.6.32?
2. What will be the future development here?


Best regards & Thanks,
Fushen

Re: Query regarding 2.6.335 RT[Ingo's] and Non-RT performance

2010-08-12 Thread Xianghua Xiao
On Thu, Aug 12, 2010 at 12:53 PM, Jeff Angielski j...@theptrgroup.com wrote:
 On 08/11/2010 06:18 PM, Manikandan Ramachandran wrote:

 Hello All,
     I created a very simple program which has higher priority than
 normal tasks and runs a tight loop. Under same test environment I ran
 this program on both non-rt and rt 2.6.33.5 kernel.  To my suprise I see
 that performance of non-RT kernel is better than RT. non-RT kernel took
 3 sec and 366156 usec while RT kernel took about 3 sec and 418011
 usec.Can someone please explain why the performance of non-rt kernel is
 better than rt kernel? From the face of the test result, I feel RT has
 more overhead,Is there any configuration that I could do to bring down
 the overhead?

 Your surprise is due to your definition of performance.

 The purpose of the -rt kernels is to reduce the kernel latency.  This is
 important for servicing hardware.  Normal users find the -rt useful for
 audio/video applications.  Engineering and scientific users find the -rt
 beneficially for servicing hardware like sensors or control systems.

 If you are just trying to run calculations as fast as you can in user space,
 you'd be better off using the non-rt variants.




true, in most cases non-rt will have better performance/throughput,
while rt's major goal is to have better latency for high priority
tasks. also true is that, rt kernel will have more overhead.

xianghua

[PATCH] powerpc: Add support for popcnt instructions

2010-08-12 Thread Anton Blanchard

POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step
this patch does all the work out of line, but it would be nice to implement
them as inlines with an out of line fallback.

The performance issue with hweight was noticed when disabling SMT on a large
(192 thread) POWER7 box. The patch improves that testcase by about 8%.

Signed-off-by: Anton Blanchard an...@samba.org
---

Index: powerpc.git/arch/powerpc/include/asm/cputable.h
===
--- powerpc.git.orig/arch/powerpc/include/asm/cputable.h  2010-08-13 11:19:42.691991439 +1000
+++ powerpc.git/arch/powerpc/include/asm/cputable.h  2010-08-13 11:24:55.510741618 +1000
@@ -199,6 +199,8 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_UNALIGNED_LD_STD     LONG_ASM_CONST(0x0080000000000000)
 #define CPU_FTR_ASYM_SMT             LONG_ASM_CONST(0x0100000000000000)
 #define CPU_FTR_STCX_CHECKS_ADDRESS  LONG_ASM_CONST(0x0200000000000000)
+#define CPU_FTR_POPCNTB              LONG_ASM_CONST(0x0400000000000000)
+#define CPU_FTR_POPCNTD              LONG_ASM_CONST(0x0800000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -403,21 +405,22 @@ extern const char *powerpc_base_platform
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
-   CPU_FTR_PURR | CPU_FTR_STCX_CHECKS_ADDRESS)
+   CPU_FTR_PURR | CPU_FTR_STCX_CHECKS_ADDRESS | \
+   CPU_FTR_POPCNTB)
 #define CPU_FTRS_POWER6 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
CPU_FTR_DSCR | CPU_FTR_UNALIGNED_LD_STD | \
-   CPU_FTR_STCX_CHECKS_ADDRESS)
+   CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB)
 #define CPU_FTRS_POWER7 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
-   CPU_FTR_STCX_CHECKS_ADDRESS)
+   CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
Index: powerpc.git/arch/powerpc/lib/Makefile
===
--- powerpc.git.orig/arch/powerpc/lib/Makefile  2010-08-13 11:19:43.653241065 +1000
+++ powerpc.git/arch/powerpc/lib/Makefile  2010-08-13 11:19:45.930743841 +1000
@@ -18,7 +18,7 @@ obj-$(CONFIG_HAS_IOMEM)   += devres.o
 
 obj-$(CONFIG_PPC64)+= copypage_64.o copyuser_64.o \
   memcpy_64.o usercopy_64.o mem_64.o string.o \
-  checksum_wrappers_64.o
+  checksum_wrappers_64.o hweight_64.o
 obj-$(CONFIG_XMON) += sstep.o ldstfp.o
 obj-$(CONFIG_KPROBES)  += sstep.o ldstfp.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += sstep.o ldstfp.o
Index: powerpc.git/arch/powerpc/lib/hweight_64.S
===
--- /dev/null  1970-01-01 00:00:00.000000000 +0000
+++ powerpc.git/arch/powerpc/lib/hweight_64.S  2010-08-13 11:19:45.940741462 +1000
@@ -0,0 +1,110 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2010
+ *
+ * Author: Anton Blanchard an...@au.ibm.com
+ */
+#include <asm/processor.h>
+#include <asm/ppc_asm.h>
+
+/* Note: This code relies on -mminimal-toc */
+
+_GLOBAL(__arch_hweight8)
+BEGIN_FTR_SECTION
+   b .__sw_hweight8
+   nop
+   nop
+FTR_SECTION_ELSE
+   popcntb r3,r3
+   clrldi  r3,r3,64-8
+   blr
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_POPCNTB)
+
+_GLOBAL(__arch_hweight16)
+BEGIN_FTR_SECTION
+   b .__sw_hweight16
+   nop
+   nop
+   nop
+   nop
+FTR_SECTION_ELSE
+  BEGIN_FTR_SECTION_NESTED(50)
+   

Re: [PATCH] powerpc: Add support for popcnt instructions

2010-08-12 Thread Benjamin Herrenschmidt
On Fri, 2010-08-13 at 12:28 +1000, Anton Blanchard wrote:
 POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step
 this patch does all the work out of line, but it would be nice to implement
 them as inlines with an out of line fallback.
 
 The performance issue with hweight was noticed when disabling SMT on a large
 (192 thread) POWER7 box. The patch improves that testcase by about 8%.

Especially from modules it will suck big time. If kept out of line they
should probably be linked-in with each module, but I'd rather have them
inlined.

Cheers,
Ben.


Re: [PATCH] powerpc: Add support for popcnt instructions

2010-08-12 Thread Anton Blanchard
 
Hi,

 Especially from modules it will suck big time. If kept out of line they
 should probably be linked-in with each module, but I'd rather have them
 inlined.

Inlining would be good, but this is as far as I can take this for now.
If someone else is interested go for it :)

Anton