Re: Module vs Kernel main performance

2012-06-08 Thread Abu Rasheda
I modified my module (m.c). It still receives a buffer from user space through
ioctl, but instead of copying data from the user-provided buffer, I have
allocated (kmalloc) a source buffer and I copy from it to another kernel
buffer, which is allocated each time the module's ioctl is invoked.

copy_from_user is now replaced with memcpy. I still see the processor stall.
This suggests that the buffer allocated per call is the cause, not
copy_from_user itself.

Abu
___
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


Re: Module vs Kernel main performance

2012-06-07 Thread Peter Senna Tschudin
Hello Abu,

I had to include linux/module.h or an error was issued about THIS_MODULE.

What Kernel version are you using? I'm trying to compile it and I'm
getting the error:

[peter@ace m]$ make
make -C /lib/modules/3.3.7-1.fc17.x86_64/build SUBDIRS=`pwd` modules
make[1]: Entering directory `/usr/src/kernels/3.3.7-1.fc17.x86_64'
  CC [M]  /tmp/m/m.o
/tmp/m/m.c:36:2: error: unknown field ‘ioctl’ specified in initializer
/tmp/m/m.c:36:2: warning: initialization from incompatible pointer
type [enabled by default]
/tmp/m/m.c:36:2: warning: (near initialization for ‘m_fops.llseek’)
[enabled by default]
make[2]: *** [/tmp/m/m.o] Error 1
make[1]: *** [_module_/tmp/m] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.3.7-1.fc17.x86_64'
make: *** [module] Error 2

According to:
http://lxr.linux.no/linux+v3.4.1/include/linux/fs.h#L1609

There is no .ioctl at struct file_operations...
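
A note on the error above: as far as I know, the legacy .ioctl member of struct file_operations was removed in kernel 2.6.36, while .unlocked_ioctl is available on both old and new kernels, which would explain why the same m.c builds on 2.6.32 but not on 3.3.7. Below is a minimal sketch of one way to build a single source on both. Only m_fops comes from the compiler output; m_do_ioctl and the two wrapper names are hypothetical, since the real m.c is only attached to the thread and not shown here.

```c
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/version.h>

/* Hypothetical common handler; the real work of m.c would go here. */
static long m_do_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
	return 0;
}

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
/* Old prototype: takes the inode and returns int. */
static int m_ioctl_legacy(struct inode *inode, struct file *filp,
			  unsigned int cmd, unsigned long arg)
{
	return (int)m_do_ioctl(filp, cmd, arg);
}
#else
/* New prototype: no inode argument, returns long. */
static long m_ioctl_unlocked(struct file *filp, unsigned int cmd,
			     unsigned long arg)
{
	return m_do_ioctl(filp, cmd, arg);
}
#endif

static const struct file_operations m_fops = {
	.owner = THIS_MODULE,
#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 36)
	.ioctl = m_ioctl_legacy,
#else
	.unlocked_ioctl = m_ioctl_unlocked,
#endif
};
```

Keep in mind that unlocked_ioctl is called without the big kernel lock, so any locking the handler needs is its own responsibility.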

Can you share how you've used perf/oprofile on your module/Kernel code?

[]'s

Peter


On Fri, Jun 1, 2012 at 3:52 PM, Abu Rasheda rcpilot2...@gmail.com wrote:
 If the buffer on the user side is more than a page, it may be that the
 complete user-space buffer is not resident in memory and the kernel spends
 time processing page faults


 I have attached code for module and user program. If anyone is bored over
 the weekend they are welcome to try and explain the behavior.

 Abu Rasheda





-- 
Peter Senna Tschudin
peter.se...@gmail.com
gpg id: 48274C36



Re: Module vs Kernel main performance

2012-06-07 Thread Abu Rasheda

 Hello Abu,

 I had to include linux/module.h or an error was issued about
 THIS_MODULE.


I am running this tool on Scientific Linux 6.0, which uses a 2.6.32 kernel. I
know this is old, but it is what I have for my product.


 What Kernel version are you using? I'm trying to compile it and I'm
 getting the error:

 [peter@ace m]$ make
 make -C /lib/modules/3.3.7-1.fc17.x86_64/build SUBDIRS=`pwd` modules
 make[1]: Entering directory `/usr/src/kernels/3.3.7-1.fc17.x86_64'
  CC [M]  /tmp/m/m.o
 /tmp/m/m.c:36:2: error: unknown field ‘ioctl’ specified in initializer
 /tmp/m/m.c:36:2: warning: initialization from incompatible pointer
 type [enabled by default]
 /tmp/m/m.c:36:2: warning: (near initialization for ‘m_fops.llseek’)
 [enabled by default]
 make[2]: *** [/tmp/m/m.o] Error 1
 make[1]: *** [_module_/tmp/m] Error 2
 make[1]: Leaving directory `/usr/src/kernels/3.3.7-1.fc17.x86_64'
 make: *** [module] Error 2

 According to:
 http://lxr.linux.no/linux+v3.4.1/include/linux/fs.h#L1609

 There is no .ioctl at struct file_operations...

 Can you share how you've used perf/oprofile on your module/Kernel code?

 []'s

 Peter


for perf:

perf stat -e
cpu-cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,cache-references,cache-misses,branch-instructions,branch-misses,bus-cycles,cpu-clock,task-clock,page-faults,minor-faults,major-faults,context-switches,cpu-migrations,alignment-faults,emulation-faults,L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores,L1-dcache-store-misses,L1-dcache-prefetches,L1-dcache-prefetch-misses,L1-icache-loads,L1-icache-load-misses,L1-icache-prefetches,L1-icache-prefetch-misses,LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,LLC-prefetches,LLC-prefetch-misses,dTLB-loads,dTLB-load-misses,dTLB-stores,dTLB-store-misses,dTLB-prefetches,dTLB-prefetch-misses,iTLB-loads,iTLB-load-misses,branch-loads,branch-load-misses,syscalls:sys_enter_sendmsg,syscalls:sys_exit_sendmsg,sched:sched_wakeup,sched:sched_stat_sleep
./prog

for oprofile:

# opcontrol --reset
# opcontrol --vmlinux=/boot/vmlinux.64
# opcontrol --start
# ./a.out
# opcontrol --shutdown
# opreport -l -p


Re: Module vs Kernel main performance

2012-06-07 Thread Peter Senna Tschudin
Hi again!

On Tue, May 29, 2012 at 8:50 PM, Abu Rasheda rcpilot2...@gmail.com wrote:
 Hi,

 I am working on the x86_64 arch. I profiled (oprofile) a Linux kernel module
 and noticed that a whole lot of cycles are spent in the copy_from_user call.
 I compared the same flow from the kernel proper and noticed that, for more
 data throughput, the cycles spent in copy_from_user are much less. The kernel
 proper spends 1/8 of the cycles that the module does. (There is a user
 process which keeps sending data, like iperf.)

 Used perf tool to gather some statistics and found that call from kernel proper

 185,719,857,837 cpu-cycles            #    3.318 GHz                  [90.01%]
  99,886,030,243 instructions          #    0.54  insns per cycle      [95.00%]
   1,696,072,702 cache-references      #   30.297 M/sec                [94.99%]
     786,929,244 cache-misses          #   46.397 % of all cache refs  [95.00%]
  16,867,747,688 branch-instructions   #  301.307 M/sec                [95.03%]
      86,752,646 branch-misses         #    0.51% of all branches      [95.00%]
   5,482,768,332 bus-cycles            #   97.938 M/sec                [20.08%]
    55967.269801 cpu-clock
    55981.842225 task-clock            #    0.933 CPUs utilized

 and call from kernel module

   9,388,787,678 cpu-cycles            #    1.527 GHz                  [89.77%]
   1,706,203,221 instructions          #    0.18  insns per cycle      [94.59%]
     551,010,961 cache-references      #   89.588 M/sec                [94.73%]
     369,632,492 cache-misses          #   67.083 % of all cache refs  [95.18%]
     291,358,658 branch-instructions   #   47.372 M/sec                [94.68%]
      10,291,678 branch-misses         #    3.53% of all branches      [95.01%]
     582,651,999 bus-cycles            #   94.733 M/sec                [20.55%]
     6112.471585 cpu-clock
     6150.490210 task-clock            #    0.102 CPUs utilized
             367 page-faults           #    0.000 M/sec
             367 minor-faults          #    0.000 M/sec
               0 major-faults          #    0.000 M/sec
          25,770 context-switches      #    0.004 M/sec
              23 cpu-migrations        #    0.000 M/sec

How did you call from Kernel module?



 So obviously, CPU is stalling when it is copying data and there are
 more cache misses. My question is, is there a difference calling
 copy_from_user from kernel proper compared to calling from LKM ?
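
For anyone cross-checking the perf output quoted above: the "insns per cycle" and "% of all cache refs" columns are simple ratios of the raw counters. The sketch below recomputes them; the function names are made up for illustration and are not part of perf.

```c
#include <assert.h>
#include <stdint.h>

/* Instructions retired per CPU cycle (perf's "insns per cycle" column). */
double insns_per_cycle(uint64_t instructions, uint64_t cycles)
{
    assert(cycles != 0);
    return (double)instructions / (double)cycles;
}

/* Cache misses as a percentage of cache references. */
double cache_miss_pct(uint64_t misses, uint64_t refs)
{
    assert(refs != 0);
    return 100.0 * (double)misses / (double)refs;
}
```

Feeding in the kernel-proper counters (99,886,030,243 instructions over 185,719,857,837 cycles) gives about 0.54, and the module counters (1,706,203,221 over 9,388,787,678) give about 0.18, matching the columns perf printed.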


[]'s

-- 
Peter Senna Tschudin
peter.se...@gmail.com
gpg id: 48274C36



Re: Module vs Kernel main performance

2012-06-07 Thread Abu Rasheda
peter.se...@gmail.com wrote:

 Hi again!


Hi


 How did you call from Kernel module?


In the original code, the copied data is DMAed; in the experimental code, the
data is dropped.


Re: Module vs Kernel main performance

2012-06-01 Thread Abu Rasheda

 If the buffer on the user side is more than a page, it may be that the
 complete user-space buffer is not resident in memory and the kernel spends
 time processing page faults


I have attached code for module and user program. If anyone is bored over
the weekend they are welcome to try and explain the behavior.

Abu Rasheda


m.tgz
Description: GNU Zip compressed data


Re: Module vs Kernel main performance

2012-05-31 Thread Abu Rasheda
On Wed, May 30, 2012 at 10:35 PM, Mulyadi Santosa
mulyadi.sant...@gmail.com wrote:
 Hi...

 On Thu, May 31, 2012 at 4:44 AM, Abu Rasheda rcpilot2...@gmail.com wrote:
 as I increase size of buffer, insns per cycle keep decreasing. Here is the 
 data:

    1k 0.90  insns per cycle
    8k 0.43  insns per cycle
  43k 0.18  insns per cycle
 100k 0.08  insns per cycle

 Showing that copy_from_user is more efficient when copy data is small,
 why it is so ?

 you meant, the bigger the buffer, the fewer the instructions, right?

yes


 Not sure why, but I am sure it will reach some peak point.

 Anyway, you did kmalloc and then kfree()? I think that's why...a bigger
 buffer will grab a large chunk from slab...and again it's likely
 physically contiguous. Also, it will be placed in the same cache line.

 Whereas the smaller one will hit the allocate/free cycle more...thus
 flushing the L1/L2 cache even more.

It seems to be doing the opposite: the bigger the allocation / copy, the
longer the stall is.



Re: Module vs Kernel main performance

2012-05-31 Thread Chetan Nanda
On May 31, 2012 9:37 PM, Abu Rasheda rcpilot2...@gmail.com wrote:

 On Wed, May 30, 2012 at 10:35 PM, Mulyadi Santosa
 mulyadi.sant...@gmail.com wrote:
  Hi...
 
  On Thu, May 31, 2012 at 4:44 AM, Abu Rasheda rcpilot2...@gmail.com wrote:
  as I increase size of buffer, insns per cycle keep decreasing. Here is the data:
 
     1k 0.90  insns per cycle
     8k 0.43  insns per cycle
    43k 0.18  insns per cycle
   100k 0.08  insns per cycle
 
  Showing that copy_from_user is more efficient when copy data is small,
  why it is so ?
 
  you meant, the bigger the buffer, the fewer the instructions, right?

 yes

If the buffer on the user side is more than a page, it may be that the
complete user-space buffer is not resident in memory and the kernel spends
time processing page faults.
 
  Not sure why, but I am sure it will reach some peak point.
 
  Anyway, you did kmalloc and then kfree()? I think that's why...a bigger
  buffer will grab a large chunk from slab...and again it's likely
  physically contiguous. Also, it will be placed in the same cache line.
 
  Whereas the smaller one will hit the allocate/free cycle more...thus
  flushing the L1/L2 cache even more.

 It seems to be doing the opposite: the bigger the allocation / copy, the
 longer the stall is.



Re: Module vs Kernel main performance

2012-05-30 Thread Mulyadi Santosa
Hi...

On Wed, May 30, 2012 at 11:51 AM, Abu Rasheda rcpilot2...@gmail.com wrote:
 When you say, LKM area is prepared with vmalloc is it for code /
 executable you refering too ?

Yes, AFAIK the memory area for the code and static data of a Linux kernel
module is allocated via vmalloc().

if so will it matter for data copy ?

see my previous reply :)


 Point #2: someone was saying that, at least on MIPS, it takes more
 cycles to call a kernel main function from a module because of the long jump.
 Does that apply to x86_64 too?

IIRC long jump means jumping more than 64 KB...but that's in real mode
in 32 bit...so I am not sure whether it still applies in protected
mode.

 To test the above two, should I make my module part of the static kernel?

Good idea... I think you can try that... :)

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com



Re: Module vs Kernel main performance

2012-05-30 Thread Abu Rasheda
I did another experiment.

I wrote a standalone module and a user program which does an ioctl and passes
a buffer to the kernel module.

The user program passes a buffer through ioctl; the kernel module kmallocs a
buffer for it, calls copy_from_user, then kfree, and returns. The test program
sends 120 gigabytes of data to the module.

If I pass 1k buffer per call, I get

115,396,349,819 instructions              #    0.90  insns per cycle       [95.00%]

As I increase the size of the buffer, insns per cycle keeps decreasing. Here
is the data:

   1k 0.90  insns per cycle
   8k 0.43  insns per cycle
  43k 0.18  insns per cycle
 100k 0.08  insns per cycle

This shows that copy_from_user is more efficient when the copied data is
small. Why is that so?



Re: Module vs Kernel main performance

2012-05-30 Thread Abu Rasheda
On Wed, May 30, 2012 at 2:44 PM, Abu Rasheda rcpilot2...@gmail.com wrote:
 I did another experiment.

 Wrote a stand alone module and user program which does ioctl and pass
 buffer to kernel module.

 User program passes a buffer through ioctl and kernel module does
 kmalloc on it and calls copy_from_user, kfree and return. Test program
 send 120 gigabyte data to module.

 If I pass 1k buffer per call, I get

 115,396,349,819 instructions              #    0.90  insns per cycle       [95.00%]

 as I increase size of buffer, insns per cycle keep decreasing. Here is the 
 data:

    1k 0.90  insns per cycle
    8k 0.43  insns per cycle
  43k 0.18  insns per cycle
 100k 0.08  insns per cycle

 Showing that copy_from_user is more efficient when copy data is small,
 why it is so ?

Did another experiment:

The user program sends 43k; the module allocates 43k once on entering ioctl
and copies a smaller portion with each call to copy_from_user:
--
copy_from_user  0.25k at a time 0.56  insns per cycle
copy_from_user  0.50k at a time 0.42  insns per cycle
copy_from_user  1.00k at a time 0.36  insns per cycle
copy_from_user  2.00k at a time 0.29  insns per cycle
copy_from_user  3.00k at a time 0.26  insns per cycle
copy_from_user  4.00k at a time 0.23  insns per cycle
copy_from_user  8.00k at a time 0.21  insns per cycle
copy_from_user 16.00k at a time 0.19  insns per cycle


The user program sends 43k; the module allocates a smaller chunk and passes
that chunk to each call to copy_from_user:
--
Allocated 0.25k and copy_from_user  0.25k at a time 1.04 insns per cycle
Allocated 0.50k and copy_from_user  0.50k at a time 0.90 insns per cycle
Allocated 1.00k and copy_from_user  1.00k at a time 0.79 insns per cycle
Allocated 2.00k and copy_from_user  2.00k at a time 0.67 insns per cycle
Allocated 4.00k and copy_from_user  4.00k at a time 0.53 insns per cycle
Allocated 8.00k and copy_from_user  8.00k at a time 0.42 insns per cycle
Allocated 16.00k and copy_from_user 16.00k at a time 0.33 insns per cycle
Allocated 32.00k and copy_from_user 32.00k at a time 0.22 insns per cycle
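
To check whether the same trend shows up outside the kernel, the two experiments above can be approximated in user space. This is only a rough analogue under stated assumptions: malloc, memcpy and free stand in for kmalloc, copy_from_user and kfree, and the function names are hypothetical. It does not reproduce the module's exact behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Variant 1: one destination buffer for the whole request, filled in
 * chunk-sized copies. Returns the number of bytes copied. */
size_t copy_prealloc(const unsigned char *src, size_t total, size_t chunk,
                     unsigned long *checksum)
{
    size_t copied = 0;
    unsigned char *dst;

    assert(chunk > 0);
    dst = malloc(total ? total : 1);   /* stands in for kmalloc() */
    if (!dst)
        return 0;
    while (copied < total) {
        size_t n = total - copied < chunk ? total - copied : chunk;
        memcpy(dst + copied, src + copied, n); /* stands in for copy_from_user() */
        *checksum += dst[copied];      /* read back so the copy is observable */
        copied += n;
    }
    free(dst);                         /* stands in for kfree() */
    return copied;
}

/* Variant 2: a fresh chunk-sized destination buffer is allocated and
 * freed around every copy. Returns the number of bytes copied. */
size_t copy_alloc_per_chunk(const unsigned char *src, size_t total,
                            size_t chunk, unsigned long *checksum)
{
    size_t copied = 0;

    assert(chunk > 0);
    while (copied < total) {
        size_t n = total - copied < chunk ? total - copied : chunk;
        unsigned char *dst = malloc(n);
        if (!dst)
            break;
        memcpy(dst, src + copied, n);
        *checksum += dst[0];
        free(dst);
        copied += n;
    }
    return copied;
}
```

Calling either function in a loop from a small main() and running it under perf stat -e cycles,instructions with different chunk sizes gives a user-space comparison point. Build with optimization off (for example -O0), since an optimizing compiler may elide the malloc/memcpy/free sequence.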



Re: Module vs Kernel main performance

2012-05-30 Thread Mulyadi Santosa
Hi...

On Thu, May 31, 2012 at 4:44 AM, Abu Rasheda rcpilot2...@gmail.com wrote:
 as I increase size of buffer, insns per cycle keep decreasing. Here is the 
 data:

    1k 0.90  insns per cycle
    8k 0.43  insns per cycle
  43k 0.18  insns per cycle
 100k 0.08  insns per cycle

 Showing that copy_from_user is more efficient when copy data is small,
 why it is so ?

You mean the bigger the buffer, the fewer instructions per cycle, right?

Not sure why, but I am sure it will reach some peak point.

Anyway, you did kmalloc and then kfree()? I think that's why...a bigger
buffer will grab a large chunk from slab...and again it's likely
physically contiguous. Also, it will be placed in the same cache line.

Whereas the smaller one will hit the allocate/free cycle more...thus
flushing the L1/L2 cache even more.

CMIIW people...

-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com



Re: Module vs Kernel main performance

2012-05-29 Thread Mulyadi Santosa
Hi...

On Wed, May 30, 2012 at 6:50 AM, Abu Rasheda rcpilot2...@gmail.com wrote:
 So obviously, CPU is stalling when it is copying data and there are
 more cache misses. My question is, is there a difference calling
 copy_from_user from kernel proper compared to calling from LKM ?

Theoretically, it should be the same. However, one thing that might
interest you is the fact that the Linux kernel module memory area is
prepared through vmalloc(), thus there is a chance it is not
physically contiguous...whereas the main kernel image uses
page_alloc() IIRC and is thus physically contiguous.

What I meant here is, there must be a difference in speed when you copy
onto something contiguous vs non-contiguous. IIRC at least it will waste
some portion of L1/L2 cache.

Just my 2 cents, maybe I am wrong somewhere...


-- 
regards,

Mulyadi Santosa
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com
training: mulyaditraining.blogspot.com



Re: Module vs Kernel main performance

2012-05-29 Thread Abu Rasheda
 What I meant here is, there must be a difference in speed when you copy
 onto something contiguous vs non-contiguous. IIRC at least it will waste
 some portion of L1/L2 cache.

When you say the LKM area is prepared with vmalloc, is it the code /
executable you are referring to? If so, will it matter for data copy?

Point #2: someone was saying that, at least on MIPS, it takes more
cycles to call a kernel main function from a module because of the long jump.
Does that apply to x86_64 too?

To test the above two, should I make my module part of the static kernel?
