[PATCH] gitignore: ignore hz.bc
Signed-off-by: Vincent Stehlé
---
 kernel/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/.gitignore b/kernel/.gitignore
index ab4f109..b3097bd 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -4,3 +4,4 @@
 config_data.h
 config_data.gz
 timeconst.h
+hz.bc
--
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
mount /mnt/cdrom ok!but ls segmentation fault...
Hi all,

Using linux-2.4.0-test11-pre7 right now... here's what I did:

  mount /mnt/cdrom
  cd /mnt/cdrom
  ls
  Segmentation fault

ls is *NOT responding*; I can't kill /sbin/ls and can't umount /mnt/cdrom. ps shows:

  613 ?  D  0:00 /bin/ls --color=auto -F -b -T 0
       ^
I didn't want to reboot... and the CD-ROM door is locked. BTW, what does the D mean in ps?

thanks in advance,
- Regards, Vincent <[EMAIL PROTECTED]>
Re: mount /mnt/cdrom ok!but ls segmentation fault...
"Albert D. Cahalan" wrote:
>
> The 'D' means that the process is running uninterruptible kernel
> code that should never take long to execute. Usually it means
> the process is doing disk IO.
>
> To find where process 613 is stuck, do this:
>
> ps -p 613 -o comm,stat,f,pcpu,nwchan,wchan

  361 pts/1  D  0:00 /bin/ls --color=auto -F -b -T 0

t77@darkstar:~$ ps -p 361 -o comm,stat,f,pcpu,nwchan,wchan
COMMAND  STAT  F    %CPU  NWCHAN  WCHAN
ls       D     000  0.0   107951  down
                                  ^ no idea... :p

Since I am a newbie, is there any way of killing such a process?

root@darkstar:~# umount /mnt/cdrom1
umount: /mnt/cdrom1: device is busy
root@darkstar:~# umount -f /mnt/cdrom1
umount2: Device or resource busy
umount: /mnt/cdrom1: device is busy

After playing around with ls, I found that executing /bin/ls directly is actually OK; it is only with the default alias

  alias ls='/bin/ls $LS_OPTIONS'

that ls crashes... and thus makes the cdrom useless.

When I `ls /mnt/cdrom` from a virtual terminal, there are extended kernel error messages which I don't know how to copy or save into a file, whereas if I `ls /mnt/cdrom` from a gnome-terminal the error message is just "Segmentation fault".

From /var/log/syslog, after "ls /mnt/cdrom":

Nov 19 19:46:47 darkstar kernel: Unable to handle kernel paging request at virtual address dfdfdfc4
Nov 19 19:46:47 darkstar kernel: *pde =
Oops:
CPU: 0
EIP: 0010:[]
EFLAGS: 00010202

The rest went off the screen. I've tried "ls > ~/tmp/err.out", but it didn't work: just a 0-byte file.
hmmm, ok, here it is in dmesg | less:

Unable to handle kernel paging request at virtual address dfdfdfc4
printing eip:
c486d5a7
*pde =
Oops:
CPU:    0
EIP:    0010:[]
EFLAGS: 00010202
eax: dfdfdf00  ebx: c2976960  ecx: c1ddb800  edx: c23f5c00
esi: c1ddb800  edi: c1ddb821  ebp: c233fba0  esp: c15b9eb0
ds: 0018  es: 0018  ss: 0018
Process ls (pid: 229, stackpage=c15b9000)
Stack: c2976960 c486a2bf c1ddb800 c2976960 c27f8000 c10a9df0 c1b3d140 c2976960
       c1b3d140 0001 c01e1818 0022 0022 0b976960 0800 22994000
       c486a3dd c2976960 c1b3d140 c27f8000 c27f8400 fff4 c1b3d140
Call Trace: [] [] [] [] [] [] [] []
Code: 8b 90 c4 00 00 00 80 b8 b4 00 00 00 00 74 1e 68 00 10 00 00
lines 76-116/116 (END)

thank you for the reply,
- Regards, Vincent <[EMAIL PROTECTED]>
PROBLEM: isofs crash on 2.4.0-test11-pre7

[1.] MAINTAINERS: ISO FILESYSTEM
[2.] Full description of the problem/report:

Using gnome-terminal, with the default alias ls='/bin/ls $LS_OPTIONS':

  # mount /mnt/cdrom
  # cd /mnt/cdrom
  # ls
  Segmentation fault
  # ls

  root@darkstar:~# umount /mnt/cdrom
  umount: /mnt/cdrom: device is busy
  root@darkstar:~# umount -f /mnt/cdrom
  umount2: Device or resource busy
  umount: /mnt/cdrom: device is busy

  # ps ax
  ...
  361 ?  D  0:00 /bin/ls --color=auto -F -b -T 0
  ...
  # kill -9 361
  # ps ax
  ...
  361 ?  D  0:00 /bin/ls --color=auto -F -b -T 0
  ...

The CDROM is now unusable...

[3.] Keywords (i.e., modules, networking, kernel):
  Module: isofs
  Networking: ppp dialup
  Kernel: 2.4.0-test11-pre7

[4.] Kernel version (from /proc/version):
  t77@darkstar:~$ cat /proc/version
  Linux version 2.4.0-test11 (t77@darkstar) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Sat Nov 18 16:23:40 EST 2000

[5.] Output of Oops message:

ksymoops 2.3.5 on i686 2.4.0-test11. Options used
  -V (default)
  -k /proc/ksyms (default)
  -l /proc/modules (default)
  -o /lib/modules/2.4.0-test11/ (default)
  -m /boot/System.map (specified)

Unable to handle kernel paging request at virtual address dfdfdfc4
c486d5a7
*pde =
Oops:
CPU:    0
EIP:    0010:[]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: dfdfdf00  ebx: c2976960  ecx: c1ddb800  edx: c23f5c00
esi: c1ddb800  edi: c1ddb821  ebp: c233fba0  esp: c15b9eb0
ds: 0018  es: 0018  ss: 0018
Process ls (pid: 229, stackpage=c15b9000)
Stack: c2976960 c486a2bf c1ddb800 c2976960 c27f8000 c10a9df0 c1b3d140 c2976960
       c1b3d140 0001 c01e1818 0022 0022 0b976960 0800 22994000
       c486a3dd c2976960 c1b3d140 c27f8000 c27f8400 fff4 c1b3d140
Call Trace: [] [] [] [] [] [] [] []
Code: 8b 90 c4 00 00 00 80 b8 b4 00 00 00 00 74 1e 68 00 10 00 00

>>EIP; c486d5a7 <[isofs]get_joliet_filename+13/87> <=
Trace; c486a2bf <[isofs]__module_using_checksums+bd/19e>
Trace; c486a3dd <[isofs]isofs_lookup+3d/88>
Trace; c013502b
Trace; c0135788
Trace; c0134dc7
Trace; c0135d90 <__user_walk+3c/58>
Trace; c0132a26
Trace; c0108daf
Code; c486d5a7
<[isofs]get_joliet_filename+13/87> <_EIP>:
Code; c486d5a7 <[isofs]get_joliet_filename+13/87> <=
   0: 8b 90 c4 00 00 00       movl 0xc4(%eax),%edx <=
Code; c486d5ad <[isofs]get_joliet_filename+19/87>
   6: 80 b8 b4 00 00 00 00    cmpb $0x0,0xb4(%eax)
Code; c486d5b4 <[isofs]get_joliet_filename+20/87>
   d: 74 1e                   je 2d <_EIP+0x2d> c486d5d4 <[isofs]get_joliet_filename+40/87>
Code; c486d5b6 <[isofs]get_joliet_filename+22/87>
   f: 68 00 10 00 00          pushl $0x1000

[6.] A small shell script or example program which triggers the problem (if possible):
none...

[7.] Environment

[7.1.] Software (add the output of the ver_linux script here):
t77@darkstar:~$ ver_linux
-- Versions installed: (if some fields are empty or look unusual then possibly you have very old versions)
Linux darkstar 2.4.0-test11 #1 Sat Nov 18 16:23:40 EST 2000 i686 unknown
Kernel modules found
Gnu C                   egcs-2.91.66
Binutils                2.9.1.0.25
Linux C Library..
Dynamic Linker (ld.so)  1.9.9
ls: /usr/lib/libg++.so: No such file or directory
Procps                  2.0.6
Mount                   2.10l
Net-tools               (2000-05-21)
Kbd                     0.99
Sh-utils                2.0

[7.2.] Processor information (from /proc/cpuinfo):
t77@darkstar:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 3
model name      : Pentium II (Klamath)
stepping        : 4
cpu MHz         : 233.000866
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
features        : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov mmx
bogomips        : 466.94

[7.3.]
Module information (from /proc/modules):
t77@darkstar:~$ cat /proc/modules
nls_cp950      98432  1 (autoclean)
sr_mod         12000  1 (autoclean)
cdrom          27360  0 (autoclean) [sr_mod]
isofs          18384  1 (autoclean)
ppp_deflate    40672  1 (autoclean)
bsd_comp        4160  0 (autoclean)
ipchains       31392  0 (unused)
ide-scsi        7984  1
scsi_mod       56640  2 [sr_mod ide-scsi]
emu10k1        45184  0
soundcore       3888  4 [emu10k1]
ppp_async       6512  1
ppp_generic    13056  2 [ppp_deflate bsd_comp ppp_async]
slhc            4688  1 [ppp_generic]

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem):
t77@darkstar:~$ cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f :
Re: [PATCH] Typo in test11-pre7 isofs/namei.c
Tom Leete wrote:
>
> Hi,
>
> The second and third arguments of get_joliet_filename() are swapped.
>
> Tom
>
> --- linux-2.4.0-test11/fs/isofs/namei.c.orig	Sat Nov 18 01:55:55 2000
> +++ linux-2.4.0-test11/fs/isofs/namei.c	Sat Nov 18 07:08:05 2000
> @@ -127,7 +127,7 @@
>  		dpnt = tmpname;
>  #ifdef CONFIG_JOLIET
>  	} else if (dir->i_sb->u.isofs_sb.s_joliet_level) {
> -		dlen = get_joliet_filename(de, dir, tmpname);
> +		dlen = get_joliet_filename(de, tmpname, dir);
>  		dpnt = tmpname;
>  #endif
>  	} else if (dir->i_sb->u.isofs_sb.s_mapping == 'a') {
Re: [PATCH] topology: Fix compilation warning when not in SMP
On 04/05/2014 01:49 AM, Greg Kroah-Hartman wrote:
> Warnings aren't a stable kernel issue, so why would this be relevant there?

Oh, sorry about that. I'll go re-read the stable kernel rules again. Shall I re-post without the stable Cc:, for only mainline and next?

Best regards,
V.
Re: [PATCH linux-next] staging: r8192ee: Adapt flush function prototype
On 06/20/2014 02:19 AM, Greg Kroah-Hartman wrote:
> (..) This doesn't apply as I think it's already done as part of a merge...

You are right, it seems to be in f9da455b93f6. Thanks for your concern!

Best regards,
V.
Linux next: boot on Wandboard broken by gic related change
Hi,

FYI, I noticed that Linux next would not boot anymore on the Wandboard i.MX6 quad, since next-20141127.

After bisecting for a while, `git bisect run` pointed at this very commit:

  9a1091ef0017c40ab63e7fc0326b2dcfd4dde3a4 irqchip: gic: Support hierarchy irq domain

Indeed, reverting this commit on top of Linux next-20141201 repairs the boot. I am afraid I cannot debug this, but I would gladly help test patches :)

Best regards,
V.
Re: KVM Disk i/o or VM activities causes soft lockup?
On Mon, Nov 26, 2012 at 2:58 AM, Stefan Hajnoczi wrote:
> On Fri, Nov 23, 2012 at 10:34:16AM -0800, Vincent Li wrote:
>> On Thu, Nov 22, 2012 at 11:29 PM, Stefan Hajnoczi wrote:
>> > On Wed, Nov 21, 2012 at 03:36:50PM -0800, Vincent Li wrote:
>> >> We have users running on a redhat based distro (kernel
>> >> 2.6.32-131.21.1.el6.x86_64) with kvm. When a customer made a cron job
>> >> script to copy large files between kvm guests, or some other user space
>> >> program leads to disk I/O or VM activity, users get the following soft
>> >> lockup message on the console:
>> >>
>> >> Nov 17 13:44:46 slot1/luipaard100a err kernel: BUG: soft lockup - CPU#4 stuck for 61s! [qemu-kvm:6795]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: Modules linked in: ebt_vlan nls_utf8 isofs ebtable_filter ebtables 8021q garp bridge stp llc ipt_REJECT iptable_filter xt_NOTRACK nf_conntrack iptable_raw ip_tables loop ext2 binfmt_misc hed womdict(U) vnic(U) parport_pc lp parport predis(U) lasthop(U) ipv6 toggler vhost_net tun kvm_intel kvm jiffies(U) sysstats hrsleep i2c_dev datastor(U) linux_user_bde(P)(U) linux_kernel_bde(P)(U) tg3 libphy serio_raw i2c_i801 i2c_core ehci_hcd raid1 raid0 virtio_pci virtio_blk virtio virtio_ring mvsas libsas scsi_transport_sas mptspi mptscsih mptbase scsi_transport_spi 3w_9xxx sata_svw(U) ahci serverworks sata_sil ata_piix libata sd_mod crc_t10dif amd74xx piix ide_gd_mod ide_core dm_snapshot dm_mirror dm_region_hash dm_log dm_mod ext3 jbd mbcache
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: Pid: 6795, comm: qemu-kvm Tainted: P 2.6.32-131.21.1.el6.f5.x86_64 #1
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: Call Trace:
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? get_timestamp+0x9/0xf
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? watchdog_timer_fn+0x130/0x178
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? __run_hrtimer+0xa3/0xff
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? hrtimer_interrupt+0xe6/0x190
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? hrtimer_interrupt+0xa9/0x190
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? hpet_interrupt_handler+0x26/0x2d
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? hrtimer_peek_ahead_timers+0x9/0xd
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? __do_softirq+0xc5/0x17a
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? call_softirq+0x1c/0x28
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? do_softirq+0x31/0x66
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? call_function_interrupt+0x13/0x20
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? vmx_get_msr+0x0/0x123 [kvm_intel]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? kvm_arch_vcpu_ioctl_run+0x80e/0xaf1 [kvm]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? kvm_arch_vcpu_ioctl_run+0x802/0xaf1 [kvm]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? inode_has_perm+0x65/0x72
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? kvm_vcpu_ioctl+0xf2/0x5ba [kvm]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? file_has_perm+0x9a/0xac
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? vfs_ioctl+0x21/0x6b
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? do_vfs_ioctl+0x487/0x4da
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? sys_ioctl+0x51/0x70
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? system_call_fastpath+0x3c/0x41
>> >
>> > This soft lockup is reported on the host?
>> >
>> > Stefan
>>
>> Yes, it is on the host. We just recommend users not to do large file
>> copying; just wondering if there is a potential kernel bug. It seems the
>> soft lockup backtrace is pointing to hrtimer and softirq. My naive
>> understanding is that the watchdog thread is on top of hrtimer, which is
>> on top of softirq.
>
> Since the soft lockup detector is firing on the host, this seems like a
> hardware/driver problem. Have you ever had soft lockups running non-KVM
> workloads on this host?
>
> Stefan

This soft lockup only triggers when running KVM. Also, users used another script in a cron job to restart 4 kvm instances every 5 minutes (insane to me), which also caused tons of soft lockup messages during kvm instance startup. We have already told the customer to stop doing that, and the soft lockup messages disappeared.

Vincent
Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
On 27 November 2012 06:19, Viresh Kumar wrote:
> Hi Tejun,
>
> On 26 November 2012 22:45, Tejun Heo wrote:
>> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:
>>
>> I'm pretty skeptical about this. queue_work() w/o explicit CPU
>> assignment has always guaranteed that the work item will be executed
>> on the same CPU. I don't think there are too many users depending on
>> that but am not sure at all that there are none. I asked you last
>> time that you would at the very least need to audit most users but it
>> doesn't seem like there has been any effort there.
>
> My bad. I completely missed/forgot that comment from your earlier mails.
> Will do it.
>
>> That said, if the obtained benefit is big enough, sure, we can
>> definitely change the behavior, which isn't all that great to begin
>> with, and try to shake out the bugs quickly by e.g. forcing all work
>> items to execute on different CPUs, but you're presenting a few
>> percent of work items being migrated to a different CPU from an
>> already active CPU, which doesn't seem like such a big benefit to me
>> even if the migration target CPU is somewhat more efficient. How much
>> powersaving are we talking about?
>
> Hmm.. I actually implemented the problem discussed here
> (I know you have seen this earlier :) ):
>
> http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf
>
> Specifically slides 12 & 19.
>
> I haven't done many power calculations with it and have tested it more
> from a functionality point of view.
>
> @Vincent: Can you add some comments here?

Sorry for this late reply.

We have faced some situations on TC2 (as an example) where the tasks are running in the LITTLE cluster whereas some periodic works stay on the big cluster, so we can have one cluster that wakes up for tasks and another one that wakes up for works. We would like to consolidate the behaviour of works with the behaviour of tasks.

Sorry, I don't have relevant figures, as the patches are used with other ones which also impact the power consumption.

This series introduces the possibility to run a work on another CPU, which is necessary if we want a better correlation of task and work scheduling on the system. Most of the time, queue_work() is used when a driver doesn't mind which CPU the work will run on, whereas it looks like it should be used only if you want to run locally. We would like to solve this point with the new interface that is proposed by Viresh.

Vincent

>
> --
> viresh
Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
On 27 November 2012 14:59, Steven Rostedt wrote:
> On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
>> On 27 November 2012 18:56, Steven Rostedt wrote:
>> > A couple of things. The sched_select_cpu() is not cheap. It has a double
>> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
>> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
>> > searching 1023 CPUs before we come up with our own.
>>
>> Not sure if you missed the first check in sched_select_cpu():
>>
>> +int sched_select_cpu(unsigned int sd_flags)
>> +{
>> +	/* If Current cpu isn't idle, don't migrate anything */
>> +	if (!idle_cpu(cpu))
>> +		return cpu;
>>
>> We aren't going to search if we aren't idle.
>
> OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
> heh, we are idle, we can spin. But then why go through this in the first
> place ;-)

By migrating it now, it will create its activity and wake up on the right CPU next time.

If migrating on any CPU seems a bit risky, we could restrict the migration to a CPU on the same node. We can pass such constraints to sched_select_cpu().

>
>> > Also, I really don't like this as a default behavior. It seems that this
>> > solution is for a very special case, and this can become very intrusive
>> > for the normal case.
>>
>> We tried with a KCONFIG option for it, which Tejun rejected.
>
> Yeah, I saw that. I don't like adding KCONFIG options either. Best is to
> get something working that doesn't add any regressions. If you can get
> this to work without making *any* regressions in the normal case then
> I'm totally fine with that. But if this adds any issues with the normal
> case, then it's a show stopper.
>
>> > To be honest, I'm uncomfortable with this approach. It seems to be
>> > fighting a symptom and not the disease. I'd rather find a way to keep
>> > work from being queued on the wrong CPU. If it is a timer, find a way to
>> > move the timer. If it is something else, let's work to fix that. Doing
>> > searches of possibly all CPUs (unlikely, but it is there), just seems
>> > wrong to me.
>>
>> As Vincent pointed out, on big LITTLE systems we just don't want to
>> serve works on big cores. That would be wasting too much power,
>> especially if we are going to wake up big cores.
>>
>> It would be difficult to control the source driver (which queues work) to
>> little cores. We thought, if somebody wanted to queue work on the current
>> cpu then they must use queue_work_on().
>
> As Tejun has mentioned earlier, is there any assumption anywhere that
> expects an unbounded work queue to not migrate? Where per cpu variables
> might be used. Tejun had a good idea of forcing this to migrate the work
> *every* time. To not let a work queue run on the same CPU that it was
> queued on. If it can survive that, then it is probably OK. Maybe add a
> config option that forces this? That way, anyone can test that this
> isn't an issue.
>
> -- Steve
Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
On 27 November 2012 16:04, Steven Rostedt wrote:
> On Tue, 2012-11-27 at 15:55 +0100, Vincent Guittot wrote:
>> On 27 November 2012 14:59, Steven Rostedt wrote:
>> > On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
>> >> On 27 November 2012 18:56, Steven Rostedt wrote:
>> >> > A couple of things. The sched_select_cpu() is not cheap. It has a double
>> >> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
>> >> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
>> >> > searching 1023 CPUs before we come up with our own.
>> >>
>> >> Not sure if you missed the first check in sched_select_cpu():
>> >>
>> >> +int sched_select_cpu(unsigned int sd_flags)
>> >> +{
>> >> +	/* If Current cpu isn't idle, don't migrate anything */
>> >> +	if (!idle_cpu(cpu))
>> >> +		return cpu;
>> >>
>> >> We aren't going to search if we aren't idle.
>> >
>> > OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
>> > heh, we are idle, we can spin. But then why go through this in the first
>> > place ;-)
>>
>> By migrating it now, it will create its activity and wake up on the
>> right CPU next time.
>>
>> If migrating on any CPU seems a bit risky, we could restrict the
>> migration to a CPU on the same node. We can pass such constraints to
>> sched_select_cpu().
>
> That's assuming that the CPUs stay idle. Now if we move the work to
> another CPU and it goes idle, then it may move that again. It could end
> up being a ping pong approach.
>
> I don't think idle is a strong enough heuristic for the general case. If
> interrupts are constantly going off on a CPU that happens to be idle
> most of the time, it will constantly be moving work onto CPUs that are
> currently doing real work, and by doing so, it will be slowing those
> CPUs down.

I agree that idle is probably not enough, but it's the heuristic that is currently used for selecting a CPU for a timer, and the timer also uses sched_select_cpu() in this series. So in order to go step by step, a common interface has been introduced for selecting a CPU, and this function uses the same algorithm as the timer already does. Once we agree on an interface, the heuristic can be updated.

> -- Steve
Re: [PATCH V3 3/3] mfd: stmpe: Update DT support in stmpe driver
2012/11/27 Viresh Kumar:
> On 27 November 2012 14:10, Lee Jones wrote:
>
> I haven't seen this in any of the SPEAr boards I have worked on. Maybe Rabin
> would have; that's why he added that part of the code :)
>
> @Rabin/Linus: Do you remember why you added this in the stmpe driver:
>
> +	if (stmpe->pdata->irq_invert_polarity)
> +		icr ^= STMPE_ICR_LSB_HIGH;
> +
>
> Does somebody actually need it?

It was (as irq_rev_pol) part of Luotao Fu's proposed STMPE811 patchset (https://patchwork.kernel.org/patch/106173/) which I integrated into my version of the STMPE driver, which didn't have it in its initial version (https://patchwork.kernel.org/patch/103273/). It's not something _I_ ever used.
[RFC 2/2] clk: per-user clock accounting for debug
When a clock has multiple users, the WARNING on an imbalance of enable/disable may not show the guilty party: although they may have committed the error earlier, the warning is emitted later, when some other user, presumably innocent, disables the clock. Provide per-user clock enable/disable accounting and disabler tracking in order to help debug these problems.

NOTE: with this patch, clk_get_parent() behaves like clk_get(), i.e. it needs to be matched with a clk_put(). Otherwise, memory will leak.

Signed-off-by: Rabin Vincent
---
 drivers/clk/clk-core.h      | 18 ++
 drivers/clk/clk.c           | 35 +--
 drivers/clk/clkdev.c        |  9 ++---
 include/linux/clk-private.h |  6 +-
 4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/drivers/clk/clk-core.h b/drivers/clk/clk-core.h
index 341ae45..c8259c2 100644
--- a/drivers/clk/clk-core.h
+++ b/drivers/clk/clk-core.h
@@ -4,11 +4,21 @@
 struct clk_core;
 
 #ifdef CONFIG_COMMON_CLK
-#define clk_to_clk_core(clk)	((struct clk_core *)(clk))
-#define clk_core_to_clk(core)	((struct clk *)(core))
+struct clk_core *clk_to_clk_core(struct clk *clk);
+struct clk *clk_core_to_clk(struct clk_core *clk_core, const char *dev,
+			    const char *con);
+
+static inline void clk_free_clk(struct clk *clk)
+{
+	kfree(clk);
+}
 #else
-#define clk_to_clk_core(clk)	((clk))
-#define clk_core_to_clk(core)	((core))
+#define clk_to_clk_core(clk)		((clk))
+#define clk_core_to_clk(core, dev, con)	((core))
+
+static inline void clk_free_clk(struct clk *clk)
+{
+}
 #endif
 
 #endif

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 1fb7043..57ba594 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -250,6 +250,27 @@ static int clk_disable_unused(void)
 }
 late_initcall(clk_disable_unused);
 
+struct clk *clk_core_to_clk(struct clk_core *clk_core, const char *dev,
+			    const char *con)
+{
+	struct clk *clk;
+
+	clk = kzalloc(sizeof(*clk), GFP_KERNEL);
+	if (!clk)
+		return ERR_PTR(-ENOMEM);
+
+	clk->core = clk_core;
+	clk->dev_id = dev;
+	clk->con_id = con;
+
+	return clk;
+}
+
+struct clk_core *clk_to_clk_core(struct clk *clk)
+{
+	return clk->core;
+}
+
 /*** helper functions ***/
 
 inline const char *__clk_get_name(struct clk_core *clk)
@@ -504,7 +525,15 @@ void clk_disable(struct clk *clk_user)
 	unsigned long flags;
 
 	spin_lock_irqsave(&enable_lock, flags);
-	__clk_disable(clk);
+	if (!WARN(clk_user->enable_count == 0,
+		  "incorrect disable clk dev %s con %s last disabler %pF\n",
+		  clk_user->dev_id, clk_user->con_id, clk_user->last_disable)) {
+
+		clk_user->last_disable = __builtin_return_address(0);
+		clk_user->enable_count--;
+
+		__clk_disable(clk);
+	}
 	spin_unlock_irqrestore(&enable_lock, flags);
 }
 EXPORT_SYMBOL_GPL(clk_disable);
@@ -559,6 +588,8 @@ int clk_enable(struct clk *clk_user)
 
 	spin_lock_irqsave(&enable_lock, flags);
 	ret = __clk_enable(clk);
+	if (!ret)
+		clk_user->enable_count++;
 	spin_unlock_irqrestore(&enable_lock, flags);
 
 	return ret;
@@ -976,7 +1007,7 @@ struct clk *clk_get_parent(struct clk *clk_user)
 	parent = __clk_get_parent(clk);
 	mutex_unlock(&prepare_lock);
 
-	return clk_core_to_clk(parent);
+	return clk_core_to_clk(parent, clk_user->dev_id, clk_user->con_id);
 }
 EXPORT_SYMBOL_GPL(clk_get_parent);

diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index 5ddcaf1..1321b7c 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -43,7 +43,7 @@ struct clk *of_clk_get(struct device_node *np, int index)
 	clk = of_clk_get_from_provider(&clkspec);
 	of_node_put(clkspec.np);
 
-	return clk_core_to_clk(clk);
+	return clk_core_to_clk(clk, np->full_name, NULL);
 }
 EXPORT_SYMBOL(of_clk_get);
@@ -151,7 +151,7 @@ struct clk *clk_get_sys(const char *dev_id, const char *con_id)
 	if (!cl)
 		return ERR_PTR(-ENOENT);
 
-	return clk_core_to_clk(cl->clk);
+	return clk_core_to_clk(cl->clk, dev_id, con_id);
 }
 EXPORT_SYMBOL(clk_get_sys);
@@ -172,7 +172,10 @@ EXPORT_SYMBOL(clk_get);
 void clk_put(struct clk *clk)
 {
-	__clk_put(clk_to_clk_core(clk));
+	clk_core_t *core = clk_to_clk_core(clk);
+
+	clk_free_clk(clk);
+	__clk_put(core);
 }
 EXPORT_SYMBOL(clk_put);

diff --git a/include/linux/clk-private.h b/include/linux/clk-private.h
index e5b766e..406c951 100644
--- a/include/linux/clk-private.h
+++ b/include/linux/clk-private.h
@@ -47,7 +47,11 @@ struct clk_core {
 };
 
 struct clk {
-	struct clk_core clk;
+	struct clk_core *core;
+	unsigned int enable_cou
Re: [RFC 1/2] clk: use struct clk only for external API
2012/11/28 viresh kumar:
> On Wed, Nov 28, 2012 at 9:31 PM, viresh kumar wrote:
>> On Wed, Nov 28, 2012 at 5:22 PM, Rabin Vincent wrote:
>>
>> Isn't something wrong here? For the common clk case shouldn't
>> this be:
>>
>>> +#define clk_to_clk_core(clk)	(&clk->clk)
>>> +#define clk_core_to_clk(core)	(container_of(clk, ...)) // not getting into the exact format here
>>
>> Sorry if I am missing basics.
>
> Ok. I saw these getting updated in 2/2. But it means this individual patch
> is broken, and that is not allowed, I believe.

It would be better to use container_of() / &clk->clk, yes. I wouldn't really describe it as "broken", though, since it works fine as it is: the clk_core is the first and only element. I will change it anyway.
[PATCH v5] sched: fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, the nr_busy_cpus field of the sched_group_power struct is not null when the platform is fully idle. The root cause: during the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus field is initialized later, with the assumption that all CPUs are in the busy state, whereas some CPUs have already set their NOHZ_IDLE flag.

More generally, the NOHZ_IDLE flag must be initialized when new sched_domains are created, in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned.

This condition can be ensured by adding a synchronize_rcu() between the destruction of old sched_domains and the creation of new ones, so the NOHZ_IDLE flag will not be updated with an old sched_domain once it has been initialized. But this solution introduces an additional latency in the rebuild sequence that is called during cpu hotplug.

As suggested by Frederic Weisbecker, another solution is to have the same RCU lifecycle for both the NOHZ_IDLE flag and the sched_domain struct. I have introduced a new sched_domain_rq struct that is the entry point for both sched_domains and objects that must follow the same lifecycle, like the NOHZ_IDLE flag. They will share the same RCU lifecycle and will always be synchronized.

The synchronization is done at the cost of:
 - an additional indirection for accessing the first sched_domain level
 - an additional indirection and a rcu_dereference before accessing the NOHZ_IDLE flag

Changes since v4:
 - link both sched_domain and NOHZ_IDLE flag in one RCU object so their
   states are always synchronized.
Change since V3; - NOHZ flag is not cleared if a NULL domain is attached to the CPU - Remove patch 2/2 which becomes useless with latest modifications Change since V2: - change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state Change since V1: - remove the patch for SCHED softirq on an idle core use case as it was a side effect of the other use cases. Signed-off-by: Vincent Guittot --- include/linux/sched.h |6 +++ kernel/sched/core.c | 105 - kernel/sched/fair.c | 35 +++-- kernel/sched/sched.h | 24 +-- 4 files changed, 145 insertions(+), 25 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index d35d2b6..2a52188 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -959,6 +959,12 @@ struct sched_domain { unsigned long span[0]; }; +struct sched_domain_rq { + struct sched_domain *sd; + unsigned long flags; + struct rcu_head rcu;/* used during destruction */ +}; + static inline struct cpumask *sched_domain_span(struct sched_domain *sd) { return to_cpumask(sd->span); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7f12624..69e2313 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5602,6 +5602,15 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu) destroy_sched_domain(sd, cpu); } +static void destroy_sched_domain_rq(struct sched_domain_rq *sd_rq, int cpu) +{ + if (!sd_rq) + return; + + destroy_sched_domains(sd_rq->sd, cpu); + kfree_rcu(sd_rq, rcu); +} + /* * Keep a special pointer to the highest sched_domain that has * SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this @@ -5632,10 +5641,23 @@ static void update_top_cache_domain(int cpu) * hold the hotplug lock. 
*/ static void -cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) +cpu_attach_domain(struct sched_domain_rq *sd_rq, struct root_domain *rd, + int cpu) { struct rq *rq = cpu_rq(cpu); - struct sched_domain *tmp; + struct sched_domain_rq *tmp_rq; + struct sched_domain *tmp, *sd = NULL; + + /* +* If we don't have any sched_domain and associated object, we can +* directly jump to the attach sequence otherwise we try to degenerate +* the sched_domain +*/ + if (!sd_rq) + goto attach; + + /* Get a pointer to the 1st sched_domain */ + sd = sd_rq->sd; /* Remove the sched domains which do not contribute to scheduling. */ for (tmp = sd; tmp; ) { @@ -5658,14 +5680,17 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) destroy_sched_domain(tmp, cpu); if (sd) sd->child = NULL; + /* update sched_domain_rq */ + sd_rq->sd = sd; } +attach: sched_domain_debug(sd, cpu); rq_attach_root(rq, rd); - tmp = rq->sd; - rcu_assign_pointer(rq->sd, sd); - destroy_sched_domains(tmp, cpu); + tmp_rq = rq->sd_r
Re: [PATCH] usb: Make USB persist default configurable
On Tue, Mar 19, 2013 at 7:56 AM, Alan Stern wrote: > > On Mon, 18 Mar 2013, Greg Kroah-Hartman wrote: > > > On Mon, Mar 18, 2013 at 05:02:19PM -0700, Julius Werner wrote: > > > > Why can't you just revert this in userspace? Isn't that easier than > > > > doing a kernel patch and providing an option that we need to now > > > > maintain for pretty much forever? > > > > > > I could solve it in userspace, but that really feels like a hacky > > > workaround and not a long term solution. It would mean that every new > > > device starts with persist enabled and stays that way for a few > > > milliseconds (maybe up to seconds if it's connected on boot), until > > > userspace gets around to disable it again... opening the possibility > > > for very weird race conditions and bugs with drivers/devices that > > > don't work with persist. > > > > What drivers/devices don't work with persist? We need to know that now, > > otherwise all other distros and users have problems, right? > > > > > This default is a policy that really resides in the kernel, it has > > > changed in the past, and since there is no definitive better choice > > > for all cases I thought making it configurable is the right thing to > > > do. > > > > Too many options can be a bad thing. > > > > I think Alan made this a "always on" option, so I'd like to get his > > opinion on it. Alan? > > Originally the "persist" attribute defaulted to "off". Linus disliked > this (at least, he disliked it for mass-storage devices) and so at his > request the default was changed to "on". There didn't seem to be any > reason to treat other devices differently from mass-storage devices; > consequently the default is now "on" for everything. > > Julius's commit message mentions the disadvantage of "persist": Resume > times can be increased. But it doesn't mention the chief advantage: > Filesystems stored on USB devices are much less likely to be lost > across suspends. 
> The races mentioned above don't seem to be very dangerous. How likely
> is it that the system will be suspended within a few milliseconds of
> probing a new USB device?

For laptops, if the suspend/resume is triggered by the lid open/close detection, this is somewhat likely and has bitten us in the past: the classical use case I have encountered is a back-to-back suspend triggered by the user opening the lid then closing it again in the next 2 or 3 seconds because he has changed his mind (damn user...). It might also be triggered by a lid hall sensor missing proper debouncing (but in that case, the mechanical time constant is often shorter than the latency of resuming USB devices).

> As for buggy devices and drivers that can't handle persist, we have
> better ways of dealing with them. Buggy devices can get a quirk flag
> (USB_QUIRK_RESET). Buggy drivers should be fixed.
>
> Alan Stern

--
Vincent
Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On 1 February 2013 19:03, Frederic Weisbecker wrote: > 2013/1/29 Vincent Guittot : >> On my smp platform which is made of 5 cores in 2 clusters,I have the >> nr_busy_cpu field of sched_group_power struct that is not null when the >> platform is fully idle. The root cause seems to be: >> During the boot sequence, some CPUs reach the idle loop and set their >> NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus >> field is initialized later with the assumption that all CPUs are in the busy >> state whereas some CPUs have already set their NOHZ_IDLE flag. >> We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized in order to >> have a coherent configuration. >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/core.c |1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 257002c..fd41924 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct >> sched_domain *sd) >> >> update_group_power(sd, cpu); >> atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); >> + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > > So that's a real issue indeed. nr_busy_cpus was never correct. > > Now I'm still a bit worried with this solution. What if an idle task > started in smp_init() has not yet stopped its tick, but is about to do > so? The domains are not yet available to the task but the nohz flags > are. When it later restarts the tick, it's going to erroneously > increase nr_busy_cpus. My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in init_sched_groups_power instead of setting them as it is done now. If a CPU enters idle during the init sequence, the flag is already cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared while a NULL sched_domain is attached to the CPU thanks to patch 2. This should solve all use cases ? > > It probably won't happen in practice. 
But then there is more: sched > domains can be concurrently rebuild anytime, right? So what if we > call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the > domain is switched concurrently. Are we having a new sched group along > the way? If so we have a bug here as well because we can have > NOHZ_IDLE set but nr_busy_cpus accounting the CPU. When the sched_domain are rebuilt, we set a null sched_domain during the rebuild sequence and a new sched_group_power is created as well > > May be we need to set the per cpu nohz flags on the child leaf sched > domain? This way it's initialized and stored on the same RCU pointer > and we nohz_flags and nr_busy_cpus become sync. > > Also we probably still need the first patch of your previous round. > Because the current patch may introduce situations where we have idle > CPUs with NOHZ_IDLE flags cleared. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched: fix wrong rq's runnable_avg update with rt task
When a RT task is scheduled on an idle CPU, the update of the rq's load is not done because CFS's functions are not called. Then, the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update, was only running time. The rq's load of a CPU that only runs a periodic RT task, is close to LOAD_AVG_MAX whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function so the elapsed time will be accounted as idle time in the rq's load. Signed-off-by: Vincent Guittot --- kernel/sched/core.c |3 +++ kernel/sched/fair.c | 10 ++ kernel/sched/sched.h |5 + 3 files changed, 18 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 26058d0..592e06c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2927,6 +2927,9 @@ need_resched: pre_schedule(rq, prev); + if (unlikely(prev == rq->idle)) + idle_exit(cpu, rq); + if (unlikely(!rq->nr_running)) idle_balance(cpu, rq); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5eea870..520fe55 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1562,6 +1562,16 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); } /* migrations, e.g. sleep=0 leave decay_count == 0 */ } + +/* + * Update the rq's load with the elapsed idle time before a task is + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will + * be the only way to update the runnable statistic. 
+ */ +void idle_exit(int this_cpu, struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 0); +} #else static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) {} diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index fc88644..9707092 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -877,6 +877,7 @@ extern const struct sched_class idle_sched_class; extern void trigger_load_balance(struct rq *rq, int cpu); extern void idle_balance(int this_cpu, struct rq *this_rq); +extern void idle_exit(int this_cpu, struct rq *this_rq); #else /* CONFIG_SMP */ @@ -884,6 +885,10 @@ static inline void idle_balance(int cpu, struct rq *rq) { } +static inline void idle_exit(int this_cpu, struct rq *this_rq) +{ +} + #endif extern void sysrq_sched_debug_show(void); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix wrong rq's runnable_avg update with rt task
On 8 February 2013 15:46, Steven Rostedt wrote: > On Fri, 2013-02-08 at 12:11 +0100, Vincent Guittot wrote: >> When a RT task is scheduled on an idle CPU, the update of the rq's load is >> not done because CFS's functions are not called. Then, the idle_balance, >> which is called just before entering the idle function, updates the >> rq's load and makes the assumption that the elapsed time since the last >> update, was only running time. >> >> The rq's load of a CPU that only runs a periodic RT task, is close to >> LOAD_AVG_MAX whatever the running duration of the RT task is. >> >> A new idle_exit function is called when the prev task is the idle function >> so the elapsed time will be accounted as idle time in the rq's load. >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/core.c |3 +++ >> kernel/sched/fair.c | 10 ++ >> kernel/sched/sched.h |5 + >> 3 files changed, 18 insertions(+) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 26058d0..592e06c 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -2927,6 +2927,9 @@ need_resched: >> >> pre_schedule(rq, prev); >> >> + if (unlikely(prev == rq->idle)) >> + idle_exit(cpu, rq); >> + > > Let's get rid of the added junk in the core code that should be isolated > in the idle code. > i agree > I posted these patches before, and I'm about to post again: > > https://lkml.org/lkml/2012/12/21/378 > https://lkml.org/lkml/2012/12/21/377 > > I'm working to clean these patches up today and post them again. Would > working on top of these work for you? yes for sure. I will move that code in pre_schedule Vincent > > -- Steve > > >> if (unlikely(!rq->nr_running)) >> idle_balance(cpu, rq); >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On 8 February 2013 16:35, Frederic Weisbecker wrote: > 2013/2/4 Vincent Guittot : >> On 1 February 2013 19:03, Frederic Weisbecker wrote: >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >>>> index 257002c..fd41924 100644 >>>> --- a/kernel/sched/core.c >>>> +++ b/kernel/sched/core.c >>>> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct >>>> sched_domain *sd) >>>> >>>> update_group_power(sd, cpu); >>>> atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); >>>> + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); >>> >>> So that's a real issue indeed. nr_busy_cpus was never correct. >>> >>> Now I'm still a bit worried with this solution. What if an idle task >>> started in smp_init() has not yet stopped its tick, but is about to do >>> so? The domains are not yet available to the task but the nohz flags >>> are. When it later restarts the tick, it's going to erroneously >>> increase nr_busy_cpus. >> >> My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in >> init_sched_groups_power instead of setting them as it is done now. If >> a CPU enters idle during the init sequence, the flag is already >> cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared >> while a NULL sched_domain is attached to the CPU thanks to patch 2. >> This should solve all use cases ? > > This may work on smp_init(). But the per cpu domain can be changed > concurrently > anytime on cpu hotplug, with a new sched group power struct, right? During a cpu hotplug, a null domain is attached to each CPU of the partition because we have to build new sched_domains so we have a similar behavior than smp_init. So if we clear NOHZ_IDLE flag and nr_busy_cpus in init_sched_groups_power, we should be safe for init and hotplug. 
More generally speaking, if the sched_domains of a group of CPUs must be rebuilt, a NULL sched_domain is attached to these CPUs during the build.

> What if the following happens (inventing function names but you get the idea):
>
>     CPU 0                                CPU 1
>
>     dom = new_domain(...) {
>         nr_cpus_busy = 0;
>         set_idle(CPU 1);
>                                          old_dom = get_dom()
>                                          clear_idle(CPU 1)
>     }
>     rcu_assign_pointer(cpu1_dom, dom);
>
> Can this scenario happen?

This scenario will be:

    CPU 0                                CPU 1

    detach_and_destroy_domain {
        rcu_assign_pointer(cpu1_dom, NULL);
    }
    dom = new_domain(...) {
        nr_cpus_busy = 0;
        set_idle(CPU 1);
                                         old_dom = get_dom()
                                         old_dom is null
                                         /* clear_idle(CPU 1) can't happen
                                          * because a null domain is attached,
                                          * so we will never call
                                          * nohz_kick_needed, which is the
                                          * only place where we can
                                          * clear_idle */
    }
    rcu_assign_pointer(cpu1_dom, dom);

>>> It probably won't happen in practice. But then there is more: sched
>>> domains can be concurrently rebuilt anytime, right? So what if we
>>> call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the
>>> domain is switched concurrently. Are we having a new sched group along
>>> the way? If so we have a bug here as well because we can have
>>> NOHZ_IDLE set but nr_busy_cpus accounting the CPU.
>>
>> When the sched_domains are rebuilt, we set a null sched_domain during
>> the rebuild sequence and a new sched_group_power is created as well
>
> So at that time we may race with a CPU setting/clearing its NOHZ_IDLE flag
> as in my above scenario?

Unless I have missed a use case, we always have a null domain attached to a CPU while we build the new one. So patch 2/2 should protect us against clearing NOHZ_IDLE while the new nr_busy_cpus is not yet attached.
I'm going to send a new version which sets the NOHZ_IDLE bit and clears nr_busy_cpus during the build of a sched_domain.

Vincent
[PATCH v3 2/2] sched: fix update NOHZ_IDLE flag
The function nohz_kick_needed modifies the NOHZ_IDLE flag that is used to update the nr_busy_cpus field of the sched_group. When the sched_domains are updated (during boot or because of the unplug of a CPU, as an example), a null_domain is attached to the CPUs. We have to test likely(!on_null_domain(cpu)) first in order to detect such an initialization step and not modify the NOHZ_IDLE flag.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5eea870..dac2edf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5695,7 +5695,7 @@ void trigger_load_balance(struct rq *rq, int cpu)
 	    likely(!on_null_domain(cpu)))
 		raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ
-	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
+	if (likely(!on_null_domain(cpu)) && nohz_kick_needed(rq, cpu))
 		nohz_balancer_kick(cpu);
 #endif
 }
-- 
1.7.9.5
[PATCH v3 0/2] sched: fix nr_busy_cpus
The nr_busy_cpus field of the sched_group_power is sometimes different from 0 whereas the platform is fully idle. This series fixes 3 use cases:
- when some CPUs enter idle state while booting all CPUs
- when a CPU is unplugged and/or replugged

Change since V2:
- change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state

Change since V1:
- remove the patch for the SCHED softirq on an idle core use case as it was a side effect of the other use cases.

Vincent Guittot (2):
  sched: fix init NOHZ_IDLE flag
  sched: fix update NOHZ_IDLE flag

 kernel/sched/core.c |4 +++-
 kernel/sched/fair.c |2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

-- 
1.7.9.5
[PATCH v3 1/2] sched: fix init NOHZ_IDLE flag
On my smp platform which is made of 5 cores in 2 clusters, I have the nr_busy_cpu field of sched_group_power struct that is not null when the platform is fully idle. The root cause seems to be: During the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state whereas some CPUs have already set their NOHZ_IDLE flag. We set the NOHZ_IDLE flag when nr_busy_cpus is initialized to 0 in order to have a coherent configuration. The patch 2/2 protects this init against an update of NOHZ_IDLE flag because a NULL sched_domain is attached to the CPU during the build of the new sched_domain so nohz_kick_needed and set_cpu_sd_state_busy are not called and can't clear the NOHZ_IDLE flag Signed-off-by: Vincent Guittot --- kernel/sched/core.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 26058d0..c730a4e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5884,7 +5884,9 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd) return; update_group_power(sd, cpu); - atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); + atomic_set(&sg->sgp->nr_busy_cpus, 0); + set_bit(NOHZ_IDLE, nohz_flags(cpu)); + } int __weak arch_sd_sibling_asym_packing(void) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
Hi Srivatsa,

I can try to run some of our stress tests on your patches. Have you got a git tree that I can pull?

Regards,
Vincent

On 8 February 2013 19:09, Srivatsa S. Bhat wrote:
> On 02/08/2013 10:14 PM, Srivatsa S. Bhat wrote:
>> On 02/08/2013 09:11 PM, Russell King - ARM Linux wrote:
>>> On Thu, Feb 07, 2013 at 11:41:34AM +0530, Srivatsa S. Bhat wrote:
>>>> On 02/07/2013 09:44 AM, Rusty Russell wrote:
>>>>> "Srivatsa S. Bhat" writes:
>>>>>> On 01/22/2013 01:03 PM, Srivatsa S. Bhat wrote:
>>>>>> Avg. latency of 1 CPU offline (ms) [stop-cpu/stop-m/c latency]
>>>>>>
>>>>>> # online CPUs   Mainline (with stop-m/c)   This patchset (no stop-m/c)
>>>>>>        8                17.04                        7.73
>>>>>>       16                18.05                        6.44
>>>>>>       32                17.31                        7.39
>>>>>>       64                32.40                        9.28
>>>>>>      128                98.23                        7.35
>>>>>
>>>>> Nice!
>>>>
>>>> Thank you :-)
>>>>
>>>>> I wonder how the ARM guys feel with their quad-cpu systems...
>>>>
>>>> That would be definitely interesting to know :-)
>>>
>>> That depends what exactly you'd like tested (and how) and whether you'd
>>> like it to be a test-chip based quad core, or an OMAP dual-core SoC.
>>
>> The effect of stop_machine() doesn't really depend on the CPU architecture
>> used underneath or the platform. It depends only on the _number_ of
>> _logical_ CPUs used.
>>
>> And stop_machine() has 2 noticeable drawbacks:
>> 1. It makes the hotplug operation itself slow
>> 2. and it causes disruptions to the workloads running on the other
>> CPUs by hijacking the entire machine for significant amounts of time.
>>
>> In my experiments (mentioned above), I tried to measure how my patchset
>> improves (reduces) the duration of hotplug (CPU offline) itself, which is
>> also slightly indicative of the impact it has on the rest of the system.
>>
>> But what would be nice to test is a setup where the workloads running on
>> the rest of the system are latency-sensitive, and measure the impact of
>> CPU offline on them, with this patchset applied.
That would tell us how >> far is this useful in making CPU hotplug less disruptive on the system. >> >> Of course, it would be nice to also see whether we observe any reduction >> in hotplug duration itself (point 1 above) on ARM platforms with lot >> of CPUs. [This could potentially speed up suspend/resume, which is used >> rather heavily on ARM platforms]. >> >> The benefits from this patchset over mainline (both in terms of points >> 1 and 2 above) is expected to increase, with increasing number of CPUs in >> the system. >> > > Adding Vincent to CC, who had previously evaluated the performance and > latency implications of CPU hotplug on ARM platforms, IIRC. > > Regards, > Srivatsa S. Bhat > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched: fix env->src_cpu for active migration
need_active_balance uses env->src_cpu, which is set only if there is more than 1 task on the run queue. We must set the src_cpu field unconditionally, otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is only 1 task on the run queue.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..32938ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5044,6 +5044,10 @@ redo:
 
 	ld_moved = 0;
 	lb_iterations = 1;
+
+	env.src_cpu = busiest->cpu;
+	env.src_rq = busiest;
+
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -5052,8 +5056,6 @@ redo:
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu = busiest->cpu;
-		env.src_rq = busiest;
 		env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
 
 		update_h_load(env.src_cpu);
-- 
1.7.9.5
[PATCH v2] sched: fix wrong rq's runnable_avg update with rt task
When a RT task is scheduled on an idle CPU, the update of the rq's load is not done because CFS's functions are not called. Then, the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update, was only running time. The rq's load of a CPU that only runs a periodic RT task, is close to LOAD_AVG_MAX whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function so the elapsed time will be accounted as idle time in the rq's load. Changes since V1: - move code out of schedule function and create a pre_schedule callback for idle class instead. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 10 ++ kernel/sched/idle_task.c |7 +++ kernel/sched/sched.h |5 + 3 files changed, 22 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..60951f1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1562,6 +1562,16 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); } /* migrations, e.g. sleep=0 leave decay_count == 0 */ } + +/* + * Update the rq's load with the elapsed idle time before a task is + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will + * be the only way to update the runnable statistic. 
+ */ +void idle_exit(int this_cpu, struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 0); +} #else static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) {} diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index b6baf37..27cd379 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c @@ -13,6 +13,12 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) { return task_cpu(p); /* IDLE tasks as never migrated */ } + +static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) +{ + /* Update rq's load with elapsed idle time */ + idle_exit(smp_processor_id(), rq); +} #endif /* CONFIG_SMP */ /* * Idle tasks are unconditionally rescheduled: @@ -86,6 +92,7 @@ const struct sched_class idle_sched_class = { #ifdef CONFIG_SMP .select_task_rq = select_task_rq_idle, + .pre_schedule = pre_schedule_idle, #endif .set_curr_task = set_curr_task_idle, diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index fc88644..9707092 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -877,6 +877,7 @@ extern const struct sched_class idle_sched_class; extern void trigger_load_balance(struct rq *rq, int cpu); extern void idle_balance(int this_cpu, struct rq *this_rq); +extern void idle_exit(int this_cpu, struct rq *this_rq); #else /* CONFIG_SMP */ @@ -884,6 +885,10 @@ static inline void idle_balance(int cpu, struct rq *rq) { } +static inline void idle_exit(int this_cpu, struct rq *this_rq) +{ +} + #endif extern void sysrq_sched_debug_show(void); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] sched: fix wrong rq's runnable_avg update with rt task
On 12 February 2013 14:23, Vincent Guittot wrote: > When a RT task is scheduled on an idle CPU, the update of the rq's load is > not done because CFS's functions are not called. Then, the idle_balance, > which is called just before entering the idle function, updates the > rq's load and makes the assumption that the elapsed time since the last > update, was only running time. > > The rq's load of a CPU that only runs a periodic RT task, is close to > LOAD_AVG_MAX whatever the running duration of the RT task is. > > A new idle_exit function is called when the prev task is the idle function > so the elapsed time will be accounted as idle time in the rq's load. > > Changes since V1: > - move code out of schedule function and create a pre_schedule callback for > idle class instead. Hi Steve, I have pushed a new version of my patch to have comments about the proposed solution but I will rebase it on top of your work when available Vincent > > Signed-off-by: Vincent Guittot > --- > kernel/sched/fair.c | 10 ++ > kernel/sched/idle_task.c |7 +++ > kernel/sched/sched.h |5 + > 3 files changed, 22 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 81fa536..60951f1 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1562,6 +1562,16 @@ static inline void dequeue_entity_load_avg(struct > cfs_rq *cfs_rq, > se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); > } /* migrations, e.g. sleep=0 leave decay_count == 0 */ > } > + > +/* > + * Update the rq's load with the elapsed idle time before a task is > + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will > + * be the only way to update the runnable statistic. 
> + */ > +void idle_exit(int this_cpu, struct rq *this_rq) > +{ > + update_rq_runnable_avg(this_rq, 0); > +} > #else > static inline void update_entity_load_avg(struct sched_entity *se, > int update_cfs_rq) {} > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c > index b6baf37..27cd379 100644 > --- a/kernel/sched/idle_task.c > +++ b/kernel/sched/idle_task.c > @@ -13,6 +13,12 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, > int flags) > { > return task_cpu(p); /* IDLE tasks as never migrated */ > } > + > +static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) > +{ > + /* Update rq's load with elapsed idle time */ > + idle_exit(smp_processor_id(), rq); > +} > #endif /* CONFIG_SMP */ > /* > * Idle tasks are unconditionally rescheduled: > @@ -86,6 +92,7 @@ const struct sched_class idle_sched_class = { > > #ifdef CONFIG_SMP > .select_task_rq = select_task_rq_idle, > + .pre_schedule = pre_schedule_idle, > #endif > > .set_curr_task = set_curr_task_idle, > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index fc88644..9707092 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -877,6 +877,7 @@ extern const struct sched_class idle_sched_class; > > extern void trigger_load_balance(struct rq *rq, int cpu); > extern void idle_balance(int this_cpu, struct rq *this_rq); > +extern void idle_exit(int this_cpu, struct rq *this_rq); > > #else /* CONFIG_SMP */ > > @@ -884,6 +885,10 @@ static inline void idle_balance(int cpu, struct rq *rq) > { > } > > +static inline void idle_exit(int this_cpu, struct rq *this_rq) > +{ > +} > + > #endif > > extern void sysrq_sched_debug_show(void); > -- > 1.7.9.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] sched: fix wrong rq's runnable_avg update with rt task
On 12 February 2013 15:53, Steven Rostedt wrote: > On Tue, 2013-02-12 at 14:23 +0100, Vincent Guittot wrote: >> .set_curr_task = set_curr_task_idle, >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h >> index fc88644..9707092 100644 >> --- a/kernel/sched/sched.h >> +++ b/kernel/sched/sched.h >> @@ -877,6 +877,7 @@ extern const struct sched_class idle_sched_class; >> >> extern void trigger_load_balance(struct rq *rq, int cpu); >> extern void idle_balance(int this_cpu, struct rq *this_rq); >> +extern void idle_exit(int this_cpu, struct rq *this_rq); >> >> #else/* CONFIG_SMP */ >> >> @@ -884,6 +885,10 @@ static inline void idle_balance(int cpu, struct rq *rq) >> { >> } >> >> +static inline void idle_exit(int this_cpu, struct rq *this_rq) >> +{ >> +} >> + > > Is this part needed? I don't see it ever called when !CONFIG_SMP. no I forgot to remove it Vincent > > -- Steve > >> #endif >> >> extern void sysrq_sched_debug_show(void); > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v6 03/21] sched: only count runnable avg on cfs_rq's nr_running
On 30 March 2013 15:34, Alex Shi wrote: > Old function count the runnable avg on rq's nr_running even there is > only rt task in rq. That is incorrect, so correct it to cfs_rq's > nr_running. > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 2881d42..026e959 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2829,7 +2829,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, > int flags) > } > > if (!se) { > - update_rq_runnable_avg(rq, rq->nr_running); > + update_rq_runnable_avg(rq, rq->cfs.nr_running); A RT task that preempts your CFS task will be accounted in the runnable_avg fields. So whatever you do, RT task will impact your runnable_avg statistics. Instead of trying to get only CFS tasks, you should take into account all tasks activity in the rq. Vincent > inc_nr_running(rq); > } > hrtick_update(rq); > -- > 1.7.12 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v6 10/21] sched: get rq potential maximum utilization
On 30 March 2013 15:34, Alex Shi wrote: > Since the rt task priority is higher than fair tasks, cfs_rq utilization > is just the left of rt utilization. > > When there are some cfs tasks in queue, the potential utilization may > be yielded, so mulitiplying cfs task number to get max potential > utilization of cfs. Then the rq utilization is sum of rt util and cfs > util. > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index ae87dab..0feeaee 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3350,6 +3350,22 @@ struct sg_lb_stats { > unsigned int group_util;/* sum utilization of group */ > }; > > +static unsigned long scale_rt_util(int cpu); > + > +static unsigned int max_rq_util(int cpu) > +{ > + struct rq *rq = cpu_rq(cpu); > + unsigned int rt_util = scale_rt_util(cpu); > + unsigned int cfs_util; > + unsigned int nr_running; > + > + cfs_util = (FULL_UTIL - rt_util) > rq->util ? rq->util > + : (FULL_UTIL - rt_util); rt_util and rq->util don't use the same computation algorithm so the results are hardly comparable or addable. In addition, some RT tasks can have impacted the rq->util, so they will be accounted in both side. Vincent > + nr_running = rq->nr_running ? rq->nr_running : 1; > + > + return rt_util + cfs_util * nr_running; > +} > + > /* > * sched_balance_self: balance the current task (running on cpu) in domains > * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and > -- > 1.7.12 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH Resend v5] sched: fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, I have the nr_busy_cpus field of the sched_group_power struct that is not null when the platform is fully idle. The root cause is: during the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state, whereas some CPUs have already set their NOHZ_IDLE flag. More generally, the NOHZ_IDLE flag must be initialized when new sched_domains are created in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned. This condition can be ensured by adding a synchronize_rcu between the destruction of old sched_domains and the creation of new ones, so the NOHZ_IDLE flag will not be updated with an old sched_domain once it has been initialized. But this solution introduces an additional latency in the rebuild sequence that is called during cpu hotplug. As suggested by Frederic Weisbecker, another solution is to have the same RCU lifecycle for both the NOHZ_IDLE flag and the sched_domain struct. I have introduced a new sched_domain_rq struct that is the entry point for both sched_domains and objects that must follow the same lifecycle, like the NOHZ_IDLE flags. They will share the same RCU lifecycle and will always be synchronized. The synchronization is done at the cost of: - an additional indirection for accessing the first sched_domain level - an additional indirection and a rcu_dereference before accessing the NOHZ_IDLE flag. Changes since v4: - link both sched_domain and NOHZ_IDLE flag in one RCU object so their states are always synchronized. 
Change since V3; - NOHZ flag is not cleared if a NULL domain is attached to the CPU - Remove patch 2/2 which becomes useless with latest modifications Change since V2: - change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state Change since V1: - remove the patch for SCHED softirq on an idle core use case as it was a side effect of the other use cases. Signed-off-by: Vincent Guittot --- include/linux/sched.h |6 +++ kernel/sched/core.c | 105 - kernel/sched/fair.c | 35 +++-- kernel/sched/sched.h | 24 +-- 4 files changed, 145 insertions(+), 25 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index d35d2b6..2a52188 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -959,6 +959,12 @@ struct sched_domain { unsigned long span[0]; }; +struct sched_domain_rq { + struct sched_domain *sd; + unsigned long flags; + struct rcu_head rcu;/* used during destruction */ +}; + static inline struct cpumask *sched_domain_span(struct sched_domain *sd) { return to_cpumask(sd->span); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7f12624..69e2313 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5602,6 +5602,15 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu) destroy_sched_domain(sd, cpu); } +static void destroy_sched_domain_rq(struct sched_domain_rq *sd_rq, int cpu) +{ + if (!sd_rq) + return; + + destroy_sched_domains(sd_rq->sd, cpu); + kfree_rcu(sd_rq, rcu); +} + /* * Keep a special pointer to the highest sched_domain that has * SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this @@ -5632,10 +5641,23 @@ static void update_top_cache_domain(int cpu) * hold the hotplug lock. 
*/ static void -cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) +cpu_attach_domain(struct sched_domain_rq *sd_rq, struct root_domain *rd, + int cpu) { struct rq *rq = cpu_rq(cpu); - struct sched_domain *tmp; + struct sched_domain_rq *tmp_rq; + struct sched_domain *tmp, *sd = NULL; + + /* +* If we don't have any sched_domain and associated object, we can +* directly jump to the attach sequence otherwise we try to degenerate +* the sched_domain +*/ + if (!sd_rq) + goto attach; + + /* Get a pointer to the 1st sched_domain */ + sd = sd_rq->sd; /* Remove the sched domains which do not contribute to scheduling. */ for (tmp = sd; tmp; ) { @@ -5658,14 +5680,17 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) destroy_sched_domain(tmp, cpu); if (sd) sd->child = NULL; + /* update sched_domain_rq */ + sd_rq->sd = sd; } +attach: sched_domain_debug(sd, cpu); rq_attach_root(rq, rd); - tmp = rq->sd; - rcu_assign_pointer(rq->sd, sd); - destroy_sched_domains(tmp, cpu); + tmp_rq = rq->sd_r
[PATCH v4] sched: fix wrong rq's runnable_avg update with rt tasks
The current update of the rq's load can be erroneous when RT tasks are involved. The update of the load of an rq that becomes idle is done only if the avg_idle is less than sysctl_sched_migration_cost. If RT tasks and short idle durations alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up. A new idle_enter function is called when the next task is the idle function, so the elapsed time will be accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance. When an RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exits the idle state, because CFS's functions are not called. Then the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task is close to LOAD_AVG_MAX, whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function, so the elapsed time will be accounted as idle time in the rq's load. Changes since V3: - Remove dependency on CONFIG_FAIR_GROUP_SCHED - Add a new idle_enter function and create a post_schedule callback for the idle class - Remove the update_runnable_avg from idle_balance Changes since V2: - remove useless definition for UP platforms - rebased on top of Steven Rostedt's patches: https://lkml.org/lkml/2013/2/12/558 Changes since V1: - move code out of the schedule function and create a pre_schedule callback for the idle class instead. 
Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 23 +-- kernel/sched/idle_task.c | 10 ++ kernel/sched/sched.h | 12 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0fcdbff..1851ca8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); } /* migrations, e.g. sleep=0 leave decay_count == 0 */ } + +/* + * Update the rq's load with the elapsed running time before entering + * idle. if the last scheduled task is not a CFS task, idle_enter will + * be the only way to update the runnable statistic. + */ +void idle_enter(struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 1); +} + +/* + * Update the rq's load with the elapsed idle time before a task is + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will + * be the only way to update the runnable statistic. + */ +void idle_exit(struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 0); +} + #else static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) {} @@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq) if (this_rq->avg_idle < sysctl_sched_migration_cost) return; - update_rq_runnable_avg(this_rq, 1); - /* * Drop the rq->lock, but keep preempt disabled. 
*/ diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index 66b5220..0775261 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c @@ -14,8 +14,17 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) return task_cpu(p); /* IDLE tasks as never migrated */ } +static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) +{ + /* Update rq's load with elapsed idle time */ + idle_exit(rq); +} + static void post_schedule_idle(struct rq *rq) { + /* Update rq's load with elapsed running time */ + idle_enter(rq); + idle_balance(smp_processor_id(), rq); } #endif /* CONFIG_SMP */ @@ -95,6 +104,7 @@ const struct sched_class idle_sched_class = { #ifdef CONFIG_SMP .select_task_rq = select_task_rq_idle, + .pre_schedule = pre_schedule_idle, .post_schedule = post_schedule_idle, #endif diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index fc88644..ff4b029 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -878,6 +878,18 @@ extern const struct sched_class idle_sched_class; extern void trigger_load_balance(struct rq *rq, int cpu); extern void idle_balance(int this_cpu, struct rq *this_rq); +/* + * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg + * becomes useful in lb + */ +#if defined(CONFIG_FAIR_GROUP_SCHED) +extern void idle_enter(struct rq *this_rq); +extern void idle_exit(struct rq *this_rq); +#else +static inline void idle_enter(struct rq *this_rq
Re: [PATCH Resend v5] sched: fix init NOHZ_IDLE flag
On 4 April 2013 19:07, Frederic Weisbecker wrote: > 2013/4/3 Vincent Guittot : >> On my smp platform which is made of 5 cores in 2 clusters, I have the >> nr_busy_cpu field of sched_group_power struct that is not null when the >> platform is fully idle. The root cause is: >> During the boot sequence, some CPUs reach the idle loop and set their >> NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus >> field is initialized later with the assumption that all CPUs are in the busy >> state whereas some CPUs have already set their NOHZ_IDLE flag. >> >> More generally, the NOHZ_IDLE flag must be initialized when new sched_domains >> are created in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned. >> >> This condition can be ensured by adding a synchronize_rcu between the >> destruction of old sched_domains and the creation of new ones so the >> NOHZ_IDLE >> flag will not be updated with old sched_domain once it has been initialized. >> But this solution introduces a additionnal latency in the rebuild sequence >> that is called during cpu hotplug. >> >> As suggested by Frederic Weisbecker, another solution is to have the same >> rcu lifecycle for both NOHZ_IDLE and sched_domain struct. I have introduce >> a new sched_domain_rq struct that is the entry point for both sched_domains >> and objects that must follow the same lifecycle like NOHZ_IDLE flags. They >> will share the same RCU lifecycle and will be always synchronized. >> >> The synchronization is done at the cost of : >> - an additional indirection for accessing the first sched_domain level >> - an additional indirection and a rcu_dereference before accessing to the >>NOHZ_IDLE flag. >> >> Change since v4: >> - link both sched_domain and NOHZ_IDLE flag in one RCU object so >>their states are always synchronized. 
>> >> Change since V3; >> - NOHZ flag is not cleared if a NULL domain is attached to the CPU >> - Remove patch 2/2 which becomes useless with latest modifications >> >> Change since V2: >> - change the initialization to idle state instead of busy state so a CPU >> that >>enters idle during the build of the sched_domain will not corrupt the >>initialization state >> >> Change since V1: >> - remove the patch for SCHED softirq on an idle core use case as it was >>a side effect of the other use cases. >> >> Signed-off-by: Vincent Guittot >> --- >> include/linux/sched.h |6 +++ >> kernel/sched/core.c | 105 >> - >> kernel/sched/fair.c | 35 +++-- >> kernel/sched/sched.h | 24 +-- >> 4 files changed, 145 insertions(+), 25 deletions(-) >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index d35d2b6..2a52188 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -959,6 +959,12 @@ struct sched_domain { >> unsigned long span[0]; >> }; >> >> +struct sched_domain_rq { >> + struct sched_domain *sd; >> + unsigned long flags; >> + struct rcu_head rcu;/* used during destruction */ >> +}; >> + >> static inline struct cpumask *sched_domain_span(struct sched_domain *sd) >> { >> return to_cpumask(sd->span); >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 7f12624..69e2313 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5602,6 +5602,15 @@ static void destroy_sched_domains(struct sched_domain >> *sd, int cpu) >> destroy_sched_domain(sd, cpu); >> } >> >> +static void destroy_sched_domain_rq(struct sched_domain_rq *sd_rq, int cpu) >> +{ >> + if (!sd_rq) >> + return; >> + >> + destroy_sched_domains(sd_rq->sd, cpu); >> + kfree_rcu(sd_rq, rcu); >> +} >> + >> /* >> * Keep a special pointer to the highest sched_domain that has >> * SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this >> @@ -5632,10 +5641,23 @@ static void update_top_cache_domain(int cpu) >> * hold the hotplug lock. 
>> */ >> static void >> -cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) >> +cpu_attach_domain(struct sched_domain_rq *sd_rq, struct root_domain *rd, >> + int cpu) >> { >> struct rq *rq = cpu_rq(cpu); >> - struct sched_domain *tmp; >> + struct sched_domain_rq *tmp_rq; >&
Re: [RFC PATCH v3 5/6] sched: pack the idle load balance
Peter, After some thought about your comments, I can update the buddy CPU during ILB or periodic LB to a new idle core and extend the packing mechanism. Does this additional mechanism sound better to you? Vincent On 26 March 2013 15:42, Peter Zijlstra wrote: > On Tue, 2013-03-26 at 15:03 +0100, Vincent Guittot wrote: >> > But ha! here's your NO_HZ link.. but does the above DTRT and ensure >> > that the ILB is a little core when possible? >> >> The loop looks for an idle CPU as close as possible to the buddy CPU >> and the buddy CPU is the 1st CPU has been chosen. So if your buddy is >> a little and there is an idle little, the ILB will be this idle >> little. > > Earlier you wrote: > >> | Cluster 0 | Cluster 1 | >> | CPU0 | CPU1 | CPU2 | CPU3 | >> --- >> buddy | CPU0 | CPU0 | CPU0 | CPU2 | > > So extrapolating that to a 4+4 big-little you'd get something like: > > | little A9 || big A15 | > | 0 | 1 | 2 | 3 || 4 | 5 | 6 | 7 | > --+---+---+---+---++---+---+---+---+ > buddy | 0 | 0 | 0 | 0 || 0 | 4 | 4 | 4 | > > Right? > > So supposing the current ILB is 6, we'll only check 4, not 0-3, even > though there might be a perfectly idle cpu in there. > > Also, your scheme fails to pack when cpus 0,4 are filled, even when > there's idle cores around. > > If we'd use the ILB as packing cpu, we would simply select a next pack > target once the old one fills up. >
[PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, I have the nr_busy_cpus field of the sched_group_power struct that is not null when the platform is fully idle. The root cause seems to be: during the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state, whereas some CPUs have already set their NOHZ_IDLE flag. We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized in order to have a coherent configuration. Signed-off-by: Vincent Guittot --- kernel/sched/core.c |1 + 1 file changed, 1 insertion(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 257002c..fd41924 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd) update_group_power(sd, cpu); atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); } int __weak arch_sd_sibling_asym_packing(void) -- 1.7.9.5
[PATCH v2 2/2] sched: fix update NOHZ_IDLE flag
The function nohz_kick_needed modifies the NOHZ_IDLE flag that is used to update the nr_busy_cpus of the sched_group. When the sched_domains are updated (during boot or because of the unplug of a CPU, for example), a null_domain is attached to the CPUs. We have to test likely(!on_null_domain(cpu)) first in order to detect such an initialization step and not modify the NOHZ_IDLE flag. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5eea870..dac2edf 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5695,7 +5695,7 @@ void trigger_load_balance(struct rq *rq, int cpu) likely(!on_null_domain(cpu))) raise_softirq(SCHED_SOFTIRQ); #ifdef CONFIG_NO_HZ - if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu))) + if (likely(!on_null_domain(cpu)) && nohz_kick_needed(rq, cpu)) nohz_balancer_kick(cpu); #endif } -- 1.7.9.5
[PATCH v2 0/2] sched: fix nr_busy_cpus
The nr_busy_cpus field of the sched_group_power is sometimes different from 0 whereas the platform is fully idle. This series fixes 3 use cases: - when some CPUs enter idle state while booting all CPUs - when a CPU is unplugged and/or replugged Changes since V1: - remove the patch for the SCHED softirq on an idle core use case, as it was a side effect of the other use cases. Vincent Guittot (2): sched: fix init NOHZ_IDLE flag sched: fix update NOHZ_IDLE flag kernel/sched/core.c |1 + kernel/sched/fair.c |2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- 1.7.9.5
[PATCH] hwmon: (lm90) Add device tree support
Add support to instantiate LM90-compatible sensors from a device-tree configuration. When the kernel has device tree support, we avoid doing the auto-detection as probing the busses might mess-up sensitive I2C devices or trigger long timeouts on non-functional busses. Signed-off-by: Vincent Palatin --- .../devicetree/bindings/i2c/trivial-devices.txt| 19 + .../devicetree/bindings/vendor-prefixes.txt| 1 + drivers/hwmon/lm90.c | 47 +- 3 files changed, 65 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt index 446859f..4d991ca 100644 --- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt +++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt @@ -10,6 +10,7 @@ document for it just like any other devices. Compatible Vendor / Chip == = ad,ad7414 SMBus/I2C Digital Temperature Sensor in 6-Pin SOT with SMBus Alert and Over Temperature Pin +ad,adm1032 +/-1C Remote and local system temperature monitor ad,adm9240 ADM9240: Complete System Hardware Monitor for uProcessor-Based Systems adi,adt7461+/-1C TDM Extended Temp Range I.C adt7461+/-1C TDM Extended Temp Range I.C @@ -35,16 +36,33 @@ fsl,mc13892 MC13892: Power Management Integrated Circuit (PMIC) for i.MX35/51 fsl,mma8450MMA8450Q: Xtrinsic Low-power, 3-axis Xtrinsic Accelerometer fsl,mpr121 MPR121: Proximity Capacitive Touch Sensor Controller fsl,sgtl5000 SGTL5000: Ultra Low-Power Audio Codec +gmt,g781 +/-1C Remote and local temperature sensor maxim,ds1050 5 Bit Programmable, Pulse-Width Modulator maxim,max1237 Low-Power, 4-/12-Channel, 2-Wire Serial, 12-Bit ADCs maxim,max6625 9-Bit/12-Bit Temperature Sensors with I²C-Compatible Serial Interface +maxim,max6646 +145C Precision SMBus-Compatible Remote/Local Sensors +maxim,max6647 +145C Precision SMBus-Compatible Remote/Local Sensors +maxim,max6649 +145C Precision SMBus-Compatible Remote/Local Sensors +maxim,max6657 +/-1C SMBus-Compatible Remote/Local 
Sensors +maxim,max6658 +/-1C SMBus-Compatible Remote/Local Sensors +maxim,max6659 +/-1C SMBus-Compatible Remote/Local Sensors +maxim,max6680 +/-1C Fail-Safe Remote/Local Temperature Sensors +maxim,max6681 +/-1C Fail-Safe Remote/Local Temperature Sensors +maxim,max6695 Dual Remote/Local Temperature Sensors +maxim,max6696 Dual Remote/Local Temperature Sensors mc,rv3029c2Real Time Clock Module with I2C-Bus national,lm75 I2C TEMP SENSOR national,lm80 Serial Interface ACPI-Compatible Microprocessor System Hardware Monitor +national,lm86 +/-0.75C Accurate, Remote Diode and Local Digital Temperature Sensor with Two-Wire Interface +national,lm89 +/-0.75C Remote and Local Digital Temperature Sensor with Two-Wire Interface-Wire Interface +national,lm90 +/-3C Accurate, Remote Diode and Local Digital Temperature Sensor with Two-Wire Interface national,lm92 ±0.33°C Accurate, 12-Bit + Sign Temperature Sensor and Thermal Window Comparator with Two-Wire Interface +national,lm99 +/-1C Accurate, Remote Diode and Local Digital Temperature Sensor with Two-Wire Interface nxp,pca9556Octal SMBus and I2C registered interface nxp,pca95578-bit I2C-bus and SMBus I/O port with reset nxp,pcf8563Real-time clock/calendar +nxp,sa56004remote/local digital temperature sensor with overtemperature alarms +onnn,nct1008 +/-1C Temperature Monitor with Series Resistance Cancellation ovti,ov5642OV5642: Color CMOS QSXGA (5-megapixel) Image Sensor with OmniBSI and Embedded TrueFocus pericom,pt7c4338 Real-time Clock Module plx,pex864848-Lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch @@ -59,3 +77,4 @@ taos,tsl2550 Ambient Light Sensor with SMBUS/Two Wire Serial Interface ti,tsc2003 I2C Touch-Screen Controller ti,tmp102 Low Power Digital Temperature Sensor with SMBUS/Two Wire Serial Interface ti,tmp275 Digital Temperature Sensor +winbond,w83l771H/W Monitor IC diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index 902b1b1..2074699 
100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -23,6 +23,7 @@ est ESTeem Wireless Modems fslFreescale Semiconductor GEFanucGE Fanuc Intelligent Platforms Embedded Systems, Inc. gefGE Fanuc Intelligent Platforms Embedded Systems, Inc. +gmtGlobal Mixed-mode
Re: [PATCH] regmap: debugfs: Fix compilation warning
On 01/23/2013 04:58 PM, Mark Brown wrote: > On Tue, Jan 22, 2013 at 11:07:04AM +0100, Vincent Stehlé wrote: > >> Do you think there is a way to "mark" the list_for_each_entry() >> as iterating at least once? an __attribute__ maybe? > > No - but are you sure that's true? If you mean "am I sure the loop iterates at least once", then yes, as we have an explicit check just before the concerned list_for_each_entry(): /* * This should never happen; we return above if we fail to * allocate and we should never be in this code if there are * no registers at all. */ if (list_empty(&map->debugfs_off_cache)) { WARN_ON(list_empty(&map->debugfs_off_cache)); return base; } Best regards, V.
Re: [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
On 24 January 2013 17:44, Frederic Weisbecker wrote: > 2012/12/3 Vincent Guittot : >> With the coupled cpuidle driver (but probably also with other drivers), >> a CPU loops in a temporary safe state while waiting for other CPUs of its >> cluster to be ready to enter the coupled C-state. If an IRQ or a softirq >> occurs, the CPU will stay in this internal loop if there is no need >> to resched. The SCHED softirq clears the NOHZ and increases >> nr_busy_cpus. If there is no need to resched, we will not call >> set_cpu_sd_state_idle because of this internal loop in a cpuidle state. >> We have to call set_cpu_sd_state_idle in tick_nohz_irq_exit which is used >> to handle such situation. > > I'm a bit confused with this. > > set_cpu_sd_state_busy() is only called from nohz_kick_needed(). And it > checks idle_cpu() before doing anything. So if no task is going to be > scheduled, idle_cpu() prevents from calling set_cpu_sd_state_busy(). > > I'm probably missing something. Hi Frederic, I can't find the trace that I had saved of the issue, but IIRC the sequence is: the CPU is kicked for ILB; the wake_list of the CPU becomes not empty, so the cpu is not idle; the CPU wakes up, updates its timer framework and calls nohz_kick_needed, then executes the ILB sequence; we don't go out of the cpuidle driver function because we don't need to resched, so we don't clear the busy state. I'm going to look for the saved trace to check the sequence above. Vincent > > Thanks. 
> >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/time/tick-sched.c |2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c >> index 955d35b..b8d74ea 100644 >> --- a/kernel/time/tick-sched.c >> +++ b/kernel/time/tick-sched.c >> @@ -570,6 +570,8 @@ void tick_nohz_irq_exit(void) >> if (!ts->inidle) >> return; >> >> + set_cpu_sd_state_idle(); >> + >> /* Cancel the timer because CPU already waken up from the C-states*/ >> menu_hrtimer_cancel(); >> __tick_nohz_idle_enter(ts); >> -- >> 1.7.9.5
Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
On 26 February 2013 14:16, Frederic Weisbecker wrote: > 2013/2/22 Vincent Guittot : >> I wanted to avoid having to use the sd pointer for testing NOHZ_IDLE >> flag because it occurs each time we go into idle but it seems to be >> not easily feasible. >> Another solution could be to add a synchronization step between >> rcu_assign_pointer(dom 1, NULL) and create new domain to ensure that >> all pending access to old sd values, has finished but this will imply >> a potential delay in the rebuild of sched_domain and i'm not sure >> that it's acceptable > > The other issue is that we'll need to abuse the fact that struct > sched_domain is per cpu in order to store a per cpu state there. > That's a bit ugly but at least safer. > > Also, are struct sched_group and struct sched_group_power shared among > several CPUs or are they per CPUs allocated as well? I guess they > aren't otherwise nr_cpus_busy would be pointless. Yes, they are shared between CPUs; the per-cpu sched_domains point to the same sched_group and sched_group_power.
Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
On 26 February 2013 18:43, Frederic Weisbecker wrote: > 2013/2/26 Vincent Guittot : >> On 26 February 2013 14:16, Frederic Weisbecker wrote: >>> 2013/2/22 Vincent Guittot : >>>> I wanted to avoid having to use the sd pointer for testing NOHZ_IDLE >>>> flag because it occurs each time we go into idle but it seems to be >>>> not easily feasible. >>>> Another solution could be to add a synchronization step between >>>> rcu_assign_pointer(dom 1, NULL) and create new domain to ensure that >>>> all pending access to old sd values, has finished but this will imply >>>> a potential delay in the rebuild of sched_domain and i'm not sure >>>> that it's acceptable > > Ah I see what you meant there. Making a synchronize_rcu() after > setting the dom to NULL, on top of which we could work on preventing > from any concurrent nohz_flag modification. But cpu hotplug seem to > become a bit of a performance sensitive path this day. That's was also my concern > > Ok I don't like having a per cpu state in struct sched domain but for > now I can't find anything better. So my suggestion is that we do this > and describe well the race, define the issue in the changelog and code > comments and explain how we are solving it. This way at least the > issue is identified and known. Then later, on review or after the > patch is upstream, if somebody with some good taste comes with a > better idea, we consider it. > > What do you think? I don't have better solution than adding this state in the sched_domain if we want to keep the exact same behavior. This will be a bit of waste of mem because we don't need to update all sched_domain level (1st level is enough). Vincent -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
On 27 February 2013 17:13, Frederic Weisbecker wrote: > On Wed, Feb 27, 2013 at 09:28:26AM +0100, Vincent Guittot wrote: >> > Ok I don't like having a per cpu state in struct sched domain but for >> > now I can't find anything better. So my suggestion is that we do this >> > and describe well the race, define the issue in the changelog and code >> > comments and explain how we are solving it. This way at least the >> > issue is identified and known. Then later, on review or after the >> > patch is upstream, if somebody with some good taste comes with a >> > better idea, we consider it. >> > >> > What do you think? >> >> I don't have better solution than adding this state in the >> sched_domain if we want to keep the exact same behavior. This will be >> a bit of waste of mem because we don't need to update all sched_domain >> level (1st level is enough). > > Or you can try something like the below. Both flags and sched_domain share > the same > object here so the same RCU lifecycle. And there shouldn't be more overhead > there > since accessing rq->sd_rq.sd is the same than rq->sd_rq in the ASM level: only > one pointer to dereference. your proposal solves the waste of memory and keeps the sync between flag and nr_busy. I'm going to try it Thanks > > Also rq_idle becomes a separate value from rq->nohz_flags. It's a simple > boolean > (just making it an int here because boolean size are a bit opaque, although > they > are supposed to be char, let's just avoid surprises in structures). 
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index cc03cfd..16c0d55 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -417,7 +417,10 @@ struct rq { > > #ifdef CONFIG_SMP > struct root_domain *rd; > - struct sched_domain *sd; > + struct sched_domain_rq { > + struct sched_domain sd; > + int rq_idle; > + } __rcu *sd_rq; > > unsigned long cpu_power; > > @@ -505,9 +508,14 @@ DECLARE_PER_CPU(struct rq, runqueues); > > #ifdef CONFIG_SMP > > -#define rcu_dereference_check_sched_domain(p) \ > - rcu_dereference_check((p), \ > - lockdep_is_held(&sched_domains_mutex)) > +#define rcu_dereference_check_sched_domain(p) ({\ > + struct sched_domain_rq *__sd_rq = rcu_dereference_check((p),\ > + lockdep_is_held(&sched_domains_mutex)); \ > + if (!__sd_rq) \ > + NULL; \ > + else\ > + &__sd_rq->sd; \ > +}) > > /* > * The domain tree (rq->sd) is protected by RCU's quiescent state transition. > @@ -517,7 +525,7 @@ DECLARE_PER_CPU(struct rq, runqueues); > * preempt-disabled sections. > */ > #define for_each_domain(cpu, __sd) \ > - for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \ > + for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd_rq); \ > __sd; __sd = __sd->parent) > > #define for_each_lower_domain(sd) for (; sd; sd = sd->child) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On 18 February 2013 16:40, Frederic Weisbecker wrote: > 2013/2/18 Vincent Guittot : >> On 18 February 2013 15:38, Frederic Weisbecker wrote: >>> I pasted the original at: http://pastebin.com/DMm5U8J8 >> >> We can clear the idle flag only in the nohz_kick_needed which will not >> be called if the sched_domain is NULL so the sequence will be >> >> = CPU 0 == CPU 1= >> >> detach_and_destroy_domain { >> rcu_assign_pointer(cpu1_dom, NULL); >> } >> >> dom = new_domain(...) { >> nr_cpus_busy = 0; >> set_idle(CPU 1); >> } >> dom = >> rcu_dereference(cpu1_dom) >> //dom == NULL, return >> >> rcu_assign_pointer(cpu1_dom, dom); >> >> dom = >> rcu_dereference(cpu1_dom) >> //dom != NULL, >> nohz_kick_needed { >> >> set_idle(CPU 1) >> dom >> = rcu_dereference(cpu1_dom) >> >> //dec nr_cpus_busy, >> } >> >> Vincent > > Ok but CPU 0 can assign NULL to the domain of cpu1 while CPU 1 is > already in the middle of nohz_kick_needed(). Yes nothing prevents the sequence below to occur = CPU 0 == CPU 1= dom = rcu_dereference(cpu1_dom) //dom != NULL detach_and_destroy_domain { rcu_assign_pointer(cpu1_dom, NULL); } dom = new_domain(...) { nr_cpus_busy = 0; //nr_cpus_busy in the new_dom set_idle(CPU 1); } nohz_kick_needed { clear_idle(CPU 1) dom = rcu_dereference(cpu1_dom) //cpu1_dom == old_dom inc nr_cpus_busy, //nr_cpus_busy in the old_dom } rcu_assign_pointer(cpu1_dom, dom); //cpu1_dom == new_dom I'm not sure that this can happen in practice because CPU1 is in interrupt handler but we don't have any mechanism to prevent the sequence. 
The NULL sched_domain can be used to detect this situation and the set_cpu_sd_state_busy function can be modified like below inline void set_cpu_sd_state_busy { struct sched_domain *sd; int cpu = smp_processor_id(); + int clear = 0; if (!test_bit(NOHZ_IDLE, nohz_flags(cpu))) return; - clear_bit(NOHZ_IDLE, nohz_flags(cpu)); rcu_read_lock(); for_each_domain(cpu, sd) { atomic_inc(&sd->groups->sgp->nr_busy_cpus); + clear = 1; } rcu_read_unlock(); + + if (likely(clear)) + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); } The NOHZ_IDLE flag will not be clear if we have a NULL sched_domain attached to the CPU. With this implementation, we still don't need to get the sched_domain for testing the NOHZ_IDLE flag which occurs each time CPU becomes idle The patch 2 become useless Vincent -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
On 18 February 2013 20:53, Steven Rostedt wrote: > On Mon, 2013-02-18 at 17:50 +0100, Vincent Guittot wrote: > >> yes for sure. >> The problem is more linked to cpuidle and function tracer. >> >> cpu hotplug and function tracer work when cpuidle is disable. >> cpu hotplug and cpuidle works if i don't enable function tracer. >> my platform is dead as soon as I enable function tracer if cpuidle is >> enabled. It looks like some notrace are missing in my platform driver >> but we haven't completely fix the issue yet >> > > You can bisect to find out exactly what function is the problem: > > cat /debug/tracing/available_filter_functions > t > > f(t) { > num=`wc -l t` > sed -ne "1,${num}p" t > t1 > let num=num+1 > sed -ne "${num},$p" t > t2 > > cat t1 > /debug/tracing/set_ftrace_filter > # note this may take a long time to finish > > echo function > /debug/tracing/current_tracer > > > } > Thanks, i'm going to have a look Vincent > -- Steve > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On 19 February 2013 11:29, Vincent Guittot wrote: > On 18 February 2013 16:40, Frederic Weisbecker wrote: >> 2013/2/18 Vincent Guittot : >>> On 18 February 2013 15:38, Frederic Weisbecker wrote: >>>> I pasted the original at: http://pastebin.com/DMm5U8J8 >>> >>> We can clear the idle flag only in the nohz_kick_needed which will not >>> be called if the sched_domain is NULL so the sequence will be >>> >>> = CPU 0 == CPU 1= >>> >>> detach_and_destroy_domain { >>> rcu_assign_pointer(cpu1_dom, NULL); >>> } >>> >>> dom = new_domain(...) { >>> nr_cpus_busy = 0; >>> set_idle(CPU 1); >>> } >>> dom = >>> rcu_dereference(cpu1_dom) >>> //dom == NULL, return >>> >>> rcu_assign_pointer(cpu1_dom, dom); >>> >>> dom = >>> rcu_dereference(cpu1_dom) >>> //dom != NULL, >>> nohz_kick_needed { >>> >>> set_idle(CPU 1) >>>dom >>> = rcu_dereference(cpu1_dom) >>> >>> //dec nr_cpus_busy, >>> } >>> >>> Vincent >> >> Ok but CPU 0 can assign NULL to the domain of cpu1 while CPU 1 is >> already in the middle of nohz_kick_needed(). > > Yes nothing prevents the sequence below to occur > > = CPU 0 == CPU 1= > dom = > rcu_dereference(cpu1_dom) > //dom != NULL > detach_and_destroy_domain { > rcu_assign_pointer(cpu1_dom, NULL); > } > > dom = new_domain(...) { > nr_cpus_busy = 0; > //nr_cpus_busy in the new_dom > set_idle(CPU 1); > } > nohz_kick_needed { > clear_idle(CPU 1) > dom = > rcu_dereference(cpu1_dom) > > //cpu1_dom == old_dom > inc nr_cpus_busy, > > //nr_cpus_busy in the old_dom > } > > rcu_assign_pointer(cpu1_dom, dom); > //cpu1_dom == new_dom The sequence above is not correct in addition to become unreadable after going through gmail The correct and readable version https://pastebin.linaro.org/1750/ Vincent > > I'm not sure that this can happen in practice because CPU1 is in > interrupt handler but we don't have any mechanism to prevent the > sequence. 
> > The NULL sched_domain can be used to detect this situation and the > set_cpu_sd_state_busy function can be modified like below > > inline void set_cpu_sd_state_busy > { > struct sched_domain *sd; > int cpu = smp_processor_id(); > + int clear = 0; > > if (!test_bit(NOHZ_IDLE, nohz_flags(cpu))) > return; > - clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > > rcu_read_lock(); > for_each_domain(cpu, sd) { > atomic_inc(&sd->groups->sgp->nr_busy_cpus); > + clear = 1; > } > rcu_read_unlock(); > + > + if (likely(clear)) > + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > } > > The NOHZ_IDLE flag will not be clear if we have a NULL sched_domain > attached to the CPU. > With this implementation, we still don't need to get the sched_domain > for testing the NOHZ_IDLE flag which occurs each time CPU becomes idle > > The patch 2 become useless > > Vincent -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4] sched: fix init NOHZ_IDLE flag
On my smp platform which is made of 5 cores in 2 clusters, I have the nr_busy_cpu field of sched_group_power struct that is not null when the platform is fully idle. The root cause seems to be: During the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state whereas some CPUs have already set their NOHZ_IDLE flag. During the initialization of the sched_domain, we set the NOHZ_IDLE flag when nr_busy_cpus is initialized to 0 in order to have a coherent configuration. If a CPU enters idle and call set_cpu_sd_state_idle during the build of the new sched_domain it will not corrupt the initial state set_cpu_sd_state_busy is modified and clears the NOHZ_IDLE only if a non NULL sched_domain is attached to the CPU (which is the case during the rebuild) Change since V3; - NOHZ flag is not cleared if a NULL domain is attached to the CPU - Remove patch 2/2 which becomes useless with latest modifications Change since V2: - change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state Change since V1: - remove the patch for SCHED softirq on an idle core use case as it was a side effect of the other use cases. 
Signed-off-by: Vincent Guittot --- kernel/sched/core.c |4 +++- kernel/sched/fair.c |9 +++-- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 26058d0..c730a4e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5884,7 +5884,9 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd) return; update_group_power(sd, cpu); - atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); + atomic_set(&sg->sgp->nr_busy_cpus, 0); + set_bit(NOHZ_IDLE, nohz_flags(cpu)); + } int __weak arch_sd_sibling_asym_packing(void) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..2701a92 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5403,15 +5403,20 @@ static inline void set_cpu_sd_state_busy(void) { struct sched_domain *sd; int cpu = smp_processor_id(); + int clear = 0; if (!test_bit(NOHZ_IDLE, nohz_flags(cpu))) return; - clear_bit(NOHZ_IDLE, nohz_flags(cpu)); rcu_read_lock(); - for_each_domain(cpu, sd) + for_each_domain(cpu, sd) { atomic_inc(&sd->groups->sgp->nr_busy_cpus); + clear = 1; + } rcu_read_unlock(); + + if (likely(clear)) + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); } void set_cpu_sd_state_idle(void) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
On 22 February 2013 13:32, Frederic Weisbecker wrote: > On Thu, Feb 21, 2013 at 09:29:16AM +0100, Vincent Guittot wrote: >> On my smp platform which is made of 5 cores in 2 clusters, I have the >> nr_busy_cpu field of sched_group_power struct that is not null when the >> platform is fully idle. The root cause seems to be: >> During the boot sequence, some CPUs reach the idle loop and set their >> NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus >> field is initialized later with the assumption that all CPUs are in the busy >> state whereas some CPUs have already set their NOHZ_IDLE flag. >> During the initialization of the sched_domain, we set the NOHZ_IDLE flag when >> nr_busy_cpus is initialized to 0 in order to have a coherent configuration. >> If a CPU enters idle and call set_cpu_sd_state_idle during the build of the >> new sched_domain it will not corrupt the initial state >> set_cpu_sd_state_busy is modified and clears the NOHZ_IDLE only if a non NULL >> sched_domain is attached to the CPU (which is the case during the rebuild) >> >> Change since V3; >> - NOHZ flag is not cleared if a NULL domain is attached to the CPU >> - Remove patch 2/2 which becomes useless with latest modifications >> >> Change since V2: >> - change the initialization to idle state instead of busy state so a CPU >> that >>enters idle during the build of the sched_domain will not corrupt the >>initialization state >> >> Change since V1: >> - remove the patch for SCHED softirq on an idle core use case as it was >>a side effect of the other use cases. 
>> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/core.c |4 +++- >> kernel/sched/fair.c |9 +++-- >> 2 files changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 26058d0..c730a4e 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5884,7 +5884,9 @@ static void init_sched_groups_power(int cpu, struct >> sched_domain *sd) >> return; >> >> update_group_power(sd, cpu); >> - atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); >> + atomic_set(&sg->sgp->nr_busy_cpus, 0); >> + set_bit(NOHZ_IDLE, nohz_flags(cpu)); >> + >> } >> >> int __weak arch_sd_sibling_asym_packing(void) >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 81fa536..2701a92 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -5403,15 +5403,20 @@ static inline void set_cpu_sd_state_busy(void) >> { >> struct sched_domain *sd; >> int cpu = smp_processor_id(); >> + int clear = 0; >> >> if (!test_bit(NOHZ_IDLE, nohz_flags(cpu))) >> return; >> - clear_bit(NOHZ_IDLE, nohz_flags(cpu)); >> >> rcu_read_lock(); >> - for_each_domain(cpu, sd) >> + for_each_domain(cpu, sd) { >> atomic_inc(&sd->groups->sgp->nr_busy_cpus); >> + clear = 1; >> + } >> rcu_read_unlock(); >> + >> + if (likely(clear)) >> + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > > I fear there is still a race window: > > = CPU 0 = = CPU 1 = > // NOHZ_IDLE is set > set_cpu_sd_state_busy() { > dom1 = rcu_dereference(dom1); > inc(dom1->nr_busy_cpus) > > rcu_assign_pointer(dom 1, NULL) > // create new domain > init_sched_group_power() { > atomic_set(&tmp->nr_busy_cpus, 0); > set_bit(NOHZ_IDLE, nohz_flags(cpu 1)); > rcu_assign_pointer(dom 1, tmp) > > > > clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > } > > > I don't know if there is any sane way to deal with this issue other than > having nr_busy_cpus and nohz_flags in the same object sharing the same > lifecycle. 
I wanted to avoid having to use the sd pointer for testing the NOHZ_IDLE flag because this occurs each time we go into idle, but it seems not easily feasible. Another solution could be to add a synchronization step between rcu_assign_pointer(dom 1, NULL) and creating the new domain, to ensure that all pending accesses to the old sd values have finished, but this would imply a potential delay in the rebuild of the sched_domain and I'm not sure that it's acceptable. Vincent
Re: [PATCH] topology: removed kzalloc return value cast
On 10 March 2013 21:35, Mihai Stirbat wrote: > Signed-off-by: Mihai Stirbat > --- > arch/arm/kernel/topology.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c > index 79282eb..f10316b 100644 > --- a/arch/arm/kernel/topology.c > +++ b/arch/arm/kernel/topology.c > @@ -100,7 +100,7 @@ static void __init parse_dt_topology(void) > int alloc_size, cpu = 0; > > alloc_size = nr_cpu_ids * sizeof(struct cpu_capacity); > - cpu_capacity = (struct cpu_capacity *)kzalloc(alloc_size, GFP_NOWAIT); > + cpu_capacity = kzalloc(alloc_size, GFP_NOWAIT); you're right Acked-by: Vincent Guittot > > while ((cn = of_find_node_by_type(cn, "cpu"))) { > const u32 *rate, *reg; > -- > 1.7.10.4 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] regulator: disable supply regulator if it is enabled for boot-on
2012/8/28 Laxman Dewangan : > I tried to reproduce the lockup issue with the following change but not > seeing any lockup issue. Did you enable CONFIG_PROVE_LOCKING? > Also reviewing the change, I am not seeing any call trace where the > recursive locking happening. There's probably no actual recursive locking, but the lockdep warning itself is a problem which must be eliminated. You could perhaps do this by doing the regulator_disable(rdev->supply); after you mutex unlock the rdev->mutex. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] regulator: disable supply regulator if it is enabled for boot-on
2012/8/29 Laxman Dewangan : > @@ -3614,8 +3615,11 @@ static int __init regulator_init_complete(void) > > mutex_lock(&rdev->mutex); > > - if (rdev->use_count) > + if (rdev->use_count) { > + if (rdev->supply && c->boot_on) > + supply_disable = true; > goto unlock; > + } > > /* If we can't read the status assume it's on. */ > if (ops->is_enabled) > @@ -3634,6 +3638,8 @@ static int __init regulator_init_complete(void) > if (ret != 0) { > rdev_err(rdev, "couldn't disable: %d\n", ret); > } > + if (rdev->supply) > + supply_disable = true; > } else { > /* The intention is that in future we will > * assume that full constraints are provided This does not handle the case where a regulator is not set boot_on but is considered on (for example, because of the lack of an is_enabled callback), and is later actually enabled by a consumer before regulator_init_complete(). In this case, the supply's use count will still be one more than it should be, because the "&& c->boot_on" condition above will fail. To fix this, you should probably note which regulators' supplies you enable in regulator_register() and use that information in the above two checks here in regulator_init_complete(). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] sched: nohz_idle_balance
On tickless system, one CPU runs load balance for all idle CPUs. The cpu_load of this CPU is updated before starting the load balance of each other idle CPUs. We should instead update the cpu_load of the balance_cpu. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1ca4fe4..9ae3a5b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4794,14 +4794,15 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) if (need_resched()) break; - raw_spin_lock_irq(&this_rq->lock); - update_rq_clock(this_rq); - update_idle_cpu_load(this_rq); - raw_spin_unlock_irq(&this_rq->lock); + rq = cpu_rq(balance_cpu); + + raw_spin_lock_irq(&rq->lock); + update_rq_clock(rq); + update_idle_cpu_load(rq); + raw_spin_unlock_irq(&rq->lock); rebalance_domains(balance_cpu, CPU_IDLE); - rq = cpu_rq(balance_cpu); if (time_after(this_rq->next_balance, rq->next_balance)) this_rq->next_balance = rq->next_balance; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] sched: nohz_idle_balance
Wrong button make me removed others guys from the thread. Sorry for this mistake. On 13 September 2012 09:56, Mike Galbraith wrote: > On Thu, 2012-09-13 at 09:44 +0200, Vincent Guittot wrote: >> On 13 September 2012 09:29, Mike Galbraith wrote: >> > On Thu, 2012-09-13 at 08:59 +0200, Vincent Guittot wrote: >> >> On 13 September 2012 08:49, Mike Galbraith wrote: >> >> > On Thu, 2012-09-13 at 06:11 +0200, Vincent Guittot wrote: >> >> >> On tickless system, one CPU runs load balance for all idle CPUs. >> >> >> The cpu_load of this CPU is updated before starting the load balance >> >> >> of each other idle CPUs. We should instead update the cpu_load of the >> >> >> balance_cpu. >> >> >> >> >> >> Signed-off-by: Vincent Guittot >> >> >> --- >> >> >> kernel/sched/fair.c | 11 ++- >> >> >> 1 file changed, 6 insertions(+), 5 deletions(-) >> >> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> >> >> index 1ca4fe4..9ae3a5b 100644 >> >> >> --- a/kernel/sched/fair.c >> >> >> +++ b/kernel/sched/fair.c >> >> >> @@ -4794,14 +4794,15 @@ static void nohz_idle_balance(int this_cpu, >> >> >> enum cpu_idle_type idle) >> >> >> if (need_resched()) >> >> >> break; >> >> >> >> >> >> - raw_spin_lock_irq(&this_rq->lock); >> >> >> - update_rq_clock(this_rq); >> >> >> - update_idle_cpu_load(this_rq); >> >> >> - raw_spin_unlock_irq(&this_rq->lock); >> >> >> + rq = cpu_rq(balance_cpu); >> >> >> + >> >> >> + raw_spin_lock_irq(&rq->lock); >> >> >> + update_rq_clock(rq); >> >> >> + update_idle_cpu_load(rq); >> >> >> + raw_spin_unlock_irq(&rq->lock); >> >> >> >> >> >> rebalance_domains(balance_cpu, CPU_IDLE); >> >> >> >> >> >> - rq = cpu_rq(balance_cpu); >> >> >> if (time_after(this_rq->next_balance, rq->next_balance)) >> >> >> this_rq->next_balance = rq->next_balance; >> >> >> } >> >> > >> >> > Ew, banging locks and updating clocks to what good end? >> >> >> >> The goal is to update the cpu_load table of the CPU before starting >> >> the load balance. 
Other wise we will use outdated value in the load >> >> balance sequence >> > >> > If there's load to distribute, seems it should all work out fine without >> > doing that. What harm is being done that makes this worth while? >> >> this_load and avg_load can be wrong and make an idle CPU set as >> balanced compared to the busy one > > I think you need to present numbers showing benefit. Crawling all over > a mostly idle (4096p?) box is decidedly bad thing to do. Yep, let me prepare some figures You should also notice that you are already crawling all over the idle processor in rebalance_domains Vincent > > -Mike > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 0/5] ARM: topology: set the capacity of each cores for big.LITTLE
On 10 July 2012 15:42, Peter Zijlstra wrote: > On Tue, 2012-07-10 at 14:35 +0200, Vincent Guittot wrote: >> >> May be the last one which enable ARCH_POWER should also go into tip ? >> > OK, I can take it. Hi Peter, I can't find the patch that enables ARCH_POWER in the tip tree. Have you taken it into your tree? Regards, Vincent
Re: [PATCH v4 0/5] ARM: topology: set the capacity of each cores for big.LITTLE
On 13 September 2012 14:07, Peter Zijlstra wrote: > On Thu, 2012-09-13 at 11:17 +0200, Vincent Guittot wrote: >> On 10 July 2012 15:42, Peter Zijlstra wrote: >> > On Tue, 2012-07-10 at 14:35 +0200, Vincent Guittot wrote: >> >> >> >> May be the last one which enable ARCH_POWER should also go into tip ? >> >> >> > OK, I can take it. >> >> Hi Peter, >> >> I can't find the patch that enable ARCH_POWER in the tip tree. Have >> you take it in your tree ? > > > Uhmmm how about I say I have now? Sorry about that. ok, thanks
[PATCH] Remove unneeded code in sys_getpriority
This check is not required because the condition is always true. Signed-off-by: Rabin Vincent <[EMAIL PROTECTED]> --- kernel/sys.c |7 ++- 1 files changed, 2 insertions(+), 5 deletions(-) diff --git a/kernel/sys.c b/kernel/sys.c index d1fe71e..a001974 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -212,11 +212,8 @@ asmlinkage long sys_getpriority(int which, int who) p = find_task_by_vpid(who); else p = current; - if (p) { - niceval = 20 - task_nice(p); - if (niceval > retval) - retval = niceval; - } + if (p) + retval = 20 - task_nice(p); break; case PRIO_PGRP: if (who) -- 1.5.3.8 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove unneeded code in sys_getpriority
On Sun, Feb 03, 2008 at 10:54:45AM +0100, Frank Seidel wrote: > On Sunday 03 February 2008 04:04, Rabin Vincent wrote: > > This check is not required because the condition is always true. > > ... > > - if (niceval > retval) > > - retval = niceval; > > + retval = 20 - task_nice(p); > > Thats surely correct, but on the other hand currently those > case blocks are quite independet of their possition/could easily > be rearranged now .. or think of another case is put ahead. > Then this could mess up things. Do you mean the PRIO_* cases in the switch? They're still independent of position after the patch because they don't fall through. > Thanks, > Frank Rabin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] USB: ohci-exynos: initialize registers pointer earlier
In the former code, we have a race condition between the first interrupt and the regs field initilization in the usb_hcd structure. If the OHCI irq fires before hcd->regs is set, we are getting a null pointer dereference in ohci_irq. When calling usb_add_hcd(), it first executes the reset() callback, then enables the ohci interrupt, and finally executes the start() callback. So moving the ohci_init() call which actually initializes the reg field from start() to reset() should remove the race. Tested by enabling the external HSIC hub in the bootloader on an exynos5 machine and booting. With the former code, this triggers an early interrupt about 50% of the boots and a subsequent kernel panic in ohci_irq when trying to access the registers. Cc: Olof Johansson Cc: Doug Anderson Cc: Arjun.K.V Cc: Vikas Sajjan Cc: Abhilash Kesavan Signed-off-by: Vincent Palatin --- drivers/usb/host/ohci-exynos.c | 10 ++ 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/ohci-exynos.c b/drivers/usb/host/ohci-exynos.c index 20a5008..f04cfde 100644 --- a/drivers/usb/host/ohci-exynos.c +++ b/drivers/usb/host/ohci-exynos.c @@ -23,6 +23,11 @@ struct exynos_ohci_hcd { struct clk *clk; }; +static int ohci_exynos_reset(struct usb_hcd *hcd) +{ + return ohci_init(hcd_to_ohci(hcd)); +} + static int ohci_exynos_start(struct usb_hcd *hcd) { struct ohci_hcd *ohci = hcd_to_ohci(hcd); @@ -30,10 +35,6 @@ static int ohci_exynos_start(struct usb_hcd *hcd) ohci_dbg(ohci, "ohci_exynos_start, ohci:%p", ohci); - ret = ohci_init(ohci); - if (ret < 0) - return ret; - ret = ohci_run(ohci); if (ret < 0) { dev_err(hcd->self.controller, "can't start %s\n", @@ -53,6 +54,7 @@ static const struct hc_driver exynos_ohci_hc_driver = { .irq= ohci_irq, .flags = HCD_MEMORY|HCD_USB11, + .reset = ohci_exynos_reset, .start = ohci_exynos_start, .stop = ohci_stop, .shutdown = ohci_shutdown, -- 1.7.7.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to 
majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
kernel BUG at fs/buffer.c:2886! Linux 3.5.0
27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? do_page_fault+0x1aa/0x3c0 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? cp_new_stat+0x10d/0x120 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? vfs_fstatat+0x41/0x80 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? sys_newstat+0x1f/0x50 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? system_call_fastpath+0x16/0x1b Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] Code: b6 44 24 18 4c 89 e7 83 e0 80 3c 01 19 db e8 76 3f 00 00 f7 d3 83 e3 a1 89 d8 5b 5d 41 5c c3 0f 0b eb fe 0f 0b eb fe 0f 0$ Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] RIP [] submit_bh+0x112/0x120 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] RSP Jul 27 23:41:41 jupiter2 kernel: [ 351.177405] ---[ end trace e1e88bdf12146104 ]--- Jul 27 23:41:41 jupiter2 kernel: [ 351.177868] deliver (5783) used greatest stack depth: 3032 bytes left Regards, Vincent ETIENNE
Re: kernel BUG at fs/buffer.c:2886! Linux 3.5.0
Hi, On 30/07/2012 08:30, Joel Becker wrote: > On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote: >> Hello >> >> Get this on first write made ( by deliver sending mail to inform of the >> restart of services ) >> Home partition (the one receiving the mail) is based on ocfs2 created >> from drbd block device in primary/primary mode >> These drbd devices are based on lvm. >> >> system is running linux-3.5.0, identical symptom with linux 3.3 and 3.2 >> but working with linux 3.0 kernel >> >> reproduced on two machines ( so different hardware involved on this one >> software md raid on SATA, on second one areca hardware raid card ) >> but the 2 machines are the one sharing this partition ( so share the >> same data ) > Hmm. Any chance you can bisect this further? Will try to. Will take a few days as the server is in production ( but used as backup so...) >> Jul 27 23:41:41 jupiter2 kernel: [ 351.169213] [ cut here >> ] >> Jul 27 23:41:41 jupiter2 kernel: [ 351.169261] kernel BUG at >> fs/buffer.c:2886! > This is: > > BUG_ON(!buffer_mapped(bh)); > > in submit_bh(). > > >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] Call Trace: >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_read_blocks+0x176/0x6c0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> T.1552+0x91/0x2b0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_find_actor+0x120/0x120 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_read_inode_block_full+0x37/0x60 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_fast_symlink_readpage+0x2f/0x160 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> do_read_cache_page+0x85/0x180 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_fill_super+0x2500/0x2500 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> read_cache_page+0x9/0x20 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> page_getlink+0x25/0x80 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ?
>> page_follow_link_light+0x1b/0x30 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> path_lookupat+0x38b/0x720 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> do_path_lookup+0x2c/0xd0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_inode_revalidate+0x71/0x160 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> user_path_at_empty+0x5c/0xb0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> do_page_fault+0x1aa/0x3c0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> cp_new_stat+0x10d/0x120 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> vfs_fstatat+0x41/0x80 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> sys_newstat+0x1f/0x50 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> system_call_fastpath+0x16/0x1b > This stack trace is from 3.5, because of the location of the > BUG. The call path in the trace suggests the code added by Al's ea022d, > but you say it breaks in 3.2 and 3.3 as well. Can you give me a trace > from 3.2? For a 3.2 kernel I get this stack trace. It is a different trace from 3.5, but at exactly the same moment and for the same reasons. It seems less immediate than with 3.5, but that is more a subjective impression than something based on fact. ( it takes a few seconds after deliver is started to hit the bug ) [ 716.402833] o2dlm: Joining domain B43153ED20B942E291251F2C138ADA9E ( 0 1 ) 2 nodes [ 716.501511] ocfs2: Mounting device (147,2) on (node 1, slot 0) with ordered data mode.
[ 716.505744] mount.ocfs2 used greatest stack depth: 2936 bytes left [ 727.133743] deliver used greatest stack depth: 2632 bytes left [ 764.167029] deliver used greatest stack depth: 1896 bytes left [ 764.778872] BUG: unable to handle kernel NULL pointer dereference at 0038 [ 764.778897] IP: [] __ocfs2_change_file_space+0x75a/0x1690 [ 764.778922] PGD 62697067 PUD 67a81067 PMD 0 [ 764.778939] Oops: [#1] SMP [ 764.778953] CPU 0 [ 764.778959] Modules linked in: drbd lru_cache ipv6 [last unloaded: drbd] [ 764.778986] [ 764.778993] Pid: 5909, comm: deliver Not tainted 3.2.12-gentoo #2 HP ProLiant ML150 G3/ML150 G3 [ 764.779017] RIP: 0010:[] [] __ocfs2_change_file_space+0x75a/0x1690 [ 764.779041] RSP: 0018:880067b2dd98 EFLAGS: 00010246 [ 764.779053] RAX: RBX: 880067f82000 RCX: 880063d11000 [ 764.779069] RDX: RSI: 0001 RDI: 88007ae83288 [ 764.779085] RBP: 880055d1f138 R08: 0010 R09: 88
Re: kernel BUG at fs/buffer.c:2886! Linux 3.5.0
On 30/07/2012 09:53, Joel Becker wrote: > On Mon, Jul 30, 2012 at 09:45:14AM +0200, Vincent ETIENNE wrote: >> Le 30/07/2012 08:30, Joel Becker a écrit : >>> On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote: >>>> Hello >>>> >>>> Get this on first write made ( by deliver sending mail to inform of the >>>> restart of services ) >>>> Home partition (the one receiving the mail) is based on ocfs2 created >>>> from drbd block device in primary/primary mode >>>> These drbd devices are based on lvm. >>>> >>>> system is running linux-3.5.0, identical symptom with linux 3.3 and 3.2 >>>> but working with linux 3.0 kernel >>>> >>>> reproduced on two machines ( so different hardware involved on this one >>>> software md raid on SATA, on second one areca hardware raid card ) >>>> but the 2 machines are the one sharing this partition ( so share the >>>> same data ) >>> Hmm. Any chance you can bisect this further? >> Will try to. Will take a few days as the server is in production ( but >> used as backup so...) >> >>>> Jul 27 23:41:41 jupiter2 kernel: [ 351.169213] [ cut here >>>> ] >>>> Jul 27 23:41:41 jupiter2 kernel: [ 351.169261] kernel BUG at >>>> fs/buffer.c:2886! >>> This is: >>> >>> BUG_ON(!buffer_mapped(bh)); >>> >>> in submit_bh(). >>> >>> system_call_fastpath+0x16/0x1b >>> This stack trace is from 3.5, because of the location of the >>> BUG. The call path in the trace suggests the code added by Al's ea022d, >>> but you say it breaks in 3.2 and 3.3 as well. Can you give me a trace >>> from 3.2? >> For a 3.2 kernel i get this stack trace. Different trace form 3.5 but >> exactly at the same moment. and for the same reasons. >> Seems to be less immmediate than with 3.5 but more a subjective >> imrpession than something based on fact. ( it takes a few seconds after >> deliver is started to have the bug ) > Totally different stack trace. Not in symlink code, but instead in > fallocate. Weird. I wonder if you are hitting two things. 
Bisection > will definitely help. Yes, could be; that would explain the 2 stack traces ( and the different timing observed ). Bisection is in progress. The fallocate bug is certainly already corrected ( info sent by sunil.mush...@gmail.com but unavailable on the list for the moment ?) -- The fallocate() oops is probably the same one that is fixed by this patch. https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=commit;h=a2118b301104a24381b414bc93371d666fe8d43a It is in the list of patches that are ready to be pushed. https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=shortlog;h=mw-3.4-mar15 But I am not sure it will correct all I observed, so I will continue to bisect to confirm or refute that. ( But I seem to have lost network on my server after a reboot and so no more access before tomorrow; I certainly forgot to do make modules_install before installing the new kernel ... Being stupid is not very helpful... ) I hope to finish the bisection tomorrow or Wednesday. Thanks a lot for the support. > Joel > >
Re: [PATCH 3/4] lib: vsprintf: Optimize put_dec_trunc8
2012/8/3 George Spelvin : > If you're going to have a conditional branch after > each 32x32->64-bit multiply, might as well shrink the code > and make it a loop. > > This also avoids using the long multiply for small integers. > > (This leaves the comments in a confusing state, but that's a separate > patch to make review easier.) > > Signed-off-by: George Spelvin This patch breaks IP address printing with "%pI4" (and by extension, nfsroot). Example: - Before: 10.0.0.1 - After: 10...1
Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one
On 25 September 2012 13:30, Viresh Kumar wrote: > On 25 September 2012 16:52, Peter Zijlstra wrote: >> On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote: >>> @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, >>> struct work_struct *work) >>> { >>> int ret; >>> >>> - ret = queue_work_on(get_cpu(), wq, work); >>> - put_cpu(); >>> + preempt_disable(); >>> + ret = queue_work_on(wq_select_cpu(), wq, work); >>> + preempt_enable(); >>> >>> return ret; >>> } >> >> Right, so the problem I see here is that wq_select_cpu() is horridly >> expensive.. > > But this is what the initial idea during LPC we had. Any improvements here > you can suggest? The main outcome of the LPC was that we should be able to select another CPU than the local one. Using the same policy as the timer is a first step to consolidate the interfaces. A next step should be to update the policy of the function. Vincent > >>> @@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long >>> __data) >>> struct delayed_work *dwork = (struct delayed_work *)__data; >>> struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work); >>> >>> - __queue_work(smp_processor_id(), cwq->wq, &dwork->work); >>> + __queue_work(wq_select_cpu(), cwq->wq, &dwork->work); >>> } >> >> Shouldn't timer migration have sorted this one? > > Maybe yes. Will investigate more on it. > > Thanks for your early feedback. > > -- > viresh
Re: [RFC 2/6] sched: add a new SD SHARE_POWERLINE flag for sched_domain
On 24 October 2012 17:17, Santosh Shilimkar wrote: > Vincent, > > Few comments/questions. > > > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> This new flag SD SHARE_POWERLINE reflects the sharing of the power rail >> between the members of a domain. As this is the current assumption of the >> scheduler, the flag is added to all sched_domain >> >> Signed-off-by: Vincent Guittot >> --- >> arch/ia64/include/asm/topology.h |1 + >> arch/tile/include/asm/topology.h |1 + >> include/linux/sched.h|1 + >> include/linux/topology.h |3 +++ >> kernel/sched/core.c |5 + >> 5 files changed, 11 insertions(+) >> >> diff --git a/arch/ia64/include/asm/topology.h >> b/arch/ia64/include/asm/topology.h >> index a2496e4..065c720 100644 >> --- a/arch/ia64/include/asm/topology.h >> +++ b/arch/ia64/include/asm/topology.h >> @@ -65,6 +65,7 @@ void build_cpu_to_node_map(void); >> | SD_BALANCE_EXEC \ >> | SD_BALANCE_FORK \ >> | SD_WAKE_AFFINE, \ >> + | arch_sd_share_power_line()\ >> .last_balance = jiffies, \ >> .balance_interval = 1,\ >> .nr_balance_failed = 0,\ >> diff --git a/arch/tile/include/asm/topology.h >> b/arch/tile/include/asm/topology.h >> index 7a7ce39..d39ed0b 100644 >> --- a/arch/tile/include/asm/topology.h >> +++ b/arch/tile/include/asm/topology.h >> @@ -72,6 +72,7 @@ static inline const struct cpumask *cpumask_of_node(int >> node) >> | 0*SD_PREFER_LOCAL \ >> | 0*SD_SHARE_CPUPOWER \ >> | 0*SD_SHARE_PKG_RESOURCES \ >> + | arch_sd_share_power_line()\ >> | 0*SD_SERIALIZE\ >> , \ >> .last_balance = jiffies, \ >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index 4786b20..74f2daf 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -862,6 +862,7 @@ enum cpu_idle_type { >> #define SD_WAKE_AFFINE0x0020 /* Wake task to waking CPU >> */ >> #define SD_PREFER_LOCAL 0x0040 /* Prefer to keep tasks >> local to this domain */ >> #define SD_SHARE_CPUPOWER 0x0080 /* Domain members share cpu power >> */ >> +#define SD_SHARE_POWERLINE 0x0100 
/* Domain members share power >> domain */ > If you ignore the current use of SD_SHARE_CPUPOWER, isn't the meaning of > CPUPOWER and POWERLINE is same here. Just trying to understand the clear > meaning of this new flag. Have you not considered SD_SHARE_CPUPOWER > because it is being used for cpu_power and needs at least minimum two > domains ? SD_PACKING would have been probably more appropriate based > on the way it is being used in further series. CPUPOWER reflects the sharing of hw resources between cores, like for hyper-threading. POWERLINE describes the fact that cores are sharing the same power line, and more precisely the same power gate. > > Regards > Santosh >
Re: [RFC 2/6] sched: add a new SD SHARE_POWERLINE flag for sched_domain
It looks like i need to describe more what On 29 October 2012 10:40, Vincent Guittot wrote: > On 24 October 2012 17:17, Santosh Shilimkar wrote: >> Vincent, >> >> Few comments/questions. >> >> >> On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >>> >>> This new flag SD SHARE_POWERLINE reflects the sharing of the power rail >>> between the members of a domain. As this is the current assumption of the >>> scheduler, the flag is added to all sched_domain >>> >>> Signed-off-by: Vincent Guittot >>> --- >>> arch/ia64/include/asm/topology.h |1 + >>> arch/tile/include/asm/topology.h |1 + >>> include/linux/sched.h|1 + >>> include/linux/topology.h |3 +++ >>> kernel/sched/core.c |5 + >>> 5 files changed, 11 insertions(+) >>> >>> diff --git a/arch/ia64/include/asm/topology.h >>> b/arch/ia64/include/asm/topology.h >>> index a2496e4..065c720 100644 >>> --- a/arch/ia64/include/asm/topology.h >>> +++ b/arch/ia64/include/asm/topology.h >>> @@ -65,6 +65,7 @@ void build_cpu_to_node_map(void); >>> | SD_BALANCE_EXEC \ >>> | SD_BALANCE_FORK \ >>> | SD_WAKE_AFFINE, \ >>> + | arch_sd_share_power_line()\ >>> .last_balance = jiffies, \ >>> .balance_interval = 1,\ >>> .nr_balance_failed = 0,\ >>> diff --git a/arch/tile/include/asm/topology.h >>> b/arch/tile/include/asm/topology.h >>> index 7a7ce39..d39ed0b 100644 >>> --- a/arch/tile/include/asm/topology.h >>> +++ b/arch/tile/include/asm/topology.h >>> @@ -72,6 +72,7 @@ static inline const struct cpumask *cpumask_of_node(int >>> node) >>> | 0*SD_PREFER_LOCAL \ >>> | 0*SD_SHARE_CPUPOWER \ >>> | 0*SD_SHARE_PKG_RESOURCES \ >>> + | arch_sd_share_power_line()\ >>> | 0*SD_SERIALIZE\ >>> , \ >>> .last_balance = jiffies, \ >>> diff --git a/include/linux/sched.h b/include/linux/sched.h >>> index 4786b20..74f2daf 100644 >>> --- a/include/linux/sched.h >>> +++ b/include/linux/sched.h >>> @@ -862,6 +862,7 @@ enum cpu_idle_type { >>> #define SD_WAKE_AFFINE0x0020 /* Wake task to waking CPU >>> */ >>> #define SD_PREFER_LOCAL 0x0040 /* Prefer to 
keep tasks >>> local to this domain */ >>> #define SD_SHARE_CPUPOWER 0x0080 /* Domain members share cpu power >>> */ >>> +#define SD_SHARE_POWERLINE 0x0100 /* Domain members share power >>> domain */ >> >> If you ignore the current use of SD_SHARE_CPUPOWER, isn't the meaning of >> CPUPOWER and POWERLINE is same here. Just trying to understand the clear >> meaning of this new flag. Have you not considered SD_SHARE_CPUPOWER >> because it is being used for cpu_power and needs at least minimum two >> domains ? SD_PACKING would have been probably more appropriate based >> on the way it is being used in further series. > > CPUPOWER reflects the share of hw ressources between cores like for > hyper threading. POWERLINE describes the fact that cores are sharing > the same power line amore precisely the powergate. Sorry, the mail was sent too early while I was writing it. CPUPOWER reflects the sharing of hw resources between cores, like for hyper-threading. POWERLINE describes the fact that cores are sharing the same power line, and more precisely the same power gating. It looks like I need to describe more precisely what I mean by SHARE_POWERLINE. I don't want to use PACKING because it's more a behavior than a feature. If cores can power gate independently (!SD_SHARE_POWERLINE), packing small tasks is one interesting behavior, but it may not be the only one. I want to make a difference between the HW configuration and the behavior we want to have above it. Vincent >> >> Regards >> Santosh >>
Re: [RFC 3/6] sched: pack small tasks
On 24 October 2012 17:20, Santosh Shilimkar wrote: > Vincent, > > Few comments/questions. > > > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> During sched_domain creation, we define a pack buddy CPU if available. >> >> On a system that share the powerline at all level, the buddy is set to -1 >> >> On a dual clusters / dual cores system which can powergate each core and >> cluster independently, the buddy configuration will be : >>| CPU0 | CPU1 | CPU2 | CPU3 | >> --- >> buddy | CPU0 | CPU0 | CPU0 | CPU2 | > > ^ > Is that a typo ? Should it be CPU2 instead of > CPU0 ? No it's not a typo. The system packs at each scheduling level. It starts to pack in cluster because each core can power gate independently so CPU1 tries to pack its tasks in CPU0 and CPU3 in CPU2. Then, it packs at CPU level so CPU2 tries to pack in the cluster of CPU0 and CPU0 packs in itself. > > >> Small tasks tend to slip out of the periodic load balance. >> The best place to choose to migrate them is at their wake up. >> > I have tried this series since I was looking at some of these packing > bits. On Mobile workloads like OSIdle with Screen ON, MP3, gallery, > I did see some additional filtering of threads with this series > but it's not making much difference in power. More on this below. Can I ask you which configuration you have used ? How many cores and clusters ? Can they be power gated independently ?
> > >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/core.c |1 + >> kernel/sched/fair.c | 109 >> ++ >> kernel/sched/sched.h |1 + >> 3 files changed, 111 insertions(+) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index dab7908..70cadbe 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct >> root_domain *rd, int cpu) >> rcu_assign_pointer(rq->sd, sd); >> destroy_sched_domains(tmp, cpu); >> >> + update_packing_domain(cpu); >> update_top_cache_domain(cpu); >> } >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 4f4a4f6..8c9d3ed 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -157,6 +157,63 @@ void sched_init_granularity(void) >> update_sysctl(); >> } >> >> + >> +/* >> + * Save the id of the optimal CPU that should be used to pack small tasks >> + * The value -1 is used when no buddy has been found >> + */ >> +DEFINE_PER_CPU(int, sd_pack_buddy); >> + >> +/* Look for the best buddy CPU that can be used to pack small tasks >> + * We make the assumption that it doesn't wort to pack on CPU that share >> the > s/wort/worth yes > >> + * same powerline. We looks for the 1st sched_domain without the >> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the >> lowest >> + * power per core based on the assumption that their power efficiency is >> + * better */ > Commenting style.. > /* > * > */ > yes > Can you please expand the why the assumption is right ? > "it doesn't wort to pack on CPU that share the same powerline" By "share the same power-line", I mean that the CPUs can't power off independently. So if some CPUs can't power off independently, it's worth trying to use most of them to race to idle.
> > Think about a scenario where you have quad core, dual cluster system > > |Cluster1| |cluster 2| > | CPU0 | CPU1 | CPU2 | CPU3 | | CPU0 | CPU1 | CPU2 | CPU3 | > > > Both clusters run from same voltage rail and have same PLL > clocking them. But the clusters have their own power domains > and all CPU's can power gate themselves to low power states. > Clusters also have their own level2 caches. > > In this case, you will still save power if you try to pack > load on one cluster. No ? Yes, I need to update the description of SD_SHARE_POWERLINE because I'm afraid I was not clear enough. SD_SHARE_POWERLINE includes the power gating capacity of each core. For your example above, the SD_SHARE_POWERLINE should be cleared at both MC and CPU level. > > >> +void update_packing_domain(int cpu) >> +{ >> + struct sched_domain *sd; >> + int id = -1; >> + >> + sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE); >> + if (!sd) >> + s
Re: [RFC 4/6] sched: secure access to other CPU statistics
On 24 October 2012 17:21, Santosh Shilimkar wrote: > $subject is bit confusing here. > > > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> The atomic update of runnable_avg_sum and runnable_avg_period are ensured >> by their size and the toolchain. But we must ensure to not read an old >> value >> for one field and a newly updated value for the other field. As we don't >> want to lock other CPU while reading these fields, we read twice each >> fields >> and check that no change have occured in the middle. >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/fair.c | 19 +-- >> 1 file changed, 17 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 8c9d3ed..6df53b5 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -3133,13 +3133,28 @@ static int select_idle_sibling(struct task_struct >> *p, int target) >> static inline bool is_buddy_busy(int cpu) >> { >> struct rq *rq = cpu_rq(cpu); >> + volatile u32 *psum = &rq->avg.runnable_avg_sum; >> + volatile u32 *pperiod = &rq->avg.runnable_avg_period; >> + u32 sum, new_sum, period, new_period; >> + int timeout = 10; > > So it can be 2 times read or more as well. > >> + >> + while (timeout) { >> + sum = *psum; >> + period = *pperiod; >> + new_sum = *psum; >> + new_period = *pperiod; >> + >> + if ((sum == new_sum) && (period == new_period)) >> + break; >> + >> + timeout--; >> + } >> > Seems like you did notice incorrect pair getting read > for rq runnable_avg_sum and runnable_avg_period. Seems > like the fix is to update them together under some lock > to avoid such issues. 
My goal is to have a lock-free mechanism because I don't want to lock another CPU while reading its statistics. > > Regards > Santosh >
Re: [RFC 5/6] sched: pack the idle load balance
On 24 October 2012 17:21, Santosh Shilimkar wrote: > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> Look for an idle CPU close the pack buddy CPU whenever possible. > s/close/close to yes > >> The goal is to prevent the wake up of a CPU which doesn't share the power >> line of the pack CPU >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/fair.c | 18 ++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 6df53b5..f76acdc 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -5158,7 +5158,25 @@ static struct { >> >> static inline int find_new_ilb(int call_cpu) >> { >> + struct sched_domain *sd; >> int ilb = cpumask_first(nohz.idle_cpus_mask); >> + int buddy = per_cpu(sd_pack_buddy, call_cpu); >> + >> + /* >> +* If we have a pack buddy CPU, we try to run load balance on a >> CPU >> +* that is close to the buddy. >> +*/ >> + if (buddy != -1) >> + for_each_domain(buddy, sd) { >> + if (sd->flags & SD_SHARE_CPUPOWER) >> + continue; > Do you mean SD_SHARE_POWERLINE here ? No, I just don't want to take hyperthread level for ILB > >> + >> + ilb = cpumask_first_and(sched_domain_span(sd), >> + nohz.idle_cpus_mask); >> + >> + if (ilb < nr_cpu_ids) >> + break; >> + } >> >> if (ilb < nr_cpu_ids && idle_cpu(ilb)) >> return ilb; >> > Can you please expand "idle CPU _close_ the pack buddy CPU" ? The goal is to pack the tasks on the pack buddy CPU, so when the scheduler needs to start ILB, I try to wake up a CPU that is close to the buddy and preferably in the same cluster. > > Regards > santosh
Re: [RFC 6/6] ARM: sched: clear SD_SHARE_POWERLINE
On 24 October 2012 17:21, Santosh Shilimkar wrote: > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> The ARM platforms take advantage of packing small tasks on few cores. >> This is true even when the cores of a cluster can't be powergated >> independently. >> >> >> Signed-off-by: Vincent Guittot >> --- >> arch/arm/kernel/topology.c |5 + >> 1 file changed, 5 insertions(+) >> >> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c >> index 26c12c6..00511d0 100644 >> --- a/arch/arm/kernel/topology.c >> +++ b/arch/arm/kernel/topology.c >> @@ -226,6 +226,11 @@ static inline void update_cpu_power(unsigned int >> cpuid, unsigned int mpidr) {} >>*/ >> struct cputopo_arm cpu_topology[NR_CPUS]; >> >> +int arch_sd_share_power_line(void) >> +{ >> + return 0*SD_SHARE_POWERLINE; >> +} > > > Making this selection of policy based on sched domain will be better. Just > gives the flexibility to choose a separate scheme for big and little > systems which will be very convenient. I agree that it would be more flexible to be able to set it for each level. > > Regards > Santosh > > > > >
[PATCH linux-next] edma: select arch common code to fix link
EDMA code has been moved to a common folder with a new CONFIG_TI_PRIV_EDMA switch. Select it when the edma driver is enabled. This fixes the following link error: drivers/built-in.o: In function `edma_remove': of_iommu.c:(.text+0x4ef20): undefined reference to `edma_free_slot' drivers/built-in.o: In function `edma_control': of_iommu.c:(.text+0x4ef70): undefined reference to `edma_stop' drivers/built-in.o: In function `edma_execute': of_iommu.c:(.text+0x4f11c): undefined reference to `edma_write_slot' of_iommu.c:(.text+0x4f150): undefined reference to `edma_link' of_iommu.c:(.text+0x4f168): undefined reference to `edma_start' drivers/built-in.o: In function `edma_free_chan_resources': of_iommu.c:(.text+0x4f220): undefined reference to `edma_stop' of_iommu.c:(.text+0x4f304): undefined reference to `edma_free_slot' of_iommu.c:(.text+0x4f328): undefined reference to `edma_free_channel' drivers/built-in.o: In function `edma_alloc_chan_resources': of_iommu.c:(.text+0x4f37c): undefined reference to `edma_alloc_channel' of_iommu.c:(.text+0x4f3d8): undefined reference to `edma_free_channel' drivers/built-in.o: In function `edma_prep_slave_sg': of_iommu.c:(.text+0x4f67c): undefined reference to `edma_alloc_slot' drivers/built-in.o: In function `edma_probe': of_iommu.c:(.text+0x4f794): undefined reference to `edma_alloc_slot' of_iommu.c:(.text+0x4f8b8): undefined reference to `edma_free_slot' drivers/built-in.o: In function `edma_callback': of_iommu.c:(.text+0x4fae4): undefined reference to `edma_stop' make: *** [vmlinux] Error 1 Signed-off-by: Vincent Stehlé Cc: Matt Porter Cc: Sekhar Nori Cc: Vinod Koul Cc: Dan Williams Cc: Russell King --- Hi, Build of linux next-20130709 is broken for ARM multi_v7_defconfig. This patch fixes it. (Note: the error messages mentioning of_iommu.c are misleading.) Best regards, V. 
drivers/dma/Kconfig |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 6825957..8b3fca9 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -198,6 +198,7 @@ config TI_EDMA depends on ARCH_DAVINCI || ARCH_OMAP select DMA_ENGINE select DMA_VIRTUAL_CHANNELS + select TI_PRIV_EDMA default n help Enable support for the TI EDMA controller. This DMA -- 1.7.10.4
[PATCH linux-next] arm: multi_v7_defconfig: add fsl lpuart serial console
Add Freescale LPUART serial console support. This gives us the boot messages on UART on e.g. the Vybrid VF610 Tower board. Signed-off-by: Vincent Stehlé Cc: Olof Johansson Cc: Russell King --- Hi, Would you please consider adding LPUART support to the ARM multi_v7_defconfig? (This patch is built on top of the following patches: http://comments.gmane.org/gmane.linux.kernel/1519712) Best regards, V. arch/arm/configs/multi_v7_defconfig |2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig index 81eac83..80aacc6 100644 --- a/arch/arm/configs/multi_v7_defconfig +++ b/arch/arm/configs/multi_v7_defconfig @@ -79,6 +79,8 @@ CONFIG_SERIAL_XILINX_PS_UART=y CONFIG_SERIAL_XILINX_PS_UART_CONSOLE=y CONFIG_SERIAL_IMX=y CONFIG_SERIAL_IMX_CONSOLE=y +CONFIG_SERIAL_FSL_LPUART=y +CONFIG_SERIAL_FSL_LPUART_CONSOLE=y CONFIG_I2C_DESIGNWARE_PLATFORM=y CONFIG_I2C_SIRF=y CONFIG_I2C_TEGRA=y -- 1.7.10.4
[PATCH linux-next] ARM: imx: fix imx_init_l2cache storage class
Commit 879ec1ceeac21285d62606c1e96520887efcd9bc makes imx_init_l2cache a common function and updates the header declaration accordingly. Fix function storage class, too. This fixes the following compilation error: arch/arm/mach-imx/system.c:101:123: error: static declaration of ‘imx_init_l2cache’ follows non-static declaration In file included from arch/arm/mach-imx/system.c:32:0: arch/arm/mach-imx/common.h:165:13: note: previous declaration of ‘imx_init_l2cache’ was here arch/arm/mach-imx/system.c:101:123: warning: ‘imx_init_l2cache’ defined but not used [-Wunused-function] Signed-off-by: Vincent Stehlé Cc: Shawn Guo Cc: Sascha Hauer Cc: Russell King Cc: triv...@kernel.org --- Hi, Linux next-20130710 breaks compilation of ARM multi_v7_defconfig. This patch fixes it. (Note: this patch is necessary for the link, too: http://www.spinics.net/lists/kernel/msg1563777.html) Best regards, V. arch/arm/mach-imx/system.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/mach-imx/system.c b/arch/arm/mach-imx/system.c index e5592ca..64ff37e 100644 --- a/arch/arm/mach-imx/system.c +++ b/arch/arm/mach-imx/system.c @@ -98,7 +98,7 @@ void __init mxc_arch_reset_init_dt(void) } #ifdef CONFIG_CACHE_L2X0 -static void __init imx_init_l2cache(void) +void __init imx_init_l2cache(void) { void __iomem *l2x0_base; struct device_node *np; -- 1.7.10.4
Re: [RFC][PATCH 1/9] sched: Introduce power scheduler
On 10 July 2013 13:11, Morten Rasmussen wrote: > On Wed, Jul 10, 2013 at 03:10:15AM +0100, Arjan van de Ven wrote: >> On 7/9/2013 8:55 AM, Morten Rasmussen wrote: >> > + mod_delayed_work_on(schedule_cpu(), system_wq, &dwork, >> > + msecs_to_jiffies(INTERVAL)); >> >> so thinking about this more, this really really should not be a work queue. >> a work queue will cause a large number of context switches for no reason >> (on Intel and AMD you can switch P state from interrupt context, and I'm >> pretty sure >> that holds for many ARM as well) > > Agree. I should have made it clear this is only a temporary solution. I > would prefer to tie the power scheduler to the existing scheduler tick > instead so we don't wake up cpus unnecessarily. nohz may be able handle > that for us. Also, currently the power scheduler updates all cpus. > Going forward this would change to per cpu updates and partial updates > of the global view to improve scalability. For the packing tasks patches, we are using the periodic load balance sequence to update the activity like it is done for the cpu_power. I have planned to update the packing patches to see how it can cooperate with Morten patches as it has similar needs. > >> >> and in addition, it causes some really nasty cases, especially around real >> time tasks. >> Your workqueue will schedule a kernel thread, which will run >> BEHIND real time tasks, and such real time task will then never be able to >> start running at a higher performance. >> >> (and with the delta between lowest and highest performance sometimes being >> 10x or more, >> the real time task will be running SLOW... quite possible longer than >> several milliseconds) >> >> and all for no good reason; a normal timer running in irq context would be >> much better for this kind of thing! 
[PATCH linux-next] pinctrl: fix pinconf_dbg_config_write return type
Have pinconf_dbg_config_write() return a ssize_t. This fixes the following compilation warning:

drivers/pinctrl/pinconf.c:617:2: warning: initialization from incompatible pointer type [enabled by default]
drivers/pinctrl/pinconf.c:617:2: warning: (near initialization for ‘pinconf_dbg_pinconfig_fops.write’) [enabled by default]

Signed-off-by: Vincent Stehlé
Cc: Linus Walleij
---
Hi,

This can be seen with e.g. next-20130916 with x86 allmodconfig.

Best regards,

V.

 drivers/pinctrl/pinconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/pinconf.c b/drivers/pinctrl/pinconf.c
index a138965..1664e78 100644
--- a/drivers/pinctrl/pinconf.c
+++ b/drivers/pinctrl/pinconf.c
@@ -490,7 +490,7 @@ exit:
  * are values that should match the pinctrl-maps
  * reflects the new config and is driver dependant
  */
-static int pinconf_dbg_config_write(struct file *file,
+static ssize_t pinconf_dbg_config_write(struct file *file,
 	const char __user *user_buf, size_t count, loff_t *ppos)
 {
 	struct pinctrl_maps *maps_node;
--
1.8.4.rc3
Re: [RFC][PATCH 14/10] sched, fair: Fix the group_capacity computation
On 28 August 2013 13:16, Peter Zijlstra wrote: > > Subject: sched, fair: Fix the group_capacity computation > From: Peter Zijlstra > Date: Wed Aug 28 12:40:38 CEST 2013 > > Do away with 'phantom' cores due to N*frac(smt_power) >= 1 by limiting > the capacity to the actual number of cores. > Peter, your patch also solves the 'phantom' big cores that can appear on HMP system because big cores have a cpu_power >= SCHED_POWER_SCALE in order to express a higher capacity than LITTLE cores. Acked-by Vincent Guittot Vincent > The assumption of 1 < smt_power < 2 is an actual requirement because > of what SMT is so this should work regardless of the SMT > implementation. > > It can still be defeated by creative use of cpu hotplug, but if you're > one of those freaks, you get to live with it. > > Signed-off-by: Peter Zijlstra > --- > kernel/sched/fair.c | 20 +--- > 1 file changed, 13 insertions(+), 7 deletions(-) > > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4554,18 +4554,24 @@ static inline int sg_imbalanced(struct s > /* > * Compute the group capacity. > * > - * For now the capacity is simply the number of power units in the > group_power. > - * A power unit represents a full core. > - * > - * This has an issue where N*frac(smt_power) >= 1, in that case we'll see > extra > - * 'cores' that aren't actually there. > + * Avoid the issue where N*frac(smt_power) >= 1 creates 'phantom' cores by > + * first dividing out the smt factor and computing the actual number of cores > + * and limit power unit capacity with that. 
>  */
> static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
> {
> +	unsigned int capacity, smt, cpus;
> +	unsigned int power, power_orig;
> +
> +	power = group->sgp->power;
> +	power_orig = group->sgp->power_orig;
> +	cpus = group->group_weight;
>
> -	unsigned int power = group->sgp->power;
> -	unsigned int capacity = DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE);
> +	/* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */
> +	smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig);
> +	capacity = cpus / smt; /* cores */
>
> +	capacity = min_t(capacity, DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE));
> 	if (!capacity)
> 		capacity = fix_small_capacity(env->sd, group);
[PATCH] gma500: define do_gma_backlight_set only when used
Make sure static function do_gma_backlight_set() is only defined when CONFIG_BACKLIGHT_CLASS_DEVICE is defined, as it is never called otherwise. This fixes the following warning: drivers/gpu/drm/gma500/backlight.c:29:13: warning: ‘do_gma_backlight_set’ defined but not used [-Wunused-function] While at it, remove some end of line spaces. Signed-off-by: Vincent Stehlé Cc: David Airlie --- Hi, This can be seen with mainline or linux-next with e.g. allmodconfig on x86. Best regards, V. drivers/gpu/drm/gma500/backlight.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/gma500/backlight.c b/drivers/gpu/drm/gma500/backlight.c index 143eba3..399731e 100644 --- a/drivers/gpu/drm/gma500/backlight.c +++ b/drivers/gpu/drm/gma500/backlight.c @@ -26,13 +26,13 @@ #include "intel_bios.h" #include "power.h" +#ifdef CONFIG_BACKLIGHT_CLASS_DEVICE static void do_gma_backlight_set(struct drm_device *dev) { -#ifdef CONFIG_BACKLIGHT_CLASS_DEVICE struct drm_psb_private *dev_priv = dev->dev_private; backlight_update_status(dev_priv->backlight_device); -#endif } +#endif void gma_backlight_enable(struct drm_device *dev) { @@ -43,7 +43,7 @@ void gma_backlight_enable(struct drm_device *dev) dev_priv->backlight_device->props.brightness = dev_priv->backlight_level; do_gma_backlight_set(dev); } -#endif +#endif } void gma_backlight_disable(struct drm_device *dev) @@ -55,7 +55,7 @@ void gma_backlight_disable(struct drm_device *dev) dev_priv->backlight_device->props.brightness = 0; do_gma_backlight_set(dev); } -#endif +#endif } void gma_backlight_set(struct drm_device *dev, int v) @@ -67,7 +67,7 @@ void gma_backlight_set(struct drm_device *dev, int v) dev_priv->backlight_device->props.brightness = v; do_gma_backlight_set(dev); } -#endif +#endif } int gma_backlight_init(struct drm_device *dev) -- 1.8.4.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] i2c-designware: define i2c_dw_pci_runtime_idle only with runtime pm
Make sure i2c_dw_pci_runtime_idle() is defined only when actually used, i.e. when CONFIG_PM_RUNTIME is defined. This fixes the following compilation warning:

drivers/i2c/busses/i2c-designware-pcidrv.c:188:12: warning: ‘i2c_dw_pci_runtime_idle’ defined but not used [-Wunused-function]

Signed-off-by: Vincent Stehlé
Cc: Wolfram Sang
---
 drivers/i2c/busses/i2c-designware-pcidrv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/i2c/busses/i2c-designware-pcidrv.c b/drivers/i2c/busses/i2c-designware-pcidrv.c
index f6ed06c..2b5d3a6 100644
--- a/drivers/i2c/busses/i2c-designware-pcidrv.c
+++ b/drivers/i2c/busses/i2c-designware-pcidrv.c
@@ -185,6 +185,7 @@ static int i2c_dw_pci_resume(struct device *dev)
 	return 0;
 }
 
+#ifdef CONFIG_PM_RUNTIME
 static int i2c_dw_pci_runtime_idle(struct device *dev)
 {
 	int err = pm_schedule_suspend(dev, 500);
@@ -194,6 +195,7 @@ static int i2c_dw_pci_runtime_idle(struct device *dev)
 		return 0;
 	return -EBUSY;
 }
+#endif
 
 static const struct dev_pm_ops i2c_dw_pm_ops = {
 	.resume = i2c_dw_pci_resume,
--
1.8.4.rc3
[PATCH linux-next] skd: fix some VPRINTK() specifiers for size_t
Use %zu for VPRINTK() as size_t specifier in replacement of %u. This fixes 7 compilation warnings on x86_64 like the following: drivers/block/skd_main.c:4628:42: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long unsigned int’ [-Wformat=] While at it, remove one cast to unsigned long for a size_t VPRINTK() argument and specify it as %zu, too. Signed-off-by: Vincent Stehlé Cc: Andrew Morton --- Hi, This can be seen on e.g. linux next-20130927. Best regards, V. drivers/block/skd_main.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c index 3110f68..ee7f7a8 100644 --- a/drivers/block/skd_main.c +++ b/drivers/block/skd_main.c @@ -4556,11 +4556,10 @@ static int skd_cons_skmsg(struct skd_device *skdev) int rc = 0; u32 i; - VPRINTK(skdev, "skmsg_table kzalloc, struct %u, count %u total %lu\n", + VPRINTK(skdev, "skmsg_table kzalloc, struct %zu, count %u total %zu\n", sizeof(struct skd_fitmsg_context), skdev->num_fitmsg_context, - (unsigned long) sizeof(struct skd_fitmsg_context) * - skdev->num_fitmsg_context); + sizeof(struct skd_fitmsg_context) * skdev->num_fitmsg_context); skdev->skmsg_table = kzalloc(sizeof(struct skd_fitmsg_context) *skdev->num_fitmsg_context, GFP_KERNEL); @@ -4611,7 +4610,7 @@ static int skd_cons_skreq(struct skd_device *skdev) int rc = 0; u32 i; - VPRINTK(skdev, "skreq_table kzalloc, struct %u, count %u total %u\n", + VPRINTK(skdev, "skreq_table kzalloc, struct %zu, count %u total %zu\n", sizeof(struct skd_request_context), skdev->num_req_context, sizeof(struct skd_request_context) * skdev->num_req_context); @@ -4623,7 +4622,7 @@ static int skd_cons_skreq(struct skd_device *skdev) goto err_out; } - VPRINTK(skdev, "alloc sg_table sg_per_req %u scatlist %u total %u\n", + VPRINTK(skdev, "alloc sg_table sg_per_req %u scatlist %zu total %zu\n", skdev->sgs_per_request, sizeof(struct scatterlist), skdev->sgs_per_request * sizeof(struct 
scatterlist)); @@ -4668,7 +4667,7 @@ static int skd_cons_skspcl(struct skd_device *skdev) int rc = 0; u32 i, nbytes; - VPRINTK(skdev, "skspcl_table kzalloc, struct %u, count %u total %u\n", + VPRINTK(skdev, "skspcl_table kzalloc, struct %zu, count %u total %zu\n", sizeof(struct skd_special_context), skdev->n_special, sizeof(struct skd_special_context) * skdev->n_special); -- 1.8.4.rc3
Re: [ipv4:PATCH] Allow userspace to specify primary or secondary ip on interface
Yes, I found I can use the 'ip route replace' command to change the 'src' address as a workaround. Julian also responded in another thread that he could come up with a patch to sort IPs by scope and primary/secondary preferences: https://lkml.org/lkml/2013/9/27/482

Vincent

On Sun, Sep 29, 2013 at 2:59 PM, David Miller wrote:
> From: Vincent Li
> Date: Tue, 24 Sep 2013 14:09:48 -0700
>
>> the reason for this patch is that we have a multi blade cluster platform
>> sharing 'floating management ip' and also that each blade has its own
>> management ip on the management interface, so whichever blade in the
>> cluster becomes primary blade, the 'floating management ip' follows it,
>> and we want any of our traffic originated from the primary blade to source
>> from the 'floating management ip' for consistency. but in this case, since
>> the local blade management ip is always the primary ip on the management
>> interface and 'floating management ip' is always secondary, the kernel
>> always chooses the primary ip as source ip address. thus we would like to
>> add the flexibility in kernel to allow us to specify which ip should be
>> primary or secondary.
>
> You have the flexibility already.
>
> You can specify a specific source address to use in routes.
[PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
the current behavior is when an IP is added to an interface, the primary or secondary attributes is depending on the order of ip added to the interface the first IP will be primary and second, third,... or alias IP will be secondary if the IP subnet matches this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary' (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify an IP address to be primary or secondary. the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip' and also that each blade has its own management ip on the management interface, so whichever blade in the cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic originated from the primary blade source from the 'floating management ip' for consistency. but in this case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary. 
Signed-off-by: Vincent Li --- net/ipv4/devinet.c |9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index a1b5bcb..bfc702a 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -440,9 +440,11 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, return 0; } - ifa->ifa_flags &= ~IFA_F_SECONDARY; last_primary = &in_dev->ifa_list; + if((*last_primary) == NULL) + ifa->ifa_flags &= ~IFA_F_SECONDARY; + for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL; ifap = &ifa1->ifa_next) { if (!(ifa1->ifa_flags & IFA_F_SECONDARY) && @@ -458,7 +460,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, inet_free_ifa(ifa); return -EINVAL; } - ifa->ifa_flags |= IFA_F_SECONDARY; +if (!(ifa->ifa_flags & IFA_F_SECONDARY)) +ifa1->ifa_flags |= IFA_F_SECONDARY; +else +ifa->ifa_flags |= IFA_F_SECONDARY; } } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
Ok, I will resend the patch with your suggestions. Vincent On Tue, Sep 24, 2013 at 12:28 PM, David Miller wrote: > From: Vincent Li > Date: Tue, 24 Sep 2013 11:11:21 -0700 > >> the current behavior is when an IP is added to an interface, the primary >> or secondary attributes is depending on the order of ip added to the >> interface >> the first IP will be primary and second, third,... or alias IP will be >> secondary >> if the IP subnet matches >> >> this patch add the flexiblity to allow user to specify an argument 'primary' >> or 'secondary' >> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for >> example) to specify >> an IP address to be primary or secondary. >> >> the reason for this patch is that we have a multi blade cluster platform >> sharing 'floating management ip' >> and also that each blade has its own management ip on the management >> interface, so whichever blade in the >> cluster becomes primary blade, the 'floating mangaement ip' follows it, and >> we want any of our traffic >> originated from the primary blade source from the 'floating management ip' >> for consistency. but in this >> case, since the local blade management ip is always the primary ip on the >> mangaement interface and 'floating >> management ip' is always secondary, kernel always choose the primary ip as >> source ip address. thus we would >> like to add the flexibility in kernel to allow us to specify which ip to be >> primary or secondary. >> >> Signed-off-by: Vincent Li > > When submitting a patch, please: > > 1) Specify an appropriate prefix for your subject line, indicating the >subsystem. "ipv4: " might be appropriate here. > > 2) Format your commit message so that lines do not exceed 80 columns. >People will read using ASCII text based tools in 80 column >terminals. 
[ipv4:PATCH] Allow userspace to specify primary or secondary ip on interface
Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface. the current behavior is when an IP is added to an interface, the primary or secondary attributes is depending on the order of ip added to the interface the first IP will be primary and second, third...or alias IP will be secondary if the IP subnet matches. this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary' (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify an IP address to be primary or secondary. the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip' and also that each blade has its own management ip on the management interface, so whichever blade in the cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic originated from the primary blade source from the 'floating management ip' for consistency. but in this case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary. 
Signed-off-by: Vincent Li --- net/ipv4/devinet.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index a1b5bcb..5a7764e 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -440,8 +440,9 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, return 0; } - ifa->ifa_flags &= ~IFA_F_SECONDARY; last_primary = &in_dev->ifa_list; + if(*last_primary == NULL) + ifa->ifa_flags &= ~IFA_F_SECONDARY; for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL; ifap = &ifa1->ifa_next) { @@ -458,7 +459,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, inet_free_ifa(ifa); return -EINVAL; } - ifa->ifa_flags |= IFA_F_SECONDARY; + if (!(ifa->ifa_flags & IFA_F_SECONDARY)) + ifa1->ifa_flags |= IFA_F_SECONDARY; + else + ifa->ifa_flags |= IFA_F_SECONDARY; } } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
Thanks Julian for the comments, I imagined it would not be so simple as it changed old behavior with ip binary and some actions in __inet_del_ifa() that I am not fully aware of. my intention is to preserve the old behavior and extend the flexibility, I am unable to come up with a good patch to achieve the intended behavior. I had to patch the ip binary to sort of preserve original ip binary behavior with the kernel patch I provided., the ip command patch below: diff --git a/ip/ipaddress.c b/ip/ipaddress.c index 1c3e4da..9f2802c 100644 --- a/ip/ipaddress.c +++ b/ip/ipaddress.c @@ -1259,6 +1259,7 @@ static int ipaddr_modify(int cmd, int flags, int argc, char **argv) req.n.nlmsg_flags = NLM_F_REQUEST | flags; req.n.nlmsg_type = cmd; req.ifa.ifa_family = preferred_family; + req.ifa.ifa_flags |= IFA_F_SECONDARY; while (argc > 0) { if (strcmp(*argv, "peer") == 0 || @@ -1307,6 +1308,11 @@ static int ipaddr_modify(int cmd, int flags, int argc, char **argv) invarg("invalid scope value.", *argv); req.ifa.ifa_scope = scope; scoped = 1; +} else if (strcmp(*argv, "secondary") == 0 || + strcmp(*argv, "temporary") == 0) { +req.ifa.ifa_flags |= IFA_F_SECONDARY; +} else if (strcmp(*argv, "primary") == 0) { +req.ifa.ifa_flags &= ~IFA_F_SECONDARY; } else if (strcmp(*argv, "dev") == 0) { NEXT_ARG(); d = *argv; if someone can point me to the right patch directions or coming up with better patches, it is very much appreciated. On Tue, Sep 24, 2013 at 2:13 PM, Julian Anastasov wrote: > > Hello, > > On Tue, 24 Sep 2013, Vincent Li wrote: > >> the current behavior is when an IP is added to an interface, the primary >> or secondary attributes is depending on the order of ip added to the >> interface >> the first IP will be primary and second, third,... 
or alias IP will be >> secondary >> if the IP subnet matches >> >> this patch add the flexiblity to allow user to specify an argument 'primary' >> or 'secondary' >> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for >> example) to specify >> an IP address to be primary or secondary. >> >> the reason for this patch is that we have a multi blade cluster platform >> sharing 'floating management ip' >> and also that each blade has its own management ip on the management >> interface, so whichever blade in the >> cluster becomes primary blade, the 'floating mangaement ip' follows it, and >> we want any of our traffic >> originated from the primary blade source from the 'floating management ip' >> for consistency. but in this >> case, since the local blade management ip is always the primary ip on the >> mangaement interface and 'floating >> management ip' is always secondary, kernel always choose the primary ip as >> source ip address. thus we would >> like to add the flexibility in kernel to allow us to specify which ip to be >> primary or secondary. 
>> >> Signed-off-by: Vincent Li >> --- >> net/ipv4/devinet.c |9 +++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c >> index a1b5bcb..bfc702a 100644 >> --- a/net/ipv4/devinet.c >> +++ b/net/ipv4/devinet.c >> @@ -440,9 +440,11 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, >> struct nlmsghdr *nlh, >> return 0; >> } >> >> - ifa->ifa_flags &= ~IFA_F_SECONDARY; >> last_primary = &in_dev->ifa_list; >> >> + if((*last_primary) == NULL) >> + ifa->ifa_flags &= ~IFA_F_SECONDARY; >> + >> for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL; >>ifap = &ifa1->ifa_next) { >> if (!(ifa1->ifa_flags & IFA_F_SECONDARY) && >> @@ -458,7 +460,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, >> struct nlmsghdr *nlh, >> inet_free_ifa(ifa); >> return -EINVAL; >> } >> - ifa->ifa_flags |= IFA_F_SECONDARY; > > There is some confusion here, when ifa has > IFA_F_SECONDARY bit set, in the 'else' we set it again. > I guess the 'else' part is not needed. > >> +
Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
sorry Julian to miss your point after reading the __inet_del_ifa and see the rtmsg_ifa, fib_del_ifaddr/fib_add_ifaddr, I can try another patch and actually test if the patches changes works as it is intended, not just checking from ip binary output. Vincent On Tue, Sep 24, 2013 at 2:34 PM, Vincent Li wrote: > Thanks Julian for the comments, I imagined it would not be so simple > as it changed old behavior with ip binary and some actions in > __inet_del_ifa() that I am not fully aware of. my intention is to > preserve the old behavior and extend the flexibility, I am unable to > come up with a good patch to achieve the intended behavior. > > I had to patch the ip binary to sort of preserve original ip binary > behavior with the kernel patch I provided., the ip command patch > below: > > diff --git a/ip/ipaddress.c b/ip/ipaddress.c > index 1c3e4da..9f2802c 100644 > --- a/ip/ipaddress.c > +++ b/ip/ipaddress.c > @@ -1259,6 +1259,7 @@ static int ipaddr_modify(int cmd, int flags, int > argc, char **argv) > req.n.nlmsg_flags = NLM_F_REQUEST | flags; > req.n.nlmsg_type = cmd; > req.ifa.ifa_family = preferred_family; > + req.ifa.ifa_flags |= IFA_F_SECONDARY; > > while (argc > 0) { > if (strcmp(*argv, "peer") == 0 || > @@ -1307,6 +1308,11 @@ static int ipaddr_modify(int cmd, int flags, > int argc, char **argv) > invarg("invalid scope value.", *argv); > req.ifa.ifa_scope = scope; > scoped = 1; > +} else if (strcmp(*argv, "secondary") == 0 || > + strcmp(*argv, "temporary") == 0) { > +req.ifa.ifa_flags |= IFA_F_SECONDARY; > +} else if (strcmp(*argv, "primary") == 0) { > +req.ifa.ifa_flags &= ~IFA_F_SECONDARY; > } else if (strcmp(*argv, "dev") == 0) { > NEXT_ARG(); > d = *argv; > > if someone can point me to the right patch directions or coming up > with better patches, it is very much appreciated. 
> > > On Tue, Sep 24, 2013 at 2:13 PM, Julian Anastasov wrote: >> >> Hello, >> >> On Tue, 24 Sep 2013, Vincent Li wrote: >> >>> the current behavior is when an IP is added to an interface, the primary >>> or secondary attributes is depending on the order of ip added to the >>> interface >>> the first IP will be primary and second, third,... or alias IP will be >>> secondary >>> if the IP subnet matches >>> >>> this patch add the flexiblity to allow user to specify an argument >>> 'primary' or 'secondary' >>> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for >>> example) to specify >>> an IP address to be primary or secondary. >>> >>> the reason for this patch is that we have a multi blade cluster platform >>> sharing 'floating management ip' >>> and also that each blade has its own management ip on the management >>> interface, so whichever blade in the >>> cluster becomes primary blade, the 'floating mangaement ip' follows it, and >>> we want any of our traffic >>> originated from the primary blade source from the 'floating management ip' >>> for consistency. but in this >>> case, since the local blade management ip is always the primary ip on the >>> mangaement interface and 'floating >>> management ip' is always secondary, kernel always choose the primary ip as >>> source ip address. thus we would >>> like to add the flexibility in kernel to allow us to specify which ip to be >>> primary or secondary. 
>>> >>> Signed-off-by: Vincent Li >>> --- >>> net/ipv4/devinet.c |9 +++-- >>> 1 file changed, 7 insertions(+), 2 deletions(-) >>> >>> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c >>> index a1b5bcb..bfc702a 100644 >>> --- a/net/ipv4/devinet.c >>> +++ b/net/ipv4/devinet.c >>> @@ -440,9 +440,11 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, >>> struct nlmsghdr *nlh, >>> return 0; >>> } >>> >>> - ifa->ifa_flags &= ~IFA_F_SECONDARY; >>> last_primary = &in_dev->ifa_list; >>> >>> + if((*last_primary) == NULL) >>> + ifa->ifa_flags &= ~IFA_F_SECONDARY; >>> + >>> fo
Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
I think it is good idea to add these preferences flags and sorted them, but my code knowledge is limited to implement it as I am still learning, I can help testing :) On Wed, Sep 25, 2013 at 12:08 AM, Julian Anastasov wrote: > > Hello, > > On Tue, 24 Sep 2013, Vincent Li wrote: > >> Thanks Julian for the comments, I imagined it would not be so simple >> as it changed old behavior with ip binary and some actions in >> __inet_del_ifa() that I am not fully aware of. my intention is to >> preserve the old behavior and extend the flexibility, I am unable to >> come up with a good patch to achieve the intended behavior. > > ... > >> if someone can point me to the right patch directions or coming up >> with better patches, it is very much appreciated. > > My first idea was to use NLM_F_APPEND to implement > 'ip addr prepend' and 'ip addr append' but the default > operation is 'append' without providing NLM_F_APPEND, so it > does not work. > > Another idea is to add new attribute IFA_PREFERENCE in > include/uapi/linux/if_addr.h just before __IFA_MAX, integer, > 3 of the values are known. A preference for the used scope. > > /* Add as last, default */ > IFA_PREFERENCE_APPEND = 0, > > /* Add as last primary, before any present primary in subnet */ > IFA_PREFERENCE_PRIMARY = 128, > > /* First for scope */ > IFA_PREFERENCE_FIRST = 255, > > We should keep it in ifa as priority, for > sorting purposes. It can be 4-byte value, if user wants > to copy user-defined order into preference. > > Sorting order should be: > > - all primaries sorted by decreasing scope, decreasing > priority and adding order > > - then all secondaries (IFA_F_SECONDARY) sorted by decreasing > priority and adding order > > Usage: > > ip addr add ... pref[erence] type_or_priority > > # Add floating IP (append at priority 128) > # The primary mode is not guaranteed if another address from > # the same subnet is already using the same or higher priority. > ip addr add ... 
pref primary > # More preferred primary > ip addr add ... pref 129 > > # Add first IP for scope > ip addr add ... pref first > > The scope has similar 'sorting' property but not > for IPs in same subnet and it would be difficult to use > it for global routes. > > Thoughts? > > Regards > > -- > Julian Anastasov
[RFC 0/6] sched: packing small tasks
Hi,

This patch set takes advantage of the new statistics that are going to be available in the kernel thanks to per-entity load tracking: http://thread.gmane.org/gmane.linux.kernel/1348522. It packs small tasks onto as few CPUs/clusters/cores as possible. The main goal of packing small tasks is to reduce power consumption by minimizing the number of power domains in use.

The packing is done in 2 steps:

The 1st step looks for the best place to pack tasks on a system according to its topology, and defines a pack buddy CPU for each CPU if one is available. The policy for setting a pack buddy CPU is that we pack at all levels where the power line is not shared by groups of CPUs. To describe this capability, a new flag, SD_SHARE_POWERLINE, has been introduced; it describes where the CPUs of a scheduling domain share their power rails. This flag has been set in all sched_domains in order to keep the default behaviour of the scheduler unchanged.

In the 2nd step, the scheduler checks the load level of the task which wakes up and how busy the buddy CPU is. It can then decide to migrate the task onto the buddy.

The patch set has been tested on ARM platforms: a quad CA-9 SMP and a TC2 HMP (dual CA-15 and 3x CA-7 clusters). For the ARM platforms, the results have demonstrated that it is worth packing small tasks at all topology levels.

The performance tests have been done on both platforms with sysbench. The results don't show any performance regression. This is in line with the policy, which keeps the normal behavior for heavy use cases.

test: sysbench --test=cpu --num-threads=N --max-requests=R run

Results below are the average duration of 3 tests on the quad CA-9.
default is the current scheduler behavior (pack buddy CPU is -1); pack is
the scheduler with the pack mechanism.

              | default | pack    |
-----------------------------------
 N=8;  R=200  | 3.1999  | 3.1921  |
 N=8;  R=2000 | 31.4939 | 31.4844 |
 N=12; R=200  | 3.2043  | 3.2084  |
 N=12; R=2000 | 31.4897 | 31.4831 |
 N=16; R=200  | 3.1774  | 3.1824  |
 N=16; R=2000 | 31.4899 | 31.4897 |
-----------------------------------

The power consumption tests have been done only on the TC2 platform, which
has accessible power lines, and I have used cyclictest to simulate small
tasks. The tests show some power consumption improvement.

test: cyclictest -t 8 -q -e 100 -D 20 & cyclictest -t 8 -q -e 100 -D 20

The measurements have been taken over 16 seconds and the results have been
normalized to 100.

        | CA15 | CA7 | total |
------------------------------
default | 100  | 40  | 140   |
pack    | <1   | 45  | <46   |
------------------------------

The A15 cluster is less power efficient than the A7 cluster, but if we
assume that the tasks are well spread across both clusters, we can estimate
that the power consumption on a dual cluster of CA7 would have been, for a
default kernel:

        | CA7 | CA7 | total |
-----------------------------
default | 40  | 40  | 80    |
-----------------------------

Vincent Guittot (6):
  Revert "sched: introduce temporary FAIR_GROUP_SCHED dependency for
    load-tracking"
  sched: add a new SD SHARE_POWERLINE flag for sched_domain
  sched: pack small task at wakeup
  sched: secure access to other CPU statistics
  sched: pack the idle load balance
  ARM: sched: clear SD_SHARE_POWERLINE

 arch/arm/kernel/topology.c       |    5 ++
 arch/ia64/include/asm/topology.h |    1 +
 arch/tile/include/asm/topology.h |    1 +
 include/linux/sched.h            |    9 +--
 include/linux/topology.h         |    3 +
 kernel/sched/core.c              |   13 ++--
 kernel/sched/fair.c              |  155 +++---
 kernel/sched/sched.h             |   10 +--
 8 files changed, 165 insertions(+), 32 deletions(-)

--
1.7.9.5
Re: [BISECTED] snd-hda-intel audio distortion in Linus' current tree
[Cc: alsa-de...@alsa-project.org; also, please cc: me explicitly as well,
since I'm not subscribed to either list]

On Wed, Sep 26, 2012 at 12:29 AM, Steven Noonan wrote:
> Started having audio problems when trying out the latest tree
> (v3.6-rc7-10-g56d27ad). When playing any kind of audio, there was
> significant distortion, mostly crackling noise. I'm using a Lenovo
> ThinkPad X230 (Panther Point).
>
> I did a git-bisect to locate the problem, and it seems this commit is to
> blame:
>
> c20c5a841cbe47f5b7812b57bd25397497e5fbc0 is the first bad commit
> commit c20c5a841cbe47f5b7812b57bd25397497e5fbc0
> Author: Seth Heasley
> Date:   Thu Jun 14 14:23:53 2012 -0700
>
>     ALSA: hda_intel: activate COMBO mode for Intel client chipsets
>
>     This patch activates the COMBO position_fix for recent Intel
>     client chipsets. COMBO mode is the recommended setting for Intel
>     chipsets and eliminates HD audio warnings in dmesg. This patch has
>     been tested on Lynx Point, Panther Point, and Cougar Point.
>
>     Signed-off-by: Seth Heasley
>     Signed-off-by: Takashi Iwai
>
> It's pretty clear-cut. If I revert this patch, my sound starts
> functioning normally again.
>
> Any thoughts on how to proceed here? Can someone revert this, or is
> there some testing that I can do?
>
> Here's a pretty-printed bisection log, if needed:
>
> # good: [28a33cbc] Linux 3.5
> # bad:  [b13bc8dd] Merge tag 'staging-3.6-rc1' of git://git.kernel.or
> # good: [3c4cfade] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
> # bad:  [9fc37779] Merge tag 'usb-3.6-rc1' of git://git.kernel.org/pu
> # bad:  [f14121ab] Merge tag 'dt-for-3.6' of git://sources.calxeda.co
> # good: [d14b7a41] Merge branch 'for-linus' of git://git.kernel.org/p
> # good: [15d47763] Merge branch 'for-3.5' into for-3.6
> # bad:  [dbf7b591] Merge tag 'sound-3.6' of git://git.kernel.org/pub/
> # bad:  [1c76684d] ALSA: hda - add Haswell HDMI codec id
> # bad:  [8b8d654b] ALSA: hda - Move one-time init codes from generic_
> # good: [80c8bfbe] ALSA: HDA: Create phantom jacks for fixed inputs a
> # bad:  [ceaa86ba] ALSA: hda - Remove invalid init verbs for Nvidia 2
> # bad:  [4b6ace9e] ALSA: hda - Add the support for VIA HDMI pin detec
> # bad:  [c20c5a84] ALSA: hda_intel: activate COMBO mode for Intel cli
>
> - Steven

I can confirm that I've hit this bug as well, and that it's still present
in stable 3.6.0. Strangely enough, it only seems to affect VLC for me;
when playing audio through mplayer or any gstreamer-based players
(Rhythmbox, Totem, etc.), I don't encounter any audio distortion. Possibly
also related to [1]?

A workaround (other than reverting this commit) is to not use COMBO mode,
i.e. to load snd-hda-intel with position_fix=2.

Please let me know if any more information is needed.
$ lspci -vvnn | grep -A8 Audio
00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C210 Series
Chipset Family High Definition Audio Controller [8086:1e20] (rev 04)
	Subsystem: Toshiba America Info Systems Device [1179:fb30]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
	ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
	Kernel driver in use: snd_hda_intel

Machine: Toshiba Satellite P850
Distro: Debian wheezy/sid
ALSA 1.0.25; PulseAudio 2.0

Regards,
Vincent

[1] http://mailman.alsa-project.org/pipermail/alsa-devel/2012-September/055161.html
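[Editorial aside: the position_fix=2 workaround mentioned above can be made
persistent through a modprobe options file. The path below is the usual
convention rather than something from the original mail, and value 2
selects the DMA position-buffer mode of the driver's position_fix
parameter, if I read the module parameters right.]

```shell
# /etc/modprobe.d/snd-hda-intel.conf
# position_fix=2 forces the DMA position buffer instead of COMBO mode
options snd-hda-intel position_fix=2
```

After adding the file, reload the snd-hda-intel module (or simply reboot)
so the option takes effect.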
[PATCH] mm: show migration types in show_mem
This is useful to diagnose the reason for a page allocation failure in
cases where there appear to be several free pages.

Example, with this alloc_pages(GFP_ATOMIC) failure:

 swapper/0: page allocation failure: order:0, mode:0x0
 ...
 Mem-info:
 Normal per-cpu:
 CPU0: hi: 90, btch: 15 usd: 48
 CPU1: hi: 90, btch: 15 usd: 21
 active_anon:0 inactive_anon:0 isolated_anon:0
  active_file:0 inactive_file:84 isolated_file:0
  unevictable:0 dirty:0 writeback:0 unstable:0
  free:4026 slab_reclaimable:75 slab_unreclaimable:484
  mapped:0 shmem:0 pagetables:0 bounce:0
 Normal free:16104kB min:2296kB low:2868kB high:3444kB active_anon:0kB
  inactive_anon:0kB active_file:0kB inactive_file:336kB unevictable:0kB
  isolated(anon):0kB isolated(file):0kB present:331776kB mlocked:0kB
  dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:300kB
  slab_unreclaimable:1936kB kernel_stack:328kB pagetables:0kB unstable:0kB
  bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
 lowmem_reserve[]: 0 0

Before the patch, it's hard (for me, at least) to say why all these free
chunks weren't considered for allocation:

 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB
 1*1024kB 1*2048kB 3*4096kB = 16128kB

After the patch, it's obvious that the reason is that all of these are in
the MIGRATE_CMA (C) freelist:

 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C)
 1*1024kB (C) 1*2048kB (C) 3*4096kB (C) = 16128kB

Signed-off-by: Rabin Vincent
---
 mm/page_alloc.c | 42 --
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c13ea75..cbe5373 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2818,6 +2818,31 @@ out:

 #define K(x) ((x) << (PAGE_SHIFT-10))

+static void show_migration_types(unsigned char type)
+{
+	static const char types[MIGRATE_TYPES] = {
+		[MIGRATE_UNMOVABLE]	= 'U',
+		[MIGRATE_RECLAIMABLE]	= 'E',
+		[MIGRATE_MOVABLE]	= 'M',
+		[MIGRATE_RESERVE]	= 'R',
+#ifdef CONFIG_CMA
+		[MIGRATE_CMA]		= 'C',
+#endif
+		[MIGRATE_ISOLATE]	= 'I',
+	};
+	char tmp[MIGRATE_TYPES + 1];
+	char *p = tmp;
+	int i;
+
+	for (i = 0; i < MIGRATE_TYPES; i++) {
+		if (type & (1 << i))
+			*p++ = types[i];
+	}
+
+	*p = '\0';
+	printk("(%s) ", tmp);
+}
+
 /*
  * Show free area list (used inside shift_scroll-lock stuff)
  * We also calculate the percentage fragmentation. We do this by counting the
@@ -2942,6 +2967,7 @@ void show_free_areas(unsigned int filter)

 	for_each_populated_zone(zone) {
 		unsigned long nr[MAX_ORDER], flags, order, total = 0;
+		unsigned char types[MAX_ORDER];

 		if (skip_free_areas_node(filter, zone_to_nid(zone)))
 			continue;
@@ -2950,12 +2976,24 @@ void show_free_areas(unsigned int filter)
 		spin_lock_irqsave(&zone->lock, flags);
 		for (order = 0; order < MAX_ORDER; order++) {
-			nr[order] = zone->free_area[order].nr_free;
+			struct free_area *area = &zone->free_area[order];
+			int type;
+
+			nr[order] = area->nr_free;
 			total += nr[order] << order;
+
+			types[order] = 0;
+			for (type = 0; type < MIGRATE_TYPES; type++) {
+				if (!list_empty(&area->free_list[type]))
+					types[order] |= 1 << type;
+			}
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++)
+		for (order = 0; order < MAX_ORDER; order++) {
 			printk("%lu*%lukB ", nr[order], K(1UL) << order);
+			if (nr[order])
+				show_migration_types(types[order]);
+		}
 		printk("= %lukB\n", K(total));
 	}

--
1.7.11.3
CMA and zone watermarks
It appears that when CMA is enabled, the zone watermarks are not properly
respected, leading to, for example, GFP_NOWAIT allocations getting access
to the high pools.

I ran the following test code, which simply allocates pages with
GFP_NOWAIT until it fails and then tries GFP_ATOMIC. Without CMA, the
GFP_ATOMIC allocation succeeds; with CMA, it fails too.

Logs attached (includes my patch which prints the migration type in the
failure message, http://marc.info/?l=linux-mm&m=134971041701306&w=2),
taken on a 3.6 kernel.

Thanks.

diff --git a/arch/arm/mach-ux500/board-mop500.c b/arch/arm/mach-ux500/board-mop500.c
index a534d88..b98d0df 100644
--- a/arch/arm/mach-ux500/board-mop500.c
+++ b/arch/arm/mach-ux500/board-mop500.c
@@ -854,3 +854,25 @@ DT_MACHINE_START(U8500_DT, "ST-Ericsson U8500 platform (Device Tree Support)")
 	.dt_compat = u8500_dt_board_compat,
 MACHINE_END
 #endif
+
+static int __init late(void)
+{
+	while (1) {
+		void *p;
+
+		p = alloc_page(GFP_NOWAIT);
+		if (!p) {
+			pr_err("GFP_NOWAIT failed, checking GFP_ATOMIC");
+
+			p = alloc_page(GFP_ATOMIC);
+			if (!p)
+				panic("GFP_ATOMIC failed too, fail!");
+
+			panic("GFP_ATOMIC OK, all good\n");
+		}
+	}
+
+	return 0;
+}
+late_initcall(late);

[attachment: cmalog.txt.gz — gzip compressed data]
Re: CMA and zone watermarks
Hi Marek, Minchan,

2012/10/9 Marek Szyprowski :
> Could you run your test with the latest linux-next kernel? There have
> been some patches merged into the akpm tree which should fix the
> accounting of free and free CMA pages. I hope that fixes this issue.

I've tested with the mentioned patches (which seem to have also reached
Linus' tree today) and they appear to resolve the problem.

Thanks.
[PATCH] drm/omap: fix allocation size for page addresses array
Signed-off-by: Rob Clark
Signed-off-by: Vincent Penquerc'h
---
 drivers/staging/omapdrm/omap_gem.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/omapdrm/omap_gem.c b/drivers/staging/omapdrm/omap_gem.c
index c828743..4c1472c 100644
--- a/drivers/staging/omapdrm/omap_gem.c
+++ b/drivers/staging/omapdrm/omap_gem.c
@@ -246,7 +246,7 @@ static int omap_gem_attach_pages(struct drm_gem_object *obj)
 	 * DSS, GPU, etc. are not cache coherent:
 	 */
 	if (omap_obj->flags & (OMAP_BO_WC|OMAP_BO_UNCACHED)) {
-		addrs = kmalloc(npages * sizeof(addrs), GFP_KERNEL);
+		addrs = kmalloc(npages * sizeof(*addrs), GFP_KERNEL);
 		if (!addrs) {
 			ret = -ENOMEM;
 			goto free_pages;
@@ -257,7 +257,7 @@ static int omap_gem_attach_pages(struct drm_gem_object *obj)
 			0, PAGE_SIZE, DMA_BIDIRECTIONAL);
 		}
 	} else {
-		addrs = kzalloc(npages * sizeof(addrs), GFP_KERNEL);
+		addrs = kzalloc(npages * sizeof(*addrs), GFP_KERNEL);
 		if (!addrs) {
 			ret = -ENOMEM;
 			goto free_pages;

--
1.7.9.5
[PATCH] x86, fpu: avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost. After resuming from S3,
if we try to lazy restore the FPU for a process running on the same CPU,
this will result in a corrupted FPU context.

We can just invalidate the "fpu_owner_task", so nobody will try to lazy
restore a state which no longer exists in the hardware.

Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a few
hundred cycles by a SIGFPE.

The issue seems to exist since 3.4 (after the FPU lazy restore was
actually implemented). To apply the change to 3.4, "this_cpu_write" needs
to be replaced by percpu_write.

Cc: Duncan Laurie
Cc: Olof Johansson
Cc: [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin
---
 arch/x86/kernel/smpboot.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..7610c58 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -1230,6 +1232,9 @@ int native_cpu_disable(void)
 	clear_local_APIC();
 	cpu_disable_common();
+
+	/* the FPU context will be lost, nobody owns it */
+	this_cpu_write(fpu_owner_task, NULL);
 	return 0;
 }

--
1.7.7.3
issue with x86 FPU state after suspend to ram
Hi,

On a 4-core Ivybridge platform, when doing a lot of
suspend-to-ram/resume cycles, we were observing processes randomly killed
by a SIGFPE. When dumping the FPU register state on the SIGFPE (usually a
floating point stack underflow/overflow on a floating point arithmetic
operation), the FPU registers look empty, or at least corrupted, which
was more or less impossible given the disassembled floating point code.

After doing more tracing, in the faulty case the process seems to keep
FPU ownership across a secondary CPU unplug/re-plug triggered by the
suspend. It then does a lazy restore of its FPU context (i.e. it just
uses the current FPU hardware registers, since it is the owner) instead
of writing them back to the hardware from the version previously saved in
the task context, despite the fact that the whole FPU hardware state has
been lost.

Just invalidating the "fpu_owner_task" when disabling a secondary CPU
seems to solve my issue (it's already reset for the primary CPU).

By the way, when the FPU lazy restore patch was discussed back in
February, Ingo commented (in
http://permalink.gmane.org/gmane.linux.kernel/1255423):

" I guess the CPU hotplug case deserves a comment in the code: CPU
hotplug + replug of the same (but meanwhile reset) CPU is safe because
fpu_owner_task[cpu] gets reset to NULL. "

That contradicts my observation, so maybe I have totally overlooked
something in this mechanism. Can you comment?

I'm putting my patch proposal in this thread. The issue seems to exist
since 3.4, after the FPU lazy restore was actually implemented by commit
7e16838d "i387: support lazy restore of FPU state". But it is mainly
visible on 3.4 and 3.6, since on tip of tree it is hidden by the eager
FPU implementation for platforms with xsave support; it still happens
with eagerfpu=off.

To apply this change to 3.4, "this_cpu_write" needs to be replaced by
percpu_write.
--
Vincent
[PATCH v2] x86, fpu: avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost. After resuming from S3,
if we try to lazy restore the FPU for a process running on the same CPU,
this will result in a corrupted FPU context.

Ensure that "fpu_owner_task" is properly invalidated when
(re-)initializing a CPU, so nobody will try to lazy restore a state which
doesn't exist in the hardware.

Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a few
hundred cycles by a SIGFPE.

The issue seems to exist since 3.4 (after the FPU lazy restore was
actually implemented). To apply the change to 3.4, "this_cpu_write" needs
to be replaced by percpu_write.

Cc: Duncan Laurie
Cc: Olof Johansson
Cc: [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin
---
 arch/x86/include/asm/fpu-internal.h |   15 +--
 arch/x86/kernel/smpboot.c           |    5 +
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 831dbb9..41ab26e 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -399,14 +399,17 @@ static inline void drop_init_fpu(struct task_struct *tsk)
 typedef struct { int preload; } fpu_switch_t;

 /*
- * FIXME! We could do a totally lazy restore, but we need to
- * add a per-cpu "this was the task that last touched the FPU
- * on this CPU" variable, and the task needs to have a "I last
- * touched the FPU on this CPU" and check them.
+ * Must be run with preemption disabled: this clears the fpu_owner_task,
+ * on this CPU.
  *
- * We don't do that yet, so "fpu_lazy_restore()" always returns
- * false, but some day..
+ * This will disable any lazy FPU state restore of the current FPU state,
+ * but if the current thread owns the FPU, it will still be saved by.
  */
+static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+{
+	per_cpu(fpu_owner_task, cpu) = NULL;
+}
+
 static inline int fpu_lazy_restore(struct task_struct *new, unsigned int cpu)
 {
 	return new == this_cpu_read_stable(fpu_owner_task) &&
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..f3e2ec8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -818,6 +820,9 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)

 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;

+	/* the FPU context is blank, nobody can own it */
+	__cpu_disable_lazy_restore(cpu);
+
 	err = do_boot_cpu(apicid, cpu, tidle);
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);

--
1.7.7.3
[PATCH v3] x86, fpu: avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost. After resuming from S3,
if we try to lazy restore the FPU for a process running on the same CPU,
this will result in a corrupted FPU context.

Ensure that "fpu_owner_task" is properly invalidated when
(re-)initializing a CPU, so nobody will try to lazy restore a state which
doesn't exist in the hardware.

Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a few
hundred cycles by a SIGFPE.

Cc: Duncan Laurie
Cc: Olof Johansson
Cc: [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin
---

Hi,

The patch has been updated according to the comments from HPA and Linus.
I'm still re-running the testing on v3.

Changes in v3:
- remove the misleading comment about 3.4 from the description.

Changes in v2:
- add a helper function and a comment in fpu-internal.h as described by
  Linus.
- do the cleaning in the native_cpu_up function as suggested by HPA.

Vincent

 arch/x86/include/asm/fpu-internal.h |   15 +--
 arch/x86/kernel/smpboot.c           |    5 +
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 831dbb9..41ab26e 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -399,14 +399,17 @@ static inline void drop_init_fpu(struct task_struct *tsk)
 typedef struct { int preload; } fpu_switch_t;

 /*
- * FIXME! We could do a totally lazy restore, but we need to
- * add a per-cpu "this was the task that last touched the FPU
- * on this CPU" variable, and the task needs to have a "I last
- * touched the FPU on this CPU" and check them.
+ * Must be run with preemption disabled: this clears the fpu_owner_task,
+ * on this CPU.
  *
- * We don't do that yet, so "fpu_lazy_restore()" always returns
- * false, but some day..
+ * This will disable any lazy FPU state restore of the current FPU state,
+ * but if the current thread owns the FPU, it will still be saved by.
  */
+static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+{
+	per_cpu(fpu_owner_task, cpu) = NULL;
+}
+
 static inline int fpu_lazy_restore(struct task_struct *new, unsigned int cpu)
 {
 	return new == this_cpu_read_stable(fpu_owner_task) &&
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..f3e2ec8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -818,6 +820,9 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)

 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;

+	/* the FPU context is blank, nobody can own it */
+	__cpu_disable_lazy_restore(cpu);
+
 	err = do_boot_cpu(apicid, cpu, tidle);
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);

--
1.7.7.3
Re: [PATCH] x86, fpu: avoid FPU lazy restore after suspend
On Fri, Nov 30, 2012 at 11:55 AM, H. Peter Anvin wrote:
>
> On 11/30/2012 11:54 AM, Vincent Palatin wrote:
>>
>> I have done a patch v2 according to your suggestions.
>> I will run the testing on it now.
>> I probably need at least 2 to 3 hours to validate it.
>
> That would be super. Let me know and I'll queue it up and send a pull
> request with this and a few more urgent things to Linus.

I have done 1000+ cycles so far with patch v3 (on a 4-core Ivybridge with
no eagerfpu), and did not hit my issue. I'm letting the testing continue,
but with respect to the issue after suspend, this fixes it with very high
probability (i.e. I had never done that many cycles without hitting the
issue).

--
Vincent
[PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
With the coupled cpuidle driver (but probably also with other drivers), a
CPU loops in a temporary safe state while waiting for the other CPUs of
its cluster to be ready to enter the coupled C-state. If an IRQ or a
softirq occurs, the CPU will stay in this internal loop if there is no
need to resched.

The SCHED softirq clears the NOHZ_IDLE flag and increases nr_busy_cpus.
If there is no need to resched, we will never call set_cpu_sd_state_idle,
because of this internal loop in a cpuidle state. We have to call
set_cpu_sd_state_idle in tick_nohz_irq_exit, which is used to handle such
a situation.

Signed-off-by: Vincent Guittot
---
 kernel/time/tick-sched.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 955d35b..b8d74ea 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -570,6 +570,8 @@ void tick_nohz_irq_exit(void)
 	if (!ts->inidle)
 		return;

+	set_cpu_sd_state_idle();
+
 	/* Cancel the timer because CPU already waken up from the C-states*/
 	menu_hrtimer_cancel();
 	__tick_nohz_idle_enter(ts);

--
1.7.9.5
[PATCH Resend 3/3] sched: fix update NOHZ_IDLE flag
The function nohz_kick_needed modifies the NOHZ_IDLE flag that is used to
update the nr_busy_cpus field of the sched_group. When the sched_domains
are updated (because of the unplug of a CPU, for example), a null_domain
is attached to the CPUs. We have to test likely(!on_null_domain(cpu))
first, in order to detect such an initialization step and not modify the
NOHZ_IDLE flag.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 24a5588..1ef57a8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6311,7 +6311,7 @@ void trigger_load_balance(struct rq *rq, int cpu)
 	    likely(!on_null_domain(cpu)))
 		raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ
-	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
+	if (likely(!on_null_domain(cpu)) && nohz_kick_needed(rq, cpu))
 		nohz_balancer_kick(cpu);
 #endif
 }

--
1.7.9.5
[PATCH Resend 2/3] sched: fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, the
nr_busy_cpus field of the sched_group_power struct is not null when the
platform is fully idle. The root cause seems to be:

During the boot sequence, some CPUs reach the idle loop and set their
NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus
field is initialized later, with the assumption that all CPUs are in the
busy state, whereas some CPUs have already set their NOHZ_IDLE flag.

We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized, in order to
have a coherent configuration.

Signed-off-by: Vincent Guittot
---
 kernel/sched/core.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bae620a..77a01c8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5875,6 +5875,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)

 	update_group_power(sd, cpu);
 	atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
+	clear_bit(NOHZ_IDLE, nohz_flags(cpu));
 }

 int __weak arch_sd_sibling_asym_packing(void)

--
1.7.9.5