[PATCH] gitignore: ignore hz.bc
Signed-off-by: Vincent Stehlé
---
 kernel/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/.gitignore b/kernel/.gitignore
index ab4f109..b3097bd 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -4,3 +4,4 @@
 config_data.h
 config_data.gz
 timeconst.h
+hz.bc
--
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
mount /mnt/cdrom ok!but ls segmentation fault...
Hi all,

Using linux-2.4.0-test11-pre7 right now... here's what I did:

  mount /mnt/cdrom
  cd /mnt/cdrom
  ls
  Segmentation fault

ls is *NOT responding*; I can't kill /sbin/ls and can't umount /mnt/cdrom. ps shows:

  613 ?  D  0:00 /bin/ls --color=auto -F -b -T 0
       ^
I didn't want to reboot... and the CD-ROM door is locked. BTW, what does the D mean in ps?

thanks in advance,
- Regards, Vincent <[EMAIL PROTECTED]>
Re: mount /mnt/cdrom ok!but ls segmentation fault...
"Albert D. Cahalan" wrote:
>
> The 'D' means that the process is running uninterruptible kernel
> code that should never take long to execute. Usually it means
> the process is doing disk IO.
>
> To find where process 613 is stuck, do this:
>
> ps -p 613 -o comm,stat,f,pcpu,nwchan,wchan

  361 pts/1  D  0:00 /bin/ls --color=auto -F -b -T 0

t77@darkstar:~$ ps -p 361 -o comm,stat,f,pcpu,nwchan,wchan
COMMAND  STAT  F    %CPU  NWCHAN  WCHAN
ls       D     000  0.0   107951  down
                                  ^ no idea... :p

Since I am a newbie, is there any way of killing such a process?

root@darkstar:~# umount /mnt/cdrom1
umount: /mnt/cdrom1: device is busy
root@darkstar:~# umount -f /mnt/cdrom1
umount2: Device or resource busy
umount: /mnt/cdrom1: device is busy

After playing around with ls, I found that executing /bin/ls directly is actually OK; it is only with the default alias

  alias ls='/bin/ls $LS_OPTIONS'

that ls crashes... and thus makes the cdrom useless.

When I `ls /mnt/cdrom` from a virtual terminal, there are extended kernel error messages which I don't know how to copy or save into a file, whereas if I `ls /mnt/cdrom` from a gnome-terminal the error message is just "Segmentation fault".

From /var/log/syslog, after "ls /mnt/cdrom":

Nov 19 19:46:47 darkstar kernel: Unable to handle kernel paging request at virtual address dfdfdfc4
Nov 19 19:46:47 darkstar kernel: *pde =
Oops:
CPU: 0
EIP: 0010:[]
EFLAGS: 00010202

The rest went off the screen. I've tried "ls > ~/tmp/err.out", but it didn't work: just a 0-byte file.
hmmm, ok, here it is in dmesg | less:

Unable to handle kernel paging request at virtual address dfdfdfc4
printing eip:
c486d5a7
*pde =
Oops:
CPU:    0
EIP:    0010:[]
EFLAGS: 00010202
eax: dfdfdf00  ebx: c2976960  ecx: c1ddb800  edx: c23f5c00
esi: c1ddb800  edi: c1ddb821  ebp: c233fba0  esp: c15b9eb0
ds: 0018  es: 0018  ss: 0018
Process ls (pid: 229, stackpage=c15b9000)
Stack: c2976960 c486a2bf c1ddb800 c2976960 c27f8000 c10a9df0 c1b3d140 c2976960
       c1b3d140 0001 c01e1818 0022 0022 0b976960 0800 22994000
       c486a3dd c2976960 c1b3d140 c27f8000 c27f8400 fff4 c1b3d140
Call Trace: [] [] [] [] [] [] [] []
Code: 8b 90 c4 00 00 00 80 b8 b4 00 00 00 00 74 1e 68 00 10 00 00
lines 76-116/116 (END)

thank you for the reply,
- Regards, Vincent <[EMAIL PROTECTED]>
PROBLEM: isofs crash on 2.4.0-test11-pre7

[1.] MAINTAINERS: ISO FILESYSTEM
[2.] Full description of the problem/report:

Using gnome-terminal, with the default alias ls='/bin/ls $LS_OPTIONS':

  # mount /mnt/cdrom
  # cd /mnt/cdrom
  # ls
  Segmentation fault
  # ls

  root@darkstar:~# umount /mnt/cdrom
  umount: /mnt/cdrom: device is busy
  root@darkstar:~# umount -f /mnt/cdrom
  umount2: Device or resource busy
  umount: /mnt/cdrom: device is busy

  # ps ax
  ...
  361 ?  D  0:00 /bin/ls --color=auto -F -b -T 0
  ...
  # kill -9 361
  # ps ax
  ...
  361 ?  D  0:00 /bin/ls --color=auto -F -b -T 0
  ...

The CDROM is now unusable...

[3.] Keywords (i.e., modules, networking, kernel):
  Module: isofs
  Networking: ppp dialup
  Kernel: 2.4.0-test11-pre7

[4.] Kernel version (from /proc/version):
  t77@darkstar:~$ cat /proc/version
  Linux version 2.4.0-test11 (t77@darkstar) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Sat Nov 18 16:23:40 EST 2000

[5.] Output of Oops message:

ksymoops 2.3.5 on i686 2.4.0-test11. Options used
  -V (default)
  -k /proc/ksyms (default)
  -l /proc/modules (default)
  -o /lib/modules/2.4.0-test11/ (default)
  -m /boot/System.map (specified)

Unable to handle kernel paging request at virtual address dfdfdfc4
c486d5a7
*pde =
Oops:
CPU:    0
EIP:    0010:[]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: dfdfdf00  ebx: c2976960  ecx: c1ddb800  edx: c23f5c00
esi: c1ddb800  edi: c1ddb821  ebp: c233fba0  esp: c15b9eb0
ds: 0018  es: 0018  ss: 0018
Process ls (pid: 229, stackpage=c15b9000)
Stack: c2976960 c486a2bf c1ddb800 c2976960 c27f8000 c10a9df0 c1b3d140 c2976960
       c1b3d140 0001 c01e1818 0022 0022 0b976960 0800 22994000
       c486a3dd c2976960 c1b3d140 c27f8000 c27f8400 fff4 c1b3d140
Call Trace: [] [] [] [] [] [] [] []
Code: 8b 90 c4 00 00 00 80 b8 b4 00 00 00 00 74 1e 68 00 10 00 00

>>EIP; c486d5a7 <[isofs]get_joliet_filename+13/87> <=
Trace; c486a2bf <[isofs]__module_using_checksums+bd/19e>
Trace; c486a3dd <[isofs]isofs_lookup+3d/88>
Trace; c013502b
Trace; c0135788
Trace; c0134dc7
Trace; c0135d90 <__user_walk+3c/58>
Trace; c0132a26
Trace; c0108daf
Code; c486d5a7
<[isofs]get_joliet_filename+13/87> <_EIP>:
Code; c486d5a7 <[isofs]get_joliet_filename+13/87> <=
   0: 8b 90 c4 00 00 00       movl 0xc4(%eax),%edx <=
Code; c486d5ad <[isofs]get_joliet_filename+19/87>
   6: 80 b8 b4 00 00 00 00    cmpb $0x0,0xb4(%eax)
Code; c486d5b4 <[isofs]get_joliet_filename+20/87>
   d: 74 1e                   je 2d <_EIP+0x2d> c486d5d4 <[isofs]get_joliet_filename+40/87>
Code; c486d5b6 <[isofs]get_joliet_filename+22/87>
   f: 68 00 10 00 00          pushl $0x1000

[6.] A small shell script or example program which triggers the problem (if possible):
none...

[7.] Environment

[7.1.] Software (add the output of the ver_linux script here):
t77@darkstar:~$ ver_linux
-- Versions installed: (if some fields are empty or look unusual then possibly you have very old versions)
Linux darkstar 2.4.0-test11 #1 Sat Nov 18 16:23:40 EST 2000 i686 unknown
Kernel modules found
Gnu C                   egcs-2.91.66
Binutils                2.9.1.0.25
Linux C Library..
Dynamic Linker (ld.so)  1.9.9
ls: /usr/lib/libg++.so: No such file or directory
Procps                  2.0.6
Mount                   2.10l
Net-tools               (2000-05-21)
Kbd                     0.99
Sh-utils                2.0

[7.2.] Processor information (from /proc/cpuinfo):
t77@darkstar:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 3
model name      : Pentium II (Klamath)
stepping        : 4
cpu MHz         : 233.000866
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
features        : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov mmx
bogomips        : 466.94

[7.3.]
Module information (from /proc/modules):
t77@darkstar:~$ cat /proc/modules
nls_cp950      98432  1 (autoclean)
sr_mod         12000  1 (autoclean)
cdrom          27360  0 (autoclean) [sr_mod]
isofs          18384  1 (autoclean)
ppp_deflate    40672  1 (autoclean)
bsd_comp        4160  0 (autoclean)
ipchains       31392  0 (unused)
ide-scsi        7984  1
scsi_mod       56640  2 [sr_mod ide-scsi]
emu10k1        45184  0
soundcore       3888  4 [emu10k1]
ppp_async       6512  1
ppp_generic    13056  2 [ppp_deflate bsd_comp ppp_async]
slhc            4688  1 [ppp_generic]

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem):
t77@darkstar:~$ cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f :
Re: [PATCH] Typo in test11-pre7 isofs/namei.c
Tom Leete wrote:
>
> Hi,
>
> The second and third arguments of get_joliet_filename() are swapped.
>
> Tom
>
> --- linux-2.4.0-test11/fs/isofs/namei.c.orig	Sat Nov 18 01:55:55 2000
> +++ linux-2.4.0-test11/fs/isofs/namei.c	Sat Nov 18 07:08:05 2000
> @@ -127,7 +127,7 @@
>  		dpnt = tmpname;
>  #ifdef CONFIG_JOLIET
>  	} else if (dir->i_sb->u.isofs_sb.s_joliet_level) {
> -		dlen = get_joliet_filename(de, dir, tmpname);
> +		dlen = get_joliet_filename(de, tmpname, dir);
>  		dpnt = tmpname;
>  #endif
>  	} else if (dir->i_sb->u.isofs_sb.s_mapping == 'a') {
Re: [PATCH] topology: Fix compilation warning when not in SMP
On 04/05/2014 01:49 AM, Greg Kroah-Hartman wrote:
> Warnings aren't a stable kernel issue, so why would this be relevant there?

Oh, sorry about that. I'll go re-read the stable kernel rules again. Shall I re-post without the stable Cc:, for only mainline and next?

Best regards,
V.
Re: [PATCH linux-next] staging: r8192ee: Adapt flush function prototype
On 06/20/2014 02:19 AM, Greg Kroah-Hartman wrote:
> (..) This doesn't apply as I think it's already done as part of a merge...

You are right, it seems to be in f9da455b93f6. Thanks for your concern!

Best regards,
V.
Linux next: boot on Wandboard broken by gic related change
Hi,

FYI, I noticed that Linux next would not boot anymore on the Wandboard i.MX6 quad, since next-20141127.

After bisecting for a while, `git bisect run` pointed at this very commit:

  9a1091ef0017c40ab63e7fc0326b2dcfd4dde3a4 irqchip: gic: Support hierarchy irq domain

Indeed, reverting this commit on top of Linux next-20141201 repairs the boot. I am afraid I cannot debug this, but I would gladly help test patches :)

Best regards,
V.
Re: KVM Disk i/o or VM activities causes soft lockup?
On Mon, Nov 26, 2012 at 2:58 AM, Stefan Hajnoczi wrote:
> On Fri, Nov 23, 2012 at 10:34:16AM -0800, Vincent Li wrote:
>> On Thu, Nov 22, 2012 at 11:29 PM, Stefan Hajnoczi wrote:
>> > On Wed, Nov 21, 2012 at 03:36:50PM -0800, Vincent Li wrote:
>> >> We have users running on a redhat based distro (kernel
>> >> 2.6.32-131.21.1.el6.x86_64) with kvm. When a customer made a cron job
>> >> script to copy large files between kvm guests, or some other user space
>> >> program leads to disk I/O or VM activity, users get the following soft
>> >> lockup message on the console:
>> >>
>> >> Nov 17 13:44:46 slot1/luipaard100a err kernel: BUG: soft lockup - CPU#4 stuck for 61s! [qemu-kvm:6795]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: Modules linked in: ebt_vlan nls_utf8 isofs ebtable_filter ebtables 8021q garp bridge stp llc ipt_REJECT iptable_filter xt_NOTRACK nf_conntrack iptable_raw ip_tables loop ext2 binfmt_misc hed womdict(U) vnic(U) parport_pc lp parport predis(U) lasthop(U) ipv6 toggler vhost_net tun kvm_intel kvm jiffies(U) sysstats hrsleep i2c_dev datastor(U) linux_user_bde(P)(U) linux_kernel_bde(P)(U) tg3 libphy serio_raw i2c_i801 i2c_core ehci_hcd raid1 raid0 virtio_pci virtio_blk virtio virtio_ring mvsas libsas scsi_transport_sas mptspi mptscsih mptbase scsi_transport_spi 3w_9xxx sata_svw(U) ahci serverworks sata_sil ata_piix libata sd_mod crc_t10dif amd74xx piix ide_gd_mod ide_core dm_snapshot dm_mirror dm_region_hash dm_log dm_mod ext3 jbd mbcache
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: Pid: 6795, comm: qemu-kvm Tainted: P 2.6.32-131.21.1.el6.f5.x86_64 #1
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: Call Trace:
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? get_timestamp+0x9/0xf
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? watchdog_timer_fn+0x130/0x178
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? __run_hrtimer+0xa3/0xff
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? hrtimer_interrupt+0xe6/0x190
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? hrtimer_interrupt+0xa9/0x190
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? hpet_interrupt_handler+0x26/0x2d
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? hrtimer_peek_ahead_timers+0x9/0xd
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? __do_softirq+0xc5/0x17a
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? call_softirq+0x1c/0x28
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? do_softirq+0x31/0x66
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? call_function_interrupt+0x13/0x20
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? vmx_get_msr+0x0/0x123 [kvm_intel]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? kvm_arch_vcpu_ioctl_run+0x80e/0xaf1 [kvm]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? kvm_arch_vcpu_ioctl_run+0x802/0xaf1 [kvm]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? inode_has_perm+0x65/0x72
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? kvm_vcpu_ioctl+0xf2/0x5ba [kvm]
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? file_has_perm+0x9a/0xac
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? vfs_ioctl+0x21/0x6b
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? do_vfs_ioctl+0x487/0x4da
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? sys_ioctl+0x51/0x70
>> >> Nov 17 13:44:46 slot1/luipaard100a warning kernel: [] ? system_call_fastpath+0x3c/0x41
>> >
>> > This soft lockup is reported on the host?
>> >
>> > Stefan
>>
>> Yes, it is on the host. We just recommend users not to do large file
>> copying; just wondering if there is a potential kernel bug. It seems the
>> soft lockup backtrace is pointing to hrtimer and softirq. My naive
>> understanding is that the watchdog thread is on top of hrtimer, which is
>> on top of softirq.
>
> Since the soft lockup detector is firing on the host, this seems like a
> hardware/driver problem. Have you ever had soft lockups running non-KVM
> workloads on this host?
>
> Stefan

This soft lockup only triggers when running KVM. Also, users used another script in a cron job to restart 4 kvm instances every 5 minutes (insane to me), which also caused tons of soft lockup messages during kvm instance startup. We have already told the customer to stop doing that, and the soft lockup messages disappeared.

Vincent
Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
On 27 November 2012 06:19, Viresh Kumar wrote:
> Hi Tejun,
>
> On 26 November 2012 22:45, Tejun Heo wrote:
>> On Tue, Nov 06, 2012 at 04:08:45PM +0530, Viresh Kumar wrote:
>>
>> I'm pretty skeptical about this. queue_work() w/o explicit CPU
>> assignment has always guaranteed that the work item will be executed
>> on the same CPU. I don't think there are too many users depending on
>> that but am not sure at all that there are none. I asked you last
>> time that you would at the very least need to audit most users but it
>> doesn't seem like there has been any effort there.
>
> My bad. I completely missed/forgot that comment from your earlier mails.
> Will do it.
>
>> That said, if the obtained benefit is big enough, sure, we can
>> definitely change the behavior, which isn't all that great to begin
>> with, and try to shake out the bugs quickly by e.g. forcing all work
>> items to execute on different CPUs, but you're presenting a few
>> percent of work items being migrated to a different CPU from an
>> already active CPU, which doesn't seem like such a big benefit to me
>> even if the migration target CPU is somewhat more efficient. How much
>> powersaving are we talking about?
>
> Hmm.. I actually implemented the problem discussed here
> (I know you have seen this earlier :) ):
>
> http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/08/lpc2012-sched-timer-workqueue.pdf
>
> Specifically slides 12 & 19.
>
> I haven't done many power calculations with it and have tested it more
> from a functionality point of view.
>
> @Vincent: Can you add some comments here?

Sorry for this late reply.

We have faced some situations on TC2 (as an example) where the tasks are running in the LITTLE cluster whereas some periodic works stay on the big cluster, so we can have one cluster that wakes up for tasks and another one that wakes up for works. We would like to consolidate the behaviour of works with the behaviour of tasks.

Sorry, I don't have relevant figures, as the patches are used with other ones which also impact the power consumption.

This series introduces the possibility to run a work on another CPU, which is necessary if we want a better correlation of task and work scheduling on the system. Most of the time, queue_work() is used when a driver doesn't mind which CPU the work will run on, whereas it looks like it should be used only if you want to run locally. We would like to solve this point with the new interface that is proposed by Viresh.

Vincent

>
> --
> viresh
Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
On 27 November 2012 14:59, Steven Rostedt wrote:
> On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
>> On 27 November 2012 18:56, Steven Rostedt wrote:
>> > A couple of things. The sched_select_cpu() is not cheap. It has a double
>> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
>> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
>> > searching 1023 CPUs before we come up with our own.
>>
>> Not sure if you missed the first check in sched_select_cpu():
>>
>> +int sched_select_cpu(unsigned int sd_flags)
>> +{
>> +	/* If Current cpu isn't idle, don't migrate anything */
>> +	if (!idle_cpu(cpu))
>> +		return cpu;
>>
>> We aren't going to search if we aren't idle.
>
> OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
> heh, we are idle, we can spin. But then why go through this in the first
> place ;-)

By migrating it now, it will create its activity and wake up on the right CPU next time.

If migrating on any CPU seems a bit risky, we could restrict the migration to a CPU on the same node. We can pass such constraints to sched_select_cpu().

>
>> > Also, I really don't like this as a default behavior. It seems that this
>> > solution is for a very special case, and this can become very intrusive
>> > for the normal case.
>>
>> We tried with a KCONFIG option for it, which Tejun rejected.
>
> Yeah, I saw that. I don't like adding KCONFIG options either. Best is to
> get something working that doesn't add any regressions. If you can get
> this to work without making *any* regressions in the normal case then
> I'm totally fine with that. But if this adds any issues with the normal
> case, then it's a show stopper.
>
>> > To be honest, I'm uncomfortable with this approach. It seems to be
>> > fighting a symptom and not the disease. I'd rather find a way to keep
>> > work from being queued on the wrong CPU. If it is a timer, find a way to
>> > move the timer. If it is something else, let's work to fix that. Doing
>> > searches of possibly all CPUs (unlikely, but it is there), just seems
>> > wrong to me.
>>
>> As Vincent pointed out, on big LITTLE systems we just don't want to
>> serve works on big cores. That would be wasting too much power,
>> especially if we are going to wake up big cores.
>>
>> It would be difficult to control the source driver (which queues work) to
>> little cores. We thought, if somebody wanted to queue work on the current
>> cpu then they must use queue_work_on().
>
> As Tejun has mentioned earlier, is there any assumption anywhere that
> expects an unbounded work queue to not migrate? Where per cpu variables
> might be used. Tejun had a good idea of forcing this to migrate the work
> *every* time. To not let a work queue run on the same CPU that it was
> queued on. If it can survive that, then it is probably OK. Maybe add a
> config option that forces this? That way, anyone can test that this
> isn't an issue.
>
> -- Steve
Re: [PATCH V2 Resend 3/4] workqueue: Schedule work on non-idle cpu instead of current one
On 27 November 2012 16:04, Steven Rostedt wrote:
> On Tue, 2012-11-27 at 15:55 +0100, Vincent Guittot wrote:
>> On 27 November 2012 14:59, Steven Rostedt wrote:
>> > On Tue, 2012-11-27 at 19:18 +0530, Viresh Kumar wrote:
>> >> On 27 November 2012 18:56, Steven Rostedt wrote:
>> >> > A couple of things. The sched_select_cpu() is not cheap. It has a double
>> >> > loop of domains/cpus looking for a non idle cpu. If we have 1024 CPUs,
>> >> > and we are CPU 1023 and all other CPUs happen to be idle, we could be
>> >> > searching 1023 CPUs before we come up with our own.
>> >>
>> >> Not sure if you missed the first check in sched_select_cpu():
>> >>
>> >> +int sched_select_cpu(unsigned int sd_flags)
>> >> +{
>> >> +	/* If Current cpu isn't idle, don't migrate anything */
>> >> +	if (!idle_cpu(cpu))
>> >> +		return cpu;
>> >>
>> >> We aren't going to search if we aren't idle.
>> >
>> > OK, we are idle, but CPU 1022 isn't. We still need a large search. But,
>> > heh, we are idle, we can spin. But then why go through this in the first
>> > place ;-)
>>
>> By migrating it now, it will create its activity and wake up on the
>> right CPU next time.
>>
>> If migrating on any CPU seems a bit risky, we could restrict the
>> migration to a CPU on the same node. We can pass such constraints to
>> sched_select_cpu().
>
> That's assuming that the CPUs stay idle. Now if we move the work to
> another CPU and it goes idle, then it may move that again. It could end
> up being a ping pong approach.
>
> I don't think idle is a strong enough heuristic for the general case. If
> interrupts are constantly going off on a CPU that happens to be idle
> most of the time, it will constantly be moving work onto CPUs that are
> currently doing real work, and by doing so, it will be slowing those
> CPUs down.

I agree that idle is probably not enough, but it's the heuristic that is currently used for selecting a CPU for a timer, and the timer also uses sched_select_cpu() in this series. So in order to go step by step, a common interface has been introduced for selecting a CPU, and this function uses the same algorithm as the timer already does. Once we agree on an interface, the heuristic can be updated.

> -- Steve
Re: [PATCH V3 3/3] mfd: stmpe: Update DT support in stmpe driver
2012/11/27 Viresh Kumar:
> On 27 November 2012 14:10, Lee Jones wrote:
>
> I haven't seen this in any of the SPEAr boards I have worked on. Maybe Rabin
> would have; that's why he added that part of the code :)
>
> @Rabin/Linus: Do you remember why you added this in the stmpe driver:
>
> +	if (stmpe->pdata->irq_invert_polarity)
> +		icr ^= STMPE_ICR_LSB_HIGH;
> +
>
> Does somebody actually need it?

It was (as irq_rev_pol) part of Luotao Fu's proposed STMPE811 patchset (https://patchwork.kernel.org/patch/106173/) which I integrated into my version of the STMPE driver, which didn't have it in its initial version (https://patchwork.kernel.org/patch/103273/). It's not something _I_ ever used.
[RFC 2/2] clk: per-user clock accounting for debug
When a clock has multiple users, the WARNING on an imbalance of enable/disable may not show the guilty party: although they may have committed the error earlier, the warning is emitted later, when some other user, presumably innocent, disables the clock. Provide per-user clock enable/disable accounting and disabler tracking in order to help debug these problems.

NOTE: with this patch, clk_get_parent() behaves like clk_get(), i.e. it needs to be matched with a clk_put(). Otherwise, memory will leak.

Signed-off-by: Rabin Vincent
---
 drivers/clk/clk-core.h      | 18 ++
 drivers/clk/clk.c           | 35 +--
 drivers/clk/clkdev.c        |  9 ++---
 include/linux/clk-private.h |  6 +-
 4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/drivers/clk/clk-core.h b/drivers/clk/clk-core.h
index 341ae45..c8259c2 100644
--- a/drivers/clk/clk-core.h
+++ b/drivers/clk/clk-core.h
@@ -4,11 +4,21 @@
 struct clk_core;
 
 #ifdef CONFIG_COMMON_CLK
-#define clk_to_clk_core(clk)	((struct clk_core *)(clk))
-#define clk_core_to_clk(core)	((struct clk *)(core))
+struct clk_core *clk_to_clk_core(struct clk *clk);
+struct clk *clk_core_to_clk(struct clk_core *clk_core, const char *dev,
+			    const char *con);
+
+static inline void clk_free_clk(struct clk *clk)
+{
+	kfree(clk);
+}
 #else
-#define clk_to_clk_core(clk)	((clk))
-#define clk_core_to_clk(core)	((core))
+#define clk_to_clk_core(clk)		((clk))
+#define clk_core_to_clk(core, dev, con)	((core))
+
+static inline void clk_free_clk(struct clk *clk)
+{
+}
 #endif
 
 #endif

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 1fb7043..57ba594 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -250,6 +250,27 @@ static int clk_disable_unused(void)
 }
 late_initcall(clk_disable_unused);
 
+struct clk *clk_core_to_clk(struct clk_core *clk_core, const char *dev,
+			    const char *con)
+{
+	struct clk *clk;
+
+	clk = kzalloc(sizeof(*clk), GFP_KERNEL);
+	if (!clk)
+		return ERR_PTR(-ENOMEM);
+
+	clk->core = clk_core;
+	clk->dev_id = dev;
+	clk->con_id = con;
+
+	return clk;
+}
+
+struct clk_core *clk_to_clk_core(struct clk *clk)
+{
+	return clk->core;
+}
+
 /*** helper functions ***/
 
 inline const char *__clk_get_name(struct clk_core *clk)
@@ -504,7 +525,15 @@ void clk_disable(struct clk *clk_user)
 	unsigned long flags;
 
 	spin_lock_irqsave(&enable_lock, flags);
-	__clk_disable(clk);
+	if (!WARN(clk_user->enable_count == 0,
+		  "incorrect disable clk dev %s con %s last disabler %pF\n",
+		  clk_user->dev_id, clk_user->con_id, clk_user->last_disable)) {
+
+		clk_user->last_disable = __builtin_return_address(0);
+		clk_user->enable_count--;
+
+		__clk_disable(clk);
+	}
 	spin_unlock_irqrestore(&enable_lock, flags);
 }
 EXPORT_SYMBOL_GPL(clk_disable);
@@ -559,6 +588,8 @@ int clk_enable(struct clk *clk_user)
 
 	spin_lock_irqsave(&enable_lock, flags);
 	ret = __clk_enable(clk);
+	if (!ret)
+		clk_user->enable_count++;
 	spin_unlock_irqrestore(&enable_lock, flags);
 
 	return ret;
@@ -976,7 +1007,7 @@ struct clk *clk_get_parent(struct clk *clk_user)
 	parent = __clk_get_parent(clk);
 	mutex_unlock(&prepare_lock);
 
-	return clk_core_to_clk(parent);
+	return clk_core_to_clk(parent, clk_user->dev_id, clk_user->con_id);
 }
 EXPORT_SYMBOL_GPL(clk_get_parent);

diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
index 5ddcaf1..1321b7c 100644
--- a/drivers/clk/clkdev.c
+++ b/drivers/clk/clkdev.c
@@ -43,7 +43,7 @@ struct clk *of_clk_get(struct device_node *np, int index)
 	clk = of_clk_get_from_provider(&clkspec);
 	of_node_put(clkspec.np);
 
-	return clk_core_to_clk(clk);
+	return clk_core_to_clk(clk, np->full_name, NULL);
 }
 EXPORT_SYMBOL(of_clk_get);
@@ -151,7 +151,7 @@ struct clk *clk_get_sys(const char *dev_id, const char *con_id)
 	if (!cl)
 		return ERR_PTR(-ENOENT);
 
-	return clk_core_to_clk(cl->clk);
+	return clk_core_to_clk(cl->clk, dev_id, con_id);
 }
 EXPORT_SYMBOL(clk_get_sys);
@@ -172,7 +172,10 @@ EXPORT_SYMBOL(clk_get);
 void clk_put(struct clk *clk)
 {
-	__clk_put(clk_to_clk_core(clk));
+	clk_core_t *core = clk_to_clk_core(clk);
+
+	clk_free_clk(clk);
+	__clk_put(core);
 }
 EXPORT_SYMBOL(clk_put);

diff --git a/include/linux/clk-private.h b/include/linux/clk-private.h
index e5b766e..406c951 100644
--- a/include/linux/clk-private.h
+++ b/include/linux/clk-private.h
@@ -47,7 +47,11 @@ struct clk_core {
 };
 
 struct clk {
-	struct clk_core clk;
+	struct clk_core *core;
+	unsigned int enable_cou
Re: [RFC 1/2] clk: use struct clk only for external API
2012/11/28 viresh kumar:
> On Wed, Nov 28, 2012 at 9:31 PM, viresh kumar wrote:
>> On Wed, Nov 28, 2012 at 5:22 PM, Rabin Vincent wrote:
>>
>> Isn't something wrong here? For the common clk case shouldn't
>> this be:
>>
>>> +#define clk_to_clk_core(clk)	(&clk->clk)
>>> +#define clk_core_to_clk(core)	(container_of(clk, ...)) // not getting into the exact format here
>>
>> Sorry if I am missing basics.
>
> Ok. I saw these getting updated in 2/2. But it means this individual patch
> is broken, and that is not allowed, I believe.

It would be better to use container_of() / &clk->clk, yes. I wouldn't really describe it as "broken", though, since it works fine as it is: the clk_core is the first and only element. I will change it anyway.
[PATCH v5] sched: fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, the nr_busy_cpus field of the sched_group_power struct is not null when the platform is fully idle. The root cause: during the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus field is initialized later, with the assumption that all CPUs are in the busy state, whereas some CPUs have already set their NOHZ_IDLE flag.

More generally, the NOHZ_IDLE flag must be initialized when new sched_domains are created, in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned.

This condition can be ensured by adding a synchronize_rcu() between the destruction of old sched_domains and the creation of new ones, so the NOHZ_IDLE flag will not be updated with an old sched_domain once it has been initialized. But this solution introduces an additional latency in the rebuild sequence that is called during cpu hotplug.

As suggested by Frederic Weisbecker, another solution is to have the same RCU lifecycle for both the NOHZ_IDLE flag and the sched_domain struct. I have introduced a new sched_domain_rq struct that is the entry point for both sched_domains and objects that must follow the same lifecycle, like the NOHZ_IDLE flag. They will share the same RCU lifecycle and will always be synchronized.

The synchronization is done at the cost of:
 - an additional indirection for accessing the first sched_domain level
 - an additional indirection and a rcu_dereference before accessing the NOHZ_IDLE flag

Changes since v4:
 - link both sched_domain and NOHZ_IDLE flag in one RCU object so their
   states are always synchronized.
Change since V3; - NOHZ flag is not cleared if a NULL domain is attached to the CPU - Remove patch 2/2 which becomes useless with latest modifications Change since V2: - change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state Change since V1: - remove the patch for SCHED softirq on an idle core use case as it was a side effect of the other use cases. Signed-off-by: Vincent Guittot --- include/linux/sched.h |6 +++ kernel/sched/core.c | 105 - kernel/sched/fair.c | 35 +++-- kernel/sched/sched.h | 24 +-- 4 files changed, 145 insertions(+), 25 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index d35d2b6..2a52188 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -959,6 +959,12 @@ struct sched_domain { unsigned long span[0]; }; +struct sched_domain_rq { + struct sched_domain *sd; + unsigned long flags; + struct rcu_head rcu;/* used during destruction */ +}; + static inline struct cpumask *sched_domain_span(struct sched_domain *sd) { return to_cpumask(sd->span); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7f12624..69e2313 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5602,6 +5602,15 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu) destroy_sched_domain(sd, cpu); } +static void destroy_sched_domain_rq(struct sched_domain_rq *sd_rq, int cpu) +{ + if (!sd_rq) + return; + + destroy_sched_domains(sd_rq->sd, cpu); + kfree_rcu(sd_rq, rcu); +} + /* * Keep a special pointer to the highest sched_domain that has * SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this @@ -5632,10 +5641,23 @@ static void update_top_cache_domain(int cpu) * hold the hotplug lock. 
*/ static void -cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) +cpu_attach_domain(struct sched_domain_rq *sd_rq, struct root_domain *rd, + int cpu) { struct rq *rq = cpu_rq(cpu); - struct sched_domain *tmp; + struct sched_domain_rq *tmp_rq; + struct sched_domain *tmp, *sd = NULL; + + /* +* If we don't have any sched_domain and associated object, we can +* directly jump to the attach sequence otherwise we try to degenerate +* the sched_domain +*/ + if (!sd_rq) + goto attach; + + /* Get a pointer to the 1st sched_domain */ + sd = sd_rq->sd; /* Remove the sched domains which do not contribute to scheduling. */ for (tmp = sd; tmp; ) { @@ -5658,14 +5680,17 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) destroy_sched_domain(tmp, cpu); if (sd) sd->child = NULL; + /* update sched_domain_rq */ + sd_rq->sd = sd; } +attach: sched_domain_debug(sd, cpu); rq_attach_root(rq, rd); - tmp = rq->sd; - rcu_assign_pointer(rq->sd, sd); - destroy_sched_domains(tmp, cpu); + tmp_rq = rq->sd_r
Re: [PATCH] usb: Make USB persist default configurable
On Tue, Mar 19, 2013 at 7:56 AM, Alan Stern wrote: > > On Mon, 18 Mar 2013, Greg Kroah-Hartman wrote: > > > On Mon, Mar 18, 2013 at 05:02:19PM -0700, Julius Werner wrote: > > > > Why can't you just revert this in userspace? Isn't that easier than > > > > doing a kernel patch and providing an option that we need to now > > > > maintain for pretty much forever? > > > > > > I could solve it in userspace, but that really feels like a hacky > > > workaround and not a long term solution. It would mean that every new > > > device starts with persist enabled and stays that way for a few > > > milliseconds (maybe up to seconds if it's connected on boot), until > > > userspace gets around to disable it again... opening the possibility > > > for very weird race conditions and bugs with drivers/devices that > > > don't work with persist. > > > > What drivers/devices don't work with persist? We need to know that now, > > otherwise all other distros and users have problems, right? > > > > > This default is a policy that really resides in the kernel, it has > > > changed in the past, and since there is no definitive better choice > > > for all cases I thought making it configurable is the right thing to > > > do. > > > > Too many options can be a bad thing. > > > > I think Alan made this a "always on" option, so I'd like to get his > > opinion on it. Alan? > > Originally the "persist" attribute defaulted to "off". Linus disliked > this (at least, he disliked it for mass-storage devices) and so at his > request the default was changed to "on". There didn't seem to be any > reason to treat other devices differently from mass-storage devices; > consequently the default is now "on" for everything. > > Julius's commit message mentions the disadvantage of "persist": Resume > times can be increased. But it doesn't mention the chief advantage: > Filesystems stored on USB devices are much less likely to be lost > across suspends. 
> The races mentioned above don't seem to be very dangerous. How likely
> is it that the system will be suspended within a few milliseconds of
> probing a new USB device?

For laptops, if the suspend/resume is triggered by the lid open/close detection, this is somewhat likely and has bitten us in the past: the classical use case I have encountered is a back-to-back suspend triggered by the user opening the lid then closing it again in the next 2 or 3 seconds because he has changed his mind (damn user...). It might also be triggered by a lid hall sensor missing proper debouncing (but in that case, the mechanical time constant is often shorter than the latency of resuming USB devices).

> As for buggy devices and drivers that can't handle persist, we have
> better ways of dealing with them. Buggy devices can get a quirk flag
> (USB_QUIRK_RESET). Buggy drivers should be fixed.
>
> Alan Stern

--
Vincent
Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On 1 February 2013 19:03, Frederic Weisbecker wrote: > 2013/1/29 Vincent Guittot : >> On my smp platform which is made of 5 cores in 2 clusters,I have the >> nr_busy_cpu field of sched_group_power struct that is not null when the >> platform is fully idle. The root cause seems to be: >> During the boot sequence, some CPUs reach the idle loop and set their >> NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus >> field is initialized later with the assumption that all CPUs are in the busy >> state whereas some CPUs have already set their NOHZ_IDLE flag. >> We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized in order to >> have a coherent configuration. >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/core.c |1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 257002c..fd41924 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct >> sched_domain *sd) >> >> update_group_power(sd, cpu); >> atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); >> + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > > So that's a real issue indeed. nr_busy_cpus was never correct. > > Now I'm still a bit worried with this solution. What if an idle task > started in smp_init() has not yet stopped its tick, but is about to do > so? The domains are not yet available to the task but the nohz flags > are. When it later restarts the tick, it's going to erroneously > increase nr_busy_cpus. My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in init_sched_groups_power instead of setting them as it is done now. If a CPU enters idle during the init sequence, the flag is already cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared while a NULL sched_domain is attached to the CPU thanks to patch 2. This should solve all use cases ? > > It probably won't happen in practice. 
But then there is more: sched > domains can be concurrently rebuild anytime, right? So what if we > call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the > domain is switched concurrently. Are we having a new sched group along > the way? If so we have a bug here as well because we can have > NOHZ_IDLE set but nr_busy_cpus accounting the CPU. When the sched_domain are rebuilt, we set a null sched_domain during the rebuild sequence and a new sched_group_power is created as well > > May be we need to set the per cpu nohz flags on the child leaf sched > domain? This way it's initialized and stored on the same RCU pointer > and we nohz_flags and nr_busy_cpus become sync. > > Also we probably still need the first patch of your previous round. > Because the current patch may introduce situations where we have idle > CPUs with NOHZ_IDLE flags cleared. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched: fix wrong rq's runnable_avg update with rt task
When a RT task is scheduled on an idle CPU, the update of the rq's load is not done because CFS's functions are not called. Then, the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update, was only running time. The rq's load of a CPU that only runs a periodic RT task, is close to LOAD_AVG_MAX whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function so the elapsed time will be accounted as idle time in the rq's load. Signed-off-by: Vincent Guittot --- kernel/sched/core.c |3 +++ kernel/sched/fair.c | 10 ++ kernel/sched/sched.h |5 + 3 files changed, 18 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 26058d0..592e06c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2927,6 +2927,9 @@ need_resched: pre_schedule(rq, prev); + if (unlikely(prev == rq->idle)) + idle_exit(cpu, rq); + if (unlikely(!rq->nr_running)) idle_balance(cpu, rq); diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5eea870..520fe55 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1562,6 +1562,16 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); } /* migrations, e.g. sleep=0 leave decay_count == 0 */ } + +/* + * Update the rq's load with the elapsed idle time before a task is + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will + * be the only way to update the runnable statistic. 
+ */ +void idle_exit(int this_cpu, struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 0); +} #else static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) {} diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index fc88644..9707092 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -877,6 +877,7 @@ extern const struct sched_class idle_sched_class; extern void trigger_load_balance(struct rq *rq, int cpu); extern void idle_balance(int this_cpu, struct rq *this_rq); +extern void idle_exit(int this_cpu, struct rq *this_rq); #else /* CONFIG_SMP */ @@ -884,6 +885,10 @@ static inline void idle_balance(int cpu, struct rq *rq) { } +static inline void idle_exit(int this_cpu, struct rq *this_rq) +{ +} + #endif extern void sysrq_sched_debug_show(void); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: fix wrong rq's runnable_avg update with rt task
On 8 February 2013 15:46, Steven Rostedt wrote: > On Fri, 2013-02-08 at 12:11 +0100, Vincent Guittot wrote: >> When a RT task is scheduled on an idle CPU, the update of the rq's load is >> not done because CFS's functions are not called. Then, the idle_balance, >> which is called just before entering the idle function, updates the >> rq's load and makes the assumption that the elapsed time since the last >> update, was only running time. >> >> The rq's load of a CPU that only runs a periodic RT task, is close to >> LOAD_AVG_MAX whatever the running duration of the RT task is. >> >> A new idle_exit function is called when the prev task is the idle function >> so the elapsed time will be accounted as idle time in the rq's load. >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/core.c |3 +++ >> kernel/sched/fair.c | 10 ++ >> kernel/sched/sched.h |5 + >> 3 files changed, 18 insertions(+) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 26058d0..592e06c 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -2927,6 +2927,9 @@ need_resched: >> >> pre_schedule(rq, prev); >> >> + if (unlikely(prev == rq->idle)) >> + idle_exit(cpu, rq); >> + > > Let's get rid of the added junk in the core code that should be isolated > in the idle code. > i agree > I posted these patches before, and I'm about to post again: > > https://lkml.org/lkml/2012/12/21/378 > https://lkml.org/lkml/2012/12/21/377 > > I'm working to clean these patches up today and post them again. Would > working on top of these work for you? yes for sure. I will move that code in pre_schedule Vincent > > -- Steve > > >> if (unlikely(!rq->nr_running)) >> idle_balance(cpu, rq); >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On 8 February 2013 16:35, Frederic Weisbecker wrote: > 2013/2/4 Vincent Guittot : >> On 1 February 2013 19:03, Frederic Weisbecker wrote: >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >>>> index 257002c..fd41924 100644 >>>> --- a/kernel/sched/core.c >>>> +++ b/kernel/sched/core.c >>>> @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct >>>> sched_domain *sd) >>>> >>>> update_group_power(sd, cpu); >>>> atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); >>>> + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); >>> >>> So that's a real issue indeed. nr_busy_cpus was never correct. >>> >>> Now I'm still a bit worried with this solution. What if an idle task >>> started in smp_init() has not yet stopped its tick, but is about to do >>> so? The domains are not yet available to the task but the nohz flags >>> are. When it later restarts the tick, it's going to erroneously >>> increase nr_busy_cpus. >> >> My 1st idea was to clear NOHZ_IDLE flag and nr_busy_cpus in >> init_sched_groups_power instead of setting them as it is done now. If >> a CPU enters idle during the init sequence, the flag is already >> cleared, and nohz_flags and nr_busy_cpus will stay synced and cleared >> while a NULL sched_domain is attached to the CPU thanks to patch 2. >> This should solve all use cases ? > > This may work on smp_init(). But the per cpu domain can be changed > concurrently > anytime on cpu hotplug, with a new sched group power struct, right? During a cpu hotplug, a null domain is attached to each CPU of the partition because we have to build new sched_domains so we have a similar behavior than smp_init. So if we clear NOHZ_IDLE flag and nr_busy_cpus in init_sched_groups_power, we should be safe for init and hotplug. 
More generally speaking, if the sched_domains of a group of CPUs must be rebuilt, a NULL sched_domain is attached to these CPUs during the build.

> What if the following happens (inventing function names but you get the idea):
>
>     CPU 0                                CPU 1
>
>     dom = new_domain(...) {
>         nr_cpus_busy = 0;
>         set_idle(CPU 1);
>                                          old_dom = get_dom()
>                                          clear_idle(CPU 1)
>     }
>     rcu_assign_pointer(cpu1_dom, dom);
>
> Can this scenario happen?

This scenario will be:

    CPU 0                                CPU 1

    detach_and_destroy_domain {
        rcu_assign_pointer(cpu1_dom, NULL);
    }
    dom = new_domain(...) {
        nr_cpus_busy = 0;
        set_idle(CPU 1);
                                         old_dom = get_dom()
                                         old_dom is null
                                         /* clear_idle(CPU 1) can't happen
                                          * because a null domain is attached,
                                          * so we will never call
                                          * nohz_kick_needed, which is the
                                          * only place where we can
                                          * clear_idle */
    }
    rcu_assign_pointer(cpu1_dom, dom);

>>> It probably won't happen in practice. But then there is more: sched
>>> domains can be concurrently rebuilt anytime, right? So what if we
>>> call set_cpu_sd_state_idle() and decrease nr_busy_cpus while the
>>> domain is switched concurrently. Are we having a new sched group along
>>> the way? If so we have a bug here as well because we can have
>>> NOHZ_IDLE set but nr_busy_cpus accounting the CPU.
>>
>> When the sched_domains are rebuilt, we set a null sched_domain during
>> the rebuild sequence and a new sched_group_power is created as well
>
> So at that time we may race with a CPU setting/clearing its NOHZ_IDLE flag
> as in my above scenario?

Unless I have missed a use case, we always have a null domain attached to a CPU while we build the new one. So patch 2/2 should protect us against clearing NOHZ_IDLE while the new nr_busy_cpus is not yet attached.
I'm going to send a new version which sets the NOHZ_IDLE bit and clears nr_busy_cpus during the build of a sched_domain.

Vincent
[PATCH v3 2/2] sched: fix update NOHZ_IDLE flag
The function nohz_kick_needed modifies the NOHZ_IDLE flag that is used to update the nr_busy_cpus field of the sched_group. When the sched_domains are updated (during boot or because of the unplug of a CPU, as an example), a null_domain is attached to the CPUs. We have to test likely(!on_null_domain(cpu)) first in order to detect such an initialization step and not modify the NOHZ_IDLE flag.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5eea870..dac2edf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5695,7 +5695,7 @@ void trigger_load_balance(struct rq *rq, int cpu)
 	    likely(!on_null_domain(cpu)))
 		raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ
-	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
+	if (likely(!on_null_domain(cpu)) && nohz_kick_needed(rq, cpu))
 		nohz_balancer_kick(cpu);
 #endif
 }
-- 
1.7.9.5
[PATCH v3 0/2] sched: fix nr_busy_cpus
The nr_busy_cpus field of the sched_group_power is sometimes different from 0 whereas the platform is fully idle. This series fixes 3 use cases:
- when some CPUs enter idle state while booting all CPUs
- when a CPU is unplugged and/or replugged

Change since V2:
- change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state

Change since V1:
- remove the patch for the SCHED softirq on an idle core use case as it was a side effect of the other use cases.

Vincent Guittot (2):
  sched: fix init NOHZ_IDLE flag
  sched: fix update NOHZ_IDLE flag

 kernel/sched/core.c |4 +++-
 kernel/sched/fair.c |2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

-- 
1.7.9.5
[PATCH v3 1/2] sched: fix init NOHZ_IDLE flag
On my smp platform which is made of 5 cores in 2 clusters, I have the nr_busy_cpu field of sched_group_power struct that is not null when the platform is fully idle. The root cause seems to be: During the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state whereas some CPUs have already set their NOHZ_IDLE flag. We set the NOHZ_IDLE flag when nr_busy_cpus is initialized to 0 in order to have a coherent configuration. The patch 2/2 protects this init against an update of NOHZ_IDLE flag because a NULL sched_domain is attached to the CPU during the build of the new sched_domain so nohz_kick_needed and set_cpu_sd_state_busy are not called and can't clear the NOHZ_IDLE flag Signed-off-by: Vincent Guittot --- kernel/sched/core.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 26058d0..c730a4e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5884,7 +5884,9 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd) return; update_group_power(sd, cpu); - atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); + atomic_set(&sg->sgp->nr_busy_cpus, 0); + set_bit(NOHZ_IDLE, nohz_flags(cpu)); + } int __weak arch_sd_sibling_asym_packing(void) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
Hi Srivatsa,

I can try to run some of our stress tests on your patches. Have you got a git tree that I can pull?

Regards,
Vincent

On 8 February 2013 19:09, Srivatsa S. Bhat wrote:
> On 02/08/2013 10:14 PM, Srivatsa S. Bhat wrote:
>> On 02/08/2013 09:11 PM, Russell King - ARM Linux wrote:
>>> On Thu, Feb 07, 2013 at 11:41:34AM +0530, Srivatsa S. Bhat wrote:
>>>> On 02/07/2013 09:44 AM, Rusty Russell wrote:
>>>>> "Srivatsa S. Bhat" writes:
>>>>>> On 01/22/2013 01:03 PM, Srivatsa S. Bhat wrote:
>>>>>> Avg. latency of 1 CPU offline (ms) [stop-cpu/stop-m/c latency]
>>>>>>
>>>>>> # online CPUs   Mainline (with stop-m/c)   This patchset (no stop-m/c)
>>>>>>        8                17.04                        7.73
>>>>>>       16                18.05                        6.44
>>>>>>       32                17.31                        7.39
>>>>>>       64                32.40                        9.28
>>>>>>      128                98.23                        7.35
>>>>>
>>>>> Nice!
>>>>
>>>> Thank you :-)
>>>>
>>>>> I wonder how the ARM guys feel with their quad-cpu systems...
>>>>
>>>> That would be definitely interesting to know :-)
>>>
>>> That depends what exactly you'd like tested (and how) and whether you'd
>>> like it to be a test-chip based quad core, or an OMAP dual-core SoC.
>>
>> The effect of stop_machine() doesn't really depend on the CPU architecture
>> used underneath or the platform. It depends only on the _number_ of
>> _logical_ CPUs used.
>>
>> And stop_machine() has 2 noticeable drawbacks:
>> 1. It makes the hotplug operation itself slow
>> 2. and it causes disruptions to the workloads running on the other
>> CPUs by hijacking the entire machine for significant amounts of time.
>>
>> In my experiments (mentioned above), I tried to measure how my patchset
>> improves (reduces) the duration of hotplug (CPU offline) itself, which is
>> also slightly indicative of the impact it has on the rest of the system.
>>
>> But what would be nice to test is a setup where the workloads running on
>> the rest of the system are latency-sensitive, and measure the impact of
>> CPU offline on them, with this patchset applied.
That would tell us how >> far is this useful in making CPU hotplug less disruptive on the system. >> >> Of course, it would be nice to also see whether we observe any reduction >> in hotplug duration itself (point 1 above) on ARM platforms with lot >> of CPUs. [This could potentially speed up suspend/resume, which is used >> rather heavily on ARM platforms]. >> >> The benefits from this patchset over mainline (both in terms of points >> 1 and 2 above) is expected to increase, with increasing number of CPUs in >> the system. >> > > Adding Vincent to CC, who had previously evaluated the performance and > latency implications of CPU hotplug on ARM platforms, IIRC. > > Regards, > Srivatsa S. Bhat > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] sched: fix env->src_cpu for active migration
need_active_balance uses env->src_cpu, which is set only if there is more than 1 task on the run queue. We must set the src_cpu field unconditionally, otherwise the test "env->src_cpu > env->dst_cpu" will always fail if there is only 1 task on the run queue.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..32938ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5044,6 +5044,10 @@ redo:
 
 	ld_moved = 0;
 	lb_iterations = 1;
+
+	env.src_cpu = busiest->cpu;
+	env.src_rq = busiest;
+
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -5052,8 +5056,6 @@ redo:
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu = busiest->cpu;
-		env.src_rq = busiest;
 		env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
 
 		update_h_load(env.src_cpu);
-- 
1.7.9.5
[PATCH v2] sched: fix wrong rq's runnable_avg update with rt task
When a RT task is scheduled on an idle CPU, the update of the rq's load is not done because CFS's functions are not called. Then, the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update, was only running time. The rq's load of a CPU that only runs a periodic RT task, is close to LOAD_AVG_MAX whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function so the elapsed time will be accounted as idle time in the rq's load. Changes since V1: - move code out of schedule function and create a pre_schedule callback for idle class instead. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 10 ++ kernel/sched/idle_task.c |7 +++ kernel/sched/sched.h |5 + 3 files changed, 22 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..60951f1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1562,6 +1562,16 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); } /* migrations, e.g. sleep=0 leave decay_count == 0 */ } + +/* + * Update the rq's load with the elapsed idle time before a task is + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will + * be the only way to update the runnable statistic. 
+ */ +void idle_exit(int this_cpu, struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 0); +} #else static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) {} diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index b6baf37..27cd379 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c @@ -13,6 +13,12 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) { return task_cpu(p); /* IDLE tasks as never migrated */ } + +static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) +{ + /* Update rq's load with elapsed idle time */ + idle_exit(smp_processor_id(), rq); +} #endif /* CONFIG_SMP */ /* * Idle tasks are unconditionally rescheduled: @@ -86,6 +92,7 @@ const struct sched_class idle_sched_class = { #ifdef CONFIG_SMP .select_task_rq = select_task_rq_idle, + .pre_schedule = pre_schedule_idle, #endif .set_curr_task = set_curr_task_idle, diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index fc88644..9707092 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -877,6 +877,7 @@ extern const struct sched_class idle_sched_class; extern void trigger_load_balance(struct rq *rq, int cpu); extern void idle_balance(int this_cpu, struct rq *this_rq); +extern void idle_exit(int this_cpu, struct rq *this_rq); #else /* CONFIG_SMP */ @@ -884,6 +885,10 @@ static inline void idle_balance(int cpu, struct rq *rq) { } +static inline void idle_exit(int this_cpu, struct rq *this_rq) +{ +} + #endif extern void sysrq_sched_debug_show(void); -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] sched: fix wrong rq's runnable_avg update with rt task
On 12 February 2013 14:23, Vincent Guittot wrote: > When a RT task is scheduled on an idle CPU, the update of the rq's load is > not done because CFS's functions are not called. Then, the idle_balance, > which is called just before entering the idle function, updates the > rq's load and makes the assumption that the elapsed time since the last > update, was only running time. > > The rq's load of a CPU that only runs a periodic RT task, is close to > LOAD_AVG_MAX whatever the running duration of the RT task is. > > A new idle_exit function is called when the prev task is the idle function > so the elapsed time will be accounted as idle time in the rq's load. > > Changes since V1: > - move code out of schedule function and create a pre_schedule callback for > idle class instead. Hi Steve, I have pushed a new version of my patch to have comments about the proposed solution but I will rebase it on top of your work when available Vincent > > Signed-off-by: Vincent Guittot > --- > kernel/sched/fair.c | 10 ++ > kernel/sched/idle_task.c |7 +++ > kernel/sched/sched.h |5 + > 3 files changed, 22 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 81fa536..60951f1 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -1562,6 +1562,16 @@ static inline void dequeue_entity_load_avg(struct > cfs_rq *cfs_rq, > se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); > } /* migrations, e.g. sleep=0 leave decay_count == 0 */ > } > + > +/* > + * Update the rq's load with the elapsed idle time before a task is > + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will > + * be the only way to update the runnable statistic. 
> + */ > +void idle_exit(int this_cpu, struct rq *this_rq) > +{ > + update_rq_runnable_avg(this_rq, 0); > +} > #else > static inline void update_entity_load_avg(struct sched_entity *se, > int update_cfs_rq) {} > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c > index b6baf37..27cd379 100644 > --- a/kernel/sched/idle_task.c > +++ b/kernel/sched/idle_task.c > @@ -13,6 +13,12 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, > int flags) > { > return task_cpu(p); /* IDLE tasks as never migrated */ > } > + > +static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) > +{ > + /* Update rq's load with elapsed idle time */ > + idle_exit(smp_processor_id(), rq); > +} > #endif /* CONFIG_SMP */ > /* > * Idle tasks are unconditionally rescheduled: > @@ -86,6 +92,7 @@ const struct sched_class idle_sched_class = { > > #ifdef CONFIG_SMP > .select_task_rq = select_task_rq_idle, > + .pre_schedule = pre_schedule_idle, > #endif > > .set_curr_task = set_curr_task_idle, > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index fc88644..9707092 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -877,6 +877,7 @@ extern const struct sched_class idle_sched_class; > > extern void trigger_load_balance(struct rq *rq, int cpu); > extern void idle_balance(int this_cpu, struct rq *this_rq); > +extern void idle_exit(int this_cpu, struct rq *this_rq); > > #else /* CONFIG_SMP */ > > @@ -884,6 +885,10 @@ static inline void idle_balance(int cpu, struct rq *rq) > { > } > > +static inline void idle_exit(int this_cpu, struct rq *this_rq) > +{ > +} > + > #endif > > extern void sysrq_sched_debug_show(void); > -- > 1.7.9.5 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] sched: fix wrong rq's runnable_avg update with rt task
On 12 February 2013 15:53, Steven Rostedt wrote: > On Tue, 2013-02-12 at 14:23 +0100, Vincent Guittot wrote: >> .set_curr_task = set_curr_task_idle, >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h >> index fc88644..9707092 100644 >> --- a/kernel/sched/sched.h >> +++ b/kernel/sched/sched.h >> @@ -877,6 +877,7 @@ extern const struct sched_class idle_sched_class; >> >> extern void trigger_load_balance(struct rq *rq, int cpu); >> extern void idle_balance(int this_cpu, struct rq *this_rq); >> +extern void idle_exit(int this_cpu, struct rq *this_rq); >> >> #else/* CONFIG_SMP */ >> >> @@ -884,6 +885,10 @@ static inline void idle_balance(int cpu, struct rq *rq) >> { >> } >> >> +static inline void idle_exit(int this_cpu, struct rq *this_rq) >> +{ >> +} >> + > > Is this part needed? I don't see it ever called when !CONFIG_SMP. no I forgot to remove it Vincent > > -- Steve > >> #endif >> >> extern void sysrq_sched_debug_show(void); > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v6 03/21] sched: only count runnable avg on cfs_rq's nr_running
On 30 March 2013 15:34, Alex Shi wrote: > Old function count the runnable avg on rq's nr_running even there is > only rt task in rq. That is incorrect, so correct it to cfs_rq's > nr_running. > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 2881d42..026e959 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2829,7 +2829,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, > int flags) > } > > if (!se) { > - update_rq_runnable_avg(rq, rq->nr_running); > + update_rq_runnable_avg(rq, rq->cfs.nr_running); A RT task that preempts your CFS task will be accounted in the runnable_avg fields. So whatever you do, RT task will impact your runnable_avg statistics. Instead of trying to get only CFS tasks, you should take into account all tasks activity in the rq. Vincent > inc_nr_running(rq); > } > hrtick_update(rq); > -- > 1.7.12 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch v6 10/21] sched: get rq potential maximum utilization
On 30 March 2013 15:34, Alex Shi wrote: > Since the rt task priority is higher than fair tasks, cfs_rq utilization > is just the left of rt utilization. > > When there are some cfs tasks in queue, the potential utilization may > be yielded, so mulitiplying cfs task number to get max potential > utilization of cfs. Then the rq utilization is sum of rt util and cfs > util. > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 16 > 1 file changed, 16 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index ae87dab..0feeaee 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -3350,6 +3350,22 @@ struct sg_lb_stats { > unsigned int group_util;/* sum utilization of group */ > }; > > +static unsigned long scale_rt_util(int cpu); > + > +static unsigned int max_rq_util(int cpu) > +{ > + struct rq *rq = cpu_rq(cpu); > + unsigned int rt_util = scale_rt_util(cpu); > + unsigned int cfs_util; > + unsigned int nr_running; > + > + cfs_util = (FULL_UTIL - rt_util) > rq->util ? rq->util > + : (FULL_UTIL - rt_util); rt_util and rq->util don't use the same computation algorithm so the results are hardly comparable or addable. In addition, some RT tasks can have impacted the rq->util, so they will be accounted in both side. Vincent > + nr_running = rq->nr_running ? rq->nr_running : 1; > + > + return rt_util + cfs_util * nr_running; > +} > + > /* > * sched_balance_self: balance the current task (running on cpu) in domains > * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and > -- > 1.7.12 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH Resend v5] sched: fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, I have the nr_busy_cpus field of the sched_group_power struct that is not null when the platform is fully idle. The root cause is: during the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state, whereas some CPUs have already set their NOHZ_IDLE flag. More generally, the NOHZ_IDLE flag must be initialized when new sched_domains are created in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned. This condition can be ensured by adding a synchronize_rcu between the destruction of old sched_domains and the creation of new ones, so the NOHZ_IDLE flag will not be updated with an old sched_domain once it has been initialized. But this solution introduces an additional latency in the rebuild sequence that is called during cpu hotplug. As suggested by Frederic Weisbecker, another solution is to have the same RCU lifecycle for both the NOHZ_IDLE flag and the sched_domain struct. I have introduced a new sched_domain_rq struct that is the entry point for both sched_domains and objects that must follow the same lifecycle, like the NOHZ_IDLE flags. They will share the same RCU lifecycle and will always be synchronized. The synchronization is done at the cost of: - an additional indirection for accessing the first sched_domain level - an additional indirection and a rcu_dereference before accessing the NOHZ_IDLE flag. Changes since v4: - link both sched_domain and NOHZ_IDLE flag in one RCU object so their states are always synchronized. 
Change since V3; - NOHZ flag is not cleared if a NULL domain is attached to the CPU - Remove patch 2/2 which becomes useless with latest modifications Change since V2: - change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state Change since V1: - remove the patch for SCHED softirq on an idle core use case as it was a side effect of the other use cases. Signed-off-by: Vincent Guittot --- include/linux/sched.h |6 +++ kernel/sched/core.c | 105 - kernel/sched/fair.c | 35 +++-- kernel/sched/sched.h | 24 +-- 4 files changed, 145 insertions(+), 25 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index d35d2b6..2a52188 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -959,6 +959,12 @@ struct sched_domain { unsigned long span[0]; }; +struct sched_domain_rq { + struct sched_domain *sd; + unsigned long flags; + struct rcu_head rcu;/* used during destruction */ +}; + static inline struct cpumask *sched_domain_span(struct sched_domain *sd) { return to_cpumask(sd->span); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7f12624..69e2313 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5602,6 +5602,15 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu) destroy_sched_domain(sd, cpu); } +static void destroy_sched_domain_rq(struct sched_domain_rq *sd_rq, int cpu) +{ + if (!sd_rq) + return; + + destroy_sched_domains(sd_rq->sd, cpu); + kfree_rcu(sd_rq, rcu); +} + /* * Keep a special pointer to the highest sched_domain that has * SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this @@ -5632,10 +5641,23 @@ static void update_top_cache_domain(int cpu) * hold the hotplug lock. 
*/ static void -cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) +cpu_attach_domain(struct sched_domain_rq *sd_rq, struct root_domain *rd, + int cpu) { struct rq *rq = cpu_rq(cpu); - struct sched_domain *tmp; + struct sched_domain_rq *tmp_rq; + struct sched_domain *tmp, *sd = NULL; + + /* +* If we don't have any sched_domain and associated object, we can +* directly jump to the attach sequence otherwise we try to degenerate +* the sched_domain +*/ + if (!sd_rq) + goto attach; + + /* Get a pointer to the 1st sched_domain */ + sd = sd_rq->sd; /* Remove the sched domains which do not contribute to scheduling. */ for (tmp = sd; tmp; ) { @@ -5658,14 +5680,17 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) destroy_sched_domain(tmp, cpu); if (sd) sd->child = NULL; + /* update sched_domain_rq */ + sd_rq->sd = sd; } +attach: sched_domain_debug(sd, cpu); rq_attach_root(rq, rd); - tmp = rq->sd; - rcu_assign_pointer(rq->sd, sd); - destroy_sched_domains(tmp, cpu); + tmp_rq = rq->sd_r
[PATCH v4] sched: fix wrong rq's runnable_avg update with rt tasks
The current update of the rq's load can be erroneous when RT tasks are involved. The update of the load of an rq that becomes idle is done only if the avg_idle is less than sysctl_sched_migration_cost. If RT tasks and short idle durations alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up. A new idle_enter function is called when the next task is the idle function, so the elapsed time will be accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance. When an RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exits the idle state, because CFS's functions are not called. Then the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task is close to LOAD_AVG_MAX, whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function, so the elapsed time will be accounted as idle time in the rq's load. Changes since V3: - Remove dependency on CONFIG_FAIR_GROUP_SCHED - Add a new idle_enter function and create a post_schedule callback for the idle class - Remove the update_runnable_avg from idle_balance Changes since V2: - remove useless definition for UP platforms - rebased on top of Steven Rostedt's patches: https://lkml.org/lkml/2013/2/12/558 Changes since V1: - move code out of the schedule function and create a pre_schedule callback for the idle class instead. 
Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 23 +-- kernel/sched/idle_task.c | 10 ++ kernel/sched/sched.h | 12 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0fcdbff..1851ca8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); } /* migrations, e.g. sleep=0 leave decay_count == 0 */ } + +/* + * Update the rq's load with the elapsed running time before entering + * idle. if the last scheduled task is not a CFS task, idle_enter will + * be the only way to update the runnable statistic. + */ +void idle_enter(struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 1); +} + +/* + * Update the rq's load with the elapsed idle time before a task is + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will + * be the only way to update the runnable statistic. + */ +void idle_exit(struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 0); +} + #else static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) {} @@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq) if (this_rq->avg_idle < sysctl_sched_migration_cost) return; - update_rq_runnable_avg(this_rq, 1); - /* * Drop the rq->lock, but keep preempt disabled. 
*/ diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index 66b5220..0775261 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c @@ -14,8 +14,17 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) return task_cpu(p); /* IDLE tasks as never migrated */ } +static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) +{ + /* Update rq's load with elapsed idle time */ + idle_exit(rq); +} + static void post_schedule_idle(struct rq *rq) { + /* Update rq's load with elapsed running time */ + idle_enter(rq); + idle_balance(smp_processor_id(), rq); } #endif /* CONFIG_SMP */ @@ -95,6 +104,7 @@ const struct sched_class idle_sched_class = { #ifdef CONFIG_SMP .select_task_rq = select_task_rq_idle, + .pre_schedule = pre_schedule_idle, .post_schedule = post_schedule_idle, #endif diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index fc88644..ff4b029 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -878,6 +878,18 @@ extern const struct sched_class idle_sched_class; extern void trigger_load_balance(struct rq *rq, int cpu); extern void idle_balance(int this_cpu, struct rq *this_rq); +/* + * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg + * becomes useful in lb + */ +#if defined(CONFIG_FAIR_GROUP_SCHED) +extern void idle_enter(struct rq *this_rq); +extern void idle_exit(struct rq *this_rq); +#else +static inline void idle_enter(struct rq *this_rq
Re: [PATCH Resend v5] sched: fix init NOHZ_IDLE flag
On 4 April 2013 19:07, Frederic Weisbecker wrote: > 2013/4/3 Vincent Guittot : >> On my smp platform which is made of 5 cores in 2 clusters, I have the >> nr_busy_cpu field of sched_group_power struct that is not null when the >> platform is fully idle. The root cause is: >> During the boot sequence, some CPUs reach the idle loop and set their >> NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus >> field is initialized later with the assumption that all CPUs are in the busy >> state whereas some CPUs have already set their NOHZ_IDLE flag. >> >> More generally, the NOHZ_IDLE flag must be initialized when new sched_domains >> are created in order to ensure that NOHZ_IDLE and nr_busy_cpus are aligned. >> >> This condition can be ensured by adding a synchronize_rcu between the >> destruction of old sched_domains and the creation of new ones so the >> NOHZ_IDLE >> flag will not be updated with old sched_domain once it has been initialized. >> But this solution introduces a additionnal latency in the rebuild sequence >> that is called during cpu hotplug. >> >> As suggested by Frederic Weisbecker, another solution is to have the same >> rcu lifecycle for both NOHZ_IDLE and sched_domain struct. I have introduce >> a new sched_domain_rq struct that is the entry point for both sched_domains >> and objects that must follow the same lifecycle like NOHZ_IDLE flags. They >> will share the same RCU lifecycle and will be always synchronized. >> >> The synchronization is done at the cost of : >> - an additional indirection for accessing the first sched_domain level >> - an additional indirection and a rcu_dereference before accessing to the >>NOHZ_IDLE flag. >> >> Change since v4: >> - link both sched_domain and NOHZ_IDLE flag in one RCU object so >>their states are always synchronized. 
>> >> Change since V3; >> - NOHZ flag is not cleared if a NULL domain is attached to the CPU >> - Remove patch 2/2 which becomes useless with latest modifications >> >> Change since V2: >> - change the initialization to idle state instead of busy state so a CPU >> that >>enters idle during the build of the sched_domain will not corrupt the >>initialization state >> >> Change since V1: >> - remove the patch for SCHED softirq on an idle core use case as it was >>a side effect of the other use cases. >> >> Signed-off-by: Vincent Guittot >> --- >> include/linux/sched.h |6 +++ >> kernel/sched/core.c | 105 >> - >> kernel/sched/fair.c | 35 +++-- >> kernel/sched/sched.h | 24 +-- >> 4 files changed, 145 insertions(+), 25 deletions(-) >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index d35d2b6..2a52188 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -959,6 +959,12 @@ struct sched_domain { >> unsigned long span[0]; >> }; >> >> +struct sched_domain_rq { >> + struct sched_domain *sd; >> + unsigned long flags; >> + struct rcu_head rcu;/* used during destruction */ >> +}; >> + >> static inline struct cpumask *sched_domain_span(struct sched_domain *sd) >> { >> return to_cpumask(sd->span); >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 7f12624..69e2313 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5602,6 +5602,15 @@ static void destroy_sched_domains(struct sched_domain >> *sd, int cpu) >> destroy_sched_domain(sd, cpu); >> } >> >> +static void destroy_sched_domain_rq(struct sched_domain_rq *sd_rq, int cpu) >> +{ >> + if (!sd_rq) >> + return; >> + >> + destroy_sched_domains(sd_rq->sd, cpu); >> + kfree_rcu(sd_rq, rcu); >> +} >> + >> /* >> * Keep a special pointer to the highest sched_domain that has >> * SD_SHARE_PKG_RESOURCE set (Last Level Cache Domain) for this >> @@ -5632,10 +5641,23 @@ static void update_top_cache_domain(int cpu) >> * hold the hotplug lock. 
>> */ >> static void >> -cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) >> +cpu_attach_domain(struct sched_domain_rq *sd_rq, struct root_domain *rd, >> + int cpu) >> { >> struct rq *rq = cpu_rq(cpu); >> - struct sched_domain *tmp; >> + struct sched_domain_rq *tmp_rq; >&
Re: [RFC PATCH v3 5/6] sched: pack the idle load balance
Peter, After some thought about your comments, I can update the buddy CPU during ILB or periodic LB to a new idle core and extend the packing mechanism. Does this additional mechanism sound better to you? Vincent On 26 March 2013 15:42, Peter Zijlstra wrote: > On Tue, 2013-03-26 at 15:03 +0100, Vincent Guittot wrote: >> > But ha! here's your NO_HZ link.. but does the above DTRT and ensure >> > that the ILB is a little core when possible? >> >> The loop looks for an idle CPU as close as possible to the buddy CPU >> and the buddy CPU is the 1st CPU has been chosen. So if your buddy is >> a little and there is an idle little, the ILB will be this idle >> little. > > Earlier you wrote: > >> | Cluster 0 | Cluster 1 | >> | CPU0 | CPU1 | CPU2 | CPU3 | >> --- >> buddy | CPU0 | CPU0 | CPU0 | CPU2 | > > So extrapolating that to a 4+4 big-little you'd get something like: > > | little A9 || big A15 | > | 0 | 1 | 2 | 3 || 4 | 5 | 6 | 7 | > --+---+---+---+---++---+---+---+---+ > buddy | 0 | 0 | 0 | 0 || 0 | 4 | 4 | 4 | > > Right? > > So supposing the current ILB is 6, we'll only check 4, not 0-3, even > though there might be a perfectly idle cpu in there. > > Also, your scheme fails to pack when cpus 0,4 are filled, even when > there's idle cores around. > > If we'd use the ILB as packing cpu, we would simply select a next pack > target once the old one fills up. >
[PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, I have the nr_busy_cpus field of the sched_group_power struct that is not null when the platform is fully idle. The root cause seems to be: during the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state, whereas some CPUs have already set their NOHZ_IDLE flag. We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized in order to have a coherent configuration. Signed-off-by: Vincent Guittot --- kernel/sched/core.c |1 + 1 file changed, 1 insertion(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 257002c..fd41924 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5884,6 +5884,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd) update_group_power(sd, cpu); atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); } int __weak arch_sd_sibling_asym_packing(void) -- 1.7.9.5
[PATCH v2 2/2] sched: fix update NOHZ_IDLE flag
The function nohz_kick_needed modifies the NOHZ_IDLE flag that is used to update the nr_busy_cpus of the sched_group. When the sched_domains are updated (during boot or because of the unplug of a CPU, for example), a null_domain is attached to the CPUs. We have to test likely(!on_null_domain(cpu)) first in order to detect such an initialization step and not modify the NOHZ_IDLE flag. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 5eea870..dac2edf 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5695,7 +5695,7 @@ void trigger_load_balance(struct rq *rq, int cpu) likely(!on_null_domain(cpu))) raise_softirq(SCHED_SOFTIRQ); #ifdef CONFIG_NO_HZ - if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu))) + if (likely(!on_null_domain(cpu)) && nohz_kick_needed(rq, cpu)) nohz_balancer_kick(cpu); #endif } -- 1.7.9.5
[PATCH v2 0/2] sched: fix nr_busy_cpus
The nr_busy_cpus field of the sched_group_power is sometimes different from 0 whereas the platform is fully idle. This series fixes 3 use cases: - when some CPUs enter idle state while booting all CPUs - when a CPU is unplugged and/or replugged Changes since V1: - remove the patch for the SCHED softirq on an idle core use case, as it was a side effect of the other use cases. Vincent Guittot (2): sched: fix init NOHZ_IDLE flag sched: fix update NOHZ_IDLE flag kernel/sched/core.c |1 + kernel/sched/fair.c |2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- 1.7.9.5
[PATCH] hwmon: (lm90) Add device tree support
Add support to instantiate LM90-compatible sensors from a device-tree configuration. When the kernel has device tree support, we avoid doing the auto-detection as probing the busses might mess-up sensitive I2C devices or trigger long timeouts on non-functional busses. Signed-off-by: Vincent Palatin --- .../devicetree/bindings/i2c/trivial-devices.txt| 19 + .../devicetree/bindings/vendor-prefixes.txt| 1 + drivers/hwmon/lm90.c | 47 +- 3 files changed, 65 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt b/Documentation/devicetree/bindings/i2c/trivial-devices.txt index 446859f..4d991ca 100644 --- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt +++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt @@ -10,6 +10,7 @@ document for it just like any other devices. Compatible Vendor / Chip == = ad,ad7414 SMBus/I2C Digital Temperature Sensor in 6-Pin SOT with SMBus Alert and Over Temperature Pin +ad,adm1032 +/-1C Remote and local system temperature monitor ad,adm9240 ADM9240: Complete System Hardware Monitor for uProcessor-Based Systems adi,adt7461+/-1C TDM Extended Temp Range I.C adt7461+/-1C TDM Extended Temp Range I.C @@ -35,16 +36,33 @@ fsl,mc13892 MC13892: Power Management Integrated Circuit (PMIC) for i.MX35/51 fsl,mma8450MMA8450Q: Xtrinsic Low-power, 3-axis Xtrinsic Accelerometer fsl,mpr121 MPR121: Proximity Capacitive Touch Sensor Controller fsl,sgtl5000 SGTL5000: Ultra Low-Power Audio Codec +gmt,g781 +/-1C Remote and local temperature sensor maxim,ds1050 5 Bit Programmable, Pulse-Width Modulator maxim,max1237 Low-Power, 4-/12-Channel, 2-Wire Serial, 12-Bit ADCs maxim,max6625 9-Bit/12-Bit Temperature Sensors with I²C-Compatible Serial Interface +maxim,max6646 +145C Precision SMBus-Compatible Remote/Local Sensors +maxim,max6647 +145C Precision SMBus-Compatible Remote/Local Sensors +maxim,max6649 +145C Precision SMBus-Compatible Remote/Local Sensors +maxim,max6657 +/-1C SMBus-Compatible Remote/Local 
Sensors +maxim,max6658 +/-1C SMBus-Compatible Remote/Local Sensors +maxim,max6659 +/-1C SMBus-Compatible Remote/Local Sensors +maxim,max6680 +/-1C Fail-Safe Remote/Local Temperature Sensors +maxim,max6681 +/-1C Fail-Safe Remote/Local Temperature Sensors +maxim,max6695 Dual Remote/Local Temperature Sensors +maxim,max6696 Dual Remote/Local Temperature Sensors mc,rv3029c2Real Time Clock Module with I2C-Bus national,lm75 I2C TEMP SENSOR national,lm80 Serial Interface ACPI-Compatible Microprocessor System Hardware Monitor +national,lm86 +/-0.75C Accurate, Remote Diode and Local Digital Temperature Sensor with Two-Wire Interface +national,lm89 +/-0.75C Remote and Local Digital Temperature Sensor with Two-Wire Interface-Wire Interface +national,lm90 +/-3C Accurate, Remote Diode and Local Digital Temperature Sensor with Two-Wire Interface national,lm92 ±0.33°C Accurate, 12-Bit + Sign Temperature Sensor and Thermal Window Comparator with Two-Wire Interface +national,lm99 +/-1C Accurate, Remote Diode and Local Digital Temperature Sensor with Two-Wire Interface nxp,pca9556Octal SMBus and I2C registered interface nxp,pca95578-bit I2C-bus and SMBus I/O port with reset nxp,pcf8563Real-time clock/calendar +nxp,sa56004remote/local digital temperature sensor with overtemperature alarms +onnn,nct1008 +/-1C Temperature Monitor with Series Resistance Cancellation ovti,ov5642OV5642: Color CMOS QSXGA (5-megapixel) Image Sensor with OmniBSI and Embedded TrueFocus pericom,pt7c4338 Real-time Clock Module plx,pex864848-Lane, 12-Port PCI Express Gen 2 (5.0 GT/s) Switch @@ -59,3 +77,4 @@ taos,tsl2550 Ambient Light Sensor with SMBUS/Two Wire Serial Interface ti,tsc2003 I2C Touch-Screen Controller ti,tmp102 Low Power Digital Temperature Sensor with SMBUS/Two Wire Serial Interface ti,tmp275 Digital Temperature Sensor +winbond,w83l771H/W Monitor IC diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt index 902b1b1..2074699 
100644 --- a/Documentation/devicetree/bindings/vendor-prefixes.txt +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt @@ -23,6 +23,7 @@ est ESTeem Wireless Modems fslFreescale Semiconductor GEFanucGE Fanuc Intelligent Platforms Embedded Systems, Inc. gefGE Fanuc Intelligent Platforms Embedded Systems, Inc. +gmtGlobal Mixed-mode
Re: [PATCH] regmap: debugfs: Fix compilation warning
On 01/23/2013 04:58 PM, Mark Brown wrote: > On Tue, Jan 22, 2013 at 11:07:04AM +0100, Vincent Stehlé wrote: > >> Do you think there is a way to "mark" the list_for_each_entry() >> as iterating at least once? an __attribute__ maybe? > > No - but are you sure that's true? If you mean "am I sure the loop iterates at least once", then yes, as we have an explicit check just before the concerned list_for_each_entry(): /* * This should never happen; we return above if we fail to * allocate and we should never be in this code if there are * no registers at all. */ if (list_empty(&map->debugfs_off_cache)) { WARN_ON(list_empty(&map->debugfs_off_cache)); return base; } Best regards, V.
Re: [PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
On 24 January 2013 17:44, Frederic Weisbecker wrote: > 2012/12/3 Vincent Guittot : >> With the coupled cpuidle driver (but probably also with other drivers), >> a CPU loops in a temporary safe state while waiting for other CPUs of its >> cluster to be ready to enter the coupled C-state. If an IRQ or a softirq >> occurs, the CPU will stay in this internal loop if there is no need >> to resched. The SCHED softirq clears the NOHZ and increases >> nr_busy_cpus. If there is no need to resched, we will not call >> set_cpu_sd_state_idle because of this internal loop in a cpuidle state. >> We have to call set_cpu_sd_state_idle in tick_nohz_irq_exit which is used >> to handle such situation. > > I'm a bit confused with this. > > set_cpu_sd_state_busy() is only called from nohz_kick_needed(). And it > checks idle_cpu() before doing anything. So if no task is going to be > scheduled, idle_cpu() prevents from calling set_cpu_sd_state_busy(). > > I'm probably missing something. Hi Frederic, I can't find the trace that I had saved of the issue, but IIRC the sequence is: the CPU is kicked for ILB; the wake_list of the CPU becomes not empty, so the cpu is not idle; the CPU wakes up, updates its timer framework and calls nohz_kick_needed, then executes the ILB sequence; we don't go out of the cpuidle driver function because we don't need to resched, so we don't clear the busy state. I'm going to look for the saved trace to check the sequence above. Vincent > > Thanks. 
> >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/time/tick-sched.c |2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c >> index 955d35b..b8d74ea 100644 >> --- a/kernel/time/tick-sched.c >> +++ b/kernel/time/tick-sched.c >> @@ -570,6 +570,8 @@ void tick_nohz_irq_exit(void) >> if (!ts->inidle) >> return; >> >> + set_cpu_sd_state_idle(); >> + >> /* Cancel the timer because CPU already waken up from the C-states*/ >> menu_hrtimer_cancel(); >> __tick_nohz_idle_enter(ts); >> -- >> 1.7.9.5
Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
On 26 February 2013 14:16, Frederic Weisbecker wrote: > 2013/2/22 Vincent Guittot : >> I wanted to avoid having to use the sd pointer for testing NOHZ_IDLE >> flag because it occurs each time we go into idle but it seems to be >> not easily feasible. >> Another solution could be to add a synchronization step between >> rcu_assign_pointer(dom 1, NULL) and create new domain to ensure that >> all pending access to old sd values, has finished but this will imply >> a potential delay in the rebuild of sched_domain and i'm not sure >> that it's acceptable > > The other issue is that we'll need to abuse the fact that struct > sched_domain is per cpu in order to store a per cpu state there. > That's a bit ugly but at least safer. > > Also, are struct sched_group and struct sched_group_power shared among > several CPUs or are they per CPUs allocated as well? I guess they > aren't otherwise nr_cpus_busy would be pointless. Yes, they are shared between CPUs; the per-cpu sched_domains point to the same sched_group and sched_group_power.
Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
On 26 February 2013 18:43, Frederic Weisbecker wrote: > 2013/2/26 Vincent Guittot : >> On 26 February 2013 14:16, Frederic Weisbecker wrote: >>> 2013/2/22 Vincent Guittot : >>>> I wanted to avoid having to use the sd pointer for testing NOHZ_IDLE >>>> flag because it occurs each time we go into idle but it seems to be >>>> not easily feasible. >>>> Another solution could be to add a synchronization step between >>>> rcu_assign_pointer(dom 1, NULL) and create new domain to ensure that >>>> all pending access to old sd values, has finished but this will imply >>>> a potential delay in the rebuild of sched_domain and i'm not sure >>>> that it's acceptable > > Ah I see what you meant there. Making a synchronize_rcu() after > setting the dom to NULL, on top of which we could work on preventing > from any concurrent nohz_flag modification. But cpu hotplug seem to > become a bit of a performance sensitive path this day. That's was also my concern > > Ok I don't like having a per cpu state in struct sched domain but for > now I can't find anything better. So my suggestion is that we do this > and describe well the race, define the issue in the changelog and code > comments and explain how we are solving it. This way at least the > issue is identified and known. Then later, on review or after the > patch is upstream, if somebody with some good taste comes with a > better idea, we consider it. > > What do you think? I don't have better solution than adding this state in the sched_domain if we want to keep the exact same behavior. This will be a bit of waste of mem because we don't need to update all sched_domain level (1st level is enough). Vincent -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
On 27 February 2013 17:13, Frederic Weisbecker wrote: > On Wed, Feb 27, 2013 at 09:28:26AM +0100, Vincent Guittot wrote: >> > Ok I don't like having a per cpu state in struct sched domain but for >> > now I can't find anything better. So my suggestion is that we do this >> > and describe well the race, define the issue in the changelog and code >> > comments and explain how we are solving it. This way at least the >> > issue is identified and known. Then later, on review or after the >> > patch is upstream, if somebody with some good taste comes with a >> > better idea, we consider it. >> > >> > What do you think? >> >> I don't have better solution than adding this state in the >> sched_domain if we want to keep the exact same behavior. This will be >> a bit of waste of mem because we don't need to update all sched_domain >> level (1st level is enough). > > Or you can try something like the below. Both flags and sched_domain share > the same > object here so the same RCU lifecycle. And there shouldn't be more overhead > there > since accessing rq->sd_rq.sd is the same than rq->sd_rq in the ASM level: only > one pointer to dereference. your proposal solves the waste of memory and keeps the sync between flag and nr_busy. I'm going to try it Thanks > > Also rq_idle becomes a separate value from rq->nohz_flags. It's a simple > boolean > (just making it an int here because boolean size are a bit opaque, although > they > are supposed to be char, let's just avoid surprises in structures). 
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index cc03cfd..16c0d55 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -417,7 +417,10 @@ struct rq { > > #ifdef CONFIG_SMP > struct root_domain *rd; > - struct sched_domain *sd; > + struct sched_domain_rq { > + struct sched_domain sd; > + int rq_idle; > + } __rcu *sd_rq; > > unsigned long cpu_power; > > @@ -505,9 +508,14 @@ DECLARE_PER_CPU(struct rq, runqueues); > > #ifdef CONFIG_SMP > > -#define rcu_dereference_check_sched_domain(p) \ > - rcu_dereference_check((p), \ > - lockdep_is_held(&sched_domains_mutex)) > +#define rcu_dereference_check_sched_domain(p) ({\ > + struct sched_domain_rq *__sd_rq = rcu_dereference_check((p),\ > + lockdep_is_held(&sched_domains_mutex)); \ > + if (!__sd_rq) \ > + NULL; \ > + else\ > + &__sd_rq->sd; \ > +}) > > /* > * The domain tree (rq->sd) is protected by RCU's quiescent state transition. > @@ -517,7 +525,7 @@ DECLARE_PER_CPU(struct rq, runqueues); > * preempt-disabled sections. > */ > #define for_each_domain(cpu, __sd) \ > - for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \ > + for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd_rq); \ > __sd; __sd = __sd->parent) > > #define for_each_lower_domain(sd) for (; sd; sd = sd->child) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On 18 February 2013 16:40, Frederic Weisbecker wrote: > 2013/2/18 Vincent Guittot : >> On 18 February 2013 15:38, Frederic Weisbecker wrote: >>> I pasted the original at: http://pastebin.com/DMm5U8J8 >> >> We can clear the idle flag only in the nohz_kick_needed which will not >> be called if the sched_domain is NULL so the sequence will be >> >> = CPU 0 == CPU 1= >> >> detach_and_destroy_domain { >> rcu_assign_pointer(cpu1_dom, NULL); >> } >> >> dom = new_domain(...) { >> nr_cpus_busy = 0; >> set_idle(CPU 1); >> } >> dom = >> rcu_dereference(cpu1_dom) >> //dom == NULL, return >> >> rcu_assign_pointer(cpu1_dom, dom); >> >> dom = >> rcu_dereference(cpu1_dom) >> //dom != NULL, >> nohz_kick_needed { >> >> set_idle(CPU 1) >> dom >> = rcu_dereference(cpu1_dom) >> >> //dec nr_cpus_busy, >> } >> >> Vincent > > Ok but CPU 0 can assign NULL to the domain of cpu1 while CPU 1 is > already in the middle of nohz_kick_needed(). Yes nothing prevents the sequence below to occur = CPU 0 == CPU 1= dom = rcu_dereference(cpu1_dom) //dom != NULL detach_and_destroy_domain { rcu_assign_pointer(cpu1_dom, NULL); } dom = new_domain(...) { nr_cpus_busy = 0; //nr_cpus_busy in the new_dom set_idle(CPU 1); } nohz_kick_needed { clear_idle(CPU 1) dom = rcu_dereference(cpu1_dom) //cpu1_dom == old_dom inc nr_cpus_busy, //nr_cpus_busy in the old_dom } rcu_assign_pointer(cpu1_dom, dom); //cpu1_dom == new_dom I'm not sure that this can happen in practice because CPU1 is in interrupt handler but we don't have any mechanism to prevent the sequence. 
The NULL sched_domain can be used to detect this situation and the set_cpu_sd_state_busy function can be modified like below inline void set_cpu_sd_state_busy { struct sched_domain *sd; int cpu = smp_processor_id(); + int clear = 0; if (!test_bit(NOHZ_IDLE, nohz_flags(cpu))) return; - clear_bit(NOHZ_IDLE, nohz_flags(cpu)); rcu_read_lock(); for_each_domain(cpu, sd) { atomic_inc(&sd->groups->sgp->nr_busy_cpus); + clear = 1; } rcu_read_unlock(); + + if (likely(clear)) + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); } The NOHZ_IDLE flag will not be clear if we have a NULL sched_domain attached to the CPU. With this implementation, we still don't need to get the sched_domain for testing the NOHZ_IDLE flag which occurs each time CPU becomes idle The patch 2 become useless Vincent -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 00/45] CPU hotplug: stop_machine()-free CPU hotplug
On 18 February 2013 20:53, Steven Rostedt wrote: > On Mon, 2013-02-18 at 17:50 +0100, Vincent Guittot wrote: > >> yes for sure. >> The problem is more linked to cpuidle and function tracer. >> >> cpu hotplug and function tracer work when cpuidle is disable. >> cpu hotplug and cpuidle works if i don't enable function tracer. >> my platform is dead as soon as I enable function tracer if cpuidle is >> enabled. It looks like some notrace are missing in my platform driver >> but we haven't completely fix the issue yet >> > > You can bisect to find out exactly what function is the problem: > > cat /debug/tracing/available_filter_functions > t > > f(t) { > num=`wc -l t` > sed -ne "1,${num}p" t > t1 > let num=num+1 > sed -ne "${num},$p" t > t2 > > cat t1 > /debug/tracing/set_ftrace_filter > # note this may take a long time to finish > > echo function > /debug/tracing/current_tracer > > > } > Thanks, i'm going to have a look Vincent > -- Steve > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] sched: fix init NOHZ_IDLE flag
On 19 February 2013 11:29, Vincent Guittot wrote: > On 18 February 2013 16:40, Frederic Weisbecker wrote: >> 2013/2/18 Vincent Guittot : >>> On 18 February 2013 15:38, Frederic Weisbecker wrote: >>>> I pasted the original at: http://pastebin.com/DMm5U8J8 >>> >>> We can clear the idle flag only in the nohz_kick_needed which will not >>> be called if the sched_domain is NULL so the sequence will be >>> >>> = CPU 0 == CPU 1= >>> >>> detach_and_destroy_domain { >>> rcu_assign_pointer(cpu1_dom, NULL); >>> } >>> >>> dom = new_domain(...) { >>> nr_cpus_busy = 0; >>> set_idle(CPU 1); >>> } >>> dom = >>> rcu_dereference(cpu1_dom) >>> //dom == NULL, return >>> >>> rcu_assign_pointer(cpu1_dom, dom); >>> >>> dom = >>> rcu_dereference(cpu1_dom) >>> //dom != NULL, >>> nohz_kick_needed { >>> >>> set_idle(CPU 1) >>>dom >>> = rcu_dereference(cpu1_dom) >>> >>> //dec nr_cpus_busy, >>> } >>> >>> Vincent >> >> Ok but CPU 0 can assign NULL to the domain of cpu1 while CPU 1 is >> already in the middle of nohz_kick_needed(). > > Yes nothing prevents the sequence below to occur > > = CPU 0 == CPU 1= > dom = > rcu_dereference(cpu1_dom) > //dom != NULL > detach_and_destroy_domain { > rcu_assign_pointer(cpu1_dom, NULL); > } > > dom = new_domain(...) { > nr_cpus_busy = 0; > //nr_cpus_busy in the new_dom > set_idle(CPU 1); > } > nohz_kick_needed { > clear_idle(CPU 1) > dom = > rcu_dereference(cpu1_dom) > > //cpu1_dom == old_dom > inc nr_cpus_busy, > > //nr_cpus_busy in the old_dom > } > > rcu_assign_pointer(cpu1_dom, dom); > //cpu1_dom == new_dom The sequence above is not correct in addition to become unreadable after going through gmail The correct and readable version https://pastebin.linaro.org/1750/ Vincent > > I'm not sure that this can happen in practice because CPU1 is in > interrupt handler but we don't have any mechanism to prevent the > sequence. 
> > The NULL sched_domain can be used to detect this situation and the > set_cpu_sd_state_busy function can be modified like below > > inline void set_cpu_sd_state_busy > { > struct sched_domain *sd; > int cpu = smp_processor_id(); > + int clear = 0; > > if (!test_bit(NOHZ_IDLE, nohz_flags(cpu))) > return; > - clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > > rcu_read_lock(); > for_each_domain(cpu, sd) { > atomic_inc(&sd->groups->sgp->nr_busy_cpus); > + clear = 1; > } > rcu_read_unlock(); > + > + if (likely(clear)) > + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > } > > The NOHZ_IDLE flag will not be clear if we have a NULL sched_domain > attached to the CPU. > With this implementation, we still don't need to get the sched_domain > for testing the NOHZ_IDLE flag which occurs each time CPU becomes idle > > The patch 2 become useless > > Vincent -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH v4] sched: fix init NOHZ_IDLE flag
On my smp platform which is made of 5 cores in 2 clusters, I have the nr_busy_cpu field of sched_group_power struct that is not null when the platform is fully idle. The root cause seems to be: During the boot sequence, some CPUs reach the idle loop and set their NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus field is initialized later with the assumption that all CPUs are in the busy state whereas some CPUs have already set their NOHZ_IDLE flag. During the initialization of the sched_domain, we set the NOHZ_IDLE flag when nr_busy_cpus is initialized to 0 in order to have a coherent configuration. If a CPU enters idle and call set_cpu_sd_state_idle during the build of the new sched_domain it will not corrupt the initial state set_cpu_sd_state_busy is modified and clears the NOHZ_IDLE only if a non NULL sched_domain is attached to the CPU (which is the case during the rebuild) Change since V3; - NOHZ flag is not cleared if a NULL domain is attached to the CPU - Remove patch 2/2 which becomes useless with latest modifications Change since V2: - change the initialization to idle state instead of busy state so a CPU that enters idle during the build of the sched_domain will not corrupt the initialization state Change since V1: - remove the patch for SCHED softirq on an idle core use case as it was a side effect of the other use cases. 
Signed-off-by: Vincent Guittot --- kernel/sched/core.c |4 +++- kernel/sched/fair.c |9 +++-- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 26058d0..c730a4e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5884,7 +5884,9 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd) return; update_group_power(sd, cpu); - atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); + atomic_set(&sg->sgp->nr_busy_cpus, 0); + set_bit(NOHZ_IDLE, nohz_flags(cpu)); + } int __weak arch_sd_sibling_asym_packing(void) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 81fa536..2701a92 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5403,15 +5403,20 @@ static inline void set_cpu_sd_state_busy(void) { struct sched_domain *sd; int cpu = smp_processor_id(); + int clear = 0; if (!test_bit(NOHZ_IDLE, nohz_flags(cpu))) return; - clear_bit(NOHZ_IDLE, nohz_flags(cpu)); rcu_read_lock(); - for_each_domain(cpu, sd) + for_each_domain(cpu, sd) { atomic_inc(&sd->groups->sgp->nr_busy_cpus); + clear = 1; + } rcu_read_unlock(); + + if (likely(clear)) + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); } void set_cpu_sd_state_idle(void) -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4] sched: fix init NOHZ_IDLE flag
On 22 February 2013 13:32, Frederic Weisbecker wrote: > On Thu, Feb 21, 2013 at 09:29:16AM +0100, Vincent Guittot wrote: >> On my smp platform which is made of 5 cores in 2 clusters, I have the >> nr_busy_cpu field of sched_group_power struct that is not null when the >> platform is fully idle. The root cause seems to be: >> During the boot sequence, some CPUs reach the idle loop and set their >> NOHZ_IDLE flag while waiting for others CPUs to boot. But the nr_busy_cpus >> field is initialized later with the assumption that all CPUs are in the busy >> state whereas some CPUs have already set their NOHZ_IDLE flag. >> During the initialization of the sched_domain, we set the NOHZ_IDLE flag when >> nr_busy_cpus is initialized to 0 in order to have a coherent configuration. >> If a CPU enters idle and call set_cpu_sd_state_idle during the build of the >> new sched_domain it will not corrupt the initial state >> set_cpu_sd_state_busy is modified and clears the NOHZ_IDLE only if a non NULL >> sched_domain is attached to the CPU (which is the case during the rebuild) >> >> Change since V3; >> - NOHZ flag is not cleared if a NULL domain is attached to the CPU >> - Remove patch 2/2 which becomes useless with latest modifications >> >> Change since V2: >> - change the initialization to idle state instead of busy state so a CPU >> that >>enters idle during the build of the sched_domain will not corrupt the >>initialization state >> >> Change since V1: >> - remove the patch for SCHED softirq on an idle core use case as it was >>a side effect of the other use cases. 
>> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/core.c |4 +++- >> kernel/sched/fair.c |9 +++-- >> 2 files changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 26058d0..c730a4e 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -5884,7 +5884,9 @@ static void init_sched_groups_power(int cpu, struct >> sched_domain *sd) >> return; >> >> update_group_power(sd, cpu); >> - atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight); >> + atomic_set(&sg->sgp->nr_busy_cpus, 0); >> + set_bit(NOHZ_IDLE, nohz_flags(cpu)); >> + >> } >> >> int __weak arch_sd_sibling_asym_packing(void) >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 81fa536..2701a92 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -5403,15 +5403,20 @@ static inline void set_cpu_sd_state_busy(void) >> { >> struct sched_domain *sd; >> int cpu = smp_processor_id(); >> + int clear = 0; >> >> if (!test_bit(NOHZ_IDLE, nohz_flags(cpu))) >> return; >> - clear_bit(NOHZ_IDLE, nohz_flags(cpu)); >> >> rcu_read_lock(); >> - for_each_domain(cpu, sd) >> + for_each_domain(cpu, sd) { >> atomic_inc(&sd->groups->sgp->nr_busy_cpus); >> + clear = 1; >> + } >> rcu_read_unlock(); >> + >> + if (likely(clear)) >> + clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > > I fear there is still a race window: > > = CPU 0 = = CPU 1 = > // NOHZ_IDLE is set > set_cpu_sd_state_busy() { > dom1 = rcu_dereference(dom1); > inc(dom1->nr_busy_cpus) > > rcu_assign_pointer(dom 1, NULL) > // create new domain > init_sched_group_power() { > atomic_set(&tmp->nr_busy_cpus, 0); > set_bit(NOHZ_IDLE, nohz_flags(cpu 1)); > rcu_assign_pointer(dom 1, tmp) > > > > clear_bit(NOHZ_IDLE, nohz_flags(cpu)); > } > > > I don't know if there is any sane way to deal with this issue other than > having nr_busy_cpus and nohz_flags in the same object sharing the same > lifecycle. 
I wanted to avoid having to use the sd pointer for testing the NOHZ_IDLE flag because this occurs each time we go into idle, but it seems not easily feasible. Another solution could be to add a synchronization step between rcu_assign_pointer(dom 1, NULL) and creating the new domain, to ensure that all pending accesses to the old sd values have finished, but this would imply a potential delay in the rebuild of the sched_domain and I'm not sure that it's acceptable. Vincent
Re: [PATCH] topology: removed kzalloc return value cast
On 10 March 2013 21:35, Mihai Stirbat wrote: > Signed-off-by: Mihai Stirbat > --- > arch/arm/kernel/topology.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c > index 79282eb..f10316b 100644 > --- a/arch/arm/kernel/topology.c > +++ b/arch/arm/kernel/topology.c > @@ -100,7 +100,7 @@ static void __init parse_dt_topology(void) > int alloc_size, cpu = 0; > > alloc_size = nr_cpu_ids * sizeof(struct cpu_capacity); > - cpu_capacity = (struct cpu_capacity *)kzalloc(alloc_size, GFP_NOWAIT); > + cpu_capacity = kzalloc(alloc_size, GFP_NOWAIT); you're right Acked-by: Vincent Guittot > > while ((cn = of_find_node_by_type(cn, "cpu"))) { > const u32 *rate, *reg; > -- > 1.7.10.4 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] regulator: disable supply regulator if it is enabled for boot-on
2012/8/28 Laxman Dewangan : > I tried to reproduce the lockup issue with the following change but not > seeing any lockup issue. Did you enable CONFIG_PROVE_LOCKING? > Also reviewing the change, I am not seeing any call trace where the > recursive locking happening. There's probably no actual recursive locking, but the lockdep warning itself is a problem which must be eliminated. You could perhaps do this by doing the regulator_disable(rdev->supply); after you mutex unlock the rdev->mutex. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2] regulator: disable supply regulator if it is enabled for boot-on
2012/8/29 Laxman Dewangan : > @@ -3614,8 +3615,11 @@ static int __init regulator_init_complete(void) > > mutex_lock(&rdev->mutex); > > - if (rdev->use_count) > + if (rdev->use_count) { > + if (rdev->supply && c->boot_on) > + supply_disable = true; > goto unlock; > + } > > /* If we can't read the status assume it's on. */ > if (ops->is_enabled) > @@ -3634,6 +3638,8 @@ static int __init regulator_init_complete(void) > if (ret != 0) { > rdev_err(rdev, "couldn't disable: %d\n", ret); > } > + if (rdev->supply) > + supply_disable = true; > } else { > /* The intention is that in future we will > * assume that full constraints are provided This does not handle the case where a regulator is not set boot_on but is considered on (for example, because of the lack of an is_enabled callback), and is later actually enabled by a consumer before regulator_init_complete(). In this case, the supply's use count will still be one more than it should be, because the "&& c->boot_on" condition above will fail. To fix this, you should probably note which regulators' supplies you enable in regulator_register() and use that information in the above two checks here in regulator_init_complete(). -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] sched: nohz_idle_balance
On tickless system, one CPU runs load balance for all idle CPUs. The cpu_load of this CPU is updated before starting the load balance of each other idle CPUs. We should instead update the cpu_load of the balance_cpu. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1ca4fe4..9ae3a5b 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4794,14 +4794,15 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle) if (need_resched()) break; - raw_spin_lock_irq(&this_rq->lock); - update_rq_clock(this_rq); - update_idle_cpu_load(this_rq); - raw_spin_unlock_irq(&this_rq->lock); + rq = cpu_rq(balance_cpu); + + raw_spin_lock_irq(&rq->lock); + update_rq_clock(rq); + update_idle_cpu_load(rq); + raw_spin_unlock_irq(&rq->lock); rebalance_domains(balance_cpu, CPU_IDLE); - rq = cpu_rq(balance_cpu); if (time_after(this_rq->next_balance, rq->next_balance)) this_rq->next_balance = rq->next_balance; } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] sched: nohz_idle_balance
Wrong button make me removed others guys from the thread. Sorry for this mistake. On 13 September 2012 09:56, Mike Galbraith wrote: > On Thu, 2012-09-13 at 09:44 +0200, Vincent Guittot wrote: >> On 13 September 2012 09:29, Mike Galbraith wrote: >> > On Thu, 2012-09-13 at 08:59 +0200, Vincent Guittot wrote: >> >> On 13 September 2012 08:49, Mike Galbraith wrote: >> >> > On Thu, 2012-09-13 at 06:11 +0200, Vincent Guittot wrote: >> >> >> On tickless system, one CPU runs load balance for all idle CPUs. >> >> >> The cpu_load of this CPU is updated before starting the load balance >> >> >> of each other idle CPUs. We should instead update the cpu_load of the >> >> >> balance_cpu. >> >> >> >> >> >> Signed-off-by: Vincent Guittot >> >> >> --- >> >> >> kernel/sched/fair.c | 11 ++- >> >> >> 1 file changed, 6 insertions(+), 5 deletions(-) >> >> >> >> >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> >> >> index 1ca4fe4..9ae3a5b 100644 >> >> >> --- a/kernel/sched/fair.c >> >> >> +++ b/kernel/sched/fair.c >> >> >> @@ -4794,14 +4794,15 @@ static void nohz_idle_balance(int this_cpu, >> >> >> enum cpu_idle_type idle) >> >> >> if (need_resched()) >> >> >> break; >> >> >> >> >> >> - raw_spin_lock_irq(&this_rq->lock); >> >> >> - update_rq_clock(this_rq); >> >> >> - update_idle_cpu_load(this_rq); >> >> >> - raw_spin_unlock_irq(&this_rq->lock); >> >> >> + rq = cpu_rq(balance_cpu); >> >> >> + >> >> >> + raw_spin_lock_irq(&rq->lock); >> >> >> + update_rq_clock(rq); >> >> >> + update_idle_cpu_load(rq); >> >> >> + raw_spin_unlock_irq(&rq->lock); >> >> >> >> >> >> rebalance_domains(balance_cpu, CPU_IDLE); >> >> >> >> >> >> - rq = cpu_rq(balance_cpu); >> >> >> if (time_after(this_rq->next_balance, rq->next_balance)) >> >> >> this_rq->next_balance = rq->next_balance; >> >> >> } >> >> > >> >> > Ew, banging locks and updating clocks to what good end? >> >> >> >> The goal is to update the cpu_load table of the CPU before starting >> >> the load balance. 
Other wise we will use outdated value in the load >> >> balance sequence >> > >> > If there's load to distribute, seems it should all work out fine without >> > doing that. What harm is being done that makes this worth while? >> >> this_load and avg_load can be wrong and make an idle CPU set as >> balanced compared to the busy one > > I think you need to present numbers showing benefit. Crawling all over > a mostly idle (4096p?) box is decidedly bad thing to do. Yep, let me prepare some figures You should also notice that you are already crawling all over the idle processor in rebalance_domains Vincent > > -Mike > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4 0/5] ARM: topology: set the capacity of each cores for big.LITTLE
On 10 July 2012 15:42, Peter Zijlstra wrote: > On Tue, 2012-07-10 at 14:35 +0200, Vincent Guittot wrote: >> >> May be the last one which enable ARCH_POWER should also go into tip ? >> > OK, I can take it. Hi Peter, I can't find the patch that enables ARCH_POWER in the tip tree. Have you taken it into your tree? Regards, Vincent
Re: [PATCH v4 0/5] ARM: topology: set the capacity of each cores for big.LITTLE
On 13 September 2012 14:07, Peter Zijlstra wrote: > On Thu, 2012-09-13 at 11:17 +0200, Vincent Guittot wrote: >> On 10 July 2012 15:42, Peter Zijlstra wrote: >> > On Tue, 2012-07-10 at 14:35 +0200, Vincent Guittot wrote: >> >> >> >> May be the last one which enable ARCH_POWER should also go into tip ? >> >> >> > OK, I can take it. >> >> Hi Peter, >> >> I can't find the patch that enable ARCH_POWER in the tip tree. Have >> you take it in your tree ? > > > Uhmmm how about I say I have now? Sorry about that. ok, thanks
[PATCH] Remove unneeded code in sys_getpriority
This check is not required because the condition is always true. Signed-off-by: Rabin Vincent <[EMAIL PROTECTED]> --- kernel/sys.c |7 ++- 1 files changed, 2 insertions(+), 5 deletions(-) diff --git a/kernel/sys.c b/kernel/sys.c index d1fe71e..a001974 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -212,11 +212,8 @@ asmlinkage long sys_getpriority(int which, int who) p = find_task_by_vpid(who); else p = current; - if (p) { - niceval = 20 - task_nice(p); - if (niceval > retval) - retval = niceval; - } + if (p) + retval = 20 - task_nice(p); break; case PRIO_PGRP: if (who) -- 1.5.3.8 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Remove unneeded code in sys_getpriority
On Sun, Feb 03, 2008 at 10:54:45AM +0100, Frank Seidel wrote: > On Sunday 03 February 2008 04:04, Rabin Vincent wrote: > > This check is not required because the condition is always true. > > ... > > - if (niceval > retval) > > - retval = niceval; > > + retval = 20 - task_nice(p); > > Thats surely correct, but on the other hand currently those > case blocks are quite independet of their possition/could easily > be rearranged now .. or think of another case is put ahead. > Then this could mess up things. Do you mean the PRIO_* cases in the switch? They're still independent of position after the patch because they don't fall through. > Thanks, > Frank Rabin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] USB: ohci-exynos: initialize registers pointer earlier
In the former code, we have a race condition between the first interrupt and the regs field initilization in the usb_hcd structure. If the OHCI irq fires before hcd->regs is set, we are getting a null pointer dereference in ohci_irq. When calling usb_add_hcd(), it first executes the reset() callback, then enables the ohci interrupt, and finally executes the start() callback. So moving the ohci_init() call which actually initializes the reg field from start() to reset() should remove the race. Tested by enabling the external HSIC hub in the bootloader on an exynos5 machine and booting. With the former code, this triggers an early interrupt about 50% of the boots and a subsequent kernel panic in ohci_irq when trying to access the registers. Cc: Olof Johansson Cc: Doug Anderson Cc: Arjun.K.V Cc: Vikas Sajjan Cc: Abhilash Kesavan Signed-off-by: Vincent Palatin --- drivers/usb/host/ohci-exynos.c | 10 ++ 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/ohci-exynos.c b/drivers/usb/host/ohci-exynos.c index 20a5008..f04cfde 100644 --- a/drivers/usb/host/ohci-exynos.c +++ b/drivers/usb/host/ohci-exynos.c @@ -23,6 +23,11 @@ struct exynos_ohci_hcd { struct clk *clk; }; +static int ohci_exynos_reset(struct usb_hcd *hcd) +{ + return ohci_init(hcd_to_ohci(hcd)); +} + static int ohci_exynos_start(struct usb_hcd *hcd) { struct ohci_hcd *ohci = hcd_to_ohci(hcd); @@ -30,10 +35,6 @@ static int ohci_exynos_start(struct usb_hcd *hcd) ohci_dbg(ohci, "ohci_exynos_start, ohci:%p", ohci); - ret = ohci_init(ohci); - if (ret < 0) - return ret; - ret = ohci_run(ohci); if (ret < 0) { dev_err(hcd->self.controller, "can't start %s\n", @@ -53,6 +54,7 @@ static const struct hc_driver exynos_ohci_hc_driver = { .irq= ohci_irq, .flags = HCD_MEMORY|HCD_USB11, + .reset = ohci_exynos_reset, .start = ohci_exynos_start, .stop = ohci_stop, .shutdown = ohci_shutdown, -- 1.7.7.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to 
majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
kernel BUG at fs/buffer.c:2886! Linux 3.5.0
27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? do_page_fault+0x1aa/0x3c0 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? cp_new_stat+0x10d/0x120 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? vfs_fstatat+0x41/0x80 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? sys_newstat+0x1f/0x50 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? system_call_fastpath+0x16/0x1b Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] Code: b6 44 24 18 4c 89 e7 83 e0 80 3c 01 19 db e8 76 3f 00 00 f7 d3 83 e3 a1 89 d8 5b 5d 41 5c c3 0f 0b eb fe 0f 0b eb fe 0f 0$ Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] RIP [] submit_bh+0x112/0x120 Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] RSP Jul 27 23:41:41 jupiter2 kernel: [ 351.177405] ---[ end trace e1e88bdf12146104 ]--- Jul 27 23:41:41 jupiter2 kernel: [ 351.177868] deliver (5783) used greatest stack depth: 3032 bytes left Regards, Vincent ETIENNE
Re: kernel BUG at fs/buffer.c:2886! Linux 3.5.0
Hi, On 30/07/2012 08:30, Joel Becker wrote: > On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote: >> Hello >> >> Get this on first write made ( by deliver sending mail to inform of the >> restart of services ) >> Home partition (the one receiving the mail) is based on ocfs2 created >> from drbd block device in primary/primary mode >> These drbd devices are based on lvm. >> >> system is running linux-3.5.0, identical symptom with linux 3.3 and 3.2 >> but working with linux 3.0 kernel >> >> reproduced on two machines ( so different hardware involved on this one >> software md raid on SATA, on second one areca hardware raid card ) >> but the 2 machines are the one sharing this partition ( so share the >> same data ) > Hmm. Any chance you can bisect this further? Will try to. Will take a few days as the server is in production ( but used as backup so...) >> Jul 27 23:41:41 jupiter2 kernel: [ 351.169213] [ cut here >> ] >> Jul 27 23:41:41 jupiter2 kernel: [ 351.169261] kernel BUG at >> fs/buffer.c:2886! > This is: > > BUG_ON(!buffer_mapped(bh)); > > in submit_bh(). > > >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] Call Trace: >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_read_blocks+0x176/0x6c0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> T.1552+0x91/0x2b0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_find_actor+0x120/0x120 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_read_inode_block_full+0x37/0x60 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_fast_symlink_readpage+0x2f/0x160 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> do_read_cache_page+0x85/0x180 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_fill_super+0x2500/0x2500 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> read_cache_page+0x9/0x20 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> page_getlink+0x25/0x80 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ?
>> page_follow_link_light+0x1b/0x30 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> path_lookupat+0x38b/0x720 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> do_path_lookup+0x2c/0xd0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> ocfs2_inode_revalidate+0x71/0x160 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> user_path_at_empty+0x5c/0xb0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> do_page_fault+0x1aa/0x3c0 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> cp_new_stat+0x10d/0x120 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> vfs_fstatat+0x41/0x80 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> sys_newstat+0x1f/0x50 >> Jul 27 23:41:41 jupiter2 kernel: [ 351.170003] [] ? >> system_call_fastpath+0x16/0x1b > This stack trace is from 3.5, because of the location of the > BUG. The call path in the trace suggests the code added by Al's ea022d, > but you say it breaks in 3.2 and 3.3 as well. Can you give me a trace > from 3.2? For a 3.2 kernel I get this stack trace. It is a different trace from 3.5, but at exactly the same moment and for the same reasons. It seems less immediate than with 3.5, but that is more a subjective impression than something based on fact. ( it takes a few seconds after deliver is started to hit the bug ) [ 716.402833] o2dlm: Joining domain B43153ED20B942E291251F2C138ADA9E ( 0 1 ) 2 nodes [ 716.501511] ocfs2: Mounting device (147,2) on (node 1, slot 0) with ordered data mode.
[ 716.505744] mount.ocfs2 used greatest stack depth: 2936 bytes left [ 727.133743] deliver used greatest stack depth: 2632 bytes left [ 764.167029] deliver used greatest stack depth: 1896 bytes left [ 764.778872] BUG: unable to handle kernel NULL pointer dereference at 0038 [ 764.778897] IP: [] __ocfs2_change_file_space+0x75a/0x1690 [ 764.778922] PGD 62697067 PUD 67a81067 PMD 0 [ 764.778939] Oops: [#1] SMP [ 764.778953] CPU 0 [ 764.778959] Modules linked in: drbd lru_cache ipv6 [last unloaded: drbd] [ 764.778986] [ 764.778993] Pid: 5909, comm: deliver Not tainted 3.2.12-gentoo #2 HP ProLiant ML150 G3/ML150 G3 [ 764.779017] RIP: 0010:[] [] __ocfs2_change_file_space+0x75a/0x1690 [ 764.779041] RSP: 0018:880067b2dd98 EFLAGS: 00010246 [ 764.779053] RAX: RBX: 880067f82000 RCX: 880063d11000 [ 764.779069] RDX: RSI: 0001 RDI: 88007ae83288 [ 764.779085] RBP: 880055d1f138 R08: 0010 R09: 88
Re: kernel BUG at fs/buffer.c:2886! Linux 3.5.0
On 30/07/2012 09:53, Joel Becker wrote: > On Mon, Jul 30, 2012 at 09:45:14AM +0200, Vincent ETIENNE wrote: >> Le 30/07/2012 08:30, Joel Becker a écrit : >>> On Sat, Jul 28, 2012 at 12:18:30AM +0200, Vincent ETIENNE wrote: >>>> Hello >>>> >>>> Get this on first write made ( by deliver sending mail to inform of the >>>> restart of services ) >>>> Home partition (the one receiving the mail) is based on ocfs2 created >>>> from drbd block device in primary/primary mode >>>> These drbd devices are based on lvm. >>>> >>>> system is running linux-3.5.0, identical symptom with linux 3.3 and 3.2 >>>> but working with linux 3.0 kernel >>>> >>>> reproduced on two machines ( so different hardware involved on this one >>>> software md raid on SATA, on second one areca hardware raid card ) >>>> but the 2 machines are the one sharing this partition ( so share the >>>> same data ) >>> Hmm. Any chance you can bisect this further? >> Will try to. Will take a few days as the server is in production ( but >> used as backup so...) >> >>>> Jul 27 23:41:41 jupiter2 kernel: [ 351.169213] [ cut here >>>> ] >>>> Jul 27 23:41:41 jupiter2 kernel: [ 351.169261] kernel BUG at >>>> fs/buffer.c:2886! >>> This is: >>> >>> BUG_ON(!buffer_mapped(bh)); >>> >>> in submit_bh(). >>> >>> system_call_fastpath+0x16/0x1b >>> This stack trace is from 3.5, because of the location of the >>> BUG. The call path in the trace suggests the code added by Al's ea022d, >>> but you say it breaks in 3.2 and 3.3 as well. Can you give me a trace >>> from 3.2? >> For a 3.2 kernel i get this stack trace. Different trace form 3.5 but >> exactly at the same moment. and for the same reasons. >> Seems to be less immmediate than with 3.5 but more a subjective >> imrpession than something based on fact. ( it takes a few seconds after >> deliver is started to have the bug ) > Totally different stack trace. Not in symlink code, but instead in > fallocate. Weird. I wonder if you are hitting two things. 
Bisection > will definitely help. Yes, could be; that would explain the 2 stack traces ( and the different timing observed ). Bisection is in progress. The fallocate bug is certainly already corrected ( info sent by sunil.mush...@gmail.com but unavailable on the list for the moment ?) -- The fallocate() oops is probably the same one that is fixed by this patch. https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=commit;h=a2118b301104a24381b414bc93371d666fe8d43a It is in the list of patches that are ready to be pushed. https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=shortlog;h=mw-3.4-mar15 But I am not sure it will correct all I observed, so I will continue to bisect to confirm or refute that. ( But I seem to have lost network on my server after a reboot and so no more access before tomorrow; I certainly forgot to do make modules_install before installing the new kernel ... Being stupid is not very helpful... ) I hope to finish the bisection tomorrow or Wednesday. Thanks a lot for the support. > Joel > >
Re: [PATCH 3/4] lib: vsprintf: Optimize put_dec_trunc8
2012/8/3 George Spelvin : > If you're going to have a conditional branch after > each 32x32->64-bit multiply, might as well shrink the code > and make it a loop. > > This also avoids using the long multiply for small integers. > > (This leaves the comments in a confusing state, but that's a separate > patch to make review easier.) > > Signed-off-by: George Spelvin This patch breaks IP address printing with "%pI4" (and by extension, nfsroot). Example: - Before: 10.0.0.1 - After: 10...1
Re: [PATCH 3/3] workqueue: Schedule work on non-idle cpu instead of current one
On 25 September 2012 13:30, Viresh Kumar wrote: > On 25 September 2012 16:52, Peter Zijlstra wrote: >> On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote: >>> @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, >>> struct work_struct *work) >>> { >>> int ret; >>> >>> - ret = queue_work_on(get_cpu(), wq, work); >>> - put_cpu(); >>> + preempt_disable(); >>> + ret = queue_work_on(wq_select_cpu(), wq, work); >>> + preempt_enable(); >>> >>> return ret; >>> } >> >> Right, so the problem I see here is that wq_select_cpu() is horridly >> expensive.. > > But this is what the initial idea during LPC we had. Any improvements here > you can suggest? The main outcome of the LPC was that we should be able to select another CPU than the local one. Using the same policy as the timer is a first step to consolidate the interfaces. A next step should be to update the policy of the function. Vincent > >>> @@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long >>> __data) >>> struct delayed_work *dwork = (struct delayed_work *)__data; >>> struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work); >>> >>> - __queue_work(smp_processor_id(), cwq->wq, &dwork->work); >>> + __queue_work(wq_select_cpu(), cwq->wq, &dwork->work); >>> } >> >> Shouldn't timer migration have sorted this one? > > Maybe yes. Will investigate more on it. > > Thanks for your early feedback. > > -- > viresh
Re: [RFC 2/6] sched: add a new SD SHARE_POWERLINE flag for sched_domain
On 24 October 2012 17:17, Santosh Shilimkar wrote: > Vincent, > > Few comments/questions. > > > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> This new flag SD SHARE_POWERLINE reflects the sharing of the power rail >> between the members of a domain. As this is the current assumption of the >> scheduler, the flag is added to all sched_domain >> >> Signed-off-by: Vincent Guittot >> --- >> arch/ia64/include/asm/topology.h |1 + >> arch/tile/include/asm/topology.h |1 + >> include/linux/sched.h|1 + >> include/linux/topology.h |3 +++ >> kernel/sched/core.c |5 + >> 5 files changed, 11 insertions(+) >> >> diff --git a/arch/ia64/include/asm/topology.h >> b/arch/ia64/include/asm/topology.h >> index a2496e4..065c720 100644 >> --- a/arch/ia64/include/asm/topology.h >> +++ b/arch/ia64/include/asm/topology.h >> @@ -65,6 +65,7 @@ void build_cpu_to_node_map(void); >> | SD_BALANCE_EXEC \ >> | SD_BALANCE_FORK \ >> | SD_WAKE_AFFINE, \ >> + | arch_sd_share_power_line()\ >> .last_balance = jiffies, \ >> .balance_interval = 1,\ >> .nr_balance_failed = 0,\ >> diff --git a/arch/tile/include/asm/topology.h >> b/arch/tile/include/asm/topology.h >> index 7a7ce39..d39ed0b 100644 >> --- a/arch/tile/include/asm/topology.h >> +++ b/arch/tile/include/asm/topology.h >> @@ -72,6 +72,7 @@ static inline const struct cpumask *cpumask_of_node(int >> node) >> | 0*SD_PREFER_LOCAL \ >> | 0*SD_SHARE_CPUPOWER \ >> | 0*SD_SHARE_PKG_RESOURCES \ >> + | arch_sd_share_power_line()\ >> | 0*SD_SERIALIZE\ >> , \ >> .last_balance = jiffies, \ >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index 4786b20..74f2daf 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -862,6 +862,7 @@ enum cpu_idle_type { >> #define SD_WAKE_AFFINE0x0020 /* Wake task to waking CPU >> */ >> #define SD_PREFER_LOCAL 0x0040 /* Prefer to keep tasks >> local to this domain */ >> #define SD_SHARE_CPUPOWER 0x0080 /* Domain members share cpu power >> */ >> +#define SD_SHARE_POWERLINE 0x0100 
/* Domain members share power >> domain */ > If you ignore the current use of SD_SHARE_CPUPOWER, isn't the meaning of > CPUPOWER and POWERLINE is same here. Just trying to understand the clear > meaning of this new flag. Have you not considered SD_SHARE_CPUPOWER > because it is being used for cpu_power and needs at least minimum two > domains ? SD_PACKING would have been probably more appropriate based > on the way it is being used in further series. CPUPOWER reflects the sharing of hw resources between cores, like for hyper-threading. POWERLINE describes the fact that cores are sharing the same power line, and more precisely the same power gate. > > Regards > Santosh >
Re: [RFC 2/6] sched: add a new SD SHARE_POWERLINE flag for sched_domain
It looks like i need to describe more what On 29 October 2012 10:40, Vincent Guittot wrote: > On 24 October 2012 17:17, Santosh Shilimkar wrote: >> Vincent, >> >> Few comments/questions. >> >> >> On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >>> >>> This new flag SD SHARE_POWERLINE reflects the sharing of the power rail >>> between the members of a domain. As this is the current assumption of the >>> scheduler, the flag is added to all sched_domain >>> >>> Signed-off-by: Vincent Guittot >>> --- >>> arch/ia64/include/asm/topology.h |1 + >>> arch/tile/include/asm/topology.h |1 + >>> include/linux/sched.h|1 + >>> include/linux/topology.h |3 +++ >>> kernel/sched/core.c |5 + >>> 5 files changed, 11 insertions(+) >>> >>> diff --git a/arch/ia64/include/asm/topology.h >>> b/arch/ia64/include/asm/topology.h >>> index a2496e4..065c720 100644 >>> --- a/arch/ia64/include/asm/topology.h >>> +++ b/arch/ia64/include/asm/topology.h >>> @@ -65,6 +65,7 @@ void build_cpu_to_node_map(void); >>> | SD_BALANCE_EXEC \ >>> | SD_BALANCE_FORK \ >>> | SD_WAKE_AFFINE, \ >>> + | arch_sd_share_power_line()\ >>> .last_balance = jiffies, \ >>> .balance_interval = 1,\ >>> .nr_balance_failed = 0,\ >>> diff --git a/arch/tile/include/asm/topology.h >>> b/arch/tile/include/asm/topology.h >>> index 7a7ce39..d39ed0b 100644 >>> --- a/arch/tile/include/asm/topology.h >>> +++ b/arch/tile/include/asm/topology.h >>> @@ -72,6 +72,7 @@ static inline const struct cpumask *cpumask_of_node(int >>> node) >>> | 0*SD_PREFER_LOCAL \ >>> | 0*SD_SHARE_CPUPOWER \ >>> | 0*SD_SHARE_PKG_RESOURCES \ >>> + | arch_sd_share_power_line()\ >>> | 0*SD_SERIALIZE\ >>> , \ >>> .last_balance = jiffies, \ >>> diff --git a/include/linux/sched.h b/include/linux/sched.h >>> index 4786b20..74f2daf 100644 >>> --- a/include/linux/sched.h >>> +++ b/include/linux/sched.h >>> @@ -862,6 +862,7 @@ enum cpu_idle_type { >>> #define SD_WAKE_AFFINE0x0020 /* Wake task to waking CPU >>> */ >>> #define SD_PREFER_LOCAL 0x0040 /* Prefer to 
keep tasks >>> local to this domain */ >>> #define SD_SHARE_CPUPOWER 0x0080 /* Domain members share cpu power >>> */ >>> +#define SD_SHARE_POWERLINE 0x0100 /* Domain members share power >>> domain */ >> >> If you ignore the current use of SD_SHARE_CPUPOWER, isn't the meaning of >> CPUPOWER and POWERLINE is same here. Just trying to understand the clear >> meaning of this new flag. Have you not considered SD_SHARE_CPUPOWER >> because it is being used for cpu_power and needs at least minimum two >> domains ? SD_PACKING would have been probably more appropriate based >> on the way it is being used in further series. > > CPUPOWER reflects the share of hw ressources between cores like for > hyper threading. POWERLINE describes the fact that cores are sharing > the same power line amore precisely the powergate. Sorry, the mail was sent too early while I was writing it. CPUPOWER reflects the sharing of hw resources between cores, like for hyper-threading. POWERLINE describes the fact that cores are sharing the same power line, and more precisely the same power gating. It looks like I need to describe more precisely what I mean by SHARE_POWERLINE. I don't want to use PACKING because it's more a behavior than a feature. If cores can power gate independently (!SD_SHARE_POWERLINE), packing small tasks is one interesting behavior, but it may not be the only one. I want to make a difference between the HW configuration and the behavior we want to have above it. Vincent >> >> Regards >> Santosh >>
Re: [RFC 3/6] sched: pack small tasks
On 24 October 2012 17:20, Santosh Shilimkar wrote: > Vincent, > > Few comments/questions. > > > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> During sched_domain creation, we define a pack buddy CPU if available. >> >> On a system that share the powerline at all level, the buddy is set to -1 >> >> On a dual clusters / dual cores system which can powergate each core and >> cluster independently, the buddy configuration will be : >>| CPU0 | CPU1 | CPU2 | CPU3 | >> --- >> buddy | CPU0 | CPU0 | CPU0 | CPU2 | > > ^ > Is that a typo ? Should it be CPU2 instead of > CPU0 ? No it's not a typo. The system packs at each scheduling level. It starts to pack in cluster because each core can power gate independently so CPU1 tries to pack its tasks in CPU0 and CPU3 in CPU2. Then, it packs at CPU level so CPU2 tries to pack in the cluster of CPU0 and CPU0 packs in itself. > > >> Small tasks tend to slip out of the periodic load balance. >> The best place to choose to migrate them is at their wake up. >> > I have tried this series since I was looking at some of these packing > bits. On Mobile workloads like OSIdle with Screen ON, MP3, gallery, > I did see some additional filtering of threads with this series > but it's not making much difference in power. More on this below. Can I ask you which configuration you have used ? How many cores and clusters ? Can they be power gated independently ?
> > >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/core.c |1 + >> kernel/sched/fair.c | 109 >> ++ >> kernel/sched/sched.h |1 + >> 3 files changed, 111 insertions(+) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index dab7908..70cadbe 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct >> root_domain *rd, int cpu) >> rcu_assign_pointer(rq->sd, sd); >> destroy_sched_domains(tmp, cpu); >> >> + update_packing_domain(cpu); >> update_top_cache_domain(cpu); >> } >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 4f4a4f6..8c9d3ed 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -157,6 +157,63 @@ void sched_init_granularity(void) >> update_sysctl(); >> } >> >> + >> +/* >> + * Save the id of the optimal CPU that should be used to pack small tasks >> + * The value -1 is used when no buddy has been found >> + */ >> +DEFINE_PER_CPU(int, sd_pack_buddy); >> + >> +/* Look for the best buddy CPU that can be used to pack small tasks >> + * We make the assumption that it doesn't wort to pack on CPU that share >> the > s/wort/worth yes > >> + * same powerline. We looks for the 1st sched_domain without the >> + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the >> lowest >> + * power per core based on the assumption that their power efficiency is >> + * better */ > Commenting style.. > /* > * > */ > yes > Can you please expand the why the assumption is right ? > "it doesn't wort to pack on CPU that share the same powerline" By "share the same power-line", I mean that the CPUs can't power off independently. So if some CPUs can't power off independently, it's worth trying to use most of them to race to idle.
> > Think about a scenario where you have quad core, dual cluster system > > |Cluster1| |cluster 2| > | CPU0 | CPU1 | CPU2 | CPU3 | | CPU0 | CPU1 | CPU2 | CPU3 | > > > Both clusters run from same voltage rail and have same PLL > clocking them. But the clusters have their own power domains > and all CPU's can power gate themselves to low power states. > Clusters also have their own level2 caches. > > In this case, you will still save power if you try to pack > load on one cluster. No ? Yes, I need to update the description of SD_SHARE_POWERLINE because I'm afraid I was not clear enough. SD_SHARE_POWERLINE includes the power gating capacity of each core. For your example above, the SD_SHARE_POWERLINE should be cleared at both MC and CPU level. > > >> +void update_packing_domain(int cpu) >> +{ >> + struct sched_domain *sd; >> + int id = -1; >> + >> + sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE); >> + if (!sd) >> + s
Re: [RFC 4/6] sched: secure access to other CPU statistics
On 24 October 2012 17:21, Santosh Shilimkar wrote: > $subject is bit confusing here. > > > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> The atomic update of runnable_avg_sum and runnable_avg_period are ensured >> by their size and the toolchain. But we must ensure to not read an old >> value >> for one field and a newly updated value for the other field. As we don't >> want to lock other CPU while reading these fields, we read twice each >> fields >> and check that no change have occured in the middle. >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/fair.c | 19 +-- >> 1 file changed, 17 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 8c9d3ed..6df53b5 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -3133,13 +3133,28 @@ static int select_idle_sibling(struct task_struct >> *p, int target) >> static inline bool is_buddy_busy(int cpu) >> { >> struct rq *rq = cpu_rq(cpu); >> + volatile u32 *psum = &rq->avg.runnable_avg_sum; >> + volatile u32 *pperiod = &rq->avg.runnable_avg_period; >> + u32 sum, new_sum, period, new_period; >> + int timeout = 10; > > So it can be 2 times read or more as well. > >> + >> + while (timeout) { >> + sum = *psum; >> + period = *pperiod; >> + new_sum = *psum; >> + new_period = *pperiod; >> + >> + if ((sum == new_sum) && (period == new_period)) >> + break; >> + >> + timeout--; >> + } >> > Seems like you did notice incorrect pair getting read > for rq runnable_avg_sum and runnable_avg_period. Seems > like the fix is to update them together under some lock > to avoid such issues. 
My goal is to have a lock-free mechanism because I don't want to lock another CPU while reading its statistics. > > Regards > Santosh >
Re: [RFC 5/6] sched: pack the idle load balance
On 24 October 2012 17:21, Santosh Shilimkar wrote: > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> Look for an idle CPU close the pack buddy CPU whenever possible. > s/close/close to yes > >> The goal is to prevent the wake up of a CPU which doesn't share the power >> line of the pack CPU >> >> Signed-off-by: Vincent Guittot >> --- >> kernel/sched/fair.c | 18 ++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 6df53b5..f76acdc 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -5158,7 +5158,25 @@ static struct { >> >> static inline int find_new_ilb(int call_cpu) >> { >> + struct sched_domain *sd; >> int ilb = cpumask_first(nohz.idle_cpus_mask); >> + int buddy = per_cpu(sd_pack_buddy, call_cpu); >> + >> + /* >> +* If we have a pack buddy CPU, we try to run load balance on a >> CPU >> +* that is close to the buddy. >> +*/ >> + if (buddy != -1) >> + for_each_domain(buddy, sd) { >> + if (sd->flags & SD_SHARE_CPUPOWER) >> + continue; > Do you mean SD_SHARE_POWERLINE here ? No, I just don't want to take hyperthread level for ILB > >> + >> + ilb = cpumask_first_and(sched_domain_span(sd), >> + nohz.idle_cpus_mask); >> + >> + if (ilb < nr_cpu_ids) >> + break; >> + } >> >> if (ilb < nr_cpu_ids && idle_cpu(ilb)) >> return ilb; >> > Can you please expand "idle CPU _close_ the pack buddy CPU" ? The goal is to pack the tasks on the pack buddy CPU, so when the scheduler needs to start ILB, I try to wake up a CPU that is close to the buddy and preferably in the same cluster. > > Regards > santosh
Re: [RFC 6/6] ARM: sched: clear SD_SHARE_POWERLINE
On 24 October 2012 17:21, Santosh Shilimkar wrote: > On Sunday 07 October 2012 01:13 PM, Vincent Guittot wrote: >> >> The ARM platforms take advantage of packing small tasks on few cores. >> This is true even when the cores of a cluster can't be powergated >> independently. >> >> >> Signed-off-by: Vincent Guittot >> --- >> arch/arm/kernel/topology.c |5 + >> 1 file changed, 5 insertions(+) >> >> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c >> index 26c12c6..00511d0 100644 >> --- a/arch/arm/kernel/topology.c >> +++ b/arch/arm/kernel/topology.c >> @@ -226,6 +226,11 @@ static inline void update_cpu_power(unsigned int >> cpuid, unsigned int mpidr) {} >>*/ >> struct cputopo_arm cpu_topology[NR_CPUS]; >> >> +int arch_sd_share_power_line(void) >> +{ >> + return 0*SD_SHARE_POWERLINE; >> +} > > > Making this selection of policy based on sched domain will be better. Just > gives the flexibility to choose a separate scheme for big and little > systems which will be very convenient. I agree that it would be more flexible to be able to set it for each level. > > Regards > Santosh > > > > >
[PATCH linux-next] edma: select arch common code to fix link
EDMA code has been moved to a common folder with a new CONFIG_TI_PRIV_EDMA switch. Select it when the edma driver is enabled. This fixes the following link error: drivers/built-in.o: In function `edma_remove': of_iommu.c:(.text+0x4ef20): undefined reference to `edma_free_slot' drivers/built-in.o: In function `edma_control': of_iommu.c:(.text+0x4ef70): undefined reference to `edma_stop' drivers/built-in.o: In function `edma_execute': of_iommu.c:(.text+0x4f11c): undefined reference to `edma_write_slot' of_iommu.c:(.text+0x4f150): undefined reference to `edma_link' of_iommu.c:(.text+0x4f168): undefined reference to `edma_start' drivers/built-in.o: In function `edma_free_chan_resources': of_iommu.c:(.text+0x4f220): undefined reference to `edma_stop' of_iommu.c:(.text+0x4f304): undefined reference to `edma_free_slot' of_iommu.c:(.text+0x4f328): undefined reference to `edma_free_channel' drivers/built-in.o: In function `edma_alloc_chan_resources': of_iommu.c:(.text+0x4f37c): undefined reference to `edma_alloc_channel' of_iommu.c:(.text+0x4f3d8): undefined reference to `edma_free_channel' drivers/built-in.o: In function `edma_prep_slave_sg': of_iommu.c:(.text+0x4f67c): undefined reference to `edma_alloc_slot' drivers/built-in.o: In function `edma_probe': of_iommu.c:(.text+0x4f794): undefined reference to `edma_alloc_slot' of_iommu.c:(.text+0x4f8b8): undefined reference to `edma_free_slot' drivers/built-in.o: In function `edma_callback': of_iommu.c:(.text+0x4fae4): undefined reference to `edma_stop' make: *** [vmlinux] Error 1 Signed-off-by: Vincent Stehlé Cc: Matt Porter Cc: Sekhar Nori Cc: Vinod Koul Cc: Dan Williams Cc: Russell King --- Hi, Build of linux next-20130709 is broken for ARM multi_v7_defconfig. This patch fixes it. (Note: the error messages mentioning of_iommu.c are misleading.) Best regards, V. 
drivers/dma/Kconfig |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 6825957..8b3fca9 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -198,6 +198,7 @@ config TI_EDMA depends on ARCH_DAVINCI || ARCH_OMAP select DMA_ENGINE select DMA_VIRTUAL_CHANNELS + select TI_PRIV_EDMA default n help Enable support for the TI EDMA controller. This DMA -- 1.7.10.4
[PATCH linux-next] arm: multi_v7_defconfig: add fsl lpuart serial console
Add Freescale LPUART serial console support. This gives us the boot messages on UART on e.g. the Vybrid VF610 Tower board. Signed-off-by: Vincent Stehlé Cc: Olof Johansson Cc: Russell King --- Hi, Would you please consider adding LPUART support to the ARM multi_v7_defconfig? (This patch is built on top of the following patches: http://comments.gmane.org/gmane.linux.kernel/1519712) Best regards, V. arch/arm/configs/multi_v7_defconfig |2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig index 81eac83..80aacc6 100644 --- a/arch/arm/configs/multi_v7_defconfig +++ b/arch/arm/configs/multi_v7_defconfig @@ -79,6 +79,8 @@ CONFIG_SERIAL_XILINX_PS_UART=y CONFIG_SERIAL_XILINX_PS_UART_CONSOLE=y CONFIG_SERIAL_IMX=y CONFIG_SERIAL_IMX_CONSOLE=y +CONFIG_SERIAL_FSL_LPUART=y +CONFIG_SERIAL_FSL_LPUART_CONSOLE=y CONFIG_I2C_DESIGNWARE_PLATFORM=y CONFIG_I2C_SIRF=y CONFIG_I2C_TEGRA=y -- 1.7.10.4
[PATCH linux-next] ARM: imx: fix imx_init_l2cache storage class
Commit 879ec1ceeac21285d62606c1e96520887efcd9bc makes imx_init_l2cache a common function and updates the header declaration accordingly. Fix function storage class, too. This fixes the following compilation error: arch/arm/mach-imx/system.c:101:123: error: static declaration of ‘imx_init_l2cache’ follows non-static declaration In file included from arch/arm/mach-imx/system.c:32:0: arch/arm/mach-imx/common.h:165:13: note: previous declaration of ‘imx_init_l2cache’ was here arch/arm/mach-imx/system.c:101:123: warning: ‘imx_init_l2cache’ defined but not used [-Wunused-function] Signed-off-by: Vincent Stehlé Cc: Shawn Guo Cc: Sascha Hauer Cc: Russell King Cc: triv...@kernel.org --- Hi, Linux next-20130710 breaks compilation of ARM multi_v7_defconfig. This patch fixes it. (Note: this patch is necessary for the link, too: http://www.spinics.net/lists/kernel/msg1563777.html) Best regards, V. arch/arm/mach-imx/system.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/mach-imx/system.c b/arch/arm/mach-imx/system.c index e5592ca..64ff37e 100644 --- a/arch/arm/mach-imx/system.c +++ b/arch/arm/mach-imx/system.c @@ -98,7 +98,7 @@ void __init mxc_arch_reset_init_dt(void) } #ifdef CONFIG_CACHE_L2X0 -static void __init imx_init_l2cache(void) +void __init imx_init_l2cache(void) { void __iomem *l2x0_base; struct device_node *np; -- 1.7.10.4
Re: [RFC][PATCH 1/9] sched: Introduce power scheduler
On 10 July 2013 13:11, Morten Rasmussen wrote: > On Wed, Jul 10, 2013 at 03:10:15AM +0100, Arjan van de Ven wrote: >> On 7/9/2013 8:55 AM, Morten Rasmussen wrote: >> > + mod_delayed_work_on(schedule_cpu(), system_wq, &dwork, >> > + msecs_to_jiffies(INTERVAL)); >> >> so thinking about this more, this really really should not be a work queue. >> a work queue will cause a large number of context switches for no reason >> (on Intel and AMD you can switch P state from interrupt context, and I'm >> pretty sure >> that holds for many ARM as well) > > Agree. I should have made it clear this is only a temporary solution. I > would prefer to tie the power scheduler to the existing scheduler tick > instead so we don't wake up cpus unnecessarily. nohz may be able handle > that for us. Also, currently the power scheduler updates all cpus. > Going forward this would change to per cpu updates and partial updates > of the global view to improve scalability. For the packing tasks patches, we are using the periodic load balance sequence to update the activity like it is done for the cpu_power. I have planned to update the packing patches to see how it can cooperate with Morten patches as it has similar needs. > >> >> and in addition, it causes some really nasty cases, especially around real >> time tasks. >> Your workqueue will schedule a kernel thread, which will run >> BEHIND real time tasks, and such real time task will then never be able to >> start running at a higher performance. >> >> (and with the delta between lowest and highest performance sometimes being >> 10x or more, >> the real time task will be running SLOW... quite possible longer than >> several milliseconds) >> >> and all for no good reason; a normal timer running in irq context would be >> much better for this kind of thing! 
[PATCH linux-next] pinctrl: fix pinconf_dbg_config_write return type
Have pinconf_dbg_config_write() return a ssize_t. This fixes the following compilation warning:

drivers/pinctrl/pinconf.c:617:2: warning: initialization from incompatible pointer type [enabled by default]
drivers/pinctrl/pinconf.c:617:2: warning: (near initialization for ‘pinconf_dbg_pinconfig_fops.write’) [enabled by default]

Signed-off-by: Vincent Stehlé
Cc: Linus Walleij
---
Hi,

This can be seen with e.g. next-20130916 with x86 allmodconfig.

Best regards,

V.

 drivers/pinctrl/pinconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/pinconf.c b/drivers/pinctrl/pinconf.c
index a138965..1664e78 100644
--- a/drivers/pinctrl/pinconf.c
+++ b/drivers/pinctrl/pinconf.c
@@ -490,7 +490,7 @@ exit:
  * are values that should match the pinctrl-maps
  * reflects the new config and is driver dependant
  */
-static int pinconf_dbg_config_write(struct file *file,
+static ssize_t pinconf_dbg_config_write(struct file *file,
 	const char __user *user_buf, size_t count, loff_t *ppos)
 {
 	struct pinctrl_maps *maps_node;
--
1.8.4.rc3
Re: [RFC][PATCH 14/10] sched, fair: Fix the group_capacity computation
On 28 August 2013 13:16, Peter Zijlstra wrote: > > Subject: sched, fair: Fix the group_capacity computation > From: Peter Zijlstra > Date: Wed Aug 28 12:40:38 CEST 2013 > > Do away with 'phantom' cores due to N*frac(smt_power) >= 1 by limiting > the capacity to the actual number of cores. > Peter, your patch also solves the 'phantom' big cores that can appear on HMP system because big cores have a cpu_power >= SCHED_POWER_SCALE in order to express a higher capacity than LITTLE cores. Acked-by Vincent Guittot Vincent > The assumption of 1 < smt_power < 2 is an actual requirement because > of what SMT is so this should work regardless of the SMT > implementation. > > It can still be defeated by creative use of cpu hotplug, but if you're > one of those freaks, you get to live with it. > > Signed-off-by: Peter Zijlstra > --- > kernel/sched/fair.c | 20 +--- > 1 file changed, 13 insertions(+), 7 deletions(-) > > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -4554,18 +4554,24 @@ static inline int sg_imbalanced(struct s > /* > * Compute the group capacity. > * > - * For now the capacity is simply the number of power units in the > group_power. > - * A power unit represents a full core. > - * > - * This has an issue where N*frac(smt_power) >= 1, in that case we'll see > extra > - * 'cores' that aren't actually there. > + * Avoid the issue where N*frac(smt_power) >= 1 creates 'phantom' cores by > + * first dividing out the smt factor and computing the actual number of cores > + * and limit power unit capacity with that. 
>  */
> static inline int sg_capacity(struct lb_env *env, struct sched_group *group)
> {
> +	unsigned int capacity, smt, cpus;
> +	unsigned int power, power_orig;
> +
> +	power = group->sgp->power;
> +	power_orig = group->sgp->power_orig;
> +	cpus = group->group_weight;
>
> -	unsigned int power = group->sgp->power;
> -	unsigned int capacity = DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE);
> +	/* smt := ceil(cpus / power), assumes: 1 < smt_power < 2 */
> +	smt = DIV_ROUND_UP(SCHED_POWER_SCALE * cpus, power_orig);
> +	capacity = cpus / smt; /* cores */
>
> +	capacity = min_t(capacity, DIV_ROUND_CLOSEST(power, SCHED_POWER_SCALE));
> 	if (!capacity)
> 		capacity = fix_small_capacity(env->sd, group);
[PATCH] gma500: define do_gma_backlight_set only when used
Make sure static function do_gma_backlight_set() is only defined when CONFIG_BACKLIGHT_CLASS_DEVICE is defined, as it is never called otherwise. This fixes the following warning: drivers/gpu/drm/gma500/backlight.c:29:13: warning: ‘do_gma_backlight_set’ defined but not used [-Wunused-function] While at it, remove some end of line spaces. Signed-off-by: Vincent Stehlé Cc: David Airlie --- Hi, This can be seen with mainline or linux-next with e.g. allmodconfig on x86. Best regards, V. drivers/gpu/drm/gma500/backlight.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/gma500/backlight.c b/drivers/gpu/drm/gma500/backlight.c index 143eba3..399731e 100644 --- a/drivers/gpu/drm/gma500/backlight.c +++ b/drivers/gpu/drm/gma500/backlight.c @@ -26,13 +26,13 @@ #include "intel_bios.h" #include "power.h" +#ifdef CONFIG_BACKLIGHT_CLASS_DEVICE static void do_gma_backlight_set(struct drm_device *dev) { -#ifdef CONFIG_BACKLIGHT_CLASS_DEVICE struct drm_psb_private *dev_priv = dev->dev_private; backlight_update_status(dev_priv->backlight_device); -#endif } +#endif void gma_backlight_enable(struct drm_device *dev) { @@ -43,7 +43,7 @@ void gma_backlight_enable(struct drm_device *dev) dev_priv->backlight_device->props.brightness = dev_priv->backlight_level; do_gma_backlight_set(dev); } -#endif +#endif } void gma_backlight_disable(struct drm_device *dev) @@ -55,7 +55,7 @@ void gma_backlight_disable(struct drm_device *dev) dev_priv->backlight_device->props.brightness = 0; do_gma_backlight_set(dev); } -#endif +#endif } void gma_backlight_set(struct drm_device *dev, int v) @@ -67,7 +67,7 @@ void gma_backlight_set(struct drm_device *dev, int v) dev_priv->backlight_device->props.brightness = v; do_gma_backlight_set(dev); } -#endif +#endif } int gma_backlight_init(struct drm_device *dev) -- 1.8.4.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at 
http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] i2c-designware: define i2c_dw_pci_runtime_idle only with runtime pm
Make sure i2c_dw_pci_runtime_idle() is defined only when actually used, i.e. when CONFIG_PM_RUNTIME is defined. This fixes the following compilation warning:

drivers/i2c/busses/i2c-designware-pcidrv.c:188:12: warning: ‘i2c_dw_pci_runtime_idle’ defined but not used [-Wunused-function]

Signed-off-by: Vincent Stehlé
Cc: Wolfram Sang
---
 drivers/i2c/busses/i2c-designware-pcidrv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/i2c/busses/i2c-designware-pcidrv.c b/drivers/i2c/busses/i2c-designware-pcidrv.c
index f6ed06c..2b5d3a6 100644
--- a/drivers/i2c/busses/i2c-designware-pcidrv.c
+++ b/drivers/i2c/busses/i2c-designware-pcidrv.c
@@ -185,6 +185,7 @@ static int i2c_dw_pci_resume(struct device *dev)
 	return 0;
 }
 
+#ifdef CONFIG_PM_RUNTIME
 static int i2c_dw_pci_runtime_idle(struct device *dev)
 {
 	int err = pm_schedule_suspend(dev, 500);
@@ -194,6 +195,7 @@ static int i2c_dw_pci_runtime_idle(struct device *dev)
 		return 0;
 	return -EBUSY;
 }
+#endif
 
 static const struct dev_pm_ops i2c_dw_pm_ops = {
 	.resume = i2c_dw_pci_resume,
--
1.8.4.rc3
[PATCH linux-next] skd: fix some VPRINTK() specifiers for size_t
Use %zu for VPRINTK() as size_t specifier in replacement of %u. This fixes 7 compilation warnings on x86_64 like the following: drivers/block/skd_main.c:4628:42: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 6 has type ‘long unsigned int’ [-Wformat=] While at it, remove one cast to unsigned long for a size_t VPRINTK() argument and specify it as %zu, too. Signed-off-by: Vincent Stehlé Cc: Andrew Morton --- Hi, This can be seen on e.g. linux next-20130927. Best regards, V. drivers/block/skd_main.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c index 3110f68..ee7f7a8 100644 --- a/drivers/block/skd_main.c +++ b/drivers/block/skd_main.c @@ -4556,11 +4556,10 @@ static int skd_cons_skmsg(struct skd_device *skdev) int rc = 0; u32 i; - VPRINTK(skdev, "skmsg_table kzalloc, struct %u, count %u total %lu\n", + VPRINTK(skdev, "skmsg_table kzalloc, struct %zu, count %u total %zu\n", sizeof(struct skd_fitmsg_context), skdev->num_fitmsg_context, - (unsigned long) sizeof(struct skd_fitmsg_context) * - skdev->num_fitmsg_context); + sizeof(struct skd_fitmsg_context) * skdev->num_fitmsg_context); skdev->skmsg_table = kzalloc(sizeof(struct skd_fitmsg_context) *skdev->num_fitmsg_context, GFP_KERNEL); @@ -4611,7 +4610,7 @@ static int skd_cons_skreq(struct skd_device *skdev) int rc = 0; u32 i; - VPRINTK(skdev, "skreq_table kzalloc, struct %u, count %u total %u\n", + VPRINTK(skdev, "skreq_table kzalloc, struct %zu, count %u total %zu\n", sizeof(struct skd_request_context), skdev->num_req_context, sizeof(struct skd_request_context) * skdev->num_req_context); @@ -4623,7 +4622,7 @@ static int skd_cons_skreq(struct skd_device *skdev) goto err_out; } - VPRINTK(skdev, "alloc sg_table sg_per_req %u scatlist %u total %u\n", + VPRINTK(skdev, "alloc sg_table sg_per_req %u scatlist %zu total %zu\n", skdev->sgs_per_request, sizeof(struct scatterlist), skdev->sgs_per_request * sizeof(struct 
scatterlist)); @@ -4668,7 +4667,7 @@ static int skd_cons_skspcl(struct skd_device *skdev) int rc = 0; u32 i, nbytes; - VPRINTK(skdev, "skspcl_table kzalloc, struct %u, count %u total %u\n", + VPRINTK(skdev, "skspcl_table kzalloc, struct %zu, count %u total %zu\n", sizeof(struct skd_special_context), skdev->n_special, sizeof(struct skd_special_context) * skdev->n_special); -- 1.8.4.rc3
Re: [ipv4:PATCH] Allow userspace to specify primary or secondary ip on interface
Yes, I found I can use the 'ip route replace' command to change the 'src' address as a workaround. Julian also responded in another thread that he could come up with a patch to sort IPs by scope and primary/secondary preferences: https://lkml.org/lkml/2013/9/27/482

Vincent

On Sun, Sep 29, 2013 at 2:59 PM, David Miller wrote:
> From: Vincent Li
> Date: Tue, 24 Sep 2013 14:09:48 -0700
>
>> the reason for this patch is that we have a multi blade cluster platform
>> sharing 'floating management ip' and also that each blade has its own
>> management ip on the management interface, so whichever blade in the
>> cluster becomes primary blade, the 'floating management ip' follows it,
>> and we want any of our traffic originated from the primary blade to source
>> from the 'floating management ip' for consistency. but in this case, since
>> the local blade management ip is always the primary ip on the management
>> interface and 'floating management ip' is always secondary, the kernel
>> always chooses the primary ip as source ip address. thus we would like to
>> add the flexibility in kernel to allow us to specify which ip should be
>> primary or secondary.
>
> You have the flexibility already.
>
> You can specify a specific source address to use in routes.
[PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
the current behavior is when an IP is added to an interface, the primary or secondary attributes is depending on the order of ip added to the interface the first IP will be primary and second, third,... or alias IP will be secondary if the IP subnet matches this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary' (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify an IP address to be primary or secondary. the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip' and also that each blade has its own management ip on the management interface, so whichever blade in the cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic originated from the primary blade source from the 'floating management ip' for consistency. but in this case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary. 
Signed-off-by: Vincent Li --- net/ipv4/devinet.c |9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index a1b5bcb..bfc702a 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -440,9 +440,11 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, return 0; } - ifa->ifa_flags &= ~IFA_F_SECONDARY; last_primary = &in_dev->ifa_list; + if((*last_primary) == NULL) + ifa->ifa_flags &= ~IFA_F_SECONDARY; + for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL; ifap = &ifa1->ifa_next) { if (!(ifa1->ifa_flags & IFA_F_SECONDARY) && @@ -458,7 +460,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, inet_free_ifa(ifa); return -EINVAL; } - ifa->ifa_flags |= IFA_F_SECONDARY; +if (!(ifa->ifa_flags & IFA_F_SECONDARY)) +ifa1->ifa_flags |= IFA_F_SECONDARY; +else +ifa->ifa_flags |= IFA_F_SECONDARY; } } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
Ok, I will resend the patch with your suggestions. Vincent On Tue, Sep 24, 2013 at 12:28 PM, David Miller wrote: > From: Vincent Li > Date: Tue, 24 Sep 2013 11:11:21 -0700 > >> the current behavior is when an IP is added to an interface, the primary >> or secondary attributes is depending on the order of ip added to the >> interface >> the first IP will be primary and second, third,... or alias IP will be >> secondary >> if the IP subnet matches >> >> this patch add the flexiblity to allow user to specify an argument 'primary' >> or 'secondary' >> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for >> example) to specify >> an IP address to be primary or secondary. >> >> the reason for this patch is that we have a multi blade cluster platform >> sharing 'floating management ip' >> and also that each blade has its own management ip on the management >> interface, so whichever blade in the >> cluster becomes primary blade, the 'floating mangaement ip' follows it, and >> we want any of our traffic >> originated from the primary blade source from the 'floating management ip' >> for consistency. but in this >> case, since the local blade management ip is always the primary ip on the >> mangaement interface and 'floating >> management ip' is always secondary, kernel always choose the primary ip as >> source ip address. thus we would >> like to add the flexibility in kernel to allow us to specify which ip to be >> primary or secondary. >> >> Signed-off-by: Vincent Li > > When submitting a patch, please: > > 1) Specify an appropriate prefix for your subject line, indicating the >subsystem. "ipv4: " might be appropriate here. > > 2) Format your commit message so that lines do not exceed 80 columns. >People will read using ASCII text based tools in 80 column >terminals. 
[ipv4:PATCH] Allow userspace to specify primary or secondary ip on interface
Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface. the current behavior is when an IP is added to an interface, the primary or secondary attributes is depending on the order of ip added to the interface the first IP will be primary and second, third...or alias IP will be secondary if the IP subnet matches. this patch add the flexiblity to allow user to specify an argument 'primary' or 'secondary' (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for example) to specify an IP address to be primary or secondary. the reason for this patch is that we have a multi blade cluster platform sharing 'floating management ip' and also that each blade has its own management ip on the management interface, so whichever blade in the cluster becomes primary blade, the 'floating mangaement ip' follows it, and we want any of our traffic originated from the primary blade source from the 'floating management ip' for consistency. but in this case, since the local blade management ip is always the primary ip on the mangaement interface and 'floating management ip' is always secondary, kernel always choose the primary ip as source ip address. thus we would like to add the flexibility in kernel to allow us to specify which ip to be primary or secondary. 
Signed-off-by: Vincent Li --- net/ipv4/devinet.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index a1b5bcb..5a7764e 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -440,8 +440,9 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, return 0; } - ifa->ifa_flags &= ~IFA_F_SECONDARY; last_primary = &in_dev->ifa_list; + if(*last_primary == NULL) + ifa->ifa_flags &= ~IFA_F_SECONDARY; for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL; ifap = &ifa1->ifa_next) { @@ -458,7 +459,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh, inet_free_ifa(ifa); return -EINVAL; } - ifa->ifa_flags |= IFA_F_SECONDARY; + if (!(ifa->ifa_flags & IFA_F_SECONDARY)) + ifa1->ifa_flags |= IFA_F_SECONDARY; + else + ifa->ifa_flags |= IFA_F_SECONDARY; } } -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
Thanks Julian for the comments, I imagined it would not be so simple as it changed old behavior with ip binary and some actions in __inet_del_ifa() that I am not fully aware of. my intention is to preserve the old behavior and extend the flexibility, I am unable to come up with a good patch to achieve the intended behavior. I had to patch the ip binary to sort of preserve original ip binary behavior with the kernel patch I provided., the ip command patch below: diff --git a/ip/ipaddress.c b/ip/ipaddress.c index 1c3e4da..9f2802c 100644 --- a/ip/ipaddress.c +++ b/ip/ipaddress.c @@ -1259,6 +1259,7 @@ static int ipaddr_modify(int cmd, int flags, int argc, char **argv) req.n.nlmsg_flags = NLM_F_REQUEST | flags; req.n.nlmsg_type = cmd; req.ifa.ifa_family = preferred_family; + req.ifa.ifa_flags |= IFA_F_SECONDARY; while (argc > 0) { if (strcmp(*argv, "peer") == 0 || @@ -1307,6 +1308,11 @@ static int ipaddr_modify(int cmd, int flags, int argc, char **argv) invarg("invalid scope value.", *argv); req.ifa.ifa_scope = scope; scoped = 1; +} else if (strcmp(*argv, "secondary") == 0 || + strcmp(*argv, "temporary") == 0) { +req.ifa.ifa_flags |= IFA_F_SECONDARY; +} else if (strcmp(*argv, "primary") == 0) { +req.ifa.ifa_flags &= ~IFA_F_SECONDARY; } else if (strcmp(*argv, "dev") == 0) { NEXT_ARG(); d = *argv; if someone can point me to the right patch directions or coming up with better patches, it is very much appreciated. On Tue, Sep 24, 2013 at 2:13 PM, Julian Anastasov wrote: > > Hello, > > On Tue, 24 Sep 2013, Vincent Li wrote: > >> the current behavior is when an IP is added to an interface, the primary >> or secondary attributes is depending on the order of ip added to the >> interface >> the first IP will be primary and second, third,... 
or alias IP will be >> secondary >> if the IP subnet matches >> >> this patch add the flexiblity to allow user to specify an argument 'primary' >> or 'secondary' >> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for >> example) to specify >> an IP address to be primary or secondary. >> >> the reason for this patch is that we have a multi blade cluster platform >> sharing 'floating management ip' >> and also that each blade has its own management ip on the management >> interface, so whichever blade in the >> cluster becomes primary blade, the 'floating mangaement ip' follows it, and >> we want any of our traffic >> originated from the primary blade source from the 'floating management ip' >> for consistency. but in this >> case, since the local blade management ip is always the primary ip on the >> mangaement interface and 'floating >> management ip' is always secondary, kernel always choose the primary ip as >> source ip address. thus we would >> like to add the flexibility in kernel to allow us to specify which ip to be >> primary or secondary. 
>> >> Signed-off-by: Vincent Li >> --- >> net/ipv4/devinet.c |9 +++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c >> index a1b5bcb..bfc702a 100644 >> --- a/net/ipv4/devinet.c >> +++ b/net/ipv4/devinet.c >> @@ -440,9 +440,11 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, >> struct nlmsghdr *nlh, >> return 0; >> } >> >> - ifa->ifa_flags &= ~IFA_F_SECONDARY; >> last_primary = &in_dev->ifa_list; >> >> + if((*last_primary) == NULL) >> + ifa->ifa_flags &= ~IFA_F_SECONDARY; >> + >> for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL; >>ifap = &ifa1->ifa_next) { >> if (!(ifa1->ifa_flags & IFA_F_SECONDARY) && >> @@ -458,7 +460,10 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, >> struct nlmsghdr *nlh, >> inet_free_ifa(ifa); >> return -EINVAL; >> } >> - ifa->ifa_flags |= IFA_F_SECONDARY; > > There is some confusion here, when ifa has > IFA_F_SECONDARY bit set, in the 'else' we set it again. > I guess the 'else' part is not needed. > >> +
Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
sorry Julian to miss your point after reading the __inet_del_ifa and see the rtmsg_ifa, fib_del_ifaddr/fib_add_ifaddr, I can try another patch and actually test if the patches changes works as it is intended, not just checking from ip binary output. Vincent On Tue, Sep 24, 2013 at 2:34 PM, Vincent Li wrote: > Thanks Julian for the comments, I imagined it would not be so simple > as it changed old behavior with ip binary and some actions in > __inet_del_ifa() that I am not fully aware of. my intention is to > preserve the old behavior and extend the flexibility, I am unable to > come up with a good patch to achieve the intended behavior. > > I had to patch the ip binary to sort of preserve original ip binary > behavior with the kernel patch I provided., the ip command patch > below: > > diff --git a/ip/ipaddress.c b/ip/ipaddress.c > index 1c3e4da..9f2802c 100644 > --- a/ip/ipaddress.c > +++ b/ip/ipaddress.c > @@ -1259,6 +1259,7 @@ static int ipaddr_modify(int cmd, int flags, int > argc, char **argv) > req.n.nlmsg_flags = NLM_F_REQUEST | flags; > req.n.nlmsg_type = cmd; > req.ifa.ifa_family = preferred_family; > + req.ifa.ifa_flags |= IFA_F_SECONDARY; > > while (argc > 0) { > if (strcmp(*argv, "peer") == 0 || > @@ -1307,6 +1308,11 @@ static int ipaddr_modify(int cmd, int flags, > int argc, char **argv) > invarg("invalid scope value.", *argv); > req.ifa.ifa_scope = scope; > scoped = 1; > +} else if (strcmp(*argv, "secondary") == 0 || > + strcmp(*argv, "temporary") == 0) { > +req.ifa.ifa_flags |= IFA_F_SECONDARY; > +} else if (strcmp(*argv, "primary") == 0) { > +req.ifa.ifa_flags &= ~IFA_F_SECONDARY; > } else if (strcmp(*argv, "dev") == 0) { > NEXT_ARG(); > d = *argv; > > if someone can point me to the right patch directions or coming up > with better patches, it is very much appreciated. 
> > > On Tue, Sep 24, 2013 at 2:13 PM, Julian Anastasov wrote: >> >> Hello, >> >> On Tue, 24 Sep 2013, Vincent Li wrote: >> >>> the current behavior is when an IP is added to an interface, the primary >>> or secondary attributes is depending on the order of ip added to the >>> interface >>> the first IP will be primary and second, third,... or alias IP will be >>> secondary >>> if the IP subnet matches >>> >>> this patch add the flexiblity to allow user to specify an argument >>> 'primary' or 'secondary' >>> (use 'ip addr add ip/mask primary|secondary dev ethX ' from iproute2 for >>> example) to specify >>> an IP address to be primary or secondary. >>> >>> the reason for this patch is that we have a multi blade cluster platform >>> sharing 'floating management ip' >>> and also that each blade has its own management ip on the management >>> interface, so whichever blade in the >>> cluster becomes primary blade, the 'floating mangaement ip' follows it, and >>> we want any of our traffic >>> originated from the primary blade source from the 'floating management ip' >>> for consistency. but in this >>> case, since the local blade management ip is always the primary ip on the >>> mangaement interface and 'floating >>> management ip' is always secondary, kernel always choose the primary ip as >>> source ip address. thus we would >>> like to add the flexibility in kernel to allow us to specify which ip to be >>> primary or secondary. 
>>> >>> Signed-off-by: Vincent Li >>> --- >>> net/ipv4/devinet.c |9 +++-- >>> 1 file changed, 7 insertions(+), 2 deletions(-) >>> >>> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c >>> index a1b5bcb..bfc702a 100644 >>> --- a/net/ipv4/devinet.c >>> +++ b/net/ipv4/devinet.c >>> @@ -440,9 +440,11 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, >>> struct nlmsghdr *nlh, >>> return 0; >>> } >>> >>> - ifa->ifa_flags &= ~IFA_F_SECONDARY; >>> last_primary = &in_dev->ifa_list; >>> >>> + if((*last_primary) == NULL) >>> + ifa->ifa_flags &= ~IFA_F_SECONDARY; >>> + >>> fo
Re: [PATCH] Allow userspace code to use flag IFA_F_SECONDARY to specify an ip address to be primary or secondary ip on an interface
I think it is good idea to add these preferences flags and sorted them, but my code knowledge is limited to implement it as I am still learning, I can help testing :) On Wed, Sep 25, 2013 at 12:08 AM, Julian Anastasov wrote: > > Hello, > > On Tue, 24 Sep 2013, Vincent Li wrote: > >> Thanks Julian for the comments, I imagined it would not be so simple >> as it changed old behavior with ip binary and some actions in >> __inet_del_ifa() that I am not fully aware of. my intention is to >> preserve the old behavior and extend the flexibility, I am unable to >> come up with a good patch to achieve the intended behavior. > > ... > >> if someone can point me to the right patch directions or coming up >> with better patches, it is very much appreciated. > > My first idea was to use NLM_F_APPEND to implement > 'ip addr prepend' and 'ip addr append' but the default > operation is 'append' without providing NLM_F_APPEND, so it > does not work. > > Another idea is to add new attribute IFA_PREFERENCE in > include/uapi/linux/if_addr.h just before __IFA_MAX, integer, > 3 of the values are known. A preference for the used scope. > > /* Add as last, default */ > IFA_PREFERENCE_APPEND = 0, > > /* Add as last primary, before any present primary in subnet */ > IFA_PREFERENCE_PRIMARY = 128, > > /* First for scope */ > IFA_PREFERENCE_FIRST = 255, > > We should keep it in ifa as priority, for > sorting purposes. It can be 4-byte value, if user wants > to copy user-defined order into preference. > > Sorting order should be: > > - all primaries sorted by decreasing scope, decreasing > priority and adding order > > - then all secondaries (IFA_F_SECONDARY) sorted by decreasing > priority and adding order > > Usage: > > ip addr add ... pref[erence] type_or_priority > > # Add floating IP (append at priority 128) > # The primary mode is not guaranteed if another address from > # the same subnet is already using the same or higher priority. > ip addr add ... 
pref primary > # More preferred primary > ip addr add ... pref 129 > > # Add first IP for scope > ip addr add ... pref first > > The scope has similar 'sorting' property but not > for IPs in same subnet and it would be difficult to use > it for global routes. > > Thoughts? > > Regards > > -- > Julian Anastasov
[RFC 0/6] sched: packing small tasks
Hi,

This patch set takes advantage of the new statistics that are going to be available in the kernel thanks to per-entity load tracking: http://thread.gmane.org/gmane.linux.kernel/1348522. It packs small tasks onto as few CPUs/clusters/cores as possible. The main goal of packing small tasks is to reduce power consumption by minimizing the number of power domains in use.

The packing is done in 2 steps:

The 1st step looks for the best place to pack tasks on a system according to its topology, and defines a pack buddy CPU for each CPU if one is available. The policy for setting a pack buddy CPU is that we pack at all levels where the power line is not shared by groups of CPUs. To describe this capability, a new flag, SD_SHARE_POWERLINE, has been introduced; it describes where the CPUs of a scheduling domain share their power rails. This flag has been set in all sched_domains in order to keep the default behaviour of the scheduler unchanged.

In the 2nd step, the scheduler checks the load level of the task which wakes up and how busy the buddy CPU is. It can then decide to migrate the task onto the buddy.

The patch set has been tested on ARM platforms: a quad CA-9 SMP and a TC2 HMP (dual CA-15 and 3x CA-7 clusters). For the ARM platforms, the results have demonstrated that it is worth packing small tasks at all topology levels.

The performance tests have been done on both platforms with sysbench. The results don't show any performance regression. This is in line with the policy, which keeps the normal behavior for heavy use cases.

test: sysbench --test=cpu --num-threads=N --max-requests=R run

Results below are the average duration of 3 tests on the quad CA-9.
default is the current scheduler behavior (pack buddy CPU is -1); pack is
the scheduler with the pack mechanism.

              | default | pack    |
-----------------------------------
 N=8;  R=200  | 3.1999  | 3.1921  |
 N=8;  R=2000 | 31.4939 | 31.4844 |
 N=12; R=200  | 3.2043  | 3.2084  |
 N=12; R=2000 | 31.4897 | 31.4831 |
 N=16; R=200  | 3.1774  | 3.1824  |
 N=16; R=2000 | 31.4899 | 31.4897 |
-----------------------------------

The power consumption tests have been done only on the TC2 platform, which
has accessible power lines, and I have used cyclictest to simulate small
tasks. The tests show some power consumption improvement.

test: cyclictest -t 8 -q -e 100 -D 20 & cyclictest -t 8 -q -e 100 -D 20

The measurements have been taken over 16 seconds and the results have been
normalized to 100.

        | CA15 | CA7 | total |
------------------------------
default | 100  | 40  | 140   |
pack    | <1   | 45  | <46   |
------------------------------

The A15 cluster is less power efficient than the A7 cluster, but if we
assume that the tasks are well spread across both clusters, we can estimate
that the power consumption on a dual cluster of CA7 would have been, for a
default kernel:

        | CA7 | CA7 | total |
-----------------------------
default | 40  | 40  | 80    |
-----------------------------

Vincent Guittot (6):
  Revert "sched: introduce temporary FAIR_GROUP_SCHED dependency for
    load-tracking"
  sched: add a new SD SHARE_POWERLINE flag for sched_domain
  sched: pack small task at wakeup
  sched: secure access to other CPU statistics
  sched: pack the idle load balance
  ARM: sched: clear SD_SHARE_POWERLINE

 arch/arm/kernel/topology.c       |    5 ++
 arch/ia64/include/asm/topology.h |    1 +
 arch/tile/include/asm/topology.h |    1 +
 include/linux/sched.h            |    9 +--
 include/linux/topology.h         |    3 +
 kernel/sched/core.c              |   13 ++--
 kernel/sched/fair.c              |  155 +++---
 kernel/sched/sched.h             |   10 +--
 8 files changed, 165 insertions(+), 32 deletions(-)

--
1.7.9.5
Re: [BISECTED] snd-hda-intel audio distortion in Linus' current tree
[Cc: alsa-de...@alsa-project.org; also, please cc: me explicitly as well,
since I'm not subscribed to either list]

On Wed, Sep 26, 2012 at 12:29 AM, Steven Noonan wrote:
> Started having audio problems when trying out the latest tree
> (v3.6-rc7-10-g56d27ad). When playing any kind of audio, there was
> significant distortion, mostly crackling noise. I'm using a Lenovo
> ThinkPad X230 (Panther Point).
>
> I did a git-bisect to locate the problem, and it seems this commit is to
> blame:
>
> c20c5a841cbe47f5b7812b57bd25397497e5fbc0 is the first bad commit
> commit c20c5a841cbe47f5b7812b57bd25397497e5fbc0
> Author: Seth Heasley
> Date:   Thu Jun 14 14:23:53 2012 -0700
>
>     ALSA: hda_intel: activate COMBO mode for Intel client chipsets
>
>     This patch activates the COMBO position_fix for recent Intel
>     client chipsets. COMBO mode is the recommended setting for Intel
>     chipsets and eliminates HD audio warnings in dmesg. This patch has
>     been tested on Lynx Point, Panther Point, and Cougar Point.
>
>     Signed-off-by: Seth Heasley
>     Signed-off-by: Takashi Iwai
>
> It's pretty clear-cut. If I revert this patch, my sound starts
> functioning normally again.
>
> Any thoughts on how to proceed here? Can someone revert this, or is
> there some testing that I can do?
>
> Here's a pretty-printed bisection log, if needed:
>
> # good: [28a33cbc] Linux 3.5
> # bad:  [b13bc8dd] Merge tag 'staging-3.6-rc1' of git://git.kernel.or
> # good: [3c4cfade] Merge git://git.kernel.org/pub/scm/linux/kernel/gi
> # bad:  [9fc37779] Merge tag 'usb-3.6-rc1' of git://git.kernel.org/pu
> # bad:  [f14121ab] Merge tag 'dt-for-3.6' of git://sources.calxeda.co
> # good: [d14b7a41] Merge branch 'for-linus' of git://git.kernel.org/p
> # good: [15d47763] Merge branch 'for-3.5' into for-3.6
> # bad:  [dbf7b591] Merge tag 'sound-3.6' of git://git.kernel.org/pub/
> # bad:  [1c76684d] ALSA: hda - add Haswell HDMI codec id
> # bad:  [8b8d654b] ALSA: hda - Move one-time init codes from generic_
> # good: [80c8bfbe] ALSA: HDA: Create phantom jacks for fixed inputs a
> # bad:  [ceaa86ba] ALSA: hda - Remove invalid init verbs for Nvidia 2
> # bad:  [4b6ace9e] ALSA: hda - Add the support for VIA HDMI pin detec
> # bad:  [c20c5a84] ALSA: hda_intel: activate COMBO mode for Intel cli
>
> - Steven

I can confirm that I've hit this bug as well, and that it's still present
in stable 3.6.0. Strangely enough, it only seems to affect VLC for me;
when playing audio through mplayer or any gstreamer-based players
(Rhythmbox, Totem, etc.), I don't encounter any audio distortion. Possibly
also related to [1]?

A workaround (other than reverting this commit) is to not use COMBO mode,
i.e. to load snd-hda-intel with position_fix=2.

Please let me know if any more information is needed.
$ lspci -vvnn | grep -A8 Audio
00:1b.0 Audio device [0403]: Intel Corporation 7 Series/C210 Series
Chipset Family High Definition Audio Controller [8086:1e20] (rev 04)
	Subsystem: Toshiba America Info Systems Device [1179:fb30]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
	ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
	Kernel driver in use: snd_hda_intel

Machine: Toshiba Satellite P850
Distro: Debian wheezy/sid
ALSA 1.0.25; PulseAudio 2.0

Regards,
Vincent

[1] http://mailman.alsa-project.org/pipermail/alsa-devel/2012-September/055161.html
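[Editorial aside: the position_fix=2 workaround mentioned above can be made
persistent through a modprobe options file. The path below is the usual
convention rather than something from the original mail, and value 2
selects the DMA position-buffer mode of the driver's position_fix
parameter, if I read the module parameters right.]

```shell
# /etc/modprobe.d/snd-hda-intel.conf
# position_fix=2 forces the DMA position buffer instead of COMBO mode
options snd-hda-intel position_fix=2
```

After adding the file, reload the snd-hda-intel module (or simply reboot)
so the option takes effect.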
[PATCH] mm: show migration types in show_mem
This is useful to diagnose the reason for a page allocation failure in
cases where there appear to be several free pages.

Example, with this alloc_pages(GFP_ATOMIC) failure:

 swapper/0: page allocation failure: order:0, mode:0x0
 ...
 Mem-info:
 Normal per-cpu:
 CPU0: hi: 90, btch: 15 usd: 48
 CPU1: hi: 90, btch: 15 usd: 21
 active_anon:0 inactive_anon:0 isolated_anon:0
  active_file:0 inactive_file:84 isolated_file:0
  unevictable:0 dirty:0 writeback:0 unstable:0
  free:4026 slab_reclaimable:75 slab_unreclaimable:484
  mapped:0 shmem:0 pagetables:0 bounce:0
 Normal free:16104kB min:2296kB low:2868kB high:3444kB active_anon:0kB
  inactive_anon:0kB active_file:0kB inactive_file:336kB unevictable:0kB
  isolated(anon):0kB isolated(file):0kB present:331776kB mlocked:0kB
  dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:300kB
  slab_unreclaimable:1936kB kernel_stack:328kB pagetables:0kB unstable:0kB
  bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
 lowmem_reserve[]: 0 0

Before the patch, it's hard (for me, at least) to say why all these free
chunks weren't considered for allocation:

 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB
 1*1024kB 1*2048kB 3*4096kB = 16128kB

After the patch, it's obvious that the reason is that all of these are in
the MIGRATE_CMA (C) freelist:

 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB (C)
 1*1024kB (C) 1*2048kB (C) 3*4096kB (C) = 16128kB

Signed-off-by: Rabin Vincent
---
 mm/page_alloc.c | 42 --
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c13ea75..cbe5373 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2818,6 +2818,31 @@ out:

 #define K(x) ((x) << (PAGE_SHIFT-10))

+static void show_migration_types(unsigned char type)
+{
+	static const char types[MIGRATE_TYPES] = {
+		[MIGRATE_UNMOVABLE]	= 'U',
+		[MIGRATE_RECLAIMABLE]	= 'E',
+		[MIGRATE_MOVABLE]	= 'M',
+		[MIGRATE_RESERVE]	= 'R',
+#ifdef CONFIG_CMA
+		[MIGRATE_CMA]		= 'C',
+#endif
+		[MIGRATE_ISOLATE]	= 'I',
+	};
+	char tmp[MIGRATE_TYPES + 1];
+	char *p = tmp;
+	int i;
+
+	for (i = 0; i < MIGRATE_TYPES; i++) {
+		if (type & (1 << i))
+			*p++ = types[i];
+	}
+
+	*p = '\0';
+	printk("(%s) ", tmp);
+}
+
 /*
  * Show free area list (used inside shift_scroll-lock stuff)
  * We also calculate the percentage fragmentation. We do this by counting the
@@ -2942,6 +2967,7 @@ void show_free_areas(unsigned int filter)

 	for_each_populated_zone(zone) {
 		unsigned long nr[MAX_ORDER], flags, order, total = 0;
+		unsigned char types[MAX_ORDER];

 		if (skip_free_areas_node(filter, zone_to_nid(zone)))
 			continue;
@@ -2950,12 +2976,24 @@ void show_free_areas(unsigned int filter)
 		spin_lock_irqsave(&zone->lock, flags);
 		for (order = 0; order < MAX_ORDER; order++) {
-			nr[order] = zone->free_area[order].nr_free;
+			struct free_area *area = &zone->free_area[order];
+			int type;
+
+			nr[order] = area->nr_free;
 			total += nr[order] << order;
+
+			types[order] = 0;
+			for (type = 0; type < MIGRATE_TYPES; type++) {
+				if (!list_empty(&area->free_list[type]))
+					types[order] |= 1 << type;
+			}
 		}
 		spin_unlock_irqrestore(&zone->lock, flags);
-		for (order = 0; order < MAX_ORDER; order++)
+		for (order = 0; order < MAX_ORDER; order++) {
 			printk("%lu*%lukB ", nr[order], K(1UL) << order);
+			if (nr[order])
+				show_migration_types(types[order]);
+		}
 		printk("= %lukB\n", K(total));
 	}

--
1.7.11.3
CMA and zone watermarks
It appears that when CMA is enabled, the zone watermarks are not properly
respected, leading to, for example, GFP_NOWAIT allocations getting access
to the high pools.

I ran the following test code, which simply allocates pages with
GFP_NOWAIT until it fails and then tries GFP_ATOMIC. Without CMA, the
GFP_ATOMIC allocation succeeds; with CMA, it fails too.

Logs attached (includes my patch which prints the migration type in the
failure message, http://marc.info/?l=linux-mm&m=134971041701306&w=2),
taken on a 3.6 kernel.

Thanks.

diff --git a/arch/arm/mach-ux500/board-mop500.c b/arch/arm/mach-ux500/board-mop500.c
index a534d88..b98d0df 100644
--- a/arch/arm/mach-ux500/board-mop500.c
+++ b/arch/arm/mach-ux500/board-mop500.c
@@ -854,3 +854,25 @@ DT_MACHINE_START(U8500_DT, "ST-Ericsson U8500 platform (Device Tree Support)")
 	.dt_compat = u8500_dt_board_compat,
 MACHINE_END
 #endif
+
+static int __init late(void)
+{
+	while (1) {
+		void *p;
+
+		p = alloc_page(GFP_NOWAIT);
+		if (!p) {
+			pr_err("GFP_NOWAIT failed, checking GFP_ATOMIC");
+
+			p = alloc_page(GFP_ATOMIC);
+			if (!p)
+				panic("GFP_ATOMIC failed too, fail!");
+
+			panic("GFP_ATOMIC OK, all good\n");
+		}
+	}
+
+	return 0;
+}
+late_initcall(late);

[attachment: cmalog.txt.gz — gzip compressed data]
Re: CMA and zone watermarks
Hi Marek, Minchan,

2012/10/9 Marek Szyprowski :
> Could you run your test with the latest linux-next kernel? There have
> been some patches merged into the akpm tree which should fix the
> accounting of free and free CMA pages. I hope that fixes this issue.

I've tested with the mentioned patches (which seem to have also reached
Linus' tree today) and they appear to resolve the problem.

Thanks.
[PATCH] drm/omap: fix allocation size for page addresses array
Signed-off-by: Rob Clark
Signed-off-by: Vincent Penquerc'h
---
 drivers/staging/omapdrm/omap_gem.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/omapdrm/omap_gem.c b/drivers/staging/omapdrm/omap_gem.c
index c828743..4c1472c 100644
--- a/drivers/staging/omapdrm/omap_gem.c
+++ b/drivers/staging/omapdrm/omap_gem.c
@@ -246,7 +246,7 @@ static int omap_gem_attach_pages(struct drm_gem_object *obj)
 	 * DSS, GPU, etc. are not cache coherent:
 	 */
 	if (omap_obj->flags & (OMAP_BO_WC|OMAP_BO_UNCACHED)) {
-		addrs = kmalloc(npages * sizeof(addrs), GFP_KERNEL);
+		addrs = kmalloc(npages * sizeof(*addrs), GFP_KERNEL);
 		if (!addrs) {
 			ret = -ENOMEM;
 			goto free_pages;
@@ -257,7 +257,7 @@ static int omap_gem_attach_pages(struct drm_gem_object *obj)
 			0, PAGE_SIZE, DMA_BIDIRECTIONAL);
 		}
 	} else {
-		addrs = kzalloc(npages * sizeof(addrs), GFP_KERNEL);
+		addrs = kzalloc(npages * sizeof(*addrs), GFP_KERNEL);
 		if (!addrs) {
 			ret = -ENOMEM;
 			goto free_pages;

--
1.7.9.5
[PATCH] x86, fpu: avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost. After resuming from S3,
if we try to lazy restore the FPU for a process running on the same CPU,
this will result in a corrupted FPU context.

We can just invalidate the "fpu_owner_task", so nobody will try to lazy
restore a state which no longer exists in the hardware.

Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a few
hundred cycles by a SIGFPE.

The issue seems to exist since 3.4 (after the FPU lazy restore was
actually implemented). To apply the change to 3.4, "this_cpu_write" needs
to be replaced by percpu_write.

Cc: Duncan Laurie
Cc: Olof Johansson
Cc: [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin
---
 arch/x86/kernel/smpboot.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..7610c58 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -1230,6 +1232,9 @@ int native_cpu_disable(void)
 	clear_local_APIC();
 	cpu_disable_common();
+
+	/* the FPU context will be lost, nobody owns it */
+	this_cpu_write(fpu_owner_task, NULL);
 	return 0;
 }

--
1.7.7.3
issue with x86 FPU state after suspend to ram
Hi,

On a 4-core Ivybridge platform, when doing a lot of
suspend-to-ram/resume cycles, we were observing processes randomly killed
by a SIGFPE. When dumping the FPU register state on the SIGFPE (usually a
floating point stack underflow/overflow on a floating point arithmetic
operation), the FPU registers look empty, or at least corrupted, which
was more or less impossible given the disassembled floating point code.

After doing more tracing, in the faulty case the process seems to keep
FPU ownership across a secondary CPU unplug/re-plug triggered by the
suspend. It then does a lazy restore of its FPU context (i.e. it just
uses the current FPU hardware registers, since it is the owner) instead
of writing them back to the hardware from the version previously saved in
the task context, despite the fact that the whole FPU hardware state has
been lost.

Just invalidating the "fpu_owner_task" when disabling a secondary CPU
seems to solve my issue (it's already reset for the primary CPU).

By the way, when the FPU lazy restore patch was discussed back in
February, Ingo commented (in
http://permalink.gmane.org/gmane.linux.kernel/1255423):

" I guess the CPU hotplug case deserves a comment in the code: CPU
hotplug + replug of the same (but meanwhile reset) CPU is safe because
fpu_owner_task[cpu] gets reset to NULL. "

That contradicts my observation, so maybe I have totally overlooked
something in this mechanism. Can you comment?

I'm putting my patch proposal in this thread. The issue seems to exist
since 3.4, after the FPU lazy restore was actually implemented by commit
7e16838d "i387: support lazy restore of FPU state". But it is mainly
visible on 3.4 and 3.6, since on tip of tree it is hidden by the eager
FPU implementation for platforms with xsave support; it still happens
with eagerfpu=off.

To apply this change to 3.4, "this_cpu_write" needs to be replaced by
percpu_write.
--
Vincent
[PATCH v2] x86, fpu: avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost. After resuming from S3,
if we try to lazy restore the FPU for a process running on the same CPU,
this will result in a corrupted FPU context.

Ensure that "fpu_owner_task" is properly invalidated when
(re-)initializing a CPU, so nobody will try to lazy restore a state which
doesn't exist in the hardware.

Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a few
hundred cycles by a SIGFPE.

The issue seems to exist since 3.4 (after the FPU lazy restore was
actually implemented). To apply the change to 3.4, "this_cpu_write" needs
to be replaced by percpu_write.

Cc: Duncan Laurie
Cc: Olof Johansson
Cc: [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin
---
 arch/x86/include/asm/fpu-internal.h |   15 +--
 arch/x86/kernel/smpboot.c           |    5 +
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 831dbb9..41ab26e 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -399,14 +399,17 @@ static inline void drop_init_fpu(struct task_struct *tsk)
 typedef struct { int preload; } fpu_switch_t;

 /*
- * FIXME! We could do a totally lazy restore, but we need to
- * add a per-cpu "this was the task that last touched the FPU
- * on this CPU" variable, and the task needs to have a "I last
- * touched the FPU on this CPU" and check them.
+ * Must be run with preemption disabled: this clears the fpu_owner_task,
+ * on this CPU.
  *
- * We don't do that yet, so "fpu_lazy_restore()" always returns
- * false, but some day..
+ * This will disable any lazy FPU state restore of the current FPU state,
+ * but if the current thread owns the FPU, it will still be saved by.
  */
+static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+{
+	per_cpu(fpu_owner_task, cpu) = NULL;
+}
+
 static inline int fpu_lazy_restore(struct task_struct *new, unsigned int cpu)
 {
 	return new == this_cpu_read_stable(fpu_owner_task) &&
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..f3e2ec8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -818,6 +820,9 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)

 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;

+	/* the FPU context is blank, nobody can own it */
+	__cpu_disable_lazy_restore(cpu);
+
 	err = do_boot_cpu(apicid, cpu, tidle);
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);

--
1.7.7.3
[PATCH v3] x86, fpu: avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost. After resuming from S3,
if we try to lazy restore the FPU for a process running on the same CPU,
this will result in a corrupted FPU context.

Ensure that "fpu_owner_task" is properly invalidated when
(re-)initializing a CPU, so nobody will try to lazy restore a state which
doesn't exist in the hardware.

Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a few
hundred cycles by a SIGFPE.

Cc: Duncan Laurie
Cc: Olof Johansson
Cc: [v3.4+] # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin
---

Hi,

The patch has been updated according to the comments from HPA and Linus.
I'm still re-running the testing on v3.

Changes in v3:
- remove the misleading comment about 3.4 from the description.

Changes in v2:
- add a helper function and a comment in fpu-internal.h as described by
  Linus.
- do the cleaning in the native_cpu_up function as suggested by HPA.

Vincent

 arch/x86/include/asm/fpu-internal.h |   15 +--
 arch/x86/kernel/smpboot.c           |    5 +
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 831dbb9..41ab26e 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -399,14 +399,17 @@ static inline void drop_init_fpu(struct task_struct *tsk)
 typedef struct { int preload; } fpu_switch_t;

 /*
- * FIXME! We could do a totally lazy restore, but we need to
- * add a per-cpu "this was the task that last touched the FPU
- * on this CPU" variable, and the task needs to have a "I last
- * touched the FPU on this CPU" and check them.
+ * Must be run with preemption disabled: this clears the fpu_owner_task,
+ * on this CPU.
  *
- * We don't do that yet, so "fpu_lazy_restore()" always returns
- * false, but some day..
+ * This will disable any lazy FPU state restore of the current FPU state,
+ * but if the current thread owns the FPU, it will still be saved by.
  */
+static inline void __cpu_disable_lazy_restore(unsigned int cpu)
+{
+	per_cpu(fpu_owner_task, cpu) = NULL;
+}
+
 static inline int fpu_lazy_restore(struct task_struct *new, unsigned int cpu)
 {
 	return new == this_cpu_read_stable(fpu_owner_task) &&
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c80a33b..f3e2ec8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -68,6 +68,8 @@
 #include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -818,6 +820,9 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)

 	per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;

+	/* the FPU context is blank, nobody can own it */
+	__cpu_disable_lazy_restore(cpu);
+
 	err = do_boot_cpu(apicid, cpu, tidle);
 	if (err) {
 		pr_debug("do_boot_cpu failed %d\n", err);

--
1.7.7.3
Re: [PATCH] x86, fpu: avoid FPU lazy restore after suspend
On Fri, Nov 30, 2012 at 11:55 AM, H. Peter Anvin wrote:
>
> On 11/30/2012 11:54 AM, Vincent Palatin wrote:
>>
>> I have done a patch v2 according to your suggestions.
>> I will run the testing on it now.
>> I probably need at least 2 to 3 hours to validate it.
>
> That would be super. Let me know and I'll queue it up and send a pull
> request with this and a few more urgent things to Linus.

I have done 1000+ cycles so far with patch v3 (on a 4-core Ivybridge with
no eagerfpu), and did not hit my issue. I'm letting the testing continue,
but with respect to the issue after suspend, this fixes it with very high
probability (i.e. I had never done that many cycles without hitting the
issue).

--
Vincent
[PATCH Resend 1/3] sched: fix nr_busy_cpus with coupled cpuidle
With the coupled cpuidle driver (but probably also with other drivers), a
CPU loops in a temporary safe state while waiting for the other CPUs of
its cluster to be ready to enter the coupled C-state. If an IRQ or a
softirq occurs, the CPU will stay in this internal loop if there is no
need to resched.

The SCHED softirq clears the NOHZ_IDLE flag and increases nr_busy_cpus.
If there is no need to resched, we will never call set_cpu_sd_state_idle,
because of this internal loop in a cpuidle state. We have to call
set_cpu_sd_state_idle in tick_nohz_irq_exit, which is used to handle such
a situation.

Signed-off-by: Vincent Guittot
---
 kernel/time/tick-sched.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 955d35b..b8d74ea 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -570,6 +570,8 @@ void tick_nohz_irq_exit(void)
 	if (!ts->inidle)
 		return;

+	set_cpu_sd_state_idle();
+
 	/* Cancel the timer because CPU already waken up from the C-states*/
 	menu_hrtimer_cancel();
 	__tick_nohz_idle_enter(ts);

--
1.7.9.5
[PATCH Resend 3/3] sched: fix update NOHZ_IDLE flag
The function nohz_kick_needed modifies the NOHZ_IDLE flag that is used to
update the nr_busy_cpus field of the sched_group. When the sched_domains
are updated (because of the unplug of a CPU, for example), a null_domain
is attached to the CPUs. We have to test likely(!on_null_domain(cpu))
first, in order to detect such an initialization step and not modify the
NOHZ_IDLE flag.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 24a5588..1ef57a8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6311,7 +6311,7 @@ void trigger_load_balance(struct rq *rq, int cpu)
 	    likely(!on_null_domain(cpu)))
 		raise_softirq(SCHED_SOFTIRQ);
 #ifdef CONFIG_NO_HZ
-	if (nohz_kick_needed(rq, cpu) && likely(!on_null_domain(cpu)))
+	if (likely(!on_null_domain(cpu)) && nohz_kick_needed(rq, cpu))
 		nohz_balancer_kick(cpu);
 #endif
 }

--
1.7.9.5
[PATCH Resend 2/3] sched: fix init NOHZ_IDLE flag
On my SMP platform, which is made of 5 cores in 2 clusters, the
nr_busy_cpus field of the sched_group_power struct is not null when the
platform is fully idle. The root cause seems to be:

During the boot sequence, some CPUs reach the idle loop and set their
NOHZ_IDLE flag while waiting for other CPUs to boot. But the nr_busy_cpus
field is initialized later, with the assumption that all CPUs are in the
busy state, whereas some CPUs have already set their NOHZ_IDLE flag.

We clear the NOHZ_IDLE flag when nr_busy_cpus is initialized, in order to
have a coherent configuration.

Signed-off-by: Vincent Guittot
---
 kernel/sched/core.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bae620a..77a01c8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5875,6 +5875,7 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)

 	update_group_power(sd, cpu);
 	atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
+	clear_bit(NOHZ_IDLE, nohz_flags(cpu));
 }

 int __weak arch_sd_sibling_asym_packing(void)

--
1.7.9.5