nanosleep - does it make sense with tv_sec < 0?
Hi Hackers,

I ran into an oddity with the POSIX spec that seems a bit unrealistic:

    [EINVAL] The rqtp argument specified a nanosecond value less than
    zero or greater than or equal to 1000 million.

It seems like this should also apply when seconds < 0. We currently pass this argument silently in kern/kern_time.c:kern_nanosleep:

    int
    kern_nanosleep(struct thread *td, struct timespec *rqt, struct timespec *rmt)
    {
            struct timespec ts, ts2, ts3;
            struct timeval tv;
            int error;

            if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000)
                    return (EINVAL);
            if (rqt->tv_sec < 0 || (rqt->tv_sec == 0 && rqt->tv_nsec == 0)) // <-- first clause here
                    return (0);

but I'm wondering whether it makes logical sense for us to do this (sleep for a negative amount of time?)... FWIW, Linux returns -1 and sets EINVAL in this case, which makes more sense to me.

Thanks,
-Garrett
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
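As a rough illustration of the behaviour being asked for, the sketch below mirrors the validation quoted in the message, but rejects negative seconds with EINVAL instead of returning success. The names check_sleep_request() and probe_negative_sleep() are hypothetical and are not part of the FreeBSD source; the probe simply reports what the running platform does with tv_sec = -1, and that result is platform dependent.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper (not the kernel code): the stricter check proposed
 * in the message. Returns 0 if the request is acceptable, else EINVAL.
 */
int
check_sleep_request(const struct timespec *rqt)
{
	assert(rqt != NULL);
	/* The range check POSIX already requires for nanoseconds. */
	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000L)
		return (EINVAL);
	/* The additional check under discussion: negative seconds. */
	if (rqt->tv_sec < 0)
		return (EINVAL);
	return (0);
}

/*
 * Ask the running system what nanosleep() does with tv_sec = -1.
 * Returns 0 if the call succeeded, otherwise the errno it set.
 */
int
probe_negative_sleep(void)
{
	struct timespec rqt = { .tv_sec = -1, .tv_nsec = 0 };

	if (nanosleep(&rqt, NULL) == -1)
		return (errno);
	return (0);
}
```

Calling probe_negative_sleep() on two systems side by side is a quick way to compare the two behaviours described above without reading either kernel.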
Deterministic failure to meet sysconf(_SC_TIMER_MAX) for CLOCK_REALTIME
Hi,

Running the test noted below [1], I always fail on the 29th iteration with EAGAIN:

    $ conformance/behavior/timers/1-1.run-test
    timer_create() did not return success for iteration 29: Resource temporarily unavailable
    $ conformance/behavior/timers/1-1.run-test
    timer_create() did not return success for iteration 29: Resource temporarily unavailable
    $ conformance/behavior/timers/1-1.run-test
    timer_create() did not return success for iteration 29: Resource temporarily unavailable
    $ conformance/behavior/timers/1-1.run-test
    timer_create() did not return success for iteration 29: Resource temporarily unavailable

Interestingly enough, sysconf(_SC_TIMER_MAX) returns 54; this is the requirement that the test is attempting to validate (that at least _SC_TIMER_MAX timers can be created via timer_create). The timers kernel code is capped to 25 by default, by a preprocessor define in .../sys/sysctl.h:

    /sys/sys/sysctl.h:#define CTL_P1003_1B_TIMER_MAX 25 /* int */

It doesn't make sense that an additional 4 timers were created. Oh, and the sysctl reports something else entirely:

    p1003_1b.timers: 200112
    p1003_1b.delaytimer_max: 2147483647
    p1003_1b.timer_max: 32

So, which number is the source of truth, and why don't they all match?

Thanks!
-Garrett

PS I'm still running a CURRENT kernel based off of r206173...

[1] http://ltp.git.sourceforge.net/git/gitweb.cgi?p=ltp/ltp-dev.git;a=blob;f=testcases/open_posix_testsuite/conformance/behavior/timers/1-1.c;h=ac043b0913e93f8db93cc74e249316f5ff82bdc8;hb=HEAD
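For comparing the numbers above, it can help to query the advertised limit from a small program rather than from the test suite. The following is only a sketch: query_timer_max() is a hypothetical helper, and it relies on the POSIX convention that sysconf() returning -1 with errno left unchanged means "no definite limit" rather than an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: fetch the limit the system advertises for
 * per-process timers.
 * Returns 1 and stores the limit in *limit if a finite limit exists,
 * 0 if the system reports no definite limit, and -1 on error.
 */
int
query_timer_max(long *limit)
{
	long v;

	assert(limit != NULL);
	errno = 0;
	v = sysconf(_SC_TIMER_MAX);
	if (v == -1)
		return (errno == 0 ? 0 : -1);
	*limit = v;
	return (1);
}
```

Printing the stored limit next to the p1003_1b.timer_max sysctl value makes any mismatch like the one reported here easy to see.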
(no subject)
I have a similar problem. I have an NFS server (8.0, upgraded a couple of times since Feb 2010) that locks up and requires a reboot. The clients are busy VMs on VMware ESXi that use the NFS server for vmdk virtual disk storage. ESXi reports the NFS server as inactive, and all the VMs post disk write errors when trying to write to their disks.

/etc/rc.d/nfsd restart fails to work (it cannot kill the nfsd process). The nfsd process runs at 100% CPU in the rc_lo state in top. A reboot is the only fix.

It has only happened under two circumstances:

1) Installation of a VM using Windows 2008.
2) Migrating 16 million mail messages from a physical server to a VM running FreeBSD with a ZFS file system, on the ESXi box that uses NFS to store the VM's ZFS disk.

The NFS server also uses ZFS.
coherence-problem on the mapped memory buffer
Hello hackers,

while working on the ringmap project I have run into a coherency problem in memory regions mapped from the kernel into user space.

Details: while integrating ringmap with the ixgbe driver, I made some changes to ixgbe:

1. The mbufs for received packets are allocated only once.
2. Allocated mbufs are reused one after the other, as in a ring buffer (no new mbufs are allocated again).
3. Packet buffers (mbuf->m_data) are mapped into user space, so the user-space process has access to the packets after their DMA transfer from the network adapter into RAM.

Problem: sometimes the user-space process sees not the newly DMAed data in the mapped packet buffer, but the OLD data that was previously stored in the same packet buffer. If I monitor the received data in the kernel, the kernel sees the data correctly. But sometimes it is the other way around: the user-space process sees the correct new data and the kernel sees the old data in the buffer.

It seems that the packet memory buffer is not synchronized with all CPUs' caches. Probably the [user|kernel] thread sometimes reads the old data from the cache of the CPU the thread is running on. (At the same time, the other thread sees the new data in the same mapped buffer.)

Can you please provide me with some information that would help avoid this unexpected coherence problem?

Alex

P.S. Details about hardware and used software:

1. /var/run/dmesg.boot:
   ...
   CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
     Origin = AuthenticAMD  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
     Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
     Features2=0x1<SSE3>
     AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
     AMD Features2=0x3<LAHF,CMP>
   real memory  = 3758030848 (3583 MB)
   avail memory = 3677495296 (3507 MB)
   ACPI APIC Table: <A M I OEMAPIC>
   FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
   FreeBSD/SMP: 4 package(s) x 2 core(s)
   ...
2. uname -v
   FreeBSD 9.0-CURRENT #3
3. sysctl kern.osreldate
   kern.osreldate: 900014
4. //depot/projects/soc2010/ringmap/
Re: coherence-problem on the mapped memory buffer
on 29/07/2010 17:13 Alexander Fiveg said the following:
> P.S. Details about hardware and used software:
>
> 1. /var/run/dmesg.boot:
>    ...
>    CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
>      Origin = AuthenticAMD  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
>      Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>      Features2=0x1<SSE3>
>      AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>      AMD Features2=0x3<LAHF,CMP>
>    real memory  = 3758030848 (3583 MB)
>    avail memory = 3677495296 (3507 MB)
>    ACPI APIC Table: <A M I OEMAPIC>
>    FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>    FreeBSD/SMP: 4 package(s) x 2 core(s)
>    ...
> 2. uname -v
>    FreeBSD 9.0-CURRENT #3
> 3. sysctl kern.osreldate
>    kern.osreldate: 900014
> 4. //depot/projects/soc2010/ringmap/

No help, but just curious - do you use the amd64 variant? If yes, can you reproduce the problem with i386?

--
Andriy Gapon
Re: coherence-problem on the mapped memory buffer
On Thursday 29 July 2010 18:13:23 Andriy Gapon wrote:
> on 29/07/2010 17:13 Alexander Fiveg said the following:
>> P.S. Details about hardware and used software:
>>
>> 1. /var/run/dmesg.boot:
>>    ...
>>    CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
>>      Origin = AuthenticAMD  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
>>      Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>>      Features2=0x1<SSE3>
>>      AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>>      AMD Features2=0x3<LAHF,CMP>
>>    real memory  = 3758030848 (3583 MB)
>>    avail memory = 3677495296 (3507 MB)
>>    ACPI APIC Table: <A M I OEMAPIC>
>>    FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>>    FreeBSD/SMP: 4 package(s) x 2 core(s)
>>    ...
>> 2. uname -v
>>    FreeBSD 9.0-CURRENT #3
>> 3. sysctl kern.osreldate
>>    kern.osreldate: 900014
>> 4. //depot/projects/soc2010/ringmap/
>
> No help, but just curious - do you use the amd64 variant? If yes, can you
> reproduce the problem with i386?

No, my kernel is i386, but I will try to test it with amd64.

Thanks
Alex
Re: coherence-problem on the mapped memory buffer
on 29/07/2010 19:13 Andriy Gapon said the following:
> on 29/07/2010 17:13 Alexander Fiveg said the following:
>> P.S. Details about hardware and used software:
>>
>> 1. /var/run/dmesg.boot:
>>    ...
>>    CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
>>      Origin = AuthenticAMD  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
>>      Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>>      Features2=0x1<SSE3>
>>      AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>>      AMD Features2=0x3<LAHF,CMP>
>>    real memory  = 3758030848 (3583 MB)
>>    avail memory = 3677495296 (3507 MB)
>>    ACPI APIC Table: <A M I OEMAPIC>
>>    FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>>    FreeBSD/SMP: 4 package(s) x 2 core(s)
>>    ...
>> 2. uname -v
>>    FreeBSD 9.0-CURRENT #3
>> 3. sysctl kern.osreldate
>>    kern.osreldate: 900014
>> 4. //depot/projects/soc2010/ringmap/

In fact I have a suspicion that the problem might have to do with multiple mappings of the shared pages, but I am far from sure...

Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the PAT; starting at the following words:

«The PAT allows any memory type to be specified in the page tables, and therefore it is possible to have a single physical page mapped to two or more different linear addresses, each with different memory types. Intel does not support this practice...»

--
Andriy Gapon
Re: coherence-problem on the mapped memory buffer
on 29/07/2010 19:45 Alexander Fiveg said the following:
> On Thursday 29 July 2010 18:13:23 Andriy Gapon wrote:
>> on 29/07/2010 17:13 Alexander Fiveg said the following:
>>> P.S. Details about hardware and used software:
>>>
>>> 1. /var/run/dmesg.boot:
>>>    ...
>>>    CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
>>>      Origin = AuthenticAMD  Id = 0x20f10  Family = f  Model = 21  Stepping = 0
>>>      Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>>>      Features2=0x1<SSE3>
>>>      AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>>>      AMD Features2=0x3<LAHF,CMP>
>>>    real memory  = 3758030848 (3583 MB)
>>>    avail memory = 3677495296 (3507 MB)
>>>    ACPI APIC Table: <A M I OEMAPIC>
>>>    FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>>>    FreeBSD/SMP: 4 package(s) x 2 core(s)
>>>    ...
>>> 2. uname -v
>>>    FreeBSD 9.0-CURRENT #3
>>> 3. sysctl kern.osreldate
>>>    kern.osreldate: 900014
>>> 4. //depot/projects/soc2010/ringmap/
>>
>> No help, but just curious - do you use the amd64 variant? If yes, can you
>> reproduce the problem with i386?
>
> No, my kernel is i386, but I will try to test it with amd64.

Oh, never mind, actually.

--
Andriy Gapon
Re: Re: coherence-problem on the mapped memory buffer
Jul 29, 2010 12:58:07 PM, a...@icyb.net.ua wrote:
> on 29/07/2010 19:13 Andriy Gapon said the following:
>> on 29/07/2010 17:13 Alexander Fiveg said the following:
>
> In fact I have a suspicion that the problem might have to do with multiple
> mappings of the shared pages, but far from sure...
>
> Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual
> Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the
> PAT; starting at the following words:
>
> «The PAT allows any memory type to be specified in the page tables, and
> therefore it is possible to have a single physical page mapped to two or more
> different linear addresses, each with different memory types. Intel does not
> support this practice...»

My guess would be that the memory type is not marked as DMA-capable. AFAIK the Intel CPUs do hardware snooping on physical addresses, so they have no coherency issues between themselves. However, if a DMA writer changes the memory, I think this does not normally get propagated to the front-side bus, and the CPUs would not see it. You may need to either explicitly flush the CPU cache before accessing these pages or mark them as non-cacheable.

-SB
Re: coherence-problem on the mapped memory buffer
on 29/07/2010 23:02 Sergey Babkin said the following:
> Jul 29, 2010 12:58:07 PM, a...@icyb.net.ua wrote:
>> on 29/07/2010 19:13 Andriy Gapon said the following:
>>> on 29/07/2010 17:13 Alexander Fiveg said the following:
>>
>> In fact I have a suspicion that the problem might have to do with multiple
>> mappings of the shared pages, but far from sure...
>>
>> Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual
>> Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming
>> the PAT; starting at the following words:
>>
>> «The PAT allows any memory type to be specified in the page tables, and
>> therefore it is possible to have a single physical page mapped to two or
>> more different linear addresses, each with different memory types. Intel
>> does not support this practice...»
>
> My guess would be that the memory type is not marked as DMA-capable. AFAIK
> the Intel CPUs do hardware snooping on physical addresses, so they have no
> coherency issues between themselves. However, if a DMA writer changes the
> memory, I think this does not normally get propagated to the front-side bus,
> and the CPUs would not see it. You may need to either explicitly flush the
> CPU cache before accessing these pages or mark them as non-cacheable.

My guess was approximately the same - if one mapping is done in the kernel for DMA purposes, then the memory type is, most likely, set to uncached. But the userland mapping of the same pages most likely marks the same pages (via different virtual addresses) as cached. Depending on the hardware and on which mappings were used on a particular CPU (core) to access that memory, there could be differences in the interaction with DMA.

--
Andriy Gapon
Improvement for Distributed Audit Project
I am Sergio Ligregni, from Mexico. I am currently working on the Distributed Audit Project for GSoC 2010, and I want to ask for your help with the following.

HELP NEEDED:

/*++*/

- Which code should I base my development on for reading parameters from a file? (I have searched audit.c, auditd_fbsd.c and auditd.c but did not find a function that does this; maybe I missed something.) Currently I have files like:

    /var/audit /var2/audit 1000 yes 53686

and I read the parameters with sscanf. The right way (the one for which I want to know which code to take as a baseline) would be:

    dir:/var/audit /var2/audit
    time: 1000
    slave_dir: yes
    port: 53686

without using sscanf (avoiding that function is a security concern raised by my mentor). I think I can write an algorithm to implement that, but maybe there is a better or safer way that keeps to the existing conventions.

/*++*/

Currently I have this function to check whether a file is a trail, given its name. It is very poor and needs to be improved; any ideas?

    /*
     * When exploring /var/audit/ (or the directory where the trails are), not
     * all files are trails so we must ensure we will only deal with the ones
     * that are trails.
     */
    static int
    is_audit_trail(char *path)
    {
            /*
             * We have these possibilities, only the first one is allowed
             * 20100619223115.20100619223131
             * 20100619223131.not_terminated
             * current
             */
            if (strlen(path) == 29 && path[14] == '.' && isdigit(path[15])) {
                    /* XXX To improve this checking later */
                    return 1;
            }
            return 0;
    }

/*++*/

By the way, the Wiki and the Perforce repository for this project are:

http://wiki.freebsd.org/SOC2010SergioLigregni
http://p4db.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/projects/soc2010/disaudit&HIDEDEL=NO

Thanks!

--
Sergio Andrés Ligregni Arredondo
Student, Computer Systems Engineering, ITQ.
Is UNIX Hot Enough for You? | FreeBSD
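One possible direction for both questions, offered only as a sketch and not as the project's actual code: check every character of the trail name instead of just one, and parse "key: value" lines with strchr() and strtol() rather than sscanf. The helper names all_digits(), split_key_value() and parse_port() are hypothetical, and the 14-digit timestamp width is taken from the example names in the message.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TS_LEN 14	/* YYYYMMDDhhmmss, as in the example trail names */

/* Return 1 if the first TS_LEN characters of s are decimal digits. */
static int
all_digits(const char *s)
{
	int i;

	for (i = 0; i < TS_LEN; i++)
		if (!isdigit((unsigned char)s[i]))
			return (0);
	return (1);
}

/* Accept only terminated trails: <14 digits>.<14 digits>. */
int
is_audit_trail(const char *name)
{
	assert(name != NULL);
	/* The length check makes the fixed-offset reads below safe. */
	if (strlen(name) != 2 * TS_LEN + 1)
		return (0);
	return (all_digits(name) && name[TS_LEN] == '.' &&
	    all_digits(name + TS_LEN + 1));
}

/*
 * Split a "key: value" line in place. On success returns 0 and points
 * *key and *val into the buffer; returns -1 if there is no colon.
 */
int
split_key_value(char *line, char **key, char **val)
{
	char *p = strchr(line, ':');

	if (p == NULL)
		return (-1);
	*p++ = '\0';				/* terminate the key */
	while (*p == ' ' || *p == '\t')		/* skip blanks before value */
		p++;
	p[strcspn(p, "\r\n")] = '\0';		/* drop the line ending */
	*key = line;
	*val = p;
	return (0);
}

/* Parse a decimal port in 1..65535; returns 0 on success, -1 otherwise. */
int
parse_port(const char *s, int *port)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0' || v < 1 || v > 65535)
		return (-1);
	*port = (int)v;
	return (0);
}
```

With this check, "20100619223131.not_terminated" is rejected even though it is also 29 characters long, and a value such as "53686x" is refused rather than silently truncated.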
Re: svn commit: r210561 - projects/sv/sys/net
On Wed, Jul 28, 2010 at 03:10:31PM +0000, Attilio Rao wrote:
> Log:
>   Initial import of the netdump files.
>   They still need a lot of polishing and cleanup so they might not be
>   considered definitive at all.

This code is a port to recent FreeBSD of Darrell Anderson's network crashdump support, which was done in the 4.x days. I can't find a current website with the original versions, but archive.org has a cache of course:

http://web.archive.org/web/20041204223729/http://www.cs.duke.edu/~anderson/freebsd/netdump/

Quoting from the old readme:

    Netdump provides FreeBSD kernel crash dumping over the network.
    Netdump is a FreeBSD kernel module client and user-level server.

    A normal kernel crash writes a raw dump of memory to a dedicated
    partition (usually the swap partition) using a low-level disk
    routine, and then copies that raw dump into a file (via savecore)
    during the following boot process. Netdump replaces the standard
    dump routine. During a crash, a netdump client broadcasts to locate
    a netdump server, then sends the dump as UDP/IP packets (with
    retransmission after loss). The netdump server creates a dump file
    suitable for gdb. If netdump fails (for example, no netdump server
    is located), a normal disk dump is performed.

There is cleanup work to be done still, but we plan to have this in shape for 9.0.

-Ed
Re: coherence-problem on the mapped memory buffer
On Thursday 29 July 2010 22:16:24 Andriy Gapon wrote:
> on 29/07/2010 23:02 Sergey Babkin said the following:
>> Jul 29, 2010 12:58:07 PM, a...@icyb.net.ua wrote:
>>> on 29/07/2010 19:13 Andriy Gapon said the following:
>>>> on 29/07/2010 17:13 Alexander Fiveg said the following:
>>>
>>> In fact I have a suspicion that the problem might have to do with multiple
>>> mappings of the shared pages, but far from sure...
>>>
>>> Take a look at Intel® 64 and IA-32 Architectures Software Developer’s
>>> Manual Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4
>>> Programming the PAT; starting at the following words:
>>>
>>> «The PAT allows any memory type to be specified in the page tables, and
>>> therefore it is possible to have a single physical page mapped to two or
>>> more different linear addresses, each with different memory types. Intel
>>> does not support this practice...»
>>
>> My guess would be that the memory type is not marked as DMA-capable. AFAIK
>> the Intel CPUs do hardware snooping on physical addresses, so they have no
>> coherency issues between themselves. However, if a DMA writer changes the
>> memory, I think this does not normally get propagated to the front-side
>> bus, and the CPUs would not see it. You may need to either explicitly flush
>> the CPU cache before accessing these pages or mark them as non-cacheable.
>
> My guess was approximately the same - if one mapping is done in the kernel
> for DMA purposes, then the memory type is, most likely, set to uncached. But
> the userland mapping of the same pages most likely marks the same pages (via
> different virtual addresses) as cached. Depending on the hardware and on
> which mappings were used on a particular CPU (core) to access that memory,
> there could be differences in the interaction with DMA.

Thanks a lot for your answers. But I am afraid I do not have enough experience to solve these tasks. Could you please give me some helpful information on how to:

- get access to the pages associated with a certain memory buffer? I mean, I want to get the structures that describe the page properties I should change (for instance, in order to make the page non-cacheable).

If you are aware of any good papers, or of examples in the system code where these topics are covered, I would appreciate it if you gave me the references.

Alex
Re: coherence-problem on the mapped memory buffer
on 30/07/2010 00:41 Alexander Fiveg said the following:
> Thanks a lot for your answers. But I am afraid I do not have enough
> experience to solve these tasks. Could you please give me some helpful
> information on how to:
>
> - get access to the pages associated with a certain memory buffer? I mean, I
>   want to get the structures that describe the page properties I should
>   change (for instance, in order to make the page non-cacheable).
>
> If you are aware of any good papers, or of examples in the system code where
> these topics are covered, I would appreciate it if you gave me the
> references.

I don't have a recipe, but here are some pointers to get you started:

1. Investigate BUS_DMA_NOCACHE; see bus_dma(9).
2. Check sys/dev/sound/pci/hda/hdac.c for HDAC_F_DMA_NOCACHE and the comment about PCIe snoop - this might be relevant.
3. See pmap_change_attr for a way to change the caching type of a memory mapping.
4. Hope that more knowledgeable people (experts) provide their advice; keep nudging them via the mailing list(s) :-)

--
Andriy Gapon
sched_pin() versus PCPU_GET
We've seen a few instances at work where witness_warn() in ast() indicates that the sched lock is still held, but the place where it claims the lock was acquired is sometimes one where the lock cannot possibly still be held, like:

    thread_lock(td);
    td->td_flags &= ~TDF_SELECT;
    thread_unlock(td);

What I was wondering is, even though the assembly I see in objdump -S for witness_warn has the increment of td_pinned before the PCPU_GET:

    802db210:  65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
    802db217:  00 00
    802db219:  ff 83 04 01 00 00       incl   0x104(%rbx)
             * Pin the thread in order to avoid problems with thread migration.
             * Once that all verifies are passed about spinlocks ownership,
             * the thread is in a safe path and it can be unpinned.
             */
            sched_pin();
            lock_list = PCPU_GET(spinlocks);
    802db21f:  65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
    802db226:  00 00
            if (lock_list != NULL && lock_list->ll_count != 0) {
    802db228:  48 85 c0                test   %rax,%rax
             * Pin the thread in order to avoid problems with thread migration.
             * Once that all verifies are passed about spinlocks ownership,
             * the thread is in a safe path and it can be unpinned.
             */
            sched_pin();
            lock_list = PCPU_GET(spinlocks);
    802db22b:  48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
    802db232:  48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
            if (lock_list != NULL && lock_list->ll_count != 0) {
    802db239:  0f 84 ff 00 00 00       je     802db33e <witness_warn+0x30e>
    802db23f:  44 8b 60 50             mov    0x50(%rax),%r12d

is it possible for the hardware to do any re-ordering here?

The reason I'm suspicious is not just that the code doesn't have a lock leak at the indicated point, but that in one instance I can see in the dump that the lock_list local from witness_warn is from the pcpu structure for CPU 0 (and I was warned about sched lock 0), while the thread id in panic_cpu is 2. So clearly the thread was being migrated right around panic time.

This is the amd64 kernel on stable/7. I'm not sure exactly what kind of hardware; it's a 4-way Intel chip from about 3 or 4 years ago, IIRC.

So... do we need some kind of barrier in the code for sched_pin() for it to really do what it claims? Could the hardware have re-ordered the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin() increment?

Thanks,
matthew
Re: sched_pin() versus PCPU_GET
On Thu, Jul 29, 2010 at 4:39 PM, m...@freebsd.org wrote:
> We've seen a few instances at work where witness_warn() in ast()
> indicates the sched lock is still held, but the place it claims it was
> held by is in fact sometimes not possible to keep the lock, like:
>
>     thread_lock(td);
>     td->td_flags &= ~TDF_SELECT;
>     thread_unlock(td);
>
> What I was wondering is, even though the assembly I see in objdump -S
> for witness_warn has the increment of td_pinned before the PCPU_GET:
>
>     802db210:  65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
>     802db217:  00 00
>     802db219:  ff 83 04 01 00 00       incl   0x104(%rbx)
>              * Pin the thread in order to avoid problems with thread migration.
>              * Once that all verifies are passed about spinlocks ownership,
>              * the thread is in a safe path and it can be unpinned.
>              */
>             sched_pin();
>             lock_list = PCPU_GET(spinlocks);
>     802db21f:  65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
>     802db226:  00 00
>             if (lock_list != NULL && lock_list->ll_count != 0) {
>     802db228:  48 85 c0                test   %rax,%rax
>              * Pin the thread in order to avoid problems with thread migration.
>              * Once that all verifies are passed about spinlocks ownership,
>              * the thread is in a safe path and it can be unpinned.
>              */
>             sched_pin();
>             lock_list = PCPU_GET(spinlocks);
>     802db22b:  48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
>     802db232:  48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
>             if (lock_list != NULL && lock_list->ll_count != 0) {
>     802db239:  0f 84 ff 00 00 00       je     802db33e <witness_warn+0x30e>
>     802db23f:  44 8b 60 50             mov    0x50(%rax),%r12d
>
> is it possible for the hardware to do any re-ordering here?
>
> The reason I'm suspicious is not just that the code doesn't have a
> lock leak at the indicated point, but in one instance I can see in the
> dump that the lock_list local from witness_warn is from the pcpu
> structure for CPU 0 (and I was warned about sched lock 0), but the
> thread id in panic_cpu is 2. So clearly the thread was being migrated
> right around panic time.
>
> This is the amd64 kernel on stable/7. I'm not sure exactly what kind
> of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
>
> So... do we need some kind of barrier in the code for sched_pin() for
> it to really do what it claims? Could the hardware have re-ordered
> the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin()
> increment?

So after some research, the answer I'm getting is "maybe". What I'm concerned about is whether the h/w reordered the read of PCPU_GET in front of the previous store to increment td_pinned. While not an ultimate authority,

http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems

implies that stores can be reordered after loads for both Intel and amd64 chips, which I believe would account for the behavior seen here.

Thanks,
matthew
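To make the ordering question concrete, here is a user-space analogy in C11 atomics; it is not kernel code and does not settle whether sched_pin() needs a barrier. The variables pinned and percpu_value are hypothetical stand-ins for td_pinned and the per-CPU field, and the functions are made up for illustration. The sketch shows where a full fence would sit if one wanted to forbid a later load from being observed ahead of an earlier store.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for td_pinned and a per-CPU field. */
static atomic_int pinned;
static atomic_int percpu_value;

/* Set the value that the later load is expected to observe. */
void
set_percpu(int v)
{
	atomic_store_explicit(&percpu_value, v, memory_order_relaxed);
}

/*
 * Increment the pin count with a plain (non-locked) load and store,
 * then read the per-CPU value. The fence in between asks the compiler
 * and the CPU not to let the load pass the preceding store.
 */
int
pin_then_read(void)
{
	int n = atomic_load_explicit(&pinned, memory_order_relaxed);

	atomic_store_explicit(&pinned, n + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* store-load barrier */
	return (atomic_load_explicit(&percpu_value, memory_order_relaxed));
}

/* Drop one pin; unpinning an unpinned thread would be a caller bug. */
void
unpin(void)
{
	int n = atomic_load_explicit(&pinned, memory_order_relaxed);

	assert(n > 0);
	atomic_store_explicit(&pinned, n - 1, memory_order_relaxed);
}

/* Current pin depth. */
int
pin_count(void)
{
	return (atomic_load_explicit(&pinned, memory_order_relaxed));
}
```

Without the fence, the C11 memory model permits the relaxed load to be ordered before the relaxed store as seen by other threads; whether that matters for a thread observing its own pin count during migration is exactly the open question in this message.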