Re: fd code multithreaded race?
On Wed Aug 04 2010 at 13:21:07 +, Andrew Doran wrote: On Sat, Jul 31, 2010 at 08:31:19PM +0300, Antti Kantee wrote: Hi, I'm looking at a KASSERT which is triggering quite rarely for me (in terms of iterations): panic: kernel diagnostic assertion dt-dt_ff[i]-ff_refcnt == 0 failed: file /usr/allsrc/src/sys/rump/librump/rumpkern/../../../kern/kern_descrip.c, line 856 Upon closer examination, it seems that this can trigger while another thread is in fd_getfile() between upping the refcount, testing for ff_file, and fd_putfile(). Removing the KASSERT seems to restore correct You're right there, the KASSERT() is wrong, it should be removed. Thanks, I'll do that. operation, but I didn't read the code far enough to see where the race is actually handled and what stops the code from using the wrong file. FYI the fdfile_t (per-descriptor records) are stable for the lifetime of the process, what each record descibes can and does of course change, and how those records are pointed to does change (fdtab_t). There isn't really a concept of wrong file, as in, the app gets what it asked for. It is free to ask for the wrong thing, and it's free to ask for the right thing at the wrong time, etc - that's its problem. Unless you're alluding to another bug? Not really. I just started thinking about how applications can make sure they use the right file descriptor. It seems using close() to notify other threads of a file descriptor being closed is racy. So something naiive like this: t1: lock t1: get fd1 t1: unlock /* t1 wants to do a syscall with fd1 but is preempted */ t2: lock t2: close fd1 t2: unlock t3: lock t3: open, result fd1 t3: unlock t1: syscall fd1 ... will give you the wrong result. Essentially there is no interlock from the application lookup to the kernel backing object lookup. So I guess if you want things to work correctly, instead of close() you need to dup2() to a zombie/deadfs fd and wait for all threads to check in before you can close it. (i assume dup2 is atomic) Never realized file descriptors and threads were so tricky ;)
Re: fd code multithreaded race?
So something naiive like this: [...] will give you the wrong result. True. Essentially, you have a shared resource (the fd-number to thing table) which you are using without any locking. Of course there are races! Some threading setups allow different threads to have independent file descriptor tables, which would avoid the issue you sketch (but, of course, introduce other issues). Never realized file descriptors and threads were so tricky ;) There's nothing special about file descriptors here. You have basically the same issue with any other piece of state which is shared by all threads. To pick two more examples the kernel maintains, working directory and umask. These are a shared resource; like any shared resource, accessing them from multiple threads requires care, and usually locking of some sort. /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: weird ktrace timestamps?
There are two interesting things here: the timestamps in the file do not warp backwards by almost two seconds - but they do warp backwards, by 3353 ns. Try a different time counter. See kern.timecounter. kern.timecounter.choice = clockinterrupt(q=0, f=100 Hz) TSC(q=-100, f=1795636350 Hz) ACPI-Fast(q=1000, f=3579545 Hz) i8254(q=100, f=1193182 Hz) dummy(q=-100, f=100 Hz) kern.timecounter.hardware = ACPI-Fast kern.timecounter.timestepwarnings = 0 I switched to TSC and got the same syndrome (well, as far as kdump output goes; I didn't dig into ktrace.out). Switching to clockinterrupt made the syndrome go away, but all the timestamp deltas printed by kdump -R were zero (there probably were a very few that weren't, but I didn't see any of them in a quick eyeball skim). Is this UP or MP? MP. cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel Core 2 (Merom) (686-class), 1795.62 MHz, id 0x6fd cpu0: features bfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR cpu0: features bfebfbffPGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX cpu0: features bfebfbffFXSR,SSE,SSE2,SS,HTT,TM,SBF cpu0: features2 e3bdSSE3,MONITOR,DS-CPL,VMX,EST,TM2,xTPR cpu0: Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz cpu0: I-cache 32 KB 64B/line 8-way, D-cache 32 KB 64B/line 8-way cpu0: L2 cache 2 MB 64B/line 8-way cpu0: using thermal monitor 1 cpu0: Enhanced SpeedStep (1420 mV) 2000 MHz cpu0: unknown Enhanced SpeedStep CPU. cpu0: using only highest and lowest power states. cpu0: Enhanced SpeedStep frequencies available (MHz): 2000 1200 cpu0: calibrating local timer cpu0: apic clock running at 199 MHz cpu0: 64 page colors cpu1 at mainbus0: apid 1 (application processor) cpu1: starting cpu1: Intel Core 2 (Merom) (686-class), 1795.43 MHz, id 0x6fd cpu1: features bfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR cpu1: features bfebfbffPGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX cpu1: features bfebfbffFXSR,SSE,SSE2,SS,HTT,TM,SBF cpu1: features2 e3bdSSE3,MONITOR,DS-CPL,VMX,EST,TM2,xTPR cpu1: Intel(R) Core(TM)2 Duo CPU T7100 @ 1.80GHz cpu1: I-cache 32 KB 64B/line 8-way, D-cache 32 KB 64B/line 8-way cpu1: L2 cache 2 MB 64B/line 8-way cpu1: using thermal monitor 1 I did consider the possibility that MP was relevant, but it strikes me as highly unlikely that it would migrate between processors exactly there every time (and, apparently, _only_ there). Is it likely enough that it's something else tied to MPness that it's worth trying a test under a UP kernel? /~\ The ASCII Mouse \ / Ribbon Campaign X Against HTMLmo...@rodents-montreal.org / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Re: Module and device configuration locking [was Re: Modules loading modules?]
On Wed, 4 Aug 2010, Andrew Doran wrote: Sorry for not replying sooner. Please don't add a generic recursive lock facility, it would be abused. Yeah! My current patches have no new generic facilities at all. I'd like something roughly like the following. I think this should also cover major configuration tasks such as device attach and detach, so it wouldn't really be module specific. The naming is a poor suggestion. void sysconfig_lock(void) { if (mutex_owned(sysconfig_mutex)) { /* * At this point it's OK for the thread to hold other * locks, since the system configuration lock was the * first taken on the way in. */ sysconfig_recurse++; return; } /* * I don't remember the correct argument sequence to the * following, but basically we need to assert that NO locks, * sleep or spin, other than kernel_lock are held, * as the system configuration lock would be the outermost * lock in the system. */ LOCKDEBUG_BARRIER(...); mutex_enter(sysconfig_mutex); KASSERT(sysconfig_recurse == 0); } For the boot autoconfig path where you have things being driven by the kernel, this would work OK. Situations where there's a kthread or something attaching and detaching devices on the fly are a bit different, since they likely have local atomicity requirements (need to take device private locks). There you'd probably need the caller to take the lock. USB comes to mind (along with some hardware RAID drivers that do hotplug attach/detach). Thoughts? Well, at first glance, this proposal makes sense. But it is certainly a much larger task than dealing with the immediate problem - modules which want to load other modules. So, I really have three questions: 1. In keeping with earlier concerns about holding locks for long periods of time (eeh and mrg both commented on this), the approach described above would seem to have an even longer hold-time than the current module_lock. Would not something similar to haad's condvar approach be appropriate for this proposal, as well as for the more-limited-in- scope recursive module lock? 2. Would it be reasonable to solve the immediate problem as previously proposed, rather than waiting for this ultimate solution? I think it would be a long time before we could find and resolve all of the issues that might be created in the various threaded situations that may exist. I know I'm certainly not sufficiently qualified to identify all those situations, and I also know that I don't have sufficient available work-hours to do this in any reasonable time- frame. I'd still like to move forward with the most recent solution to the module_lock problem. 3. Since we're still technically abusing the mutex(9) man-page caveat about using mutex_owned() for making locking decisions, it would be very much appreciated (at least by myself) to have an explanation of WHY it is OK in this case to do it here, but not somewhere else. - | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com| | Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net | | Kernel Developer | | pgoyette at netbsd.org | -
Re: fd code multithreaded race?
On Wed, 4 Aug 2010 17:06:59 +0300 Antti Kantee po...@cs.hut.fi wrote: Never realized file descriptors and threads were so tricky ;) I'm just finishing writing multi-threaded logging subsystem for one of my projects. I think I faced a similar issue with threads and shared file descriptors. Threads can log data to the same file, or different files on the filesystem. They can log data to one file, or many different files simultaneously. I implemented a new type, which is basically a structure that contains mutex, file descriptor, file path string and reference counter. Any thread doing I/O on a particular descriptor has to lock its mutex. Since, logging data can remain in buffers, waiting to be flushed, threads increment descriptors' reference counters, when they flush buffers they decrement those counters. This ensures that descriptors are not freed from memory, until reference counter drops to 0. If the value of the descriptor == -1, thread holding the lock calls open() with the file path as the first argument. My code uses a hash table to lookup descriptors, however I don't use the value of an open file descriptor (which can change between multiple open()/close() calls), but the address of the structure that describes a particular file descriptor.