Re: Safely writing to the same file in parallel foreach loop
On Wed, Nov 14, 2012 at 04:56:53PM +0100, Joseph Rushton Wakeling wrote:
> Suppose that I've got a foreach loop in which I write output to a file:
>
> auto f = File("test.txt", "w");
> f.close(); // to start with a blank file
> foreach(i; iota(0, 100))
> {
>     f = File("test.txt", "a");
>     f.writeln(i);
>     f.close();
> }
>
> I'm guessing it is at least potentially unsafe to parallelize the
> loop without also considering the file interactions:
>
> foreach(i; parallel(iota(0, 100), 20))
> {
>     f = File("test.txt", "a"); // What happens if 2 threads want to
>     f.writeln(i);              // open this file at the same time?
>     f.close();
> }
>
> ... so, is there a way that I can ensure that the file appending
> takes place successfully but also safely in each thread? Let's
> assume that I don't care about the order of writing, only that it
> takes place.

If you're on Posix, you can use file locks to ensure atomic writes to
the file (all threads have to use it though: it's only an advisory
lock, not a mandatory lock): see the manpage for fcntl, look for
F_GETLK.

T

--
It always amuses me that Windows has a Safe Mode during bootup. Does
that mean that Windows is normally unsafe?
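[A rough sketch of the fcntl suggestion in D, assuming the usual declarations in core.sys.posix.fcntl (struct flock, F_SETLKW, etc.) -- untested, and with one important caveat: fcntl record locks are owned by the *process*, so they only exclude other processes; threads within the same program still need a mutex or synchronized block as well.]

```d
// Hypothetical helper: append one line under an advisory POSIX write lock.
// Assumes core.sys.posix.fcntl bindings; Posix-only.
import std.stdio;
import core.stdc.stdio : SEEK_SET;
import core.sys.posix.fcntl;

void appendLocked(string path, int i)
{
    auto f = File(path, "a");

    flock lk;
    lk.l_type   = F_WRLCK;   // exclusive (write) lock
    lk.l_whence = SEEK_SET;
    lk.l_start  = 0;
    lk.l_len    = 0;         // 0 == lock to end of file

    fcntl(f.fileno, F_SETLKW, &lk);  // block until the lock is granted
    f.writeln(i);
    f.flush();                        // make sure the data is out before unlocking

    lk.l_type = F_UNLCK;
    fcntl(f.fileno, F_SETLK, &lk);   // release the lock
    f.close();
}
```

Because of the per-process ownership, this alone would not serialize the threads of a single parallel foreach; it is mainly useful when several *programs* append to the same file.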
Re: Safely writing to the same file in parallel foreach loop
On 11/14/2012 05:16 PM, H. S. Teoh wrote:
> If you're on Posix, you can use file locks to ensure atomic writes to
> the file (all threads have to use it though: it's only an advisory
> lock, not a mandatory lock): see the manpage for fcntl, look for
> F_GETLK.

I take it there's no more "native-to-D" way of implementing a file
lock? :-(

I was browsing through the library descriptions of Mutex and
ReadWriteMutex, but it's not clear how they'd apply to this case
(parallelism is really something I have very limited experience of).

I'm actually inclining towards an alternate solution, where the
different threads send back results to the master thread, which
integrates them and writes out everything itself.
Re: Safely writing to the same file in parallel foreach loop
On Wednesday, 14 November 2012 at 16:43:37 UTC, Joseph Rushton Wakeling wrote:
> On 11/14/2012 05:16 PM, H. S. Teoh wrote:
>
> I take it there's no more "native-to-D" way of implementing a file
> lock? :-(

Could you put the file access in a synchronized block?

http://dlang.org/statement.html#SynchronizedStatement

- Vijay
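[A minimal sketch of this suggestion, putting the original parallel loop's file append inside a synchronized statement so only one thread touches the file at a time:]

```d
// Sketch: the original parallel loop, with the file access serialized.
import std.stdio;
import std.range : iota;
import std.parallelism : parallel;

void main()
{
    auto f = File("test.txt", "w");  // to start with a blank file
    f.close();

    foreach (i; parallel(iota(0, 100), 20))
    {
        synchronized  // one implicit mutex per synchronized statement
        {
            auto g = File("test.txt", "a");
            g.writeln(i);
            g.close();
        }
    }
}
```

A synchronized statement with no argument locks a mutex unique to that statement, which is exactly what's wanted here: every iteration, on every thread, contends for the same lock.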
Re: Safely writing to the same file in parallel foreach loop
On 11/14/2012 06:49 PM, Vijay Nayar wrote:
> Could you put the file access in a synchronized block?
>
> http://dlang.org/statement.html#SynchronizedStatement

Oh, good call -- seems to work. If you try and run the parallel code
without it, there's a pretty nasty-looking error:

/tmp/.rdmd-1000/rdmd-pforeach.d-7A1C6D0E6B47053236731E75615AD487/pforeach:
double free or corruption (out): 0x7f58bc000910 ***

... but with the synchronized {} block around the file append, all
seems to work fine.
Re: Safely writing to the same file in parallel foreach loop
I think this is what you want around the file access section:

http://dlang.org/statement.html#SynchronizedStatement

- Vijay

On Wednesday, 14 November 2012 at 16:43:37 UTC, Joseph Rushton Wakeling wrote:
> I take it there's no more "native-to-D" way of implementing a file
> lock? :-(
Re: Safely writing to the same file in parallel foreach loop
On Wednesday, November 14, 2012 18:59:29 Joseph Rushton Wakeling wrote:
> On 11/14/2012 06:49 PM, Vijay Nayar wrote:
> > Could you put the file access in a synchronized block?
> >
> > http://dlang.org/statement.html#SynchronizedStatement
>
> Oh, good call -- seems to work.

I would point out though that, given how expensive disk writes are,
unless you're doing a lot of work within the parallel foreach loop,
there's a good chance it would be more efficient to use std.concurrency
and pass the writes to another thread to do the writing. The loop
itself could still be a parallel foreach, so you wouldn't have to
change much otherwise.

With the synchronized block, by contrast, each thread will probably
spend a lot of its time waiting on the lock, which will end up making
the whole thing effectively single-threaded. If the work being done in
the parallel foreach is small enough, it might even be the case that
simply making it a normal foreach and ditching the synchronized block
would be faster. But you'll obviously have to experiment to see what
works best with whatever you're doing.

- Jonathan M Davis
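[A hedged sketch of the writer-thread idea: the workers run in a parallel foreach as before, but instead of touching the file they send each result to a single spawned thread that owns the File. The helper and counts here are illustrative, not from the thread.]

```d
// Sketch: one dedicated writer thread, fed via std.concurrency messages.
import std.stdio;
import std.range : iota;
import std.parallelism : parallel;
import std.concurrency : spawn, send, receiveOnly, Tid;

// The writer owns the file; nobody else opens it.
void writer(string path, int expected)
{
    auto f = File(path, "a");
    foreach (_; 0 .. expected)
        f.writeln(receiveOnly!int());  // block until the next result arrives
    f.close();
}

void main()
{
    auto tid = spawn(&writer, "test.txt", 100);

    foreach (i; parallel(iota(0, 100), 20))
    {
        // ... do the expensive calculation for i here ...
        tid.send(i);  // only the (cheap) send is serialized, not the work
    }
}
```

The message queue does the serialization, so the workers never wait on a disk write; I believe sending from task-pool threads is fine, but treat this as a sketch to experiment with rather than a tested program.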
Re: Safely writing to the same file in parallel foreach loop
On 11/14/2012 10:17 PM, Jonathan M Davis wrote:
> I would point out though that given how expensive disk writes are,
> unless you're doing a lot of work within the parallel foreach loop,
> there's a good chance that it would be more efficient to use
> std.concurrency and pass the writes to another thread to do the
> writing.

In the application I have in mind, there is a LOT of work done within
the parallel foreach loop -- we're talking at least 20 minutes' solid
processing before the file write takes place -- so the synchronized
block seems an appropriate approach given how simple it is.

That said, the separate-thread-for-writes is a nice concept and I'll
have a play with it. Concurrency is something where I'm very much a
beginner, so I'm very open to all suggestions -- thanks very much for
this one!
Re: Safely writing to the same file in parallel foreach loop
On 11/15/2012 12:44 AM, Joseph Rushton Wakeling wrote:
> In the application I have in mind, there is a LOT of work done within
> the parallel foreach loop -- we're talking at least 20 minutes' solid
> processing before the file write takes place -- so the synchronized
> block seems an appropriate approach given how simple it is.

An oddity here: although the correct results come out of the
calculation, at the end the program containing the parallel foreach
hangs -- it doesn't stop running, even though all the calculations are
complete. Any thoughts as to why? I'd guess a thread that has not
closed correctly, but I can't see why any of them should fail to do so.
Re: Safely writing to the same file in parallel foreach loop
On 11/15/2012 01:55 AM, Joseph Rushton Wakeling wrote:
> An oddity here: although the correct results come out of the
> calculation, at the end the program containing the parallel foreach
> hangs -- it doesn't stop running, even though all the calculations
> are complete.

On closer examination, this appears to happen only with gdc-compiled
code -- if I compile with ldc or dmd, the program exits normally.
Re: Safely writing to the same file in parallel foreach loop
On 11/15/2012 12:31 PM, Joseph Rushton Wakeling wrote:
> On closer examination, this appears to happen only with gdc-compiled
> code -- if I compile with ldc or dmd, the program exits normally.

OK, this is a known bug with GDC:

http://www.gdcproject.org/bugzilla/show_bug.cgi?id=16
Re: Safely writing to the same file in parallel foreach loop
On 11/14/2012 10:17 PM, Jonathan M Davis wrote:
> I would point out though that given how expensive disk writes are,
> unless you're doing a lot of work within the parallel foreach loop,
> there's a good chance that it would be more efficient to use
> std.concurrency and pass the writes to another thread to do the
> writing. And the loop itself should still be able to be a parallel
> foreach, so you wouldn't have to change much otherwise. But with the
> synchronized block, you'll probably end up with each thread spending
> a lot of its time waiting on the lock, which will end up making the
> whole thing effectively single-threaded.

Do you mean that the synchronized {} blocks all have to be completed
before the threads can be terminated?

In the end, the solution I came to was something like this:

enum N = 16; // number of cases

shared real[N+1] results;

foreach(i; parallel(iota(0, N+1)))
{
    // ... do a lot of calculation ...
    results[i] = // result of calculation
}

// and now at the end we write out all the data

... which seems to work, although I'm not 100% confident about its
safety.
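[For what it's worth, std.parallelism can also collect per-iteration results itself via taskPool.amap, which avoids the shared array entirely -- a sketch, with a placeholder calculation standing in for the real work:]

```d
// Sketch: let std.parallelism gather the results, no shared storage needed.
import std.stdio;
import std.range : iota;
import std.parallelism : taskPool;

real calculate(int i)
{
    // ... stand-in for a lot of calculation ...
    return cast(real) i * i;
}

void main()
{
    enum N = 16; // number of cases

    // amap runs calculate over the range in parallel and returns real[],
    // with each result already in its slot -- no data race to worry about.
    auto results = taskPool.amap!calculate(iota(0, N + 1));

    // and now at the end we write out all the data
    foreach (i, r; results)
        writefln("%s: %s", i, r);
}
```

The shared-array version in the post should be fine too (each iteration writes a distinct element), but amap expresses the same pattern without needing shared.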
Re: Safely writing to the same file in parallel foreach loop
I'm not a robot and didn't mean to spam -- the page got stuck in an odd
refresh loop and I wasn't sure what was going on.

On Wednesday, 14 November 2012 at 17:45:35 UTC, Vijay Nayar wrote:
> On Wednesday, 14 November 2012 at 16:43:37 UTC, Joseph Rushton
> Wakeling wrote:
> > I take it there's no more "native-to-D" way of implementing a file
> > lock? :-(
>
> Could you put the file access in a synchronized block?
>
> http://dlang.org/statement.html#SynchronizedStatement
>
> - Vijay
Re: Safely writing to the same file in parallel foreach loop
On Thursday, November 15, 2012 15:33:31 Joseph Rushton Wakeling wrote:
> On 11/14/2012 10:17 PM, Jonathan M Davis wrote:
> > I would point out though that given how expensive disk writes are,
> > unless you're doing a lot of work within the parallel foreach loop,
> > there's a good chance that it would be more efficient to use
> > std.concurrency and pass the writes to another thread to do the
> > writing. And the loop itself should still be able to be a parallel
> > foreach, so you wouldn't have to change much otherwise. But with
> > the synchronized block, you'll probably end up with each thread
> > spending a lot of its time waiting on the lock, which will end up
> > making the whole thing effectively single-threaded.
>
> Do you mean that the synchronized {} blocks have to all be completed
> before the threads can all be terminated?

No, I mean that if you have a bunch of threads all trying to get the
mutex for the synchronized block, then the only one doing anything is
the one inside the synchronized block. Once it's done, it loops back
around, quickly going through whatever calculations it has to do before
hitting the synchronized block again. In the meantime, one of the other
threads gets the synchronized block and writes to disk, but all the
remaining threads are still waiting. So each thread spends almost all
of its time blocked, meaning that most of the time only one thread is
doing anything -- which completely defeats the purpose of having
multiple threads.

From the sounds of it, this doesn't really affect you, because you're
doing expensive calculations, but anything with very fast yet
parallelizable calculations could be totally screwed by the
synchronized block.

- Jonathan M Davis