Re: Safely writing to the same file in parallel foreach loop

2012-11-14 Thread H. S. Teoh
On Wed, Nov 14, 2012 at 04:56:53PM +0100, Joseph Rushton Wakeling wrote:
> Suppose that I've got a foreach loop in which I write output to a file:
> 
> auto f = File("test.txt", "w"); f.close();  // to start with a blank file
> foreach(i; iota(0, 100))
> {
> f = File("test.txt", "a");
> f.writeln(i);
> f.close();
> }
> 
> I'm guessing it is at least potentially unsafe to parallelize the
> loop without also considering the file interactions:
> 
> foreach(i; parallel(iota(0, 100), 20))
> {
> f = File("test.txt", "a");  // What happens if 2 threads want to
> f.writeln(i);   // open this file at the same time?
> f.close();
> }
> 
> ... so, is there a way that I can ensure that the file appending
> takes place successfully but also safely in each thread?  Let's
> assume that I don't care about the order of writing, only that it
> takes place.

If you're on Posix, you can use file locks to ensure atomic writes to
the file (all threads have to use it though: it's only an advisory lock,
not a mandatory lock): see the manpage for fcntl, look for F_GETLK.
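A minimal sketch of that approach in D, via druntime's POSIX bindings (the file descriptor handling and the helper name `appendLocked` are mine, not from the thread). Note that F_SETLKW is what actually *acquires* the lock, blocking until it is granted; F_GETLK merely tests for a conflicting lock. One caveat: classic fcntl record locks are owned by the process, so they serialize separate processes but not threads inside a single process -- for std.parallelism threads a mutex or synchronized block is still needed.

```d
// Sketch: advisory whole-file write lock around an append, using
// the POSIX fcntl API exposed by druntime.
import core.stdc.stdio : SEEK_SET;
import core.sys.posix.fcntl;
import core.sys.posix.unistd : close, write;

void appendLocked(int fd, const(char)[] line)
{
    flock fl;
    fl.l_type   = F_WRLCK;     // exclusive (write) lock
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;           // 0 means "lock the whole file"
    fcntl(fd, F_SETLKW, &fl);  // block until the lock is granted

    write(fd, line.ptr, line.length);

    fl.l_type = F_UNLCK;       // release the lock
    fcntl(fd, F_SETLK, &fl);
}
```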


T

-- 
It always amuses me that Windows has a Safe Mode during bootup. Does
that mean that Windows is normally unsafe?


Re: Safely writing to the same file in parallel foreach loop

2012-11-14 Thread Joseph Rushton Wakeling

On 11/14/2012 05:16 PM, H. S. Teoh wrote:

If you're on Posix, you can use file locks to ensure atomic writes to
the file (all threads have to use it though: it's only an advisory lock,
not a mandatory lock): see the manpage for fcntl, look for F_GETLK.


I take it there's no more "native-to-D" way of implementing a file lock? :-(

I was browsing through the library descriptions of Mutex and ReadWriteMutex, but 
it's not clear how they'd apply to this case (parallelism is really something I 
have very limited experience of).  I'm actually inclining towards an alternate 
solution, where the different threads send back results to the master thread 
which integrates them and writes out everything itself.


Re: Safely writing to the same file in parallel foreach loop

2012-11-14 Thread Vijay Nayar
On Wednesday, 14 November 2012 at 16:43:37 UTC, Joseph Rushton 
Wakeling wrote:

On 11/14/2012 05:16 PM, H. S. Teoh wrote:
I take it there's no more "native-to-D" way of implementing a 
file lock? :-(


Could you put the file access in a synchronized block?

http://dlang.org/statement.html#SynchronizedStatement

 - Vijay




Re: Safely writing to the same file in parallel foreach loop

2012-11-14 Thread Joseph Rushton Wakeling

On 11/14/2012 06:49 PM, Vijay Nayar wrote:

Could you put the file access in a synchronized block?

http://dlang.org/statement.html#SynchronizedStatement


Oh, good call -- seems to work.

If you try to run the parallel code without it, there's a pretty nasty-looking
error:


   /tmp/.rdmd-1000/rdmd-pforeach.d-7A1C6D0E6B47053236731E75615AD487/pforeach: double free or corruption (out): 0x7f58bc000910 ***


... but with the synchronized {} block around the file append, all seems to work 
fine.
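For concreteness, the working version is presumably along these lines (a sketch reconstructed from the code quoted earlier in the thread, not the poster's exact program):

```d
// Sketch: the parallel loop from the original post, with the file
// append guarded by a synchronized block so only one task writes
// at a time.
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : File;

void main()
{
    auto f = File("test.txt", "w");  // start with a blank file
    f.close();

    foreach (i; parallel(iota(0, 100), 20))
    {
        // ... per-iteration work would go here ...
        synchronized    // one writer at a time; other tasks block here
        {
            auto g = File("test.txt", "a");
            g.writeln(i);
            g.close();
        }
    }
}
```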


Re: Safely writing to the same file in parallel foreach loop

2012-11-14 Thread Vijay Nayar

I think this is what you want around the file access section:

http://dlang.org/statement.html#SynchronizedStatement

 - Vijay

On Wednesday, 14 November 2012 at 16:43:37 UTC, Joseph Rushton 
Wakeling wrote:
I take it there's no more "native-to-D" way of implementing a 
file lock? :-(




Re: Safely writing to the same file in parallel foreach loop

2012-11-14 Thread Jonathan M Davis
On Wednesday, November 14, 2012 18:59:29 Joseph Rushton Wakeling wrote:
> On 11/14/2012 06:49 PM, Vijay Nayar wrote:
> > Could you put the file access in a synchronized block?
> > 
> > http://dlang.org/statement.html#SynchronizedStatement
> 
> Oh, good call -- seems to work.

I would point out though that given how expensive disk writes are, unless 
you're doing a lot of work within the parallel foreach loop, there's a good 
chance that it would be more efficient to use std.concurrency and pass the 
writes to another thread to do the writing. And the loop itself should still 
be able to be a parallel foreach, so you wouldn't have to change much 
otherwise. But with the synchronized block, you'll probably end up with each 
thread spending a lot of its time waiting on the lock, which will end up 
making the whole thing effectively single-threaded. If the work being done in 
the parallel foreach is small enough, it might even be the case that simply 
making it a normal foreach and ditching the synchronized block would be 
faster. But you'll obviously have to experiment to see what works best with 
whatever you're doing.
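A minimal sketch of that arrangement (the fixed message count and the names are illustrative, not from the thread): a dedicated writer thread owns the file, the loop stays a parallel foreach, and each iteration just sends its result as a message, so no task ever blocks on a file lock.

```d
// Sketch: one writer thread owns the file; the parallel loop sends
// each result to it via std.concurrency message passing.
import std.concurrency : receiveOnly, send, spawn, Tid;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : File;

enum N = 100;   // illustrative: the writer must know how many messages to expect

void writer()
{
    auto f = File("test.txt", "w");
    foreach (_; 0 .. N)
        f.writeln(receiveOnly!int());   // blocks until a result arrives
}

void main()
{
    Tid tid = spawn(&writer);
    foreach (i; parallel(iota(0, N), 20))
    {
        // ... expensive calculation ...
        tid.send(i);   // hand the result off; no lock contention here
    }
    // the runtime waits for the spawned thread before the program exits
}
```

One detail: the writer here is told in advance how many results to expect; a real program might instead send a sentinel message to tell it to stop.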

- Jonathan M Davis


Re: Safely writing to the same file in parallel foreach loop

2012-11-14 Thread Joseph Rushton Wakeling

On 11/14/2012 10:17 PM, Jonathan M Davis wrote:

I would point out though that given how expensive disk writes are, unless
you're doing a lot of work within the parallel foreach loop, there's a good
chance that it would be more efficient to use std.concurrency and pass the
writes to another thread to do the writing.


In the application I have in mind, there is a LOT of work that would be done 
within the parallel foreach loop -- we're talking at least 20 minutes' solid 
processing before the file write takes place, so this seems an appropriate 
approach given how simple it is.


That said, the separate-thread-for-writes is a nice concept and I'll have a play 
with it.  Concurrency is something where I'm very much a beginner, so I'm very 
open to all suggestions -- thanks very much for this one!




Re: Safely writing to the same file in parallel foreach loop

2012-11-14 Thread Joseph Rushton Wakeling

On 11/15/2012 12:44 AM, Joseph Rushton Wakeling wrote:

In the application I have in mind, there is a LOT of work that would be done
within the parallel foreach loop -- we're talking at least 20 minutes' solid
processing before the file write takes place, so this seems an appropriate
approach given how simple it is.


An oddity here: although the correct results seem to come out of the 
calculation, at the end, the program containing the parallel foreach hangs -- it 
doesn't stop running, even though all the calculations are complete.


Any thoughts as to why?  I guess a thread that has not closed correctly, but I 
can't see why any one of them should not do so.


Re: Safely writing to the same file in parallel foreach loop

2012-11-15 Thread Joseph Rushton Wakeling

On 11/15/2012 01:55 AM, Joseph Rushton Wakeling wrote:

An oddity here: although the correct results seem to come out of the
calculation, at the end, the program containing the parallel foreach hangs -- it
doesn't stop running, even though all the calculations are complete.

Any thoughts as to why?  I guess a thread that has not closed correctly, but I
can't see why any one of them should not do so.


On closer examination, this appears to be only with gdc-compiled code -- if I 
compile with ldc or dmd the program exits normally.




Re: Safely writing to the same file in parallel foreach loop

2012-11-15 Thread Joseph Rushton Wakeling

On 11/15/2012 12:31 PM, Joseph Rushton Wakeling wrote:

On 11/15/2012 01:55 AM, Joseph Rushton Wakeling wrote:

An oddity here: although the correct results seem to come out of the
calculation, at the end, the program containing the parallel foreach hangs -- it
doesn't stop running, even though all the calculations are complete.

Any thoughts as to why?  I guess a thread that has not closed correctly, but I
can't see why any one of them should not do so.


On closer examination, this appears to be only with gdc-compiled code -- if I
compile with ldc or dmd the program exits normally.


OK, this is a known bug with GDC:
http://www.gdcproject.org/bugzilla/show_bug.cgi?id=16



Re: Safely writing to the same file in parallel foreach loop

2012-11-15 Thread Joseph Rushton Wakeling

On 11/14/2012 10:17 PM, Jonathan M Davis wrote:

I would point out though that given how expensive disk writes are, unless
you're doing a lot of work within the parallel foreach loop, there's a good
chance that it would be more efficient to use std.concurrency and pass the
writes to another thread to do the writing. And the loop itself should still
be able to be a parallel foreach, so you wouldn't have to change much
otherwise. But with the synchronized block, you'll probably end up with each
thread spending a lot of its time waiting on the lock, which will end up
making the whole thing effectively single-threaded.


Do you mean that the synchronized {} blocks have to all be completed before the 
threads can all be terminated?


In the end the solution I came to was something like this:

enum N = 16;   // number of cases
shared real[N+1] results;

foreach(i; parallel(iota(0, N+1)))
{
// ... do a lot of calculation ...
results[i] = // result of calculation
}

// and now at the end we write out all the data

... which seems to work, although I'm not 100% confident about its safety.
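For what it's worth, having each task write only its own element is the standard pattern here, and std.parallelism can also package it for you: taskPool.amap runs the function across the pool and collects the results into an ordinary (non-shared) array in index order. A sketch, with `calculation` standing in for the real work:

```d
// Sketch: taskPool.amap replaces the shared array + parallel foreach;
// results come back in an ordinary array, in index order.
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;

real calculation(int i)
{
    // ... a lot of work would go here ...
    return cast(real)(i) * i;   // placeholder result
}

void main()
{
    enum N = 16;   // number of cases
    real[] results = taskPool.amap!calculation(iota(0, N + 1));
    // and now at the end we write out all the data, single-threaded
    foreach (i, r; results)
        writefln("%s\t%s", i, r);
}
```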


Re: Safely writing to the same file in parallel foreach loop

2012-11-15 Thread Vijay Nayar
I'm not a robot and didn't mean to spam; the page got stuck in an odd
refresh loop and I wasn't sure what was going on.


On Wednesday, 14 November 2012 at 17:45:35 UTC, Vijay Nayar wrote:
On Wednesday, 14 November 2012 at 16:43:37 UTC, Joseph Rushton 
Wakeling wrote:

On 11/14/2012 05:16 PM, H. S. Teoh wrote:
I take it there's no more "native-to-D" way of implementing a 
file lock? :-(


Could you put the file access in a synchronized block?

http://dlang.org/statement.html#SynchronizedStatement

 - Vijay





Re: Safely writing to the same file in parallel foreach loop

2012-11-15 Thread Jonathan M Davis
On Thursday, November 15, 2012 15:33:31 Joseph Rushton Wakeling wrote:
> On 11/14/2012 10:17 PM, Jonathan M Davis wrote:
> > I would point out though that given how expensive disk writes are, unless
> > you're doing a lot of work within the parallel foreach loop, there's a
> > good
> > chance that it would be more efficient to use std.concurrency and pass the
> > writes to another thread to do the writing. And the loop itself should
> > still be able to be a parallel foreach, so you wouldn't have to change
> > much otherwise. But with the synchronized block, you'll probably end up
> > with each thread spending a lot of its time waiting on the lock, which
> > will end up making the whole thing effectively single-threaded.
> 
> Do you mean that the synchronized {} blocks have to all be completed before
> the threads can all be terminated?

No, I mean that if you have a bunch of threads all trying to get the mutex for 
the synchronized block, then the only one doing anything is the one in the 
synchronized block. Once it's done, it then just loops back around, quickly 
going through whatever calculations it has to do before hitting the 
synchronized block again. In the meantime one of the other threads got the 
synchronized block and is writing to disk. But all the other threads are still 
waiting. So, for each of the threads, almost all of the time is spent blocked, 
making it so that most of the time, only one thread is doing anything, which 
completely defeats the purpose of having multiple threads.

From the sounds of it, this doesn't really affect you, because you're doing 
expensive calculations, but anything with very fast but parallelizable 
calculations could be totally screwed by the synchronized block.

- Jonathan M Davis