missing data with parallel and stdin

2016-05-23 Thread moechofe via Digitalmars-d-learn
Hi, I wrote a script that takes a list of files from STDIN, 
computes some stuff, and copies the files under new names.


I get 33k lines of input but only 3k-5k files in the 
destination folder.

This does not happen if I remove the parallel() call.

What did I do wrong?

void delegate(string source, string dest) handler;

if(use_symlink) handler = delegate(string s, string d){
    symlink(s, d);
}; else handler = delegate(string s, string d){
    copy(s, d);
};

foreach(entry; parallel(stdin.byLineCopy)) try
{
    auto source = buildPath(static_path, entry);
    auto md5 = digest!MD5(File(source).byChunk(64*1024));
    auto hash = toHexString!(LetterCase.lower)(md5);
    auto file = text(hash, '_', baseName(entry));
    auto dest = buildPath(hashed_path, file);
    handler(source, dest);
    writeln(entry, ' ', file);
}
catch(Exception e)
{
    error("Couldn't read, hash or copy %s", entry);
}



Re: missing data with parallel and stdin

2016-05-23 Thread Jack Stouffer via Digitalmars-d-learn

On Monday, 23 May 2016 at 08:59:31 UTC, moechofe wrote:

void delegate(string source, string dest) handler;

if(use_symlink) handler = delegate(string s, string d){
    symlink(s, d);
}; else handler = delegate(string s, string d){
    copy(s, d);
};


Boy, that's a confusing way to write that. Here's a clearer version:

if(use_symlink)
    handler = delegate(string s, string d){ symlink(s, d); };
else
    handler = delegate(string s, string d){ copy(s, d); };
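A ternary would also work here (a sketch, same behavior):

```d
handler = use_symlink
    ? delegate(string s, string d){ symlink(s, d); }
    : delegate(string s, string d){ copy(s, d); };
```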


What did I do wrong?


Sounds like a data race problem. Use a lock on the file write 
operation and see if that helps.
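A minimal sketch of what that locking could look like (my own illustration, not code from the original post; the mutex name is made up):

```d
import core.sync.mutex : Mutex;
import std.file : copy;

// One shared mutex guarding the copy; created before any threads start.
__gshared Mutex fileLock;

shared static this()
{
    fileLock = new Mutex();
}

// Then, inside the parallel foreach body:
//     synchronized(fileLock) copy(source, dest);
```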


Re: missing data with parallel and stdin

2016-05-23 Thread moechofe via Digitalmars-d-learn

On Monday, 23 May 2016 at 14:16:13 UTC, Jack Stouffer wrote:
Sounds like a data race problem. Use a lock on the file write 
operation and see if that helps.


Like this?

synchronized(mutex) copy(source, dest);

That didn't solve anything.
What I observe is: the slower the process runs, the more files 
get copied.
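One workaround worth trying (my suggestion, not something confirmed in the thread): drain stdin eagerly into an array first, so parallel iterates a plain random-access range instead of a lazy range tied to the stdin buffer:

```d
import std.array : array;
import std.parallelism : parallel;
import std.stdio : stdin;

// Read every line up front; byLineCopy allocates a fresh string per line,
// so the resulting array is safe to hand to multiple threads.
auto entries = stdin.byLineCopy.array;

foreach(entry; parallel(entries))
{
    // ... hash and copy as in the original code ...
}
```

If the missing files come from the interaction between parallel and the lazy stdin range, this separates the two and should make the symptom disappear; if they still go missing, the problem is elsewhere.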




Re: missing data with parallel and stdin

2016-05-23 Thread Era Scarecrow via Digitalmars-d-learn

On Monday, 23 May 2016 at 15:53:23 UTC, moechofe wrote:

On Monday, 23 May 2016 at 14:16:13 UTC, Jack Stouffer wrote:
Sounds like a data race problem. Use a lock on the file write 
operation and see if that helps.

That didn't solve anything.
What I observe is: when the process is slower, more files are 
copied.


 Last night I took the code sample, left the copy out, and got 
everything else working. When I ran it, I noticed it was only 
running on one core, and in that mode it worked fine. But when I 
put in a number for how many items to work on at once (adding 
any number to parallel's call), the program crashed quite often, 
generally because it couldn't close the files it was scanning.


 Looking over the documentation, you appear to be using parallel 
correctly, so I don't know why it isn't working.
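For reference, the number Era mentions is parallel's workUnitSize argument, which controls how many elements each task grabs at a time. A sketch of the call shape (illustrative only; the value 100 is arbitrary):

```d
import std.parallelism : parallel;
import std.stdio : stdin;

// Second argument = workUnitSize: each worker pulls batches of
// 100 lines from the (lazy, non-random-access) stdin range.
foreach(entry; parallel(stdin.byLineCopy, 100))
{
    // ... per-line work ...
}
```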