Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread via Digitalmars-d

On Friday, 15 May 2015 at 00:07:15 UTC, Laeeth Isharc wrote:
But why would one use Python when fork itself isn't hard to use 
in a narrow sense, and neither is the kind of interprocess 
communication I would like to do for the kind of tasks I have 
in mind?  It just seems to make sense to have a light wrapper.


The managing process doesn't have to be fast, but should be easy 
to reconfigure. It is overall more effective (not efficient) to 
use a scripting language with a REPL for scripty tasks. Forking 
comes with its own set of pitfalls. The Unix way is to have a 
conglomerate of simple processes tied together with a script. 
Overall easier to debug and modify.
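The supervising script itself can stay tiny. A minimal Python sketch of such a conglomerate (the `tr`/`sort` stages here are stand-ins; in practice each stage would be one of your simple compiled programs):

```python
import subprocess

# Stand-ins for simple cooperating programs; in practice these would be
# small compiled workers (e.g. D binaries), each doing one job.
stages = [
    ["tr", "a-z", "A-Z"],  # uppercase each line
    ["sort"],              # sort the lines
]

def run_pipeline(text, stages):
    """Feed text through a pipeline of simple processes, Unix-style."""
    data = text.encode()
    for cmd in stages:
        data = subprocess.run(cmd, input=data,
                              capture_output=True, check=True).stdout
    return data.decode()

print(run_pipeline("banana\napple\n", stages))  # -> APPLE\nBANANA\n
```

Because each stage is its own process, any one of them can be run, tested, and debugged in isolation from a shell, which is the point being made above.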


Just because some problems in parallel processing are hard 
doesn't seem to me a reason not to do some work on addressing 
the easier ones that may in a practical sense have great value 
in having an imperfect (but real) solution for.  Sometimes I 
have the sense when talking with you that the answer to any 
question is anything but D! ;)  (But I am sure I must be 
mistaken!)


I would have said the same thing about Rust and Nim too. Overall, 
what other people do with a tool affects the ecosystem and 
maturity. If you do system-level programming you are less 
affected by the ecosystem than when you do higher-level 
task-oriented programming.


What is your mission, to solve a problem effectively now or to 
start building a new framework with a time horizon measured in 
years? You have to decide this first.


Then you have to decide what is more expensive, your time or 
spending twice as much on CPU power (whether it is hardware or 
rented time at a datacenter).


True.  But we are not speaking of getting from a raw state to 
perfection but just starting to play with the problem.  If 
Walter Bright had listened to well-intentioned advice, he 
wouldn't be in the compiler business, let alone have given us 
what became D.


He set out to build a new framework with a time horizon measured 
in decades. That's perfectly reasonable and what you have to 
expect when starting on a new language.


If you want to build a framework for a specific use you need both 
the theoretical insights and the pragmatic experience in order 
to complete it in a timely manner. You need many many iterations 
to get to a state where it is better (than whatever people use 
today). Which is why most (sensible) engineers will pick existing 
solutions that are receiving polish, rather than the next big 
thing.


Yes, indeed.  But my question was more about the distinctions 
between processes and threads and the non-obvious implications 
for the design of such a framework.


If you want to use fork(), you might as well use threads. The 
main distinction is that with processes you have to be explicit 
about what resources to share; but after a fork() you also risk 
ending up in an inconsistent state if you aren't careful.
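A minimal sketch of that explicit sharing, in Python for brevity (the underlying fork()/pipe() calls are the same ones a D or C program would make):

```python
import os

def forked_answer():
    """Fork a child and receive its result over an explicitly shared pipe."""
    r, w = os.pipe()          # the *only* channel parent and child share
    pid = os.fork()
    if pid == 0:              # child: copy-on-write copy of the parent
        os.close(r)
        os.write(w, b"42")    # send the result back explicitly
        os.close(w)
        os._exit(0)           # skip atexit/stdio cleanup inherited from parent
    os.close(w)               # parent: close the unused write end
    result = os.read(r, 16)
    os.close(r)
    os.waitpid(pid, 0)        # reap the child to avoid a zombie
    return result.decode()

print(forked_answer())        # -> 42
```

Note the bookkeeping (closing unused ends, using `_exit` in the child, reaping): forgetting any of these is exactly the kind of inconsistent state mentioned above.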


With a fork-based solution you still need to deal with a 
different level of complexity than you get with a Unixy 
conglomerate of simple programs that cooperate. The Unix way is 
easier to debug and test, but slower than an optimized 
multi-threaded solution (and marginally slower than a process 
that forks itself).


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread Laeeth Isharc via Digitalmars-d
On Thursday, 14 May 2015 at 20:56:16 UTC, Ola Fosheim Grøstad 
wrote:

On Thursday, 14 May 2015 at 20:28:20 UTC, Laeeth Isharc wrote:
My own is a pragmatic commercial one.  I have some problems 
which perhaps scale quite well, and rather than write it using 
fork directly, I would rather have a higher level wrapper 
along the lines of std.parallelism.


Languages like Chapel and extended versions of C++ have built 
in support for parallel computing that is relatively effortless 
and designed by experts (Cray/IBM etc) to cover common patterns 
in demanding batch processing for those who want something 
higher level than plain C++ (or in this case D which is pretty 
much the same thing).


Yes - I am sure that there is excellent stuff here, from which 
one may learn much: especially if approaching it from a more 
theoretical or enterprisey industrial scale perspective.


However, you could consider combining single threaded processes 
in D with e.g. Python as a supervising process if the datasets 
allow it. You'll find lots of literature on Inter Process 
Communication (IPC) for Unix. Performance will be lower, but 
your own productivity might be higher, YMMV.
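As a sketch of that split, with Python supervising and the workers as separate single-threaded processes talking over stdin/stdout (the worker here is a hypothetical one-liner standing in for a compiled D binary):

```python
import subprocess
import sys

# Hypothetical worker: sums the integers it reads on stdin. In practice
# this would be a single-threaded D executable.
WORKER = [sys.executable, "-c",
          "import sys; print(sum(int(x) for x in sys.stdin.read().split()))"]

def supervise(chunks):
    """Start one worker process per chunk, then collect the partial results."""
    procs = [subprocess.Popen(WORKER, stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, text=True)
             for _ in chunks]
    for proc, chunk in zip(procs, chunks):
        proc.stdin.write(" ".join(map(str, chunk)))
        proc.stdin.close()              # EOF tells the worker to finish
    return [int(proc.stdout.read()) for proc in procs]

print(supervise([[1, 2, 3], [4, 5, 6]]))  # -> [6, 15]
```

The supervisor never touches the data itself, so it can be as slow and as scriptable as you like, which is the productivity trade-off described above.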


But why would one use Python when fork itself isn't hard to use 
in a narrow sense, and neither is the kind of interprocess 
communication I would like to do for the kind of tasks I have in 
mind?  It just seems to make sense to have a light wrapper.  Just 
because some problems in parallel processing are hard doesn't 
seem to me a reason not to do some work on addressing the easier 
ones that may in a practical sense have great value in having an 
imperfect (but real) solution for.  Sometimes I have the sense 
when talking with you that the answer to any question is anything 
but D! ;)  (But I am sure I must be mistaken!)


Perhaps such would be flawed and limited, but often something 
is better than nothing, even if not perfect.  And I mention it 
on the forum only because usually I have found the problems I 
face turn out to be those faced by many others too.


You need momentum in order to get from a raw state to something 
polished, so you essentially need a larger community that has 
both experience with the topic and a need for it, in order to 
get a sensible framework that is maintained.


True.  But we are not speaking of getting from a raw state to 
perfection but just starting to play with the problem.  If Walter 
Bright had listened to well-intentioned advice, he wouldn't be in 
the compiler business, let alone have given us what became D.  I 
am no Walter Bright, but this is an easier problem to start 
exploring, and this would be beyond the scope of anything I would 
do just by myself.


If you can get away with it, the most common simplistic 
approach seems to be map-reduce, because it is easy to 
distribute over many machines and there are frameworks that do 
the tedious bits for you.


Yes, indeed.  But my question was more about the distinctions 
between processes and threads and the non-obvious implications 
for the design of such a framework.


Nice chatting.



Laeeth.


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread via Digitalmars-d

On Thursday, 14 May 2015 at 20:28:20 UTC, Laeeth Isharc wrote:
My own is a pragmatic commercial one.  I have some problems 
which perhaps scale quite well, and rather than write it using 
fork directly, I would rather have a higher level wrapper along 
the lines of std.parallelism.


Languages like Chapel and extended versions of C++ have built in 
support for parallel computing that is relatively effortless and 
designed by experts (Cray/IBM etc) to cover common patterns in 
demanding batch processing for those who want something higher 
level than plain C++ (or in this case D which is pretty much the 
same thing).


However, you could consider combining single threaded processes 
in D with e.g. Python as a supervising process if the datasets 
allow it. You'll find lots of literature on Inter Process 
Communication (IPC) for Unix. Performance will be lower, but your 
own productivity might be higher, YMMV.


Perhaps such would be flawed and limited, but often something 
is better than nothing, even if not perfect.  And I mention it 
on the forum only because usually I have found the problems I 
face turn out to be those faced by many others too.


You need momentum in order to get from a raw state to something 
polished, so you essentially need a larger community that has 
both experience with the topic and a need for it, in order to get 
a sensible framework that is maintained.


If you can get away with it, the most common simplistic approach 
seems to be map-reduce, because it is easy to distribute over 
many machines and there are frameworks that do the tedious bits 
for you.
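A toy version of that shape, using Python's multiprocessing as a stand-in for a real map-reduce framework (word counting is just an illustrative task, not anything from the thread):

```python
from multiprocessing import Pool

def mapper(chunk):
    # map step: runs in a separate worker process for each chunk
    return len(chunk.split())

def reducer(partials):
    # reduce step: combine the per-chunk results in the coordinator
    return sum(partials)

def word_count(chunks, workers=2):
    """Count words across chunks with a map step per process."""
    with Pool(workers) as pool:
        return reducer(pool.map(mapper, chunks))

if __name__ == "__main__":
    print(word_count(["the quick brown fox", "jumps over", "the lazy dog"]))
```

The appeal is exactly what is described above: the framework handles distribution and collection, and the user only writes the map and reduce steps.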


If you have any thoughts on what should be considered, I would 
very much appreciate them.  (And I owe you a response on our 
last suspended discussion, but haven't had time of late).


Nah, you owe me nothing ;-). And I also have no time atm. ;-)

Ola.


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread Laeeth Isharc via Digitalmars-d
On Thursday, 14 May 2015 at 20:15:38 UTC, Ola Fosheim Grøstad 
wrote:

On Thursday, 14 May 2015 at 20:06:55 UTC, Laeeth Isharc wrote:
To start the process off (because small beginnings are better 
than no beginning): what are the key features of processes vs 
threads one would need to bear in mind when designing such a 
thing?  Because I spent the past couple of decades in a 
different field, multiprocessing passed me by, so I am only 
now slowly catching up.


"nobody" understands multiprocessing. Or rather… you need to 
understand the hardware and the concrete problem space first. 
There are no general solutions.


Yes, I certainly understand that it is a highly specialist and 
complex area where the best minds in the world do not yet have 
the answers.  So if one were addressing the problem from a 
computer-science academic perspective, then perhaps one would 
arrive at a different answer.


My own is a pragmatic commercial one.  I have some problems which 
perhaps scale quite well, and rather than write it using fork 
directly, I would rather have a higher level wrapper along the 
lines of std.parallelism.  Perhaps such would be flawed and 
limited, but often something is better than nothing, even if not 
perfect.  And I mention it on the forum only because usually I 
have found the problems I face turn out to be those faced by many 
others too.


If you have any thoughts on what should be considered, I would 
very much appreciate them.  (And I owe you a response on our last 
suspended discussion, but haven't had time of late).



Laeeth.


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread via Digitalmars-d

On Thursday, 14 May 2015 at 20:06:55 UTC, Laeeth Isharc wrote:
To start the process off (because small beginnings are better 
than no beginning): what are the key features of processes vs 
threads one would need to bear in mind when designing such a 
thing?  Because I spent the past couple of decades in a 
different field, multiprocessing passed me by, so I am only now 
slowly catching up.


"nobody" understands multiprocessing. Or rather… you need to 
understand the hardware and the concrete problem space first. 
There are no general solutions.


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread Laeeth Isharc via Digitalmars-d

On Thursday, 14 May 2015 at 10:15:48 UTC, Daniel Murphy wrote:
"Laeeth Isharc"  wrote in message 
news:ejbhesbstgazkxnpv...@forum.dlang.org...


Is there value to having equivalents to the std.parallelism 
approach that works with processes rather than threads, and 
makes it easy to manage tasks over multiple machines?


I took a look at std.parallelism and it's beyond what I can do 
for now. But it seems like this might be a useful project, and 
not one of unmanageable difficulty...


Yes, there is enormous value.  It's just waiting for someone to 
do it.


To start the process off (because small beginnings are better 
than no beginning): what are the key features of processes vs 
threads one would need to bear in mind when designing such a 
thing?  Because I spent the past couple of decades in a different 
field, multiprocessing passed me by, so I am only now slowly 
catching up.
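One way to see the central distinction is side by side: a thread mutates the memory its parent sees, while a forked child only mutates its own copy-on-write copy. A Python sketch of that (the same behaviour holds for any fork-based D program):

```python
import os
import threading

counter = [0]

def bump():
    counter[0] += 1

# A thread shares the parent's memory: its increment is visible afterwards.
t = threading.Thread(target=bump)
t.start()
t.join()

# A forked child gets its own copy-on-write memory: its increment is
# invisible to the parent, so results must be sent back explicitly
# (pipes, sockets, shared memory, files, ...).
pid = os.fork()
if pid == 0:
    bump()
    os._exit(0)
os.waitpid(pid, 0)

print(counter[0])  # -> 1: only the thread's bump is visible here
```

Everything else (lifetime management, failure isolation, what happens to open file descriptors and locks across fork) follows from this one difference.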


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread Laeeth Isharc via Digitalmars-d

On Thursday, 14 May 2015 at 16:33:46 UTC, John Colvin wrote:

On Wednesday, 13 May 2015 at 20:34:24 UTC, weaselcat wrote:

On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
Is there value to having equivalents to the std.parallelism 
approach that works with processes rather than threads, and 
makes it easy to manage tasks over multiple machines?


I'm not sure if you're asking because of this thread, but see

http://forum.dlang.org/thread/tczkndtepnvppggzm...@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org

python outperforming D because it doesn't have to deal with 
synchronization headaches. I found D to be way faster when 
reimplemented with fork, but having to use the stdc API is 
ugly (IMO).


It was also easy to get D very fast by just being a little more 
eager with IO and reducing the enormous number of little 
allocations being made.


Yes - thank you for your highly educational rewrite, which I 
personally very much appreciate your taking the trouble to do.  
Perhaps this should be turned (by you or someone else) into a 
mini case-study on the wiki of how to write idiomatic and 
efficient D code.  Or maybe just put up the slides from your 
forthcoming talk (which I look forward to watching later when it 
is up).


It's good to know D can in fact deliver on the implicit promise 
in a real use case with not too much work.  (Yes, naively written 
code was a bit slow when dealing with millions of lines, but in 
which language of comparable flexibility would that not be true?) 
It's also interesting that your code was idiomatic.  (I was 
reading up about Scala, which seems beautiful in many ways, but 
it is terribly disturbing to see that the idiomatic way often 
seems to be the most inefficient, at least as things stood a 
couple of years ago.)


But, even so, I think having a wrapper for fork and an API for 
multiprocessing (which you could then hook up to e.g. the Digital 
Ocean and AWS APIs) would be rather helpful.


I spoke with a friend of mine at one of the most admired/hated 
Wall Street firms, one of the smartest quants I know, who has now 
moved to portfolio management.  He was doing a study on tick data 
going back to 2000.  I asked him how long it took to run on his 
firm's infrastructure.  An hour!  And the operations were pretty 
simple.  I think it should only take a couple of minutes.  And it 
would be nice to show an example of - from a spreadsheet - 
spinning up 100 Digital Ocean instances, running the numbers not 
just on one security but on every relevant security, and having 
a nice summary appear back in the sheet within a couple of 
minutes.


The reason speed matters is that long waits interfere with rapid 
iteration and the creative thought process.  In a market 
environment you may well have forgotten what you wanted after an 
hour...



Laeeth.


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread John Colvin via Digitalmars-d

On Wednesday, 13 May 2015 at 20:34:24 UTC, weaselcat wrote:

On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
Is there value to having equivalents to the std.parallelism 
approach that works with processes rather than threads, and 
makes it easy to manage tasks over multiple machines?


I'm not sure if you're asking because of this thread, but see

http://forum.dlang.org/thread/tczkndtepnvppggzm...@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org

python outperforming D because it doesn't have to deal with 
synchronization headaches. I found D to be way faster when 
reimplemented with fork, but having to use the stdc API is 
ugly (IMO).


It was also easy to get D very fast by just being a little more 
eager with IO and reducing the enormous number of little 
allocations being made.


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-14 Thread Daniel Murphy via Digitalmars-d
"Laeeth Isharc"  wrote in message 
news:ejbhesbstgazkxnpv...@forum.dlang.org...


Is there value to having equivalents to the std.parallelism approach that 
works with processes rather than threads, and makes it easy to manage 
tasks over multiple machines?


I took a look at std.parallelism and it's beyond what I can do for now. 
But it seems like this might be a useful project, and not one of 
unmanageable difficulty...


Yes, there is enormous value.  It's just waiting for someone to do it. 



Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-13 Thread Laeeth Isharc via Digitalmars-d

On Wednesday, 13 May 2015 at 20:34:24 UTC, weaselcat wrote:

On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
Is there value to having equivalents to the std.parallelism 
approach that works with processes rather than threads, and 
makes it easy to manage tasks over multiple machines?


I'm not sure if you're asking because of this thread, but see

http://forum.dlang.org/thread/tczkndtepnvppggzm...@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org

python outperforming D because it doesn't have to deal with 
synchronization headaches. I found D to be way faster when 
reimplemented with fork, but having to use the stdc API is 
ugly (IMO).


yes - that is what spurred me to post, but it had been on my mind 
for a while (especially the multi-machine stuff).


Re: std.parallelism equivalents for posix fork and multi-machine processing

2015-05-13 Thread weaselcat via Digitalmars-d

On Wednesday, 13 May 2015 at 20:28:02 UTC, Laeeth Isharc wrote:
Is there value to having equivalents to the std.parallelism 
approach that works with processes rather than threads, and 
makes it easy to manage tasks over multiple machines?


I'm not sure if you're asking because of this thread, but see

http://forum.dlang.org/thread/tczkndtepnvppggzm...@forum.dlang.org#post-tczkndtepnvppggzmews:40forum.dlang.org

python outperforming D because it doesn't have to deal with 
synchronization headaches. I found D to be way faster when 
reimplemented with fork, but having to use the stdc API is 
ugly (IMO).


std.parallelism equivalents for posix fork and multi-machine processing

2015-05-13 Thread Laeeth Isharc via Digitalmars-d
Is there value to having equivalents to the std.parallelism 
approach that works with processes rather than threads, and makes 
it easy to manage tasks over multiple machines?


I took a look at std.parallelism and it's beyond what I can do 
for now.  But it seems like this might be a useful project, and 
not one of unmanageable difficulty...
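For a feel of what such a process-based analogue of std.parallelism's taskPool.map might look like, here is a Python sketch built on concurrent.futures (the names `simulate` and `process_map` are illustrative, not from the thread):

```python
from concurrent.futures import ProcessPoolExecutor

def simulate(security_id):
    # Stand-in for a per-security computation; in a D version this would
    # be the task body handed to the process pool.
    return security_id * security_id

def process_map(fn, inputs, workers=4):
    """Process-based analogue of taskPool.map: one task per input,
    results collected back in input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, inputs))

if __name__ == "__main__":
    print(process_map(simulate, range(5)))  # -> [0, 1, 4, 9, 16]
```

A multi-machine version would keep the same interface and swap the local pool for remote workers, which is roughly the project being proposed here.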