Re: [Jprogramming] wiki: task scheduler and zeromq intro

Pascal Jasmin Sun, 24 Nov 2013 15:00:00 -0800

Probably not news to anyone here, but any algorithm that can be expressed in 
n^x time (polynomial, including cases where x is 1 or less) can, more 
importantly, be expressed as a multi-core/thread/processor algorithm (of k 
cores) if the data can be segmented into k parts.  (n/k)^x per core can be a 
significant performance improvement if x > 1, but even if x = 1 or less, a 
data-partitioned algorithm is valuable considering the reality that there are 
more affordable quad-core 3 GHz processors than 12 GHz single-core processors.

So simple search usually has n/a run time, and is usually partitionable, and it 
can benefit from multi-core approaches, but there are costs to coordinating the 
threads and accumulating results.
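
As a rough illustration of the partition / search / accumulate pattern described above, here is a minimal Python sketch. The names partition, search_chunk and partitioned_search are made up for this example and are not from the wiki page or any library. It uses a thread pool only to keep the sketch short; that is an assumption of convenience, and for CPU-bound work in Python one would likely pass concurrent.futures.ProcessPoolExecutor instead.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, k):
    """Split data into k contiguous chunks, returned as (offset, chunk) pairs."""
    size, extra = divmod(len(data), k)
    chunks, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        chunks.append((start, data[start:end]))
        start = end
    return chunks


def search_chunk(job):
    """Linear search of one chunk; returns the global indices of matches."""
    offset, chunk, target = job
    return [offset + i for i, value in enumerate(chunk) if value == target]


def partitioned_search(data, target, k=4, executor_cls=ThreadPoolExecutor):
    """Search k chunks concurrently, then accumulate the partial results."""
    jobs = [(offset, chunk, target) for offset, chunk in partition(data, k)]
    with executor_cls(max_workers=k) as pool:
        parts = pool.map(search_chunk, jobs)  # results arrive in chunk order
        # Accumulation step: this is part of the coordination cost.
        return [i for part in parts for i in part]


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
hits = partitioned_search(data, 5)
```

Each worker only ever sees n/k items, and the final list comprehension is where the per-chunk answers are merged back together.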

The point, if you are looking for ideas to apply multi-core solutions to, is 
that you can do simple search as an example, or focus on any other problem that 
has a data partitionable solution.

________________________________
From: Joe Bogner <[email protected]>
To: [email protected] 
Sent: Saturday, November 23, 2013 9:46:24 PM
Subject: Re: [Jprogramming] wiki: task scheduler and zeromq intro

Can anyone share specific examples where it was needed to scale out to multiple 
cores and machines? I am interested in learning about the types of problems 
this would be applied to. I have read some examples while researching but 
haven't run into anyone who has.

For example, last week I had to create a database of the best 100,000 solutions 
out of 56 billion combinations as part of a work deliverable. I am sure there 
may have been more elegant solutions; however, brute forcing with 4 instances 
of R and 32 GB of RAM took 3 hours, which was fine.

It might be worthwhile to create a small reproducible example of a problem that 
would benefit from multiple cores and machines. I could make one up or borrow 
from somewhere else but does anyone have any examples that come to mind?
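
One possible small reproducible example, in the spirit of the "best N out of many combinations" job above, is sketched here in Python. Everything in it is an assumption for illustration: the score function is invented, the problem size is tiny, and the slices are processed one after another rather than by separate workers. The point is only the shape: each worker keeps its own top-n list for its slice of the combinations, and the partial lists are merged at the end.

```python
import heapq
import math
from itertools import combinations, islice


def score(combo):
    """Hypothetical scoring function; a real one would be problem specific."""
    return sum(combo) - 0.5 * max(combo)


def best_in_slice(items, r, start, stop, n):
    """Top n (score, combo) pairs among combinations number start..stop-1."""
    part = islice(combinations(items, r), start, stop)
    return heapq.nlargest(n, ((score(c), c) for c in part))


def best_overall(partials, n):
    """Merge the per-worker top-n lists into one global top-n list."""
    return heapq.nlargest(n, (pair for part in partials for pair in part))


items = list(range(1, 21))
r, n, workers = 3, 10, 4
total = math.comb(len(items), r)   # number of combinations to examine
step = -(-total // workers)        # ceiling division: slice size per worker
bounds = [(w * step, min((w + 1) * step, total)) for w in range(workers)]

# Each call below could run in its own process or on its own machine.
partials = [best_in_slice(items, r, lo, hi, n) for lo, hi in bounds]
top = best_overall(partials, n)
```

Slicing with islice makes every worker skip over the combinations before its start position, which is wasteful; partitioning on the first element of the combination would avoid that at the price of uneven slice sizes.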

On Sat, Nov 23, 2013 at 7:09 PM, Scott Locklin <[email protected]> wrote:
Dang, thanks for working on this, Pascal. This is one of the use cases I had in 
mind for J-ZMQ, though it was taking me some time to figure out how to do it in 
my clumsy way. I had been looking at joebo's fork thing for a little 
inspiration, but I got busy with a half dozen other things. I will try to play 
with this later next week. FWIW, one way to prevent the "miss the first 
message" problem is to fire up the SUB instances before you send them anything.

For what it is worth; I originally wrote these zeromq hooks for use in a ticker 
plant for my own P/L. I had a better idea while working on a consulting job 
recently. "Big data" and "the cloud" are in the news all the time now. Mostly, 
this is hype, but there are some real business needs involving extracting 
meaning from data sets which do not fit into core. The existing solutions are 
generally not real impressive (Hadoop) or not designed for weakly coupled 
"clouds" (MPI based solutions). Almost none of these "solutions" use ideas 
which are actually appropriate to big data (Vowpal Wabbit being a rare 
exception), and I have no inclination to contribute to these tools. 

J has several fast and flexible data stores (still learning about Jd; very 
impressed so far). J is also extremely memory efficient, and can do out of core 
calculations. I have not done any tests to see if ZMQ can get the data across 
the pipes well enough to do anything useful on AWS and other such weakly 
coupled "clouds" favored by business, but I think I can write some useful 
coarse grained parallel ML tools using ZMQ and J. This is perhaps arrogant of 
me to think about: my J skills are weak, this is my first ZMQ project, and it's 
rare I have time to think about big projects like this, but I do know how to 
build fast and scalable machine learning algorithms. I think this could be a 
way forward which solves important business problems. 

Stuff I don't know yet:

0) Stability: as Pascal noticed, J and ZMQ are sometimes unstable when used 
together. I haven't fired up GDB to find out why yet. It's probably a buffer 
allocation thing. It's possible there is a show stopper here: I don't know. It 
seems to work with Kx at least: https://github.com/jaeheum/qzmq

1) Fault tolerance: things are going to crash or blow up memory. Maybe Pascal's 
task manager is enough for now.

2) Data provisioning: I'm guessing I'll need a framework for provisioning each 
server with "owned" data, using Jd or JDB. I have to look at how other 
frameworks do this.

3) Software provisioning: J is pretty simple to set up, but if this is going to 
scale out to more than a couple of machines, some kind of tool will be needed 
to accomplish this. I know such tools exist, but I have no way of picking the 
"right one" at present (suggestions?). 

4) Security: many of the existing parallel analytics tools have none. CZMQ 
seems to provide some, but it looks unpleasant to use compared to the rest of 
ZMQ. This is low priority, since nobody else bothers with it.

Will I ever actually accomplish this? Probably not real quickly (too many day 
jobs), but I think it is an exciting potential use for J.

Oddly, this thread does not show up in Nabble, where I usually read the J-lists.

-Scott

> http://www.jsoftware.com/jwiki/PascalJasmin/OOP%20scheduler%20and%20ZeroMQ
>
> Thanks to Scott for getting this started.  It's a work in progress, but it's
> probably more helpful to see the simplest core version first, than just the
> bloated version.
>
> The scheduler is a framework for multitasking several polling (endless)
> loops within a single J instance. The simplest multithreading
> synchronization library is avoiding multithreading altogether, and an in
> process scheduler allows what are semantically separate processes to work
> together without concerning yourself about the possibilities of one process
> writing to a variable that is being read or written to by another process.
>
> It can integrate with other "real" multiprocessing setups by grouping
> together tasks that need tight cooperation. The canonical usefulness is for
> socket programming, which typically involves polling loops for each client
> and server that add testing tedium even with just a single client and
> server. The scheduler eases development and testing of several clients and
> servers all in a single application, and simplifies testing/learning of
> frameworks like ZeroMQ and its J implementation.
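
For readers who have not seen an in-process scheduler, the idea quoted above can be sketched in a few lines of Python using generators. This is not the J code from the wiki page; the names run, server and client are invented for the illustration, and the "socket" is just an in-memory queue. Each task is an endless (or finite) polling loop that gives up control at every yield, so only one task ever touches shared state at a time.

```python
from collections import deque


def run(tasks, max_steps):
    """Round-robin over generator tasks; each `yield` hands control back."""
    queue = deque(tasks)
    steps = 0
    while queue and steps < max_steps:
        task = queue.popleft()
        try:
            next(task)          # run the task up to its next yield
            queue.append(task)  # still alive: goes to the back of the queue
        except StopIteration:
            pass                # finished tasks are dropped
        steps += 1


def server(inbox, log):
    """Polling loop: handle at most one pending message per turn, forever."""
    while True:
        if inbox:
            log.append("got " + inbox.popleft())
        yield


def client(inbox, messages):
    """Send one message per turn, then finish."""
    for message in messages:
        inbox.append(message)
        yield


inbox, log = deque(), []
run([server(inbox, log), client(inbox, ["a", "b"])], max_steps=20)
```

Because tasks switch only at a yield, the server and client can share inbox without locks, which is the "avoid multithreading altogether" point made in the quoted text. The max_steps limit is there only so the sketch terminates.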

----------------------------------------------------------------------

For information about J forums see http://www.jsoftware.com/forums.htm

----------------------------------------------------------------------