Re: [Jprogramming] wiki: task scheduler and zeromq intro

Devon McCormick Sun, 24 Nov 2013 15:42:23 -0800

Most stochastic solutions are very amenable to multi-core.
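
As a rough illustration of that point (a sketch only, not something from the wiki page): a Monte Carlo estimate of pi splits into k independent chunks, each with its own random stream, and the only coordination is summing the counts at the end. The names `count_hits`, `estimate_pi` and the `map_fn` parameter are made up for this example.

```python
import random

def count_hits(job):
    """Count random points in the unit square that land inside the quarter circle."""
    seed, samples = job
    rng = random.Random(seed)  # each partition gets its own generator
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(total_samples, k, base_seed=0, map_fn=map):
    """Split the sampling into k independent jobs and combine the counts."""
    base, extra = divmod(total_samples, k)
    sizes = [base + (1 if i < extra else 0) for i in range(k)]
    jobs = [(base_seed + i, size) for i, size in enumerate(sizes)]
    hits = sum(map_fn(count_hits, jobs))  # the only shared step is this sum
    return 4.0 * hits / total_samples
```

With the default `map_fn=map` this runs on one core; passing the `map` method of a `multiprocessing.Pool` instead should spread the same jobs over several cores, since the jobs share no state.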


On Sun, Nov 24, 2013 at 5:58 PM, Pascal Jasmin <[email protected]> wrote:

> Probably not news to anyone here, but any algorithm that can be expressed
> in n ^ x time (polynomial, including cases where x is 1 or less) can, more
> importantly, be expressed as a multi-core/thread/processor algorithm (of k
> cores) if the data can be segmented into k parts.  (n/k) ^ x can be a
> significant performance improvement if x > 1, but even if x = 1 or less, a
> data-partitioned algorithm is valuable considering the reality that there
> are more affordable quad-core 3 GHz processors than 12 GHz single-core
> processors.
>
> So simple search usually has n/a run time, and is usually partitionable,
> and it can benefit from multi-core approaches, but there are costs to
> coordinating the threads and accumulating results.
>
> The point, if you are looking for ideas to apply multi-core solutions to,
> is that you can do simple search as an example, or focus on any other
> problem that has a data partitionable solution.
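
A minimal sketch of the partitioned simple search described above, assuming the data is an in-memory list and the match is plain equality. The helper names (`split`, `search_chunk`, `partitioned_search`) and the `map_fn` parameter are hypothetical. The scatter and gather steps are where the coordination and accumulation costs show up.

```python
def split(data, k):
    """Split data into k contiguous chunks, returned as (offset, chunk) pairs."""
    base, extra = divmod(len(data), k)
    chunks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append((start, data[start:start + size]))
        start += size
    return chunks

def search_chunk(job):
    """Return the global indices of every match inside one chunk."""
    offset, chunk, target = job
    return [offset + i for i, item in enumerate(chunk) if item == target]

def partitioned_search(data, target, k, map_fn=map):
    """Search k partitions independently, then accumulate the results."""
    jobs = [(offset, chunk, target) for offset, chunk in split(data, k)]
    partial_results = map_fn(search_chunk, jobs)   # scatter: one job per partition
    return [i for part in partial_results for i in part]  # gather

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
matches = partitioned_search(data, 5, k=3)
```

Each job scans only its own n/k items, so with k workers behind `map_fn` the scan time per worker drops accordingly, while copying the chunks out and merging the index lists is overhead that a single-core search does not pay.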
>
>
>
> ________________________________
> From: Joe Bogner <[email protected]>
> To: [email protected]
> Sent: Saturday, November 23, 2013 9:46:24 PM
> Subject: Re: [Jprogramming] wiki: task scheduler and zeromq intro
>
>
> Can anyone share specific examples where it was needed to scale out to
> multiple cores and machines? I am interested in learning about the types of
> problems this would be applied to. I have read some examples while
> researching but haven't run into anyone who has.
>
>
>
>
> For example, last week I had to create a database of the best 100,000
> solutions out of 56 billion combinations as part of a work deliverable. I am
> sure there may have been more elegant solutions; however, brute forcing with
> 4 instances of R and 32 GB of RAM took 3 hours, which was fine.
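
The "best N out of a huge number of combinations" job has a simple partitioned shape: each worker keeps only its own best N, and the partial lists are merged at the end. The sketch below is a toy version in Python, not the R code that was actually used. `WEIGHTS`, `score` and the partition-by-first-index rule are invented for illustration.

```python
import heapq
from itertools import combinations

WEIGHTS = [7, 3, 9, 1, 4, 8, 2, 6]  # hypothetical data, stands in for the real problem

def score(combo):
    """Hypothetical objective: sum of the weights of the chosen indices."""
    return sum(WEIGHTS[i] for i in combo)

def partition(worker, workers, size=3):
    """Yield the combinations assigned to one worker (chosen by first index)."""
    for combo in combinations(range(len(WEIGHTS)), size):
        if combo[0] % workers == worker:
            yield combo

def best_overall(n, workers=4):
    """Take the top n per partition, then merge the partial winners."""
    partials = [heapq.nlargest(n, partition(w, workers), key=score)
                for w in range(workers)]
    merged = (combo for part in partials for combo in part)
    return heapq.nlargest(n, merged, key=score)
```

Because every global winner is also a winner inside its own partition, merging the per-partition top n gives the same scores as one pass over everything, and each worker never holds more than n candidates in memory. Here each partition is computed in turn; in practice each call to `partition` would run in its own process or on its own machine.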
>
>
>
> It might be worthwhile to create a small reproducible example of a problem
> that would benefit from multiple cores and machines. I could make one up or
> borrow from somewhere else but does anyone have any examples that come to
> mind?
>
> On Sat, Nov 23, 2013 at 7:09 PM, Scott Locklin <[email protected]> wrote:
> Dang, thanks for working on this, Pascal. This is one of the use cases I
> had in mind for J-ZMQ, though it was taking me some time to figure out how
> to do it in my clumsy way. I had been looking at joebo's fork thing for a
> little inspiration, but I got busy with a half dozen other things. I will
> try to play with this later next week. FWIW, one way to prevent the "miss
> the first message" problem is to fire up the SUB instances before you send
> them anything.
>
>
>
> For what it is worth; I originally wrote these zeromq hooks for use in a
> ticker plant for my own P/L. I had a better idea while working on a
> consulting job recently. "Big data" and "the cloud" are in the news all the
> time now. Mostly, this is hype, but there are some real business needs
> involving extracting meaning from data sets which do not fit into core. The
> existing solutions are generally not real impressive (Hadoop) or not
> designed for weakly coupled "clouds" (MPI based solutions). Almost none of
> these "solutions" use ideas which are actually appropriate to big data
> (Vowpal Wabbit being a rare exception), and I have no inclination to
> contribute to these tools.
>
>
>
> J has several fast and flexible data stores (still learning about Jd; very
> impressed so far). J is also extremely memory efficient, and can do out of
> core calculations. I have not done any tests to see if ZMQ can get the data
> across the pipes well enough to do anything useful on AWS and other such
> weakly coupled "clouds" favored by business, but I think I can write some
> useful coarse grained parallel ML tools using ZMQ and J. This is perhaps
> arrogant of me to think about: my J skills are weak, this is my first ZMQ
> project, and it's rare I have time to think about big projects like this,
> but I do know how to build fast and scalable machine learning algorithms. I
> think this could be a way forward which solves important business problems.
>
>
> Stuff I don't know yet:
>
> 0) Stability: as Pascal noticed, J and ZMQ are sometimes unstable when
> used together. I haven't fired up GDB to find out why yet. It's probably a
> buffer allocation thing. It's possible there is a show stopper here: I
> don't know. It seems to work with Kx at least:
> https://github.com/jaeheum/qzmq
>
> 1) Fault tolerance: things are going to crash or blow up memory. Maybe
> Pascal's task manager is enough for now.
>
> 2) Data provisioning: I'm guessing I'll need a framework for provisioning
> each server with "owned" data, using Jd or JDB. I have to look at how other
> frameworks do this.
>
> 3) Software provisioning: J is pretty simple to set up, but if this is
> going to scale out to more than a couple of machines, some kind of tool
> will be needed to accomplish this. I know such tools exist, but I have no
> way of picking the "right one" at present (suggestions?).
>
> 4) Security: many of the existing parallel analytics tools have none.
>
> CZMQ seems to provide some, but it looks unpleasant to use compared to the
> rest of ZMQ. This is low priority, since nobody else bothers with it.
>
>
>
> Will I ever actually accomplish this? Probably not real quickly (too many
> day jobs), but I think it is an exciting potential use for J.
>
>
> Oddly, this thread does not show up in Nabble, where I usually read the
> J-lists.
>
>
>
> -Scott
>
>
>
> > http://www.jsoftware.com/jwiki/PascalJasmin/OOP%20scheduler%20and%20ZeroMQ
> >
> > Thanks to Scott for getting this started.  It's a work in progress, but
> > it's probably more helpful to see the simplest core version first, rather
> > than just the bloated version.
> >
> > The scheduler is a framework for multitasking several polling (endless)
> > loops within a single J instance. The simplest multithreading
> > synchronization library is avoiding multithreading altogether, and an
> > in-process scheduler allows what are semantically separate processes to
> > work together without concerning yourself about the possibility of one
> > process writing to a variable that is being read or written to by another
> > process.
> >
> > It can integrate with other "real" multiprocessing setups by grouping
> > together tasks that need tight cooperation. The canonical use is for
> > socket programming, which typically involves polling loops for each client
> > and server that add testing tedium even with just a single client and
> > server. The scheduler eases development and testing of several clients
> > and servers all in a single application, and simplifies testing/learning
> > of frameworks like ZeroMQ and its J implementation.
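
For readers who have not seen an in-process scheduler, here is a rough Python analogy of the idea, not the J code on the wiki page: each polling loop is a generator that yields after one pass, and a round-robin loop resumes them in turn inside a single thread. The `Scheduler` class and the `server` and `client` tasks are hypothetical, and a `deque` stands in for a socket.

```python
from collections import deque

class Scheduler:
    """Round-robin scheduler for generator-based polling loops."""

    def __init__(self):
        self.tasks = deque()

    def add(self, task):
        self.tasks.append(task)

    def run(self, max_steps):
        """Resume tasks in turn, one poll each, for at most max_steps resumes."""
        for _ in range(max_steps):
            if not self.tasks:
                break
            task = self.tasks.popleft()
            try:
                next(task)            # let the task do one pass of its loop
            except StopIteration:
                continue              # finished tasks are dropped
            self.tasks.append(task)   # otherwise it goes to the back of the queue

def server(inbox, log):
    while True:                       # endless polling loop
        if inbox:
            log.append("served " + inbox.popleft())
        yield                         # hand control back to the scheduler

def client(inbox, name, count):
    for i in range(count):
        inbox.append(f"{name}-{i}")
        yield

inbox, log = deque(), []
sched = Scheduler()
sched.add(client(inbox, "a", 2))
sched.add(server(inbox, log))
sched.run(max_steps=10)
```

Only one task runs at a time and it gives up control only at `yield`, so `inbox` and `log` can be shared without locks, which is the property the description above relies on.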
>
> ----------------------------------------------------------------------
>
> For information about J forums see http://www.jsoftware.com/forums.htm
>



-- 
Devon McCormick, CFA
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
