Most stochastic solutions are very amenable to multi-core.
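
A minimal sketch of why, in Python rather than J purely for illustration. It estimates pi by Monte Carlo sampling, splits the samples into one chunk per worker, and sums the per-chunk hit counts at the end. The names `count_hits` and `estimate_pi` are made up for this example, and seeding each chunk with its index is an assumption that keeps the run reproducible, not a recommendation for serious work.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_hits(args):
    """Count random points in the unit square that fall inside the quarter circle."""
    seed, samples = args
    rng = random.Random(seed)  # each chunk gets its own reproducible stream
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples, workers=4, executor_cls=ProcessPoolExecutor):
    """Split the samples into `workers` chunks, run them in a pool, combine the counts.

    ThreadPoolExecutor can be passed as executor_cls for a quick check without
    spawning processes (it will not use several cores for this workload).
    """
    base, extra = divmod(total_samples, workers)
    # Chunk i is seeded with i; the first `extra` chunks take one extra sample.
    chunks = [(i, base + (1 if i < extra else 0)) for i in range(workers)]
    with executor_cls(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, chunks))
    return 4.0 * hits / total_samples
```

The chunks share nothing while they run, so the only coordination cost is starting the workers and adding up a handful of integers. With the default process pool, call `estimate_pi(4_000_000, workers=4)` from under an `if __name__ == "__main__":` guard so that worker start-up behaves on every platform.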

On Sun, Nov 24, 2013 at 5:58 PM, Pascal Jasmin <[email protected]> wrote:

> Probably not news to anyone here, but any algorithm that can be expressed
> in n ^ x time (polynomial, including cases where x is 1 or less) can, more
> importantly, be expressed as a multi-core/thread/processor algorithm (of k
> cores) if the data can be segmented into k parts. (n/k) ^ x can be a
> significant performance improvement if x > 1, but even if x = 1 or less, a
> data-partitioned algorithm is valuable considering the reality that there
> are more affordable quad-core 3 GHz processors than 12 GHz single-core
> processors.
>
> So simple search usually has n/a run time, and is usually partitionable,
> and it can benefit from multi-core approaches, but there are costs to
> coordinating the threads and accumulating results.
>
> The point, if you are looking for ideas to apply multi-core solutions to,
> is that you can do simple search as an example, or focus on any other
> problem that has a data-partitionable solution.
>
> ________________________________
> From: Joe Bogner <[email protected]>
> To: [email protected]
> Sent: Saturday, November 23, 2013 9:46:24 PM
> Subject: Re: [Jprogramming] wiki: task scheduler and zeromq intro
>
> Can anyone share specific examples where it was needed to scale out to
> multiple cores and machines? I am interested in learning about the types
> of problems this would be applied to. I have read some examples while
> researching but haven't run into anyone who has.
>
> For example, last week I had to create a database of the best 100,000
> solutions out of 56 billion combinations as part of a work deliverable.
> I am sure there may have been more elegant solutions; however, brute
> forcing with 4 instances of R and 32 GB of RAM took 3 hours, which was
> fine.
>
> It might be worthwhile to create a small reproducible example of a problem
> that would benefit from multiple cores and machines.
> I could make one up or borrow from somewhere else, but does anyone have
> any examples that come to mind?
>
> On Sat, Nov 23, 2013 at 7:09 PM, Scott Locklin <[email protected]> wrote:
>
> > Dang, thanks for working on this, Pascal. This is one of the use cases
> > I had in mind for J-ZMQ, though it was taking me some time to figure
> > out how to do it in my clumsy way. I had been looking at joebo's fork
> > thing for a little inspiration, but I got busy with a half dozen other
> > things. I will try to play with this later next week. FWIW, one way to
> > prevent the "miss the first message" problem is to fire up the SUB
> > instances before you send them anything.
> >
> > For what it is worth: I originally wrote these zeromq hooks for use in
> > a ticker plant for my own P/L. I had a better idea while working on a
> > consulting job recently. "Big data" and "the cloud" are in the news all
> > the time now. Mostly, this is hype, but there are some real business
> > needs involving extracting meaning from data sets which do not fit into
> > core. The existing solutions are generally not real impressive (Hadoop)
> > or not designed for weakly coupled "clouds" (MPI-based solutions).
> > Almost none of these "solutions" use ideas which are actually
> > appropriate to big data (Vowpal Wabbit being a rare exception), and I
> > have no inclination to contribute to these tools.
> >
> > J has several fast and flexible data stores (still learning about Jd;
> > very impressed so far). J is also extremely memory efficient, and can
> > do out-of-core calculations. I have not done any tests to see if ZMQ
> > can get the data across the pipes well enough to do anything useful on
> > AWS and other such weakly coupled "clouds" favored by business, but I
> > think I can write some useful coarse-grained parallel ML tools using
> > ZMQ and J.
> > This is perhaps arrogant of me to think about: my J skills are weak,
> > this is my first ZMQ project, and it's rare I have time to think about
> > big projects like this, but I do know how to build fast and scalable
> > machine learning algorithms. I think this could be a way forward which
> > solves important business problems.
> >
> > Stuff I don't know yet:
> >
> > 0) Stability: as Pascal noticed, J and ZMQ are sometimes unstable when
> > used together. I haven't fired up GDB to find out why yet. It's
> > probably a buffer allocation thing. It's possible there is a show
> > stopper here: I don't know. It seems to work with Kx at least:
> > https://github.com/jaeheum/qzmq
> >
> > 1) Fault tolerance: things are going to crash or blow up memory. Maybe
> > Pascal's task manager is enough for now.
> >
> > 2) Data provisioning: I'm guessing I'll need a framework for
> > provisioning each server with "owned" data, using Jd or JDB. I have to
> > look at how other frameworks do this.
> >
> > 3) Software provisioning: J is pretty simple to set up, but if this is
> > going to scale out to more than a couple of machines, some kind of tool
> > will be needed to accomplish this. I know such tools exist, but I have
> > no way of picking the "right one" at present (suggestions?).
> >
> > 4) Security: many of the existing parallel analytics tools have none.
> > CZMQ seems to provide some, but it looks unpleasant to use compared to
> > the rest of ZMQ. This is low priority, since nobody else bothers with
> > it.
> >
> > Will I ever actually accomplish this? Probably not real quickly (too
> > many day jobs), but I think it is an exciting potential use for J.
> >
> > Oddly, this thread does not show up in Nabble, where I usually read the
> > J-lists.
> >
> > -Scott
> >
> > > http://www.jsoftware.com/jwiki/PascalJasmin/OOP%20scheduler%20and%20ZeroMQ
> > >
> > > Thanks to Scott for getting this started.
> > > It's a work in progress, but it's probably more helpful to see the
> > > simplest core version first than just the bloated version.
> > >
> > > The scheduler is a framework for multitasking several polling
> > > (endless) loops within a single J instance. The simplest
> > > multithreading synchronization library is avoiding multithreading
> > > altogether, and an in-process scheduler allows what are semantically
> > > separate processes to work together without concerning yourself about
> > > the possibility of one process writing to a variable that is being
> > > read or written to by another process.
> > >
> > > It can integrate with other "real" multiprocessing setups by grouping
> > > together tasks that need tight cooperation. The canonical usefulness
> > > is for socket programming, which typically involves polling loops for
> > > each client and server that add testing tedium even with just a
> > > single client and server. The scheduler eases development and testing
> > > of several clients and servers all in a single application, and
> > > simplifies testing/learning of frameworks like ZeroMQ and its J
> > > implementation.

--
Devon McCormick, CFA
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
