On Apr 9, 10:15 am, sturlamolden <sturlamol...@yahoo.no> wrote:

> On 9 apr, 09:36, John Ladasky <lada...@my-deja.com> wrote:
>
> > Thanks for finding my discussion! Yes, it's about passing numpy
> > arrays to multiple processors. I'll accomplish that any way that I
> > can.
>
> My preferred ways of doing this are:
>
> 1. Most cases for parallel processing are covered by libraries, even
> for neural nets. This particularly involves linear algebra solvers and
> FFTs, or calling certain expensive functions (sin, cos, exp) over and
> over again. The solution here is optimised LAPACK and BLAS (Intel MKL,
> AMD ACML, GotoBLAS, ATLAS, Cray libsci), optimised FFTs (FFTW, Intel
> MKL, ACML), and fast vector math libraries (Intel VML, ACML). For
> example, if you want to make multiple calls to the function "exp",
> there is a good chance you want to use a vector math library. Despite
> this, most Python programmers' instinct seems to be to use multiple
> processes with numpy.exp or math.exp, or to use multiple threads in C
> with exp from libm (cf. math.h). Why go through this pain when a
> single function call to Intel VML or AMD ACML (acml-vm) will be much
> better? It is common to see scholars argue that "yes, but my needs are
> so special that I need to customise everything myself." Usually this
> translates to "I don't know these libraries (not even that they exist)
> and am happy to reinvent the wheel."
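[A minimal sketch of the point about batched math calls, using plain numpy rather than VML/ACML directly (an MKL/VML-backed numpy build would accelerate the same call further). The array size and names here are illustrative, not from the thread.]

```python
import math

import numpy as np

x = np.linspace(0.0, 1.0, 100_000)

# One library call: the whole array is evaluated in optimized native code.
y_vec = np.exp(x)

# The per-element alternative pays Python interpreter overhead on every call,
# which is the "pain" being described above.
y_loop = np.array([math.exp(v) for v in x])

assert np.allclose(y_vec, y_loop)
```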
Whoa, Sturla. That was a proper core dump! You're right, I'm unfamiliar with the VAST array of libraries you have just described. I will have to look at them. It's true, I probably only know the largest and most widely-used Python libraries. There are so many, who can keep track?

Now, I do have a special need. I've implemented a modified version of the Fahlman Cascade-Correlation algorithm that will not be found in any existing library, and which I think should be superior for certain types of problems. (I might even be able to publish this algorithm, if I can get it working and show some examples.) That doesn't mean I can't use the vector math libraries you've recommended. As long as those libraries can take advantage of my extra computing power, I'm interested.

Note, however, that cascade evaluation does have a strong sequential requirement. It's not a traditional three-layer network. In fact, describing a cascade network by the number of "layers" it has is not very meaningful, because each hidden node is essentially its own layer. So there are limited advantages to trying to parallelize the evaluation of ONE cascade network's weights against ONE input vector.

However, evaluating one cascade network's output against several different test inputs simultaneously should scale up nicely. Evaluating many test inputs is exactly what you do when training a network on a data set, and so this is how my program is being designed.

> Thus, if you think you need to
> use manually managed threads or processes for parallel technical
> computing, and even contemplate that the GIL might get in your way,
> there is a 99% chance you are wrong. You will almost ALWAYS want to
> use a fast library, either directly in Python or linked to your own
> serial C or Fortran code. You have probably heard that "premature
> optimisation is the root of all evil in computer programming." It
> particularly applies here.
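[To illustrate the "sequential across units, parallel across inputs" structure being described: a toy sketch of cascade evaluation, with hypothetical weight shapes. Each hidden unit sees the inputs plus all earlier units, so the unit loop cannot be parallelized, but every step vectorizes over a batch of test inputs.]

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_batch = 3, 4, 8

# Hypothetical cascade weights: hidden unit k sees the raw inputs
# plus the outputs of units 0..k-1 (so its weight vector grows with k).
hidden_w = [rng.normal(size=n_inputs + k) for k in range(n_hidden)]

def evaluate(batch):
    """Evaluate the cascade on a whole batch of input vectors at once.

    The loop over hidden units is inherently sequential (each unit is
    its own "layer"), but each matrix-vector product runs over the
    entire batch axis in one vectorized call.
    """
    acts = batch                                  # shape (n_batch, n_inputs)
    for w in hidden_w:
        unit_out = np.tanh(acts @ w)              # shape (n_batch,)
        acts = np.column_stack([acts, unit_out])  # cascade: feed unit forward
    return acts[:, n_inputs:]                     # hidden-unit outputs

batch = rng.normal(size=(n_batch, n_inputs))
out = evaluate(batch)
assert out.shape == (n_batch, n_hidden)
```

Scoring one input at a time gives identical numbers; the batch form just amortizes interpreter overhead over many test inputs.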
Well, I thought that NUMPY was that fast library...

Funny how this works, though -- I built my neural net class in Python, rather than avoiding numpy and going straight to wrapping code in C, precisely because I wanted to AVOID premature optimization (for unknown, and questionable, gains in performance). I started on this project when I had only a single-core CPU, though. Now that multi-core CPUs are apparently here to stay, and I've seen just how long my program takes to run, I want to make full use of multiple cores. I've even looked at MPI. I'm considering networking to another multi-CPU machine down the hall, once I have my program working.

> But again, I'd urge you to consider a library or threads
> (threading.Thread in Cython or OpenMP) before you consider multiple
> processes.

My single-CPU neural net training program had two threads, one for the GUI and one for the neural network computations. Correct me if I'm wrong here, but -- since the two threads share a single Python interpreter, this means that only a single CPU is used, right? I'm looking at multiprocessing for this reason.

> The reason I have not updated the sharedmem arrays for two
> years is that I have come to the conclusion that there are better ways
> to do this (particularly vendor-tuned libraries). But since they are
> mostly useful with 64-bit (i.e. large arrays), I'll post an update
> soon.
>
> If you decide to use a multithreaded solution (or shared memory as
> IPC), beware of "false sharing". If multiple processors write to the
> same cache line (they can be up to 512 bytes depending on hardware),
> you'll create an invisible "GIL" that will kill any scalability. That
> is because dirty cache lines need to be synchronized with RAM. "False
> sharing" is one of the major reasons that "home-brewed" compute-
> intensive code will not scale.

Even though I'm not formally trained in computer science, I am very conscious of the fact that WRITING to shared memory is a problem, cache or otherwise.
At the very top of this thread, I pointed out that my neural network training function would need READ-ONLY access to two items -- the network weights, and the input data. Given that, and my (temporary) struggles with pickling, I considered the shared-memory approach as an alternative.

> It is not uncommon to see Java programmers complain about Python's
> GIL, and then they go on to write i/o-bound or falsely shared code.
> Rest assured that multi-threaded Java will not scale better than
> Python in these cases :-)

I've never been a Java programmer, and I hope it stays that way!
--
http://mail.python.org/mailman/listinfo/python-list