On 11 March 2013 14:57, Abhinav M Kulkarni <amkul...@uci.edu> wrote:
> Hi Jean,
>
> Below is the code where I am creating multiple processes:
>
> if __name__ == '__main__':
>     # List all files in the games directory
>     files = list_sgf_files()
>
>     # Read board configurations
>     (intermediateBoards, finalizedBoards) = read_boards(files)
>
>     # Initialize parameters
>     param = Param()
>
>     # Run maxItr iterations of gradient descent
>     for itr in range(maxItr):
>         # Each process analyzes one single data point
>         # They dump their gradient calculations in queue q
>         # Queue in Python is process safe
>         start_time = time.time()
>         q = Queue()
>         jobs = []
>         # Create a process for each game board
>         for i in range(len(files)):
>             p = Process(target=TrainGoCRFIsingGibbs,
>                         args=(intermediateBoards[i], finalizedBoards[i], param, q))
Use a multiprocessing.Pool for this, rather than creating one process for
each job, e.g.:

    from multiprocessing import Pool

    pool = Pool(4)  # 1 process for each core
    results = []
    for ib, fb in zip(intermediateBoards, finalizedBoards):
        # q is dropped from the args: a multiprocessing.Queue can't be
        # passed into a Pool worker, and apply_async hands the worker's
        # return value back anyway, so have TrainGoCRFIsingGibbs return
        # its gradient instead of putting it on the queue.
        results.append(pool.apply_async(TrainGoCRFIsingGibbs,
                                        args=(ib, fb, param)))
    pool.close()
    pool.join()
    # To retrieve the return values
    for r in results:
        print(r.get())

This will distribute your jobs over a fixed number of processes. You avoid
the overhead of creating and killing processes and the process switching
that occurs when you have more processes than cores.

>             p.start()
>             jobs.append(p)
>         # Blocking wait for each process to finish
>         for p in jobs:
>             p.join()
>         elapsed_time = time.time() - start_time
>         print 'Iteration: ', itr, '\tElapsed time: ', elapsed_time
>
> As you recommended, I'll use the profiler to see which part of the code is
> slow.

Do this without using multiprocessing first. Loosely you can hope that
multiprocessing will give you a factor-of-4 speedup, but no more. You
haven't reported a comparison of times with and without multiprocessing,
so it's not clear that this is the issue.

Oscar
--
http://mail.python.org/mailman/listinfo/python-list
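As a concrete starting point for the profiling and timing comparison Oscar
suggests, here is a minimal sketch. It assumes TrainGoCRFIsingGibbs is still
called with (intermediateBoards[i], finalizedBoards[i], param, q) as in the
original code and can be run on a single board without multiprocessing;
cProfile, pstats and time are all in the standard library:

    import cProfile
    import pstats
    import time
    from multiprocessing import Queue

    q = Queue()

    # Profile one serial call to see where the time actually goes.
    profiler = cProfile.Profile()
    profiler.enable()
    TrainGoCRFIsingGibbs(intermediateBoards[0], finalizedBoards[0], param, q)
    profiler.disable()

    # Show the 20 most expensive functions by cumulative time.
    pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)

    # Time one full serial pass over the boards, for comparison with the
    # elapsed time of the multiprocessing run.
    start_time = time.time()
    for ib, fb in zip(intermediateBoards, finalizedBoards):
        TrainGoCRFIsingGibbs(ib, fb, param, q)
    print('Serial elapsed time: %.2f s' % (time.time() - start_time))

Comparing the serial elapsed time against the Pool run shows whether the
bottleneck is really the per-board work or the process management around it.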