Re: Advice regarding multiprocessing module
Hi Jean,

Below is the code where I am creating multiple processes:

if __name__ == '__main__':
    # List all files in the games directory
    files = list_sgf_files()
    # Read board configurations
    (intermediateBoards, finalizedBoards) = read_boards(files)
    # Initialize parameters
    param = Param()
    # Run maxItr iterations of gradient descent
    for itr in range(maxItr):
        # Each process analyzes one single data point.
        # They dump their gradient calculations in queue q.
        # Queue in Python is process safe.
        start_time = time.time()
        q = Queue()
        jobs = []
        # Create a process for each game board
        for i in range(len(files)):
            p = Process(target=TrainGoCRFIsingGibbs,
                        args=(intermediateBoards[i], finalizedBoards[i], param, q))
            p.start()
            jobs.append(p)
        # Blocking wait for each process to finish
        for p in jobs:
            p.join()
        elapsed_time = time.time() - start_time
        print 'Iteration: ', itr, '\tElapsed time: ', elapsed_time

As you recommended, I'll use the profiler to see which part of the code is slow.

Thanks,
Abhinav

On 03/11/2013 04:14 AM, Jean-Michel Pichavant wrote:

----- Original Message -----

Dear all,

I need some advice regarding use of the multiprocessing module. Following is the scenario:

* I am running gradient descent to estimate parameters of a pairwise grid CRF (a grid-based graphical model). There are 106 data points, and each data point can be analyzed in parallel.
* To calculate the gradient for each data point, I need to perform approximate inference since this is a loopy model. I am using Gibbs sampling.
* My grid is 9x9, so there are 81 variables that I sample in one sweep of Gibbs sampling. I perform 1000 iterations of Gibbs sampling.
* My laptop has a quad-core Intel i5 processor, so I thought that using the multiprocessing module I could parallelize my code (basically calculate the gradients in parallel on multiple cores simultaneously).
* I did not use the threading library because of the GIL, which does not allow multiple threads to run at a time.
* As a result I end up creating a process for each data point (instead of a thread, which I would ideally prefer, so as to avoid process-creation overhead).
* I am using basic NumPy array functionality.

Previously I was running this code in MATLAB, where it runs much faster: one iteration of gradient descent takes around 14 sec using a parfor loop (each data point is analyzed within the parallel loop). The same program takes almost 215 sec in Python. I am quite amazed at the slowness of the multiprocessing module. Is this because of process-creation overhead for each data point?

Please keep my email in the replies as I am not a member of this mailing list.

Thanks,
Abhinav

Hi,

Can you post some code, especially the part where you're creating/running the processes? If it's not too big, the process function as well. Either multiprocessing is slow as you stated, or you did something wrong.

Alternatively, if posting code is an issue, you can profile your Python code; it's very easy and effective at finding which part of the code is slowing everything down.

http://docs.python.org/2/library/profile.html

Cheers,

JM
--
http://mail.python.org/mailman/listinfo/python-list
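[Editor's note] JM's profiling suggestion can be sketched as follows with cProfile and pstats from the standard library. The function `slow_part` is a hypothetical stand-in for the per-board gradient routine, not the poster's actual code; the point is only the enable/disable/report pattern:

```python
import cProfile
import io
import pstats

def slow_part():
    # Hypothetical stand-in for the expensive per-board computation.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
slow_part()
profiler.disable()

# Report the ten most expensive calls, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The report lists each function with call counts and time spent, which quickly shows whether the time goes to the Gibbs sweeps themselves or to process start-up and data transfer.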
Advice regarding multiprocessing module
Dear all,

I need some advice regarding use of the multiprocessing module. Following is the scenario:

* I am running gradient descent to estimate parameters of a pairwise grid CRF (a grid-based graphical model). There are 106 data points, and each data point can be analyzed in parallel.
* To calculate the gradient for each data point, I need to perform approximate inference since this is a loopy model. I am using Gibbs sampling.
* My grid is 9x9, so there are 81 variables that I sample in one sweep of Gibbs sampling. I perform 1000 iterations of Gibbs sampling.
* My laptop has a quad-core Intel i5 processor, so I thought that using the multiprocessing module I could parallelize my code (basically calculate the gradients in parallel on multiple cores simultaneously).
* I did not use the threading library because of the GIL, which does not allow multiple threads to run at a time.
* As a result I end up creating a process for each data point (instead of a thread, which I would ideally prefer, so as to avoid process-creation overhead).
* I am using basic NumPy array functionality.

Previously I was running this code in MATLAB, where it runs much faster: one iteration of gradient descent takes around 14 sec using a parfor loop (each data point is analyzed within the parallel loop). The same program takes almost 215 sec in Python. I am quite amazed at the slowness of the multiprocessing module. Is this because of process-creation overhead for each data point?

Please keep my email in the replies as I am not a member of this mailing list.

Thanks,
Abhinav
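[Editor's note] The question above, whether per-data-point process creation causes the slowdown, suggests an alternative worth sketching: create a fixed pool of workers once and reuse it across gradient-descent iterations. The names below (`gradient_for_board`, the toy boards) are hypothetical stand-ins for the poster's `TrainGoCRFIsingGibbs` and data, so this is a pattern sketch, not the actual training code:

```python
from multiprocessing import Pool

def gradient_for_board(args):
    # Hypothetical stand-in for TrainGoCRFIsingGibbs: returns a toy
    # "gradient" so the pooling pattern can be shown end to end.
    board, param = args
    return sum(board) * param

def run_iterations(boards, param, max_itr):
    # The worker processes are created once and reused in every
    # iteration, instead of forking one process per data point
    # per gradient-descent iteration.
    with Pool(processes=4) as pool:
        grads = None
        for itr in range(max_itr):
            grads = pool.map(gradient_for_board,
                             [(b, param) for b in boards])
        return grads
```

Note that `pool.map` also collects the per-point results directly, which removes the need for an explicit process-safe Queue.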
Re: Anybody use web2py?
We have built Radbox.me on web2py and it's amazing.

> On Saturday, December 19, 2009 1:42 AM AppRe Godeck wrote:
> Just curious if anybody prefers web2py over django, and vice versa. I
> know it has been discussed on a flame war level a lot. I am looking for a
> more intellectual reasoning behind using one or the other.

>> On Saturday, December 19, 2009 3:48 PM Yarko wrote:
>> Chevy or Ford? (or whatever pair you prefer)
>> vi or emacs?
>> ...
>>
>> These hold one aspect.
>>
>> Hammer or a saw?
>>
>> Hold (perhaps) another...
>>
>> us.pycon.org, for example, uses both (in reality a mix of the above
>> argument sets, but at least evidence of the latter: different tools
>> for different problems).
>>
>> From a rapid prototyping perspective, web2py is heavily data-table
>> efficient: that is, you can define a system, and all the app creation,
>> form generation and validation have defaults out of the box, and you
>> can have a "sense" of your data-centric structure in minutes. The
>> same argument can go against ("how do I get it to do exactly what _I_
>> want it to, not what it wants to?") - that is, defaults hide things,
>> and that has two edges...
>>
>> From a layout/user interaction rapid prototyping perspective, web2py
>> is just entering the waters...
>>
>> There is a steady growth of users, and (as you would expect for a
>> young framework) a lot of changes going on (although backward
>> compatibility is a constant mantra when considering changes, that too
>> is a double-edged thing).
>>
>> I find web2py useful, fast, and at times / in areas not as evolved /
>> flexible as I'd like. BUT I could learn it quickly, and get to work
>> quickly.
>>
>> I have taken an intro Django course (at a PyCon), have built a few
>> things with it (not nearly as many as I have w/ web2py), and I _can_
>> do things in it - so I will let someone else w/ django "miles" under
>> their belt speak their mind.
>>
>> - Yarko

>>> On Saturday, December 19, 2009 3:51 PM Yarko wrote:
>>> Oh and one more thing: I find it dependable (not that snapshots do not
>>> have bugs, but that they are well defined, not "wild", and quickly
>>> fixed - and if you work around them, you can also depend on the system
>>> you have created). FYI, it does the money/registration part of PyCon
>>> (past 2 years).

On Saturday, December 19, 2009 6:32 PM mdipierro wrote:

Of course I am the most biased person in the world on this topic but perhaps you want to hear my bias. A little bit of history... I taught a Django course at DePaul University and built a CMS for the United Nations in Django. I loved it. Then I also learned RoR. I found RoR more intuitive and better for rapid prototyping. I found Django much faster and more solid. I decided to build a proof-of-concept system that was somewhere in between Django and Rails, with a focus on 3 features: 1) easy to start with (no installation, no configuration, web-based IDE, web-based testing, debugging, and database interface); 2) enforce good practice (MVC, postbacks); 3) secure (escape all output, talk to the database via the DAL to prevent injections, server-side cookies with uuid session keys, role-based access control with pluggable login methods, regex validation for all input including URLs).

Originally it was a proof of concept, mostly suitable for teaching. Then lots of people helped to make it better and turn it into a production system. Now it has more than 50 contributors and more than 20 companies that provide support.

There are some distinctive features of web2py vs Django. Some love them, some hate them (mostly people who did not try them):

- We promise backward compatibility. I do not accept patches that break it. It has been backward compatible for three years and I will enforce the copyright, if necessary, in order to ensure it for the future.

- In web2py models and controllers are not modules. They are not imported. They are executed.
This means you do not need to import basic web2py symbols. They are already defined in the environment that executes the models and controllers (like in Rails). This also means you do not need to restart the web server when you edit your app. You can import additional modules and you can define modules if you like.

- You have a web-based IDE with an editor, some conflict resolution, Mercurial integration, a ticketing system, and web-based testing and debugging.

- The Database Abstraction Layer (DAL) is closer to SQL than the Django ORM is. This means it does less for you (in particular regarding many-to-many relations) but it is more flexible when it comes to complex joins, aggregates and nested selects.

- The DAL supports out of the box SQLite, MySQL, PostgreSQL, MSSQL, Oracle, FireBird, FireBase, DB2, Informix, Ingres, an
Re: Python multithreading problem
Thanks guys. I solved the problem by moving self.stdmutex.acquire() before the check "if c<5:".
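[Editor's note] The fix described here, acquiring the lock before testing the shared counter so that the check and the increment happen atomically, can be illustrated with a minimal sketch (not the original crawler; the names and limit are chosen just for the illustration):

```python
import threading

lock = threading.Lock()
c = 0
LIMIT = 5

def worker():
    global c
    # Acquire the lock *before* checking the shared counter; otherwise
    # several threads can pass the `if c < LIMIT:` test simultaneously
    # and the counter overshoots the limit.
    with lock:
        if c < LIMIT:
            c += 1

threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c)
```

With the check inside the lock, exactly LIMIT of the 20 workers increment the counter; with the check outside (as in the original code), the outcome depends on thread timing.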
Python multithreading problem
# A CRAWLER IMPLEMENTATION

Please run this program from the shell, both normally and under the control of a debugger. When run normally, the program does not terminate: it never gets past the condition "if c<5:", so it continues infinitely. But when run under the control of a debugger, the program terminates when the condition "if c<5:" becomes false. I think this problem may be due to multithreading. Please help.

from sgmllib import SGMLParser
import threading
import re
import urllib
import pdb
import time

class urlist(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.list = []

    def start_a(self, attr):
        href = [v for k, v in attr if k == "href"]
        if href:
            self.list.extend(href)

mid = 2
c = 0

class mythread(threading.Thread):
    stdmutex = threading.Lock()
    global threads
    threads = []

    def __init__(self, u, myid):
        self.u = u
        self.myid = myid
        threading.Thread.__init__(self)

    def run(self):
        global c
        global mid
        if c < 5:
            self.stdmutex.acquire()
            self.usock = urllib.urlopen(self.u)
            self.p = urlist()
            self.s = self.usock.read()
            self.p.feed(self.s)
            self.usock.close()
            self.p.close()
            c = c + 1
            fname = "/root/" + str(c) + ".txt"
            self.f = open(fname, "w")
            self.f.write(self.s)
            self.f.close()
            print c
            print self.p.list
            print self.u
            print self.myid
            for j in self.p.list:
                k = re.search("^https?:", j)
                if k:
                    i = mythread(j, mid)
                    i.start()
                    threads.append(i)
                    mid = mid + 1
            self.stdmutex.release()

if __name__ == "__main__":
    thread = mythread("http://www.google.co.in/", 1)
    thread.start()
    threads.append(thread)
    for thread in threads:
        thread.join()
    print "main thread exits"
Python multithreading on cluster system? Embedding python in PVM?
Hi guys. I have read that one cannot perform true multithreading in Python due to the global interpreter lock mechanism. Suppose I have to implement a crawler on a cluster system like ClusterKnoppix, so that I can use a Parallel Virtual Machine (PVM) or, say, Open MPI for programming in a multiprocessor environment. Can I integrate Python with PVM or MPI? Can I embed Python into C for programming in a multiprocessor environment? Is there any way of embedding Python in PVM or MPI so that I can implement a true cluster-based search engine? Any help would be very kind. Thanks.
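[Editor's note] Python can indeed be driven by MPI: the third-party mpi4py package (an assumption here, not part of the standard library) wraps MPI implementations such as Open MPI. A sketch of splitting a URL list across cluster ranks; `crawl_with_mpi` is hypothetical and would only run under an MPI launcher:

```python
def partition(urls, rank, size):
    # Every rank takes a strided slice of the shared URL list.
    return urls[rank::size]

def crawl_with_mpi(urls):
    # Requires mpi4py; launch with e.g.:  mpiexec -n 4 python crawler.py
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    mine = partition(urls, comm.Get_rank(), comm.Get_size())
    for url in mine:
        print(url)  # placeholder: fetch and index the page here
```

Each node then runs an ordinary Python interpreter; MPI only handles process launch and communication, so no embedding in C is strictly required for this pattern.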
Re: web crawler in python or C?
It is DSL broadband, 128 kbps. But that's not the point. What I am asking is whether Python would be fine for implementing fast crawler algorithms, or whether I should use C. Handling huge data, multithreading, file handling, heuristics for ranking, and maintaining huge data structures: what should the language be so as not to compromise too much on speed? What is the performance of Python-based crawlers vs. C-based crawlers? Should I use both languages (partly C and partly Python)? How should I decide which part should be implemented in C and which should be done in Python? Please guide me. Thanks.
web crawler in python or C?
Hi guys. I have to implement a topical crawler as part of my project. Which language should I use, C or Python? Python has a fast development cycle, but my concern is speed as well; I want to strike a balance between development speed and crawler speed. Since Python is an interpreted language, it is rather slow, and the crawler, which will be working on a huge set of pages, should be as fast as possible. One possible implementation would be partly in C and partly in Python, so that I can have the best of both worlds, but I don't know how to approach that. Can anyone guide me on which parts I should implement in C and which should be in Python?
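[Editor's note] In a crawler the network is usually the bottleneck rather than the CPU, so the parsing side rarely needs C. A minimal link extractor in pure standard-library Python (using `html.parser`; in the Python 2 era of this thread the equivalent was `sgmllib`, which later Pythons removed):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href attributes from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<a href="http://example.com/">x</a><a href="/rel">y</a>')
print(parser.links)
```

A reasonable split, if profiling ever demands one, is to keep fetching, scheduling and parsing in Python and move only a measured hot spot (for example, a ranking inner loop) into a C extension.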
Re: Python vs C for a mail server
Ya, it's supposed to be some stupid 6-month project which my friend has to do; I am just helping him out. He may not be implementing a fully RFC-compliant mail server, but it may support some of the major functionality, so basically it's an ordinary need, nothing extraordinary. I just wanted to know which language would be better for the implementation and has the faster development cycle. I have heard a lot about Python and its ease of use. My point is that it should be worth a 6-month project with speedy development; since he is already proficient in C/C++ socket programming, taking the pain of learning Python should be worth the effort.
Python vs C for a mail server
Hello guys,

I am a novice in Python. I have to implement a full-fledged mail server, but I am not able to choose the language. Should I go for C (the socket API) or Python for this project? What are the advantages of one over the other in implementing this server? Which language will be easier? What are the performance issues? In what language are mail servers generally written?
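[Editor's note] One relevant comparison point: much of a mail server is protocol state handling, which Python expresses very compactly. Below is a hedged sketch of a tiny fragment of the SMTP dialogue (RFC 5321) as a pure function; the function name and states are invented for the illustration, and a real server additionally needs sockets, concurrency, message queuing, relaying rules, and much more:

```python
def smtp_reply(line, state):
    """Return (reply, new_state) for one client command.

    Covers only a toy subset of SMTP; not a real server.
    """
    verb = line.split(" ", 1)[0].upper()
    if verb in ("HELO", "EHLO"):
        return "250 Hello", "greeted"
    if verb == "MAIL" and state == "greeted":
        return "250 OK", "mail"
    if verb == "RCPT" and state == "mail":
        return "250 OK", "rcpt"
    if verb == "DATA" and state == "rcpt":
        return "354 End data with <CRLF>.<CRLF>", "data"
    if verb == "QUIT":
        return "221 Bye", "closed"
    return "500 Unrecognized command", state
```

Writing the same state machine over raw C sockets is certainly possible and faster at runtime, but the development-speed gap on logic like this is the usual argument for Python in such a project.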