Re: Advice regarding multiprocessing module

2013-03-11 Thread Abhinav M Kulkarni

Hi Jean,

Below is the code where I am creating multiple processes:

import time
from multiprocessing import Process, Queue

if __name__ == '__main__':
    # List all files in the games directory
    files = list_sgf_files()

    # Read board configurations
    (intermediateBoards, finalizedBoards) = read_boards(files)

    # Initialize parameters
    param = Param()

    # Run maxItr iterations of gradient descent
    for itr in range(maxItr):
        # Each process analyzes one single data point and dumps its
        # gradient calculations in queue q (multiprocessing.Queue is
        # process-safe).
        start_time = time.time()
        q = Queue()
        jobs = []
        # Create a process for each game board
        for i in range(len(files)):
            p = Process(target=TrainGoCRFIsingGibbs,
                        args=(intermediateBoards[i], finalizedBoards[i], param, q))
            p.start()
            jobs.append(p)
        # Blocking wait for each process to finish
        for p in jobs:
            p.join()
        elapsed_time = time.time() - start_time
        print 'Iteration: ', itr, '\tElapsed time: ', elapsed_time

As you recommended, I'll use the profiler to see which part of the code 
is slow.
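
One thing I will also try: creating a worker Pool once, outside the
gradient-descent loop, so each iteration only pays for pickling the
arguments rather than for spawning a fresh process per data point. A
minimal sketch, assuming TrainGoCRFIsingGibbs can be changed to return
its gradient instead of writing to a Queue:

import time
from multiprocessing import Pool

def compute_gradient(args):
    # One task per data point; assumes a hypothetical variant of
    # TrainGoCRFIsingGibbs that returns the gradient for this board pair.
    intermediate, finalized, param = args
    return TrainGoCRFIsingGibbs(intermediate, finalized, param)

if __name__ == '__main__':
    pool = Pool()  # defaults to one worker per CPU core
    for itr in range(maxItr):
        start_time = time.time()
        tasks = [(intermediateBoards[i], finalizedBoards[i], param)
                 for i in range(len(files))]
        # the same long-lived workers are reused every iteration
        gradients = pool.map(compute_gradient, tasks)
        # ... aggregate gradients and update param here ...
        print 'Iteration: ', itr, '\tElapsed time: ', time.time() - start_time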


Thanks,
Abhinav

On 03/11/2013 04:14 AM, Jean-Michel Pichavant wrote:

- Original Message -


Dear all,
I need some advice regarding use of the multiprocessing module.
Following is the scenario:
* I am running gradient descent to estimate parameters of a pairwise
grid CRF (a grid-based graphical model). There are 106 data
points. Each data point can be analyzed in parallel.
* To calculate the gradient for each data point, I need to perform
approximate inference since this is a loopy model. I am using Gibbs
sampling.
* My grid is 9x9 so there are 81 variables that I am sampling in one
sweep of Gibbs sampling. I perform 1000 iterations of Gibbs
sampling.
* My laptop has a quad-core Intel i5 processor, so I thought that with
the multiprocessing module I could parallelize my code (basically
calculate the gradients in parallel on multiple cores simultaneously).
* I did not use the threading library because of the GIL, which does
not allow more than one thread to execute Python bytecode at a time.
* As a result I end up creating a process for each data point (instead
of a thread, which I would ideally like to use so as to avoid
process-creation overhead).
* I am using basic NumPy array functionalities.
Previously I was running this code in MATLAB. It runs considerably
faster there: one iteration of gradient descent takes around 14 sec
using a parfor loop (a parallel loop - the data points are analyzed
inside it). However, the same program takes almost 215 sec in Python.
I am quite amazed at the slowness of the multiprocessing module. Is
this because of the process creation overhead for each data point?
Please keep my email in the replies as I am not a member of this
mailing list.
Thanks,
Abhinav

Hi,

Can you post some code, especially the part where you're creating/running the 
processes? If it's not too big, the process function as well.

Either multiprocessing really is slow as you stated, or something is wrong in the code.

Alternatively, if posting code is an issue, you can profile your Python code; 
it's very easy and effective at finding the part of the code that slows everything down.
http://docs.python.org/2/library/profile.html
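
For instance (a sketch; slow_function is just a placeholder for whatever
does the per-data-point work in your program):

import cProfile
# Profile one call of the suspected hot spot and sort by cumulative time.
cProfile.run('slow_function()', sort='cumulative')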

Cheers,

JM






Advice regarding multiprocessing module

2013-03-10 Thread Abhinav M Kulkarni

Dear all,

I need some advice regarding use of the multiprocessing module. 
Following is the scenario:


 * I am running gradient descent to estimate parameters of a pairwise
   grid CRF (a grid-based graphical model). There are 106 data
   points. Each data point can be analyzed in parallel.
 * To calculate the gradient for each data point, I need to perform
   approximate inference since this is a loopy model. I am using Gibbs
   sampling (see the sketch after this list).
 * My grid is 9x9, so there are 81 variables that I sample in one sweep
   of Gibbs sampling. I perform 1000 iterations of Gibbs sampling.
 * My laptop has a quad-core Intel i5 processor, so I thought that with
   the multiprocessing module I could parallelize my code (basically
   calculate the gradients in parallel on multiple cores simultaneously).
 * I did not use the threading library because of the GIL, which does
   not allow more than one thread to execute Python bytecode at a time.
 * As a result I end up creating a process for each data point (instead
   of a thread, which I would ideally like to use so as to avoid
   process-creation overhead).
 * I am using basic NumPy array functionality.
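
For concreteness, one sweep of my sampler looks roughly like this (a
simplified sketch; the coupling J and unary field h below are
hypothetical stand-ins for the actual CRF parameters):

import numpy as np

def gibbs_sweep(x, J, h, rng=np.random):
    # One sweep: resample each of the 81 spins of a 9x9 grid of
    # {-1, +1} variables conditioned on its 4-neighbourhood.
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            s = 0.0
            if i > 0:
                s += x[i - 1, j]
            if i < n - 1:
                s += x[i + 1, j]
            if j > 0:
                s += x[i, j - 1]
            if j < m - 1:
                s += x[i, j + 1]
            p = 1.0 / (1.0 + np.exp(-2.0 * (J * s + h[i, j])))
            x[i, j] = 1 if rng.rand() < p else -1
    return x

Each process runs 1000 such sweeps for its data point, so this loop is
the bulk of the per-process work.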

Previously I was running this code in MATLAB. It runs considerably faster 
there: one iteration of gradient descent takes around 14 sec using a parfor 
loop (a parallel loop - the data points are analyzed inside it). However, 
the same program takes almost 215 sec in Python.


I am quite amazed at the slowness of the multiprocessing module. Is this 
because of the process creation overhead for each data point?
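
One experiment I can run to check (a sketch timing only the process
churn, with workers that do nothing):

import time
from multiprocessing import Process

def noop():
    pass

if __name__ == '__main__':
    start = time.time()
    procs = [Process(target=noop) for _ in range(106)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print 'spawn+join of 106 no-op processes: %.2f sec' % (time.time() - start)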


Please keep my email in the replies as I am not a member of this mailing 
list.


Thanks,
Abhinav





Re: Anybody use web2py?

2011-03-03 Thread Abhinav Sood
We have built Radbox.me on web2py and it's amazing.

> On Saturday, December 19, 2009 1:42 AM AppRe Godeck wrote:

> Just curious if anybody prefers web2py over django, and vice versa. I
> know it has been discussed a lot at flame-war level. I am looking for a
> more intellectual reasoning behind using one or the other.


>> On Saturday, December 19, 2009 3:48 PM Yarko wrote:

>> Chevy or Ford?  (or whatever pair you prefer)
>> vi or emacs?
>> ...
>> 
>> These hold one aspect.
>> 
>> Hammer or a saw?
>> 
>> Hold (perhaps) another...
>> 
>> us.pycon.org, for example, uses both (in reality a mix of the above
>> argument sets, but at least evidence of the latter: different tools
>> for different problems).
>> 
>> From a rapid prototyping perspective, web2py is heavily data-table
>> efficient: that is, you can define a system, and all the app creation,
>> form generation and validation have defaults out of the box, so you
>> can have a "sense" of your data-centric structure in minutes. The
>> same argument can cut against it ("how do I get it to do exactly what
>> _I_ want it to, not what it wants to?") - that is, defaults hide
>> things, and that has two edges...
>> 
>> From a layout/user interaction rapid prototyping perspective, web2py
>> is just entering the waters...
>> 
>> There is a steady growth of users, and (as you would expect for a
>> young framework) a lot of changes going on (although backward
>> compatibility is a constant mantra when considering changes, that too
>> is a double-edged thing).
>> 
>> I find web2py useful, fast, and at times / in areas not as evolved /
>> flexible as I'd like.  BUT I could learn it quickly, and get to work
>> quickly.
>> 
>> I have taken an intro Django course (at a PyCon), have built a few
>> things with it (not nearly as many as I have w/ web2py), and I _can_
>> do things in it - so I will let someone else w/ django "miles" under
>> their belt speak their mind.
>> 
>> - Yarko


>>> On Saturday, December 19, 2009 3:51 PM Yarko wrote:

>>> Oh and one more thing: I find it dependable (not that snapshots do not
>>> have bugs, but that they are well defined, not "wild", and quickly
>>> fixed - and if you work around them, you can also depend on the system
>>> you have created).  FYI, it does the money/registration part of PyCon
>>> (past 2 years).


 On Saturday, December 19, 2009 6:32 PM mdipierro wrote:

 Of course I am the most biased person in the world on this topic but
 perhaps you want to hear my bias.
 
 A little bit of history... I taught a Django course at DePaul
 University and built a CMS for the United Nations in Django. I loved
 it. Then I also learned RoR. I found RoR more intuitive and better for
 rapid prototyping. I found Django much faster and more solid. I
 decided to build a proof of concept system that was somewhat in
 between Django and Rails with focus on 3 features: 1) easy to start
 with (no installation, no configuration, web based IDE, web based
 testing, debugging, and database interface); 2) enforce good practice
 (MVC, postbacks); 3) secure (escape all output, talk to the database via
 the DAL to prevent injections, server-side cookies with uuid session keys,
 role based access control with pluggable login methods, regex
 validation for all input including URLs).
 
 Originally it was a proof of concept, mostly suitable for teaching.
 Then lots of people helped to make it better and turn it into a
 production system. It now has more than 50 contributors and more than
 20 companies that provide support.
 
 There are some distinctive features of web2py vs Django. Some love
 them, some hate them (mostly people who did not try them):
 
 - We promise backward compatibility. I do not accept patches that
 break it. It has been backward compatible for three years and I will
 enforce the copyright, if necessary, in order to ensure it for the
 future.
 
 - In web2py models and controllers are not modules. They are not
 imported. They are executed. This means you do not need to import
 basic web2py symbols. They are already defined in the environment that
 executes the models and controllers (like in Rails). This also means
 you do not need to restart the web server when you edit your app. You
 can import additional modules and you can define modules if you like.
 
 - You have a web based IDE with editor, some conflict resolution,
 Mercurial integration, ticketing system, web-based testing and
 debugging.
 
 - The Database Abstraction Layer (DAL) is closer to SQL than the Django
 ORM is. This means it does less for you (in particular around
 many-to-many relations) but it is more flexible when it comes to
 complex joins, aggregates and nested selects.
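
 For example (a sketch; inside a web2py model file DAL and Field are
 already defined, and the table names here are made up):

 db = DAL('sqlite://storage.db')
 db.define_table('person', Field('name'))
 db.define_table('dog', Field('owner', db.person), Field('name'))

 # an explicit join, close to the SQL you would write by hand
 rows = db((db.dog.owner == db.person.id) &
           (db.person.name == 'Alex')).select(db.person.name, db.dog.name)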
 
 - The DAL supports out of the box SQLite, MySQL, PostgreSQL, MSSQL,
 Oracle, FireBird, FireBase, DB2, Informix, Ingres, an

Re: Python multithreading problem

2006-03-27 Thread abhinav
Thanks guys. I solved the problem by moving self.stdmutex.acquire()
before the if c<5: check.
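
The shape of the fix, as a self-contained sketch (not the actual
crawler code):

import threading

lock = threading.Lock()
c = 0

def fetch_one(url):
    global c
    lock.acquire()    # take the lock BEFORE testing c, so the read of c
    try:              # and the increment below form one atomic step
        if c < 5:
            c = c + 1
            print c, url  # fetching/parsing/spawning would go here
    finally:
        lock.release()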



Python multithreading problem

2006-03-26 Thread abhinav
// A CRAWLER IMPLEMENTATION
Please run this program both normally on the shell and under the
control of a debugger. When run normally the program does not
terminate: it never gets out of the if c<5: condition, so it continues
infinitely. But when run under the control of the debugger, the
program terminates once the if c<5: condition becomes false.
I think this problem may be due to multithreading; please help.


from sgmllib import SGMLParser
import threading
import re
import urllib
import pdb
import time

class urlist(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.list = []

    def start_a(self, attr):
        href = [v for k, v in attr if k == "href"]
        if href:
            self.list.extend(href)

mid = 2
c = 0

class mythread(threading.Thread):
    stdmutex = threading.Lock()  # one lock shared by all threads
    global threads               # "global" here makes threads a module-level list
    threads = []

    def __init__(self, u, myid):
        self.u = u
        self.myid = myid
        threading.Thread.__init__(self)

    def run(self):
        global c
        global mid
        # NOTE: c is tested here BEFORE the lock is taken, so several
        # threads can pass the test together (the race fixed in the
        # follow-up above).
        if c < 5:
            self.stdmutex.acquire()
            self.usock = urllib.urlopen(self.u)
            self.p = urlist()
            self.s = self.usock.read()
            self.p.feed(self.s)
            self.usock.close()
            self.p.close()
            c = c + 1
            fname = "/root/" + str(c) + ".txt"
            self.f = open(fname, "w")
            self.f.write(self.s)
            self.f.close()
            print c
            print self.p.list
            print self.u
            print self.myid
            for j in self.p.list:
                k = re.search("^https?:", j)
                if k:
                    i = mythread(j, mid)
                    i.start()
                    threads.append(i)
                    mid = mid + 1
            self.stdmutex.release()

if __name__ == "__main__":
    thread = mythread("http://www.google.co.in/", 1)
    thread.start()
    threads.append(thread)
    for thread in threads:
        thread.join()
    print "main thread exits"


Python multithreading on cluster system? Embedding python in PVM?

2006-02-19 Thread abhinav
Hi guys. I have read that one cannot perform true multithreading in
Python due to the global interpreter lock mechanism. Suppose I have to
implement a crawler on a cluster system like clusterknoppix, so that I
can program in a multiprocessor environment using the Parallel Virtual
Machine (PVM) or, say, Open MPI. Can I integrate Python with PVM or
MPI? Can I embed Python into C for programming in a multiprocessor
environment? Is there any way of embedding Python in PVM or MPI so
that I can implement a true cluster-based search engine?
Any help would be very kind. Thanks.
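
For example, this is the kind of program I would like to write (a
sketch assuming the mpi4py package, which wraps MPI for Python):

# run with: mpirun -np 4 python crawl_part.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# each process takes every size-th URL from a shared seed list
urls = ['http://example.com/page%d' % i for i in range(100)]
mine = urls[rank::size]
print 'process %d of %d crawls %d urls' % (rank, size, len(mine))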



Re: web crawler in python or C?

2006-02-15 Thread abhinav
It is DSL broadband, 128 kbps. But that's not the point. What I am
asking is whether Python would be fine for implementing fast crawler
algorithms, or whether I should use C: handling huge data,
multithreading, file handling, heuristics for ranking, and maintaining
huge data structures. Which language should I pick so as not to
compromise too much on speed? What is the performance of Python-based
crawlers vs C-based crawlers? Should I use both languages (partly C
and partly Python)? How should I decide which part should be
implemented in C and which in Python?
Please guide me. Thanks.
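
Here is how I plan to measure where the time actually goes in one
fetch-and-parse cycle (a sketch; if the network dominates, moving code
to C would buy little):

import re
import time
import urllib2

start = time.time()
html = urllib2.urlopen('http://www.python.org/').read()
t_net = time.time() - start

start = time.time()
links = re.findall(r'href="(https?://[^"]+)"', html)
t_parse = time.time() - start

print 'network: %.3f sec  parse: %.3f sec  links: %d' % (t_net, t_parse, len(links))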



web crawler in python or C?

2006-02-15 Thread abhinav
Hi guys. I have to implement a topical crawler as a part of my
project. What language should I implement it in, C or Python? Python
has a fast development cycle, but my concern is speed too. I want to
strike a balance between development speed and crawler speed. Since
Python is an interpreted language it is rather slow, and the crawler,
which will be working on a huge set of pages, should be as fast as
possible. One possible approach would be implementing it partly in C
and partly in Python so that I can have the best of both worlds, but I
don't know how to go about that. Can anyone guide me on which part I
should implement in C and which in Python?



Re: Python vs C for a mail server

2006-01-28 Thread abhinav
Yeah, it's supposed to be some stupid 6-month project which my friend
has to do; I am just helping him out. He may not be implementing a
fully RFC-compliant mail server, but it may support some of the major
functionality, so basically it is nothing out of the ordinary. I just
wanted to know which language would be better for the implementation
and has the faster development cycle. I have heard a lot about Python
and its ease of use. My point is that it should be worth a 6-month
project with speedy development: he is already proficient in C/C++
socket programming, so taking the pain of learning Python should be
worth the effort.



Python vs C for a mail server

2006-01-27 Thread abhinav
Hello guys,
I am a novice in Python. I have to implement a full-fledged mail
server, but I am not able to choose the language. Should I go for C
(the socket API) or Python for this project? What are the advantages
of one over the other in implementing this server? Which language will
be easier? What are the performance issues? In what language are mail
servers generally written?
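
From what I have seen so far, the Python standard library already
ships an SMTP server skeleton (a sketch of a minimal listener using
the smtpd module; a real server would add queuing, relaying and the
RFC corner cases):

import asyncore
import smtpd

class PrintingSMTPServer(smtpd.SMTPServer):
    def process_message(self, peer, mailfrom, rcpttos, data):
        # a real server would queue or relay here; we just log the envelope
        print 'from:', mailfrom, 'to:', rcpttos, '(%d bytes)' % len(data)

server = PrintingSMTPServer(('127.0.0.1', 2525), None)
asyncore.loop()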
