Re: Multiprocessing takes higher execution time

2009-01-08 Thread Nick Craig-Wood
Sibtey Mehdi sibt...@infotechsw.com wrote:
 I use multiprocessing to compare more than one set of files.
 
 For comparison each set of files (i.e. Old file1 Vs New file1)
 I create a process,
 
 Process(target=compare, args=(oldFile, newFile)).start()
 
 It takes 61 seconds execution time.
 
 When I do the same comparison without implementing
 multiprocessing, it takes 52 seconds execution time.

 The oldProjects and newProjects will contain zip files,
 i.e. (oldxyz1.zip, oldxyz2.zip, newxyz1.zip, newxyz2.zip).
 It will unzip both zip files, compare all the files between old
 and new (mdb files or txt files), and give the result.
 I do this comparison for n sets of zip files, and I assign each
 set's comparison to a process.

I had a brief look at the code and your use of multiprocessing looks
fine.

How many projects are you processing at once?  And how many MB of zip
files is it?  As reading zip files does lots of disk IO I would guess
it is disk limited rather than anything else, which explains why doing
many at once is actually slower (the disk has to do more seeks).
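A cheap way to test the disk-seek theory (my sketch, not code from the thread; `compare` and `timed_run` are hypothetical stand-ins for the real unzip-and-diff work) is to cap concurrency with a multiprocessing.Pool and time a few pool sizes:

```python
import time
from multiprocessing import Pool

def compare(pair):
    # stand-in for the real unzip-and-diff work (hypothetical)
    old_file, new_file = pair
    return old_file == new_file

def timed_run(pairs, workers):
    # run all comparisons with at most `workers` processes at once
    start = time.time()
    with Pool(processes=workers) as pool:
        results = pool.map(compare, pairs)
    return time.time() - start, results

if __name__ == '__main__':
    pairs = [('old%d.zip' % i, 'new%d.zip' % i) for i in range(10)]
    for workers in (1, 2, 4):
        elapsed, _ = timed_run(pairs, workers)
        print('%d worker(s): %.3fs' % (workers, elapsed))
```

If the timings stop improving (or get worse) past one or two workers, the job is IO-bound and extra processes are just adding seeks.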

-- 
Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick
--
http://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing takes higher execution time

2009-01-08 Thread James Mills
On Thu, Jan 8, 2009 at 7:31 PM, Nick Craig-Wood n...@craig-wood.com wrote:
(...)

 How many projects are you processing at once?  And how many MB of zip
 files is it?  As reading zip files does lots of disk IO I would guess
 it is disk limited rather than anything else, which explains why doing
 many at once is actually slower (the disk has to do more seeks).

If this is the case, this problem is not well suited to multiprocessing
but rather to distributed processing :)

--JamesMills


RE: Multiprocessing takes higher execution time

2009-01-08 Thread Sibtey Mehdi
Thanks Nick.

It processes 10-15 projects (i.e. 10-15 processes are started) at once. Each
zip file is 2-3 MB.

When I used a dual-core system, the execution time dropped from 61 seconds
to 55 seconds.

My dual-core system configuration is:
Pentium(R) D CPU 3.00GHz, 2.99GHz
1 GB RAM

Regards,
Gopal



-Original Message-
From: Nick Craig-Wood [mailto:n...@craig-wood.com] 
Sent: Thursday, January 08, 2009 3:01 PM
To: python-list@python.org
Subject: Re: Multiprocessing takes higher execution time

Sibtey Mehdi sibt...@infotechsw.com wrote:
 I use multiprocessing to compare more than one set of files.
 
 For comparison each set of files (i.e. Old file1 Vs New file1)
 I create a process,
 
 Process(target=compare, args=(oldFile, newFile)).start()
 
 It takes 61 seconds execution time.
 
 When I do the same comparison without implementing
 multiprocessing, it takes 52 seconds execution time.

 The oldProjects and newProjects will contain zip files,
 i.e. (oldxyz1.zip, oldxyz2.zip, newxyz1.zip, newxyz2.zip).
 It will unzip both zip files, compare all the files between old
 and new (mdb files or txt files), and give the result.
 I do this comparison for n sets of zip files, and I assign each
 set's comparison to a process.

I had a brief look at the code and your use of multiprocessing looks
fine.

How many projects are you processing at once?  And how many MB of zip
files is it?  As reading zip files does lots of disk IO I would guess
it is disk limited rather than anything else, which explains why doing
many at once is actually slower (the disk has to do more seeks).

-- 
Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick




Re: Multiprocessing takes higher execution time

2009-01-07 Thread Steve Holden
Sibtey Mehdi wrote:
 Hi,
 
  
 
 I use multiprocessing to compare more than one set of files.
 
 For comparison each set of files (i.e. Old file1 Vs New file1) I create
 a process,
 
 Process(target=compare, args=(oldFile, newFile)).start()
 
 It takes 61 seconds execution time.
 
  
 
 When I do the same comparison without implementing multiprocessing, it
 takes 52 seconds execution time.
 
  
 
 The parallel processing time should be less.
 
  
 
 I am not able to get any advantage from multiprocessing here.
 
  
 
 Any suggestions would be very helpful.
 
My first suggestion would be: show us some code. We aren't psychic, you
know.

regards
 Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/



Re: Multiprocessing takes higher execution time

2009-01-07 Thread Grant Edwards
On 2009-01-07, Steve Holden st...@holdenweb.com wrote:

 I use multiprocessing to compare more than one set of files.
 
 For comparison each set of files (i.e. Old file1 Vs New file1)
 I create a process,
 
 Process(target=compare, args=(oldFile, newFile)).start()
 
 It takes 61 seconds execution time.
 
 When I do the same comparison without implementing
 multiprocessing, it takes 52 seconds execution time.

 My first suggestion would be: show us some code. We aren't
 psychic, you know.

I am!

He's only got one processor, and he's just been bit by Amdahl's
law when P≈1 and S≈1.

There you have a perfectly psychic answer: an educated guess
camouflaged in plausible-sounding but mostly-bullshit buzzwords.
A better psychic would have avoided making that one falsifiable
statement (he's only got one processor).

-- 
Grant Edwards   grante Yow! Hello.  Just walk
  at   along and try NOT to think
   visi.comabout your INTESTINES being
   almost FORTY YARDS LONG!!


Re: Multiprocessing takes higher execution time

2009-01-07 Thread Nick Craig-Wood
Grant Edwards inva...@invalid wrote:
  On 2009-01-07, Steve Holden st...@holdenweb.com wrote:
 
  I use multiprocessing to compare more than one set of files.
  
  For comparison each set of files (i.e. Old file1 Vs New file1)
  I create a process,
  
  Process(target=compare, args=(oldFile, newFile)).start()
  
  It takes 61 seconds execution time.
  
  When I do the same comparison without implementing
  multiprocessing, it takes 52 seconds execution time.
 
  My first suggestion would be: show us some code. We aren't
  psychic, you know.
 
  I am!
 
  He's only got one processor, and he's just been bit by Amdahl's
  law when P≈1 and S≈1.
 
  There you have a perfectly psychic answer: an educated guess
  camouflaged in plausible-sounding but mostly-bullshit buzzwords.
  A better psychic would have avoided making that one falsifiable
  statement (he's only got one processor).

;-)

My guess would be that the job is IO bound rather than CPU bound, but
that is covered by Amdahl's Law too, where P is approximately 0 and N is
irrelevant...

Being IO bound explains why it takes longer with multiprocessing - it
causes more disk seeks to run an IO bound algorithm in parallel than
running it sequentially.
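Nick's point in numbers (my illustration, not code from the thread): Amdahl's law gives an overall speedup of 1/((1-P) + P/N) for parallelizable fraction P on N processors, so when P is near 0 the processor count hardly matters:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: p = parallelizable fraction, n = processor count."""
    return 1.0 / ((1.0 - p) + p / n)

# mostly CPU-bound, highly parallel work: two cores help a lot
print(amdahl_speedup(0.9, 2))   # ~1.82

# IO-bound work (P near 0): a second core is almost pointless
print(amdahl_speedup(0.1, 2))   # ~1.05
```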

-- 
Nick Craig-Wood n...@craig-wood.com -- http://www.craig-wood.com/nick


RE: Multiprocessing takes higher execution time

2009-01-07 Thread Sibtey Mehdi
Hello,

Please see the code I have sent in the attachment.
Any suggestions will be highly appreciated.

Thanks and Regards,
Gopal
-Original Message-
From: Grant Edwards [mailto:inva...@invalid] 
Sent: Wednesday, January 07, 2009 8:58 PM
To: python-list@python.org
Subject: Re: Multiprocessing takes higher execution time

On 2009-01-07, Steve Holden st...@holdenweb.com wrote:

 I use multiprocessing to compare more than one set of files.
 
 For comparison each set of files (i.e. Old file1 Vs New file1)
 I create a process,
 
 Process(target=compare, args=(oldFile, newFile)).start()
 
 It takes 61 seconds execution time.
 
 When I do the same comparison without implementing
 multiprocessing, it takes 52 seconds execution time.

 My first suggestion would be: show us some code. We aren't
 psychic, you know.

I am!

He's only got one processor, and he's just been bit by Amdahl's
law when P≈1 and S≈1.

There you have a perfectly psychic answer: an educated guess
camouflaged in plausible-sounding but mostly-bullshit buzzwords.
A better psychic would have avoided making that one falsifiable
statement (he's only got one processor).

-- 
Grant Edwards   grante Yow! Hello.  Just walk
  at   along and try NOT to think
   visi.comabout your INTESTINES being
   almost FORTY YARDS LONG!!


The oldProjects and newProjects will contain zip files,
i.e. (oldxyz1.zip, oldxyz2.zip, newxyz1.zip, newxyz2.zip).
It will unzip both zip files, compare all the files between old and new
(mdb files or txt files), and give the result.
I do this comparison for n sets of zip files, and I assign each set's
comparison to a process.





import os
import time
from multiprocessing import Process, Queue


class CompareProjects(dict):
    """Compares the sets of projects (zip files)."""

    def __init__(self, oldProjects, newProjects, ignoreOidFields, tempDir):
        self.oldProjects = oldProjects
        self.newProjects = newProjects

    def _compare(self, tempDir, ignoreOidFields):
        """Compares each project in its own process."""
        projects = set(self.oldProjects.keys()).union(set(self.newProjects.keys()))
        progress.totalProjects = len(projects)  # 'progress' is defined elsewhere
        progress.progress = 0
        que = Queue()
        for count, project in enumerate(projects):
            # dict.get already returns None for missing keys
            oldProject = self.oldProjects.get(project)
            newProject = self.newProjects.get(project)
            prj = '_'.join((os.path.basename(oldProject)[:-4],
                            os.path.basename(newProject)[:-4]))
            cmpProj = CompareProject(oldProject, newProject, ignoreOidFields,
                                     tempDir)
            p = Process(target=cmpProj._compare,
                        args=(os.path.join(tempDir, prj), ignoreOidFields,
                              False, project, que))
            p.start()
            print 'pid', p.pid

        # collect one result per project from the worker queue
        while progress.totalProjects != len(self):
            if not que.empty():
                proj, cmpCitect = que.get_nowait()
                self[proj] = cmpCitect
                progress.progress += 1
            else:
                time.sleep(0.001)


class CompareProject(object):
    """Compares two projects."""

    def __init__(self, oldProject, newProject, ignoreOidFields, tempDir,
                 unitCompare=False):
        self.oldProject = oldProject
        self.newProject = newProject

    def _compare(self, tempDir, ignoreOidFields, unitCompare, project=None,
                 que=None):
        """Compares the extracted .mdb files and txt files."""
        oldProjectDir = os.path.join(tempDir, 'oldProject')
        newProjectDir = os.path.join(tempDir, 'newProject')

        # get .mdb files and txt files from the project
        oldmdbFiles, oldTxtFiles = self.getFiles(self.oldProject, oldProjectDir)
        newmdbFiles, newTxtFiles = self.getFiles(self.newProject, newProjectDir)

        # start comparing mdb files and txt files
        self.comparedTables = ComparedTables(oldmdbFiles, newmdbFiles,
                                             ignoreOidFields, tempDir)
        self.comparedTxtFiles = ComparedTextFiles(oldTxtFiles, newTxtFiles,
                                                  tempDir)
        if que and project:
            que.put_nowait((project, self))


class ComparedTables(dict):
    """Compares two mdb files table by table and gives the results."""


class ComparedTextFiles(dict):
    """Compares two txt files line by line and gives the diff results."""
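One detail worth noting in the collection loop of CompareProjects._compare above: it polls que.empty() and sleeps. Since the parent knows how many results to expect, a blocking que.get() does the same job without the busy-wait. A minimal sketch under that assumption (names here are hypothetical stand-ins, not the attachment's code):

```python
from multiprocessing import Process, Queue

def worker(name, que):
    # stand-in for CompareProject._compare posting its result
    que.put((name, name.upper()))

def collect(names):
    que = Queue()
    for name in names:
        Process(target=worker, args=(name, que)).start()
    results = {}
    for _ in names:
        # blocking get: no empty()/sleep polling loop needed
        key, value = que.get()
        results[key] = value
    return results

if __name__ == '__main__':
    print(collect(['alpha', 'beta']))
```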