On Jan 25, 2011, at 8:19 PM, Craig Yoshioka wrote: > Hi all, > > I could really use some help with a problem I'm having.
Hiya Craig, I don't know if I can help, but it's really difficult to do without a full working example. Also, your code has several OS X-isms in it so I guess that's the platform you're on. But in case you're on Windows, note that that platform requires some extra care when using multiprocessing: http://docs.python.org/library/multiprocessing.html#windows Good luck Philip > I wrote a function that can take a pattern of actions and it apply it to the > filesystem. > It takes a list of starting paths, and a pattern like this: > > pattern = { > InGlob('Test/**'):{ > MatchRemove('DS_Store'):[], > NoMatchAdd('(alhpaID_)|(DS_Store)','warnings'):[], > MatchAdd('alphaID_','alpha_found'):[], > InDir('alphaID_'):{ > NoMatchAdd('(betaID_)|(DS_Store)','warnings'):[], > InDir('betaID_'):{ > NoMatchAdd('(gammaID_)|(DS_Store)','warnings'):[], > MatchAdd('gammaID_','gamma_found'):[] }}}} > > so if you run evalFSPattern(['Volumes/**'],pattern) it'll return a dictionary > where: > > dict['gamma_found'] = [list of paths that matched] (i.e. > '/Volumes/HD1/Test/alphaID_3382/betaID_38824/gammaID_848384') > dict['warning'] = [list of paths that failed to match] (ie. > '/Volumes/HD1/Test/alphaID_3382/gammaID_47383') > > Since some of these volumes are on network shares I also wanted to > parallelize this so that it would not block on IO. I started the > parallelization by using multiprocessing.Pool and got it to work if I ran the > fsparser from the interpreter. It ran in *much* less time and produced > correct output that matched the non-parallelized version. The problem begins > if I then try to use the parallelized function from within the code. > > For example I wrote a class whose instances are created around valid FS > paths, that are cached to reduce expensive FS lookups. > > class Experiment(object): > > SlidePaths = None > > @classmethod > def getSlidePaths(cls): > if cls.SlidePaths == None: > cls.SlidePaths = fsparser(['/Volumes/**'],pattern) > return cls.SlidePaths > > @classmethod > def lookupPathWithGammaID(cls,id): > paths = cls.getSlidePaths() > ... > return paths[selected] > > @classmethod > def fromGamaID(cls,id): > path = cls.lookupPathWithGammaID(id) > return cls(path) > > def __init__(self,path) > self.Path = path > ... > > ... > > If I do the following from the interpreter it works: > >>>> from experiment import Experiment >>>> expt = Experiment.fromGammaID(10102) > > but if I write a script called test.py: > > from experiment import Experiment > expt1 = Experiment.fromGammaID(10102) > expt2 = Experiment.fromGammaID(10103) > comparison = expt1.compareTo(expt2) > > it fails, if I try to import it or run it from bash prompt: > >>>> from test import comparison (hangs forever) > $ python test.py (hangs forever) > > I would really like some help trying to figure this out... I thought it > should work easily since all the spawned processes don't share data or state > (their results are merged in the main thread). The classes used in the > pattern are also simple python objects (use python primitives). > > > These are the main functions: > > def mapAction(pool,paths,action): > merge = {'next':[]} > for result in pool.map(action,paths): > if result == None: > continue > merge = mergeDicts(merge,result) > return merge > > > def mergeDicts(d1,d2): > for key in d2: > if key not in d1: > d1[key] = d2[key] > else: > d1[key] += d2[key] > return d1 > > > def evalFSPattern(paths,pattern): > pool = Pool(10) > results = {} > for action in pattern: > tomerge1 = mapAction(pool,paths,action) > tomerge2 = evalFSPattern(tomerge1['next'],pattern[action]) > del tomerge1['next'] > results = mergeDicts(results,tomerge1) > results = mergeDicts(results,tomerge2) > return results > > the classes used in the pattern (InGlob,NoMatchAdd,etc.) are callable classes > that take a single parameter (a path) and return a dict result or None which > makes them trivial to adapt to Pool.map. > > Note if I change the mapAction function to: > > def mapAction(pool,paths,action): > merge = {'next':[]} > for path in paths: > result = action(path) > if result == None: > continue > merge = mergeDicts(merge,result) > return merge > > everything works just fine. > > > Thanks. > > > -- > http://mail.python.org/mailman/listinfo/python-list -- http://mail.python.org/mailman/listinfo/python-list