[Tutor] recursive glob -- recursive dir walk
Hello,

A follow-up ;-) to my previous question about glob.glob().

I need to 'glob' files recursively from a top dir (parameter). I tried to use os.walk, but the structure of its return value is really unhandy for such a use (strange, because it seems to me this precise use is typical). On the other hand, os.path.walk seemed to meet my needs, but it is deprecated.

I'd like to know if there are standard tools to do that, and I'd welcome your comments on the 2 approaches below.

Thank you,
denis

-1- I first wrote the following recurseDirGlob() tool func.

    import os, glob

    def dirGlob(dir, pattern):
        ''' File names matching pattern in directory dir. '''
        fullPattern = os.path.join(dir, pattern)
        return glob.glob(fullPattern)

    def recurseDirGlob(topdir=None, pattern="*.*", nest=False, verbose=False):
        ''' File names matching pattern in topdir and all its subdirs.
            With nest=True, each subdir's results are appended as a nested list. '''
        allFilenames = list()
        # current dir (resolve the default before printing it)
        if topdir is None:
            topdir = os.getcwd()
        if verbose:
            print("*** %s" % topdir)
        filenames = dirGlob(topdir, pattern)
        if verbose:
            for filename in [os.path.basename(d) for d in filenames]:
                print("    %s" % filename)
        allFilenames.extend(filenames)
        # possible sub dirs
        names = [os.path.join(topdir, dir) for dir in os.listdir(topdir)]
        dirs = [n for n in names if os.path.isdir(n)]
        if verbose:
            print("-- %s" % [os.path.basename(d) for d in dirs])
        for dir in dirs:
            filenames = recurseDirGlob(dir, pattern, nest, verbose)
            if nest:
                allFilenames.append(filenames)
            else:
                allFilenames.extend(filenames)
        # final result
        return allFilenames

Example with the following dir structure; the version with nest=True will recursively nest files from subdirs.
    d0
        d01
        d02
            d020

(2 .txt files and 1 with a different pattern in each dir)

    recurseDirGlob("/home/spir/prog/d0", "*.txt", verbose=True)
    -->
    *** /home/spir/prog/d0
        t01.txt
        t02.txt
    -- ['d01', 'd02']
    *** /home/spir/prog/d0/d01
        t011.txt
        t012.txt
    -- []
    *** /home/spir/prog/d0/d02
        t021.txt
        t022.txt
    -- ['d020']
    *** /home/spir/prog/d0/d02/d020
        t0201.txt
        t0202.txt
    -- []
    ['/home/spir/prog/d0/t01.txt', '/home/spir/prog/d0/t02.txt',
     '/home/spir/prog/d0/d01/t011.txt', '/home/spir/prog/d0/d01/t012.txt',
     '/home/spir/prog/d0/d02/t021.txt', '/home/spir/prog/d0/d02/t022.txt',
     '/home/spir/prog/d0/d02/d020/t0201.txt', '/home/spir/prog/d0/d02/d020/t0202.txt']

    recurseDirGlob("/home/spir/prog/d0", "*.txt")
    -->
    ['/home/spir/prog/d0/t01.txt', '/home/spir/prog/d0/t02.txt',
     '/home/spir/prog/d0/d01/t011.txt', '/home/spir/prog/d0/d01/t012.txt',
     '/home/spir/prog/d0/d02/t021.txt', '/home/spir/prog/d0/d02/t022.txt',
     '/home/spir/prog/d0/d02/d020/t0201.txt', '/home/spir/prog/d0/d02/d020/t0202.txt']

    recurseDirGlob("/home/spir/prog/d0", "*.txt", nest=True)
    -->
    ['/home/spir/prog/d0/t01.txt', '/home/spir/prog/d0/t02.txt',
     ['/home/spir/prog/d0/d01/t011.txt', '/home/spir/prog/d0/d01/t012.txt'],
     ['/home/spir/prog/d0/d02/t021.txt', '/home/spir/prog/d0/d02/t022.txt',
      ['/home/spir/prog/d0/d02/d020/t0201.txt', '/home/spir/prog/d0/d02/d020/t0202.txt']]]

-2- Another approach was to build a general 'dirWalk' tool func, similar to os.path.walk:

    def dirWalk(topdir=None, func=None, args=(), nest=False, verbose=False):
        ''' Apply func(dir, *args) to topdir and all its subdirs; collect the results. '''
        allResults = list()
        # current dir (resolve the default before printing it)
        if topdir is None:
            topdir = os.getcwd()
        if verbose:
            print("*** %s" % topdir)
        results = func(topdir, *args)
        if verbose:
            print("    %s" % results)
        allResults.extend(results)
        # possible sub dirs
        names = [os.path.join(topdir, dir) for dir in os.listdir(topdir)]
        dirs = [n for n in names if os.path.isdir(n)]
        if verbose:
            print("-- %s" % [os.path.basename(d) for d in dirs])
        for dir in dirs:
            results = dirWalk(dir, func, args, nest, verbose)
            if nest:
                allResults.append(results)
            else:
                allResults.extend(results)
        # final allResults
        return allResults

Example calls that bring the same results, using dirGlob, would be:

    dirWalk("/home/spir/prog/d0", dirGlob, args=["*.txt"], verbose=True)
    dirWalk("/home/spir/prog/d0", dirGlob, args=["*.txt"])
    dirWalk("/home/spir/prog/d0", dirGlob, args=["*.txt"],
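For the flat (nest=False) case, the recursion that both functions above do by hand can be left to os.walk(), which already visits the top directory and every directory below it. The sketch below is one way to do that in Python 3; dir_walk and dir_glob are hypothetical names mirroring dirWalk and dirGlob, not an existing API, and the nested nest=True layout is not reproduced.

```python
import glob
import os


def dir_glob(dirpath, pattern):
    """File names matching pattern in directory dirpath (same idea as dirGlob)."""
    return glob.glob(os.path.join(dirpath, pattern))


def dir_walk(topdir, func, *args):
    """Apply func(dirpath, *args) to topdir and every directory below it.

    os.walk() does the recursion; the results of every call are
    collected into one flat list, like dirWalk(..., nest=False).
    """
    results = []
    for dirpath, dirnames, filenames in os.walk(topdir):
        # Sorting dirnames in place (top-down walk) makes os.walk
        # visit subdirectories in alphabetical order.
        dirnames.sort()
        results.extend(func(dirpath, *args))
    return results
```

Called as dir_walk("/home/spir/prog/d0", dir_glob, "*.txt"), it should collect the same eight paths as recurseDirGlob(..., "*.txt"), with directories visited top-down; the order of names within one directory comes from glob.glob() and is not guaranteed.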
Re: [Tutor] recursive glob -- recursive dir walk
2009/6/10 spir <denis.s...@free.fr>:
> A follow-up ;-) from previous question about glob.glob().

Hopefully no misunderstanding this time :-)

> I need to 'glob' files recursively from a top dir (parameter). Tried to use
> os.walk, but the structure of its return value is really unhandy for such a
> use (strange, because it seems to me this precise use is typical). On the
> other hand, os.path.walk seemed to meet my needs, but it is deprecated.

Is it really deprecated? It is still in the 3.0 docs with no mention of this.

> I'd like to know if there are standard tools to do that. And your comments
> on the 2 approaches below.

Well, this is what I came up with, which I am sure someone can improve on.

    >>> import glob, os
    >>> patern = '*.txt'
    >>> topdir = 'C:\\GTK\\'
    >>> textfiles = [f[0] for f in [glob.glob(os.path.join(d[0], patern)) for d in os.walk(topdir)] if f]
    >>> textfiles
    ['C:\\GTK\\license.txt']

Greets
Sander
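Sander's list comprehension keeps only f[0], the first match in each directory, so a directory holding two .txt files (as in Denis's d0 tree) contributes just one path. A minimal flattened variant, assuming the goal is every match; find_all_matches is a hypothetical name:

```python
import glob
import os


def find_all_matches(topdir, pattern="*.txt"):
    """Every path under topdir (topdir included) whose name matches pattern.

    One glob per directory visited by os.walk(); all matches are kept,
    not just the first one found in each directory.
    """
    return [match
            for dirpath, _dirnames, _filenames in os.walk(topdir)
            for match in glob.glob(os.path.join(dirpath, pattern))]
```

Note that glob.glob() also returns directories whose names happen to match the pattern, whereas Kent's fnmatch version only tests entries of filenames.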
Re: [Tutor] recursive glob -- recursive dir walk
On Wed, Jun 10, 2009 at 2:28 AM, spir <denis.s...@free.fr> wrote:
> Hello,
> A follow-up ;-) from previous question about glob.glob().
> I need to 'glob' files recursively from a top dir (parameter). Tried to use
> os.walk, but the structure of its return value is really unhandy for such a
> use (strange, because it seems to me this precise use is typical). On the
> other hand, os.path.walk seemed to meet my needs, but it is deprecated.
> I'd like to know if there are standard tools to do that. And your comments
> on the 2 approaches below.

I would use os.walk(), with fnmatch.fnmatch() to do the pattern matching, and write the function as a generator (using yield). It would look something like this (untested):

    import os, fnmatch

    def findFiles(topDir, pattern):
        for dirpath, dirnames, filenames in os.walk(topDir):
            for filename in filenames:
                if fnmatch.fnmatch(filename, pattern):
                    yield os.path.join(dirpath, filename)

To get a list of matches you would call

    list(findFiles(topDir, pattern))

but if you just want to iterate over the paths you don't need the list.

Kent
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
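Kent marked his generator as untested. The sketch below exercises it unchanged against a throwaway copy of Denis's d0 layout in a temporary directory; build_sample_tree and the file names a.txt, b.txt and c.dat are made up for the demo, and exist_ok needs Python 3.2 or later.

```python
import fnmatch
import os
import tempfile


def findFiles(topDir, pattern):
    """Kent's generator: yield paths under topDir whose file name matches pattern."""
    for dirpath, dirnames, filenames in os.walk(topDir):
        for filename in filenames:
            if fnmatch.fnmatch(filename, pattern):
                yield os.path.join(dirpath, filename)


def build_sample_tree(root):
    """Recreate the d0 layout: two .txt files and one other file per directory."""
    for sub in ("", "d01", "d02", os.path.join("d02", "d020")):
        d = os.path.join(root, sub)
        os.makedirs(d, exist_ok=True)
        for name in ("a.txt", "b.txt", "c.dat"):
            with open(os.path.join(d, name), "w"):
                pass  # an empty file is enough for matching


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_sample_tree(root)
        # list() is only needed because we want to sort the paths;
        # a plain for loop over findFiles() would consume them lazily.
        for path in sorted(findFiles(root, "*.txt")):
            print(os.path.relpath(path, root))
```

On this tree the generator should yield the eight .txt paths and skip every c.dat.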
Re: [Tutor] recursive glob -- recursive dir walk
spir wrote:
> Hello,
> A follow-up ;-) from previous question about glob.glob().
> I need to 'glob' files recursively from a top dir (parameter). Tried to use
> os.walk, but the structure of its return value is really unhandy for such a
> use (strange, because it seems to me this precise use is typical). On the
> other hand, os.path.walk seemed to meet my needs, but it is deprecated.

I often use Fredrik Lundh's implementation when I need a recursive 'glob'. And even though it was contributed some time ago, it appears to be 3.x compatible.

http://mail.python.org/pipermail/python-list/2001-February/069987.html

HTH,
Marty
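Python releases after this thread added a standard-library answer to Denis's question: since Python 3.5, glob.glob() accepts recursive=True, in which "**" matches any number of directory levels (including none, so files directly in the top dir are found too), and pathlib.Path.rglob() (Python 3.4+) performs the same kind of search. A small sketch of both; the two function names are hypothetical wrappers, and glob.escape() keeps characters such as [ in topdir from being read as wildcards.

```python
import glob
import os
from pathlib import Path


def recursive_glob(topdir, pattern):
    """Paths under topdir (topdir included) whose name matches pattern."""
    # "**" plus recursive=True descends into every subdirectory.
    return glob.glob(os.path.join(glob.escape(topdir), "**", pattern),
                     recursive=True)


def recursive_glob_pathlib(topdir, pattern):
    """Same search with pathlib: rglob(pattern) acts like glob("**/" + pattern)."""
    return [str(p) for p in Path(topdir).rglob(pattern)]
```

Neither function promises any particular order, so sort the result if it matters. By default glob.glob() also skips names that begin with a dot, so results can differ from the os.walk()-based versions when hidden files or directories are present.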