On 23May2017 21:14, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
OK guys thank you very much. It is better to sort them first.

Here is what I wrote

files =  glob.glob('*chunk*')

I'd be inclined to go with either '*chunk_*' or just to read the strings from os.listdir, because what you want isn't easily written as a glob pattern (the syntax just isn't expressive enough). The glob is handy, because it guarantees that there is an underscore in the name to split on, avoiding a tedious try/except around the split for names with no "_". Also below: notice that we're using rsplit, not split. You want to split at the rightmost underscore only. Consider the file "foo_chunk_9".
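To make the rsplit point concrete, here is a small sketch using that same name. It only exercises str.split and str.rsplit, nothing else is assumed:

```python
name = "foo_chunk_9"

# split with no limit breaks at every underscore, giving three pieces,
# so unpacking the result into two variables would raise ValueError.
print(name.split("_"))

# rsplit with maxsplit=1 breaks only at the rightmost underscore,
# so there are always exactly two pieces when an underscore is present.
left, right = name.rsplit("_", 1)
print(left, right)
```

The second form is what lets you write "left, right = ..." safely for any name the glob returns.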

So:

 import glob

 filenames = {}
 for name in glob.glob('*chunk_*'):
   # split at the rightmost underscore only
   left, right = name.rsplit('_', 1)
   if left.endswith('chunk') and right.isdigit():
     # key each name by the numeric value of its suffix
     filenames[int(right)] = name
 sorted_filenames = [ filenames[k] for k in sorted(filenames.keys()) ]

There's a few things to observe here:

- using glob to select names containing 'chunk_', which (a) ensures there is an underscore for the rsplit and (b) mostly picks only the files you want.

- using rsplit, to handle filenames with multiple underscores

- turning the suffix into an int, and storing the names keyed by the _numeric_ value of the suffix
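Those three points can be tried out without touching the filesystem. The following is only a sketch: the function name sort_chunk_names and the sample names are made up for illustration, and it takes a plain list of strings where the real code would use the glob result.

```python
def sort_chunk_names(names):
    """Return the names ending in 'chunk_<digits>', ordered by that number."""
    by_number = {}
    for name in names:
        if '_' not in name:
            # Nothing to split on; the '*chunk_*' glob would normally rule this out.
            continue
        left, right = name.rsplit('_', 1)
        if left.endswith('chunk') and right.isdigit():
            # Key by the integer, so 10 sorts after 2.
            by_number[int(right)] = name
    return [by_number[k] for k in sorted(by_number)]


# Made-up sample names, deliberately out of order.
names = ['foo_chunk_10', 'foo', 'foo_chunk_2', 'foo_chunk_0',
         'foo_chunk_1', 'chunk_notes']
print(sort_chunk_names(names))
```

One thing to be aware of with the dict approach: if two names carry the same number (say "a_chunk_1" and "b_chunk_1"), the later one silently replaces the earlier one. That is fine if all the chunks share one prefix, which is what I'm assuming here.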

sorted=[[int(name.split("_")[-1]), name] for name in files]
with open('final.txt', 'w') as outf:
    for fname in sorted:
        with open(fname[1]) as inf:
            for line in inf:
                outf.write(line)

A few remarks:

- try to avoid using "sorted" as a variable name; it is a built-in Python function, and assigning to that name hides it

- you're not doing any sorting! you have probably just been lucky with your filenames and the order they came back from the glob

Try making these files in your test directory:

 foo
 foo_chunk_0
 foo_chunk_1
 foo_chunk_2
 foo_chunk_10

and see what happens to your code. Temporarily drop the "with open..." and just print the filenames to see what order you would have processed the files.
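If you'd rather see the effect before creating any files, here is a rough sketch with a hard-coded list. The order of the list is hypothetical, standing in for whatever the glob happens to return on your system:

```python
# Hypothetical order, standing in for the result of glob.glob('*chunk*').
names = ['foo_chunk_10', 'foo_chunk_0', 'foo_chunk_2', 'foo_chunk_1']

# Building [number, name] pairs does not reorder anything by itself.
pairs = [[int(name.split('_')[-1]), name] for name in names]
unsorted_order = [name for _, name in pairs]
print(unsorted_order)

# Sorting the names as plain strings puts "10" ahead of "2".
string_order = sorted(names)
print(string_order)

# Sorting the pairs compares the integers first, which gives numeric order.
numeric_order = [name for _, name in sorted(pairs)]
print(numeric_order)
```

Only the last of the three prints comes out in the order you want for concatenating the chunks.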

Cheers,
Cameron Simpson <c...@zip.com.au>
--
https://mail.python.org/mailman/listinfo/python-list
