[EMAIL PROTECTED] wrote:
Hi group,

I have a basic question on the zip built-in function.

I am writing a simple text file comparison script, that compares line
by line and character by character. The output is the original file,
with an X in place of any characters that are different.

I have managed a solution for a fixed (3) number of files, but I want
a solution of any number of input files.

The outline of my solution:

        for vec in zip(vec_list[0],vec_list[1],vec_list[2]):
            res = ''
            for entry in zip(vec[0],vec[1],vec[2]):
                if len(set(entry)) > 1:
                    res = res+'X'
                else:
                    res = res+entry[0]
            outfile.write(res)

So vec is a tuple containing a line from each file, and then entry is
a tuple containing a character from each line.

2 questions
1) What is the general solution? Using zip in this way looks wrong. Is
there another function that does what I want?

zip(*vec_list) will zip together all entries in vec_list
Do be aware that zip stops at the shortest iterable. So if vec[1] is shorter than vec[0] and otherwise matches, your output line will be truncated. And if vec[1] is longer and vec[0] matches as far as it goes, the extra characters are silently ignored, so no difference is flagged.
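
To make the two points above concrete, here is a minimal sketch, assuming Python 3 (where itertools.zip_longest is available; 2.6 calls it izip_longest). The function name mask_differences and the mark parameter are made up for illustration and are not part of the original script. It unpacks any number of lines with * and pads the shorter ones, so length mismatches get flagged instead of dropped.

```python
from itertools import zip_longest

def mask_differences(lines, mark="X"):
    """Return one line with `mark` at every position where the inputs disagree.

    `lines` is any sequence of strings (hypothetical helper, for illustration).
    """
    out = []
    # fillvalue=None can never equal a real character, so a position that
    # exists in only some of the lines always counts as a difference.
    for chars in zip_longest(*lines, fillvalue=None):
        if len(set(chars)) > 1:
            out.append(mark)
        else:
            out.append(chars[0])
    return "".join(out)

# Plain zip stops at the shortest input: only 3 tuples here, not 5.
print(len(list(zip("abc", "abcde"))))

# Works for any number of lines, and the length mismatch is marked.
print(mask_differences(["abcd", "abxd", "abcd"]))
print(mask_differences(["abc", "abcde"]))
```

Whether padding is the right policy depends on what you want the output to show. If the first file is the reference, you may prefer to keep its length and ignore anything beyond it.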

res=res+whatever can be written as res+=whatever

2) I am using set to remove any repeated characters. Is there a
"better" way ?

I might have written a third loop to compare vec[0] to vec[1]..., but your set solution is easier and prettier.

If speed is an issue, don't rebuild the output line char by char. Just change what is needed in a mutable copy. I like this better anyway.

res = list(vec[0]) # if all ascii, in 3.0 use bytearray
for n, entry in enumerate(zip(vec[0],vec[1],vec[2])):
    if len(set(entry)) > 1:
        res[n] = 'X'
# write once per line, after the inner loop has finished
outfile.write(''.join(res)) # in 3.0, write(res)
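
The same mutable-copy idea extends to any number of lines by unpacking with *. This is only a sketch, assuming Python 3 str lines; mask_in_place is a hypothetical name, not something from the original script. Because it uses plain zip, characters of the first line beyond the shortest line are copied through unmarked.

```python
def mask_in_place(lines, mark="X"):
    """Copy the first line and overwrite only the positions that differ."""
    res = list(lines[0])  # mutable copy of the reference line
    # zip(*lines) yields one tuple of characters per column and stops at
    # the shortest line, so n never runs past the end of res.
    for n, entry in enumerate(zip(*lines)):
        if len(set(entry)) > 1:
            res[n] = mark
    return "".join(res)

print(mask_in_place(["hello\n", "hallo\n", "hellp\n"]))
```

In the outer loop this would be called once per vec, with the result passed to outfile.write.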

tjr




--
http://mail.python.org/mailman/listinfo/python-list
