[EMAIL PROTECTED] wrote:
Hi group,
I have a basic question on the zip built-in function.
I am writing a simple text file comparison script that compares line
by line and character by character. The output is the original file,
with an X in place of any characters that are different.
I have managed a solution for a fixed number (3) of files, but I want
a solution for any number of input files.
The outline of my solution:
for vec in zip(vec_list[0],vec_list[1],vec_list[2]):
    res = ''
    for entry in zip(vec[0],vec[1],vec[2]):
        if len(set(entry)) > 1:
            res = res+'X'
        else:
            res = res+entry[0]
    outfile.write(res)
So vec is a tuple containing a line from each file, and then entry is
a tuple containing a character from each line.
2 questions
1) What is the general solution? Using zip in this way looks wrong. Is
there another function that does what I want?
zip(*vec_list) will zip together all entries in vec_list.
Do be aware that zip stops on the shortest iterable. So if vec[1] is
shorter than vec[0] and matches otherwise, your output line will be
truncated. Or if vec[1] is longer and vec[0] matches as far as it goes,
there will be no signal either.
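A minimal sketch of the general case, assuming Python 3 and that vec_list
holds one list of lines per file with the trailing newlines already
stripped. The name mask_differences is hypothetical, and padding with
itertools.zip_longest (so that a length mismatch is flagged instead of
silently truncated) is one possible choice, not something from the original
script.

```python
from itertools import zip_longest

def mask_differences(vec_list, mark='X'):
    """Yield one output line per row of input lines, with `mark` where they differ."""
    # zip_longest(*vec_list) accepts any number of files; a file that runs out
    # of lines early contributes '' for the remaining rows.
    for lines in zip_longest(*vec_list, fillvalue=''):
        # Pad short lines with None so extra characters count as a difference.
        chars = zip_longest(*lines, fillvalue=None)
        yield ''.join(mark if len(set(entry)) > 1 else entry[0]
                      for entry in chars)
```

With plain zip(*vec_list) and zip(*vec) in place of zip_longest, the same
loop gives the truncating behaviour described above.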
res=res+whatever can be written as res+=whatever
2) I am using set to remove any repeated characters. Is there a
"better" way ?
I might have written a third loop to compare vec[0] to vec[1]..., but
your set solution is easier and prettier.
If speed is an issue, don't rebuild the output line char by char. Just
change what is needed in a mutable copy. I like this better anyway.
res = list(vec[0]) # if all ascii, in 3.0 use bytearray
for n, entry in enumerate(zip(vec[0],vec[1],vec[2])):
    if len(set(entry)) > 1:
        res[n] = 'X'
outfile.write(''.join(res)) # in 3.0, write(res)
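The same idea for any number of lines might look like the sketch below. The
function name mask_line is hypothetical, and it assumes vec is a sequence of
str lines. Because zip stops at the shortest line, any tail of vec[0] beyond
that point is copied through unmarked.

```python
def mask_line(vec, mark='X'):
    """Return vec[0] with `mark` at each position where the lines in vec differ."""
    res = list(vec[0])  # mutable copy of the first line
    # zip(*vec) yields one tuple of characters per position, up to the
    # length of the shortest line, so n never runs past the end of res.
    for n, entry in enumerate(zip(*vec)):
        if len(set(entry)) > 1:
            res[n] = mark
    return ''.join(res)
```

Calling outfile.write(mask_line(vec)) for each vec in zip(*vec_list) would
then replace the body of the outer loop.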
tjr