vox wrote:

> I'm constructing a simple compare-script and thought I would use set
> ([]) to generate the difference output. But I'm obviously doing
> something wrong.
> 
> file1 contains 410 rows.
> file2 contains 386 rows.
> I want to know what rows are in file1 but not in file2.
> 
> This is my script:
> s1 = set(open("file1"))
> s2 = set(open("file2"))

Remove the following three lines; they only create empty sets that are
immediately replaced by the assignments further down:

> s3 = set([])
> s1temp = set([])
> s2temp = set([])

 
> s1temp = set(i.strip() for i in s1)
> s2temp = set(i.strip() for i in s2)
> s3 = s1temp-s2temp
> 
> print len(s3)
> 
> Output is 119. AFAIK 410-386=24. What am I doing wrong here?

You are probably misinterpreting len(s3). s3 contains lines occurring in 
"file1" but not in "file2". Duplicate lines are only counted once, and the 
order doesn't matter. 

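So there are 119 distinct lines that occur at least once in "file1", but
not in "file2". That number need not equal 410-386: the difference of the
row counts only matches if neither file has duplicate lines and every line
of "file2" also appears in "file1".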
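
A minimal sketch of the effect, using two small made-up lists in place of
the stripped lines of "file1" and "file2" (the list contents and the names
file1_lines, file2_lines, only_in_file1 and only_in_file2 are my own, purely
for illustration):

```python
# Hypothetical stand-ins for the stripped lines of "file1" and "file2".
file1_lines = ["a", "b", "b", "c", "d"]   # 5 rows, 4 distinct
file2_lines = ["b", "e", "f", "g"]        # 4 rows, 4 distinct

s1 = set(file1_lines)
s2 = set(file2_lines)

only_in_file1 = s1 - s2   # lines in file1 but not in file2
only_in_file2 = s2 - s1   # lines in file2 but not in file1

print(len(file1_lines) - len(file2_lines))  # 1
print(len(only_in_file1))                   # 3
print(sorted(only_in_file1))                # ['a', 'c', 'd']

# The size of the difference depends on the overlap of the two sets,
# not on the row counts of the files.
assert len(only_in_file1) == len(s1) - len(s1 & s2)
```

Here the row counts differ by 1, yet three distinct lines are only in the
first list, because the second list shares just one line with it.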

If that is not what you want you have to tell us what exactly you are 
looking for.

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list
