Hello Pythonistas,

I have a very large text file with contents like:

@INBOOK{Ackermann1999-b,
  author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann,
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann,
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann,
        K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
        and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann},
  year = {1980},
  timestamp = {1995-12-02}
}       

I want to delete the duplicate lines, except for lines containing a
brace, { or }.
The result should look like:

@INBOOK{Ackermann1999-b,
  author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann,
        Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and 
Ackermann},
  year = {1980},
  timestamp = {1995-12-02}
}

I came across this Python script:

lines_seen = set()  # holds lines already seen
with open("literatur_dupl.txt", "r") as infile, \
        open("literatur_clean.txt", "w") as outfile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)

But it also deletes the lines with a closing brace } and the lines with the
same author data.
Therefore I need a condition that exempts lines containing braces.

Could someone point me to how to add this condition?
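
For reference, the condition could be sketched roughly as follows. This is
only a minimal sketch under some assumptions: a line containing { or } is
always written and never recorded as seen, duplicates are tracked across the
whole file (as in the script above) rather than per entry, and lines are
compared exactly, including whitespace. The names dedupe_lines and clean_file
are made up for illustration.

```python
def dedupe_lines(lines):
    """Yield lines, skipping repeats unless the line contains a brace.

    Assumption: lines with "{" or "}" are always kept and are not
    added to the set of seen lines.
    """
    lines_seen = set()  # holds brace-free lines already written
    for line in lines:
        if "{" in line or "}" in line:
            yield line  # brace lines are exempt from the duplicate check
        elif line not in lines_seen:
            lines_seen.add(line)
            yield line  # first occurrence of a brace-free line


def clean_file(src_path, dst_path):
    """Hypothetical helper: copy src_path to dst_path through dedupe_lines."""
    with open(src_path, "r") as infile:
        with open(dst_path, "w") as outfile:
            outfile.writelines(dedupe_lines(infile))
```

It would be called as clean_file("literatur_dupl.txt", "literatur_clean.txt").
Because the input is read line by line, only the set of distinct brace-free
lines is held in memory, which should matter for a very large file.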

Thanks in advance,
Joon




