Suresh Krishna wrote:
it works (i think), but since this is my very first python program, i
would really appreciate feedback on how the program could be improved..
First of all, welcome, new Python programmer,
My main trouble with the program is lack of structure, some lines deal with
individual lines of the input file, while other deal with complete
citation-records. I think the program would improve if this is more clearly
expressed in the code.
One of the reasons that you got the current structure may be due to the use of
the 'for line in bibfile' loop, which gives you a new line at only one point
in the program.
Such a for-loop is good for simple line processing (for each line do ...), but
your program has grown beyond that imho. For this reason I'd like to propose
to throw out the for-loop, and use a while-loop with three parts instead.
For simplicity, I left out the handling of the records (which we can discuss
later perhaps)
inbibfile = open(InFileName,'r')
finished = False
while not finished:
# Read & skip blank lines
while True:
line = inbibfile.readline()
if len(line) == 0:
finished = True
break # end of file detected
if len(line.rstrip()) > 0: # Line contains non-white-space
break
# else it was a blank line, skip, read next line
if finished:
break
# line contains the first non-whitespace line of a new record
# read the entire record until the next empty line or EOF
current_entry = [line]
while True:
line = inbibfile.readline()
if len(line) == 0:
finished = True
break # Found EOF
if len(line.rstrip()) > 0:
current_entry.append(line)
else: # Found an empty line, finished with current record
break
# finished reading (empty line or EOF encountered)
## Do something with current_entry
# if not finished, do the next record
inbibfile.close()
You can go further by introducing a few functions:
inbibfile = open(InFileName,'r')
finished = False
while not finished:
finished, line = read_blank_lines(inbibfile)
if finished:
break
# line contains the first non-whitespace line of a new record
current_entry = make_new_record(line)
finished, current_entry = read_record_lines(current_entry, inbibfile)
# finished reading (empty line or EOF encountered)
## Do something with current_entry
inbibfile.close()
and the functions then contain the details of each step.
A few comments about your current program:
InFileName= "original_library.ris";
In Python, you don't end a line with a ';'.
Also, 'InFileName' is a constant, which is normally given an all-uppercase name.
OUTDUPBIBFILE=open(OutDupFileName,'w')
current_entry=[]
numduplicates=0
One of the most difficult things to do in programming is achieving
consistency. Above you have 3 global variables, and you use several different
way of writing their names. You should try to write them the same.
The big advantage of consistent names is that just by looking at a name you
know what kind of thing it is, which reduces the chance of making errors in
your program.
(Don't feel bad about it, this seems a very simple and obvious problem but in
fact it is very hard to achieve, especially for larger programs and projects
with several people).
>if keyvalue not in current_keys: #is a unique entry
current_keys.append(keyvalue) #append current key to list of
keys
current_entry.append(line) #add the blank line to current entry
OUTBIBFILE.writelines(current_entry) #write out to new bib
file without duplicates
current_entry=[] #clear current entry for next one
current_keyval=[] #clear current key
Look at the code of each line above, then look at the comment at the same
line. What does the comment tell you that the code didn't?
>continue #dont write out successive blanks or initial blanks
>elif current_entry and line.isspace(): #reached a blank that demarcates
end of current entry
>keyvalue=''.join(current_keyval) #generated a key based on certain
fields
> elif len(line)>2: #not a blank, so more stuff in currrent entry
Now do the same here.
You see the difference?
In the first lines you repeat your code in words. In the second lines, you
tell what you aim to achieve at that line, ie WHY do you do what you are doing
rather than just WHAT does the line do (since that is already defined in the
code).
In general, there is no need to comment each line. You may also assume that
the reader knows Python (otherwise there is little value for him reading the
code).
Try to make 'blocks' of code with one or more empty lines in between (see my
example code), and write a comment what that block as a whole aims to do.
Hope I didn't critize you too much. For a first Python program it is quite nice.
Si