tysondog...@gmail.com wrote:
> I am trying to delete duplicates but the job just finishes with an exit
> code 0 and does not delete any duplicates.
>
> The duplicates for the data always exist in Column F and I am desiring to
> delete the entire row B-I
>
> Any ideas?
>
> import openpyxl
> wb1 = openpyxl.load_workbook('C:/dwad/SWWA.xlsx')
> ws1 = wb1.active  # keep naming convention consistent
>
> values = []
> for i in range(2, ws1.max_row + 1):
>     if ws1.cell(row=i, column=1).value in values:
>         #pass
>         #else:
>         values.append(ws1.cell(row=i, column=1).value)
>
> for value in values:
>     ws1.append([value])
append() will add even more duplicates to the sheet. If you do not care
about cell styles you can create a new sheet and copy only unique values.
A complete example:

import openpyxl

SOURCE_FILE = "duplicates.xlsx"
DEST_FILE = "unique.xlsx"

HEADER_COUNT = 1
KEY_COLUMNS = [1]  # zero-based A=0, B=1, ...

workbook = openpyxl.load_workbook(SOURCE_FILE)
source_sheet = workbook.active
dest_sheet = workbook.create_sheet()

seen = set()
for i, row in enumerate(source_sheet.values):
    if i < HEADER_COUNT:
        dest_sheet.append(row)
    else:
        key = tuple(row[col] for col in KEY_COLUMNS)
        print("row = %r, key = %r" % (row, key))
        if key not in seen:
            print("adding row", row)
            seen.add(key)
            dest_sheet.append(row)

workbook.save(DEST_FILE)

> I have attempted to do this with openpyxl for an excel as well as other
> methods (including csv though this deleted rows excessively).

I find that hard to believe. If anything it should keep more rows as you
compare whole lines, not just the columns you are interested in.

> CSV:
> with open('1.csv','r') as in_file, open('2.csv','w') as out_file:
>     seen = set()  # set for fast O(1) amortized lookup
>     for line in in_file:
>         if line not in seen:
>             seen.add(line)
>             out_file.write(line)

General remark: use the csv module in the standard library rather than
trying to parse the records manually.

--
https://mail.python.org/mailman/listinfo/python-list
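To illustrate that last remark: a minimal sketch of the csv-based
deduplication using the csv module, keyed on one column instead of the
whole line. The file names, sample data and column index here are
placeholders, not from the original post:

```python
import csv

KEY_COLUMN = 1  # zero-based index of the column to deduplicate on (placeholder)

# Create a small sample input file so the example is runnable (placeholder data).
rows = [["id", "name"], ["1", "alice"], ["2", "bob"], ["3", "alice"]]
with open("in.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open("in.csv", newline="") as in_file, \
        open("out.csv", "w", newline="") as out_file:
    writer = csv.writer(out_file)
    seen = set()
    for record in csv.reader(in_file):
        key = record[KEY_COLUMN]
        if key not in seen:  # keep only the first row seen for each key
            seen.add(key)
            writer.writerow(record)
```

Because csv.reader parses each record into fields, the key comparison
ignores quoting and whitespace differences that a raw line comparison
would treat as distinct.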