Hi, I'm trying to delete duplicate documents from my index. The unique identifier is the document's URL (the "url" field).
My initial thought is to open the index via a reader, sort the documents by URL, and then iterate through them, comparing each document with the previous one. If the URLs match, I would delete the current document, and so on. What other methods that are not too taxing could I try? How could I sort the documents by URL internally? What classes should I be looking at to do this?

Thanks,
_gk
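
P.S. To make the idea concrete, here is a rough sketch of the sort-and-compare pass in plain Java. It assumes the (document number, url) pairs have already been read out of the index into memory and that every document has a non-null url. DocUrl and findDuplicates are just names I made up for illustration, not classes from any library, and the actual reading from and deleting against the index is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DuplicateUrlFinder {

    /** Hypothetical holder: a document number plus the value of its "url" field. */
    public static final class DocUrl {
        final int docId;
        final String url;

        public DocUrl(int docId, String url) {
            this.docId = docId;
            this.url = url;
        }
    }

    /**
     * Returns the document numbers that repeat an earlier url.
     * For each url, the document with the lowest number is kept.
     */
    public static List<Integer> findDuplicates(List<DocUrl> docs) {
        // Sort a copy by url, then by document number, so equal urls sit next to each other.
        List<DocUrl> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((DocUrl d) -> d.url)
                              .thenComparingInt(d -> d.docId));

        List<Integer> toDelete = new ArrayList<>();
        String previousUrl = null;
        for (DocUrl d : sorted) {
            if (d.url.equals(previousUrl)) {
                // Same url as the previous document: mark the current one for deletion.
                toDelete.add(d.docId);
            } else {
                previousUrl = d.url;
            }
        }
        return toDelete;
    }
}
```

The returned list would then be fed to whatever delete-by-document-number call the index offers. Sorting costs O(n log n) and needs all urls in memory at once, which is part of why I'm asking whether there is a cheaper way.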