(Trying to get the email through the system; it hasn't appeared in my Epimorphics inbox.)
I posted to Stephen:

Dealing with deletions. We do not yet address the general case. TextDocProducerTriples simply ignores the deletion case and we don't change that behaviour in jena-text -- instead we use an alternative docProducer (which you can see in the Epimorphics github repository in ppd-text-index).

This docProducer deals with deletions in this way:

- incoming quads are accumulated while the subject remains the same. When the subject changes (or we reach finish()) then we deal with the batch.

- we delete the documents corresponding to the subject.

- however we may have deleted documents that should still exist. We make a new document entity and then reach back into the dataset and add to the entity all the quads that are still in the dataset and about this subject.

- if we added any quads, then we put this new entity back into the index.

This requires that the producer has access to the dataset, which is why our TextDocProducerBatch takes the dataset (graph) as one of its constructor arguments. Our branch's assembler, when attempting to construct the producer specified by its classname, checks to see if there's a two-argument form (DatasetGraph, TextIndex) as well as the one-argument form (TextIndex).

-- Chris "allusive" Dollin
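
P.S. For anyone who prefers code to prose, here is a rough, self-contained sketch of the batching steps described above. It is not the actual TextDocProducerBatch: Quad, BatchingProducerSketch, and the in-memory list and map standing in for DatasetGraph and TextIndex are all hypothetical names invented for illustration. It also assumes the dataset has already been updated by the time a change is reported, and it tracks only the current subject rather than storing the batch, because the rebuild re-reads the dataset anyway.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a quad; only the fields this sketch needs. */
final class Quad {
    final String subject;
    final String predicate;
    final String object;

    Quad(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Illustrative sketch only; not the real TextDocProducerBatch. */
class BatchingProducerSketch {
    private final List<Quad> dataset;               // stand-in for the DatasetGraph
    private final Map<String, List<String>> index;  // stand-in for the TextIndex: subject -> indexed values
    private String currentSubject = null;           // subject of the batch in progress

    // Two-argument form: the producer needs the dataset as well as the index.
    BatchingProducerSketch(List<Quad> dataset, Map<String, List<String>> index) {
        this.dataset = dataset;
        this.index = index;
    }

    /** Report an added or deleted quad (assumed: the dataset already reflects it). */
    void change(Quad quad) {
        // A change of subject closes the current batch.
        if (currentSubject != null && !currentSubject.equals(quad.subject)) {
            flush();
        }
        currentSubject = quad.subject;
    }

    /** End of the change stream: deal with the last batch. */
    void finish() {
        flush();
    }

    private void flush() {
        if (currentSubject == null) {
            return;
        }
        // Delete the documents corresponding to the subject.
        index.remove(currentSubject);
        // Reach back into the dataset and rebuild the entity from what is still there.
        List<String> entity = new ArrayList<>();
        for (Quad q : dataset) {
            if (q.subject.equals(currentSubject)) {
                entity.add(q.object);
            }
        }
        // Only put the entity back if anything survived.
        if (!entity.isEmpty()) {
            index.put(currentSubject, entity);
        }
        currentSubject = null;
    }

    public static void main(String[] args) {
        List<Quad> data = new ArrayList<>();
        Quad kept = new Quad("s1", "label", "beta");
        Quad gone = new Quad("s1", "label", "alpha");
        data.add(kept);                              // "gone" is already deleted from the dataset
        Map<String, List<String>> idx = new HashMap<>();
        idx.put("s1", new ArrayList<>(List.of("alpha", "beta")));
        BatchingProducerSketch producer = new BatchingProducerSketch(data, idx);
        producer.change(gone);
        producer.finish();
        System.out.println(idx);
    }
}
```

The point of the rebuild step is that deleting by subject is coarse: it removes everything indexed for that subject, so whatever is still in the dataset has to be re-added. A subject whose quads have all gone simply ends up with no entry.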
