I knew that I had forgotten something. Below is the line that I use to
create the field that I am trying to delete entries by. I hope this
avoids some confusion. Thank you very much to anyone who takes the time
to read these messages.
doc.add(new StringField("FileName", filename, Field.Store.YES));
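
Since StringField is indexed un-analyzed, my understanding is that a
delete Term has to match that stored value byte for byte. As a minimal
sketch (assuming writer is the same IndexWriter that built the index and
filename is the value indexed above):

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public final class DeleteByFileName {
        // Deletes every document whose un-analyzed "FileName" field
        // exactly equals the given filename, then commits so the
        // deletes become visible to newly opened readers.
        static void deleteByFileName(IndexWriter writer, String filename)
                throws IOException {
            writer.deleteDocuments(new Term("FileName", filename));
            writer.commit();
        }
    }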
On Sat, Dec 14, 2013 at 1:28 AM, Jason Corekin <[email protected]> wrote:
> Let me start by stating that I am almost certain that I am doing something
> wrong, and I hope that I am, because if not there is a VERY large bug
> in Lucene. What I am trying to do is use the method
>
>
> deleteDocuments(Term... terms)
>
>
> out of the IndexWriter class to delete several Term object arrays, each
> fed to it via a separate thread. Each array has around 460k+ Term objects
> in it. The issue is that after running for around 30 minutes or more the
> method finishes, I then run a commit, and nothing changes in my files.
> To be fair, I am running a custom Directory implementation that might be
> causing problems, but I do not think that is the case, as I do not even
> see any of my Directory methods in the stack trace. In fact, when I set
> breakpoints inside the delete methods of my Directory implementation,
> they never get hit. To be clear, replacing the custom Directory
> implementation with a standard one is not an option due to the nature of
> the data, which is made up of terabytes of small (1k and less) files. So,
> if the issue is in the Directory implementation, I have to figure out how
> to fix it.
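>
> As far as I can tell, term deletes in Lucene are buffered and, once
> applied, only mark documents as deleted in new per-segment liveDocs
> files; the Directory should not see file-level deletes until a merge
> drops the old segment files, which may be why my Directory's delete
> methods are never hit. A quick way to check whether the deletes were
> applied at all (a rough sketch, assuming dir is the Directory the
> writer writes to):
>
>     import java.io.IOException;
>     import org.apache.lucene.index.DirectoryReader;
>     import org.apache.lucene.store.Directory;
>
>     // If deletes were applied, numDocs() shrinks while maxDoc()
>     // stays the same until a merge rewrites the segments.
>     static void printDocCounts(Directory dir) throws IOException {
>         try (DirectoryReader reader = DirectoryReader.open(dir)) {
>             System.out.println("maxDoc=" + reader.maxDoc()
>                     + " numDocs=" + reader.numDocs());
>         }
>     }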
>
>
> Below are the pieces of code that I think are relevant to this issue, as
> well as a copy of the stack trace of the thread that was doing work when I
> paused the debug session. As you will likely notice, the thread is called a
> DBCloner because it is being used to clone the underlying index-based
> database (needed to avoid storing trillions of files directly on disk). The
> idea is to duplicate the selected group of terms into a new database and
> then delete the original terms from the original database. The duplication
> works wonderfully, but no matter what I do, including cutting the program
> down to one thread, I cannot shrink the database, and the deletes take
> drastically too long.
>
>
> In an attempt to be as helpful as possible, I will say this. I have been
> tracing this problem for a few days and have seen that
>
> BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BytesRef)
>
> is where the majority of the execution time is spent. I have also
> noticed that this method returns false MUCH more often than it returns
> true. I have been trying to figure out how the mechanics of this process
> work, just in case the issue was not in my code and I might have been
> able to find the problem. But I have yet to find the problem in either
> Lucene 4.5.1 or Lucene 4.6. If anyone has any ideas as to what I might be
> doing wrong, I would really appreciate reading what you have to say.
> Thanks in advance.
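>
> One diagnostic that may help: since seekExact(BytesRef) returning false
> should just mean the term does not exist in that segment, it might be
> worth checking whether the FileName values being deleted are present in
> the index at all. A rough sketch of such a check (assuming reader is a
> DirectoryReader over the same index):
>
>     import java.io.IOException;
>     import org.apache.lucene.index.DirectoryReader;
>     import org.apache.lucene.index.MultiFields;
>     import org.apache.lucene.index.Terms;
>     import org.apache.lucene.index.TermsEnum;
>     import org.apache.lucene.util.BytesRef;
>
>     // Counts how many candidate filenames actually occur as terms in
>     // the "FileName" field; a low count would explain all the
>     // seekExact(...) == false results.
>     static int countPresent(DirectoryReader reader,
>             Iterable<String> fileNames) throws IOException {
>         Terms terms = MultiFields.getTerms(reader, "FileName");
>         if (terms == null) {
>             return 0; // field has no indexed terms at all
>         }
>         TermsEnum te = terms.iterator(null); // 4.x reuse-style iterator
>         int present = 0;
>         for (String name : fileNames) {
>             if (te.seekExact(new BytesRef(name))) {
>                 present++;
>             }
>         }
>         return present;
>     }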
>
>
>
> Jason
>
>
>
> private void cloneDB() throws QueryNodeException {
>     Document doc;
>     ArrayList<String> fileNames;
>     int start = docRanges[(threadNumber * 2)];
>     int stop  = docRanges[(threadNumber * 2) + 1];
>
>     try {
>         fileNames = new ArrayList<String>(docsPerThread);
>         // Copy each document in this thread's range into the new
>         // database, remembering its FileName for the later delete.
>         for (int i = start; i < stop; i++) {
>             doc = searcher.doc(i);
>             try {
>                 adder.addDoc(doc);
>                 fileNames.add(doc.get("FileName"));
>             } catch (TransactionExceptionRE | TransactionException
>                     | LockConflictException te) {
>                 // Abort the transaction and continue with the next document.
>                 adder.txnAbort();
>                 System.err.println(Thread.currentThread().getName()
>                         + ": Adding a message failed, retrying.");
>             }
>         }
>         // Delete the copied documents from the original database.
>         deleters[threadNumber].deleteTerms("FileName", fileNames);
>         deleters[threadNumber].commit();
>     } catch (IOException | ParseException ex) {
>         Logger.getLogger(DocReader.class.getName()).log(Level.SEVERE,
>                 null, ex);
>     }
> }
>
>
>
>
>
> public void deleteTerms(String dbField, ArrayList<String> fieldTexts)
>         throws IOException {
>     // Build one exact-match Term per field value and hand the whole
>     // batch to IndexWriter.deleteDocuments(Term...).
>     Term[] terms = new Term[fieldTexts.size()];
>     for (int i = 0; i < fieldTexts.size(); i++) {
>         terms[i] = new Term(dbField, fieldTexts.get(i));
>     }
>     writer.deleteDocuments(terms);
> }
>
>
>
> public void deleteDocuments(Term... terms) throws IOException
>
>
>
>
>
> Thread [DB Cloner 2] (Suspended)
>     owns: BufferedUpdatesStream (id=54)
>     owns: IndexWriter (id=49)
>     FST<T>.readFirstRealTargetArc(long, Arc<T>, BytesReader) line: 979
>     FST<T>.findTargetArc(int, Arc<T>, Arc<T>, BytesReader) line: 1220
>     BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BytesRef) line: 1679
>     BufferedUpdatesStream.applyTermDeletes(Iterable<Term>, ReadersAndUpdates, SegmentReader) line: 414
>     BufferedUpdatesStream.applyDeletesAndUpdates(ReaderPool, List<SegmentCommitInfo>) line: 283
>     IndexWriter.applyAllDeletesAndUpdates() line: 3112
>     IndexWriter.applyDeletesAndPurge(boolean) line: 4641
>     DocumentsWriter$ApplyDeletesEvent.process(IndexWriter, boolean, boolean) line: 673
>     IndexWriter.processEvents(Queue<Event>, boolean, boolean) line: 4665
>     IndexWriter.processEvents(boolean, boolean) line: 4657
>     IndexWriter.deleteDocuments(Term...) line: 1421
>     DocDeleter.deleteTerms(String, ArrayList<String>) line: 95
>     DBCloner.cloneDB() line: 233
>     DBCloner.run() line: 133
>     Thread.run() line: 744