Re: Lucene : avoiding locking

yahootintin-lucene Thu, 11 Nov 2004 15:57:01 -0800

I'm working on a similar project...
Make sure that only one call to the index method is occurring at
a time.  Synchronizing that method should do it.
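
To make that concrete, here is a minimal sketch of what synchronizing the
method could look like. It is not the Indexer class quoted below:
SerializedIndexer, simulateEvents and the counters are hypothetical names
used only for illustration, and Thread.sleep stands in for the real Lucene
work. Declaring the static method synchronized makes callers queue on the
class monitor, so a second change event waits until the first index run has
returned. The assumption is that all callers live in one JVM and one class
loader; it would not help if another process, or a write.lock file left
behind by a failed run, is holding the lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an indexer class; it makes no Lucene calls. */
final class SerializedIndexer {
    // These counters exist only so the demo can show that calls never overlap.
    private static final AtomicInteger active = new AtomicInteger();
    private static final AtomicInteger maxActive = new AtomicInteger();
    private static final AtomicInteger completed = new AtomicInteger();

    /** A static synchronized method locks on SerializedIndexer.class, so one caller runs at a time. */
    static synchronized void index() {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max);
        try {
            // Placeholder for the real work: open the reader and writer,
            // update the index, and close both before returning.
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
            completed.incrementAndGet();
        }
    }

    static int maxConcurrentCalls() {
        return maxActive.get();
    }

    static int completedCalls() {
        return completed.get();
    }

    /** Fires the given number of "content changed" events from a small thread pool. */
    static void simulateEvents(int events) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < events; i++) {
            pool.execute(SerializedIndexer::index);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("index calls did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        simulateEvents(8);
        System.out.println("completed=" + completedCalls()
                + " maxConcurrent=" + maxConcurrentCalls());
    }
}
```

In the quoted class the equivalent change would be declaring the method as
"public static synchronized void Index()". The static reader, writer and
uidIter fields are shared between calls, so serializing the calls also keeps
two runs from overwriting each other's state.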


--- Luke Shannon <[EMAIL PROTECTED]> wrote:

> Hi All;
> 
> I have hit a snag in my Lucene integration and don't know what
> to do.
> 
>  My company has a content management product. Each time someone changes the
>  directory structure or a file within it, that portion of the site needs to
>  be re-indexed so the changes are reflected in future searches (indexing must
>  happen during run time).
> 
>  I have written an Indexer class with a static Index() method. The idea is to
>  call the method every time something changes and the index needs to be
>  re-examined. I am hoping the logic put in by Doug Cutting surrounding the
>  UID will make indexing efficient enough to be called so frequently.
> 
>  This class works great when I tested it on my own little site (I have about
>  2000 files). But when I drop the functionality into the QA environment I get
>  a locking error.
> 
>  I can't access the stack trace; all I can get at is a log file the
>  application writes to. Here is the section my class wrote. It was right in
>  the middle of indexing and bang, lock issue.
> 
>  I don't know if the problem is in my code or something in the existing
>  application.
> 
>  Error Message:
>  ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
>  |INFO|INDEXING INFO: Start Indexing new content.
>  |INFO|INDEXING INFO: Index Folder Did Not Exist. Start Creation Of New Index
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
>  |INFO|INDEXING ERROR: Unable to index new content Lock obtain timed out: Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> 
> |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
> 
>  Here is my code. You will recognize it pretty much as the IndexHTML class
>  from the Lucene demo written by Doug Cutting. I have put a ton of comments
>  in an attempt to understand what is going on.
> 
>  Any help would be appreciated.
> 
>  Luke
> 
>  package com.fbhm.bolt.search;
> 
>  /*
>   * Created on Nov 11, 2004
>   *
>   * This class will create a single index file for the Content
>   * Management System (CMS). It contains logic to ensure
>   * indexing is done "intelligently". Based on IndexHTML.java
>   * from the demo folder that ships with Lucene
>   */
> 
>  import java.io.File;
>  import java.io.IOException;
>  import java.util.Arrays;
>  import java.util.Date;
> 
>  import org.apache.lucene.analysis.standard.StandardAnalyzer;
>  import org.apache.lucene.document.Document;
>  import org.apache.lucene.index.IndexReader;
>  import org.apache.lucene.index.IndexWriter;
>  import org.apache.lucene.index.Term;
>  import org.apache.lucene.index.TermEnum;
>  import org.pdfbox.searchengine.lucene.LucenePDFDocument;
>  import org.apache.lucene.demo.HTMLDocument;
> 
>  import com.alaia.common.debug.Trace;
>  import com.alaia.common.util.AppProperties;
> 
>  /**
>   * @author lshannon Description: <br>
>   *   This class is used to index a content folder. It contains logic to
>   *   ensure only new documents or documents that have been modified since
>   *   the last search are indexed. <br>
>   *   Based on code written by Doug Cutting in the IndexHTML class found in
>   *   the Lucene demo
>   */
>  public class Indexer {
>   //true during deletion pass, this is when the index already exists
>   private static boolean deleting = false;
> 
>   //object to read existing indexes
>   private static IndexReader reader;
> 
>   //object to write to the index folder
>   private static IndexWriter writer;
> 
>   //this will be used to write the index file
>   private static TermEnum uidIter;
> 
>   /*
>    * This static method does all the work, the end result is an up-to-date
>    * index folder
>    */
>   public static void Index() {
>    //assume to start that the index does not exist and must be created
>    boolean create = true;
>    //set the location of the index folder
>    String indexFileLocation =
>      AppProperties.getPropertyAsString("bolt.search.siteIndex.index.root");
>    //set the location of the content folder
>    String contentFolderLocation =
>      AppProperties.getPropertyAsString("site.root");
>    //manage whether the index needs to be created or not
>    File index = new File(indexFileLocation);
>    File root = new File(contentFolderLocation);
>    //the index folder indicated exists, we need an incremental update of the
>    // index
>    if (index.exists()) {
>     Trace.TRACE("INDEXING INFO: An index folder exists at: "
>       + indexFileLocation);
>     deleting = true;
>     create = false;
>     try {
>      //this version of indexDocs is able to execute the incremental
>      // update
>      indexDocs(root, indexFileLocation, create);
>     } catch (Exception e) {
>      //we were unable to do the incremental update
>      Trace.TRACE("INDEXING ERROR: Unable to execute incremental update "
>          + e.getMessage());
>     }
>     //after exiting this block the index should be current with content
>     Trace.TRACE("INDEXING INFO: Incremental update completed.");
>    }
>    try {
>     //create the writer
>     writer = new IndexWriter(index, new StandardAnalyzer(), create);
>     //configure the writer
>     writer.mergeFactor = 10000;
>     writer.maxFieldLength = 100000;
>     try {
>      //get the start date
>      Date start = new Date();
>      //call the indexDocs method, this time we will add new
>      // documents
>      Trace.TRACE("INDEXING INFO: Start Indexing new content.");
>      indexDocs(root, indexFileLocation, create);
>      Trace.TRACE("INDEXING INFO: Indexing new content complete.");
>      //optimize the index
>      writer.optimize();
>      //close the writer
>      writer.close();
>      //get the end date
>      Date end = new Date();
>      long totalTime = end.getTime() - start.getTime();
>      Trace.TRACE("INDEXING INFO: All Indexing Operations Completed in "
>          + totalTime + " milliseconds");
>     } catch (Exception e1) {
>      //unable to add new documents
>      Trace.TRACE("INDEXING ERROR: Unable to index new content "
>          + e1.getMessage());
>     }
>    } catch (IOException e) {
>     Trace.TRACE("INDEXING ERROR: Unable to create IndexWriter "
>       + e.getMessage());
>    }
>   }
> 
>   /*
>    * Walk directory hierarchy in uid order, while keeping uid iterator from
>    * existing index in sync. Mismatches indicate one of: (a) old documents to
>    * be deleted; (b) unchanged documents, to be left alone; or (c) new
>    * documents, to be indexed.
>    */
> 
>   private static void indexDocs(File file, String index, boolean create)
>     throws Exception {
>    //the index already exists, so we do an incremental update
>    if (!create) {
>     Trace.TRACE("INDEXING INFO: Incremental Update Request Confirmed");
>     //open existing index
>     reader = IndexReader.open(index);
>     //this gets an enumeration of uid terms
>     uidIter = reader.terms(new Term("uid", ""));
>     //jump to the index method that does the work
>     //this will use the enumeration above and does
>     //all the "smart" indexing
>     indexDocs(file);
>     //this will be true every time the index already existed
>     //we are now going to delete documents that are old
>     if (deleting) {
>      Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Started. All "
>        + "Deleted Docs will be listed.");
>      while (uidIter.term() != null
>        && uidIter.term().field() == "uid") {
>       //basically we are deleting all the documents we have
>       // indexed before
>       Trace.TRACE("INDEXING INFO: Deleting document "
>         + HTMLDocument.uid2url(uidIter.term().text()));
>       //delete the term from the reader
>       reader.delete(uidIter.term());
>       //go to the next term
>       uidIter.next();
>      }
>      Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Completed");
>      //turn off the deleting flag
>      deleting = false;
>     }//close the deleting branch
>     //close the enumeration
>     uidIter.close(); // close uid iterator
>     //close the reader
>     reader.close(); // close existing index
> 
>    }
>    //we get here if the index did not already exist
>    else {
>     Trace.TRACE("INDEXING INFO: Index Folder Did Not Exist. "
>       + "Start Creation Of New Index");
>     // don't have an existing index
>     indexDocs(file);
>    }
>   }
> 
>   private static void indexDocs(File file) throws Exception {
>    //check if we are at the top of a directory
>    if (file.isDirectory()) {
>     //get a list of the files
>     String[] files = file.list();
>     //sort them
>     Arrays.sort(files);
>     //index each file in the directory recursively
>     //we keep repeating this logic until we hit a
>     //file
>     for (int i = 0; i < files.length; i++)
>      //pass in the parent directory and the current file
>      //into the file constructor and index
>      indexDocs(new File(file, files[i]));
> 
>    }
>    //we have an actual file, so we need to consider the
>    //file extensions so the correct Document is created
>    else if (file.getPath().endsWith(".html")
>      || file.getPath().endsWith(".htm")
>      || file.getPath().endsWith(".txt")
>      || file.getPath().endsWith(".doc")
>      || file.getPath().endsWith(".xml")
>      || file.getPath().endsWith(".pdf")) {
> 
>     //if this is reached it means we were in the midst
>     //of an incremental update
>     if (uidIter != null) {
>      //get the uid for the document we are on
>      String uid = HTMLDocument.uid(file);
>      //now compare this document to the one we have in the
>      //enumeration of terms.
>      //if the term in the enumeration is less than the
>      //term we are on it must be deleted (if we are indeed
>      //doing an incremental update)
>      Trace.TRACE("INDEXING INFO: Beginnging Incremental update comparisions");
>      while (uidIter.term() != null
>        && uidIter.term().field() == "uid"
>        && uidIter.term().text().compareTo(uid) < 0) {
>       //delete stale docs
>       if (deleting) {
>        reader.delete(uidIter.term());
>       }
>       uidIter.next();
>      }
>      //if the terms are equal there is no change with this document
>      //we keep it as is
>      if (uidIter.term() != null && uidIter.term().field() == "uid"
>        && uidIter.term().text().compareTo(uid) == 0) {
>       uidIter.next();
>      }
>      //if we are not deleting and the document was not there
>      //it means we didn't have this document on the last index
>      //and we should add it
>      else if (!deleting) {
>       if (file.getPath().endsWith(".pdf")) {
>        Document doc = LucenePDFDocument.getDocument(file);
>        Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
>            + doc.get("url"));
>        writer.addDocument(doc);
>       } else if (file.getPath().endsWith(".xml")) {
>        Document doc = XMLDocument.Document(file);
>        Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
>            + doc.get("url"));
>        writer.addDocument(doc);
>       } else {
>        Document doc = HTMLDocument.Document(file);
>        Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
>            + doc.get("url"));
>        writer.addDocument(doc);
>       }
>      }
>     }//end the if for an incremental update
>     //we are creating a new index, add all document types
>     else {
>      if (file.getPath().endsWith(".pdf")) {
>       Document doc = LucenePDFDocument.getDocument(file);
>       Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
>           + doc.get("url"));
>       writer.addDocument(doc);
>      } else if (file.getPath().endsWith(".xml")) {
>       Document doc = XMLDocument.Document(file);
>       Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
>           + doc.get("url"));
>       writer.addDocument(doc);
>      } else {
>       Document doc = HTMLDocument.Document(file);
>       Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
>           + doc.get("url"));
>       writer.addDocument(doc);
>      }//close the else
>     }//close the else for a new index
>    }//close the else if to handle file types
>   }//close the indexDocs method
> 
>  }
> 
> 
>  ----- Original Message ----- 
>  From: "Craig McClanahan" <[EMAIL PROTECTED]>
>  To: "Jakarta Commons Users List" <[EMAIL PROTECTED]>
>  Sent: Thursday, November 11, 2004 6:13 PM
>  Subject: Re: avoiding locking
> 
> 
>   In order to get any useful help, it would be nice to know what you are
>   trying to do, and (most importantly) what commons component is giving
>   you the problem :-).  The traditional approach is to put a prefix on
>   your subject line -- for commons package "foo" it would be:
> 
>     [foo] avoiding locking
> 
>   It's also generally helpful to see the entire stack trace, not just
>   the exception message itself.
> 
>   Craig
> 
> 
>   On Thu, 11 Nov 2004 17:27:19 -0500, Luke Shannon
>   <[EMAIL PROTECTED]> wrote:
>    What can I do to avoid locking issues?
> 
>    Unable to execute incremental update Lock obtain timed out: Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> 
>    Thanks,
> 
>    Luke
> 
> 
>  
>
---------------------------------------------------------------------
>   To unsubscribe, e-mail:
> [EMAIL PROTECTED]
>   For additional commands, e-mail:
> [EMAIL PROTECTED]

