Hi there,

thanks for your responses, guys!

From the answers I gathered that I must not have an IndexWriter
and an IndexReader open at the same time if both are used to modify
the index - even sequentially.
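
To make sure I read that advice correctly, here is a minimal sketch of what
I understand it to mean: before the first add, the deleting side is closed,
and before the first delete, the adding side is closed. This is not Lucene
code. The class ExclusiveIndexAccess, its Mode enum and the log list are
made up for this example, and the log only records the order in which a
real reader/writer would be opened and closed.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: tracks which modifying side is "open" at a time. */
class ExclusiveIndexAccess {
    /** DELETING stands for an open IndexReader, ADDING for an open IndexWriter. */
    enum Mode { NONE, DELETING, ADDING }

    private Mode mode = Mode.NONE;
    private final List<String> log = new ArrayList<>();

    /** Closes the currently open side (if any) before opening the wanted one. */
    private void switchTo(Mode wanted) {
        if (mode == wanted) {
            return;
        }
        if (mode != Mode.NONE) {
            log.add("close " + mode);
        }
        if (wanted != Mode.NONE) {
            log.add("open " + wanted);
        }
        mode = wanted;
    }

    synchronized void delete(String uri) {
        switchTo(Mode.DELETING);
        log.add("delete " + uri);
    }

    synchronized void add(String uri) {
        switchTo(Mode.ADDING);
        log.add("add " + uri);
    }

    synchronized void close() {
        switchTo(Mode.NONE);
    }

    /** Returns a copy of the recorded open/close/modify sequence. */
    synchronized List<String> log() {
        return new ArrayList<>(log);
    }
}
```

An update (delete followed by add) would then close the deleting side
before the adding side is opened, so the two are never open together.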

What I have is the following:

1 Thread processes events such as a resource (file or folder)
  being added/removed/deleted/etc. All index modifications are
  synchronized against a write-lock object.

1 Thread does "index switching", which means that it synchronizes on
  the write lock and then closes the modifying index-reader and index-writer.
  Next it copies that index completely and reopens the index-reader and
  -writer on the copied index.
  Then it syncs on the read lock, closes the index searcher and
  reopens it on the index that was previously copied.

N Threads perform search requests and sync against the read lock.

Since I can guarantee that there is only one thread processing the
change events sequentially, the index-writer and index-reader will never
do any concurrent modifications.
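
The locking scheme described above can be sketched in a few lines. This is
only a simplified model, not the real code: plain lists stand in for the
Lucene index directories, and the class TwoLockIndexSwitch and its method
names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the write-lock/read-lock index switching. */
class TwoLockIndexSwitch {
    private final Object writeLock = new Object();
    private final Object readLock = new Object();

    /** Stand-in for the index that the event thread modifies. */
    private List<String> writeIndex = new ArrayList<>();
    /** Stand-in for the index that the search threads query. */
    private List<String> readIndex = new ArrayList<>();

    /** Called by the single event thread. */
    void add(String uri) {
        synchronized (writeLock) {
            writeIndex.add(uri);
        }
    }

    /** Called by the single event thread. */
    void remove(String uri) {
        synchronized (writeLock) {
            writeIndex.remove(uri);
        }
    }

    /** Called by any of the N search threads. */
    boolean contains(String uri) {
        synchronized (readLock) {
            return readIndex.contains(uri);
        }
    }

    /** Called by the switcher thread. */
    void switchIndex() {
        List<String> previous;
        synchronized (writeLock) {
            // modifications continue on a fresh copy...
            previous = writeIndex;
            writeIndex = new ArrayList<>(previous);
        }
        synchronized (readLock) {
            // ...and the searcher moves to the index that was copied
            readIndex = previous;
        }
    }
}
```

In this model a change only becomes visible to searches after the next
switch, which matches the delayed search results of the real design.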

This time I will attach my source as text in this mail to be sure.
For those who do not know Avalon/Excalibur: it is a framework that
will be the only one calling the configure/start/stop methods.
No one can access the instance until it is properly created, configured
and started, so synchronization is not necessary in the start method.

Thanks again
  Jörg
----
/**
 * This is the implementation of the ISearchManager using Lucene as the
 * underlying search engine.<br/>
 * Everything would be so simple if Lucene was thread-safe for concurrently
 * modifying and searching on the same index, but it is not. <br/>
 * My first idea was to have a single index that is continuously modified and a
 * background thread that continuously closes and reopens the index searcher.
 * This should bring the most recent search results but it did not work properly
 * with Lucene.<br/>
 * My strategy now is to have multiple indexes and to cycle over all of them
 * in a background thread, copying the most recent one to the next (least recent)
 * one. Index modifications are always performed on the most recent index,
 * while searching is always performed on the second most recent (copy of the) index.
 * This strategy results in less current (but still very acceptable)
 * search results. Further, it produces a lot more disk space overhead but
 * with the advantage of having backups of the index.<br/>
 * Because the search must filter out the results the user does not have
 * read access to, it can also filter out the results that no longer exist
 * at no further cost.
 * 
 * @author Joerg Hohwiller (jhohwill)
 */
public class SearchManager
    extends AbstractManager
    implements
        ISearchManager,
        IDataEventListener,
        Startable,
        Serviceable,
        Disposable,
        Configurable,
        Runnable,
        ThreadSafe {

    /** 
     * A background thread switches/updates the index used for indexing
     * and/or searching. The thread sleeps for this many milliseconds
     * before the next switch is done.<br/>
     * The shorter the delay, the more up to date the search results, but also
     * the more performance overhead is produced.<br/>
     * Be aware that the delay does not determine the index switching frequency
     * because after a sleep of the delay, the index is copied and then switched.
     * The time required for this operation depends on the size of the
     * index. This also means that the bigger the index, the less up to date
     * the search results are.<br/>
     * A value of 60 seconds (60 * 1000L) should be OK. 
     */
    private static final long INDEX_SWITCH_DELAY = 30 * 1000L;

    /** the URI field name */
    public static final String FIELD_URI = "uri";

    /** the title field name */
    public static final String FIELD_TITLE = "dc_title";

    /** the text field name */
    public static final String FIELD_TEXT = "text";

    /** the read action */
    private static final String READ_ACTION_URI = "/actions/read";

    /** the name of the configuration tag for the index settings */
    private static final String CONFIGURATION_TAG_INDEXER = "indexer";

    /** the name of the configuration attribute for the index path */
    private static final String CONFIGURATION_ATTRIBUTE_INDEX_PATH = "index-path";

    /** the user used to access resources for indexing (global read access) */
    private static final String SEARCH_INDEX_USER = "indexer";

    /** the maximum number of search hits */
    private static final int MAX_SEARCH_HITS = 100;

    /** the default analyzer used for the search index */
    private static final Analyzer ANALYZER = new StandardAnalyzer();

    /** 
     * the number of indexes used, must be at least 3:
     * <ul>
     *   <li>one for writing/updating</li>
     *   <li>one for read/search</li>
     *   <li>one temporary where the index is copied to</li>
     * </ul>
     * All further indexes will act as extra backups of the index but will
     * also waste more disk space. 
     */
    private static final int INDEX_COUNT = 3;

    /** the descriptor manager used for searching, restricted to query method!!! */
    private IDescriptorManager searchDescriptorManager;

    /** the descriptor manager used for indexing */
    private IDescriptorManager indexDescriptorManager;

    /** the read lock object used to synchronize read access (search queries) */
    private Object readLock;

    /** the write lock object used to synchronize write access (index modification) */
    private Object writeLock;

    /** the event manager */
    private IDataEventManager eventManager;

    /** the writer used to add documents to the index */
    private IndexWriter indexWriter;

    /** the reader used to remove documents from the index */
    private IndexReader indexReader;

    /** the index searcher */
    private Searcher searcher;

    /** the path where the indexes are stored */
    private String indexPath;

    /** the index directories */
    private File[] indexDirectories;

    /** the array-index in indexDirectories of the currently used write index */
    private int indexPosition;

    /** this flag is set to <code>true</code> if the background thread should stop */
    //volatile: written by stop() and read by the background thread
    private volatile boolean done;

    /** the background thread that updates the indexes for search */
    private Thread backgroundThread;

    /** the factory used to get search parsers by mimetype */
    private SearchParserFactory parserFactory;

    /**
     * The constructor.
     */
    public SearchManager() {
        super();
        this.indexWriter = null;
        this.indexReader = null;
        this.indexPath = null;
        this.done = false;
        this.backgroundThread = null;
        this.parserFactory = new SearchParserFactory();
        this.readLock = new Object();
        this.writeLock = new Object();
    }
    
    /**
     * This method adds a resource to the search index.
     * 
     * @param resourceUri is the URI of the resource to be added to the index.
     * @param htmlPath is the path to the rendered html version of the 
     *         resource or <code>null</code> if the document has not (maybe only
     *         yet) been rendered to html.
     */
    public void addResource(String resourceUri, String htmlPath) {
        try {
            //create a lucene document...
            Document document = new Document();

            //add the URI as field to that document (used as ID of the document)
            document.add(Field.Keyword(FIELD_URI, resourceUri));

            //set the default title
            String title = FileUtil.basename(resourceUri);

            //determine the mimetype by extension
            String mimetype = null;
            if (htmlPath == null) {
                mimetype = DmsUtil.getMimetype(resourceUri);
            } else {
                mimetype = DmsUtil.MIMETYPE_HTML;
            }

            ISearchParser parser = this.parserFactory.getParser(mimetype);
            if (parser != null) {
                InputStream contentStream = null;
                try {
                    if (htmlPath == null) {
                        contentStream =
                            this.indexDescriptorManager.getResourceContent(resourceUri);
                    } else {
                        contentStream = new FileInputStream(htmlPath);
                    }
                    String newTitle = parser.parse(document, contentStream);
                    if (newTitle != null) {
                        title = newTitle;
                    }
                } catch (SearchParserException e1) {
                    getLogger().debug(e1.getMessage());
                } catch (Throwable t) {
                    getLogger().debug(t.getMessage());
                } finally {
                    if (contentStream != null) {
                        contentStream.close();
                    }
                }
            }

            Enumeration metaDataList =
                this.indexDescriptorManager.getMetaData(resourceUri);
            while (metaDataList.hasMoreElements()) {
                IMetaData metaData = (IMetaData) metaDataList.nextElement();
                Object value = metaData.getValue();
                String text = null;
                String key =
                    (metaData.getNamespace() + "_" + metaData.getName()).toLowerCase();
                //do not add empty meta fields...
                if (value != null) {
                    text = String.valueOf(value);
                    if (text.length() == 0) {
                        text = null;
                    }
                }
                if (text != null) {
                    if (FIELD_TITLE.equals(key)) {
                        title = text;
                    } else {
                        document.add(Field.Text(key, text));
                    }
                }
            }
            document.add(Field.Text(FIELD_TITLE, title));

            synchronized (this.writeLock) {
                this.indexWriter.addDocument(document);
            }
        } catch (IOException e) {
            throw new DmsServiceException("Search index error!", e);
        } catch (ResourceNotExistsException e) {
            throw new DmsServiceException("Search index error!", e);
        } catch (AccessException e) {
            throw new DmsServiceException("Search index error!", e);
        }
    }

    /**
     * This method removes a resource from the search index.
     * 
     * @param resourceUri is the URI of the resource to delete from the index.
     */
    public void removeResource(String resourceUri) {
        try {
            synchronized (this.writeLock) {
                this.indexReader.delete(new Term(FIELD_URI, resourceUri));
            }
        } catch (IOException e) {
            throw new DmsServiceException("Search index error!", e);
        }
    }

    /**
     * This method updates (reindexes) a resource in the search index.
     * 
     * @param resourceUri is the resource to update in the index.
     */
    public void updateResource(String resourceUri) {
        //keep it simple so far...
        removeResource(resourceUri);
        addResource(resourceUri, null);
    }

    /**
     * This method moves/renames a resource in the index.
     * 
     * @param sourceUri is the old URI of the resource that has moved.
     * @param targetUri is the new URI of the resource that has moved.
     */
    public void moveResource(String sourceUri, String targetUri) {
        removeResource(sourceUri);
        addResource(targetUri, null);
    }

    /**
     * This method handles a copied resource by adding the copy to the index.
     * 
     * @param sourceUri is the URI of the resource that was copied.
     * @param targetUri is the URI of the new copy to add to the index.
     */
    public void copyResource(String sourceUri, String targetUri) {
        addResource(targetUri, null);
    }

    /**
     * @see org.apache.avalon.framework.activity.Startable#start()
     */
    public void start() throws Exception {
        boolean createFreshIndex = true;

        //create all index directories and determine the most recent one if
        //one exists...
        this.indexPosition = 0;
        long latestModified = 0;
        this.indexDirectories = new File[INDEX_COUNT];
        for (int i = 0; i < INDEX_COUNT; i++) {
            this.indexDirectories[i] = new File(this.indexPath, "index" + i);
            if (this.indexDirectories[i].isDirectory()) {
                long lastModified = this.indexDirectories[i].lastModified();
                if (lastModified > latestModified) {
                    //this is currently the most recent index.
                    this.indexPosition = i;
                    latestModified = lastModified;
                }
                createFreshIndex = false;
            } else {
                this.indexDirectories[i].mkdirs();
            }
        }

        //TODO: this is only for testing!!!
        createFreshIndex = true;

        int startIndexPosition = this.indexPosition;
        boolean recoverWriteIndex = !createFreshIndex;
        while (recoverWriteIndex) {
            try {
                this.indexWriter =
                    new IndexWriter(
                        this.indexDirectories[this.indexPosition],
                        ANALYZER,
                        false);
                this.indexWriter.maxFieldLength = 1000000;
                this.indexWriter.optimize();
                this.indexReader =
                    IndexReader.open(this.indexDirectories[this.indexPosition]);
                recoverWriteIndex = false;
            } catch (Throwable t) {
                //the index could not be recovered...
                getLogger().warn(
                    "The index ("
                        + this.indexDirectories[this.indexPosition].getName()
                        + ") is broken!");
                //now we cycle backwards, because we want to have the most recent 
                //index that is valid.
                this.indexPosition--;
                if (this.indexPosition < 0) {
                    this.indexPosition = INDEX_COUNT - 1;
                }
                if (startIndexPosition == this.indexPosition) {
                    //uh-oh, all indexes are broken
                    getLogger().fatalError(
                        "All indexes are broken, search engine is in big trouble!");
                    break;
                }
            }
        }
        if (!createFreshIndex) {
            if (recoverWriteIndex) {
                //all indexes are broken, we actually have to rebuild the index 
                //from scratch!
                //There is only one problem - this is not implemented
                createFreshIndex = true;
                //TODO implement fallback index rebuild                

                //} else {
                //okay this is the regular case: the write index is recovered, 
                //now we have to build a read index...

            }
        }
        if (createFreshIndex) {
            //maybe this is the first start of the DMS, we create a fresh index.
            try {
                this.indexWriter =
                    new IndexWriter(
                        this.indexDirectories[this.indexPosition],
                        ANALYZER,
                        true);
                this.indexWriter.maxFieldLength = 1000000;
                this.indexReader =
                    IndexReader.open(this.indexDirectories[this.indexPosition]);
            } catch (Throwable t) {
                getLogger().fatalError("No (fresh) search index could be created!", t);
                throw new DmsServiceException("Search engine could not startup!", t);
            }
        }
        //now we have a working write index, next we have to create a copy to
        //use as read index...
        int readIndexPosition = this.indexPosition - 1;
        if (readIndexPosition < 0) {
            readIndexPosition = INDEX_COUNT - 1;
        }
        FileUtil.copy(
            this.indexDirectories[this.indexPosition],
            this.indexDirectories[readIndexPosition],
            true);
        //since we just copied the index, we assume that everything goes right here...
        this.searcher =
            new IndexSearcher(IndexReader.open(this.indexDirectories[readIndexPosition]));

        //now lets go...
        this.eventManager.addChangeListener(this);
        this.backgroundThread = new Thread(this, "Index Switcher");
        this.backgroundThread.start();
    }

    /**
     * @see org.apache.avalon.framework.activity.Startable#stop()
     */
    public void stop() throws Exception {
        this.eventManager.removeChangeListener(this);
        this.done = true;
        this.backgroundThread.interrupt();
        this.backgroundThread = null;
        synchronized (this.writeLock) {
            this.indexReader.close();
            this.indexWriter.optimize();
            this.indexWriter.close();
        }
        synchronized (this.readLock) {
            this.searcher.close();
        }
    }

    /**
     * @see org.apache.avalon.framework.configuration.Configurable#configure(org.apache.avalon.framework.configuration.Configuration)
     */
    public void configure(Configuration configuration) throws ConfigurationException {
        Configuration settings = configuration.getChild(CONFIGURATION_TAG_INDEXER);
        this.indexPath = settings.getAttribute(CONFIGURATION_ATTRIBUTE_INDEX_PATH);
    }

    public Enumeration query(DmsSession session, String query)
        throws IllegalQueryException {
        try {
            query = query.toLowerCase();
            Query parsedQuery = QueryParser.parse(query, FIELD_TEXT, ANALYZER);

            Hits hitList = null;
            synchronized (this.readLock) {
                hitList = this.searcher.search(parsedQuery);
            }
            //maybe the rest has to be synchronized, too...

            int len = Math.min(MAX_SEARCH_HITS, hitList.length());
            if (len == 0) {
                return EmptyEnumeration.getInstance();
            }
            this.searchDescriptorManager.setSession(session);
            Vector result = new Vector(len);

            //necessary to expand search terms:
            //parsedQuery = parsedQuery.rewrite(getIndexReader());
            QueryHighlightExtractor highlighter =
                new QueryHighlightExtractor(parsedQuery, ANALYZER, "<b>", "</b>");
            for (int i = 0; i < len; i++) {
                String uri = "unknown";
                try {
                    Document doc = hitList.doc(i);
                    uri = doc.get(FIELD_URI);
                    if (uri == null) {
                        getLogger().debug("found document with uri=null!");
                    } else {
                        String title = doc.get(FIELD_TITLE);
                        if (this.searchDescriptorManager.checkPermission(uri, READ_ACTION_URI)) {
                            String text = doc.get(FIELD_TEXT);
                            String highlightedText = "";
                            //highlighter.getBestFragments(text, 80, 3, "...") + "...";
                            ISearchResult hit =
                                new SearchResult(uri, hitList.score(i), title, highlightedText);
                            result.add(hit);
                        }
                    }
                } catch (ResourceNotExistsException e1) {
                    getLogger().info("Search found illegal uri (" + uri + ")!");
                } catch (DmsServiceException e) {
                    getLogger().warn("Problems with uri (" + uri + ")!", e);
                } catch (IOException e) {
                    getLogger().warn("Problems with index", e);
                }
            }
            return result.elements();
        } catch (ParseException e) {
            throw new IllegalQueryException("Illegal query (" + query + ")!", e);
        } catch (IOException e) {
            throw new DmsServiceException("Search index error!", e);
        }
    }

    public void dataChanged(DataEvent event) {
        if (event.getDataType() == IResource.DATA_TYPE) {
            if (event.isAddEvent()) {
                addResource(event.getSourceUri(), null);
            } else if (event.isRenderEvent()) {
                String mimetype =
                    (String) event.getParameter(DataEvent.KEY_RENDER_MIMETYPE);
                if (DmsUtil.MIMETYPE_HTML.equals(mimetype)) {
                    String filepath =
                        (String) event.getParameter(DataEvent.KEY_RENDER_FILEPATH);
                    addResource(event.getSourceUri(), filepath);
                }
            } else if (event.isModifyEvent() || event.isMetadataModifyEvent()) {
                updateResource(event.getSourceUri());
            } else if (event.isRemoveEvent()) {
                removeResource(event.getSourceUri());
            } else if (event.isRenameEvent()) {
                moveResource(event.getSourceUri(), event.getTargetUri());
            } else if (event.isCopyEvent()) {
                copyResource(event.getSourceUri(), event.getTargetUri());
            }
        }
    }

    /**
     * @see org.apache.avalon.framework.service.Serviceable#service(org.apache.avalon.framework.service.ServiceManager)
     */
    public void service(ServiceManager manager) throws ServiceException {
        super.service(manager);
        //the descriptor manager must only be used in the query method,
        //because its session is concurrently modified.
        this.searchDescriptorManager =
            (IDescriptorManager) getServiceManager().lookup(IDescriptorManager.ROLE);
        //for the indexing this descriptor manager is used that holds a fixed
        //session with an index-specific user that has appropriate rights.
        this.indexDescriptorManager =
            (IDescriptorManager) getServiceManager().lookup(IDescriptorManager.ROLE);
        this.indexDescriptorManager.setSession(
            DmsSession.createSession(SEARCH_INDEX_USER));

        //register ourself as event listener...
        this.eventManager =
            (IDataEventManager) getServiceManager().lookup(IDataEventManager.ROLE);
    }

    /**
     * @see org.apache.avalon.framework.activity.Disposable#dispose()
     */
    public void dispose() {
        //this.eventManager.removeChangeListener(this);
        getServiceManager().release(this.eventManager);
        getServiceManager().release(this.searchDescriptorManager);
        this.searchDescriptorManager = null;
        getServiceManager().release(this.indexDescriptorManager);
        this.indexDescriptorManager = null;
    }

    /**
     * @see java.lang.Runnable#run()
     */
    public void run() {
        while (!this.done) {
            DmsUtil.doSleep(INDEX_SWITCH_DELAY);
            try {
                int oldIndexPosition = this.indexPosition;
                synchronized (this.writeLock) {
                    this.indexWriter.optimize();
                    this.indexWriter.close();
                    this.indexReader.close();
                    this.indexPosition++;
                    if (this.indexPosition >= INDEX_COUNT) {
                        this.indexPosition = 0;
                    }
                    FileUtil.delete(this.indexDirectories[this.indexPosition]);
                    FileUtil.copy(
                        this.indexDirectories[oldIndexPosition],
                        this.indexDirectories[this.indexPosition],
                        true);
                    this.indexWriter =
                        new IndexWriter(
                            this.indexDirectories[this.indexPosition],
                            ANALYZER,
                            false);
                    this.indexReader =
                        IndexReader.open(this.indexDirectories[this.indexPosition]);
                }
                synchronized (this.readLock) {
                    this.searcher.close();
                    this.searcher =
                        new IndexSearcher(
                            IndexReader.open(this.indexDirectories[oldIndexPosition]));
                }
            } catch (IOException e) {
                getLogger().fatalError(
                    "Index Switching failed - search engine in trouble!", e);
            }
        }
    }

}
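
As a side note, the cycling over the index directories in start() and run()
above comes down to simple wrap-around arithmetic over INDEX_COUNT slots.
A minimal sketch of just that arithmetic follows; the class IndexRing and
its method names are invented for this example and are not part of the
source above.

```java
/** Slot arithmetic for a ring of index directories (index0, index1, ...). */
class IndexRing {
    /** Slot the writer moves to at the next index switch. */
    static int next(int position, int count) {
        return (position + 1) % count;
    }

    /** Slot tried next when the current index turns out to be broken. */
    static int previous(int position, int count) {
        return (position + count - 1) % count;
    }

    public static void main(String[] args) {
        int count = 3; // corresponds to INDEX_COUNT
        int write = 0;
        for (int i = 0; i < 4; i++) {
            // the searcher is reopened on the slot the writer just left
            int read = write;
            write = next(write, count);
            System.out.println("write=index" + write + " read=index" + read);
        }
    }
}
```

With three slots the writer, the searcher and the copy target each have
their own directory at any point in time, which is why INDEX_COUNT must be
at least 3.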
