Hi there, thanks for your response, guys!
From the answers I got, I understand that I must not have an IndexWriter and an IndexReader open at the same time if both want to modify the index, even sequentially. What I have is the following:

- 1 thread processes events such as a resource (file or folder) being added, removed, deleted, etc. All index modifications are synchronized on a write-lock object.
- 1 thread does the "index switching": it synchronizes on the write lock and closes the modifying index reader and index writer. Next it copies that index completely and reopens the index reader and writer on the copy. Then it synchronizes on the read lock, closes the index searcher and reopens it on the previous write index, i.e. the one that was just copied.
- N threads perform search requests; they synchronize on the read lock.

Since I can guarantee that only one thread processes the change events, and does so sequentially, the index writer and index reader will never modify the index concurrently.

This time I am including my source as text in this mail, to be sure.
For those who do not know Avalon/Excalibur: it is a framework, and it is the only caller of the configure/start/stop methods. No one can access the instance until it has been properly created, configured and started, so synchronization is not necessary in the start method.

Thanks again
Jörg

----
/**
 * This is the implementation of the ISearchManager using Lucene as the
 * underlying search engine.<br/>
 * Everything would be so simple if Lucene were thread-safe for concurrently
 * modifying and searching the same index, but it is not.<br/>
 * My first idea was to have a single index that is continuously modified and
 * a background thread that continuously closes and reopens the index
 * searcher. This should have given the most recent search results, but it
 * did not work properly with Lucene.<br/>
 * My strategy now is to have multiple indexes and to cycle over all of them
 * in a background thread, copying the most recent one to the next (least
 * recent) one.
 * Index modifications are always performed on the most recent index,
 * while searching is always performed on the second most recent (copy of
 * the) index. This strategy makes the search results somewhat less up to
 * date (but still very acceptable). Further, it produces a lot more
 * disk-space overhead, but with the advantage of having backups of the
 * index.<br/>
 * Because the search must filter out the results the user has no read
 * access to, it can also filter out results that no longer exist at no
 * extra cost.
 *
 * @author Joerg Hohwiller (jhohwill)
 */
public class SearchManager extends AbstractManager
        implements ISearchManager, IDataEventListener, Startable, Serviceable,
        Disposable, Configurable, Runnable, ThreadSafe {

    /**
     * A background thread is switching/updating the index used for indexing
     * and/or searching. The thread sleeps for this number of milliseconds
     * until the next switch is done.<br/>
     * The shorter the delay, the more up to date the search results, but also
     * the more performance overhead is produced.<br/>
     * Be aware that the delay does not determine the index switching
     * frequency, because after sleeping for the delay, the index is copied
     * and then switched. The time required for this operation depends on the
     * size of the index. This also means that the bigger the index, the less
     * up to date the search results are.<br/>
     * A value of 60 seconds (60 * 1000L) should be OK.
     */
    private static final long INDEX_SWITCH_DELAY = 30 * 1000L;

    /** the URI field name */
    public static final String FIELD_URI = "uri";

    /** the title field name */
    public static final String FIELD_TITLE = "dc_title";

    /** the text field name */
    public static final String FIELD_TEXT = "text";

    /** the read action */
    private static final String READ_ACTION_URI = "/actions/read";

    /** the name of the configuration tag for the index settings */
    private static final String CONFIGURATION_TAG_INDEXER = "indexer";

    /** the name of the configuration attribute for the index path */
    private static final String CONFIGURATION_ATTRIBUTE_INDEX_PATH = "index-path";

    /** the user used to access resources for indexing (global read access) */
    private static final String SEARCH_INDEX_USER = "indexer";

    /** the maximum number of search hits */
    private static final int MAX_SEARCH_HITS = 100;

    /** the default analyzer used for the search index */
    private static final Analyzer ANALYZER = new StandardAnalyzer();

    /**
     * the number of indexes used, must be at least 3:
     * <ul>
     * <li>one for writing/updating</li>
     * <li>one for read/search</li>
     * <li>one temporary where the index is copied to</li>
     * </ul>
     * All further indexes will act as extra backups of the index but will
     * also waste more disk space.
     */
    private static final int INDEX_COUNT = 3;

    /**
     * the descriptor manager used for searching, restricted to the query method!!!
     */
    private IDescriptorManager searchDescriptorManager;

    /** the descriptor manager used for indexing */
    private IDescriptorManager indexDescriptorManager;

    /** the read lock object used to synchronize read access (search queries) */
    private Object readLock;

    /** the write lock object used to synchronize write access (index modification) */
    private Object writeLock;

    /** the event manager */
    private IDataEventManager eventManager;

    /** the writer used to add documents to the index */
    private IndexWriter indexWriter;

    /** the reader used to remove documents from the index */
    private IndexReader indexReader;

    /** the index searcher */
    private Searcher searcher;

    /** the path where the indexes are stored */
    private String indexPath;

    /** the index directories */
    private File[] indexDirectories;

    /** the array-index in the indexDir for the currently used write index */
    private int indexPosition;

    /**
     * this flag is set to <code>true</code> if the background thread should
     * stop; volatile because it is set by stop() and read by run()
     */
    private volatile boolean done;

    /** the background thread that updates the indexes for search */
    private Thread backgroundThread;

    /** the factory to get search parsers by mimetype */
    private SearchParserFactory parserFactory;

    /**
     * The constructor.
     */
    public SearchManager() {
        super();
        this.indexWriter = null;
        this.indexReader = null;
        this.indexPath = null;
        this.done = false;
        this.backgroundThread = null;
        this.parserFactory = new SearchParserFactory();
        this.readLock = new Object();
        this.writeLock = new Object();
    }

    /**
     * This method adds a resource to the search index.
     *
     * @param resourceUri is the URI of the resource to be added to the index.
     * @param htmlPath is the path to the rendered html version of the
     *        resource or <code>null</code> if the document has not (or not
     *        yet) been rendered to html.
     */
    public void addResource(String resourceUri, String htmlPath) {
        try {
            //create a lucene document...
            Document document = new Document();
            //add the URI as field to that document (used as ID of the document)
            document.add(Field.Keyword(FIELD_URI, resourceUri));
            //set the default title
            String title = FileUtil.basename(resourceUri);
            //determine the mimetype by extension
            String mimetype = null;
            if (htmlPath == null) {
                mimetype = DmsUtil.getMimetype(resourceUri);
            } else {
                mimetype = DmsUtil.MIMETYPE_HTML;
            }
            ISearchParser parser = this.parserFactory.getParser(mimetype);
            if (parser != null) {
                InputStream contentStream = null;
                try {
                    if (htmlPath == null) {
                        contentStream = this.indexDescriptorManager.getResourceContent(resourceUri);
                    } else {
                        contentStream = new FileInputStream(htmlPath);
                    }
                    String newTitle = parser.parse(document, contentStream);
                    if (newTitle != null) {
                        title = newTitle;
                    }
                } catch (SearchParserException e1) {
                    getLogger().debug(e1.getMessage());
                } catch (Throwable t) {
                    getLogger().debug(t.getMessage());
                } finally {
                    if (contentStream != null) {
                        contentStream.close();
                    }
                }
            }
            Enumeration metaDataList = this.indexDescriptorManager.getMetaData(resourceUri);
            while (metaDataList.hasMoreElements()) {
                IMetaData metaData = (IMetaData) metaDataList.nextElement();
                Object value = metaData.getValue();
                String text = null;
                String key = (metaData.getNamespace() + "_" + metaData.getName()).toLowerCase();
                //do not add empty meta fields...
                if (value != null) {
                    text = String.valueOf(value);
                    if (text.length() == 0) {
                        text = null;
                    }
                }
                if (text != null) {
                    if (FIELD_TITLE.equals(key)) {
                        title = text;
                    } else {
                        document.add(Field.Text(key, text));
                    }
                }
            }
            document.add(Field.Text(FIELD_TITLE, title));
            synchronized (this.writeLock) {
                this.indexWriter.addDocument(document);
            }
        } catch (IOException e) {
            throw new DmsServiceException("Search index error!", e);
        } catch (ResourceNotExistsException e) {
            throw new DmsServiceException("Search index error!", e);
        } catch (AccessException e) {
            throw new DmsServiceException("Search index error!", e);
        }
    }

    /**
     * This method removes a resource from the search index.
     *
     * @param resourceUri is the URI of the resource to delete from the index.
     */
    public void removeResource(String resourceUri) {
        try {
            synchronized (this.writeLock) {
                this.indexReader.delete(new Term(FIELD_URI, resourceUri));
            }
        } catch (IOException e) {
            throw new DmsServiceException("Search index error!", e);
        }
    }

    /**
     * This method updates (reindexes) a resource in the search index.
     *
     * @param resourceUri is the resource to update in the index.
     */
    public void updateResource(String resourceUri) {
        //keep it simple so far...
        removeResource(resourceUri);
        addResource(resourceUri, null);
    }

    /**
     * This method moves/renames a resource in the index.
     *
     * @param sourceUri is the old URI of the resource that has moved.
     * @param targetUri is the new URI of the resource that has moved.
     */
    public void moveResource(String sourceUri, String targetUri) {
        removeResource(sourceUri);
        addResource(targetUri, null);
    }

    /**
     * This method copies a resource in the index.
     *
     * @param sourceUri is the URI of the resource that was copied.
     * @param targetUri is the URI of the new copy of the resource.
     */
    public void copyResource(String sourceUri, String targetUri) {
        addResource(targetUri, null);
    }

    /**
     * @see org.apache.avalon.framework.activity.Startable#start()
     */
    public void start() throws Exception {
        boolean createFreshIndex = true;
        //create all index directories and determine the most recent one if
        //one exists...
        this.indexPosition = 0;
        long latestModified = 0;
        this.indexDirectories = new File[INDEX_COUNT];
        for (int i = 0; i < INDEX_COUNT; i++) {
            this.indexDirectories[i] = new File(this.indexPath, "index" + i);
            if (this.indexDirectories[i].isDirectory()) {
                long lastModified = this.indexDirectories[i].lastModified();
                if (lastModified > latestModified) {
                    //this is currently the most recent index.
                    this.indexPosition = i;
                    latestModified = lastModified;
                }
                createFreshIndex = false;
            } else {
                this.indexDirectories[i].mkdirs();
            }
        }
        //TODO: this is only for testing!!!
        createFreshIndex = true;
        int startIndexPosition = this.indexPosition;
        boolean recoverWriteIndex = !createFreshIndex;
        while (recoverWriteIndex) {
            try {
                this.indexWriter = new IndexWriter(this.indexDirectories[this.indexPosition], ANALYZER, false);
                this.indexWriter.maxFieldLength = 1000000;
                this.indexWriter.optimize();
                this.indexReader = IndexReader.open(this.indexDirectories[this.indexPosition]);
                recoverWriteIndex = false;
            } catch (Throwable t) {
                //the index could not be recovered...
                getLogger().warn(
                    "The index (" + this.indexDirectories[this.indexPosition].getName() + ") is broken!");
                //now we cycle backwards, because we want to have the most recent
                //index that is valid.
                this.indexPosition--;
                if (this.indexPosition < 0) {
                    this.indexPosition = INDEX_COUNT - 1;
                }
                if (startIndexPosition == this.indexPosition) {
                    //oh, oh, all indexes are broken
                    getLogger().fatalError(
                        "All indexes are broken, search engine is in big trouble!");
                    break;
                }
            }
        }
        if (!createFreshIndex) {
            if (recoverWriteIndex) {
                //all indexes are broken, we actually have to rebuild the index
                //from scratch!
                //There is only one problem - this is not implemented
                createFreshIndex = true;
                //TODO implement fallback index rebuild
            //} else {
                //okay this is the regular case: the write index is recovered,
                //now we have to build a read index...
            }
        }
        if (createFreshIndex) {
            //maybe this is the first start of the DMS, we create a fresh index.
            try {
                this.indexWriter = new IndexWriter(this.indexDirectories[this.indexPosition], ANALYZER, true);
                this.indexWriter.maxFieldLength = 1000000;
                this.indexReader = IndexReader.open(this.indexDirectories[this.indexPosition]);
            } catch (Throwable t) {
                getLogger().fatalError("No (fresh) search index could be created!", t);
                throw new DmsServiceException("Search engine could not start up!", t);
            }
        }
        //now we have a working write index, next we have to create a copy to
        //use as read index...
        int readIndexPosition = this.indexPosition - 1;
        if (readIndexPosition < 0) {
            readIndexPosition = INDEX_COUNT - 1;
        }
        FileUtil.copy(
            this.indexDirectories[this.indexPosition],
            this.indexDirectories[readIndexPosition],
            true);
        //since we just copied the index, we assume that everything goes right here...
        this.searcher = new IndexSearcher(IndexReader.open(this.indexDirectories[readIndexPosition]));
        //now let's go...
        this.eventManager.addChangeListener(this);
        this.backgroundThread = new Thread(this, "Index Switcher");
        this.backgroundThread.start();
    }

    /**
     * @see org.apache.avalon.framework.activity.Startable#stop()
     */
    public void stop() throws Exception {
        this.eventManager.removeChangeListener(this);
        this.done = true;
        this.backgroundThread.interrupt();
        this.backgroundThread = null;
        synchronized (this.writeLock) {
            this.indexReader.close();
            this.indexWriter.optimize();
            this.indexWriter.close();
        }
        synchronized (this.readLock) {
            this.searcher.close();
        }
    }

    /**
     * @see org.apache.avalon.framework.configuration.Configurable#configure(org.apache.avalon.framework.configuration.Configuration)
     */
    public void configure(Configuration configuration) throws ConfigurationException {
        Configuration settings = configuration.getChild(CONFIGURATION_TAG_INDEXER);
        this.indexPath = settings.getAttribute(CONFIGURATION_ATTRIBUTE_INDEX_PATH);
    }

    public Enumeration query(DmsSession session, String query) throws IllegalQueryException {
        try {
            query = query.toLowerCase();
            Query parsedQuery = QueryParser.parse(query, FIELD_TEXT, ANALYZER);
            Hits hitList = null;
            synchronized (this.readLock) {
                hitList = this.searcher.search(parsedQuery);
            }
            //maybe the rest has to be synchronized, too...
            int len = Math.min(MAX_SEARCH_HITS, hitList.length());
            if (len == 0) {
                return EmptyEnumeration.getInstance();
            }
            this.searchDescriptorManager.setSession(session);
            Vector result = new Vector(len);
            //necessary to expand search terms:
            //parsedQuery = parsedQuery.rewrite(getIndexReader());
            QueryHighlightExtractor highlighter =
                new QueryHighlightExtractor(parsedQuery, ANALYZER, "<b>", "</b>");
            for (int i = 0; i < len; i++) {
                String uri = "unknown";
                try {
                    Document doc = hitList.doc(i);
                    uri = doc.get(FIELD_URI);
                    if (uri == null) {
                        getLogger().debug("found document with uri=null!");
                    } else {
                        String title = doc.get(FIELD_TITLE);
                        if (this.searchDescriptorManager.checkPermission(uri, READ_ACTION_URI)) {
                            String text = doc.get(FIELD_TEXT);
                            String highlightedText = "";
                            //highlighter.getBestFragments(text, 80, 3, "...") + "...";
                            ISearchResult hit = new SearchResult(uri, hitList.score(i), title, highlightedText);
                            result.add(hit);
                        }
                    }
                } catch (ResourceNotExistsException e1) {
                    getLogger().info("Search found illegal uri (" + uri + ")!");
                } catch (DmsServiceException e) {
                    getLogger().warn("Problems with uri (" + uri + ")!", e);
                } catch (IOException e) {
                    getLogger().warn("Problems with index", e);
                }
            }
            return result.elements();
        } catch (ParseException e) {
            throw new IllegalQueryException("Illegal query (" + query + ")!", e);
        } catch (IOException e) {
            throw new DmsServiceException("Search index error!", e);
        }
    }

    public void dataChanged(DataEvent event) {
        if (event.getDataType() == IResource.DATA_TYPE) {
            if (event.isAddEvent()) {
                addResource(event.getSourceUri(), null);
            } else if (event.isRenderEvent()) {
                String mimetype = (String) event.getParameter(DataEvent.KEY_RENDER_MIMETYPE);
                if (DmsUtil.MIMETYPE_HTML.equals(mimetype)) {
                    String filepath = (String) event.getParameter(DataEvent.KEY_RENDER_FILEPATH);
                    addResource(event.getSourceUri(), filepath);
                }
            } else if (event.isModifyEvent() || event.isMetadataModifyEvent()) {
                updateResource(event.getSourceUri());
            } else if (event.isRemoveEvent()) {
                removeResource(event.getSourceUri());
            } else if (event.isRenameEvent()) {
                moveResource(event.getSourceUri(), event.getTargetUri());
            } else if (event.isCopyEvent()) {
                copyResource(event.getSourceUri(), event.getTargetUri());
            }
        }
    }

    /**
     * @see org.apache.avalon.framework.service.Serviceable#service(org.apache.avalon.framework.service.ServiceManager)
     */
    public void service(ServiceManager manager) throws ServiceException {
        super.service(manager);
        //the descriptor manager must only be used in the query method,
        //because its session is concurrently modified.
        this.searchDescriptorManager =
            (IDescriptorManager) getServiceManager().lookup(IDescriptorManager.ROLE);
        //for indexing, this descriptor manager is used; it holds a fixed
        //session with an index-specific user that has the appropriate rights.
        this.indexDescriptorManager =
            (IDescriptorManager) getServiceManager().lookup(IDescriptorManager.ROLE);
        this.indexDescriptorManager.setSession(DmsSession.createSession(SEARCH_INDEX_USER));
        //the event manager we register ourselves with (in start())...
        this.eventManager = (IDataEventManager) getServiceManager().lookup(IDataEventManager.ROLE);
    }

    /**
     * @see org.apache.avalon.framework.activity.Disposable#dispose()
     */
    public void dispose() {
        //this.eventManager.removeChangeListener(this);
        getServiceManager().release(this.eventManager);
        getServiceManager().release(this.searchDescriptorManager);
        this.searchDescriptorManager = null;
        getServiceManager().release(this.indexDescriptorManager);
        this.indexDescriptorManager = null;
    }

    /**
     * @see java.lang.Runnable#run()
     */
    public void run() {
        while (!this.done) {
            DmsUtil.doSleep(INDEX_SWITCH_DELAY);
            if (this.done) {
                //stop() was called while sleeping, writer/reader may already be closed
                break;
            }
            try {
                int oldIndexPosition = this.indexPosition;
                synchronized (this.writeLock) {
                    this.indexWriter.optimize();
                    this.indexWriter.close();
                    this.indexReader.close();
                    this.indexPosition++;
                    if (this.indexPosition >= INDEX_COUNT) {
                        this.indexPosition = 0;
                    }
                    FileUtil.delete(this.indexDirectories[this.indexPosition]);
                    FileUtil.copy(
                        this.indexDirectories[oldIndexPosition],
                        this.indexDirectories[this.indexPosition],
                        true);
                    this.indexWriter = new IndexWriter(
                        this.indexDirectories[this.indexPosition], ANALYZER, false);
                    //keep the same field length limit as the writer opened in start()
                    this.indexWriter.maxFieldLength = 1000000;
                    this.indexReader = IndexReader.open(this.indexDirectories[this.indexPosition]);
                }
                synchronized (this.readLock) {
                    this.searcher.close();
                    this.searcher = new IndexSearcher(
                        IndexReader.open(this.indexDirectories[oldIndexPosition]));
                }
            } catch (IOException e) {
                getLogger().fatalError("Index Switching failed - search engine in trouble!", e);
            }
        }
    }
}
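To make the rotation and locking scheme from the top of this mail easier to discuss, here is a stripped-down, Lucene-free sketch. It is only an illustration under simplifying assumptions: the class name `RotatingIndexSketch` is made up, a `List<String>` of URIs stands in for one index directory, and clearing/copying a list stands in for `FileUtil.delete`/`FileUtil.copy`. Writes go to the current write index under the write lock; a switch copies it forward and then, under the read lock, points the searches at the old write index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, Lucene-free sketch: each List<String> stands in for one index
// directory, and copying a list stands in for FileUtil.delete/FileUtil.copy.
class RotatingIndexSketch {

    private final Object readLock = new Object();
    private final Object writeLock = new Object();

    private final List<List<String>> indexes = new ArrayList<List<String>>();
    private final int indexCount;
    private int writePosition = 0;   // guarded by writeLock
    private int searchPosition;      // guarded by readLock

    RotatingIndexSketch(int indexCount) {
        if (indexCount < 3) {
            // with only 2 copies, a switch would overwrite the copy being searched
            throw new IllegalArgumentException("at least 3 indexes are needed");
        }
        this.indexCount = indexCount;
        for (int i = 0; i < indexCount; i++) {
            indexes.add(new ArrayList<String>());
        }
        // searches start on the index "before" the write index, as in start()
        this.searchPosition = indexCount - 1;
    }

    // called only by the single event thread
    void add(String uri) {
        synchronized (writeLock) {
            indexes.get(writePosition).add(uri);
        }
    }

    void remove(String uri) {
        synchronized (writeLock) {
            indexes.get(writePosition).remove(uri);
        }
    }

    // called only by the background "Index Switcher" thread
    void switchIndexes() {
        int oldPosition;
        synchronized (writeLock) {
            oldPosition = writePosition;
            writePosition = (writePosition + 1) % indexCount;
            List<String> target = indexes.get(writePosition);
            target.clear();                           // like FileUtil.delete(...)
            target.addAll(indexes.get(oldPosition));  // like FileUtil.copy(...)
        }
        synchronized (readLock) {
            // searchers now read the previous write index, which the event
            // thread no longer writes to
            searchPosition = oldPosition;
        }
    }

    // called by the N search threads; returns a snapshot of the search index
    List<String> search() {
        synchronized (readLock) {
            return Collections.unmodifiableList(new ArrayList<String>(indexes.get(searchPosition)));
        }
    }
}
```

With three indexes the copy being overwritten during a switch is never the one the searchers are reading, which is why INDEX_COUNT must be at least 3; the price is that a search only sees a change after the next switch.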
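The shutdown of the background thread can be sketched the same way. `SwitcherThreadSketch` is a hypothetical class, not a drop-in replacement for `run()`/`stop()`: a `Runnable` stands in for the index switch and `Thread.sleep` for `DmsUtil.doSleep`. It assumes the stop flag is `volatile`, is re-checked after the sleep, and that `stop()` should `join()` the thread before the writer, reader and searcher are closed, so that no switch can run after them.

```java
// Hypothetical sketch of a stoppable "Index Switcher" thread; the Runnable
// stands in for the index switch and Thread.sleep for DmsUtil.doSleep.
class SwitcherThreadSketch implements Runnable {

    private final long delayMillis;
    private final Runnable switchAction;
    private volatile boolean done = false;   // written by stop(), read by run()
    private Thread thread;

    SwitcherThreadSketch(long delayMillis, Runnable switchAction) {
        this.delayMillis = delayMillis;
        this.switchAction = switchAction;
    }

    void start() {
        thread = new Thread(this, "Index Switcher");
        thread.start();
    }

    public void run() {
        while (!done) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // stop() interrupts the sleep; the flag check below ends the loop
            }
            if (done) {
                break;   // never switch once stop() has been called
            }
            switchAction.run();
        }
    }

    // after this returns, no switch is running or will run, so the caller
    // can close the writer, reader and searcher
    void stop() throws InterruptedException {
        done = true;
        thread.interrupt();
        thread.join();
        thread = null;
    }
}
```

In SearchManager terms, stop() would wait for the join before entering the synchronized blocks that close the index reader, writer and searcher.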