Re: Does housekeeping of Lucene indexes block index updates but allow searches?
Kumaran, Below is the code snippet for concurrent writes (i.e. concurrent updates/deletes etc.) along with the search operation using the NRT Manager APIs. Let me know if you need any other details or have any suggestions for me:

public class LuceneEngineInstance implements IndexEngineInstance {

    private final String indexName;
    private final String indexBaseDir;
    private IndexWriter indexWriter;
    private Directory luceneDirectory;
    private TrackingIndexWriter trackingIndexWriter;
    private ReferenceManager<IndexSearcher> indexSearcherReferenceManager;
    // Note: this will only scale well if most searches do not need to wait for a specific index generation.
    private final ControlledRealTimeReopenThread<IndexSearcher> indexSearcherReopenThread;
    private long reopenToken; // token returned by the index update/delete methods

    private static final Log log = LogFactory.getLog(LuceneEngineInstance.class);
    private static final String VERBOSE = "NO"; // read from property file

    // CONSTRUCTOR & FINALIZE
    /**
     * Constructor based on an instance of the type responsible for the lucene index persistence
     * @param indexName
     */
    public LuceneEngineInstance(Directory luceneDirectory, final IndexWriterConfig writerConfig,
            final String indexName, final String indexBaseDir) {
        this.indexName = indexName;
        this.indexBaseDir = indexBaseDir;
        this.luceneDirectory = luceneDirectory;
        try {
            // [1]: Create the indexWriter
            if ("YES".equalsIgnoreCase(VERBOSE)) {
                writerConfig.setInfoStream(System.out);
            }
            indexWriter = new IndexWriter(luceneDirectory, writerConfig);

            // [2a]: Create the TrackingIndexWriter to track changes to the delegated, previously created IndexWriter
            trackingIndexWriter = new TrackingIndexWriter(indexWriter);

            // [2b]: Create an IndexSearcher ReferenceManager to safely share IndexSearcher instances
            // across multiple threads.
            // Note: applyAllDeletes=true means each reopened reader is required to apply all previous
            // deletion operations (deleteDocuments or updateDocument/s) up until that point.
            indexSearcherReferenceManager = new SearcherManager(indexWriter, true, null);

            // [3]: Create the ControlledRealTimeReopenThread that reopens the index periodically, taking
            // into account the changes made to the index and tracked by the TrackingIndexWriter instance.
            // The index is refreshed every 60 s when nobody is waiting,
            // and every 100 ms whenever someone is waiting (see search method).
            indexSearcherReopenThread = new ControlledRealTimeReopenThread<IndexSearcher>(
                    trackingIndexWriter,
                    indexSearcherReferenceManager,
                    60.0,   // when there is nobody waiting
                    0.1);   // when there is someone waiting
            indexSearcherReopenThread.start(); // start the refresher thread
        } catch (IOException ioEx) {
            throw new IllegalStateException("Lucene index could not be created for: " + indexName, ioEx);
        }
    }

    // INDEX
    @Override
    public void indexDocWithoutCommit(final Document doc) {
        Monitor addDocumentMonitor = MonitorFactory.start("SearchIndex.addDocument");
        try {
            reopenToken = trackingIndexWriter.addDocument(doc);
            // if (log.isTraceEnabled()) {
            //     log.trace("document added in the Index, doc: " + doc);
            // }
        } catch (IOException ioEx) {
            log.error("Error while adding the doc: " + doc, ioEx);
        } finally {
            addDocumentMonitor.stop();
        }
    }

    @Override
    public void commitDocuments() {
        Monitor indexCommitMonitor = MonitorFactory.start("SearchIndex.commit");
        try {
            indexWriter.commit();
        } catch (IOException ioEx) {
            // throw new IndexerException("Error while committing changes to Lucene index for: " + indexName, ioEx);
            try {
                log.warn("Trying rollback. Error while committing changes to Lucene index for: " + indexName, ioEx);
                indexWriter.rollback(); // TODO: handle rolled-back records
            } catch (IOException ioe) {
                log.error("Error during rollback.", ioe);
            }
        } finally {
            indexCommitMonitor.stop();
        }
    }

    @Override
    public void reIndexDocWithoutCommit(final Term recordIdTerm, final Document doc) {
        Monitor updateMonitor = MonitorFactory.start("SearchIndex.updateDoc");
        try {
            reopenToken = trackingIndexWriter.updateDocument(recordIdTerm, doc);
            log.trace("document re-indexed in lucene: " + recordIdTerm.text());
        } catch (IOException ioEx) {
            log.error("Error while updating the doc: " + doc, ioEx);
        } finally {
            updateMonitor.stop();
        }
    }

    /* (non-Javadoc)
     * @see com.pb.spectrum.component.index.lucene.IndexInstanceEngine#unIndex(org.apache.lucene.index.Term)
     */
    @Override
    public void unIndexDocWithoutCommit(final Term idTerm) throws IndexerException {
        try
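The posted class is cut off before the search side; for completeness, the searcher counterpart that pairs with the reopen thread and the reopenToken usually looks roughly like the sketch below. It is written against the Lucene 4.x NRT API using the field names from the snippet above; the method itself is an illustration, not the poster's actual code.

```java
// Sketch of the search-side counterpart (Lucene 4.x NRT API). The fields
// (indexSearcherReopenThread, indexSearcherReferenceManager, reopenToken, log)
// come from the snippet above; this method is hypothetical.
public List<Document> search(final Query query, final int maxHits) {
    try {
        // Block briefly until the searcher covers the last tracked update;
        // while a thread waits here, the reopen thread switches to its
        // 0.1 s polling interval.
        indexSearcherReopenThread.waitForGeneration(reopenToken);
    } catch (InterruptedException intEx) {
        Thread.currentThread().interrupt();
        log.warn("Interrupted while waiting for the index generation", intEx);
    }
    IndexSearcher searcher = null;
    try {
        searcher = indexSearcherReferenceManager.acquire();
        List<Document> results = new ArrayList<Document>(maxHits);
        for (ScoreDoc hit : searcher.search(query, maxHits).scoreDocs) {
            results.add(searcher.doc(hit.doc));
        }
        return results;
    } catch (IOException ioEx) {
        log.error("Error while searching: " + query, ioEx);
        return Collections.emptyList();
    } finally {
        if (searcher != null) {
            try {
                indexSearcherReferenceManager.release(searcher);
            } catch (IOException ignored) {
                // nothing useful to do if the release fails
            }
        }
    }
}
```

As the comment in the constructor notes, waitForGeneration() should be the exception rather than the rule if this is to scale; searches that can tolerate slightly stale results can skip it and simply acquire()/release() the current searcher.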
escaping characters
Hi everyone, I'm trying to escape special characters and it doesn't seem to be working. If I do a search like resume_text: (LS\/MS) it searches for LS AND MS instead of LS/MS. How would I escape the slash so it searches for LS/MS? Thanks
Re: escaping characters
Take a look at the admin/analysis page for the field in question. The next bit of critical information is adding debug=query to the URL. The former will tell you what happens to the input stream at query and index time; the latter will tell you how the query got through the query parsing process. My guess is that you have WordDelimiterFilterFactory in your analysis chain and that's breaking things up. Best, Erick On Mon, Aug 11, 2014 at 8:54 AM, Chris Salem csa...@mainsequence.net wrote: Hi everyone, I'm trying to escape special characters and it doesn't seem to be working. If I do a search like resume_text:(LS\/MS) it searches for LS AND MS instead of LS/MS. How would I escape the slash so it searches for LS/MS? Thanks
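If the index is plain Lucene rather than Solr (as it turns out below), the parser side of escaping can also be handled with QueryParser.escape(), which backslash-escapes every character the classic query parser treats as special, '/' included in 4.x. A minimal hypothetical sketch:

```java
import org.apache.lucene.queryparser.classic.QueryParser;

public class EscapeDemo {
    public static void main(String[] args) {
        // escape() prefixes each classic-parser metacharacter with a backslash,
        // so untrusted user input can be embedded in a query string safely.
        String escaped = QueryParser.escape("LS/MS");
        System.out.println("resume_text:(" + escaped + ")");
    }
}
```

Note that escaping only protects the query parser: if the field's analyzer splits on '/' (StandardAnalyzer and WordDelimiterFilter both do), the text still ends up as two tokens at analysis time, which is exactly what the analysis-page check above would reveal.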
RE: escaping characters
I'm not using Solr. Here's my code:

    FSDirectory fsd = FSDirectory.open(new File("C:\\indexes\\Lucene4"));
    IndexReader reader = DirectoryReader.open(fsd);
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_9, getStopWords());
    BooleanQuery.setMaxClauseCount(10);
    QueryParser qptemp = new QueryParser(Version.LUCENE_4_9, "resume_text", analyzer);
    qptemp.setAllowLeadingWildcard(true);
    qptemp.setDefaultOperator(QueryParser.AND_OPERATOR);
    Query querytemp = qptemp.parse("resume_text:(LS\\/MS)");
    System.out.println(querytemp.toString());
    TopFieldCollector tfcollector = TopFieldCollector.create(new Sort(), 20, false, true, false, true);
    ScoreDoc[] hits;
    searcher.search(querytemp, tfcollector);
    hits = tfcollector.topDocs().scoreDocs;
    long resultCount = tfcollector.getTotalHits();
    reader.close();

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, August 11, 2014 12:27 PM
To: java-user
Subject: Re: escaping characters
Re: Can't get case insensitive keyword analyzer to work
It does look like the lowercase is working. The following code

    Document theDoc = theIndexReader.document(0);
    System.out.println(theDoc.get("sn"));
    IndexableField theField = theDoc.getField("sn");
    TokenStream theTokenStream = theField.tokenStream(theAnalyzer);
    System.out.println(theTokenStream);

produces the following output:

    SN345-B21
    LowerCaseFilter@5f70bea5 term=sn345-b21,bytes=[73 6e 33 34 35 2d 62 32 31],startOffset=0,endOffset=9

But the search does not work. Anything obvious popping out for anyone?

On Sat, Aug 9, 2014 at 4:39 PM, Milind mili...@gmail.com wrote:

I looked at a couple of examples on how to get the keyword analyzer to be case insensitive, but I think I missed something, since it's not working for me. In the code below, I'm indexing text in upper case and searching in lower case, but I get back no hits. Do I need to do something more while indexing?

    private static class LowerCaseKeywordAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String theFieldName, Reader theReader) {
            KeywordTokenizer theTokenizer = new KeywordTokenizer(theReader);
            return new TokenStreamComponents(theTokenizer,
                    new LowerCaseFilter(Version.LUCENE_46, theTokenizer));
        }
    }

    private static void addDocment(IndexWriter theWriter, String theFieldName,
            String theValue, boolean storeTokenized) throws Exception {
        Document theDocument = new Document();
        FieldType theFieldType = new FieldType();
        theFieldType.setStored(true);
        theFieldType.setIndexed(true);
        theFieldType.setTokenized(storeTokenized);
        theDocument.add(new Field(theFieldName, theValue, theFieldType));
        theWriter.addDocument(theDocument);
    }

    static void testLowerCaseKeywordAnalyzer() throws Exception {
        Version theVersion = Version.LUCENE_46;
        Directory theIndex = new RAMDirectory();
        Analyzer theAnalyzer = new LowerCaseKeywordAnalyzer();
        IndexWriterConfig theConfig = new IndexWriterConfig(theVersion, theAnalyzer);
        IndexWriter theWriter = new IndexWriter(theIndex, theConfig);
        addDocment(theWriter, "sn", "SN345-B21", false);
        addDocment(theWriter, "sn", "SN445-B21", false);
        theWriter.close();

        QueryParser theParser = new QueryParser(theVersion, "sn", theAnalyzer);
        Query theQuery = theParser.parse("sn:sn345-b21");
        IndexReader theIndexReader = DirectoryReader.open(theIndex);
        IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
        TopScoreDocCollector theCollector = TopScoreDocCollector.create(10, true);
        theSearcher.search(theQuery, theCollector);
        ScoreDoc[] theHits = theCollector.topDocs().scoreDocs;
        System.out.println("Number of results found: " + theHits.length);
    }

--
Regards
Milind
Problem of calling indexWriterConfig.clone()
I tried to create a clone of an IndexWriterConfig with indexWriterConfig.clone() for re-creating a new IndexWriter, but then I got this very annoying IllegalStateException: "clone this object before it is used". Why does this exception happen, and how can I get around it? Thanks!
Re: escaping characters
You need to manually enable automatic generation of phrase queries - it defaults to disabled, which simply treats the sub-terms as individual terms subject to the default operator. See:
http://lucene.apache.org/core/4_9_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)

-- Jack Krupansky

-----Original Message-----
From: Chris Salem
Sent: Monday, August 11, 2014 1:03 PM
To: java-user@lucene.apache.org
Subject: RE: escaping characters
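In code, Jack's suggestion is a single parser flag. When one whitespace-delimited source term analyzes into multiple tokens (as LS\/MS does under StandardAnalyzer), the parser emits a PhraseQuery instead of two terms joined by the default operator. A sketch against the Lucene 4.9 classic parser; the field name and analyzer are assumed from Chris's code:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class PhraseQueryDemo {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_4_9);
        QueryParser qp = new QueryParser(Version.LUCENE_4_9, "resume_text", analyzer);
        qp.setDefaultOperator(QueryParser.AND_OPERATOR);
        // Off by default: turn a multi-token source term into a phrase query
        // ("ls ms") rather than ANDed term queries (ls AND ms).
        qp.setAutoGeneratePhraseQueries(true);
        Query q = qp.parse("resume_text:(LS\\/MS)");
        System.out.println(q);
    }
}
```

With the flag enabled, the slash in the input no longer matters for matching order: the two analyzed tokens must appear adjacent, which is usually what "LS/MS" means to the user.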
Re: Can't get case insensitive keyword analyzer to work
I found the problem, but it makes no sense to me. If I set the field type to be tokenized, it works; if I set it to not be tokenized, the search fails. i.e. I have to pass in true to the method:

    theFieldType.setTokenized(storeTokenized);

I want the field to be stored un-tokenized, but it seems that I don't need that. The LowerCaseKeywordAnalyzer works if the field is tokenized, but not if it's un-tokenized! How can that be?

On Mon, Aug 11, 2014 at 1:49 PM, Milind mili...@gmail.com wrote:

It does look like the lowercase is working, but the search does not work. Anything obvious popping out for anyone?

--
Regards
Milind
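What Milind is hitting: with setTokenized(false) the field is indexed as a single term containing the raw, un-analyzed value, so the uppercase SN345-B21 goes into the index untouched, while the query parser still runs the analyzer and lowercases the query to sn345-b21, and the two never match. If the field really should stay un-tokenized, one workaround is to normalize the value yourself before indexing. A sketch (hypothetical helper, not from the thread), using StringField from Lucene 4.x:

```java
import java.io.IOException;
import java.util.Locale;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;

public final class KeywordIndexing {
    // Index the lowercased value as one un-analyzed term, matching what
    // LowerCaseKeywordAnalyzer produces at query time; keep the original
    // casing in a separate stored-only field for display.
    public static void addKeywordDocument(IndexWriter writer, String fieldName,
            String value) throws IOException {
        Document doc = new Document();
        doc.add(new StringField(fieldName, value.toLowerCase(Locale.ROOT), Field.Store.NO));
        doc.add(new StoredField(fieldName + "_display", value));
        writer.addDocument(doc);
    }
}
```

The alternative, as discovered above, is to leave the field tokenized and let the KeywordTokenizer + LowerCaseFilter chain lowercase at index time; KeywordTokenizer already keeps the whole value as a single token, so "tokenized" does not mean the value gets split.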
Re: Problem of calling indexWriterConfig.clone()
Looks like you have to clone it prior to using it with any IndexWriter instances.

On Mon, Aug 11, 2014 at 2:49 PM, Sheng sheng...@gmail.com wrote:

I tried to create a clone of an IndexWriterConfig with indexWriterConfig.clone() for re-creating a new IndexWriter, but then I got this very annoying IllegalStateException: "clone this object before it is used". Why does this exception happen, and how can I get around it? Thanks!
Re: Problem of calling indexWriterConfig.clone()
So indexWriterConfig.clone() failed at this step (LiveIndexWriterConfig.java, lucene-core 4.7.0):

    clone.indexerThreadPool = indexerThreadPool.clone();

which then failed at this step in the indexerThreadPool (DocumentsWriterPerThreadPool.java):

    if (numThreadStatesActive != 0) {
        throw new IllegalStateException("clone this object before it is used!");
    }

There is a comment right above this: // We should only be cloned before being used

Does this mean that whenever the indexWriter gets called for commit/prepareCommit, etc., the corresponding indexWriterConfig object cannot be cloned at all?

On Mon, Aug 11, 2014 at 9:52 PM, Vitaly Funstein vfunst...@gmail.com wrote:

Looks like you have to clone it prior to using it with any IndexWriter instances.
Re: Problem of calling indexWriterConfig.clone()
I only have the source to 4.6.1, but if you look at the constructor of IndexWriter there, it looks like this:

    public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException {
        conf.setIndexWriter(this); // prevent reuse by other instances

The setter throws an exception if the configuration object has already been used with another instance of IndexWriter. Therefore, it should be cloned before being used in the constructor of IndexWriter.

On Mon, Aug 11, 2014 at 7:12 PM, Sheng sheng...@gmail.com wrote:

So indexWriterConfig.clone() failed at clone.indexerThreadPool = indexerThreadPool.clone(), which then failed on the if (numThreadStatesActive != 0) check in the indexerThreadPool. Does this mean that whenever the indexWriter gets called for commit/prepareCommit, etc., the corresponding indexWriterConfig object cannot be cloned at all?
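Concretely, that means taking the clone while the config is still unused. If a second writer might be needed later, clone first and keep the pristine copy aside. A sketch (Lucene 4.x API; variable names are hypothetical):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class CloneBeforeUse {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig template = new IndexWriterConfig(
                Version.LUCENE_47, new StandardAnalyzer(Version.LUCENE_47));

        // Clone while the config is still unused; once a writer has used it,
        // clone() throws "clone this object before it is used".
        IndexWriterConfig spare = template.clone();

        IndexWriter writer = new IndexWriter(dir, template);
        writer.close();

        // The untouched clone can back a second writer later.
        IndexWriter writer2 = new IndexWriter(dir, spare);
        writer2.close();
    }
}
```

The key point is that the clone must be taken before the original is bound to any writer; cloning on demand at re-open time is exactly the pattern that the exception forbids.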
Re: Problem of calling indexWriterConfig.clone()
From the source code of DocumentsWriterPerThreadPool, the variable numThreadStatesActive seems to be always increasing, which explains why the numThreadStatesActive == 0 check fails before cloning this object. So what should be the most appropriate way of re-opening an IndexWriter if what you have are the index directory plus the IndexWriterConfig that the closed IndexWriter had been using?

BTW - I am reasonably sure calling indexWriterConfig.clone() in the middle of indexing documents used to work for my code (same Lucene 4.7). It is only since I recently had to do faceted indexing as well that this problem started to emerge. Is it related?

On Mon, Aug 11, 2014 at 11:31 PM, Vitaly Funstein vfunst...@gmail.com wrote:

I only have the source to 4.6.1, but if you look at the constructor of IndexWriter there, conf.setIndexWriter(this) prevents reuse by other instances. Therefore, the config should be cloned before being used in the constructor of IndexWriter.
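Since IndexWriterConfig is cheap to construct, the simplest way around the restriction, assuming the analyzer and any non-default settings are still at hand, is to rebuild an equivalent config instead of cloning the used one. A sketch (hypothetical helper; it does not specifically address the faceting question):

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

public final class WriterFactory {
    // Re-open a writer on an existing index without cloning a used config:
    // build a fresh, equivalent IndexWriterConfig each time instead.
    public static IndexWriter reopenWriter(Directory dir, Analyzer analyzer)
            throws IOException {
        IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_47, analyzer);
        conf.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
        // ... re-apply any other non-default settings here ...
        return new IndexWriter(dir, conf);
    }
}
```

A config object is per-writer state, not shared configuration, so treating it as a throwaway built from your real settings source avoids the clone-before-use trap entirely.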