Re: KeywordTokenizer isn't reusable
Yes, please do! Thanks.

Mike

TAKAHASHI hideaki wrote:

Hi all,

I found that KeywordAnalyzer/KeywordTokenizer on trunk has a problem. Both hold state (tokenStreams in Analyzer and the done flag in KeywordTokenizer), but that state is never reset. As a result, KeywordAnalyzer cannot analyze a field more than once when the token stream is reused. I have already created a patch for this problem. Can I send this patch?

Thanks,
Hideaki
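A minimal sketch of the symptom Hideaki describes, written against the 2.3-era trunk API (Analyzer.reusableTokenStream and the Token-returning next()); the field name and values are invented, and the behaviour noted in the comments assumes the pre-patch code where KeywordTokenizer's done flag is never cleared:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class KeywordReuseSymptom {
  public static void main(String[] args) throws IOException {
    KeywordAnalyzer analyzer = new KeywordAnalyzer();

    // First use of the reusable stream: the single keyword token comes back.
    TokenStream first = analyzer.reusableTokenStream("partnum", new StringReader("Q36"));
    Token t = first.next();
    System.out.println("first call:  " + (t == null ? "no token" : t.termText()));

    // Second use returns the cached KeywordTokenizer, but its 'done' flag was
    // never reset, so no token comes back for the new input (the reported bug).
    TokenStream second = analyzer.reusableTokenStream("partnum", new StringReader("Q37"));
    t = second.next();
    System.out.println("second call: " + (t == null ? "no token" : t.termText()));
  }
}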
[jira] Commented: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552434 ]

Terry Yang commented on LUCENE-588:

I wrote my first patch for this issue. If QueryParser knows the query is a wildcard query, it passes the original query string straight to WildcardQuery, which knows exactly which characters are wildcards and which are not. I copied part of the discardEscapeChar method from QueryParser, because discardEscapeChar throws ParseException, and propagating that would require larger changes to WildcardQuery. I am looking for help/ideas on a better way to handle this exception.

Escaped wildcard character in wildcard term not handled correctly

Key: LUCENE-588
URL: https://issues.apache.org/jira/browse/LUCENE-588
Project: Lucene - Java
Issue Type: Bug
Components: QueryParser
Affects Versions: 2.0.0
Environment: Windows XP SP2
Reporter: Sunil Kamath

If an escaped wildcard character is specified in a wildcard query, it is treated as a wildcard instead of a literal. E.g., t\??t is converted by the QueryParser to t??t - the escape character is discarded.
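A quick, hedged reproduction of the reported behaviour (separate from Terry's patch); the field name "contents" and the WhitespaceAnalyzer are arbitrary choices:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class EscapedWildcardDemo {
  public static void main(String[] args) throws Exception {
    QueryParser parser = new QueryParser("contents", new WhitespaceAnalyzer());

    // The first '?' is escaped and should match a literal '?'; only the
    // second '?' should act as a single-character wildcard.
    Query q = parser.parse("t\\??t");

    // With the behaviour described in LUCENE-588 this prints contents:t??t,
    // i.e. the escape has been discarded and both '?' act as wildcards.
    System.out.println(q);
  }
}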
[jira] Updated: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Terry Yang updated LUCENE-588:

Attachment: LUCENE-588.patch
Re: Background Merges
Not good! It's almost certainly a bug in Lucene, I think, because Solr is just a consumer of Lucene's API, which shouldn't ever cause something like this. Apparently, while merging stored fields, SegmentMerger tried to read too far. Is this easily repeatable?

Mike

Grant Ingersoll wrote:

I am running Lucene trunk with Solr and am getting the exception below when I call Solr's optimize. I will see if I can isolate it to a test case, but thought I would throw it out there in case anyone sees anything obvious. In this case, I am adding documents sequentially and then at the end call Solr's optimize, which invokes Lucene's optimize. The problem could be in Solr, in that its notion of commit does not play nice with Lucene's new merge policy. However, I am posting here because the signs point to an issue in Lucene.

Cheers,
Grant

Exception in thread "Thread-20" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: read past EOF
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274)
Caused by: java.io.IOException: read past EOF
  at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
  at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
  at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
  at org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:280)
  at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:167)
  at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
  at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:300)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3050)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2792)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)

Dec 17, 2007 1:44:26 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: background merge hit exception: _3:C500 _4:C3 _l:C500 into _m [optimize]
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1744)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1684)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1664)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:544)
  at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
  at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:102)
  at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:113)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:121)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:875)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:283)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:234)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  ...
Caused by: java.io.IOException: read past EOF
  at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
  at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
  at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
  at org.apache.lucene.index.FieldsReader.addFieldForMerge(FieldsReader.java:280)
  at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:167)
  at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:659)
  at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:300)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3050)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2792)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)

Dec 17, 2007 1:44:26 PM org.apache.solr.core.SolrCore execute
INFO: [null] /update optimize=true&wt=xml&waitFlush=true&waitSearcher=true&version=2.2 0 1626

Dec 17, 2007 1:44:26 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: background merge hit exception: _3:C500 _4:C3 _l:C500 into _m [optimize]
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1744)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1684)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1664)
  at
Re: [Lucene-java Wiki] Update of PoweredBy by PietSchmidt
On Monday, 17 December 2007, Apache Wiki wrote:

+ * [http://frauen-kennenlernen.com/ Frauen kennenlernen] - Search engine using Lucene

I don't claim that this is spam, but more and more of the Wiki PoweredBy links look like someone just wants a link from the Lucene project, probably to boost their Google ranking. We cannot tell whether these people really use Lucene at all, or whether they use some blogging software which in turn uses Lucene (in that case it wouldn't make sense to link them from our page either).

My suggestion would be that we only accept links if people use Lucene directly (not via software that has a Lucene-based search anyway) and if they put a link to Lucene on their imprint/contact page or on the search result page. On the other hand, while the page above is harmless, I guess it's not necessarily something Apache Lucene needs to be associated with. Any suggestions?

Regards
Daniel

--
http://www.danielnaber.de
[jira] Created: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED
Big IndexWriter memory leak: when Field.Index.TOKENIZED
--------------------------------------------------------

Key: LUCENE-1091
URL: https://issues.apache.org/jira/browse/LUCENE-1091
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.2
Environment: Ubuntu Linux 7.10, 32-bit Java 1.6.0 build 1.6.0_03-b05 (default in Ubuntu 7.10), 1GB RAM
Reporter: Mirza Hadzic

This little program eats roughly 2MB of additional virtual RAM for every 1000 documents indexed, but only when Field.Index.TOKENIZED is used:

public Document getDoc() {
  Document document = new Document();
  document.add(new Field("foo", "foo bar", Field.Store.NO, Field.Index.TOKENIZED));
  return document;
}

public void run() throws IOException {
  IndexWriter writer = new IndexWriter(new File(jIndexFileName), new StandardAnalyzer(), true);
  for (int i = 0; i < 100; i++) {
    writer.addDocument(getDoc());
  }
}
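The attached TestOOM itself is not shown in this thread; as a rough, hedged idea of how such a report can be instrumented, the sketch below wraps a loop like the reporter's with Runtime used-memory stats. The index path, document count, and print interval are invented for illustration:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class MemoryWatch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(new File("/tmp/leak-test"),
                                         new StandardAnalyzer(), true);
    Runtime rt = Runtime.getRuntime();

    for (int i = 0; i < 100000; i++) {
      Document doc = new Document();
      doc.add(new Field("foo", "foo bar", Field.Store.NO, Field.Index.TOKENIZED));
      writer.addDocument(doc);

      // Print used heap roughly every 1000 documents, mirroring the
      // "2MB per 1000 documents" observation in the report.
      if (i % 1000 == 0) {
        long usedMB = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println(i + " docs, ~" + usedMB + " MB used");
      }
    }
    writer.close();
  }
}

Note that used heap as reported by the JVM is not the same as the virtual RAM figure quoted in the report, so the two numbers are only loosely comparable.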
Re: O/S Search Comparisons
I did hear back from the authors. Some of the issues were based on the value chosen for mergeFactor (10,000, I think), but there also seemed to be some questions about parsing the TREC collection. It was split out into individual files, as opposed to trying to stream in the documents like we do with Wikipedia, so I/O overhead may be an issue. At the time, 1.9.1 did not have much TREC support, so splitting files is probably the easiest way to do it. Their indexing code was based off the demo and some LIA reading. They thought they would try Lucene again when 2.3 comes out.

From our end, I think we need to improve the docs around mergeFactor. We generally just say bigger is better, but my understanding is that there is definitely a limit to this (100? Maybe 1000?), so we should probably suggest that in the docs. And, of course, I think the new contrib/benchmark has support for reading TREC (although I don't know if it handles streaming it), so I think it shouldn't be a problem this time around.

At any rate, I think we are for the most part doing the right things. Anyone have any thoughts or advice about an upper bound for mergeFactor?

Cheers,
Grant

On Dec 10, 2007, at 2:54 PM, Mike Klaas wrote:

On 8-Dec-07, at 10:04 PM, Doron Cohen wrote:

+1, I have been thinking about this too. Solr clearly demonstrates the benefits of this kind of approach, although even it doesn't make it seamless for users, in the sense that they still need to divvy up the docs on the app side. Would be nice if this layer also took care of searcher/reader refreshing and warming.

Solr has well-tested code that provides all this functionality and more (except for automatically spawning extra indexing threads, which I agree would be a useful addition). It does heavily depend on 1.5's java.util.concurrent package, though. Many people seem to like using Solr as an embedded library layer on top of Lucene to do it all in-process, as well.

-Mike

--
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: O/S Search Comparisons
For the data that I normally work with (short articles), I found that the sweet spot was around 80-120. I actually saw a slight decrease going above that... not sure if that held forever, though. That was testing on an earlier release (I think 2.1?).

However, if you want to test searching, it would seem that you are going to want to optimize the index. I have always found that whatever I save by changing the merge factor is paid back when I optimize. I have not scientifically tested this, but found it to be the case in every speed test I ran. This is an interesting thing to me for this test: do you test with a full optimize for indexing? If you don't, can you really test the search performance with the advantage of a full optimize? So, if you are going to optimize, why mess with the merge factor? It may still play a small role, but at best I think it's a pretty weak lever.

- Mark

Grant Ingersoll wrote:
[...]
Re: KeywordTokenizer isn't reusable
Hi,

Here is the patch for KeywordAnalyzer, KeywordTokenizer, and TestKeywordAnalyzer.

Thanks,
Hideaki

On Dec 17, 2007 6:49 PM, Michael McCandless wrote:
[...]

--
高橋 秀明

Index: src/test/org/apache/lucene/analysis/TestKeywordAnalyzer.java
===================================================================
--- src/test/org/apache/lucene/analysis/TestKeywordAnalyzer.java	(revision 605078)
+++ src/test/org/apache/lucene/analysis/TestKeywordAnalyzer.java	(working copy)
@@ -18,7 +18,10 @@
  */
 
 import org.apache.lucene.util.LuceneTestCase;
+import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.index.TermDocs;
 import org.apache.lucene.store.RAMDirectory;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.document.Field;
@@ -61,4 +64,22 @@
                  "+partnum:Q36 +space", query.toString("description"));
     assertEquals("doc found!", 1, hits.length());
   }
+
+  public void testMutipleDocument() throws Exception {
+    RAMDirectory dir = new RAMDirectory();
+    IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(), true);
+    Document doc = new Document();
+    doc.add(new Field("partnum", "Q36", Field.Store.YES, Field.Index.TOKENIZED));
+    writer.addDocument(doc);
+    doc = new Document();
+    doc.add(new Field("partnum", "Q37", Field.Store.YES, Field.Index.TOKENIZED));
+    writer.addDocument(doc);
+    writer.close();
+
+    IndexReader reader = IndexReader.open(dir);
+    TermDocs td = reader.termDocs(new Term("partnum", "Q36"));
+    assertTrue(td.next());
+    td = reader.termDocs(new Term("partnum", "Q37"));
+    assertTrue(td.next());
+  }
 }
Index: src/java/org/apache/lucene/analysis/KeywordTokenizer.java
===================================================================
--- src/java/org/apache/lucene/analysis/KeywordTokenizer.java	(revision 605078)
+++ src/java/org/apache/lucene/analysis/KeywordTokenizer.java	(working copy)
@@ -55,4 +55,9 @@
     }
     return null;
   }
+
+  public void reset(Reader input) throws IOException {
+    super.reset(input);
+    this.done = false;
+  }
 }
Index: src/java/org/apache/lucene/analysis/KeywordAnalyzer.java
===================================================================
--- src/java/org/apache/lucene/analysis/KeywordAnalyzer.java	(revision 605078)
+++ src/java/org/apache/lucene/analysis/KeywordAnalyzer.java	(working copy)
@@ -17,6 +17,7 @@
  * limitations under the License.
  */
 
+import java.io.IOException;
 import java.io.Reader;
 
 /**
@@ -29,12 +30,13 @@
     return new KeywordTokenizer(reader);
   }
   public TokenStream reusableTokenStream(String fieldName,
-                                         final Reader reader) {
+                                         final Reader reader) throws IOException {
     Tokenizer tokenizer = (Tokenizer) getPreviousTokenStream();
     if (tokenizer == null) {
      tokenizer = new KeywordTokenizer(reader);
      setPreviousTokenStream(tokenizer);
-    }
+    } else
+      tokenizer.reset(reader);
    return tokenizer;
   }
 }
Re: O/S Search Comparisons
On Dec 18, 2007 2:38 AM, Mark Miller wrote:
[...]
> So, if you are going to optimize, why mess with the merge factor? It may still play a small role, but at best I think it's a pretty weak lever.

I had a similar experience - I set mergeFactor to ~maxint and optimized at the end, and it felt like it was the same (never measured, though). In fact, with the new concurrent merges, I think it should be faster to merge on the fly? (One comment - it is important to set mergeFactor back to a reasonable number before the final optimize, otherwise you hit an OutOfMemoryError due to so many segments being merged at once.)

> Grant Ingersoll wrote:
> [...]
> And, of course, I think the new contrib/benchmark has support for reading TREC (although I don't know if it handles streaming it), so I think it shouldn't be a problem this time around.

Yes, it does streaming - TREC compressed files are read with GZIPInputStream on demand. The next doc's text is read/parsed only when the indexer requests it, and the indexable document is created; no doc files are created on disk.

> At any rate, I think we are for the most part doing the right things. Anyone have any thoughts or advice about an upper bound for mergeFactor?
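To make the mergeFactor advice concrete, here is a hedged sketch of the pattern Doron describes - a large mergeFactor while documents stream in, then a modest one before the final optimize - written against the 2.3-era IndexWriter API. The index path and the values 1000 and 10 are illustrative assumptions, not recommendations from the thread:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class BulkIndexSketch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(new File("/tmp/bulk-index"),
                                         new StandardAnalyzer(), true);

    // A large mergeFactor defers merging while documents are added.
    writer.setMergeFactor(1000);
    // ... addDocument() calls go here ...

    // Drop mergeFactor back to something reasonable before optimizing, so the
    // final merge does not try to open a huge number of segments at once.
    writer.setMergeFactor(10);
    writer.optimize();
    writer.close();
  }
}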
[jira] Updated: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1091:

Attachment: TestOOM.java

Attached TestOOM, not reproducing the problem on XP, JRE 1.5.
[jira] Issue Comment Edited: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552634 ]

doronc edited comment on LUCENE-1091 at 12/17/07 10:09 PM:

I was not able to recreate this. Can you run the attached TestOOM and see how much memory is consumed and what used-memory stats get printed?

was (Author: doronc):

I was not able to recreate this. Can you run the attached TestOOM (it expects a single indexDir argument on your system) and see how much memory is consumed and what used-memory stats get printed?
[jira] Updated: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-588:

Lucene Fields: [Patch Available]
Priority: Minor
[jira] Assigned: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch reassigned LUCENE-588:

Assignee: Michael Busch
Re: TeeTokenFilter performance testing
On 17 Dec 2007, at 05:40, Grant Ingersoll wrote:

> ... a somewhat common case whereby two or more fields share a fair number of common analysis steps.

Right.

> For the smaller token counts, any performance difference is negligible. However, even at 500 tokens, one starts to see a difference. The first thing to note is that TeeTokenFilter (TTF) is much _slower_ in the case that all tokens are siphoned off (X = 1). I believe the reason is the cost of Token.clone().

I might be missing something here, but why do you clone? I usually fill a List<Token> with the same instances and then only clone the tokens I need to update. The same Token instances are used in multiple fields and queries at the same time, and I have never had any problems with that. Should I be expecting some?

--
karl
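A hedged sketch of the List<Token> approach Karl describes, against the 2.3-era Token-returning TokenStream API; the class name is invented. The idea is simply to replay pre-built Token instances without cloning, constructing one such stream per field over the same list:

import java.util.Iterator;
import java.util.List;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

/**
 * Replays a pre-built list of Tokens without cloning them, so the same
 * Token instances can be shared by several fields.
 */
public class CachedTokenStream extends TokenStream {
  private final Iterator tokens;

  public CachedTokenStream(List tokens) {
    this.tokens = tokens.iterator();
  }

  public Token next() {
    // Hand out the original Token instances; callers that need to modify a
    // token would clone just that one, as Karl suggests.
    return tokens.hasNext() ? (Token) tokens.next() : null;
  }
}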