[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]
[ https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854853#action_12854853 ] Toke Eskildsen commented on LUCENE-2380:

Working on LUCENE-2369 I essentially had to re-implement the FieldCache because of the hardwiring of arrays. Switching to accessor methods seems like the right direction to go.

Add FieldCache.getTermBytes, to load term data as byte[]
Key: LUCENE-2380
URL: https://issues.apache.org/jira/browse/LUCENE-2380
Project: Lucene - Java
Issue Type: Improvement
Reporter: Michael McCandless
Fix For: 3.1

With flex, a term is now an opaque byte[] (typically a utf8-encoded unicode string, but not necessarily), so we need to push this up the search stack. FieldCache now has getStrings and getStringIndex; we need corresponding methods to load terms as native byte[], since in general they may not be representable as String. This should be quite a bit more RAM efficient too, for US-ASCII content, since each character would then use 1 byte not 2.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
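The RAM argument is easy to see with a quick calculation (an illustrative stdlib-only sketch, not Lucene code): for pure-ASCII terms, a UTF-8 byte[] needs one byte per character, while Java's String backing char[] needs two.

```java
import java.nio.charset.StandardCharsets;

public class TermBytesDemo {
    // Bytes needed to hold the term as UTF-8 (what a byte[]-based
    // FieldCache entry would store).
    static int utf8Size(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed by String's char[] backing: 2 per UTF-16 code unit.
    static int char16Size(String term) {
        return term.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(utf8Size("lucene"));   // 6
        System.out.println(char16Size("lucene")); // 12
    }
}
```

For ASCII-heavy indexes this is the 2x saving mentioned in the issue; for non-Latin text UTF-8 can use up to 3 bytes per BMP character, so the saving is content-dependent.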
Re: Getting fsync out of the loop
On Wed, Apr 7, 2010 at 3:27 PM, Earwin Burrfoot ear...@gmail.com wrote:

No, this doesn't make sense. The OS detects a disk full on accepting the write into the write cache, not [later] on flushing the write cache to disk. If the OS accepts the write, then the disk is not full (ie flushing the cache will succeed, unless some other not-disk-full problem happens). Hmmm, at least, normally. What OS/IO system were you on when you saw corruption due to disk full when fsync is disabled? I'm still skeptical that disk full even with fsync disabled can lead to corruption; I'd like to see some concrete proof :)

Linux 2.6.30-1-amd64, ext3, simple scsi drive

Hm. Linux should detect disk full on the initial write.

I checked with our resident DB brainiac, he says such things are possible. Okay, I'm not 100% sure this is the cause of my corruptions. It just happened that when the index got corrupted, disk space was also used up - several times. I had that silent-fail-to-write theory and checked it with some knowledgeable people. Even if they are right, I can be mistaken and the root cause is different.

OK... if you get a more concrete case where disk full causes corruption when you disable fsync, please post details back. From what I understand this should never happen.

You're mixing up terminology a bit here -- you can't hold on to the latest commit then switch to it. A commit (as sent to the deletion policy) means a *real* commit (ie, IW.commit or IW.close was called). So I think your BG thread would simply be calling IW.commit every N seconds?

Under "hold on to" I meant - keep from being deleted, like SnapshotDP does.

But IW doesn't let you hold on to checkpoints... only to commits. Ie SnapshotDP will only see actual commit/close calls, not intermediate checkpoints like a random segment merge completing, a flush happening, etc. Or... maybe you would in fact call commit frequently from the main threads (but with fsync disabled), and then your DP holds onto these fake commits, periodically picking one of them to do the real fsync-ing?

I'm just playing around with a stupid idea. I'd like to have an NRT look-alike without binding readers and writers. :)

I see... well, binding durability and visibility will always be costly. This is why Lucene decouples them (by making NRT readers available).

My experiments do the same, essentially. But after I understood that to perform deletions IW has to load term indexes anyway, I'm almost ready to give up and go for the intertwined IW/IR mess :)

Hey, if you really think it's a mess, post a patch that cleans it up :)

BTW, if you know your OS/IO system always persists cached writes w/in N seconds, a safe way to avoid fsync is to use a by-time expiring deletion policy. Ie, a commit stays alive as long as its age is less than X... DP's unit test has such a policy. But you better really know for sure that the OS/IO system guarantees that :)

Yeah. I thought of it, but it is even more shady :)

I agree. And even if you know you're on Linux, and that your pdflush flushes after X seconds, you still have the IO system to contend with. Best to stick with fsync, commit only for safety as needed by the app, and use NRT for fast visibility.

Mike
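The by-time expiring policy mentioned above could be sketched like this (a stdlib-only illustration with a hypothetical class name; the real thing would implement Lucene's IndexDeletionPolicy and inspect IndexCommit objects rather than raw timestamps):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: keep every commit whose age is below a threshold, so the OS
// has had time to flush its write cache before the commit's files
// become deletable. The newest commit is always retained.
public class ExpireByAgePolicy {
    private final long maxAgeMillis;

    public ExpireByAgePolicy(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Returns the commit timestamps that may be deleted at time 'now'. */
    public List<Long> deletable(List<Long> commitTimes, long now) {
        long newest = Long.MIN_VALUE;
        for (long t : commitTimes) newest = Math.max(newest, t);

        List<Long> dead = new ArrayList<>();
        for (long t : commitTimes) {
            // Never delete the newest commit, whatever its age.
            if (t != newest && now - t > maxAgeMillis) dead.add(t);
        }
        return dead;
    }
}
```

As the thread concludes, this is only safe if the OS/IO stack really does persist cached writes within the chosen window, which is hard to guarantee in practice.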
[jira] Commented: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space
[ https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854876#action_12854876 ] Michael McCandless commented on LUCENE-2376:

OK, but I suspect the root cause is the same here -- your index seems to have a truly massive number of fields. Can you post the CheckIndex output?

IW re-uses per-field objects internally, so that many docs with the same field can be indexed more efficiently. However, when IW sweeps to free up RAM, if it notices an allocated field object hasn't been used recently, because that field name has not occurred in recently added docs, it frees up that memory and logs that "purge field". So from this output I can see you have at least 43K unique field names. If you have not disabled norms on these fields you'll certainly run out of memory. Even if you disable norms, Lucene is in general not optimized for a tremendous number of unique fields and you'll likely hit other issues.

java.lang.OutOfMemoryError: Java heap space
Key: LUCENE-2376
URL: https://issues.apache.org/jira/browse/LUCENE-2376
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.9.1
Environment: Windows
Reporter: Shivender Devarakonda
Attachments: InfoStreamOutput.txt

I see an OutOfMemory error in our product and it is happening when we have some data objects on which we built the index.
I see the following OutOfMemory error; this is happening after we call IndexWriter.optimize():

4/06/10 02:03:42.160 PM PDT [ERROR] [Lucene Merge Thread #12] In thread Lucene Merge Thread #12 and the message is org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
4/06/10 02:03:42.207 PM PDT [VERBOSE] [Lucene Merge Thread #12] [Manager] Uncaught Exception in thread Lucene Merge Thread #12
org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.resize(HashMap.java:462)
    at java.util.HashMap.addEntry(HashMap.java:755)
    at java.util.HashMap.put(HashMap.java:385)
    at org.apache.lucene.index.FieldInfos.addInternal(FieldInfos.java:256)
    at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:366)
    at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:71)
    at org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:116)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
4/06/10 02:03:42.895 PM PDT [ERROR] this writer hit an OutOfMemoryError; cannot complete optimize
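A back-of-the-envelope sketch of why that many fields exhaust the heap: Lucene 2.x keeps one norm byte per document for every field with norms enabled, so memory grows as fields times maxDoc. The 43K figure is from the infoStream output above; the document count here is a hypothetical example.

```java
public class NormsRamEstimate {
    // Lucene 2.x allocates one norm byte per document per field with
    // norms enabled, so norms RAM grows as uniqueFields * maxDoc.
    static long normsBytes(long uniqueFields, long maxDoc) {
        return uniqueFields * maxDoc;
    }

    public static void main(String[] args) {
        // 43K unique fields (from the infoStream output) against a
        // hypothetical 1M-doc index: roughly 40 GB of norms alone.
        long bytes = normsBytes(43_000, 1_000_000);
        System.out.println(bytes + " bytes");
    }
}
```

Even a much smaller index hits trouble: at 100K docs the same field count still needs several gigabytes just for norms, before any terms or postings are loaded.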
Move NoDeletionPolicy to core
Hi

I've noticed benchmark has a NoDeletionPolicy class and I was wondering if we can move it to core. I might want to use it for the parallel index stuff, but I think it'll also fit nicely in core, together with the other No* classes. In addition, this class should be made a singleton. If moving to core is acceptable, do you think any bw policy needs to be enforced (such as deprecating the one in benchmark and referencing the one in core)? I'll also want to change the package name from o.a.l.benchmark.utils to o.a.l.index, where the other IDPs are. Simple move and change (and update to the benchmark algs which use it).

Shai
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854882#action_12854882 ] Uwe Schindler commented on LUCENE-2074: --- As requested on the mailing list, I will look into resetting the zzBuffer on Tokenizer.reset(Reader).

Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
Key: LUCENE-2074
URL: https://issues.apache.org/jira/browse/LUCENE-2074
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 3.1
Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch

The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Lucene 3.0 we switched to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves differently for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion.
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854885#action_12854885 ] Shai Erera commented on LUCENE-2074: Uwe, must this be coupled with that issue? This one has been waiting for a long time (why? for the JFlex 1.5 release?) and protecting against a huge buffer allocation can be a real quick and tiny fix. And this one also focuses on getting Unicode 5 to work, which is unrelated to the buffer size. But the buffer size is not a critical issue either that we need to move fast with it ... so it's your call. Just thought they are two unrelated problems.
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854886#action_12854886 ] Uwe Schindler commented on LUCENE-2074: --- I plan to commit this soon! So any patch will get outdated; that's why I want to fix this here. And as this patch removes direct access from the Tokenizer to the lexer (as it is only accessible through an interface now), we have to change the jflex file to do it correctly.
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854887#action_12854887 ] Shai Erera commented on LUCENE-2074: bq. I plan to commit this soon! That's great news! BTW - what are you going to do w/ the JFlex 1.5 binary? Are you going to check it in somewhere? Because it hasn't been released last I checked. I'm asking for general knowledge, because I know the scripts are downloading it, or rely on it to exist somewhere. In that case, then yes, let's fix it here.
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854890#action_12854890 ] Uwe Schindler commented on LUCENE-2074: --- You don't need the JFlex binaries in general, only if you regenerate the source files (using ant jflex). And it's easy to generate: check out and run mvn install.
Re: Move NoDeletionPolicy to core
+1 I don't think bw needs to be kept -- contrib/benchmark is allowed to change.

Mike

On Thu, Apr 8, 2010 at 5:44 AM, Shai Erera ser...@gmail.com wrote:
> Hi
> I've noticed benchmark has a NoDeletionPolicy class and I was wondering if we can move it to core. I might want to use it for the parallel index stuff, but I think it'll also fit nicely in core, together with the other No* classes. In addition, this class should be made a singleton. If moving to core is acceptable, do you think any bw policy needs to be enforced (such as deprecating the one in benchmark and referencing the one in core)? I'll also want to change the package name from o.a.l.benchmark.utils to o.a.l.index, where the other IDPs are. Simple move and change (and update to the benchmark algs which use it).
> Shai
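A minimal sketch of what the proposed singleton could look like (stdlib-only and hypothetical; the actual class would implement Lucene's IndexDeletionPolicy, whose onInit/onCommit receive the commit points):

```java
import java.util.List;

// Sketch of a singleton "never delete" policy: a no-op that leaves
// every commit point alone, matching the other No* classes' style.
public final class NoDeletionPolicy {
    public static final NoDeletionPolicy INSTANCE = new NoDeletionPolicy();

    private NoDeletionPolicy() {} // singleton: no outside instantiation

    // Stand-in for IndexDeletionPolicy.onCommit: doing nothing means
    // no commit is ever flagged for deletion.
    public <T> List<T> onCommit(List<T> commits) {
        return commits;
    }
}
```

Making it a singleton fits because the class is stateless; callers would share NoDeletionPolicy.INSTANCE rather than allocating their own copies.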
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch

Here's a new patch, with the zzBuffer reset to the default implemented in a separate reset(Reader) method. As yyReset is generated as final, I had to change the name. Before applying, run:

{noformat}
svn copy StandardTokenizerImpl.* to StandardTokenizerImplOrig.*
svn move StandardTokenizerImpl.* to StandardTokenizerImpl31.*
{noformat}

I will commit this in a day or two!
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch

Also updated the error message about missing JFlex when calling ant jflex to regenerate the lexers. The message now contains instructions for downloading and building JFlex. Also added a CHANGES.txt entry.
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854899#action_12854899 ] Mark Miller commented on LUCENE-2074: - {quote}Just thought they are two unrelated problems.{quote} Agreed. Whether it's fixed as part of this commit or not, it really deserves its own issue anyway, for changes and tracking. It has nothing to do with this issue other than convenience.
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: (was: LUCENE-2074.patch)
[jira] Created: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
Key: LUCENE-2384
URL: https://issues.apache.org/jira/browse/LUCENE-2384
Project: Lucene - Java
Issue Type: Sub-task
Components: Analysis
Affects Versions: 3.0.1
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 3.1

When indexing large documents, the lexer buffer may stay large forever. This sub-issue resets the lexer buffer back to the default on reset(Reader). The work is done on the enclosing issue.
[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854900#action_12854900 ] Uwe Schindler commented on LUCENE-2074: --- Created sub-issue: LUCENE-2384
[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854901#action_12854901 ] Ruben Laguna commented on LUCENE-2384: -- The mailing list discussion that originated this is [1]

[1] http://lucene.markmail.org/thread/ndmcgffg2mnwjo47
[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854902#action_12854902 ] Robert Muir commented on LUCENE-2384: - If tokenizers like StandardTokenizer just end up reading things into RAM anyway, we should remove Reader from the Tokenizer interface. Supporting a Reader instead of simply tokenizing the entire doc causes our tokenizers to be very, very complex (see CharTokenizer). It would be nice to remove this complexity, if the objective doesn't really work anyway.
[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854903#action_12854903 ] Uwe Schindler commented on LUCENE-2384: --- For JFlex this does not help, as the JFlex-generated code always needs a Reader. This case is special: the lexer does not need to load the whole document into the buffer, it just sometimes needs a large look-forward/look-backward buffer.
[jira] Updated: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruben Laguna updated LUCENE-2384: - Attachment: reset.diff Patch to reset the zzBuffer when the input is reset. The code is taken from https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com so I can't really grant a license to use it, but I think the author released it into the public domain by posting it to the mailing list.
[jira] Issue Comment Edited: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854905#action_12854905 ] Ruben Laguna edited comment on LUCENE-2384 at 4/8/10 11:24 AM: --- Patch to reset the zzBuffer when the input is reset. The code is taken from https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com so I can't really grant a license to use it, but I think the author released it into the public domain by posting it to the mailing list. I tested it and it seems to work for me. I'm including it here in case somebody wants to apply the patch directly to 3.0.1 (although it's better to wait for 3.1).
[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854906#action_12854906 ] Robert Muir commented on LUCENE-2384: - bq. For JFlex this does not help as the Jflex-generated code always needs a Reader. This can be fixed. Currently all I/O in all tokenizers is broken and buggy, and does not correctly handle special cases around their 'buffering'. The only one that is correct is CharTokenizer, but at what cost? It has so much complexity because of this Reader issue. We should stop pretending we can really stream docs with Reader. We should stop pretending 8GB documents or something exist, where we can't just analyze the whole doc at once and make things simple. And then we can fix the Lucene tokenizers to be correct.
[jira] Commented: (LUCENE-2384) Reset zzBuffer in StandardTokenizerImpl* when lexer is reset.
[ https://issues.apache.org/jira/browse/LUCENE-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854908#action_12854908 ] Uwe Schindler commented on LUCENE-2384: --- {quote} Patch to reset the zzBuffer when the input is reset. The code is taken from https://sourceforge.net/mailarchive/message.php?msg_id=444070.38422...@web38901.mail.mud.yahoo.com so I can't really grant a license to use it, but I think the author released it into the public domain by posting it to the mailing list. I tested it and it seems to work for me. I'm including it here in case somebody wants to apply the patch directly to 3.0.1 (although it's better to wait for 3.1). {quote} Your fix adds additional complexity. Just reset the buffer back to the default ZZ_BUFFERSIZE on reset, if it has grown. Your patch always reallocates a new buffer. Use this: {code} public final void reset(Reader r) { // reset to default buffer size, if buffer has grown if (zzBuffer.length > ZZ_BUFFERSIZE) { zzBuffer = new char[ZZ_BUFFERSIZE]; } yyreset(r); } {code}
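A minimal, self-contained sketch of the reset policy above (ScannerSketch, grow, and the tiny ZZ_BUFFERSIZE are illustrative stand-ins, not the actual JFlex-generated StandardTokenizerImpl): the buffer is reallocated only when a large document has grown it, so the common case of normal-sized docs does no allocation at all on reset:

```java
import java.io.Reader;

// Stand-in for a JFlex-style scanner; only the buffer-management
// aspect of the real generated class is modeled here.
class ScannerSketch {
    static final int ZZ_BUFFERSIZE = 16;   // JFlex's real default is larger; small here for illustration
    char[] zzBuffer = new char[ZZ_BUFFERSIZE];
    Reader zzReader;

    // Called while lexing when a long token needs more lookahead than fits.
    void grow(int needed) {
        if (needed > zzBuffer.length) {
            char[] newBuffer = new char[Math.max(needed, zzBuffer.length * 2)];
            System.arraycopy(zzBuffer, 0, newBuffer, 0, zzBuffer.length);
            zzBuffer = newBuffer;
        }
    }

    // Uwe's suggestion: shrink back to the default only if the buffer
    // actually grew, instead of unconditionally reallocating.
    void reset(Reader r) {
        if (zzBuffer.length > ZZ_BUFFERSIZE) {
            zzBuffer = new char[ZZ_BUFFERSIZE];
        }
        zzReader = r;   // stands in for the generated yyreset(r)
    }
}
```

After indexing a large document, the next reset drops the oversized buffer, so the lexer no longer retains a large-document-sized char[] forever.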
[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854919#action_12854919 ] Jukka Zitting commented on LUCENE-1482: --- We use SLF4J in Jackrabbit, and having logs from the embedded Lucene index available through the same mechanism would be quite useful in some situations. BTW, using isDebugEnabled() is often not necessary with SLF4J, see http://www.slf4j.org/faq.html#logging_performance Replace infoSteram by a logging framework (SLF4J) - Key: LUCENE-1482 URL: https://issues.apache.org/jira/browse/LUCENE-1482 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Shai Erera Fix For: 3.1 Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch, slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar Lucene makes use of infoStream to output messages in its indexing code only. For debugging purposes, when the search application is run on the customer side, getting messages from other code flows, like search, query parsing, analysis etc. can be extremely useful. There are two main problems with infoStream today: 1. It is owned by IndexWriter, so if I want to add logging capabilities to other classes I need to either expose an API or propagate infoStream to all classes (see for example DocumentsWriter, which receives its infoStream instance from IndexWriter). 2. I can either turn debugging on or off, for the entire code. Introducing a logging framework can allow each class to control its logging independently, and more importantly, allows the application to turn on logging for only specific areas in the code (i.e., org.apache.lucene.index.*). I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, as its name states, a facade over different logging frameworks. As such, you can include the slf4j.jar in your application, and it recognizes at deploy time what is the actual logging framework you'd like to use.
SLF4J comes with several adapters for Java logging, Log4j and others. If you know your application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in your classpath, and your logging statements will use Java logging underneath the covers. This makes the logging code very simple. For a class A the logger will be instantiated like this: public class A { private static final Logger logger = LoggerFactory.getLogger(A.class); } And will later be used like this: public class A { private static final Logger logger = LoggerFactory.getLogger(A.class); public void foo() { if (logger.isDebugEnabled()) { logger.debug("message"); } } } That's all! Checking for isDebugEnabled is very quick, at least using the JDK14 adapter (but I assume it's fast also over other logging frameworks). The important thing is, every class controls its own logger. Not all classes have to output logging messages, and we can improve Lucene's logging gradually, w/o changing the API, by adding more logging messages to interesting classes. I will submit a patch shortly
[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854920#action_12854920 ] Shai Erera commented on LUCENE-1482: I still think that calling isDebugEnabled is better, because the message formatting stuff may do unnecessary things like casting, autoboxing etc. IMO, if logging is enabled, evaluating it twice is not a big deal ... it's a simple check. I'm glad someone here thinks logging will be useful though :). I wish there were a quorum here to proceed w/ that. Note that I also offered to not create any dependency on SLF4J, but rather extract infoStream to a static InfoStream class, which would avoid passing it around everywhere, and give the flexibility to output stuff from other classes which don't have an infoStream at hand.
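To make the trade-off in the two comments above concrete, here is a tiny stand-in for SLF4J's parameterized logging (MiniLogger is hypothetical, not the real SLF4J API): the varargs are still boxed at the call site, which is Shai's point, but the message is only formatted when the level is enabled, which is Jukka's point about why an explicit isDebugEnabled() guard is usually redundant:

```java
// Hypothetical mini-facade mimicking SLF4J's {}-placeholder behavior.
class MiniLogger {
    boolean debugEnabled;
    int formatted = 0;   // counts how many times a message was actually built
    String last;

    void debug(String fmt, Object... args) {
        if (!debugEnabled) return;   // args were boxed, but never formatted
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, i = 0;
        while (i < fmt.length()) {
            // substitute each "{}" placeholder with the next argument
            if (i + 1 < fmt.length() && fmt.charAt(i) == '{' && fmt.charAt(i + 1) == '}') {
                sb.append(args[argIdx++]);
                i += 2;
            } else {
                sb.append(fmt.charAt(i++));
            }
        }
        formatted++;
        last = sb.toString();
    }
}
```

With debug disabled, the only per-call cost is the varargs boxing; the string concatenation is skipped entirely, so a guard only pays off on very hot paths or when argument evaluation itself is expensive.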
Re: [jira] Created: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space
What kind of JVM settings are you using? Lots of people index lots of documents without running into this, can you provide more specifics about your indexing settings? On Tue, Apr 6, 2010 at 10:51 PM, Shivender Devarakonda (JIRA) j...@apache.org wrote: java.lang.OutOfMemoryError:Java heap space -- Key: LUCENE-2376 URL: https://issues.apache.org/jira/browse/LUCENE-2376 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1 Environment: Windows Reporter: Shivender Devarakonda I see an OutOfMemory error in our product and it is happening when we have some data objects on which we built the index. I see the following OutOfMemory error; this is happening after we call IndexWriter.optimize(): 4/06/10 02:03:42.160 PM PDT [ERROR] [Lucene Merge Thread #12] In thread Lucene Merge Thread #12 and the message is org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space 4/06/10 02:03:42.207 PM PDT [VERBOSE] [Lucene Merge Thread #12] [Manager] Uncaught Exception in thread Lucene Merge Thread #12 org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315) Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.HashMap.resize(HashMap.java:462) at java.util.HashMap.addEntry(HashMap.java:755) at java.util.HashMap.put(HashMap.java:385) at org.apache.lucene.index.FieldInfos.addInternal(FieldInfos.java:256) at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:366) at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71) at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608) at
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291) 4/06/10 02:03:42.895 PM PDT [ERROR] this writer hit an OutOfMemoryError; cannot complete optimize
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854957#action_12854957 ] Tom Burton-West commented on LUCENE-1709: - I am having the same issue Shai reported in LUCENE-2353 with the parallel tests apparently causing the tests to hang on my Windows box with both Revision 931573 and Revision 931304 when running the tests from root. Tests hang in WriteLineDocTaskTest, on this line: [junit] config properties: [junit] directory = RAMDirectory [junit] doc.maker = org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTaskTest$JustDateDocMaker [junit] line.file.out = D:\dev\lucene\lucene-trunk\build\contrib\benchmark\test\W\one-line [junit] --- I just ran the test last night with Revision 931708 and had no problem. Ran it again this morning and got the hanging behavior. The difference is that last night the only thing running on my computer besides a couple of ssh terminal windows was the tests. Today when I ran the tests and got the hanging behavior, I had Firefox, Outlook, Exceed, and WordPad open. The tests are taking 98-99.9% of my CPU while hanging. I suspect there is some kind of resource issue when running the tests in parallel. Tom Burton-West Parallelize Tests - Key: LUCENE-1709 URL: https://issues.apache.org/jira/browse/LUCENE-1709 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Jason Rutherglen Assignee: Robert Muir Fix For: 3.1 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, runLuceneTests.py Original Estimate: 48h Remaining Estimate: 48h The Lucene tests can be parallelized to make for a faster testing system.
This task from ANT can be used: http://ant.apache.org/manual/CoreTasks/parallel.html Previous discussion: http://www.gossamer-threads.com/lists/lucene/java-dev/69669 Notes from Mike M.: {quote} I'd love to see a clean solution here (the tests are embarrassingly parallelizable, and we all have machines with good concurrency these days)... I have a rather hacked up solution now, that uses -Dtestpackage=XXX to split the tests up. Ideally I would be able to say use N threads and it'd do the right thing... like the -j flag to make. {quote}
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854959#action_12854959 ] Robert Muir commented on LUCENE-1709: - Thanks Tom and Shai... sorry I haven't gotten to fix this yet. Shai, would you mind committing your patch? We can keep the issue open to add the sysprop and fix the ant jar thing, and apply the same fixes to Solr's build.xml.
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854960#action_12854960 ] Tom Burton-West commented on LUCENE-1709: - This may or may not be a clue to the problem in benchmark. When I control-C'd the hung test, I got the error reported below. Tom. [junit] directory = RAMDirectory [junit] doc.maker = org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTaskTest$JustDateDocMaker [junit] line.file.out = C:\cygwin\home\tburtonw\lucene\april07_good\build\contrib\benchmark\test\W\one-line [junit] --- [junit] - --- [junit] java.io.FileNotFoundException: C:\cygwin\home\tburtonw\lucene\april07_good\contrib\benchmark\junitvmwatcher203463231158436475.properties (The process cannot access the file because it is being used by another process) [junit] at java.io.FileInputStream.open(Native Method) [junit] at java.io.FileInputStream.<init>(FileInputStream.java:106) [junit] at java.io.FileReader.<init>(FileReader.java:55) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeAsForked(JUnitTask.java:1025) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:876) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:803) [junit] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288) [junit] at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) [junit] at org.apache.tools.ant.Task.perform(Task.java:348) [junit] at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:62) [junit] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288) [junit] at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) [junit] at org.apache.tools.ant.Task.perform(Task.java:348) [junit] at org.apache.tools.ant.taskdefs.MacroInstance.execute(MacroInstance.java:394) [junit] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288) [junit] at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) [junit] at org.apache.tools.ant.Task.perform(Task.java:348) [junit] at org.apache.tools.ant.taskdefs.Parallel$TaskRunnable.run(Parallel.java:428) [junit] at java.lang.Thread.run(Thread.java:619)
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854967#action_12854967 ] Robert Muir commented on LUCENE-1709: - Thanks Tom, this is exactly what happened to Shai. Can you try his patch and see if it fixes the problem for you?
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855020#action_12855020 ] Shai Erera commented on LUCENE-1709: Robert, I will commit the patch, seems good to do anyway. We can handle the ant jars separately later. And this hang behavior is exactly what I experienced, including the FileInputStream thing. Only on my machine, when I took a thread dump, it showed that Ant waits on FIS.read() ... Robert - to remind you that even with the patch which forces junit to use a separate temp folder per thread, it still hung ...
[jira] Commented: (LUCENE-1709) Parallelize Tests
[ https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855022#action_12855022 ] Tom Burton-West commented on LUCENE-1709: - Hi Robert, I patched Revision 931708 and ran ant clean test-contribute and the tests ran just fine. The patch seems to have solved the problem. Tom
[jira] Created: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
Move NoDeletionPolicy from benchmark to core Key: LUCENE-2385 URL: https://issues.apache.org/jira/browse/LUCENE-2385 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark, Index Reporter: Shai Erera Assignee: Shai Erera Priority: Trivial Fix For: 3.1 As the subject says, but I'll also make it a singleton + add some unit tests, as well as some documentation. I'll post a patch hopefully today.
[jira] Created: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
IndexWriter commits unnecessarily on fresh Directory Key: LUCENE-2386 URL: https://issues.apache.org/jira/browse/LUCENE-2386 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Shai Erera Assignee: Shai Erera Fix For: 3.1 I've noticed IndexWriter's ctor commits a first commit (an empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessary, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created. I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits. Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me ... so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically!) and report back :).
[jira] Updated: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2385: --- Attachment: LUCENE-2385.patch Moves NoDeletionPolicy to core, adds javadocs + TestNoDeletionPolicy. Also includes the relevant changes to benchmark (algorithms + CreateIndexTask). I've fixed a typo I had in NoMergeScheduler - not related to this issue, but since it was just a typo, thought there's no harm in doing it here. Tests pass. Planning to commit shortly.
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855131#action_12855131 ] Shai Erera commented on LUCENE-2386: Took a look at IndexFileDeleter, and located the offending code segment which is responsible for the CorruptIndexException:
{code}
if (currentCommitPoint == null) {
  // We did not in fact see the segments_N file
  // corresponding to the segmentInfos that was passed
  // in. Yet, it must exist, because our caller holds
  // the write lock. This can happen when the directory
  // listing was stale (eg when index accessed via NFS
  // client with stale directory listing cache). So we
  // try now to explicitly open this commit point:
  SegmentInfos sis = new SegmentInfos();
  try {
    sis.read(directory, segmentInfos.getCurrentSegmentFileName(), codecs);
  } catch (IOException e) {
    throw new CorruptIndexException("failed to locate current segments_N file");
  }
{code}
Looks like this code protects against a real problem, which was raised on the list a couple of times already - stale NFS cache. So I'm reluctant to remove that check ... though I still think we should differentiate between a newly created index on a fresh Directory and a stale NFS problem. Maybe we can pass a boolean isNew or something like that to the ctor, and if it's a new index and the last commit point is missing, IFD will not throw the exception, but silently ignore it? So the code would become something like this:
{code}
if (currentCommitPoint == null && !isNew) { ... }
{code}
Does this make sense, or am I missing something?
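The guard being proposed can be sketched in isolation. This is illustrative pseudologic only, not the actual IndexFileDeleter code; the `isNew` flag is the hypothetical ctor argument from the comment above:

```java
// Sketch of the proposed guard: only treat a missing commit point as
// corruption when the index is NOT known to be freshly created.
// Not the real IndexFileDeleter; 'isNew' is the hypothetical ctor arg.
public class CommitPointCheck {
    public static boolean shouldThrowCorrupt(Object currentCommitPoint, boolean isNew) {
        // Old behavior: currentCommitPoint == null always threw.
        // Proposed: a fresh Directory legitimately has no segments_N yet.
        return currentCommitPoint == null && !isNew;
    }

    public static void main(String[] args) {
        // Fresh directory, no commit point yet: no exception expected.
        System.out.println(shouldThrowCorrupt(null, true));
        // Existing index but commit point missing: stale NFS cache, throw.
        System.out.println(shouldThrowCorrupt(null, false));
    }
}
```

The stale-NFS protection stays intact for existing indexes; only the fresh-Directory case is exempted.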
[jira] Created: (LUCENE-2387) IndexWriter retains references to Readers used in Fields (memory leak)
IndexWriter retains references to Readers used in Fields (memory leak) -- Key: LUCENE-2387 URL: https://issues.apache.org/jira/browse/LUCENE-2387 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0.1 Reporter: Ruben Laguna As described in [1] IndexWriter retains references to Readers used in Fields and that can lead to big memory leaks when using Tika's ParsingReaders (as those can take 1MB per ParsingReader). [2] shows a screenshot of the reference chain to the Reader from the IndexWriter taken with Eclipse MAT (Memory Analyzer Tool). The chain is the following: IndexWriter - DocumentsWriter - DocumentsWriterThreadState - DocFieldProcessorPerThread - DocFieldProcessorPerField - Fieldable - Field (fieldsData) [1] http://markmail.org/thread/ndmcgffg2mnwjo47 [2] http://skitch.com/ecerulm/n7643/eclipse-memory-analyzer
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855135#action_12855135 ] Michael McCandless commented on LUCENE-2386: I agree: IW really should not commit the first segments_1, for CREATE when Dir has no index already. App should immediately .commit() if it really wants to. We should fix IFD to know if it's dealing with a known new index and bypass that check that works around stale NFS dir listing (boolean arg sounds good).
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855136#action_12855136 ] Uwe Schindler commented on LUCENE-2385: --- The patch does not look like you svn moved the files. To preserve history, you should do a svn move of the file in your local repository and then modify it to reflect the package changes (if any). Did you do this?
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855140#action_12855140 ] Shai Erera commented on LUCENE-2385: I did that first, but then remembered that when I did that in the past, people were unable to apply my patches, w/o doing the svn move themselves. Anyway, for this file it's not really important I think - a very simple and tiny file, w/ no history to preserve? Is that ok for this file (b/c I have no idea how to do the svn move now ... after I've made all the changes already) :)
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855148#action_12855148 ] Shai Erera commented on LUCENE-2386: Looking at IFD again, I think a boolean ctor arg is not required. What I can do is check if any Lucene file has been seen (in the for-loop iteration on the Directory files), and if not, then deduce it's a new Directory, and skip that 'if' check. I'll give it a shot.
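The detection Shai describes can be sketched roughly as follows. This is a simplified stand-in: the pattern below only looks for segments files, whereas the real IndexFileDeleter filters with Lucene's own index-file name filter:

```java
import java.util.regex.Pattern;

// Hypothetical sketch: while scanning the Directory's files, note whether
// anything looks like a Lucene segments file; if nothing does, the
// Directory is fresh and a missing segments_N is expected, not corruption.
public class FreshDirectoryCheck {
    private static final Pattern SEGMENTS = Pattern.compile("segments(_\\w+)?");

    public static boolean isFresh(String[] files) {
        for (String f : files) {
            if (SEGMENTS.matcher(f).matches()) {
                return false; // saw a commit file: not a fresh Directory
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isFresh(new String[] {}));
        System.out.println(isFresh(new String[] {"segments_2", "_0.cfs"}));
    }
}
```

This keeps the stale-NFS safety net for directories that do contain index files, while avoiding a new ctor parameter.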
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855150#action_12855150 ] Uwe Schindler commented on LUCENE-2385: --- In general we place a list of all svn move/copy commands together with the patch, executable from the root dir. If you paste those commands into your terminal and then apply the patch, it works. One example is the jflex issue (ok, the commands are shortened). Another possibility is to have a second checkout, where you arrange the files correctly (svn moved/copied) and one for creating the patches.
[jira] Updated: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2385: --- Attachment: LUCENE-2385.patch Is it better now?
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855155#action_12855155 ] Shai Erera commented on LUCENE-2385: Forgot to mention that the only move I made was of NoDeletionPolicy: svn move contrib/benchmark/src/java/org/apache/lucene/benchmark/utils/NoDeletionPolicy.java src/java/org/apache/lucene/index/NoDeletionPolicy.java I'll remember that in the future Uwe - thanks for the heads up !
[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855164#action_12855164 ] Uwe Schindler commented on LUCENE-2385: --- Yeah, that's fine!
[jira] Resolved: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core
[ https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-2385. Resolution: Fixed Committed revision 932129.
RE: IndexWriter memory leak?
There is one possibility that could be fixed: as Tokenizers are reused, the analyzer holds a reference to the last used Reader. The easy fix would be to unset the Reader in Tokenizer.close(). If this is the case for you, that may be easy to do. So Tokenizer.close() looks like this:

/** By default, closes the input Reader. */
@Override
public void close() throws IOException {
  input.close();
  input = null; // <-- new!
}

Any comments from other committers? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ruben Laguna [mailto:ruben.lag...@gmail.com] Sent: Thursday, April 08, 2010 2:50 PM To: java-u...@lucene.apache.org Subject: Re: IndexWriter memory leak? I will double check in the afternoon the heapdump.hprof. But I think that *some* readers are indeed held by docWriter.threadStates[0].consumer.fieldHash[1].fields[], as shown in [1] (this heapdump contains only live objects). The heapdump was taken after IndexWriter.commit()/IndexWriter.optimize() and all the Documents were already indexed and GCed (I will double check). So that would mean that the Reader is retained in memory by the following chain of references: DocumentsWriter - DocumentsWriterThreadState - DocFieldProcessorPerThread - DocFieldProcessorPerField - Fieldable - Field (fieldsData) I'll double check with Eclipse MAT as I said that this chain is actually made of hard references only (no SoftReferences, WeakReferences, etc). I will also double check that there is no live Document that is referencing the Reader via the Field. [1] http://img.skitch.com/20100407-b86irkp7e4uif2wq1dd4t899qb.jpg On Thu, Apr 8, 2010 at 2:16 PM, Uwe Schindler u...@thetaphi.de wrote: Readers are not held. If you indexed the document and gced the document instance the readers are gone.
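The effect of the proposed one-line fix can be demonstrated with a minimal stand-in class (this is NOT Lucene's Tokenizer, just a sketch of the reference-dropping behavior under discussion):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Minimal stand-in showing the proposed fix: a reused tokenizer that
// nulls its Reader field on close(), so a tokenizer cached by the
// analyzer no longer pins a heavyweight Reader (e.g. Tika's) in memory.
public class ReusableTokenizerSketch {
    private Reader input;

    public void reset(Reader input) { this.input = input; }

    public void close() throws IOException {
        input.close();
        input = null; // the proposed one-line fix: drop the reference
    }

    public boolean holdsReader() { return input != null; }

    public static void main(String[] args) throws IOException {
        ReusableTokenizerSketch t = new ReusableTokenizerSketch();
        t.reset(new StringReader("some document text"));
        t.close();
        // After close(), the Reader is eligible for garbage collection
        // even though the tokenizer itself stays cached in the analyzer.
        System.out.println(t.holdsReader());
    }
}
```

The point is that the leak comes from the cached tokenizer outliving the document, not from the Reader itself; nulling the field breaks the retention chain.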
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Ruben Laguna [mailto:ruben.lag...@gmail.com] Sent: Thursday, April 08, 2010 1:28 PM To: java-u...@lucene.apache.org Subject: Re: IndexWriter memory leak? Now that the zzBuffer issue is solved... what about the references to the Readers held by docWriter. Tika's ParsingReaders are quite heavyweight so retaining those in memory unnecessarily is also a hidden memory leak. Should I open a bug report on that one? /Rubén On Thu, Apr 8, 2010 at 12:11 PM, Shai Erera ser...@gmail.com wrote: Guess we were replying at the same time :). On Thu, Apr 8, 2010 at 1:04 PM, Uwe Schindler u...@thetaphi.de wrote: I already answered, that I will take care of this! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Shai Erera [mailto:ser...@gmail.com] Sent: Thursday, April 08, 2010 12:00 PM To: java-u...@lucene.apache.org Subject: Re: IndexWriter memory leak? Yes, that's the trimBuffer version I was thinking about, only this guy created a reset(Reader, int) and does both ops (resetting + trim) in one method call. More convenient. Can you please open an issue to track that? People will have a chance to comment on whether we (Lucene) should handle that, or it should be a JFlex fix. Based on the number of replies this guy received (0 !), I doubt JFlex would consider it a problem. But we can do some small service to our users base by protecting against such problems. And while you're opening the issue, if you want to take a stab at fixing it and post a patch, it'd be great :).
Shai On Thu, Apr 8, 2010 at 12:51 PM, Ruben Laguna ruben.lag...@gmail.com wrote: I was investigating this a little further and in the JFlex mailing list I found [1] I don't know much about flex / JFlex but it seems that this guy resets the zzBuffer to 16384 or less when setting the input for the lexer Quoted from shef she...@ya... I set %buffer 0 in the options section, and then added this method to the lexer: /** * Set the input for the lexer. The size parameter really speeds things up, * because by default, the lexer allocates an internal buffer of 16k. For * most strings, this is unnecessarily large. If the size param is 0 or greater * than 16k, then the buffer is set to 16k. If the size param is smaller, then * the buf will be set to the exact size. *
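The buffer-trimming idea quoted above can be sketched in plain Java (names like `zzBuffer` follow JFlex's generated code, but this is a hypothetical illustration, not the actual lexer):

```java
// Sketch of the trimming scheme: instead of always keeping JFlex's
// default 16 KB scanner buffer, size it to the input when the input is
// known to be smaller; 0 or oversized requests fall back to the default.
public class LexerBufferSketch {
    private static final int DEFAULT = 16384; // JFlex's default buffer size
    private char[] zzBuffer = new char[DEFAULT];

    public void reset(int size) {
        int want = (size <= 0 || size > DEFAULT) ? DEFAULT : size;
        if (zzBuffer.length != want) {
            zzBuffer = new char[want]; // shrink to fit, or restore default
        }
    }

    public int bufferLength() { return zzBuffer.length; }

    public static void main(String[] args) {
        LexerBufferSketch lexer = new LexerBufferSketch();
        lexer.reset(100);   // small input: allocate only what is needed
        System.out.println(lexer.bufferLength());
        lexer.reset(0);     // unknown size: fall back to the default
        System.out.println(lexer.bufferLength());
    }
}
```

This matters for reused tokenizers: without the trim, one large document permanently grows the buffer retained by every cached lexer instance.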
Re: Changing the subject for a JIRA-issue (Was: [jira] Created: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into f
: Is it possible to change it? If not, what is the policy here? To open a : new issue and close the old one? ... : In this case, that would mean either closing this issue and opening a new one, : or taking the discussion to the mailing list where subject headers may be : modified as the conversation evolves. Anyone who can edit an issue (ie: all the committers, and anyone in the developer group) can change the summary (which changes the email subjects). It's not clear to me what the summary of LUCENE-2335 should be, but McCandless opened the issue; he can certainly fix the summary as the issue evolves. -Hoss
[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2386: --- Attachment: LUCENE-2386.patch First stab at this. Patch still missing CHANGES entry, and I haven't run all the tests, just TestIndexWriter. With those changes it passes. One thing that I think should be fixed is testImmediateDiskFull - if I don't add writer.commit(), the test fails, because dir.getRecomputeActualSizeInBytes returns 0 (no RAMFiles yet), and then the test succeeds at adding one document. So maybe just change the test to set maxSizeInBytes to '1', always? TestNoDeletionPolicy is not covered by this patch (should be fixed as well, because now the number of commits is exactly N and not N+1). Will fix it tomorrow. Anyway, it's really late now, so hopefully some fresh eyes will look at it while I'm away, and comment on the proposed changes. I hope I got all the changes to the tests right.
[jira] Updated: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2074: -- Attachment: LUCENE-2074.patch New patch with replacement of deprecated TermAttribute -> CharTermAttribute. It also fixes the reset()/reset(Reader) methods to conform to all other Tokenizers and the documentation. The current one was resetting multiple times. This has no effect on backwards compatibility. Also improves the JFlex classpath detection to work with svn checkouts or future release zips. I will commit this soon when all tests ran. Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer --- Key: LUCENE-2074 URL: https://issues.apache.org/jira/browse/LUCENE-2074 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according to the warning). In Lucene 3.0 we switch to Java 1.5, so we should regenerate the file. After regeneration the Tokenizer behaves differently for some characters. Because of that we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion.
Re: Getting fsync out of the loop
But, IW doesn't let you hold on to checkpoints... only to commits. Ie SnapshotDP will only see actual commit/close calls, not intermediate checkpoints like a random segment merge completing, a flush happening, etc. Or... maybe you would in fact call commit frequently from the main threads (but with fsync disabled), and then your DP holds onto these fake commits, periodically picking one of them to do the real fsync'ing? Yeah, that's exactly what I tried to describe in my initial post :) I'm just playing around with a stupid idea. I'd like to have an NRT look-alike without binding readers and writers. :) I see... well, binding durability & visibility will always be costly. This is why Lucene decouples them (by making NRT readers available). My experiments do the same, essentially. But after I understood that to perform deletions IW has to load term indexes anyway, I'm almost ready to give up and go for the intertwined IW/IR mess :) Hey if you really think it's a mess, post a patch that cleans it up :) Uh oh. Let me finish the current one, first. Second - I don't know yet what this should look like. Something along the lines of deletions/norms writers being extracted from segment reader and the reader pool being made external to IW?? -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785
Re: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field cache)
Actually Toke opened a new issue (LUCENE-2369) for the new approach to Locale-based sorting... I think we should leave the existing issue as the single-segment optimization (it's a separate issue). Mike On Thu, Apr 8, 2010 at 6:06 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Is it possible to change it? If not, what is the policy here? To open a : new issue and close the old one? ... : In this case, that would mean either closing this issue and opening a new one, : or taking the discussion to the mailing list where subject headers may be : modified as the conversation evolves. Anyone who can edit an issue (ie: all the committers, and anyone in the developer group) can change the summary (which changes the email subjects). It's not clear to me what the summary of LUCENE-2335 should be, but McCandless opened the issue; he can certainly fix the summary as the issue evolves. -Hoss
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855215#action_12855215 ] Michael McCandless commented on LUCENE-2386: I think the patch is good Shai. I'd be curious what other tests rely on an immediate commit on creating an index. Maybe change testImmediateDiskFull to set max allowed size to max(1, current-usage)? In case we change IW to write other stuff in the future on create...
Re: Getting fsync out of the loop
On Thu, Apr 8, 2010 at 6:21 PM, Earwin Burrfoot ear...@gmail.com wrote: But, IW doesn't let you hold on to checkpoints... only to commits. Ie SnapshotDP will only see actual commit/close calls, not intermediate checkpoints like a random segment merge completing, a flush happening, etc. Or... maybe you would in fact call commit frequently from the main threads (but with fsync disabled), and then your DP holds onto these fake commits, periodically picking one of them to do the real fsync'ing? Yeah, that's exactly what I tried to describe in my initial post :) Ahh ok then it makes more sense. But still you shouldn't commit that often (even with fake fsync) since it must flush the segment. I'm just playing around with a stupid idea. I'd like to have an NRT look-alike without binding readers and writers. :) I see... well, binding durability & visibility will always be costly. This is why Lucene decouples them (by making NRT readers available). My experiments do the same, essentially. But after I understood that to perform deletions IW has to load term indexes anyway, I'm almost ready to give up and go for the intertwined IW/IR mess :) Hey if you really think it's a mess, post a patch that cleans it up :) Uh oh. Let me finish the current one, first. Heh, yes :) Second - I don't know yet what this should look like. Something along the lines of deletions/norms writers being extracted from segment reader and the reader pool being made external to IW?? Yeah, reader pool should be pulled out of IW, and I think IW should be split into that which manages the segment infos, that which adds/deletes docs, and the rest (merging, addIndexes*)? (There's an issue open for this refactoring...). I'm not sure about deletions/norms writers being extracted from SR; I think delete ops would still go through IW? Mike
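The scheme being discussed - frequent commits with fsync disabled, with a policy that retains these "fake" commits and a background step that periodically makes the newest one durable - can be modeled with a toy class. This is a simulation of the idea, not Lucene's SnapshotDeletionPolicy or any real IndexDeletionPolicy:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the scheme: commits arrive frequently without fsync
// ("fake" commits); the policy retains all of them, and a periodic
// background step picks the newest one, fsyncs it for real, and lets
// the older retained commits become deletable.
public class PeriodicSyncPolicySketch {
    private final Deque<String> pendingCommits = new ArrayDeque<>();
    private String lastDurableCommit;

    // Called on every (non-durable) commit.
    public void onCommit(String commitName) {
        pendingCommits.addLast(commitName); // retained, not yet durable
    }

    // Background step: make the newest retained commit durable,
    // release the older ones for deletion.
    public void syncNewest() {
        if (pendingCommits.isEmpty()) return;
        lastDurableCommit = pendingCommits.peekLast(); // real fsync here
        pendingCommits.clear();
    }

    public String lastDurable() { return lastDurableCommit; }

    public static void main(String[] args) {
        PeriodicSyncPolicySketch policy = new PeriodicSyncPolicySketch();
        policy.onCommit("segments_1");
        policy.onCommit("segments_2");
        policy.onCommit("segments_3");
        policy.syncNewest();
        System.out.println(policy.lastDurable());
    }
}
```

As Mike notes above, the catch is that even fake commits still pay the segment-flush cost, so the commit frequency cannot be arbitrarily high.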
Re: Incremental Field Updates
Good point. I meant the model at the document level: i.e. what milestones does a document go through in its life cycle. Today: created -- deleted With incremental updates: created -- update1 -- update2 -- deleted I think what I'm trying to say is that this second threaded sequence of state changes seems intuitively more fragile under concurrent scenarios. So for example, in a lock-free design, the system would also have to anticipate the following sequence of events: created -- update1 -- deleted -- update2 and consider update2 a null op. I'm imagining there are other cases that I can't think of.. -Babak On Tue, Apr 6, 2010 at 3:40 AM, Michael McCandless luc...@mikemccandless.com wrote: write once, plus the option to the app to keep multiple commit points around (by customizing the deletion policy). Actually order of operations / commits very much matters in Lucene today. Deletions are not idempotent: if you add a doc w/ term X, delete by term X, add a new doc with term X... that's very different than if you moved the delete op to the end. Ie the deletion only applies to the docs added before it. Mike On Mon, Apr 5, 2010 at 12:45 AM, Babak Farhang farh...@gmail.com wrote: Sure. Because of the write once principle. But at some cost (duplicated data). I was just agreeing that it would not be a good idea to bake in version-ing by keeping the layers around forever in a merged index; I wasn't keying in on transactions per se. Speaking of transactions: I'm not sure if we should worry about this much yet, but with updates the order of the transaction commits seems important. I think commit order is less important today in Lucene because its model supports only 2 types of events: document creation--which only happens once, and document deletion, which is idempotent. What do you think? Will commits have to be ordered if we introduce updates? Or does the onus of maintaining order fall on the application? 
-Babak On Sat, Apr 3, 2010 at 3:28 AM, Michael McCandless luc...@mikemccandless.com wrote: On Sat, Apr 3, 2010 at 1:25 AM, Babak Farhang farh...@gmail.com wrote: I think they get merged in by the merger, ideally in the background. That sounds sensible. (In other words, we wont concern ourselves with roll backs--something possible while a layer is still around.) Actually roll backs would still be very possible even if layers are merged. Ie, one could keep multiple commits around, and the older commits would still be referring to the old postings + layers, keeping them alive. Lucene would still be transactional with such an approach. Mike - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
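Babak's out-of-order case (created -- update1 -- deleted -- update2) can be modeled with a small replay loop. The class and the string op encoding below are hypothetical, assuming the rule he proposes: an update that lands after a delete for the same document is a null op.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-document lifecycle replay: ops arrive in order as
// "CREATE", "UPDATE:<value>", or "DELETE"; an UPDATE that arrives after a
// DELETE is silently ignored, matching the proposed lock-free semantics.
class LifecycleReplay {
    // Replays ops in arrival order; returns the update values that applied.
    static List<String> replay(List<String> ops) {
        boolean live = false;
        List<String> applied = new ArrayList<String>();
        for (String op : ops) {
            if (op.equals("CREATE")) {
                live = true;
            } else if (op.equals("DELETE")) {
                live = false;
            } else if (op.startsWith("UPDATE:")) {
                if (live) { // updates after a delete are null ops
                    applied.add(op.substring("UPDATE:".length()));
                }
            }
        }
        return applied;
    }
}
```

This also illustrates Mike's point that ordering already matters today: moving the DELETE past the second UPDATE changes which updates survive.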
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855265#action_12855265 ] Shai Erera commented on LUCENE-2386: bq. Maybe change testImmediateDiskFull to set max allowed size to max(1, current-usage)? Good idea! Did it and it works. Now ... one thing I haven't mentioned is the bw break. This is a behavioral bw break, which I'm not so sure we should care about, because I wonder how many apps out there rely on being able to open a reader before they ever committed on a fresh new index. So what do you think - do this change anyway, OR ... utilize Version to our aid? I.e., if the Version that was passed to IWC is before LUCENE_31, we keep the initial commit, otherwise we don't do it? The Pro is that I won't need to change many of the tests because they still use the LUCENE_30 version (but that is not a strong argument), so it's a weak Pro. The Con is that IW will keep having that doCommit handling in its ctor, only now w/ added comments on why this is being kept around etc. What do you think?
TestCodecs running time
Hi. I've noticed that TestCodecs takes an insanely long time to run on my machine - between 35 and 40 seconds. Is that expected? The reason it runs so long seems to be that its threads each make 4000 iterations ... is that really required to ensure correctness? Shai
Controlling the maximum size of a segment during indexing
Here is a Java unit test that uses the LogByteSizeMergePolicy to control the maximum size of segment files during indexing. That is, it tries. It does not succeed. Will someone who truly understands the merge policy code please examine it. There is probably one tiny parameter missing. It adds 20 documents that each are 100k in size. It creates an index in a RAMDirectory which should have one segment that's a tad over 1mb, and then a set of segments that are a tad over 500k. Instead, the data does not flush until it commits, writing one 5m segment.

- org.apache.lucene.index.TestIndexWriterMergeMB ---

package org.apache.lucene.index;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldSelectorResult;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.LuceneTestCase;

/*
 * Verify that segment sizes are limited to a number of bytes.
 *
 * Sizing:
 *   Max MB is 0.5m. Verify against this plus 100k slop. (1.2x)
 *   Min MB is 10k.
 *   Each document is 100k.
 *   mergeFactor=2
 *   MaxRAMBuffer=1m. Verify against this plus 200k slop. (1.2x)
 *
 * This test should cause the ram buffer to flush after 10 documents, and create a CFS a little over 1meg.
 * The later documents should be flushed to disk every 5-6 documents, and create CFS files a little over 0.5meg.
 */
public class TestIndexWriterMergeMB extends LuceneTestCase {
  private static final int MERGE_FACTOR = 2;
  private static final double RAMBUFFER_MB = 1.0;
  static final double MIN_MB = 0.01d;
  static final double MAX_MB = 0.5d;
  static final double SLOP_FACTOR = 1.2d;
  static final double MB = 1000 * 1000;
  static String VALUE_100k = null;

  // Test controlling the merge policy's max segment size in bytes.
  public void testMaxMergeMB() throws IOException {
    Directory dir = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(
        TEST_VERSION_CURRENT, new WhitespaceAnalyzer(TEST_VERSION_CURRENT));
    LogByteSizeMergePolicy mergeMB = new LogByteSizeMergePolicy();
    config.setMergePolicy(mergeMB);
    mergeMB.setMinMergeMB(MIN_MB);
    mergeMB.setMaxMergeMB(MAX_MB);
    mergeMB.setUseCompoundFile(true);
    mergeMB.setMergeFactor(MERGE_FACTOR);
    config.setMaxBufferedDocs(100); // irrelevant, but the next line fails without this.
    config.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
    MergeScheduler scheduler = new SerialMergeScheduler();
    config.setMergeScheduler(scheduler);
    IndexWriter writer = new IndexWriter(dir, config);

    System.out.println("Start indexing");
    for (int i = 0; i < 50; i++) {
      addDoc(writer, i);
      printSegmentSizes(dir);
    }
    checkSegmentSizes(dir);
    System.out.println("Commit");
    writer.commit();
    printSegmentSizes(dir);
    checkSegmentSizes(dir);
    writer.close();
  }

  // document that takes ~100k of RAM
  private void addDoc(IndexWriter writer, int i) throws IOException {
    if (VALUE_100k == null) {
      StringBuilder value = new StringBuilder(100000);
      for (int fill = 0; fill < 100000; fill++) {
        value.append('a');
      }
      VALUE_100k = value.toString();
    }
    Document doc = new Document();
    doc.add(new Field("id", i + "", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("content", VALUE_100k, Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
  }

  private void checkSegmentSizes(Directory dir) {
    try {
      String[] files = dir.listAll();
      for (String file : files) {
        if (file.equals("_0.cfs")) {
          long length = dir.fileLength(file);
          assertTrue("First segment: " + file + " size=" + length + " must be < "
              + (int) ((SLOP_FACTOR * RAMBUFFER_MB) * MB),
              length < (SLOP_FACTOR * RAMBUFFER_MB) * MB);
        } else if (file.endsWith(".cfs")) {
          long length
Re: Controlling the maximum size of a segment during indexing
I'm not sure .. but did you set the RAMBufferSizeMB on IWC? Doesn't look like it, and the default is 16 MB, which can explain why it doesn't flush before that. Shai

On Fri, Apr 9, 2010 at 8:01 AM, Lance Norskog goks...@gmail.com wrote: Here is a Java unit test that uses the LogByteSizeMergePolicy to control the maximum size of segment files during indexing. That is, it tries. It does not succeed. Will someone who truly understands the merge policy code please examine it. There is probably one tiny parameter missing. It adds 20 documents that each are 100k in size. It creates an index in a RAMDirectory which should have one segment that's a tad over 1mb, and then a set of segments that are a tad over 500k. Instead, the data does not flush until it commits, writing one 5m segment.
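If Shai's diagnosis is right, a possible fix to Lance's test (a sketch only, reusing the constants from his code and assuming the 3.x IndexWriterConfig semantics) is to keep the RAM-size trigger enabled and disable the doc-count trigger instead - his code does the opposite, so nothing flushes until commit():

```java
// Sketch: flush whenever roughly RAMBUFFER_MB (1.0 MB) of docs is buffered.
// DISABLE_AUTO_FLUSH goes on the doc-count trigger, not the RAM trigger.
config.setRAMBufferSizeMB(RAMBUFFER_MB);
config.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
```

With size-based flushing active, the RAM buffer should spill roughly every 10 of the 100k documents, which is what the test's comments expect.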
[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory
[ https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855277#action_12855277 ] Shai Erera commented on LUCENE-2386: Apparently, there are more tests that fail ... lost count, but they're easy to fix. I tried writing the following test:

{code}
public void testNoCommits() throws Exception {
  // Tests that if we don't call commit(), the directory has 0 commits. This has
  // changed since LUCENE-2386, where before IW would always commit on a fresh
  // new index.
  Directory dir = new RAMDirectory();
  IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT,
      new WhitespaceAnalyzer(TEST_VERSION_CURRENT)));
  assertEquals("expected 0 commits!", 0, IndexReader.listCommits(dir).size());
  // No changes still should generate a commit, because it's a new index.
  writer.close();
  assertEquals("expected 1 commits!", 1, IndexReader.listCommits(dir).size());
}
{code}

Simple test - validates that no commits are present following a freshly new index creation, w/o closing or committing. However, IndexReader.listCommits fails w/ the following exception:

{code}
java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@2d262d26: files: []
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:652)
  at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:535)
  at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:323)
  at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1033)
  at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1023)
  at org.apache.lucene.index.IndexReader.listCommits(IndexReader.java:1341)
  at org.apache.lucene.index.TestIndexWriter.testNoCommits(TestIndexWriter.java:4966)
{code}

The failure occurs when SegmentInfos attempts to find segments.gen and fails.
So I wonder if I should fix DirectoryReader to catch that exception and simply return an empty Collection .. or fix SegmentInfos at this point -- notice the "files: []" at the end. I think that by adding a check to the following code (SegmentInfos, line 652) which validates that there were any files before throwing the exception, it'll still work properly and safely (i.e. still detect a problematic Directory). Will probably need to break away from the while loop and I guess fix some other things in upper layers ... therefore I'm not sure if I shouldn't simply catch that exception in DirectoryReader.listCommits w/ proper documentation and be done w/ it. After all, it's not supposed to be called ... ever? or hardly ever?

{code}
if (gen == -1) {
  // Neither approach found a generation
  throw new FileNotFoundException("no segments* file found in " + directory + ": files: " + Arrays.toString(files));
}
{code}
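The "catch that exception in DirectoryReader.listCommits" option can be sketched in plain Java. SafeListCommits and listOrEmpty below are hypothetical names, not the Lucene API; the sketch only shows the control flow of treating a missing segments file as "no commits yet" while still propagating every other failure.

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical wrapper for the "catch and return empty" option: if listing
// commits fails only because no segments file exists yet (fresh Directory),
// report an empty commit list instead of throwing.
class SafeListCommits {
    static <T> List<T> listOrEmpty(Callable<List<T>> lister) {
        try {
            return lister.call();
        } catch (FileNotFoundException e) {
            return Collections.emptyList(); // fresh index: no commits yet
        } catch (Exception e) {
            throw new RuntimeException(e);  // anything else is a real error
        }
    }
}
```

The trade-off Shai raises still applies: swallowing FileNotFoundException here would also hide a genuinely broken Directory, which is why the alternative of checking `files.length` inside SegmentInfos may be safer.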
[jira] Updated: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space
[ https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivender Devarakonda updated LUCENE-2376: -- Attachment: CheckIndex_JavaHeapOOM.txt CheckIndex output for JavaHeapOOM error. As I specified earlier, We saw OOM when it is indexing the data. I ran the checkIndex on the partially generated index folder. java.lang.OutOfMemoryError:Java heap space -- Key: LUCENE-2376 URL: https://issues.apache.org/jira/browse/LUCENE-2376 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1 Environment: Windows Reporter: Shivender Devarakonda Attachments: CheckIndex_JavaHeapOOM.txt, InfoStreamOutput.txt I see an OutOfMemory error in our product and it is happening when we have some data objects on which we built the index. I see the following OutOfmemory error, this is happening after we call Indexwriter.optimize(): 4/06/10 02:03:42.160 PM PDT [ERROR] [Lucene Merge Thread #12] In thread Lucene Merge Thread #12 and the message is org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space 4/06/10 02:03:42.207 PM PDT [VERBOSE] [Lucene Merge Thread #12] [Manager] Uncaught Exception in thread Lucene Merge Thread #12 org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315) Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.HashMap.resize(HashMap.java:462) at java.util.HashMap.addEntry(HashMap.java:755) at java.util.HashMap.put(HashMap.java:385) at org.apache.lucene.index.FieldInfos.addInternal(FieldInfos.java:256) at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:366) at org.apache.lucene.index.FieldInfos.init(FieldInfos.java:71) at 
org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:116) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608) at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291) 4/06/10 02:03:42.895 PM PDT [ERROR] this writer hit an OutOfMemoryError; cannot complete optimize -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2376) java.lang.OutOfMemoryError:Java heap space
[ https://issues.apache.org/jira/browse/LUCENE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivender Devarakonda updated LUCENE-2376: -- Attachment: CheckIndex_PermGenSpaceOOM.txt If we start our product with already-generated index content then we see a PermGen space OOM. I ran CheckIndex on this index folder. Please let me know your thoughts on these output files.