Solr-trunk - Build # 1346 - Failure
Build: https://hudson.apache.org/hudson/job/Solr-trunk/1346/

All tests passed.

Build Log (for compile errors):
[...truncated 20163 lines...]
[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972758#action_12972758 ]

Michael McCandless commented on LUCENE-2694:
--------------------------------------------

If I force scoring BQ rewrite for wildcard/prefix queries (ie set that rewrite mode and then relax BQ's max clause count) I see healthy speedups (~23-27%) for these queries! Great :)

While this doesn't happen w/ our default settings (ie these queries quickly cut over to constant filter rewrite), apps that change these defaults will see a gain. Plus, the term cache (which today protects you) is terribly fragile, since apps w/ many MTQ queries in flight can thrash that cache, thus killing performance. This patch prevents that entirely, since MTQs do their own caching of the TermStates they need: awesome.

> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
> Key: LUCENE-2694
> URL: https://issues.apache.org/jira/browse/LUCENE-2694
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
> Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch
>
> Spinoff of LUCENE-2690 (see the hacked patch on that issue)... Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer init also run in the same single pass as rewrite.
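[Editor's note: a rough stub-level illustration, not the patch's actual classes (the real PerReaderTermState API may differ), of what "MTQs do their own caching of the TermStates" means: rewrite records, per segment, where each term's postings live, and weight/scorer init reuses that instead of doing a second term lookup.]

{code}
import java.util.HashMap;
import java.util.Map;

// Opaque per-segment pointer to a term's postings (stub for the real TermState).
class TermState {}

// Illustrative cache keyed by segment ordinal: filled during the single
// rewrite pass, read back at weight/scorer init so no re-seek is needed.
class PerReaderTermStateSketch {
  private final Map<Integer, TermState> states = new HashMap<Integer, TermState>();

  void register(int segmentOrd, TermState state) {
    states.put(segmentOrd, state); // recorded while rewrite visits the term
  }

  TermState get(int segmentOrd) {
    return states.get(segmentOrd); // null means the term is absent in this segment
  }
}
{code}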
Re: Do we want 'nocommit' to fail the commit?
+1, this would be great :)

Mike

On Fri, Dec 17, 2010 at 10:45 PM, Shai Erera <ser...@gmail.com> wrote:
> Hi
>
> Out of curiosity, I searched if we can have a nocommit comment in the code
> fail the commit. As far as I see, we try to avoid accidental commits (of,
> say, debug messages) by putting a nocommit comment, but I don't know if
> svn ci would fail in the presence of such a comment - I guess not, because
> we've seen some accidental nocommits checked in already in the past.
>
> So I Googled around and found that if we have control of the svn repo, we
> can add a pre-commit hook that will check and fail the commit. Here is a
> nice article that explains how to add pre-commit hooks in general
> (http://wordaligned.org/articles/a-subversion-pre-commit-hook). I didn't
> try it yet (on our local svn instance), so I cannot say how well it works,
> but perhaps someone has experience with it ...
>
> So if this is interesting, and is doable for Lucene (say, open a JIRA
> issue for Infra?), I don't mind investigating it further and writing the
> script (which can be as simple as 'grep the changed files and fail on the
> presence of the nocommit string').
>
> Shai
[jira] Commented: (LUCENE-2818) abort() method for IndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972763#action_12972763 ]

Michael McCandless commented on LUCENE-2818:
--------------------------------------------

+1, I think this'd be a good simplification of IW/IR code.

I don't mind that the IO would know how to delete the partial file it had created; that seems fair. So eg CompoundFileWriter would abort its output file on hitting any exception.

I think we can make a default impl that simply closes, suppressing exceptions? (We can't .deleteFile, since an abstract IO doesn't know its Dir.) Our concrete impls can override w/ versions that do delete the file...

> abort() method for IndexOutput
> ------------------------------
>
> Key: LUCENE-2818
> URL: https://issues.apache.org/jira/browse/LUCENE-2818
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Earwin Burrfoot
>
> I'd like to see an abort() method on IndexOutput that silently (no exceptions) closes the IO and then does a silent papaDir.deleteFile(this.fileName()). This will simplify a bunch of error recovery code for IndexWriter and friends, but constitutes an API backcompat break. What do you think?
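[Editor's note: a minimal sketch of the default-impl idea Mike floats here, using stand-in types rather than the real IndexOutput/Directory classes. The abstract class can only close quietly; a concrete impl that knows its Directory and file name can also delete the partial file.]

{code}
import java.io.Closeable;
import java.io.IOException;

// Stand-in for org.apache.lucene.store.Directory, just enough for the sketch.
interface Dir {
  void deleteFile(String name) throws IOException;
}

abstract class IndexOutputSketch implements Closeable {
  public abstract void close() throws IOException;

  // Possible default impl: close, suppressing exceptions. The abstract class
  // knows neither its Directory nor its file name, so it cannot delete anything.
  public void abort() {
    try {
      close();
    } catch (IOException ignored) {
      // abort() is best-effort and must never throw
    }
  }
}

class FileIndexOutputSketch extends IndexOutputSketch {
  private final Dir dir;
  private final String name;

  FileIndexOutputSketch(Dir dir, String name) {
    this.dir = dir;
    this.name = name;
  }

  @Override
  public void close() throws IOException {
    // a real implementation flushes buffers and releases the OS file handle
  }

  // Concrete impl does know its Directory, so it can also remove the partial file.
  @Override
  public void abort() {
    super.abort(); // close quietly first
    try {
      dir.deleteFile(name);
    } catch (IOException ignored) {
      // the file may already be gone; abort() stays silent
    }
  }
}
{code}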
Re: Do we want 'nocommit' to fail the commit?
But. Er. What if we happen to have nocommit in a string, or in some docs, or as the name of a variable?

On Sat, Dec 18, 2010 at 12:47, Michael McCandless <luc...@mikemccandless.com> wrote:
> +1, this would be great :)
>
> Mike
>
> On Fri, Dec 17, 2010 at 10:45 PM, Shai Erera <ser...@gmail.com> wrote:
>> [...]

--
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785
Re: Do we want 'nocommit' to fail the commit?
I like this idea, too. But I think we have no control over this; it would be as complicated as the mergeprops. What we have: Hudson's half-hourly builds fail when the checked-in code contains nocommits, so you see it at the latest 30 min later.

Uwe

Shai Erera <ser...@gmail.com> schrieb:
> [...]

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
[jira] Commented: (LUCENE-2818) abort() method for IndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972764#action_12972764 ]

Earwin Burrfoot commented on LUCENE-2818:
-----------------------------------------

bq. Can abort() have a default impl in IndexOutput, such as close() followed by deleteFile() maybe? If so, then it won't break anything.

It can't. To call deleteFile you need both a reference to the papa-Directory and the name of the file this IO writes to. The abstract IO class has neither. If we add them, they have to be passed to a new constructor, and that's an API break ;)

bq. Would abort() on Directory fit better? E.g., it can abort all currently open and modified files, instead of the caller calling abort() on each IndexOutput? Are you thinking of a case where a write failed, and the caller would call abort() immediately, instead of some higher-level code? If so, would rollback() be a better name?

Oh, no, no. No way. I don't want to push someone else's responsibility onto Directory. This abort() is merely a shortcut.

Let's go with a usage example:
Here's FieldsWriter.java with LUCENE-2814 applied (skipping irrelevant parts) - https://gist.github.com/746358
Now, the same, with abort() - https://gist.github.com/746367

> abort() method for IndexOutput
> ------------------------------
>
> Key: LUCENE-2818
> URL: https://issues.apache.org/jira/browse/LUCENE-2818
[jira] Commented: (LUCENE-2818) abort() method for IndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972765#action_12972765 ]

Earwin Burrfoot commented on LUCENE-2818:
-----------------------------------------

bq. I think we can make a default impl that simply closes, suppressing exceptions? (We can't .deleteFile, since an abstract IO doesn't know its Dir.) Our concrete impls can override w/ versions that do delete the file...

I don't think we need a default impl? For some directory impls close() is a noop, plus - what is more important - an abstract method forces you to implement it; you can't forget this, so we're not gonna see broken directories that don't do abort() properly.

> abort() method for IndexOutput
> ------------------------------
>
> Key: LUCENE-2818
> URL: https://issues.apache.org/jira/browse/LUCENE-2818
[jira] Updated: (LUCENE-2818) abort() method for IndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earwin Burrfoot updated LUCENE-2818:
------------------------------------

    Priority: Minor  (was: Major)

This change is really minor but, I think, convenient. You don't have to lug a reference to the Directory along, and recalculate the file name, if the only thing you want to say is that the write was a failure and you no longer need this file.

> abort() method for IndexOutput
> ------------------------------
>
> Key: LUCENE-2818
> URL: https://issues.apache.org/jira/browse/LUCENE-2818
> Priority: Minor
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972767#action_12972767 ]

Michael McCandless commented on LUCENE-2814:
--------------------------------------------

Patch looks great! Nice work Earwin. I think it's ready to commit. Except, can you resync to trunk? I hit failures applying one hunk to DW.java.

Also, on the nocommit on exc in DW.addDocument: yes, I think that (IFD.deleteNewFiles, not checkpoint) is still needed, because DW can orphan the store files on abort? Or: we could fix DW.abort to directly call Dir.deleteFile (instead of relying on IFD.deleteNewFiles). Ie, w/ no shared doc stores, these files should never have been registered w/ IFD, so they can be privately managed by DW. But, if we end up leaving the delete up above, we should put the docWriter null check back so silly apps that close IW while still indexing don't get NPEs.

I'm not looking forward to the 3.x back port!!

> stop writing shared doc stores across segments
> ----------------------------------------------
>
> Key: LUCENE-2814
> URL: https://issues.apache.org/jira/browse/LUCENE-2814
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 3.1, 4.0
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch
>
> Shared doc stores enable the files for stored fields and term vectors to be shared across multiple segments. We've had this optimization since 2.1, I think. It works best against a new index, where you open an IW, add lots of docs, and then close it. In that case all of the written segments will reference slices of a single shared doc store segment.
>
> This was a good optimization because it means we never need to merge these files. But, when you open another IW on that index, it writes a new set of doc stores, and then whenever merges take place across doc stores, they must now be merged.
>
> However, since we switched to shared doc stores, there have been two optimizations for merging the stores. First, we now bulk-copy the bytes in these files if the field name/number assignment is congruent. Second, we now force congruent field name/number mapping in IndexWriter. This means this optimization is much less potent than it used to be.
>
> Furthermore, the optimization adds *a lot* of hair to IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over time, and causes odd behavior like a merge possibly forcing a flush when it starts. Finally, with DWPT (LUCENE-2324), which gets us truly concurrent flushing, we can no longer share doc stores.
>
> So, I think we should turn off the write side of shared doc stores to pave the path for DWPT to land on trunk and simplify IW/DW. We still must support reading them (until 5.0), but the read side is far less hairy.
[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972769#action_12972769 ]

Uwe Schindler commented on LUCENE-2694:
---------------------------------------

I have also some things:

- We currently don't support seeking a FilteredTermsEnum; this is disallowed by UnsupportedOperationException (we may change this, but it's complicated - Robert and I are thinking about it - for now it's disallowed, as it would break the enum logic). So the TermState seek method in FilteredTermsEnum should also throw UOE:

{code}
/** This enum does not support seeking!
 * @throws UnsupportedOperationException
 */
@Override
public SeekStatus seek(BytesRef term, boolean useCache) throws IOException {
  throw new UnsupportedOperationException(getClass().getName() + " does not support seeking");
}
{code}

- What is setNextReader in TermCollector for? I don't like that, but you seem to need it for the PerReaderTermState. The collector should really only work on the enum, not on any reader.

That's what I have seen on first patch review; I will now apply the patch and look closer into it :-) But the first point is important: FilteredTermsEnum currently should not support seeking.

> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
> Key: LUCENE-2694
> URL: https://issues.apache.org/jira/browse/LUCENE-2694
[jira] Issue Comment Edited: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972769#action_12972769 ]

Uwe Schindler edited comment on LUCENE-2694 at 12/18/10 5:43 AM:
-----------------------------------------------------------------

I have also some things:

- We currently don't support seeking a FilteredTermsEnum; this is disallowed by UnsupportedOperationException (we may change this, but it's complicated - Robert and I are thinking about it - for now it's disallowed, as it would break the enum logic). So the TermState seek method in FilteredTermsEnum should also throw UOE:

{code}
/** This enum does not support seeking!
 * @throws UnsupportedOperationException
 */
@Override
public SeekStatus seek(BytesRef term, boolean useCache) throws IOException {
  throw new UnsupportedOperationException(getClass().getName() + " does not support seeking");
}
{code}

- Additionally, can the next() implementation in FilteredTermsEnum use TermState? It does lots of seeking on the underlying (filtered) TermsEnum. This is the reason why seeking on the FilteredTermsEnum is not allowed; filtering is done here in the accept() methods.

- What is setNextReader in TermCollector for? I don't like that, but you seem to need it for the PerReaderTermState. The collector should really only work on the enum, not on any reader.

That's what I have seen on first patch review; I will now apply the patch and look closer into it :-) But the first point is important: FilteredTermsEnum currently should not support seeking.

was (Author: thetaphi): [...]

> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
> Key: LUCENE-2694
> URL: https://issues.apache.org/jira/browse/LUCENE-2694
Re: Do we want 'nocommit' to fail the commit?
I haven't seen nocommit in the code, neither as a String nor as a member name. But we can decide that we use @nocommit@ or something, which is less likely to be contained in code :).

Uwe, I didn't understand your response - do you mean that if the code contains a 'nocommit' in any of the .java files, Hudson will fail?

If we have no control over svn, we can create a special unit test that asserts exactly that. If you run your tests before commit (as you should :)), it will be detected. If not, Hudson will detect it (that is, unless it already somehow detects it).

Shai

On Sat, Dec 18, 2010 at 11:56 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> I like this idea, too. But I think we have no control over this; it would
> be as complicated as the mergeprops. What we have: Hudson's half-hourly
> builds fail when the checked-in code contains nocommits, so you see it at
> the latest 30 min later.
>
> Uwe
>
> Shai Erera <ser...@gmail.com> schrieb:
>> [...]

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
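[Editor's note: a sketch of Shai's special-unit-test idea. The class name, source root, and marker handling are illustrative assumptions, not an actual Lucene test; the same scan could equally run from a pre-commit script.]

{code}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical checker: recursively scan .java sources and fail on the marker.
public class NoCommitCheck {
  // Built by concatenation so this file passes its own scan.
  private static final String MARKER = "no" + "commit";

  public static void main(String[] args) throws IOException {
    scan(new File("src")); // assumed source root
    System.out.println("No stray markers found.");
  }

  private static void scan(File f) throws IOException {
    if (f.isDirectory()) {
      File[] children = f.listFiles();
      if (children == null) return; // unreadable directory: skip
      for (File child : children) scan(child);
    } else if (f.getName().endsWith(".java")) {
      BufferedReader r = new BufferedReader(new FileReader(f));
      try {
        String line;
        int n = 0;
        while ((line = r.readLine()) != null) {
          n++;
          if (line.contains(MARKER)) {
            throw new IllegalStateException(f + ":" + n + " contains '" + MARKER + "'");
          }
        }
      } finally {
        r.close();
      }
    }
  }
}
{code}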
[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earwin Burrfoot updated LUCENE-2814:
------------------------------------

    Attachment: LUCENE-2814.patch

Synced to trunk.

bq. Also, on the nocommit on exc in DW.addDocument, yes I think that (IFD.deleteNewFiles, not checkpoint) is still needed because DW can orphan the store files on abort?

Orphaned files are deleted directly in StoredFieldsWriter.abort() and TermVectorsTermsWriter.abort(). As I said - all the open-files tracking is now gone. Turns out checkpoint() is also no longer needed.

I have no other lingering cleanup urges; this is ready to be committed. I think.

> stop writing shared doc stores across segments
> ----------------------------------------------
>
> Key: LUCENE-2814
> URL: https://issues.apache.org/jira/browse/LUCENE-2814
[jira] Commented: (LUCENE-2818) abort() method for IndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972772#action_12972772 ]

Shai Erera commented on LUCENE-2818:
------------------------------------

I offered a default impl just so as not to break the API; I don't think a default impl is a good option. If we're ok making an exception for 3.x as well (I know I am), then I don't think we should have a default impl.

> abort() method for IndexOutput
> ------------------------------
>
> Key: LUCENE-2818
> URL: https://issues.apache.org/jira/browse/LUCENE-2818
[jira] Created: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
------------------------------------------------------------------------------------

Key: LUCENE-2819
URL: https://issues.apache.org/jira/browse/LUCENE-2819
Project: Lucene - Java
Issue Type: Bug
Components: Tests
Reporter: Michael McCandless
Fix For: 3.1, 4.0

Eg see these failures: https://hudson.apache.org/hudson/job/Lucene-3.x/214/

Multiple test methods failed in TestIndexWriterOnDiskFull, but I think only 1 test had a real failure; somehow our "thread hit exc" tracking incorrectly blames the other 3 cases? I'm not sure about this, but it seems like something like that is going on...

So, one problem is that LuceneTestCase.tearDown fails on any thread excs but, if CMS had also hit a failure, it then fails to clear CMS's thread failures. I think we should just remove CMS's thread failure tracking? (It's static, so it can definitely bleed across tests.) Ie, just rely on LuceneTestCase's tracking.
[jira] Commented: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
[ https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972776#action_12972776 ]

Robert Muir commented on LUCENE-2819:
-------------------------------------

I think this is the problem: let's say the main thread spawns 3 other threads (A, B, C). When A throws an exception, our uncaught exception handler calls the test to fail. There is nothing wrong with this... the problem in your example is, I think, that B and C are still running and then fail later (even if it's just a few ms). So these get 'misattributed' to the next test method... we can't do anything about that either, without doing insane amounts of buffering.

So we need to improve the thread handling in general for the tests.

> LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
> Key: LUCENE-2819
> URL: https://issues.apache.org/jira/browse/LUCENE-2819
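[Editor's note: a self-contained toy, not LuceneTestCase's actual code, showing the race Robert describes: failures recorded by a shared uncaught-exception handler land on whichever "test" checks the shared state next.]

{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class MisattributionDemo {
  // Shared across "tests", like a static failure list in a test framework.
  static final List<Throwable> uncaught = new CopyOnWriteArrayList<Throwable>();

  public static void main(String[] args) throws Exception {
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread t, Throwable e) {
        uncaught.add(e);
      }
    });

    // "Test 1" spawns a thread that dies a few ms after the test returns.
    new Thread() {
      @Override public void run() {
        try { Thread.sleep(50); } catch (InterruptedException ignored) {}
        throw new RuntimeException("late failure from test 1's thread");
      }
    }.start();
    tearDown("test 1"); // passes: the spawned thread hasn't failed yet

    Thread.sleep(100);  // "test 2" runs; the late failure arrives meanwhile
    tearDown("test 2"); // fails, blamed for test 1's thread
  }

  static void tearDown(String test) {
    if (uncaught.isEmpty()) {
      System.out.println(test + ": ok");
    } else {
      System.out.println(test + ": FAILED, blamed for " + uncaught);
      uncaught.clear();
    }
  }
}
{code}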
[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2694:
----------------------------------

    Attachment: LUCENE-2694-FTE.patch

Here is just the patch for a correctly behaving FilteredTermsEnum (which, according to the docs, currently does not support seeking). The assert is also not needed, as tenum is guaranteed to be non-null (it's final and the ctor already asserts this) :-)

> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
> Key: LUCENE-2694
> URL: https://issues.apache.org/jira/browse/LUCENE-2694
> Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch
[jira] Updated: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
[ https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2819:
---------------------------------------

    Attachment: LUCENE-2819.patch

Attaching current patch; includes lots of noise and does not work yet!! (I still see collateral damage.)

> LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
> Key: LUCENE-2819
> URL: https://issues.apache.org/jira/browse/LUCENE-2819
Lucene-Solr-tests-only-trunk - Build # 2691 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2691/

4 tests failed.

FAILED: org.apache.solr.util.SolrPluginUtilsTest.testAddToNamedListPrimitiveTypes

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
        at java.lang.Thread.run(Thread.java:636)

FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
Could not get the port for ZooKeeper server

Stack Trace:
java.lang.RuntimeException: Could not get the port for ZooKeeper server
        at org.apache.solr.cloud.ZkTestServer.run(ZkTestServer.java:216)
        at org.apache.solr.cloud.AbstractZkTestCase.azt_beforeClass(AbstractZkTestCase.java:56)

FAILED: junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
null

Stack Trace:
java.lang.NullPointerException
        at org.apache.solr.cloud.ZkTestServer$ZKServerMain.shutdown(ZkTestServer.java:111)
        at org.apache.solr.cloud.ZkTestServer.shutdown(ZkTestServer.java:227)
        at org.apache.solr.cloud.AbstractZkTestCase.azt_afterClass(AbstractZkTestCase.java:112)

FAILED: TEST-org.apache.solr.core.AlternateDirectoryTest.xml.init

Error Message:

Stack Trace:
Test report file /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/test-results/TEST-org.apache.solr.core.AlternateDirectoryTest.xml was length 0

Build Log (for compile errors):
[...truncated 8658 lines...]
[jira] Updated: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
[ https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2819:
--------------------------------

    Attachment: LUCENE-2819.patch

I worked on Mike's patch a bit... here's an updated version.

I think LuceneTestCase is ok, but there are tests that need fixing. For example, TestParallelMultiSearcher doesn't close() its searcher, so its executor never gets shut down. Because of this, the test now fails.

> LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
> Key: LUCENE-2819
> URL: https://issues.apache.org/jira/browse/LUCENE-2819
[jira] Commented: (LUCENE-2818) abort() method for IndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972816#action_12972816 ]

Michael McCandless commented on LUCENE-2818:
--------------------------------------------

I think a bw compat exception is fine too!

> abort() method for IndexOutput
> ------------------------------
>
> Key: LUCENE-2818
> URL: https://issues.apache.org/jira/browse/LUCENE-2818
[jira] Created: (SOLR-2290) the termsInfosDivisor for readers opened by indexWriter should be configurable in Solr
the termsInfosDivisor for readers opened by indexWriter should be configurable in Solr
---------------------------------------------------------------------------------------

Key: SOLR-2290
URL: https://issues.apache.org/jira/browse/SOLR-2290
Project: Solr
Issue Type: New Feature
Reporter: Tom Burton-West
Priority: Minor

Solr allows users to set the termInfosIndexDivisor used by the indexReader at search time in solrconfig.xml, but not in the indexReader opened by the IndexWriter when indexing/merging. When dealing with an index with a large number of unique terms, setting the termInfosIndexDivisor at search time is helpful in reducing memory use. It would also be helpful in reducing memory use during indexing/merging if it were made configurable for indexReaders opened by the indexWriter during indexing/merging.

This thread contains some background: http://www.lucidimagination.com/search/document/b5c756a366e1a0d6/memory_use_during_merges_oom

In the Lucene 3.x branch it looks like this is done via IndexWriterConfig.setReaderTermsIndexDivisor, although there is also this method signature in IndexWriter.java: IndexReader getReader(int termInfosIndexDivisor)
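[Editor's note: for reference, a minimal Lucene 3.1-style sketch of the IndexWriterConfig knob the issue mentions. The index path and the divisor value of 4 are made up for illustration; whether this setting covers readers used for merging is exactly the open question in this issue.]

{code}
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ReaderDivisorExample {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // illustrative path
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31,
        new StandardAnalyzer(Version.LUCENE_31));
    // Load only every 4th terms-index entry into RAM for readers the writer
    // opens internally, trading slower term lookups for roughly 4x less
    // terms-index memory.
    conf.setReaderTermsIndexDivisor(4);
    IndexWriter writer = new IndexWriter(dir, conf);
    try {
      // ... index documents ...
    } finally {
      writer.close();
    }
  }
}
{code}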
Re: Lucene-3.x - Build # 214 - Failure
I committed a fix for this.

I think there was actually only one failure, which cascaded due to still-running threads spilling over to other test methods (LUCENE-2819).

The one failure was caused by LUCENE-2811 (SI tracks hasVectors) in addIndexes(Directory[]); we were failing to copy over the vector files in the case where the first segment to share a doc store did not have vectors, but a later segment sharing the same doc stores did...

Mike

On Fri, Dec 17, 2010 at 6:22 PM, Apache Hudson Server <hud...@hudson.apache.org> wrote:
> Build: https://hudson.apache.org/hudson/job/Lucene-3.x/214/
>
> 4 tests failed.
>
> REGRESSION: org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull
>
> Error Message:
> addIndexes(Directory[]) + optimize() hit IOException after disk space was freed up
>
> Stack Trace:
> junit.framework.AssertionFailedError: addIndexes(Directory[]) + optimize() hit IOException after disk space was freed up
>        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
>        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
>        at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull(TestIndexWriterOnDiskFull.java:323)
>
> REGRESSION: org.apache.lucene.index.TestIndexWriterOnDiskFull.testCorruptionAfterDiskFullDuringMerge
>
> Error Message:
> Some threads threw uncaught exceptions!
>
> Stack Trace:
> junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
>        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
>        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
>        at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:354)
>
> REGRESSION: org.apache.lucene.index.TestIndexWriterOnDiskFull.testImmediateDiskFull
>
> Error Message:
> ConcurrentMergeScheduler hit unhandled exceptions
>
> Stack Trace:
> junit.framework.AssertionFailedError: ConcurrentMergeScheduler hit unhandled exceptions
>        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
>        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
>        at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:375)
>
> REGRESSION: org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads
>
> Error Message:
> Some threads threw uncaught exceptions!
>
> Stack Trace:
> junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
>        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
>        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
>        at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:354)
>
> Build Log (for compile errors):
> [...truncated 6950 lines...]
[jira] Updated: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
[ https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2819:
--------------------------------

    Attachment: LUCENE-2819.patch

Here's an updated patch; I think it's much better. The core tests are passing, but I still need to do contrib/solr.

Some problems I found: we're having to 'actually close' the executor services, because ParallelMultiShredder doesn't wait for the shutdown to actually happen in its close(). Also, the TimeLimitingCollector creates a new thread... statically! This just seems really evil. I don't think tests should be creating threads and not cleaning up after themselves!

You might also ask why we even bother killing the threads if we will fail anyway? True, we will already fail the test in this case, but this is just to try to prevent the fails from being attributed to other test cases (the original problem here).

> LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
> Key: LUCENE-2819
> URL: https://issues.apache.org/jira/browse/LUCENE-2819
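[Editor's note: a small illustration of the shutdown pitfall Robert hits, in generic JDK code rather than the patch itself: ExecutorService.shutdown() returns immediately, so a close() built on it alone can leave pool threads alive past tearDown; awaitTermination is the actual wait.]

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CloseExecutorProperly {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService exec = Executors.newFixedThreadPool(2);
    exec.submit(new Runnable() {
      public void run() {
        try { Thread.sleep(200); } catch (InterruptedException ignored) {}
      }
    });

    exec.shutdown(); // stops accepting new tasks, but does NOT wait
    // Without this wait, the pool thread may still be running when we return:
    if (!exec.awaitTermination(5, TimeUnit.SECONDS)) {
      exec.shutdownNow(); // last resort: interrupt stragglers
    }
  }
}
{code}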
[jira] Updated: (LUCENE-2723) Speed up Lucene's low level bulk postings read API
[ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-2723:
---------------------------------

    Attachment: LUCENE-2723_openEnum.patch

Here's a small patch that may be sufficient to enable dropping down to per-segment work while still using MultiTerms/MultiTermsEnum to traverse terms in order. It basically makes the TermsEnumWithSlice members public, and adds a bulkPostings member for reuse. Is this the right approach?

> Speed up Lucene's low level bulk postings read API
> ---------------------------------------------------
>
> Key: LUCENE-2723
> URL: https://issues.apache.org/jira/browse/LUCENE-2723
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
> Attachments: LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723_openEnum.patch, LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch
>
> Spinoff from LUCENE-1410. The flex DocsEnum has a simple bulk-read API that reads the next chunk of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR (from LUCENE-1410). This is not unlike sucking coffee through those tiny plastic coffee stirrers they hand out on airplanes that, surprisingly, also happen to function as a straw. As a result we see no perf gain from using FOR/PFOR.
>
> I had hacked up a fix for this, described in my blog post at http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
>
> I'm opening this issue to get that work to a committable point. So... I've worked out a new bulk-read API to address the performance bottleneck. It has some big changes over the current bulk-read API:
>
> * You can now also bulk-read positions (but not payloads), but I have yet to cut over positional queries.
> * The buffer contains doc deltas, not absolute values, for docIDs and positions (freqs are absolute).
> * Deleted docs are not filtered out.
> * The doc freq buffers need not be aligned. For fixed intblock codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16, Group varint, etc.) they won't be.
>
> It's still a work in progress...
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972842#action_12972842 ]

Michael McCandless commented on LUCENE-2814:
--------------------------------------------

OK, I committed to trunk. I'll let this bake for a while on trunk before backporting to 3.x... Thanks Earwin!

> stop writing shared doc stores across segments
> ----------------------------------------------
>
> Key: LUCENE-2814
> URL: https://issues.apache.org/jira/browse/LUCENE-2814
[jira] Commented: (SOLR-2290) the termsInfosDivisor for readers opened by indexWriter should be configurable in Solr
[ https://issues.apache.org/jira/browse/SOLR-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972847#action_12972847 ]

Jason Rutherglen commented on SOLR-2290:
----------------------------------------

Tom, I think this can be generified to use SOLR-1447's property injection into IWC.

> the termsInfosDivisor for readers opened by indexWriter should be configurable in Solr
> Key: SOLR-2290
> URL: https://issues.apache.org/jira/browse/SOLR-2290
[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments
[ https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972850#action_12972850 ]

Jason Rutherglen commented on LUCENE-2814:
------------------------------------------

bq. backporting to 3.x...

Out of curiosity, why are we backporting to 3.x? Or are we planning on also backporting the DWPT branch?

> stop writing shared doc stores across segments
> ----------------------------------------------
>
> Key: LUCENE-2814
> URL: https://issues.apache.org/jira/browse/LUCENE-2814
[jira] Commented: (LUCENE-2500) A Linux-specific Directory impl that bypasses the buffer cache
[ https://issues.apache.org/jira/browse/LUCENE-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972855#action_12972855 ]

Jason Rutherglen commented on LUCENE-2500:
------------------------------------------

DirectIOLinuxDirectory is in trunk and works? Are we using it with segment merging yet? Perhaps a separate Jira issue?

> A Linux-specific Directory impl that bypasses the buffer cache
> ---------------------------------------------------------------
>
> Key: LUCENE-2500
> URL: https://issues.apache.org/jira/browse/LUCENE-2500
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 3.1, 4.0
> Attachments: LUCENE-2500.patch
>
> I've been testing how we could prevent Lucene's merges from evicting pages from the OS's buffer cache. I tried fadvise/madvise (via JNI) but (frustratingly) I could not get them to work (details at http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html). The only thing that worked was to use Linux's O_DIRECT flag, which forces all IO to bypass the buffer cache entirely... so I created a Linux-specific Directory impl to do this.
Lucene-Solr-tests-only-trunk - Build # 2706 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2706/

1 tests failed.

REGRESSION: org.apache.lucene.search.TestRemoteCachingWrapperFilter.testTermRemoteFilter

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1094)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1032)
        at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:486)

Build Log (for compile errors):
[...truncated 5368 lines...]
Lucene-Solr-tests-only-trunk - Build # 2707 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2707/

1 tests failed.

FAILED: org.apache.lucene.search.TestRemoteCachingWrapperFilter.testTermRemoteFilter

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1094)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1032)
        at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:486)

Build Log (for compile errors):
[...truncated 5354 lines...]
Lucene-Solr-tests-only-trunk - Build # 2708 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2708/ 1 tests failed. FAILED: org.apache.lucene.search.TestRemoteCachingWrapperFilter.testTermRemoteFilter Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1094) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1032) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:486) Build Log (for compile errors): [...truncated 5350 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2820) CMS fails to cleanly stop threads
CMS fails to cleanly stop threads - Key: LUCENE-2820 URL: https://issues.apache.org/jira/browse/LUCENE-2820 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1, 4.0 When you close IW, it waits for (or aborts and then waits for) all running merges. However, its wait criterion is wrong -- it waits for the threads to be done w/ their merges, not for the threads to actually die. CMS already has a sync() method, to wait for running threads, which we can call from CMS.close. However, it has a thread hazard because a MergeThread removes itself from mergeThreads before it actually exits. So sync() is able to return even while a merge thread is still running. This was uncovered by LUCENE-2819 on the test case TestCustomScoreQuery.testCustomExternalQuery, though I expect other test cases would show it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
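Schematically, the hazard described above looks like this (names simplified; this is not the actual ConcurrentMergeScheduler source):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    class MergeThreadHazard {
      // guarded by the scheduler's monitor in the real code
      static final List<Thread> mergeThreads =
          Collections.synchronizedList(new ArrayList<Thread>());

      static class MergeThread extends Thread {
        public void run() {
          try {
            // ... do the merge ...
          } finally {
            // removes itself BEFORE run() has returned, so sync() can see
            // an empty list while this thread is still alive
            mergeThreads.remove(this);
          }
        }
      }

      // returns as soon as the list is empty -- which can happen while a
      // merge thread is still executing its final statements
      static void sync() throws InterruptedException {
        while (!mergeThreads.isEmpty()) {
          Thread.sleep(10);
        }
      }
    }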
[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API
[ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972869#action_12972869 ] Michael McCandless commented on LUCENE-2723: Looks good, Yonik! Speed up Lucene's low level bulk postings read API -- Key: LUCENE-2723 URL: https://issues.apache.org/jira/browse/LUCENE-2723 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723_openEnum.patch, LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch Spinoff from LUCENE-1410. The flex DocsEnum has a simple bulk-read API that reads the next chunk of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR (from LUCENE-1410). This is not unlike sucking coffee through those tiny plastic coffee stirrers they hand out on airplanes that, surprisingly, also happen to function as straws. As a result we see no perf gain from using FOR/PFOR. I had hacked up a fix for this, described in my blog post at http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html I'm opening this issue to get that work to a committable point. So... I've worked out a new bulk-read API to address this performance bottleneck. It has some big changes over the current bulk-read API: * You can now also bulk-read positions (but not payloads), but I have yet to cut over positional queries. * The buffer contains doc deltas, not absolute values, for docIDs and positions (freqs are absolute). * Deleted docs are not filtered out. * The doc/freq buffers need not be aligned. For fixed intblock codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16, Group varint, etc.) they won't be. It's still a work in progress... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
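As a rough illustration of what those changes mean for a consumer of the buffers (the array names and the collect() hook are assumptions made for this sketch, not the patch's actual API):

    import java.util.BitSet;

    class BulkConsumerSketch {
      static void consume(int[] docDeltas, int[] freqs, int count, BitSet deletedDocs) {
        int doc = 0;
        for (int i = 0; i < count; i++) {
          doc += docDeltas[i];   // docIDs arrive as deltas; accumulate to absolute
          int freq = freqs[i];   // freqs are absolute
          if (deletedDocs != null && deletedDocs.get(doc)) {
            continue;            // deletions are no longer pre-filtered by the API
          }
          collect(doc, freq);
        }
      }

      static void collect(int doc, int freq) { /* score/collect here */ }
    }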
[jira] Created: (LUCENE-2822) TimeLimitingCollector starts a thread in static {} with no way to stop it
TimeLimitingCollector starts a thread in static {} with no way to stop it - Key: LUCENE-2822 URL: https://issues.apache.org/jira/browse/LUCENE-2822 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir See the comment in LuceneTestCase. If you even do Class.forName(TimeLimitingCollector), it starts up a thread from a static initializer, and there isn't a way to kill it. This is broken. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
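The anti-pattern, schematically (this is not the actual TimeLimitingCollector source, just an illustration of a thread started from a static initializer):

    class TimerHolder {
      static final Thread TIMER = new Thread(new Runnable() {
        public void run() {
          for (;;) { /* update a timestamp forever */ }
        }
      });
      // runs as a side effect of class loading -- even a bare
      // Class.forName() triggers it -- and no stop() method exists
      static {
        TIMER.setDaemon(true);
        TIMER.start();
      }
    }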
[jira] Created: (LUCENE-2821) FilterManager starts threads with no way to stop them, and should be in contrib/remote, not core
FilterManager starts threads with no way to stop them, and should be in contrib/remote, not core --- Key: LUCENE-2821 URL: https://issues.apache.org/jira/browse/LUCENE-2821 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir See the warning produced by contrib/remote's tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2823) contrib/demo's tests leave threads running
contrib/demo's tests leave threads running -- Key: LUCENE-2823 URL: https://issues.apache.org/jira/browse/LUCENE-2823 Project: Lucene - Java Issue Type: Bug Components: Examples Reporter: Robert Muir contrib/demo for some reason parses HTML in a strange way, with a PipedInputStream and a separate thread. I don't understand why it needs to do this or be this complicated (it's an example), and its tests leave rogue threads running. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
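Schematically, the pattern and its failure mode look like this (names are illustrative, not the demo's actual code):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;

    class PipedParseSketch {
      static InputStream parseInBackground() throws IOException {
        PipedInputStream in = new PipedInputStream();
        final PipedOutputStream out = new PipedOutputStream(in);
        new Thread(new Runnable() {
          public void run() {
            try {
              out.write("parsed text...".getBytes()); // stand-in for the HTML parse
              out.close();
            } catch (IOException ignored) {
              // if the reader abandons the pipe, write() blocks or throws,
              // and the producer thread can be left running
            }
          }
        }).start();
        return in; // caller must fully drain and close, or the thread leaks
      }
    }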
[jira] Updated: (LUCENE-2820) CMS fails to cleanly stop threads
[ https://issues.apache.org/jira/browse/LUCENE-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2820: --- Attachment: LUCENE-2820.patch Patch. I changed CMS.sync() to join() any still-alive threads, and changed MergeThread to not remove itself from mergeThreads, but rather have updateMergeThread prune any dead threads. CMS fails to cleanly stop threads - Key: LUCENE-2820 URL: https://issues.apache.org/jira/browse/LUCENE-2820 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2820.patch When you close IW, it waits for (or aborts and then waits for) all running merges. However, its wait criterion is wrong -- it waits for the threads to be done w/ their merges, not for the threads to actually die. CMS already has a sync() method, to wait for running threads, which we can call from CMS.close. However, it has a thread hazard because a MergeThread removes itself from mergeThreads before it actually exits. So sync() is able to return even while a merge thread is still running. This was uncovered by LUCENE-2819 on the test case TestCustomScoreQuery.testCustomExternalQuery, though I expect other test cases would show it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
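A minimal sketch of the join()-based sync, under the assumption of a mergeThreads list like the one in the issue description (not the committed patch):

    import java.util.ArrayList;
    import java.util.List;

    class SyncSketch {
      final List<Thread> mergeThreads = new ArrayList<Thread>();

      // join() each still-alive thread; unlike polling the list, join()
      // only returns once the thread has truly died
      synchronized void sync() {
        for (Thread t : new ArrayList<Thread>(mergeThreads)) { // snapshot to avoid CME
          try {
            t.join();
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return;
          }
        }
        mergeThreads.clear(); // prune dead threads here, not from within run()
      }
    }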
[jira] Created: (SOLR-2291) JSONWriter.writeSolrDocument() does not respect its Set<String> returnFields parameter.
JSONWriter.writeSolrDocument() does not respect its Set<String> returnFields parameter. --- Key: SOLR-2291 URL: https://issues.apache.org/jira/browse/SOLR-2291 Project: Solr Issue Type: Bug Components: Response Writers Affects Versions: 1.4.1 Reporter: Ahmet Arslan Priority: Minor When a SolrDocumentList is used instead of a DocList in the response, JSONWriter (unlike XMLWriter) prints all existing fields of a SolrDocument. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
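The fix presumably amounts to filtering on the set before writing each field; a sketch under that assumption (writeVal() is a schematic stand-in, though getFieldNames()/getFieldValue() are the real SolrDocument accessors):

    import java.io.IOException;
    import java.util.Set;
    import org.apache.solr.common.SolrDocument;

    class ReturnFieldsSketch {
      void writeDocFields(SolrDocument doc, Set<String> returnFields) throws IOException {
        for (String fname : doc.getFieldNames()) {
          // a null set means "return everything", matching XMLWriter's behavior
          if (returnFields != null && !returnFields.contains(fname)) continue;
          writeVal(fname, doc.getFieldValue(fname));
        }
      }

      // stand-in for JSONWriter's actual value-writing machinery
      void writeVal(String name, Object val) throws IOException { /* ... */ }
    }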
[jira] Updated: (SOLR-2291) JSONWriter.writeSolrDocument() does not respect its Set<String> returnFields parameter.
[ https://issues.apache.org/jira/browse/SOLR-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated SOLR-2291: --- Attachment: SOLR-2291.patch JSONWriter.writeSolrDocument() does not respect its Set<String> returnFields parameter. --- Key: SOLR-2291 URL: https://issues.apache.org/jira/browse/SOLR-2291 Project: Solr Issue Type: Bug Components: Response Writers Affects Versions: 1.4.1 Reporter: Ahmet Arslan Priority: Minor Attachments: SOLR-2291.patch When a SolrDocumentList is used instead of a DocList in the response, JSONWriter (unlike XMLWriter) prints all existing fields of a SolrDocument. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
SolrPluginUtils.docListToSolrDocumentList loads all stored fields
Hello, Regardless of the Set<String> fields parameter, the SolrPluginUtils#docListToSolrDocumentList method loads all of the stored fields. Shouldn't it just load the fields given in the set? Should I file a JIRA ticket? When a small bug in a TestCase is seen, what is the preferred way to report it? Open an issue, or mention it here? Example: in the SolrPluginUtilsTest.testDocListConversion method, the for loop is not executed because list.size() == 0. The commit should be inside the assertU(), and cmd.setLen() should be called. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
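One way the suggested behavior could look, as a sketch (simplified; the real helper also copies scores and other bookkeeping). MapFieldSelector is the 3.x-era class for loading only selected stored fields:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Set;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.FieldSelector;
    import org.apache.lucene.document.MapFieldSelector;
    import org.apache.lucene.index.IndexReader;

    class SelectiveLoadSketch {
      Document loadSelected(IndexReader reader, int docid, Set<String> fields) throws IOException {
        if (fields == null || fields.isEmpty()) {
          return reader.document(docid); // current behavior: load every stored field
        }
        FieldSelector sel = new MapFieldSelector(new ArrayList<String>(fields));
        return reader.document(docid, sel); // load only the requested fields
      }
    }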
[jira] Commented: (LUCENE-2816) MMapDirectory speedups
[ https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972880#action_12972880 ] Robert Muir commented on LUCENE-2816: - Committed revision 1050737. I'll wait a bit for branch_3x. MMapDirectory speedups -- Key: LUCENE-2816 URL: https://issues.apache.org/jira/browse/LUCENE-2816 Project: Lucene - Java Issue Type: Improvement Components: Store Affects Versions: 3.1, 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-2816.patch MMapDirectory has some performance problems: (1) When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, which does a lot of unnecessary bounds checks for its buffer switching etc. Instead, like MMapIndexInput, it should rely upon the contract of these operations in ByteBuffer (which always does a bounds check and throws BufferUnderflowException). Our 'buffer' is so large (Integer.MAX_VALUE) that it's rare this happens, and doing our own bounds checks just slows things down. (2) readInt()/readLong()/readShort() are slow and should just defer to ByteBuffer.getInt(), etc. This isn't very important since we don't use these much, but I think there's no reason users (e.g. codec writers) should have to readBytes() + wrap as a ByteBuffer + get an IntBuffer view when readInt() can be almost as fast... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
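The second point translates roughly into the following (a sketch, not the committed patch; curBuf stands for the current mapped ByteBuffer and readByte() for the existing byte-level read):

    import java.io.IOException;
    import java.nio.BufferUnderflowException;
    import java.nio.ByteBuffer;

    class ReadIntSketch {
      ByteBuffer curBuf; // the current mapped buffer

      // Defer to ByteBuffer.getInt() and let its built-in bounds check throw;
      // only the rare buffer-boundary crossing pays for the slow path.
      public int readInt() throws IOException {
        try {
          return curBuf.getInt();
        } catch (BufferUnderflowException e) {
          // the int straddles two mapped buffers: assemble it byte by byte
          return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
               | ((readByte() & 0xFF) << 8)  |  (readByte() & 0xFF);
        }
      }

      // stand-in for MultiMMapIndexInput's byte-level read, which in the
      // real code also handles switching to the next mapped buffer
      byte readByte() throws IOException { return curBuf.get(); }
    }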
[jira] Resolved: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?
[ https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2819. - Resolution: Fixed Committed and merged to 3.x. In 3.x I kept the test code in CMS (even though unused), as I don't trust the 3.0 backwards LuceneTestCase enough to handle the uncaught exceptions... I marked it @deprecated for us to remove in 3.2; I think that's easiest. We should try to resolve some of the rogue thread issues so we can make this stuff actually fail instead of warn. LuceneTestCase's check for uncaught exceptions in threads causes collateral damage? --- Key: LUCENE-2819 URL: https://issues.apache.org/jira/browse/LUCENE-2819 Project: Lucene - Java Issue Type: Bug Components: Tests Reporter: Michael McCandless Fix For: 3.1, 4.0 Attachments: LUCENE-2819.patch, LUCENE-2819.patch, LUCENE-2819.patch, LUCENE-2819.patch E.g. see these failures: https://hudson.apache.org/hudson/job/Lucene-3.x/214/ Multiple test methods failed in TestIndexWriterOnDiskFull, but I think only 1 test had a real failure; somehow our thread-hit-exception tracking incorrectly blames the other 3 cases? I'm not sure about this, but it seems like something like that is going on... So, one problem is that LuceneTestCase.tearDown fails on any thread excs, but if CMS had also hit a failure, it then fails to clear CMS's thread failures. I think we should just remove CMS's thread failure tracking? (It's static so it can definitely bleed across tests.) I.e., just rely on LuceneTestCase's tracking. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org