Solr-trunk - Build # 1346 - Failure

2010-12-18 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Solr-trunk/1346/

All tests passed

Build Log (for compile errors):
[...truncated 20163 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2010-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972758#action_12972758
 ] 

Michael McCandless commented on LUCENE-2694:


If I force scoring BQ rewrite for wildcard & prefix queries (ie set that 
rewrite mode and then relax BQ max clause count) I see healthy speedups 
(~23-27%) for these queries!  Great :)

While this doesn't happen w/ our default settings (ie these queries quickly 
cut over to constant filter rewrite), apps that change these defaults will see a 
gain; plus, the term cache (which today protects you) is terribly fragile, 
since apps w/ many MTQ queries in flight can thrash that cache, killing 
performance.  This patch prevents that entirely since MTQs do their own caching 
of the TermStates they need: awesome.
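
For context, here is a minimal sketch (not from the patch; the field and term 
are made up) of forcing that rewrite mode against the 4.0-era API:

{code}
// hypothetical setup: force BooleanQuery-based scoring rewrite for an MTQ
PrefixQuery q = new PrefixQuery(new Term("body", "luc"));
q.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
// relax the clause limit so large term sets don't throw TooManyClauses
BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE);
{code}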

 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Do we want 'nocommit' to fail the commit?

2010-12-18 Thread Michael McCandless
+1 this would be great :)

Mike

On Fri, Dec 17, 2010 at 10:45 PM, Shai Erera ser...@gmail.com wrote:
 Hi
 Out of curiosity, I searched if we can have a nocommit comment in the code
 fail the commit. As far as I see, we try to avoid accidental commits (of say
 debug messages) by putting a nocommit comment, but I don't know if svn ci
 would fail in the presence of such comment - I guess not because we've seen
 some accidental nocommits checked in already in the past.
 So I Googled around and found that if we have control of the svn repo, we
 can add a pre-commit hook that will check and fail the commit. Here is a
 nice article that explains how to add pre-commit hooks in general
 (http://wordaligned.org/articles/a-subversion-pre-commit-hook). I didn't try
 it yet (on our local svn instance), so I cannot say how well it works, but
 perhaps someone has experience with it ...
 So if this is interesting, and is doable for Lucene (say, open a JIRA issue
 for Infra?) I don't mind investigating it further and writing the script
 (which can be as simple as 'grep the changed files and fail on the presence
 of nocommit string').
 Shai

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2818) abort() method for IndexOutput

2010-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972763#action_12972763
 ] 

Michael McCandless commented on LUCENE-2818:


+1 I think this'd be a good simplification of IW/IR code.  I don't mind that IO 
would know how to delete the partial file it had created; that seems fair.

So eg CompoundFileWriter would abort its output file on hitting any exception.

I think we can make a default impl that simply closes & suppresses exceptions?  
(We can't .deleteFile since an abstract IO doesn't know its Dir).  Our concrete 
impls can override w/ versions that do delete the file...
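
For concreteness, a rough sketch of such a default impl (assumed code, nothing 
committed yet):

{code}
// hypothetical default in the abstract IndexOutput: best-effort close,
// with exceptions suppressed so abort() never masks the original failure
public void abort() {
  try {
    close();
  } catch (IOException ignored) {
    // suppressed: abort() is error recovery and must not throw
  }
}
{code}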

 abort() method for IndexOutput
 --

 Key: LUCENE-2818
 URL: https://issues.apache.org/jira/browse/LUCENE-2818
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot

 I'd like to see an abort() method on IndexOutput that silently (no exceptions) 
 closes the IO and then does a silent papaDir.deleteFile(this.fileName()).
 This will simplify a bunch of error recovery code for IndexWriter and 
 friends, but constitutes an API backcompat break.
 What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Do we want 'nocommit' to fail the commit?

2010-12-18 Thread Earwin Burrfoot
But. Er. What if we happen to have nocommit in a string, or in some
docs, or as the name of a variable?

On Sat, Dec 18, 2010 at 12:47, Michael McCandless
luc...@mikemccandless.com wrote:
 +1 this would be great :)

 Mike

 On Fri, Dec 17, 2010 at 10:45 PM, Shai Erera ser...@gmail.com wrote:
 Hi
 Out of curiosity, I searched if we can have a nocommit comment in the code
 fail the commit. As far as I see, we try to avoid accidental commits (of say
 debug messages) by putting a nocommit comment, but I don't know if svn ci
 would fail in the presence of such comment - I guess not because we've seen
 some accidental nocommits checked in already in the past.
 So I Googled around and found that if we have control of the svn repo, we
 can add a pre-commit hook that will check and fail the commit. Here is a
 nice article that explains how to add pre-commit hooks in general
 (http://wordaligned.org/articles/a-subversion-pre-commit-hook). I didn't try
 it yet (on our local svn instance), so I cannot say how well it works, but
 perhaps someone has experience with it ...
 So if this is interesting, and is doable for Lucene (say, open a JIRA issue
 for Infra?) I don't mind investigating it further and writing the script
 (which can be as simple as 'grep the changed files and fail on the presence
 of nocommit string').
 Shai

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Do we want 'nocommit' to fail the commit?

2010-12-18 Thread Uwe Schindler
I like this idea, too. But I think we have no control over this; it would be as 
complicated as the mergeprops...

What we have: Hudson's half-hourly builds fail when svn contains nocommits, so 
you see it 30 minutes later at the latest.

Uwe



Shai Erera ser...@gmail.com wrote:

Hi

Out of curiosity, I searched if we can have a nocommit comment in the
code
fail the commit. As far as I see, we try to avoid accidental commits
(of say
debug messages) by putting a nocommit comment, but I don't know if svn
ci
would fail in the presence of such comment - I guess not because we've
seen
some accidental nocommits checked in already in the past.

So I Googled around and found that if we have control of the svn repo,
we
can add a pre-commit hook that will check and fail the commit. Here is
a
nice article that explains how to add pre-commit hooks in general (
http://wordaligned.org/articles/a-subversion-pre-commit-hook). I didn't
try
it yet (on our local svn instance), so I cannot say how well it works,
but
perhaps someone has experience with it ...

So if this is interesting, and is doable for Lucene (say, open a JIRA
issue
for Infra?) I don't mind investigating it further and writing the script
(which can be as simple as 'grep the changed files and fail on the
presence
of nocommit string').

Shai

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2818) abort() method for IndexOutput

2010-12-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972764#action_12972764
 ] 

Earwin Burrfoot commented on LUCENE-2818:
-

bq. Can abort() have a default impl in IndexOutput, such as close() followed by 
deleteFile() maybe? If so, then it won't break anything.
It can't. To call deleteFile you need both a reference to the papa-Directory and 
the name of the file this IO writes to. The abstract IO class has neither. If we 
add them, they have to be passed to a new constructor, and that's an API break ;)

bq. Would abort() on Directory fit better? E.g., it can abort all currently 
open and modified files, instead of the caller calling abort() on each 
IndexOutput? Are you thinking of a case where a write failed, and the caller 
would call abort() immediately, instead of some higher-level code? If so, would 
rollback() be a better name?
Oh, no, no. No way. I don't want to push someone else's responsibility onto 
Directory. This abort() is merely a shortcut.

Let's go with a usage example:
Here's FieldsWriter.java with LUCENE-2814 applied (skipping irrelevant parts) - 
https://gist.github.com/746358
Now, the same, with abort() - https://gist.github.com/746367
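
For readers who don't want to follow the gists, a condensed sketch of the shape 
of the change (names are illustrative, not the actual FieldsWriter code). 
Without abort(), error recovery must keep the Directory and file name around:

{code}
IndexOutput out = dir.createOutput(fileName);
boolean success = false;
try {
  // ... write ...
  success = true;
} finally {
  if (!success) {
    // manual cleanup: close quietly, then delete the partial file quietly
    try { out.close(); } catch (IOException ignored) {}
    try { dir.deleteFile(fileName); } catch (IOException ignored) {}
  }
}
{code}

With the proposed method, the failure path collapses to a single call:

{code}
} finally {
  if (!success) {
    out.abort(); // silently closes and deletes the partial file
  }
}
{code}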

 abort() method for IndexOutput
 --

 Key: LUCENE-2818
 URL: https://issues.apache.org/jira/browse/LUCENE-2818
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot

 I'd like to see an abort() method on IndexOutput that silently (no exceptions) 
 closes the IO and then does a silent papaDir.deleteFile(this.fileName()).
 This will simplify a bunch of error recovery code for IndexWriter and 
 friends, but constitutes an API backcompat break.
 What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2818) abort() method for IndexOutput

2010-12-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972765#action_12972765
 ] 

Earwin Burrfoot commented on LUCENE-2818:
-

bq. I think we can make a default impl that simply closes & suppresses 
exceptions? (We can't .deleteFile since an abstract IO doesn't know its Dir). 
Our concrete impls can override w/ versions that do delete the file...
I don't think we need a default impl? For some directory impls close() is a 
noop; plus, more importantly, an abstract method forces you to implement it, 
you can't forget it, so we're not gonna see broken directories that don't do 
abort() properly.

 abort() method for IndexOutput
 --

 Key: LUCENE-2818
 URL: https://issues.apache.org/jira/browse/LUCENE-2818
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot

 I'd like to see an abort() method on IndexOutput that silently (no exceptions) 
 closes the IO and then does a silent papaDir.deleteFile(this.fileName()).
 This will simplify a bunch of error recovery code for IndexWriter and 
 friends, but constitutes an API backcompat break.
 What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2818) abort() method for IndexOutput

2010-12-18 Thread Earwin Burrfoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earwin Burrfoot updated LUCENE-2818:


Priority: Minor  (was: Major)

This change is really minor but, I think, convenient.

You don't have to lug a reference to the Directory along, and recalculate the 
file name, if the only thing you want to say is that the write was a failure and 
you no longer need this file.

 abort() method for IndexOutput
 --

 Key: LUCENE-2818
 URL: https://issues.apache.org/jira/browse/LUCENE-2818
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Priority: Minor

 I'd like to see an abort() method on IndexOutput that silently (no exceptions) 
 closes the IO and then does a silent papaDir.deleteFile(this.fileName()).
 This will simplify a bunch of error recovery code for IndexWriter and 
 friends, but constitutes an API backcompat break.
 What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972767#action_12972767
 ] 

Michael McCandless commented on LUCENE-2814:


Patch looks great!  Nice work Earwin.  I think it's ready to commit.

Except, can you resync to trunk?  I hit failures applying one hunk to
DW.java.

Also, on the nocommit on exc in DW.addDocument, yes I think that
(IFD.deleteNewFiles, not checkpoint) is still needed because DW can
orphan the store files on abort?

Or: we could fix DW.abort to directly call Dir.deleteFile (instead of
relying on IFD.deleteNewFiles).  Ie, w/ no shared doc stores, these
files should never have been registered w/ IFD so they can be
privately managed by DW.

But, if we end up leaving the delete up above, we should put the
docWriter null check back so silly apps that close IW while still
indexing don't get NPEs.

I'm not looking forward to the 3.x back port!!


 stop writing shared doc stores across segments
 --

 Key: LUCENE-2814
 URL: https://issues.apache.org/jira/browse/LUCENE-2814
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, 
 LUCENE-2814.patch


 Shared doc stores enable the files for stored fields and term vectors to be 
 shared across multiple segments.  We've had this optimization since 2.1 I 
 think.
 It works best against a new index, where you open an IW, add lots of docs, 
 and then close it.  In that case all of the written segments will reference 
 slices of a single shared doc store segment.
 This was a good optimization because it means we never need to merge these 
 files.  But, when you open another IW on that index, it writes a new set of 
 doc stores, and then whenever merges take place across doc stores, they must 
 now be merged.
 However, since we switched to shared doc stores, there have been two 
 optimizations for merging the stores.  First, we now bulk-copy the bytes in 
 these files if the field name/number assignment is congruent.  Second, we 
 now force congruent field name/number mapping in IndexWriter.  This means 
 this optimization is much less potent than it used to be.
 Furthermore, the optimization adds *a lot* of hair to 
 IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over 
 time, and causes odd behavior like a merge possibly forcing a flush when it 
 starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent 
 flushing, we can no longer share doc stores.
 So, I think we should turn off the write-side of shared doc stores to pave 
 the path for DWPT to land on trunk and simplify IW/DW.  We still must support 
 reading them (until 5.0), but the read side is far less hairy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2010-12-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972769#action_12972769
 ] 

Uwe Schindler commented on LUCENE-2694:
---

I also have some things:
- We currently don't support seeking a FilteredTermsEnum; this is disallowed by 
UnsupportedOperationException (we may change this, but it's complicated; Robert 
and I are thinking about it, but for now it's disallowed, as it would break the 
enum logic). So the TermState seek method in FilteredTermsEnum should also 
throw UOE:
{code}
/** This enum does not support seeking!
 * @throws UnsupportedOperationException
 */
@Override
public SeekStatus seek(BytesRef term, boolean useCache) throws IOException {
  throw new UnsupportedOperationException(getClass().getName() + " does not support seeking");
}
{code}
- What is setNextReader in TermCollector for? I don't like that, but you seem 
to need it for the PerReaderTermState. The collector should really only work on 
the enum, not on any reader.

That's what I have seen on a first patch review; I will now apply the patch and 
look closer into it :-) But the first point is important: FilteredTermsEnum 
currently should not support seeking.

 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2010-12-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972769#action_12972769
 ] 

Uwe Schindler edited comment on LUCENE-2694 at 12/18/10 5:43 AM:
-

I also have some things:
- We currently don't support seeking a FilteredTermsEnum; this is disallowed by 
UnsupportedOperationException (we may change this, but it's complicated; Robert 
and I are thinking about it, but for now it's disallowed, as it would break the 
enum logic). So the TermState seek method in FilteredTermsEnum should also 
throw UOE:
{code}
/** This enum does not support seeking!
 * @throws UnsupportedOperationException
 */
@Override
public SeekStatus seek(BytesRef term, boolean useCache) throws IOException {
  throw new UnsupportedOperationException(getClass().getName() + " does not support seeking");
}
{code}
- Additionally, can the next() implementation in FilteredTermsEnum use 
TermState? It does lots of seeking on the underlying (filtered) TermsEnum. This 
is the reason why seeking on the FilteredTermsEnum is not allowed. Filtering is 
done here in the accept() methods.
- What is setNextReader in TermCollector for? I don't like that, but you seem 
to need it for the PerReaderTermState. The collector should really only work on 
the enum, not on any reader. At least the

That's what I have seen on a first patch review; I will now apply the patch and 
look closer into it :-) But the first point is important: FilteredTermsEnum 
currently should not support seeking.

  was (Author: thetaphi):
I also have some things:
- We currently don't support seeking a FilteredTermsEnum; this is disallowed by 
UnsupportedOperationException (we may change this, but it's complicated; Robert 
and I are thinking about it, but for now it's disallowed, as it would break the 
enum logic). So the TermState seek method in FilteredTermsEnum should also 
throw UOE:
{code}
/** This enum does not support seeking!
 * @throws UnsupportedOperationException
 */
@Override
public SeekStatus seek(BytesRef term, boolean useCache) throws IOException {
  throw new UnsupportedOperationException(getClass().getName() + " does not support seeking");
}
{code}
- What is setNextReader in TermCollector for? I don't like that, but you seem 
to need it for the PerReaderTermState. The collector should really only work on 
the enum, not on any reader.

That's what I have seen on a first patch review; I will now apply the patch and 
look closer into it :-) But the first point is important: FilteredTermsEnum 
currently should not support seeking.
  
 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Do we want 'nocommit' to fail the commit?

2010-12-18 Thread Shai Erera
I haven't seen nocommit in the code, neither as a String nor as a member. But we
can decide that we use @nocommit@ or something, which is less likely to be
contained in code :).

Uwe, I didn't understand your response - do you mean that if the code
contains a 'nocommit' in any of the .java files, Hudson will fail?

If we have no control over svn, we can create a special unit test that asserts
exactly that. If you run your tests before commit (as you should :)), it
will be detected. If not, Hudson will detect it (that is, unless it already
somehow detects it).
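
Something like the following could work - a rough sketch of such a test 
(JUnit 4, with an assumed source root; not an actual Lucene test, and note that 
a naive substring check would flag this very file, per Earwin's point):

{code}
import java.io.*;
import org.junit.Test;
import static org.junit.Assert.fail;

public class TestNoNocommit {

  @Test
  public void testSourcesAreClean() throws IOException {
    scan(new File("src/java")); // assumed source root
  }

  // recursively walk the tree and fail on the first marker found
  private void scan(File file) throws IOException {
    if (file.isDirectory()) {
      File[] children = file.listFiles();
      if (children == null) return;
      for (File child : children) {
        scan(child);
      }
    } else if (file.getName().endsWith(".java")) {
      BufferedReader reader = new BufferedReader(new FileReader(file));
      try {
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
          lineNo++;
          if (line.contains("nocommit")) {
            fail(file.getPath() + ":" + lineNo + " contains nocommit");
          }
        }
      } finally {
        reader.close();
      }
    }
  }
}
{code}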

Shai

On Sat, Dec 18, 2010 at 11:56 AM, Uwe Schindler u...@thetaphi.de wrote:

 I like this idea, too. But I think we have no control over this; it would be
 as complicated as the mergeprops...

 What we have: Hudson's half-hourly builds fail when svn contains nocommits, so
 you see it 30 minutes later at the latest.

 Uwe



 Shai Erera ser...@gmail.com wrote:

 Hi
 
 Out of curiosity, I searched if we can have a nocommit comment in the
 code
 fail the commit. As far as I see, we try to avoid accidental commits
 (of say
 debug messages) by putting a nocommit comment, but I don't know if svn
 ci
 would fail in the presence of such comment - I guess not because we've
 seen
 some accidental nocommits checked in already in the past.
 
 So I Googled around and found that if we have control of the svn repo,
 we
 can add a pre-commit hook that will check and fail the commit. Here is
 a
 nice article that explains how to add pre-commit hooks in general (
 http://wordaligned.org/articles/a-subversion-pre-commit-hook). I didn't
 try
 it yet (on our local svn instance), so I cannot say how well it works,
 but
 perhaps someone has experience with it ...
 
 So if this is interesting, and is doable for Lucene (say, open a JIRA
 issue
 for Infra?) I don't mind investigating it further and writing the script
 (which can be as simple as 'grep the changed files and fail on the
 presence
 of nocommit string').
 
 Shai

 --
 Uwe Schindler
 H.-H.-Meier-Allee 63, 28213 Bremen
 http://www.thetaphi.de

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] Updated: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-18 Thread Earwin Burrfoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earwin Burrfoot updated LUCENE-2814:


Attachment: LUCENE-2814.patch

Synced to trunk.

bq. Also, on the nocommit on exc in DW.addDocument, yes I think that 
(IFD.deleteNewFiles, not checkpoint) is still needed because DW can orphan the 
store files on abort?
Orphaned files are deleted directly in StoredFieldsWriter.abort() and 
TermVectorsTermsWriter.abort(). As I said, all the open-file tracking is now 
gone.
Turns out checkpoint() is also no longer needed.

I have no other lingering cleanup urges, this is ready to be committed. I think.

 stop writing shared doc stores across segments
 --

 Key: LUCENE-2814
 URL: https://issues.apache.org/jira/browse/LUCENE-2814
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, 
 LUCENE-2814.patch, LUCENE-2814.patch


 Shared doc stores enable the files for stored fields and term vectors to be 
 shared across multiple segments.  We've had this optimization since 2.1 I 
 think.
 It works best against a new index, where you open an IW, add lots of docs, 
 and then close it.  In that case all of the written segments will reference 
 slices of a single shared doc store segment.
 This was a good optimization because it means we never need to merge these 
 files.  But, when you open another IW on that index, it writes a new set of 
 doc stores, and then whenever merges take place across doc stores, they must 
 now be merged.
 However, since we switched to shared doc stores, there have been two 
 optimizations for merging the stores.  First, we now bulk-copy the bytes in 
 these files if the field name/number assignment is congruent.  Second, we 
 now force congruent field name/number mapping in IndexWriter.  This means 
 this optimization is much less potent than it used to be.
 Furthermore, the optimization adds *a lot* of hair to 
 IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over 
 time, and causes odd behavior like a merge possibly forcing a flush when it 
 starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent 
 flushing, we can no longer share doc stores.
 So, I think we should turn off the write-side of shared doc stores to pave 
 the path for DWPT to land on trunk and simplify IW/DW.  We still must support 
 reading them (until 5.0), but the read side is far less hairy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2818) abort() method for IndexOutput

2010-12-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972772#action_12972772
 ] 

Shai Erera commented on LUCENE-2818:


I offered a default impl just to not break the API. I don't think a default 
impl is a good option. If we're ok making an exception for 3x as well (I know I 
am), then I don't think we should have a default impl.

 abort() method for IndexOutput
 --

 Key: LUCENE-2818
 URL: https://issues.apache.org/jira/browse/LUCENE-2818
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Priority: Minor

 I'd like to see an abort() method on IndexOutput that silently (no exceptions) 
 closes the IO and then does a silent papaDir.deleteFile(this.fileName()).
 This will simplify a bunch of error recovery code for IndexWriter and 
 friends, but constitutes an API backcompat break.
 What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?

2010-12-18 Thread Michael McCandless (JIRA)
LuceneTestCase's check for uncaught exceptions in threads causes collateral 
damage?
---

 Key: LUCENE-2819
 URL: https://issues.apache.org/jira/browse/LUCENE-2819
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
Reporter: Michael McCandless
 Fix For: 3.1, 4.0


Eg see these failures:

https://hudson.apache.org/hudson/job/Lucene-3.x/214/

Multiple test methods failed in TestIndexWriterOnDiskFull, but I think only 1 
test had a real failure; somehow our thread-hit-exception tracking incorrectly 
blames the other 3 cases?

I'm not sure about this but it seems like something like that is going on...

So, one problem is that LuceneTestCase.tearDown fails on any thread excs, but 
if CMS had also hit a failure, it then fails to clear CMS's thread failures.  I 
think we should just remove CMS's thread failure tracking?  (It's static so it 
can definitely bleed across tests.)  Ie, just rely on LuceneTestCase's tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?

2010-12-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972776#action_12972776
 ] 

Robert Muir commented on LUCENE-2819:
-

I think this is the problem: let's say the main thread spawns 3 other threads 
(A, B, C). When A throws an exception, our uncaught exception handler causes 
the test to fail.

There is nothing wrong with this... the problem in your example is, I think, 
that B and C are still running and then fail later (even if it's just a few ms).
So these get 'misattributed' to the next test method... we can't do anything 
about that either without doing insane amounts of buffering.

So we need to improve the thread handling in general for the tests.
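
To make the mechanism concrete, here is a toy sketch of this kind of tracking 
(this is not LuceneTestCase's actual code):

{code}
import java.util.*;

// toy per-run tracker for exceptions thrown by stray test threads
public class ThreadFailureTracker {
  private static final List<Throwable> uncaught =
      Collections.synchronizedList(new ArrayList<Throwable>());

  static {
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      public void uncaughtException(Thread t, Throwable e) {
        // a thread left running by a previous test can land here while a
        // later test is executing: the misattribution described above
        uncaught.add(e);
      }
    });
  }

  // called from tearDown(): fail the current test if any thread died
  public static void assertNoUncaughtExceptions() {
    if (!uncaught.isEmpty()) {
      AssertionError err =
          new AssertionError("Some threads threw uncaught exceptions! " + uncaught);
      uncaught.clear(); // clear so one failure doesn't also fail later tests
      throw err;
    }
  }
}
{code}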


 LuceneTestCase's check for uncaught exceptions in threads causes collateral 
 damage?
 ---

 Key: LUCENE-2819
 URL: https://issues.apache.org/jira/browse/LUCENE-2819
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
Reporter: Michael McCandless
 Fix For: 3.1, 4.0


 Eg see these failures:
 https://hudson.apache.org/hudson/job/Lucene-3.x/214/
 Multiple test methods failed in TestIndexWriterOnDiskFull, but I think only 
 1 test had a real failure; somehow our thread-hit-exception tracking 
 incorrectly blames the other 3 cases?
 I'm not sure about this but it seems like something like that is going on...
 So, one problem is that LuceneTestCase.tearDown fails on any thread excs, but 
 if CMS had also hit a failure, it then fails to clear CMS's thread failures.  I 
 think we should just remove CMS's thread failure tracking?  (It's static so 
 it can definitely bleed across tests.)  Ie, just rely on LuceneTestCase's 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass

2010-12-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2694:
--

Attachment: LUCENE-2694-FTE.patch

Here is just the patch for a correctly behaving FilteredTermsEnum (according to 
the docs, it does not currently support seeking). The assert is also not 
needed, as tenum is guaranteed to be non-null (it's final and the ctor already 
asserts this) :-)

 MTQ rewrite + weight/scorer init should be single pass
 --

 Key: LUCENE-2694
 URL: https://issues.apache.org/jira/browse/LUCENE-2694
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, 
 LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch


 Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
 Once we fix MTQ rewrite to be per-segment, we should take it further and make 
 weight/scorer init also run in the same single pass as rewrite.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?

2010-12-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2819:
---

Attachment: LUCENE-2819.patch

Attaching current patch; includes lots of noise and does not work yet!!  (I 
still see collateral damage).

 LuceneTestCase's check for uncaught exceptions in threads causes collateral 
 damage?
 ---

 Key: LUCENE-2819
 URL: https://issues.apache.org/jira/browse/LUCENE-2819
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2819.patch


 Eg see these failures:
 https://hudson.apache.org/hudson/job/Lucene-3.x/214/
 Multiple test methods failed in TestIndexWriterOnDiskFull, but I think only 
 1 test had a real failure; somehow our thread-hit-exception tracking 
 incorrectly blames the other 3 cases?
 I'm not sure about this but it seems like something like that is going on...
 So, one problem is that LuceneTestCase.tearDown fails on any thread excs, but 
 if CMS had also hit a failure, it then fails to clear CMS's thread failures.  I 
 think we should just remove CMS's thread failure tracking?  (It's static so 
 it can definitely bleed across tests.)  Ie, just rely on LuceneTestCase's 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 2691 - Failure

2010-12-18 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2691/

4 tests failed.
FAILED:  
org.apache.solr.util.SolrPluginUtilsTest.testAddToNamedListPrimitiveTypes

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.
at java.lang.Thread.run(Thread.java:636)


FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
Could not get the port for ZooKeeper server

Stack Trace:
java.lang.RuntimeException: Could not get the port for ZooKeeper server
at org.apache.solr.cloud.ZkTestServer.run(ZkTestServer.java:216)
at 
org.apache.solr.cloud.AbstractZkTestCase.azt_beforeClass(AbstractZkTestCase.java:56)


FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicZkTest

Error Message:
null

Stack Trace:
java.lang.NullPointerException
at 
org.apache.solr.cloud.ZkTestServer$ZKServerMain.shutdown(ZkTestServer.java:111)
at org.apache.solr.cloud.ZkTestServer.shutdown(ZkTestServer.java:227)
at 
org.apache.solr.cloud.AbstractZkTestCase.azt_afterClass(AbstractZkTestCase.java:112)


FAILED:  TEST-org.apache.solr.core.AlternateDirectoryTest.xml.init

Error Message:


Stack Trace:
Test report file 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/test-results/TEST-org.apache.solr.core.AlternateDirectoryTest.xml
 was length 0



Build Log (for compile errors):
[...truncated 8658 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?

2010-12-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2819:


Attachment: LUCENE-2819.patch

I worked on Mike's patch a bit... here's an updated version.

I think LuceneTestCase is ok, but there are tests that need fixing.

For example, TestParallelMultiSearcher doesn't close() its searcher, so its 
executor never gets shut down; because of this the test now fails.

 LuceneTestCase's check for uncaught exceptions in threads causes collateral 
 damage?
 ---

 Key: LUCENE-2819
 URL: https://issues.apache.org/jira/browse/LUCENE-2819
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2819.patch, LUCENE-2819.patch


 Eg see these failures:
 https://hudson.apache.org/hudson/job/Lucene-3.x/214/
 Multiple test methods failed in TestIndexWriterOnDiskFull, but I think only 
 1 test had a real failure; somehow our thread-hit-exception tracking 
 incorrectly blames the other 3 cases?
 I'm not sure about this but it seems like something like that is going on...
 So, one problem is that LuceneTestCase.tearDown fails on any thread excs, but 
 if CMS had also hit a failure, it then fails to clear CMS's thread failures.  I 
 think we should just remove CMS's thread failure tracking?  (It's static so 
 it can definitely bleed across tests.)  Ie, just rely on LuceneTestCase's 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2818) abort() method for IndexOutput

2010-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972816#action_12972816
 ] 

Michael McCandless commented on LUCENE-2818:


I think a bw compat exception is fine too!

 abort() method for IndexOutput
 --

 Key: LUCENE-2818
 URL: https://issues.apache.org/jira/browse/LUCENE-2818
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Priority: Minor

 I'd like to see an abort() method on IndexOutput that silently (no exceptions) 
 closes the IO and then does a silent papaDir.deleteFile(this.fileName()).
 This will simplify a bunch of error recovery code for IndexWriter and 
 friends, but constitutes an API backcompat break.
 What do you think?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2290) the termsInfosDivisor for readers opened by indexWriter should be configurable in Solr

2010-12-18 Thread Tom Burton-West (JIRA)
the termsInfosDivisor for readers opened by indexWriter should be configurable 
in Solr
--

 Key: SOLR-2290
 URL: https://issues.apache.org/jira/browse/SOLR-2290
 Project: Solr
  Issue Type: New Feature
Reporter: Tom Burton-West
Priority: Minor


Solr allows users to set the termInfosIndexDivisor used by the indexReader 
at search time in solrconfig.xml, but not in the indexReader opened by 
the IndexWriter when indexing/merging.

When dealing with an index with a large number of unique terms, setting the 
termInfosIndexDivisor at search time is helpful in reducing memory use.  It 
would also be helpful in reducing memory use during indexing/merging if it were 
made configurable for indexReaders opened by the indexWriter during 
indexing/merging.

This thread contains some background:
http://www.lucidimagination.com/search/document/b5c756a366e1a0d6/memory_use_during_merges_oom

In the Lucene 3.x branch it looks like this is done in 
IndexWriterConfig.setReaderTermsIndexDivisor, although there is also this 
method signature in IndexWriter.java: IndexReader getReader(int 
termInfosIndexDivisor)
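
For reference, a minimal sketch of the 3.x API mentioned above (the divisor 
value is illustrative; analyzer and dir are assumed to already exist):

{code}
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31, analyzer);
conf.setReaderTermsIndexDivisor(4); // load only every 4th term-index entry
IndexWriter writer = new IndexWriter(dir, conf);
{code}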

  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Lucene-3.x - Build # 214 - Failure

2010-12-18 Thread Michael McCandless
I committed a fix for this.

I think there was actually only one failure, which cascaded due to
still-running threads spilling over to other test methods (LUCENE-2819).

The one failure was caused by LUCENE-2811 (SI tracks hasVectors) in
addIndexes(Directory[]); we were failing to copy over the vector files
in the case where the first segment to share a doc store did not have
vectors but a later segment sharing the same doc stores did...

Mike

On Fri, Dec 17, 2010 at 6:22 PM, Apache Hudson Server
hud...@hudson.apache.org wrote:
 Build: https://hudson.apache.org/hudson/job/Lucene-3.x/214/

 4 tests failed.
 REGRESSION:  
 org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull

 Error Message:
 addIndexes(Directory[]) + optimize() hit IOException after disk space was 
 freed up

 Stack Trace:
 junit.framework.AssertionFailedError: addIndexes(Directory[]) + optimize() 
 hit IOException after disk space was freed up
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
        at 
 org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddIndexOnDiskFull(TestIndexWriterOnDiskFull.java:323)


 REGRESSION:  
 org.apache.lucene.index.TestIndexWriterOnDiskFull.testCorruptionAfterDiskFullDuringMerge

 Error Message:
 Some threads threw uncaught exceptions!

 Stack Trace:
 junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
        at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:354)


 REGRESSION:  
 org.apache.lucene.index.TestIndexWriterOnDiskFull.testImmediateDiskFull

 Error Message:
 ConcurrentMergeScheduler hit unhandled exceptions

 Stack Trace:
 junit.framework.AssertionFailedError: ConcurrentMergeScheduler hit unhandled 
 exceptions
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
        at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:375)


 REGRESSION:  org.apache.lucene.index.TestIndexWriterOnJRECrash.testNRTThreads

 Error Message:
 Some threads threw uncaught exceptions!

 Stack Trace:
 junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:891)
        at 
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:829)
        at 
 org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:354)




 Build Log (for compile errors):
 [...truncated 6950 lines...]



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?

2010-12-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2819:


Attachment: LUCENE-2819.patch

here's an updated patch, I think it's much better.
The core tests are passing but I still need to do contrib/solr.

Some problems I found were having to 'actually close' the ExecutorServices 
because ParallelMultiSearcher doesn't wait for the shutdown to actually happen 
in its close().

Also the TimeLimitingCollector creates a new thread... statically! This just 
seems really evil.

I don't think tests should be creating threads and not cleaning up after 
themselves!

You might also ask why even bother killing the threads if we will fail 
anyway? 
True, we will already fail the test in this case, but this is just to try to
prevent the failures from being attributed to other test cases (the original 
problem here).


 LuceneTestCase's check for uncaught exceptions in threads causes collateral 
 damage?
 ---

 Key: LUCENE-2819
 URL: https://issues.apache.org/jira/browse/LUCENE-2819
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2819.patch, LUCENE-2819.patch, LUCENE-2819.patch


 Eg see these failures:
 https://hudson.apache.org/hudson/job/Lucene-3.x/214/
 Multiple test methods failed in TestIndexWriterOnDiskFull, but I think only 
 1 test had a real failure; somehow our thread-hit-exception tracking 
 incorrectly blames the other 3 cases?
 I'm not sure about this but it seems like something like that is going on...
 So, one problem is that LuceneTestCase.tearDown fails on any thread excs, but 
 if CMS had also hit a failure, it then fails to clear CMS's thread failures.  I 
 think we should just remove CMS's thread failure tracking?  (It's static so 
 it can definitely bleed across tests.)  Ie, just rely on LuceneTestCase's 
 tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

2010-12-18 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2723:
-

Attachment: LUCENE-2723_openEnum.patch

Here's a small patch that may be sufficient to enable dropping down to 
per-segment work while still using MultiTerms/MultiTermsEnum to traverse terms 
in order.  It basically makes the TermsEnumWithSlice members public, and adds a 
bulkPostings member for reuse.

Is this the right approach?

 Speed up Lucene's low level bulk postings read API
 --

 Key: LUCENE-2723
 URL: https://issues.apache.org/jira/browse/LUCENE-2723
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2723-termscorer.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, 
 LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
 LUCENE-2723.patch, LUCENE-2723_openEnum.patch, LUCENE-2723_termscorer.patch, 
 LUCENE-2723_wastedint.patch


 Spinoff from LUCENE-1410.
 The flex DocsEnum has a simple bulk-read API that reads the next chunk
 of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
 (from LUCENE-1410).  This is not unlike sucking coffee through those
 tiny plastic coffee stirrers they hand out on airplanes that,
 surprisingly, also happen to function as a straw.
 As a result we see no perf gain from using FOR/PFOR.
 I had hacked up a fix for this, described in my blog post at
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
 I'm opening this issue to get that work to a committable point.
 So... I've worked out a new bulk-read API to address this performance
 bottleneck.  It has some big changes over the current bulk-read API:
   * You can now also bulk-read positions (but not payloads), but, I
  have yet to cut over positional queries.
   * The buffer contains doc deltas, not absolute values, for docIDs
 and positions (freqs are absolute); a small decoding sketch follows below.
   * Deleted docs are not filtered out.
   * The doc & freq buffers need not be aligned.  For fixed intblock
 codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
 Group varint, etc.) they won't be.
 It's still a work in progress...
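
A tiny generic sketch of what delta-coded doc buffers imply for a consumer 
(illustration only, not the patch's actual API):

{code}
// docDeltas: one chunk of delta-coded docIDs from the bulk-read buffer
// freqs: the parallel chunk of absolute frequencies
int doc = 0;
for (int i = 0; i < count; i++) {
  doc += docDeltas[i]; // deltas must be summed to recover absolute docIDs
  int freq = freqs[i]; // freqs are already absolute
  // deleted docs are not filtered out here; the consumer must skip them
}
{code}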

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972842#action_12972842
 ] 

Michael McCandless commented on LUCENE-2814:


OK I committed to trunk.  I'll let this bake for a while on trunk before 
backporting to 3.x...

Thanks Earwin!

 stop writing shared doc stores across segments
 --

 Key: LUCENE-2814
 URL: https://issues.apache.org/jira/browse/LUCENE-2814
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, 
 LUCENE-2814.patch, LUCENE-2814.patch


 Shared doc stores enable the files for stored fields and term vectors to be 
 shared across multiple segments.  We've had this optimization since 2.1 I 
 think.
 It works best against a new index, where you open an IW, add lots of docs, 
 and then close it.  In that case all of the written segments will reference 
 slices of a single shared doc store segment.
 This was a good optimization because it means we never need to merge these 
 files.  But, when you open another IW on that index, it writes a new set of 
 doc stores, and then whenever merges take place across doc stores, they must 
 now be merged.
 However, since we switched to shared doc stores, there have been two 
 optimizations for merging the stores.  First, we now bulk-copy the bytes in 
 these files if the field name/number assignment is congruent.  Second, we 
 now force congruent field name/number mapping in IndexWriter.  This means 
 this optimization is much less potent than it used to be.
 Furthermore, the optimization adds *a lot* of hair to 
 IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over 
 time, and causes odd behavior like a merge possibly forcing a flush when it 
 starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent 
 flushing, we can no longer share doc stores.
 So, I think we should turn off the write-side of shared doc stores to pave 
 the path for DWPT to land on trunk and simplify IW/DW.  We still must support 
 reading them (until 5.0), but the read side is far less hairy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2290) the termsInfosDivisor for readers opened by indexWriter should be configurable in Solr

2010-12-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972847#action_12972847
 ] 

Jason Rutherglen commented on SOLR-2290:


Tom, I think this can be generified to use SOLR-1447's property injection into 
IWC.

 the termsInfosDivisor for readers opened by indexWriter should be 
 configurable in Solr
 --

 Key: SOLR-2290
 URL: https://issues.apache.org/jira/browse/SOLR-2290
 Project: Solr
  Issue Type: New Feature
Reporter: Tom Burton-West
Priority: Minor

 Solr allows users to set the termInfosIndexDivisor used by the indexReader 
 at search time in solrconfig.xml, but not in the indexReader opened by 
 the IndexWriter when indexing/merging.
 When dealing with an index with a large number of unique terms, setting the 
 termInfosIndexDivisor at search time is helpful in reducing memory use.  It 
 would also be helpful in reducing memory use during indexing/merging if it 
 were made configurable for indexReaders opened by the indexWriter during 
 indexing/merging.
 This thread contains some background:
 http://www.lucidimagination.com/search/document/b5c756a366e1a0d6/memory_use_during_merges_oom
 In the Lucene 3.x branch it looks like this is done in 
 IndexWriterConfig.setReaderTermsIndexDivisor, although there is also this 
 method signature in IndexWriter.java: IndexReader getReader(int 
 termInfosIndexDivisor)
   

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2814) stop writing shared doc stores across segments

2010-12-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972850#action_12972850
 ] 

Jason Rutherglen commented on LUCENE-2814:
--

bq. backporting to 3.x... 

Out of curiosity, why are we backporting to 3.x? Or are we planning on also 
backporting the DWPT branch?

 stop writing shared doc stores across segments
 --

 Key: LUCENE-2814
 URL: https://issues.apache.org/jira/browse/LUCENE-2814
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 3.1, 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-2814.patch, LUCENE-2814.patch, LUCENE-2814.patch, 
 LUCENE-2814.patch, LUCENE-2814.patch


 Shared doc stores enable the files for stored fields and term vectors to be 
 shared across multiple segments.  We've had this optimization since 2.1 I 
 think.
 It works best against a new index, where you open an IW, add lots of docs, 
 and then close it.  In that case all of the written segments will reference 
 slices of a single shared doc store segment.
 This was a good optimization because it means we never need to merge these 
 files.  But, when you open another IW on that index, it writes a new set of 
 doc stores, and then whenever merges take place across doc stores, they must 
 now be merged.
 However, since we switched to shared doc stores, there have been two 
 optimizations for merging the stores.  First, we now bulk-copy the bytes in 
 these files if the field name/number assignment is congruent.  Second, we 
 now force congruent field name/number mapping in IndexWriter.  This means 
 this optimization is much less potent than it used to be.
 Furthermore, the optimization adds *a lot* of hair to 
 IndexWriter/DocumentsWriter; this has been the source of sneaky bugs over 
 time, and causes odd behavior like a merge possibly forcing a flush when it 
 starts.  Finally, with DWPT (LUCENE-2324), which gets us truly concurrent 
 flushing, we can no longer share doc stores.
 So, I think we should turn off the write-side of shared doc stores to pave 
 the path for DWPT to land on trunk and simplify IW/DW.  We still must support 
 reading them (until 5.0), but the read side is far less hairy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2500) A Linux-specific Directory impl that bypasses the buffer cache

2010-12-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972855#action_12972855
 ] 

Jason Rutherglen commented on LUCENE-2500:
--

DirectIOLinuxDirectory is in trunk and works?  Are we using it with segment 
merging yet?  Perhaps a separate Jira issue?

 A Linux-specific Directory impl that bypasses the buffer cache
 --

 Key: LUCENE-2500
 URL: https://issues.apache.org/jira/browse/LUCENE-2500
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2500.patch


 I've been testing how we could prevent Lucene's merges from evicting
 pages from the OS's buffer cache.  I tried fadvise/madvise (via JNI)
 but (frustratingly), I could not get them to work (details at
 http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html).
 The only thing that worked was to use Linux's O_DIRECT flag, which
 forces all IO to bypass the buffer cache entirely... so I created a
 Linux-specific Directory impl to do this.
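
Usage is meant to look like any other Directory; a hedged sketch follows 
(the constructor arguments here are an assumption about the contrib class, 
which is Linux-only):

import java.io.File;

import org.apache.lucene.store.DirectIOLinuxDirectory;
import org.apache.lucene.store.Directory;

// Hedged sketch: the (path, buffer size) constructor is an assumption.
// O_DIRECT requires aligned IO, so the impl buffers internally.
public class DirectIOSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = new DirectIOLinuxDirectory(new File("/path/to/index"), 262144);
    // Hand this Directory to merge-heavy writers so merges bypass the
    // OS buffer cache instead of evicting hot search pages.
    dir.close();
  }
}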

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 2706 - Failure

2010-12-18 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2706/

1 tests failed.
REGRESSION:  
org.apache.lucene.search.TestRemoteCachingWrapperFilter.testTermRemoteFilter

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1094)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1032)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:486)




Build Log (for compile errors):
[...truncated 5368 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 2707 - Still Failing

2010-12-18 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2707/

1 tests failed.
FAILED:  
org.apache.lucene.search.TestRemoteCachingWrapperFilter.testTermRemoteFilter

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1094)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1032)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:486)




Build Log (for compile errors):
[...truncated 5354 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 2708 - Still Failing

2010-12-18 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2708/

1 tests failed.
FAILED:  
org.apache.lucene.search.TestRemoteCachingWrapperFilter.testTermRemoteFilter

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1094)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1032)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:486)




Build Log (for compile errors):
[...truncated 5350 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2820) CMS fails to cleanly stop threads

2010-12-18 Thread Michael McCandless (JIRA)
CMS fails to cleanly stop threads
-

 Key: LUCENE-2820
 URL: https://issues.apache.org/jira/browse/LUCENE-2820
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1, 4.0


When you close IW, it waits for (or aborts and then waits for) all running 
merges.

However, its wait criterion is wrong -- it waits for the threads to be done w/ 
their merges, not for the threads to actually die.

CMS already has a sync() method to wait for running threads, which we can call 
from CMS.close.  However, it has a thread hazard: a MergeThread removes itself 
from mergeThreads before it actually exits, so sync() can return even while a 
merge thread is still running.

This was uncovered by LUCENE-2819 on the test case 
TestCustomScoreQuery.testCustomExternalQuery, though I expect other test cases 
would show it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

2010-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972869#action_12972869
 ] 

Michael McCandless commented on LUCENE-2723:


Looks good, Yonik!

 Speed up Lucene's low level bulk postings read API
 --

 Key: LUCENE-2723
 URL: https://issues.apache.org/jira/browse/LUCENE-2723
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2723-termscorer.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, 
 LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
 LUCENE-2723.patch, LUCENE-2723_openEnum.patch, LUCENE-2723_termscorer.patch, 
 LUCENE-2723_wastedint.patch


 Spinoff from LUCENE-1410.
 The flex DocsEnum has a simple bulk-read API that reads the next chunk
 of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
 (from LUCENE-1410).  This is not unlike sucking coffee through those
 tiny plastic coffee stirrers they hand out on airplanes that,
 surprisingly, also happen to function as a straw.
 As a result we see no perf gain from using FOR/PFOR.
 I had hacked up a fix for this, described in my blog post at
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
 I'm opening this issue to get that work to a committable point.
 So... I've worked out a new bulk-read API to address the performance 
 bottleneck.  It has some big changes over the current bulk-read API:
   * You can now also bulk-read positions (but not payloads), but I
  have yet to cut over positional queries.
   * The buffer contains doc deltas, not absolute values, for docIDs
 and positions (freqs are absolute).
   * Deleted docs are not filtered out.
   * The doc & freq buffers need not be aligned.  For fixed intblock
 codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
 Group varint, etc.) they won't be.
 It's still a work in progress...
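
To make the delta-coded buffers concrete, a tiny illustrative sketch (the 
array names are stand-ins, not the actual API):

// Illustrative only: docDeltas/freqs stand in for the bulk-read buffers.
public class BulkDeltaSketch {
  public static void main(String[] args) {
    int[] docDeltas = {3, 2, 7}; // deltas between consecutive docIDs
    int[] freqs     = {1, 4, 2}; // freqs are absolute, per the list above
    int doc = 0;
    for (int i = 0; i < docDeltas.length; i++) {
      doc += docDeltas[i]; // accumulate deltas into an absolute docID
      // Deleted docs are NOT filtered out; a real consumer must skip them.
      System.out.println("doc=" + doc + " freq=" + freqs[i]);
    }
  }
}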

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2822) TimeLimitingCollector starts a thread in a static {} with no way to stop it

2010-12-18 Thread Robert Muir (JIRA)
TimeLimitingCollector starts a thread in a static {} with no way to stop it
-

 Key: LUCENE-2822
 URL: https://issues.apache.org/jira/browse/LUCENE-2822
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


See the comment in LuceneTestCase.

If you even do Class.forName("TimeLimitingCollector") it starts up a thread in 
a static initializer, and there isn't a way to kill it.

This is broken.
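
For context, the problematic shape is roughly the following (a simplified 
sketch, not the actual TimeLimitingCollector source):

// Simplified sketch of the anti-pattern: merely loading the class (e.g. via
// Class.forName) runs the static initializer and starts a thread that no
// caller can ever stop.
public class StaticThreadSketch {
  private static final Thread TIMER = new Thread() {
    @Override public void run() {
      while (true) {
        try { Thread.sleep(100); } catch (InterruptedException e) { return; }
      }
    }
  };
  static {
    TIMER.setDaemon(true); // daemon only keeps it from blocking JVM exit
    TIMER.start();         // class-load side effect; no stop() exists
  }
}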

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2821) FilterManager starts threads with no way to stop them, and should be in contrib/remote, not core

2010-12-18 Thread Robert Muir (JIRA)
FilterManager starts threads with no way to stop them, and should be in 
contrib/remote, not core
---

 Key: LUCENE-2821
 URL: https://issues.apache.org/jira/browse/LUCENE-2821
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


See the warning produced by contrib/remote's tests.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (LUCENE-2823) contrib/demo's tests leave threads running

2010-12-18 Thread Robert Muir (JIRA)
contrib/demo's tests leave threads running
--

 Key: LUCENE-2823
 URL: https://issues.apache.org/jira/browse/LUCENE-2823
 Project: Lucene - Java
  Issue Type: Bug
  Components: Examples
Reporter: Robert Muir


contrib/demo for some reason parses HTML in a strange way, with a 
PipedInputStream and a separate thread.
I don't understand why it needs to do this or be this complicated (it's an 
example), and its tests leave rogue threads running.
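
The pattern in question looks roughly like this (a simplified sketch, not the 
demo's actual source):

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Simplified sketch of the producer-thread-plus-pipe pattern: if the reader
// side stops early (or a test tears down), the producer thread lingers.
public class PipedParseSketch {
  public static void main(String[] args) throws IOException {
    final PipedOutputStream out = new PipedOutputStream();
    PipedInputStream in = new PipedInputStream(out);
    Thread producer = new Thread() {
      @Override public void run() {
        try {
          out.write("<html>hello</html>".getBytes("UTF-8"));
          out.close();
        } catch (IOException ignored) {
        }
      }
    };
    producer.start(); // the kind of thread the tests leave behind
    int b;
    while ((b = in.read()) != -1) {
      // parse the HTML byte stream here
    }
    in.close();
  }
}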

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2820) CMS fails to cleanly stop threads

2010-12-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2820:
---

Attachment: LUCENE-2820.patch

Patch.

I changed CMS.sync to .join() any still-alive threads, and changed MergeThread 
to not remove itself from mergeThreads; instead, updateMergeThread now prunes 
any dead threads.
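
A self-contained sketch of that fix (names simplified from the actual CMS 
code):

import java.util.ArrayList;
import java.util.List;

// Simplified sketch: sync() join()s any thread that is still alive, and dead
// threads are pruned lazily instead of each thread removing itself from the
// list before it has fully exited.
public class SyncSketch {
  private final List<Thread> mergeThreads = new ArrayList<Thread>();

  synchronized void register(Thread t) {
    mergeThreads.add(t);
  }

  // Prune dead threads here, so sync() can never miss one that is still
  // winding down after finishing its merge.
  private synchronized Thread firstAlive() {
    List<Thread> dead = new ArrayList<Thread>();
    for (Thread t : mergeThreads) {
      if (t.isAlive()) return t;
      dead.add(t);
    }
    mergeThreads.removeAll(dead);
    return null;
  }

  void sync() {
    Thread t;
    while ((t = firstAlive()) != null) {
      try {
        t.join(); // wait for the thread to die, not just finish its merge
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }
}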

 CMS fails to cleanly stop threads
 -

 Key: LUCENE-2820
 URL: https://issues.apache.org/jira/browse/LUCENE-2820
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2820.patch


 When you close IW, it waits for (or aborts and then waits for) all running 
 merges.
 However, its wait criterion is wrong -- it waits for the threads to be done 
 w/ their merges, not for the threads to actually die.
 CMS already has a sync() method to wait for running threads, which we can 
 call from CMS.close.  However, it has a thread hazard: a MergeThread removes 
 itself from mergeThreads before it actually exits, so sync() can return even 
 while a merge thread is still running.
 This was uncovered by LUCENE-2819 on the test case 
 TestCustomScoreQuery.testCustomExternalQuery, though I expect other test 
 cases would show it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2291) JSONWriter.writeSolrDocument() does not respect its Set<String> returnFields parameter.

2010-12-18 Thread Ahmet Arslan (JIRA)
JSONWriter.writeSolrDocument() does not respect its Set<String> returnFields 
parameter.
---

 Key: SOLR-2291
 URL: https://issues.apache.org/jira/browse/SOLR-2291
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.4.1
Reporter: Ahmet Arslan
Priority: Minor


When SolrDocumentList is used instead of DocList in the response, JSONWriter 
(unlike XMLWriter) prints all existing fields of a SolrDocument.
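
The fix is essentially a membership check before writing each field; a hedged 
sketch (the method shape is illustrative, not the attached patch):

import java.util.Set;

import org.apache.solr.common.SolrDocument;

// Hedged sketch: only emit fields named in returnFields (null = all fields).
public class ReturnFieldsSketch {
  public static void writeDoc(SolrDocument doc, Set<String> returnFields) {
    for (String name : doc.getFieldNames()) {
      if (returnFields != null && !returnFields.contains(name)) {
        continue; // respect the requested field set, like XMLWriter does
      }
      System.out.println(name + "=" + doc.getFieldValue(name));
    }
  }
}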

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-2291) JSONWriter.writeSolrDocument() does not respect its Set<String> returnFields parameter.

2010-12-18 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-2291:
---

Attachment: SOLR-2291.patch

 JSONWriter.writeSolrDocument() does not respect its Set<String> returnFields 
 parameter.
 ---

 Key: SOLR-2291
 URL: https://issues.apache.org/jira/browse/SOLR-2291
 Project: Solr
  Issue Type: Bug
  Components: Response Writers
Affects Versions: 1.4.1
Reporter: Ahmet Arslan
Priority: Minor
 Attachments: SOLR-2291.patch


 When SolrDocumentList is used instead of DocList in the response, JSONWriter 
 (unlike XMLWriter) prints all existing fields of a SolrDocument.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



SolrPluginUtils.docListToSolrDocumentList loads all stored fields

2010-12-18 Thread Ahmet Arslan
Hello,

Regardless of the Set<String> fields parameter, the 
SolrPluginUtils#docListToSolrDocumentList method loads all of the stored 
fields.  Shouldn't it just load the fields given in the set?  Should I file a 
JIRA ticket?
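
A hedged sketch of the suggested behavior, assuming the 
SolrIndexSearcher.doc(int, Set<String>) overload:

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.document.Document;
import org.apache.solr.search.SolrIndexSearcher;

// Hedged sketch: pass the requested field set down to the stored-fields
// reader instead of loading every stored field for each hit.
public class FieldSetLoadSketch {
  static Document load(SolrIndexSearcher searcher, int docid, Set<String> fields)
      throws IOException {
    return searcher.doc(docid, fields); // assumed overload, loads only 'fields'
  }
}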

When a small bug in a TestCase is seen, what is the preferred way to report 
it?  Open an issue, or mention it here?
Example: In the SolrPluginUtilsTest.testDocListConversion method, the for loop 
is not executed because list.size() is 0.  The commit should be inside the 
assertU(), and cmd.setLen() should be called.



  

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2816) MMapDirectory speedups

2010-12-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12972880#action_12972880
 ] 

Robert Muir commented on LUCENE-2816:
-

Committed revision 1050737. I'll wait a bit for branch_3x.

 MMapDirectory speedups
 --

 Key: LUCENE-2816
 URL: https://issues.apache.org/jira/browse/LUCENE-2816
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 3.1, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2816.patch


 MMapDirectory has some performance problems:
 # When the file is larger than Integer.MAX_VALUE, we use MultiMMapIndexInput, 
 which does a lot of unnecessary bounds-checks for its buffer-switching etc. 
 Instead, like MMapIndexInput, it should rely upon the contract of these 
 operations in ByteBuffer (which will always do a bounds check and throw 
 BufferUnderflowException).
 Our 'buffer' is so large (Integer.MAX_VALUE) that it's rare this happens, and 
 doing our own bounds checks just slows things down.
 # readInt()/readLong()/readShort() are slow and should just defer to 
 ByteBuffer.getInt(), etc.
 This isn't very important since we don't much use these, but I think there's 
 no reason users (e.g. codec writers) should have to readBytes() + wrap as a 
 ByteBuffer + get an IntBuffer view when readInt() can be almost as fast...
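
A hedged sketch of the first point, with names simplified: trust ByteBuffer's 
single bounds check on the hot path and treat a buffer boundary as the rare 
exception:

import java.io.IOException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Simplified sketch: the hot path lets ByteBuffer do the (single) bounds
// check; only when a value straddles two mapped buffers do we fall back.
abstract class BoundsCheckSketch {
  ByteBuffer curBuf; // the currently active mapped buffer

  int readInt() throws IOException {
    try {
      return curBuf.getInt(); // no explicit bounds check of our own
    } catch (BufferUnderflowException e) {
      return readIntSlowly(); // rare: the int crosses a buffer boundary
    }
  }

  // Byte-at-a-time across the boundary, switching buffers as needed.
  abstract int readIntSlowly() throws IOException;
}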

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2819) LuceneTestCase's check for uncaught exceptions in threads causes collateral damage?

2010-12-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2819.
-

Resolution: Fixed

Committed and merged to 3.x.

In 3.x I kept the test code in CMS (even though unused), as I don't trust the 
3.0-backwards LuceneTestCase enough to handle the uncaught exceptions...

I marked it @deprecated for us to remove in 3.2; I think that's easiest.

We should try to resolve some of the rogue thread issues so we can make this 
stuff actually fail instead of warn.

 LuceneTestCase's check for uncaught exceptions in threads causes collateral 
 damage?
 ---

 Key: LUCENE-2819
 URL: https://issues.apache.org/jira/browse/LUCENE-2819
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
Reporter: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2819.patch, LUCENE-2819.patch, LUCENE-2819.patch, 
 LUCENE-2819.patch


 Eg see these failures:
 https://hudson.apache.org/hudson/job/Lucene-3.x/214/
 Multiple test methods failed in TestIndexWriterOnDiskFull, but I think only 
 1 test had a real failure; somehow our thread-exception tracking incorrectly 
 blames the other 3 cases?
 I'm not sure about this, but it seems like something like that is going on...
 So, one problem is that LuceneTestCase.tearDown fails on any thread 
 exceptions, but if CMS had also hit a failure, it then fails to clear CMS's 
 thread failures.  I think we should just remove CMS's thread failure 
 tracking?  (It's static so it can definitely bleed across tests.)  Ie, just 
 rely on LuceneTestCase's tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org