[jira] [Commented] (LUCENE-3041) Support Query Visiting / Walking

2011-04-29 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027279#comment-13027279
 ] 

Chris Male commented on LUCENE-3041:


To follow up on Earwin's comments, I'm going to do the following:

- Leave Query#rewrite out of the walking process.  As Earwin said, rewrite 
provides vital query optimization / conversion to primitive runnable queries.  
Having this method on Query is a good idea since user Queries can simply 
implement this method and move on.
- In a separate issue, add a RewriteState-like concept which can be used for 
caching rewrites, like that suggested by Simon.  This will yield a considerable 
performance improvement for people doing lots of repeated FuzzyQuerys, for 
example.
- Change my processing concept into a generic Walker system, which can be 
used for lots of things in Lucene.  Users can implement this Walker to do 
whatever they want (maybe we can pry Earwin's walker based highlighter from 
him? :D)
- Overload IndexSearcher's methods to support passing in a Walker.  We need 
this, instead of simply having the Walker external, because we really want to 
support per-segment Walking.

I'll make a patch for the stuff related to this issue shortly, and spin off the 
RewriteState stuff.

> Support Query Visiting / Walking
> ---
>
> Key: LUCENE-3041
> URL: https://issues.apache.org/jira/browse/LUCENE-3041
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch, 
> LUCENE-3041.patch
>
>
> Out of the discussion in LUCENE-2868, it could be useful to add a generic 
> Query Visitor / Walker that could be used for more advanced rewriting, 
> optimizations or anything that requires state to be stored as each Query is 
> visited.
> We could keep the interface very simple:
> {code}
> public interface QueryVisitor {
>   Query visit(Query query);
> }
> {code}
> and then use a reflection-based visitor like Earwin suggested, which would 
> allow implementers to provide visit methods for just the Querys they are 
> interested in.
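As a dependency-free illustration of the reflection-based dispatch suggested above (using hypothetical stand-in classes, not the proposed Lucene API), a visitor can declare visit() overloads only for the query types it cares about, and a small dispatcher can find the matching overload at runtime:

```java
import java.lang.reflect.Method;

public class VisitorSketch {
    // Stand-in query hierarchy (hypothetical; the real patch would operate on
    // org.apache.lucene.search.Query and its subclasses).
    public static class Query {}
    public static class TermQuery extends Query {}
    public static class BooleanQuery extends Query {}

    // Visitor whose visit(X) overloads are discovered reflectively, so
    // implementors only declare methods for the query types they care about.
    public static class CountingVisitor {
        public int termQueries = 0;
        public Query visit(TermQuery q) { termQueries++; return q; }
        // No visit(BooleanQuery): unhandled types pass through unchanged.
    }

    // Look for a visit method matching the concrete query class, walking up
    // the class hierarchy; return the query untouched if no overload exists.
    public static Query dispatch(Object visitor, Query query) {
        for (Class<?> c = query.getClass(); Query.class.isAssignableFrom(c); c = c.getSuperclass()) {
            try {
                Method m = visitor.getClass().getMethod("visit", c);
                return (Query) m.invoke(visitor, query);
            } catch (NoSuchMethodException ignored) {
                // no overload for this type; try the superclass
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        return query;
    }

    public static void main(String[] args) {
        CountingVisitor v = new CountingVisitor();
        dispatch(v, new TermQuery());
        dispatch(v, new BooleanQuery()); // no handler: silently ignored
        dispatch(v, new TermQuery());
        if (v.termQueries != 2) throw new AssertionError("expected 2, got " + v.termQueries);
        System.out.println("termQueries=" + v.termQueries);
    }
}
```

A production version would cache the Method lookups per (visitor class, query class) pair, since reflective lookup on every node of a large query tree would be costly.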

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Index searcher can't find the doc of any field value

2011-04-29 Thread soheila dehghanzadeh
Hi Friends,

I'm using Lucene to index a file in which each line contains 4 elements
separated by spaces. Because I want to retrieve any line containing specific
text in a specific part, I add each line to the index as a separate document
with 4 fields, which I named A, B, C, and D.

I use this code to index my file:

try {
    File file = new File("e://data3");
    BufferedReader reader = new BufferedReader(new FileReader(file));

    IndexWriter writer = new IndexWriter(indexDirectory, new SimpleAnalyzer(), true);
    writer.setUseCompoundFile(true);

    String line;
    while ((line = reader.readLine()) != null) {
        String[] index = line.split(" ");
        Document document = new Document();
        document.add(new Field("A", index[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
        document.add(new Field("B", index[1], Field.Store.YES, Field.Index.UN_TOKENIZED));
        document.add(new Field("C", index[2], Field.Store.YES, Field.Index.UN_TOKENIZED));
        document.add(new Field("D", index[3], Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(document);
        System.out.println(writer.docCount());
    }

    // Without closing (or at least committing) the writer, buffered documents
    // may never be flushed to the index on disk.
    writer.close();
    reader.close();
} catch (Exception e) {
    e.printStackTrace();
}

But when I try to search this index for some text that exists in, for example,
field A, it fails to find the document (line) :( My search code is as
follows:

try {
    IndexSearcher is = new IndexSearcher(FSDirectory.getDirectory(indexDirectory, false));

    // Note: UN_TOKENIZED fields are indexed without analysis, so the term
    // must match the stored value exactly, including case.
    Query q = new TermQuery(new Term("A", "hello"));
    Hits hits = is.search(q);

    for (int i = 0; i < hits.length(); i++) {
        Document doc = hits.doc(i);
        System.out.println("A: " + doc.get("A") + " B: " + doc.get("B")
                + " C: " + doc.get("C") + " D: " + doc.get("D"));
    }
} catch (Exception e) {
    e.printStackTrace();
}



Kindly let me know if there is any error in my code. Thanks in advance.


[jira] [Issue Comment Edited] (SOLR-445) Update Handlers abort with bad documents

2011-04-29 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027245#comment-13027245
 ] 

Lance Norskog edited comment on SOLR-445 at 4/30/11 12:18 AM:
--

If the DIH semantics cover all of the use cases, please follow that model: 
behavior, names, etc. It will be much easier on users. 

  was (Author: lancenorskog):
If the DIH semantics cover all of the use cases, please follow that model: 
behavior, names, etc. It will be much easier on developers.
  
> Update Handlers abort with bad documents
> 
>
> Key: SOLR-445
> URL: https://issues.apache.org/jira/browse/SOLR-445
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.3
>Reporter: Will Johnson
>Assignee: Grant Ingersoll
> Fix For: Next
>
> Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
> SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures 
> mid-batch?  I.e.:
> <add>
>   <doc>
>     <field name="id">1</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="date">I_AM_A_BAD_DATE</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>   </doc>
> </add>
> Right now Solr adds the first doc and then aborts.  It would seem like it 
> should either fail the entire batch, or log a message / return a code and then 
> continue on to add doc 3.  Option 1 would seem to be much harder to 
> accomplish and possibly require more memory, while Option 2 would require more 
> information to come back from the API.  I'm about to dig into this but I 
> thought I'd ask to see if anyone had any suggestions, thoughts or comments.




[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents

2011-04-29 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027245#comment-13027245
 ] 

Lance Norskog commented on SOLR-445:


If the DIH semantics cover all of the use cases, please follow that model: 
behavior, names, etc. It will be much easier on developers.

> Update Handlers abort with bad documents
> 
>
> Key: SOLR-445
> URL: https://issues.apache.org/jira/browse/SOLR-445
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.3
>Reporter: Will Johnson
>Assignee: Grant Ingersoll
> Fix For: Next
>
> Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
> SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures 
> mid-batch?  I.e.:
> <add>
>   <doc>
>     <field name="id">1</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="date">I_AM_A_BAD_DATE</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>   </doc>
> </add>
> Right now Solr adds the first doc and then aborts.  It would seem like it 
> should either fail the entire batch, or log a message / return a code and then 
> continue on to add doc 3.  Option 1 would seem to be much harder to 
> accomplish and possibly require more memory, while Option 2 would require more 
> information to come back from the API.  I'm about to dig into this but I 
> thought I'd ask to see if anyone had any suggestions, thoughts or comments.




[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027207#comment-13027207
 ] 

Michael McCandless commented on LUCENE-3023:


+1 to commit!  Great work everyone :)

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023-svn-diff.patch, 
> LUCENE-3023-ws-changes.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023_CHANGES.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_iw_iwc_jdoc.patch, 
> LUCENE-3023_simonw_review.patch, LUCENE-3023_svndiff.patch, 
> LUCENE-3023_svndiff.patch, diffMccand.py, diffSources.patch, 
> diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324, so 
> we can proceed with landing the DWPT development on trunk soon. I think one of 
> the bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct, though. I will start going through that first.




Re: [Lucene.Net] new structure

2011-04-29 Thread Michael Herndon
I was thinking along the lines that it was for all executables.

I'll put them in the build folder then.

On Fri, Apr 29, 2011 at 4:15 PM, Troy Howard  wrote:

> Only thing I would suggest is keeping .cmd/bin files in the build
> folder. The bin folder is meant for the compiled artifacts.
>
> Otherwise, everything else sounds great.
>
> Thanks,
> Troy
>
>
> On Fri, Apr 29, 2011 at 1:08 PM, Michael Herndon <
> mhern...@wickedsoftware.net> wrote:
>
> > If you think it would be beneficial to have the scripts in the branch, I
> > can
> > do that.
> >
> > On Fri, Apr 29, 2011 at 3:50 PM, Digy  wrote:
> >
> > > Would you add the same stuff to 2.9.4g branch too?
> > >
> > > DIGY
> > >
> > > -Original Message-
> > > From: Michael Herndon [mailto:mhern...@wickedsoftware.net]
> > > Sent: Friday, April 29, 2011 10:28 PM
> > > To: lucene-net-...@lucene.apache.org
> > > Subject: Re: [Lucene.Net] new structure
> > >
> > > I'm going to move ahead with this stuff this weekend unless anyone
> > objects.
> > >
> > > On Sun, Apr 24, 2011 at 4:42 PM, Michael Herndon <
> > > mhern...@wickedsoftware.net> wrote:
> > >
> > > > if you celebrate Easter, Happy Easter, if not, then Happy
> > lucene.netclean
> > > > up day.
> > > >
> > > >
> > > > couple of questions. would it be cool if I can add a .gitignore to
> the
> > > root
> > > > folder?
> > > >
> > > > also would it upset anyone if I add .cmd  & .sh files to the /bin
> > folder
> > > >  and .xml/.build files to the /build folder ?
> > > >
> > > > and sand castle  and shfb to the /lib folder?
> > > >
> > > > - Michael
> > > >
> > > >
> > > > On Sat, Apr 23, 2011 at 7:57 AM, Digy  wrote:
> > > >
> > > >> Everything seems to be OK.
> > > >> +1 for removing old directory structure.
> > > >>
> > > >> Thanks Troy
> > > >>
> > > >> DIGY
> > > >>
> > > >> -Original Message-
> > > >> From: Troy Howard [mailto:thowar...@gmail.com]
> > > >> Sent: Saturday, April 23, 2011 3:07 AM
> > > >> To: lucene-net-...@lucene.apache.org
> > > >> Subject: Re: [Lucene.Net] new structure
> > > >>
> > > >> I guess by 'today' I meant 'In about 6 days'.
> > > >>
> > > >> Anyhow, I completed the commit of the new directory structure.. I
> did
> > > not
> > > >> delete the OLD directory structure, because they can live
> > side-by-side.
> > > >> Also, please note that I only created vs2010 solutions and upgraded
> > the
> > > >> projects to same.
> > > >>
> > > >> Please pull down the latest revision and validate these changes. If
> > all
> > > >> goes
> > > >> well, I'll delete the old directory structure (everything under the
> > 'C#'
> > > >> directory).
> > > >>
> > > >> Thanks,
> > > >> Troy
> > > >>
> > > >> On Sat, Apr 16, 2011 at 3:42 PM, Troy Howard 
> > > wrote:
> > > >>
> > > >> > Apologies, I got a bit derailed. Will be committing today.
> > > >> > On Apr 16, 2011 2:20 PM, "Prescott Nasser"  >
> > > >> wrote:
> > > >> > >
> > > >> > >
> > > >> > > Hey Troy, any status update on the new structure? I'm hesitant to
> > do
> > > >> > updates since I know you're going to be modifying it all shortly
> > > >> > >
> > > >> > > ~P
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >
> > >
> > >
> >
>


[jira] [Commented] (LUCENE-3053) improve test coverage for Multi*

2011-04-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027204#comment-13027204
 ] 

Michael McCandless commented on LUCENE-3053:


Patch looks good Robert -- make our tests eviler!!

> improve test coverage for Multi*
> 
>
> Key: LUCENE-3053
> URL: https://issues.apache.org/jira/browse/LUCENE-3053
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3053.patch, LUCENE-3053.patch, LUCENE-3053.patch
>
>
> It seems like an easy win that when the test calls newSearcher(), 
> it should sometimes wrap the reader with a SlowMultiReaderWrapper.




[jira] [Issue Comment Edited] (LUCENE-3055) LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers

2011-04-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027179#comment-13027179
 ] 

Uwe Schindler edited comment on LUCENE-3055 at 4/29/11 8:51 PM:


{quote}
From my perspective, the most important reason is to avoid a huge performance 
trap: previously, if you subclassed one of these analyzers, overrode 
tokenStream(), and added a SpecialFilter for example, most of the time users 
would actually slow down indexing, because now reusableTokenStream() cannot be 
used by the indexer.
{quote}

Additionally, exactly this special case (overriding one of the methods) was 
the biggest problem, leading to ugly reflection-based checks in Lucene 3.0: in 
3.0, StandardAnalyzer correctly implemented both tokenStream() and 
reusableTokenStream(). As soon as a subclass overrode only tokenStream(), 
while the indexer still called reusableTokenStream(), the changes were not even 
used, leading to lots of bug reports. Because of this, a reflection-based 
backwards hack was added in 3.0 (see the o.a.l.util.VirtualMethod class, which 
makes this easier) that prevented the indexer from calling 
reusableTokenStream() if a subclass overrode only one of the methods. Moving 
forward to 3.1, these backwards hacks got even heavier (e.g. changes in 
TokenStreams, the new base class ReusableAnalyzerBase, ...), so the only 
solution was to enforce the decorator pattern.

The above example by Robert is the correct way to implement your "factory" of 
TokenStreams. Everything else, like subclassing StandardAnalyzer, is ugly 
because it hides what you are really doing. The above pattern does exactly 
what Solr's schema does: you have to explicitly list all your components, 
making it clear what your TokenStreams are doing.

Trust me, the above example is shorter than completely subclassing the 
previous StandardAnalyzer (both tokenStream() and reusableTokenStream()) and, 
like solrschema.xml, it shows what your Analyzer looks like (no hidden stuff 
in superfactories, ...)

  was (Author: thetaphi):
{quote}
>From my perspective the most important reason is to avoid a huge performance 
>trap: previously if you subclassed one of these analyzers, override 
>tokenStream(), and added SpecialFilter for example, most of the time users 
>would actually slow down indexing, because now reusableTokenStream() cannot be 
>used by the indexer.
{quote}

Additionally, exactly this special case (overwriting one of the methods) was 
the biggest problem, leading to ugly reflection based checks in Lucene 3.0: In 
3.0 StandardAnalyzer correctly implemented both tokenStream() and 
reuseableTokenStream(). As soon as one subclass only overrided tokenStream(), 
but the indexer still calling reuseableTokenStream() the changes were not even 
used, leading to lots of bug reports. Because of this, a reflection based 
backwards hack was done in 3.0 (see o.a.l.util.VirtualMethod class to make this 
easier), that prevented the indexer from calling reuseableTokenStream if a 
subclass suddenly overwrote only one of the methods. With moving forward in 
3.1, these backwards hacks even got heavier (e.g. changes in TokenStreams, new 
base class ReuseableAnalyzerBase,...), so the only solution was to enforce the 
decorator pattern.

The above example by Robert is the correct way to implement you "factory" of 
TokenStreams. Everything else like subclassing StandardAnalyzer is ugly as it 
hides what you are really doing. The above pattern does exactly what also 
Solr's Schemadoes: You have to explicitely list all your components, making it 
clear what your TokenStreams are doing.

Trust me, the above example is shorter than subclassing previous 
StandardAnalyzer completely (both tokenStream and reuseableTokenStream) and is 
showing like solrschema.xml what your Analyzer looks like (no hidden stuff in 
superfactories,...)
  
> LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
> --
>
> Key: LUCENE-3055
> URL: https://issues.apache.org/jira/browse/LUCENE-3055
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 3.1
>Reporter: Ian Soboroff
>
> LUCENE-2372 and LUCENE-2389 marked all analyzers as final.  This makes 
> ReusableAnalyzerBase useless, and makes it impossible to subclass e.g. 
> StandardAnalyzer to make a small modification e.g. to tokenStream().  These 
> issues don't indicate a new method of doing this.  The issues don't give a 
> reason except for design considerations, which seems a poor reason to make a 
> backward-incompatible change.


[jira] [Commented] (LUCENE-3055) LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers

2011-04-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027179#comment-13027179
 ] 

Uwe Schindler commented on LUCENE-3055:
---

{quote}
From my perspective, the most important reason is to avoid a huge performance 
trap: previously, if you subclassed one of these analyzers, overrode 
tokenStream(), and added a SpecialFilter for example, most of the time users 
would actually slow down indexing, because now reusableTokenStream() cannot be 
used by the indexer.
{quote}

Additionally, exactly this special case (overriding one of the methods) was 
the biggest problem, leading to ugly reflection-based checks in Lucene 3.0: in 
3.0, StandardAnalyzer correctly implemented both tokenStream() and 
reusableTokenStream(). As soon as a subclass overrode only tokenStream(), 
while the indexer still called reusableTokenStream(), the changes were not even 
used, leading to lots of bug reports. Because of this, a reflection-based 
backwards hack was added in 3.0 (see the o.a.l.util.VirtualMethod class, which 
makes this easier) that prevented the indexer from calling 
reusableTokenStream() if a subclass overrode only one of the methods. Moving 
forward to 3.1, these backwards hacks got even heavier (e.g. changes in 
TokenStreams, the new base class ReusableAnalyzerBase, ...), so the only 
solution was to enforce the decorator pattern.

The above example by Robert is the correct way to implement your "factory" of 
TokenStreams. Everything else, like subclassing StandardAnalyzer, is ugly 
because it hides what you are really doing. The above pattern does exactly 
what Solr's schema does: you have to explicitly list all your components, 
making it clear what your TokenStreams are doing.

Trust me, the above example is shorter than completely subclassing the 
previous StandardAnalyzer (both tokenStream() and reusableTokenStream()) and, 
like solrschema.xml, it shows what your Analyzer looks like (no hidden stuff 
in superfactories, ...)
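To make the explicit factory/decorator idea concrete without depending on Lucene, here is a minimal, self-contained Java sketch; TokenStream, WhitespaceTokenizer, and LowerCaseFilter below are simplified stand-ins for the real Lucene classes of the same names, not their actual APIs:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;

public class ChainSketch {
    // Minimal stand-in: a pull-based stream of string tokens.
    public interface TokenStream { String next(); } // returns null when exhausted

    // Token source: splits the input on whitespace.
    public static class WhitespaceTokenizer implements TokenStream {
        private final Iterator<String> it;
        public WhitespaceTokenizer(String text) {
            it = Arrays.asList(text.split("\\s+")).iterator();
        }
        public String next() { return it.hasNext() ? it.next() : null; }
    }

    // Decorator: wraps another stream and lowercases each token.
    public static class LowerCaseFilter implements TokenStream {
        private final TokenStream in;
        public LowerCaseFilter(TokenStream in) { this.in = in; }
        public String next() {
            String t = in.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        }
    }

    // The "analyzer" is just a factory: it explicitly lists its components,
    // like a schema would, instead of hiding them behind a subclass override.
    public static TokenStream analyze(String text) {
        TokenStream stream = new WhitespaceTokenizer(text);
        stream = new LowerCaseFilter(stream); // add/remove decorators here
        return stream;
    }

    public static void main(String[] args) {
        TokenStream ts = analyze("Hello Lucene WORLD");
        StringBuilder out = new StringBuilder();
        for (String t = ts.next(); t != null; t = ts.next()) out.append(t).append(' ');
        System.out.println(out.toString().trim()); // hello lucene world
    }
}
```

The point mirrors Uwe's: the chain is assembled explicitly in one factory method, so adding or removing a filter is a visible one-line change rather than a hidden method override that the indexer may silently bypass.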

> LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
> --
>
> Key: LUCENE-3055
> URL: https://issues.apache.org/jira/browse/LUCENE-3055
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 3.1
>Reporter: Ian Soboroff
>
> LUCENE-2372 and LUCENE-2389 marked all analyzers as final.  This makes 
> ReusableAnalyzerBase useless, and makes it impossible to subclass e.g. 
> StandardAnalyzer to make a small modification e.g. to tokenStream().  These 
> issues don't indicate a new method of doing this.  The issues don't give a 
> reason except for design considerations, which seems a poor reason to make a 
> backward-incompatible change.




[jira] [Commented] (LUCENE-3055) LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers

2011-04-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027171#comment-13027171
 ] 

Robert Muir commented on LUCENE-3055:
-

Hi Ian, you are right the justifications don't totally explain the reasoning 
behind this change.

From my perspective, the most important reason is to avoid a huge performance 
trap: previously, if you subclassed one of these analyzers, overrode 
tokenStream(), and added a SpecialFilter for example, most of the time users 
would actually slow down indexing, because now reusableTokenStream() cannot be 
used by the indexer.

This created worst-case situations like LUCENE-2279.

Instead, the recommended approach is to just let analyzers be tokenstream 
factories (which is all they are). They aren't really "extendable", only 
"overridable", since they are just factories for tokenstreams, and overriding 
them creates the worst-case performance trap where new objects are created for 
every document. I would instead recommend writing your analyzer by extending 
ReusableAnalyzerBase, which is easy and safe:
{noformat}
Analyzer analyzer = new ReusableAnalyzerBase() {
  protected TokenStreamComponents createComponents(String fieldName, Reader 
reader) {
Tokenizer tokenizer = new WhitespaceTokenizer(...);
TokenStream filteredStream = new FooTokenFilter(tokenizer, ...);
filteredStream = new BarTokenFilter(filteredStream, ...);
return new TokenStreamComponents(tokenizer, filteredStream);
  }
};
{noformat}


> LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
> --
>
> Key: LUCENE-3055
> URL: https://issues.apache.org/jira/browse/LUCENE-3055
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 3.1
>Reporter: Ian Soboroff
>
> LUCENE-2372 and LUCENE-2389 marked all analyzers as final.  This makes 
> ReusableAnalyzerBase useless, and makes it impossible to subclass e.g. 
> StandardAnalyzer to make a small modification e.g. to tokenStream().  These 
> issues don't indicate a new method of doing this.  The issues don't give a 
> reason except for design considerations, which seems a poor reason to make a 
> backward-incompatible change.




Re: [Lucene.Net] new structure

2011-04-29 Thread Troy Howard
Only thing I would suggest is keeping .cmd/bin files in the build
folder. The bin folder is meant for the compiled artifacts.

Otherwise, everything else sounds great.

Thanks,
Troy


On Fri, Apr 29, 2011 at 1:08 PM, Michael Herndon <
mhern...@wickedsoftware.net> wrote:

> If you think it would be beneficial to have the scripts in the branch, I
> can
> do that.
>
> On Fri, Apr 29, 2011 at 3:50 PM, Digy  wrote:
>
> > Would you add the same stuff to 2.9.4g branch too?
> >
> > DIGY
> >
> > -Original Message-
> > From: Michael Herndon [mailto:mhern...@wickedsoftware.net]
> > Sent: Friday, April 29, 2011 10:28 PM
> > To: lucene-net-...@lucene.apache.org
> > Subject: Re: [Lucene.Net] new structure
> >
> > I'm going to move ahead with this stuff this weekend unless anyone
> objects.
> >
> > On Sun, Apr 24, 2011 at 4:42 PM, Michael Herndon <
> > mhern...@wickedsoftware.net> wrote:
> >
> > > if you celebrate Easter, Happy Easter, if not, then Happy
> lucene.netclean
> > > up day.
> > >
> > >
> > > couple of questions. would it be cool if I can add a .gitignore to the
> > root
> > > folder?
> > >
> > > also would it upset anyone if I add .cmd  & .sh files to the /bin
> folder
> > >  and .xml/.build files to the /build folder ?
> > >
> > > and sand castle  and shfb to the /lib folder?
> > >
> > > - Michael
> > >
> > >
> > > On Sat, Apr 23, 2011 at 7:57 AM, Digy  wrote:
> > >
> > >> Everything seems to be OK.
> > >> +1 for removing old directory structure.
> > >>
> > >> Thanks Troy
> > >>
> > >> DIGY
> > >>
> > >> -Original Message-
> > >> From: Troy Howard [mailto:thowar...@gmail.com]
> > >> Sent: Saturday, April 23, 2011 3:07 AM
> > >> To: lucene-net-...@lucene.apache.org
> > >> Subject: Re: [Lucene.Net] new structure
> > >>
> > >> I guess by 'today' I meant 'In about 6 days'.
> > >>
> > >> Anyhow, I completed the commit of the new directory structure.. I did
> > not
> > >> delete the OLD directory structure, because they can live
> side-by-side.
> > >> Also, please note that I only created vs2010 solutions and upgraded
> the
> > >> projects to same.
> > >>
> > >> Please pull down the latest revision and validate these changes. If
> all
> > >> goes
> > >> well, I'll delete the old directory structure (everything under the
> 'C#'
> > >> directory).
> > >>
> > >> Thanks,
> > >> Troy
> > >>
> > >> On Sat, Apr 16, 2011 at 3:42 PM, Troy Howard 
> > wrote:
> > >>
> > >> > Apologies, I got a bit derailed. Will be committing today.
> > >> > On Apr 16, 2011 2:20 PM, "Prescott Nasser" 
> > >> wrote:
> > >> > >
> > >> > >
> > >> > > Hey Troy, any status update on the new structure? I'm hesitant to
do
> > >> > updates since I know you're going to be modifying it all shortly
> > >> > >
> > >> > > ~P
> > >> > >
> > >> >
> > >>
> > >>
> > >
> >
> >
>


RE: [Lucene.Net] new structure

2011-04-29 Thread Digy
I just want to keep the 2.9.4g branch and trunk on par.
The only divergence for now is LUCENENET-172, which will be applied to 2.9.4
eventually.

DIGY

-Original Message-
From: Michael Herndon [mailto:mhern...@wickedsoftware.net] 
Sent: Friday, April 29, 2011 11:08 PM
To: lucene-net-...@lucene.apache.org
Subject: Re: [Lucene.Net] new structure

If you think it would be beneficial to have the scripts in the branch, I can
do that.

On Fri, Apr 29, 2011 at 3:50 PM, Digy  wrote:

> Would you add the same stuff to 2.9.4g branch too?
>
> DIGY
>
> -Original Message-
> From: Michael Herndon [mailto:mhern...@wickedsoftware.net]
> Sent: Friday, April 29, 2011 10:28 PM
> To: lucene-net-...@lucene.apache.org
> Subject: Re: [Lucene.Net] new structure
>
> I'm going to move ahead with this stuff this weekend unless anyone
objects.
>
> On Sun, Apr 24, 2011 at 4:42 PM, Michael Herndon <
> mhern...@wickedsoftware.net> wrote:
>
> > if you celebrate Easter, Happy Easter, if not, then Happy
lucene.netclean
> > up day.
> >
> >
> > couple of questions. would it be cool if I can add a .gitignore to the
> root
> > folder?
> >
> > also would it upset anyone if I add .cmd  & .sh files to the /bin folder
> >  and .xml/.build files to the /build folder ?
> >
> > and sand castle  and shfb to the /lib folder?
> >
> > - Michael
> >
> >
> > On Sat, Apr 23, 2011 at 7:57 AM, Digy  wrote:
> >
> >> Everything seems to be OK.
> >> +1 for removing old directory structure.
> >>
> >> Thanks Troy
> >>
> >> DIGY
> >>
> >> -Original Message-
> >> From: Troy Howard [mailto:thowar...@gmail.com]
> >> Sent: Saturday, April 23, 2011 3:07 AM
> >> To: lucene-net-...@lucene.apache.org
> >> Subject: Re: [Lucene.Net] new structure
> >>
> >> I guess by 'today' I meant 'In about 6 days'.
> >>
> >> Anyhow, I completed the commit of the new directory structure.. I did
> not
> >> delete the OLD directory structure, because they can live side-by-side.
> >> Also, please note that I only created vs2010 solutions and upgraded the
> >> projects to same.
> >>
> >> Please pull down the latest revision and validate these changes. If all
> >> goes
> >> well, I'll delete the old directory structure (everything under the
'C#'
> >> directory).
> >>
> >> Thanks,
> >> Troy
> >>
> >> On Sat, Apr 16, 2011 at 3:42 PM, Troy Howard 
> wrote:
> >>
> >> > Apologies, I got a bit derailed. Will be committing today.
> >> > On Apr 16, 2011 2:20 PM, "Prescott Nasser" 
> >> wrote:
> >> > >
> >> > >
> >> > > Hey Troy, any status update on the new structure? I'm hesitant to
do
> >> > updates since I know you're going to be modifying it all shortly
> >> > >
> >> > > ~P
> >> > >
> >> >
> >>
> >>
> >
>
>



[jira] [Created] (LUCENE-3055) LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers

2011-04-29 Thread Ian Soboroff (JIRA)
LUCENE-2372, LUCENE-2389 made it impossible to subclass core analyzers
--

 Key: LUCENE-3055
 URL: https://issues.apache.org/jira/browse/LUCENE-3055
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 3.1
Reporter: Ian Soboroff


LUCENE-2372 and LUCENE-2389 marked all analyzers as final.  This makes 
ReusableAnalyzerBase useless, and makes it impossible to subclass e.g. 
StandardAnalyzer to make a small modification e.g. to tokenStream().  These 
issues don't indicate a new method of doing this.  The issues don't give a 
reason except for design considerations, which seems a poor reason to make a 
backward-incompatible change.




Re: Code Freeze on realtime_search branch

2011-04-29 Thread Sanne Grinovero
2011/4/29 Michael McCandless :
> Sorry, but, no :)
>
> So feel free to keep working towards removing this limitation!!
>
> This change makes IndexWriter's flush (where it writes the added
> documents in RAM to disk as a new segment) fully concurrent, so that
> while one segment is being flushed (which could take a longish time,
> eg on a slowish IO system), other threads are now free to continue
> indexing (where they were blocked before).  On computers with
> substantial CPU concurrency, and fast "enough" IO systems, this change
> should give a big increase in indexing throughput.
>
> That said, I do think this change is a step towards what you seek
> (allowing multiple IndexWriters, even in separate JVMs maybe on
> separate computers, to write into an index at once).
>
> Mike

Thank you for clarifying this; maybe I don't even need to remove the
locking if I can run some of those participant threads in the remote
nodes.
I'll keep you updated, but unfortunately can't start working on it sooner.

Sanne


>
> http://blog.mikemccandless.com
>
> On Fri, Apr 29, 2011 at 2:16 PM, Sanne Grinovero
>  wrote:
>> Hello,
>> this is totally awesome!
>>
>> Does it imply we don't need the IndexWriter lock anymore? And hence
>> that people sharing the Lucene Directory across multiple JVMs can have
>> both write at the same time?
>>
>> I had intentions to *try* removing such limitations this summer, but
>> if this is the case I will spend my time testing this carefully
>> instead, or if some kind of locking is still required I'd appreciate
>> some pointers so that I'll be able to remove them.
>>
>> Regards,
>> Sanne
>>
>> 2011/4/29 Simon Willnauer :
>>> Hey folks,
>>>
>>> LUCENE-3023 aims to land the considerably large
>>> DocumentsWriterPerThread (DWPT) refactoring on trunk.
>>> During the last weeks we have put lots of efforts into cleaning the
>>> code up, fixing javadocs and running tests locally
>>> as well as on Jenkins. We reached the point where we are able to
>>> create a final patch for review and land this
>>> exciting refactoring on trunk very soon. I committed the CHANGES.TXT
>>> entry (also appended below) a couple of minutes ago so from now on
>>> we freeze the branch for final review (Robert can you create a new
>>> "final" patch and upload to LUCENE-3023).
>>> Any comments should go to [1] or as a reply to this email. If there is
>>> no blocker coming up we plan to reintegrate the
>>> branch and commit it to trunk early next week. For those who want some
>>> background on what DWPT does, read: [2]
>>>
>>> Note: this change will not change the index file format so there is no
>>> need to reindex for trunk users. Yet, I will send a heads up next week
>>> with an
>>> overview of what has changed.
>>>
>>> Simon
>>>
>>> [1] https://issues.apache.org/jira/browse/LUCENE-3023
>>> [2] 
>>> http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/
>>>
>>>
>>> * LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
>>>  DocumentsWriterPerThread:
>>>
>>>  - IndexWriter now uses a DocumentsWriter per thread when indexing 
>>> documents.
>>>    Each DocumentsWriterPerThread indexes documents in its own private 
>>> segment,
>>>    and the in memory segments are no longer merged on flush.  Instead, each
>>>    segment is separately flushed to disk and subsequently merged with normal
>>>    segment merging.
>>>
>>>  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
>>>    FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
>>>    indexing may continue concurrently with flushing.  The selected
>>>    DWPT flushes all its RAM resident documents to disk.  Note: Segment 
>>> flushes
>>>    don't flush all RAM resident documents but only the documents private to
>>>    the DWPT selected for flushing.
>>>
>>>  - Flushing is now controlled by FlushPolicy that is called for every add,
>>>    update or delete on IndexWriter. By default DWPTs are flushed either on
>>>    maxBufferedDocs per DWPT or the global active used memory. Once the 
>>> active
>>>    memory exceeds ramBufferSizeMB only the largest DWPT is selected for
>>>    flushing and the memory used by this DWPT is subtracted from the active
>>>    memory and added to a flushing memory pool, which can lead to temporarily
>>>    higher memory usage due to ongoing indexing.
>>>
>>>  - IndexWriter now can utilize ramBufferSize > 2048 MB. Each DWPT can 
>>> address
>>>    up to 2048 MB memory such that the ramBufferSize is now bounded by the 
>>> max
>>>    number of DWPT available in the used DocumentsWriterPerThreadPool.
>>>    IndexWriter's net memory consumption can grow far beyond the 2048 MB 
>>> limit if
>>>    the application can use all available DWPTs. To prevent a DWPT from
>>>    exhausting its address space IndexWriter will forcefully flush a DWPT if 
>>> its
>>>    hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be 
>>> controlled
>>>    via IndexWriterConfig and defaults to 1945 MB.
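The flush selection described above — once the summed active memory exceeds ramBufferSizeMB, only the single largest DWPT is picked for flushing — can be sketched as a toy selection function. This is an illustration of the stated policy only, not Lucene's actual FlushPolicy code:

```java
// Toy model of the described policy: given each DWPT's active buffer size and
// a global budget, return the index of the DWPT to flush (the largest one),
// or -1 if the summed active memory is still within the budget.
class FlushPolicySim {
    static int selectLargest(long[] activeBytes, long budgetBytes) {
        long total = 0;
        for (long b : activeBytes) total += b;
        if (total <= budgetBytes) return -1; // nothing to flush yet

        int largest = 0;
        for (int i = 1; i < activeBytes.length; i++) {
            if (activeBytes[i] > activeBytes[largest]) largest = i;
        }
        return largest; // this DWPT's memory moves from the active to the flushing pool
    }
}
```

In the real system the flushed DWPT's bytes are subtracted from the active pool and tracked in a flushing pool, which is why transient memory can temporarily exceed the configured budget while flushes are in flight.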

Re: [Lucene.Net] new structure

2011-04-29 Thread Michael Herndon
I'm going to move ahead with this stuff this weekend unless anyone objects.

On Sun, Apr 24, 2011 at 4:42 PM, Michael Herndon <
mhern...@wickedsoftware.net> wrote:

> if you celebrate Easter, Happy Easter, if not, then Happy lucene.net clean
> up day.
>
>
> couple of questions. would it be cool if I can add a .gitignore to the root
> folder?
>
> also would it upset anyone if I add .cmd  & .sh files to the /bin folder
>  and .xml/.build files to the /build folder ?
>
> and Sandcastle and SHFB to the /lib folder?
>
> - Michael
>
>
> On Sat, Apr 23, 2011 at 7:57 AM, Digy  wrote:
>
>> Everything seems to be OK.
>> +1 for removing old directory structure.
>>
>> Thanks Troy
>>
>> DIGY
>>
>> -Original Message-
>> From: Troy Howard [mailto:thowar...@gmail.com]
>> Sent: Saturday, April 23, 2011 3:07 AM
>> To: lucene-net-...@lucene.apache.org
>> Subject: Re: [Lucene.Net] new structure
>>
>> I guess by 'today' I meant 'In about 6 days'.
>>
>> Anyhow, I completed the commit of the new directory structure. I did not
>> delete the OLD directory structure, because they can live side-by-side.
>> Also, please note that I only created vs2010 solutions and upgraded the
>> projects to same.
>>
>> Please pull down the latest revision and validate these changes. If all
>> goes
>> well, I'll delete the old directory structure (everything under the 'C#'
>> directory).
>>
>> Thanks,
>> Troy
>>
>> On Sat, Apr 16, 2011 at 3:42 PM, Troy Howard  wrote:
>>
>> > Apologize. I got a bit derailed. Will be commiting today.
>> > On Apr 16, 2011 2:20 PM, "Prescott Nasser" 
>> wrote:
>> > >
>> > >
>> > > Hey Troy any status update on the new structure? I'm hesitant to do
>> > updates since I know you're going to be modifying it all shortly
>> > >
>> > > ~P
>> > >
>> >
>>
>>
>


[jira] [Created] (SOLR-2482) DataImportHandler; reload-config; response in case of failure & further requests

2011-04-29 Thread Stefan Matheis (steffkes) (JIRA)
DataImportHandler; reload-config; response in case of failure & further requests


 Key: SOLR-2482
 URL: https://issues.apache.org/jira/browse/SOLR-2482
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler, web gui
Reporter: Stefan Matheis (steffkes)
Priority: Minor
 Attachments: reload-config-error.html

Reloading while the config-file is valid is completely fine, but if the config 
is broken, the response is plain HTML containing the full stacktrace (see 
attachment). Further requests contain a {{status}} element with 
??DataImportHandler started. Not Initialized. No commands can be run??, but 
respond with HTTP status 200 OK :/

Would be nice if:
* the response in case of error could also be XML-formatted
* it contained the exception message (in my case ??The end-tag for element type 
"entity" must end with a '>' delimiter.??) in a separate field
* a better/correct HTTP status were used for the latter requests; I would 
suggest {{503 Service Unavailable}}

Then we could display the error message to the user while the config is broken, 
and for further requests rely on the HTTP status with no need to check the 
content of the XML response.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2482) DataImportHandler; reload-config; response in case of failure & further requests

2011-04-29 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-2482:


Attachment: reload-config-error.html

> DataImportHandler; reload-config; response in case of failure & further 
> requests
> 
>
> Key: SOLR-2482
> URL: https://issues.apache.org/jira/browse/SOLR-2482
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler, web gui
>Reporter: Stefan Matheis (steffkes)
>Priority: Minor
> Attachments: reload-config-error.html
>
>
> Reloading while the config-file is valid is completely fine, but if the 
> config is broken, the response is plain HTML containing the full stacktrace 
> (see attachment). Further requests contain a {{status}} element with 
> ??DataImportHandler started. Not Initialized. No commands can be run??, but 
> respond with HTTP status 200 OK :/
> Would be nice if:
> * the response in case of error could also be XML-formatted
> * it contained the exception message (in my case ??The end-tag for element 
> type "entity" must end with a '>' delimiter.??) in a separate field
> * a better/correct HTTP status were used for the latter requests; I would 
> suggest {{503 Service Unavailable}}
> Then we could display the error message to the user while the config is 
> broken, and for further requests rely on the HTTP status with no need to 
> check the content of the XML response.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Code Freeze on realtime_search branch

2011-04-29 Thread Michael McCandless
Sorry, but, no :)

So feel free to keep working towards removing this limitation!!

This change makes IndexWriter's flush (where it writes the added
documents in RAM to disk as a new segment) fully concurrent, so that
while one segment is being flushed (which could take a longish time,
eg on a slowish IO system), other threads are now free to continue
indexing (where they were blocked before).  On computers with
substantial CPU concurrency, and fast "enough" IO systems, this change
should give a big increase in indexing throughput.

That said, I do think this change is a step towards what you seek
(allowing multiple IndexWriters, even in separate JVMs maybe on
separate computers, to write into an index at once).

Mike

http://blog.mikemccandless.com

On Fri, Apr 29, 2011 at 2:16 PM, Sanne Grinovero
 wrote:
> Hello,
> this is totally awesome!
>
> Does it imply we don't need the IndexWriter lock anymore? And hence
> that people sharing the Lucene Directory across multiple JVMs can have
> both write at the same time?
>
> I had intentions to *try* removing such limitations this summer, but
> if this is the case I will spend my time testing this carefully
> instead, or if some kind of locking is still required I'd appreciate
> some pointers so that I'll be able to remove them.
>
> Regards,
> Sanne
>
> 2011/4/29 Simon Willnauer :
>> Hey folks,
>>
>> LUCENE-3023 aims to land the considerably large
>> DocumentsWriterPerThread (DWPT) refactoring on trunk.
>> During the last weeks we have put lots of efforts into cleaning the
>> code up, fixing javadocs and running tests locally
>> as well as on Jenkins. We reached the point where we are able to
>> create a final patch for review and land this
>> exciting refactoring on trunk very soon. I committed the CHANGES.TXT
>> entry (also appended below) a couple of minutes ago so from now on
>> we freeze the branch for final review (Robert can you create a new
>> "final" patch and upload to LUCENE-3023).
>> Any comments should go to [1] or as a reply to this email. If there is
>> no blocker coming up we plan to reintegrate the
>> branch and commit it to trunk early next week. For those who want some
>> background on what DWPT does, read: [2]
>>
>> Note: this change will not change the index file format so there is no
>> need to reindex for trunk users. Yet, I will send a heads up next week
>> with an
>> overview of what has changed.
>>
>> Simon
>>
>> [1] https://issues.apache.org/jira/browse/LUCENE-3023
>> [2] 
>> http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/
>>
>>
>> * LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
>>  DocumentsWriterPerThread:
>>
>>  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
>>    Each DocumentsWriterPerThread indexes documents in its own private 
>> segment,
>>    and the in memory segments are no longer merged on flush.  Instead, each
>>    segment is separately flushed to disk and subsequently merged with normal
>>    segment merging.
>>
>>  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
>>    FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
>>    indexing may continue concurrently with flushing.  The selected
>>    DWPT flushes all its RAM resident documents to disk.  Note: Segment 
>> flushes
>>    don't flush all RAM resident documents but only the documents private to
>>    the DWPT selected for flushing.
>>
>>  - Flushing is now controlled by FlushPolicy that is called for every add,
>>    update or delete on IndexWriter. By default DWPTs are flushed either on
>>    maxBufferedDocs per DWPT or the global active used memory. Once the active
>>    memory exceeds ramBufferSizeMB only the largest DWPT is selected for
>>    flushing and the memory used by this DWPT is subtracted from the active
>>    memory and added to a flushing memory pool, which can lead to temporarily
>>    higher memory usage due to ongoing indexing.
>>
>>  - IndexWriter now can utilize ramBufferSize > 2048 MB. Each DWPT can address
>>    up to 2048 MB memory such that the ramBufferSize is now bounded by the max
>>    number of DWPT available in the used DocumentsWriterPerThreadPool.
>>    IndexWriter's net memory consumption can grow far beyond the 2048 MB limit 
>> if
>>    the application can use all available DWPTs. To prevent a DWPT from
>>    exhausting its address space IndexWriter will forcefully flush a DWPT if 
>> its
>>    hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be 
>> controlled
>>    via IndexWriterConfig and defaults to 1945 MB.
>>    Since IndexWriter flushes DWPT concurrently not all memory is released
>>    immediately. Applications should still use a ramBufferSize significantly
>>    lower than the JVM's available heap memory since under high load multiple
>>    flushing DWPT can consume substantial transient memory when IO performance
>>    is slow relative to indexing rate.
>>
>>  - IndexWriter#commit now doesn't block concurrent indexing while flushing all
>>    'currently' RAM resident documents to disk. Yet, flushes that occur while a
>>    full flush is running are queued and will happen after all DWPT involved
>>    in the full flush are done flushing. Applications using multiple threads
>>    during indexing that trigger a full flush (eg call commit() or open a new
>>    NRT reader) can use significantly more transient memory.

[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027135#comment-13027135
 ] 

Yonik Seeley commented on LUCENE-3023:
--

This looks awesome guys!

I've started some ad-hoc testing via Solr.
A single threaded CSV upload (bulk indexing... no real-time reopens)
looks pretty much the same, and doing 2 CSV uploads at once was
36% faster (a bit apples-to-oranges since the number of resulting
segments was also higher... but even still, looks like a good improvement!)


> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023-svn-diff.patch, 
> LUCENE-3023-ws-changes.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023_CHANGES.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_iw_iwc_jdoc.patch, 
> LUCENE-3023_simonw_review.patch, LUCENE-3023_svndiff.patch, 
> LUCENE-3023_svndiff.patch, diffMccand.py, diffSources.patch, 
> diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3054) add assert to sorts catch broken comparators in tests

2011-04-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3054:


Attachment: LUCENE-3054.patch

I expanded the patch to all the sorts, just to find all the weird 
sorting/comparators going on.

It also finds some false positives: ones that are documented as inconsistent 
with equals, ones in tests, etc.

But we can at least look into the ones it finds.

> add assert to sorts catch broken comparators in tests
> -
>
> Key: LUCENE-3054
> URL: https://issues.apache.org/jira/browse/LUCENE-3054
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.1
>Reporter: Robert Muir
> Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these 
> broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3054) add assert to sorts catch broken comparators in tests

2011-04-29 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated LUCENE-3054:
-

Affects Version/s: 3.1

Btw. this is with Lucene 3.1
For full thread: http://search-lucene.com/m/ytANA59Q9G1


> add assert to sorts catch broken comparators in tests
> -
>
> Key: LUCENE-3054
> URL: https://issues.apache.org/jira/browse/LUCENE-3054
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: 3.1
>Reporter: Robert Muir
> Attachments: LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these 
> broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Code Freeze on realtime_search branch

2011-04-29 Thread Sanne Grinovero
Hello,
this is totally awesome!

Does it imply we don't need the IndexWriter lock anymore? And hence
that people sharing the Lucene Directory across multiple JVMs can have
both write at the same time?

I had intentions to *try* removing such limitations this summer, but
if this is the case I will spend my time testing this carefully
instead, or if some kind of locking is still required I'd appreciate
some pointers so that I'll be able to remove them.

Regards,
Sanne

2011/4/29 Simon Willnauer :
> Hey folks,
>
> LUCENE-3023 aims to land the considerably large
> DocumentsWriterPerThread (DWPT) refactoring on trunk.
> During the last weeks we have put lots of efforts into cleaning the
> code up, fixing javadocs and running tests locally
> as well as on Jenkins. We reached the point where we are able to
> create a final patch for review and land this
> exciting refactoring on trunk very soon. I committed the CHANGES.TXT
> entry (also appended below) a couple of minutes ago so from now on
> we freeze the branch for final review (Robert can you create a new
> "final" patch and upload to LUCENE-3023).
> Any comments should go to [1] or as a reply to this email. If there is
> no blocker coming up we plan to reintegrate the
> branch and commit it to trunk early next week. For those who want some
> background on what DWPT does, read: [2]
>
> Note: this change will not change the index file format so there is no
> need to reindex for trunk users. Yet, I will send a heads up next week
> with an
> overview of what has changed.
>
> Simon
>
> [1] https://issues.apache.org/jira/browse/LUCENE-3023
> [2] 
> http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/
>
>
> * LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
>  DocumentsWriterPerThread:
>
>  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
>    Each DocumentsWriterPerThread indexes documents in its own private segment,
>    and the in memory segments are no longer merged on flush.  Instead, each
>    segment is separately flushed to disk and subsequently merged with normal
>    segment merging.
>
>  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
>    FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
>    indexing may continue concurrently with flushing.  The selected
>    DWPT flushes all its RAM resident documents to disk.  Note: Segment flushes
>    don't flush all RAM resident documents but only the documents private to
>    the DWPT selected for flushing.
>
>  - Flushing is now controlled by FlushPolicy that is called for every add,
>    update or delete on IndexWriter. By default DWPTs are flushed either on
>    maxBufferedDocs per DWPT or the global active used memory. Once the active
>    memory exceeds ramBufferSizeMB only the largest DWPT is selected for
>    flushing and the memory used by this DWPT is subtracted from the active
>    memory and added to a flushing memory pool, which can lead to temporarily
>    higher memory usage due to ongoing indexing.
>
>  - IndexWriter now can utilize ramBufferSize > 2048 MB. Each DWPT can address
>    up to 2048 MB memory such that the ramBufferSize is now bounded by the max
>    number of DWPT available in the used DocumentsWriterPerThreadPool.
>    IndexWriter's net memory consumption can grow far beyond the 2048 MB limit 
> if
>    the application can use all available DWPTs. To prevent a DWPT from
>    exhausting its address space IndexWriter will forcefully flush a DWPT if 
> its
>    hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be 
> controlled
>    via IndexWriterConfig and defaults to 1945 MB.
>    Since IndexWriter flushes DWPT concurrently not all memory is released
>    immediately. Applications should still use a ramBufferSize significantly
>    lower than the JVM's available heap memory since under high load multiple
>    flushing DWPT can consume substantial transient memory when IO performance
>    is slow relative to indexing rate.
>
>  - IndexWriter#commit now doesn't block concurrent indexing while flushing all
>    'currently' RAM resident documents to disk. Yet, flushes that occur while a
>    full flush is running are queued and will happen after all DWPT involved
>    in the full flush are done flushing. Applications using multiple threads
>    during indexing that trigger a full flush (eg call commit() or open a new
>    NRT reader) can use significantly more transient memory.
>
>  - IndexWriter#addDocument and IndexWriter.updateDocument can block indexing
>    threads if the number of active + number of flushing DWPT exceed a
>    safety limit. By default this happens if 2 * max number available thread
>    states (DWPTPool) is exceeded. This safety limit prevents applications from
>    exhausting their available memory if flushing can't keep up with
>    concurrently indexing threads.
>
>  - IndexWriter only applies and flushes deletes if the maxBufferedDelTerms
>    l

[jira] [Updated] (LUCENE-3054) add assert to sorts catch broken comparators in tests

2011-04-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3054:


Attachment: LUCENE-3054.patch

Really ugly prototype... I expect the generics/sort policeman will want to jump 
in here anyway :)

But it does catch the problem:
{noformat}
[junit] Testsuite: org.apache.lucene.index.TestCodecs
[junit] Testcase: 
testSepPositionAfterMerge(org.apache.lucene.index.TestCodecs):FAILED
[junit] insane comparator for: 
org.apache.lucene.search.PhraseQuery$PostingsAndFreq
{noformat}

> add assert to sorts catch broken comparators in tests
> -
>
> Key: LUCENE-3054
> URL: https://issues.apache.org/jira/browse/LUCENE-3054
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
> Attachments: LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these 
> broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3054) add assert to sorts catch broken comparators in tests

2011-04-29 Thread Robert Muir (JIRA)
add assert to sorts catch broken comparators in tests
-

 Key: LUCENE-3054
 URL: https://issues.apache.org/jira/browse/LUCENE-3054
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Attachments: LUCENE-3054.patch

Looking at Otis's sort problem on the mailing list, he said:
{noformat}
* looked for other places where this call is made - found it in
MultiPhraseQuery$MultiPhraseWeight and changed that call from
ArrayUtil.quickSort to ArrayUtil.mergeSort
* now we no longer see SorterTemplate.quickSort in deep recursion when we do a
thread dump
{noformat}

I thought this was interesting because PostingsAndFreq's comparator
looks like it needs a tiebreaker.

I think in our sorts we should add some asserts to try to catch some of these 
broken comparators.
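The kind of assert being proposed can be sketched as a brute-force contract check over a small sample: antisymmetry (sgn(compare(a,b)) == -sgn(compare(b,a))) plus a transitivity spot-check. This is a hypothetical helper for illustration, not the attached patch:

```java
import java.util.Comparator;
import java.util.List;

// Brute-force sanity check of a Comparator's contract over a sample of
// elements. A comparator that violates antisymmetry or transitivity can
// send quicksort into deep recursion or produce corrupt orderings.
class ComparatorCheck {
    static <T> boolean isSane(Comparator<T> c, List<T> sample) {
        int n = sample.size();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                int ab = Integer.signum(c.compare(sample.get(i), sample.get(j)));
                int ba = Integer.signum(c.compare(sample.get(j), sample.get(i)));
                if (ab != -ba) return false; // antisymmetry violated
                for (int k = 0; k < n; k++) { // transitivity spot-check
                    int bc = Integer.signum(c.compare(sample.get(j), sample.get(k)));
                    int ac = Integer.signum(c.compare(sample.get(i), sample.get(k)));
                    if (ab > 0 && bc > 0 && ac <= 0) return false;
                }
            }
        }
        return true;
    }
}
```

A missing tiebreaker by itself is legal (distinct elements may compare equal), but comparators that return inconsistent signs for the same pair, as a broken tiebreaker often does, trip the antisymmetry check immediately.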

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3051) don't call SegmentInfo.sizeInBytes for the merging segments

2011-04-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027095#comment-13027095
 ] 

Michael McCandless commented on LUCENE-3051:


Thanks Simon.

bq. mike patch looks good but are we sure we are not accessing the 'live' SI 
somewhere down the path in unsynced context?

Well, we do pass the live info to readPool.get, which could then pass
it to SegmentReader.get, if the reader was not already pooled.  While
in theory other threads could change that info (say, if we are
applying deletes), I believe readerPool prevents that because if dels
are being applied as a merge is kicking off they will share the same
reader, and the 2nd call to get will just return that reader.
Definitely somewhat iffy though...

I'm pretty sure we do not access SI.sizeInBytes elsewhere in IW for
these segments being merged...


> don't call SegmentInfo.sizeInBytes for the merging segments
> ---
>
> Key: LUCENE-3051
> URL: https://issues.apache.org/jira/browse/LUCENE-3051
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3051.patch
>
>
> Selckin has been running Lucene's tests on the RT branch, and hit this:
> {noformat}
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Testcase: 
> testDeleteAllSlowly(org.apache.lucene.index.TestIndexWriter):   FAILED
> [junit] Some threads threw uncaught exceptions!
> [junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
> exceptions!
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:535)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
> [junit] 
> [junit] 
> [junit] Tests run: 67, Failures: 1, Errors: 0, Time elapsed: 38.357 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
> -Dtestmethod=testDeleteAllSlowly 
> -Dtests.seed=-4291771462012978364:4550117847390778918
> [junit] The following exceptions were thrown by threads:
> [junit] *** Thread: Lucene Merge Thread #1 ***
> [junit] org.apache.lucene.index.MergePolicy$MergeException: 
> java.io.FileNotFoundException: _4_1.del
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
> [junit] Caused by: java.io.FileNotFoundException: _4_1.del
> [junit]   at 
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:290)
> [junit]   at 
> org.apache.lucene.store.MockDirectoryWrapper.fileLength(MockDirectoryWrapper.java:549)
> [junit]   at 
> org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:287)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3280)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2956)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447)
> [junit] NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, 
> f6=Pulsing(freqCutoff=15), f7=MockFixedIntBlock(blockSize=1606), 
> f8=SimpleText, f9=MockSep, f1=MockVariableIntBlock(baseBlockSize=99), 
> f0=MockFixedIntBlock(blockSize=1606), f3=Pulsing(freqCutoff=15), f2=MockSep, 
> f5=SimpleText, f4=Standard, f=MockFixedIntBlock(blockSize=1606), c=MockSep, 
> termVector=MockRandom, d9=MockFixedIntBlock(blockSize=1606), 
> d8=Pulsing(freqCutoff=15), d5=SimpleText, d4=Standard, d7=MockRandom, 
> d6=MockVariableIntBlock(baseBlockSize=99), d25=MockRandom, d0=MockRandom, 
> c29=MockFixedIntBlock(blockSize=1606), 
> d24=MockVariableIntBlock(baseBlockSize=99), d1=Standard, c28=Standard, 
> d23=SimpleText, d2=MockFixedIntBlock(blockSize=1606), c27=MockRandom, 
> d22=Standard, d3=MockVariableIntBlock(baseBlockSize=99), 
> d21=Pulsing(freqCutoff=15), d20=MockSep, 
> c22=MockFixedIntBlock(blockSize=1606), c21=Pulsing(freqCutoff=15), 
> c20=MockRandom, d29=MockFixedIntBlock(blockSize=1606), c26=Standard, 
> d28=Pulsing(freqCutoff=15), c25=MockRandom, d27=MockRandom, c24=MockSep, 
> d26=MockVariableIntBlock(baseBlockSize=99), c23=SimpleText, e9=MockRandom, 
> e8=MockSep, e7=SimpleText, e6=

[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking

2011-04-29 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027080#comment-13027080
 ] 

Earwin Burrfoot commented on LUCENE-3041:
-

I vehemently oppose introducing the "visitor design pattern" (classic 
double-dispatch version) into the Query API. It is a badly broken replacement 
for multiple dispatch (i.e., it cannot be easily extended).

Also, from the looks of it (short IRC discussion), user-written visitors and 
rewrite() API have totally different aims.
- rewrite() is very specific (it is a pre-search preparation step that produces a 
runnable query, e.g. expands multi-term queries into OR sequences or wrapped 
filters), but it must work over any kind of user-written Query with possibly 
exotic behaviour (e.g. taking the rewrite from a cache). Consequently, the logic is 
tightly coupled to each Query implementation's innards.
- user-written visitors, on the other hand, may have a multitude of purposes 
(wildly varying logic for node handling and navigation - e.g. some may want to see 
MTQs expanded, and some may not) over a relatively fixed number of possible node 
types.

So the best possible solution so far is to keep rewrite() as-is - it serves its 
purpose quite well.
And introduce a generic reflection-based multiple-dispatch visitor that can walk 
any kind of hierarchy (e.g., in my project I rewrite ASTs to ASTs, ASTs to 
Queries, and Queries to bags of Terms) so people can transform their query 
trees.
The current patch contains a derivative of [my original 
version|https://gist.github.com/dfebaf79f5524e6ea8b4]. And here's a 
[test/example|https://gist.github.com/e5eb67d762be0bce8d28]
This visitor keeps all logic on itself and thus cannot replace rewrite().
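The reflection-based multiple-dispatch idea above can be sketched roughly as follows. This is a hedged illustration, not Earwin's actual code: the node classes are hypothetical stand-ins (not real Lucene Query classes), and the dispatch loop is simplified (no method caching).

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Sketch of a reflection-based multiple-dispatch walker: dispatch picks the
// most specific visit(X) overload for the node's runtime class, falling back
// to superclasses. Node/TermNode/BoolNode are illustrative stand-ins only.
public class ReflectiveWalker {
  public static class Node {}
  public static class TermNode extends Node {
    public final String term;
    public TermNode(String term) { this.term = term; }
  }
  public static class BoolNode extends Node {
    public final List<Node> clauses;
    public BoolNode(Node... clauses) { this.clauses = List.of(clauses); }
  }

  // Find and invoke the most specific visit(X) overload declared on the
  // concrete walker subclass, walking up the node's class hierarchy.
  public Object dispatch(Node node) {
    Class<?> c = node.getClass();
    while (Node.class.isAssignableFrom(c)) {
      try {
        Method m = getClass().getMethod("visit", c);
        return m.invoke(this, node);
      } catch (NoSuchMethodException e) {
        c = c.getSuperclass(); // no overload for this type, try the parent
      } catch (ReflectiveOperationException e) {
        throw new RuntimeException(e);
      }
    }
    throw new IllegalArgumentException("no visit method for " + node.getClass());
  }

  // Example walker: flattens a query tree into its terms.
  public static class TermCollector extends ReflectiveWalker {
    public final List<String> terms = new ArrayList<>();
    public Object visit(TermNode n) { terms.add(n.term); return n; }
    public Object visit(BoolNode n) {
      for (Node clause : n.clauses) dispatch(clause);
      return n;
    }
  }

  public static void main(String[] args) {
    TermCollector collector = new TermCollector();
    collector.dispatch(new BoolNode(new TermNode("lucene"), new TermNode("visitor")));
    System.out.println(collector.terms); // [lucene, visitor]
  }
}
```

Because all logic lives on the walker and dispatch is by runtime type, new node handlers are added by declaring another visit overload, without touching the node classes themselves.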

> Support Query Visting / Walking
> ---
>
> Key: LUCENE-3041
> URL: https://issues.apache.org/jira/browse/LUCENE-3041
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch, 
> LUCENE-3041.patch
>
>
> Out of the discussion in LUCENE-2868, it could be useful to add a generic 
> Query Visitor / Walker that could be used for more advanced rewriting, 
> optimizations or anything that requires state to be stored as each Query is 
> visited.
> We could keep the interface very simple:
> {code}
> public interface QueryVisitor {
>   Query visit(Query query);
> }
> {code}
> and then use a reflection based visitor like Earwin suggested, which would 
> allow implementors to provide visit methods for just the Querys they are 
> interested in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3051) don't call SegmentInfo.sizeInBytes for the merging segments

2011-04-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027078#comment-13027078
 ] 

Simon Willnauer commented on LUCENE-3051:
-

Mike, the patch looks good, but are we sure we are not accessing the 'live' SI 
somewhere down the path in an unsynced context?


> don't call SegmentInfo.sizeInBytes for the merging segments
> ---
>
> Key: LUCENE-3051
> URL: https://issues.apache.org/jira/browse/LUCENE-3051
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3051.patch
>
>
> Selckin has been running Lucene's tests on the RT branch, and hit this:
> {noformat}
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Testcase: 
> testDeleteAllSlowly(org.apache.lucene.index.TestIndexWriter):   FAILED
> [junit] Some threads threw uncaught exceptions!
> [junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
> exceptions!
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:535)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
> [junit] 
> [junit] 
> [junit] Tests run: 67, Failures: 1, Errors: 0, Time elapsed: 38.357 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
> -Dtestmethod=testDeleteAllSlowly 
> -Dtests.seed=-4291771462012978364:4550117847390778918
> [junit] The following exceptions were thrown by threads:
> [junit] *** Thread: Lucene Merge Thread #1 ***
> [junit] org.apache.lucene.index.MergePolicy$MergeException: 
> java.io.FileNotFoundException: _4_1.del
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
> [junit] Caused by: java.io.FileNotFoundException: _4_1.del
> [junit]   at 
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:290)
> [junit]   at 
> org.apache.lucene.store.MockDirectoryWrapper.fileLength(MockDirectoryWrapper.java:549)
> [junit]   at 
> org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:287)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3280)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2956)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447)
> [junit] NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, 
> f6=Pulsing(freqCutoff=15), f7=MockFixedIntBlock(blockSize=1606), 
> f8=SimpleText, f9=MockSep, f1=MockVariableIntBlock(baseBlockSize=99), 
> f0=MockFixedIntBlock(blockSize=1606), f3=Pulsing(freqCutoff=15), f2=MockSep, 
> f5=SimpleText, f4=Standard, f=MockFixedIntBlock(blockSize=1606), c=MockSep, 
> termVector=MockRandom, d9=MockFixedIntBlock(blockSize=1606), 
> d8=Pulsing(freqCutoff=15), d5=SimpleText, d4=Standard, d7=MockRandom, 
> d6=MockVariableIntBlock(baseBlockSize=99), d25=MockRandom, d0=MockRandom, 
> c29=MockFixedIntBlock(blockSize=1606), 
> d24=MockVariableIntBlock(baseBlockSize=99), d1=Standard, c28=Standard, 
> d23=SimpleText, d2=MockFixedIntBlock(blockSize=1606), c27=MockRandom, 
> d22=Standard, d3=MockVariableIntBlock(baseBlockSize=99), 
> d21=Pulsing(freqCutoff=15), d20=MockSep, 
> c22=MockFixedIntBlock(blockSize=1606), c21=Pulsing(freqCutoff=15), 
> c20=MockRandom, d29=MockFixedIntBlock(blockSize=1606), c26=Standard, 
> d28=Pulsing(freqCutoff=15), c25=MockRandom, d27=MockRandom, c24=MockSep, 
> d26=MockVariableIntBlock(baseBlockSize=99), c23=SimpleText, e9=MockRandom, 
> e8=MockSep, e7=SimpleText, e6=MockFixedIntBlock(blockSize=1606), 
> e5=Pulsing(freqCutoff=15), c17=MockFixedIntBlock(blockSize=1606), 
> e3=Standard, d12=MockVariableIntBlock(baseBlockSize=99), 
> c16=Pulsing(freqCutoff=15), e4=SimpleText, 
> d11=MockFixedIntBlock(blockSize=1606), c19=MockSep, e1=MockSep, 
> d14=Pulsing(freqCutoff=15), c18=SimpleText, e2=Pulsing(freqCutoff=15), 
> d13=MockSep, e0=MockVariableIntBlock(baseBlockSize=99), d10=Standard, 
> d19=MockVariableIntBlock(baseBlockSize=99), c11=SimpleText, c10=Standard, 
> d16=Pulsing(freqCutoff=15), c13=MockRandom, 
> c12=MockVariab

[jira] [Updated] (LUCENE-3051) don't call SegmentInfo.sizeInBytes for the merging segments

2011-04-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3051:
---

Attachment: LUCENE-3051.patch

Moves the computation of estimatedMergeBytes into mergeInit (sync'd on IW so 
it's safe to access the SI).
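The fix described above can be sketched with stand-in classes (these are not the real Lucene types; the real mergeInit and SegmentInfo are more involved): take the size snapshot while holding the writer lock, store it on the merge, and let the merge thread read the cached value lock-free later.

```java
import java.util.List;

// Illustrative sketch of the fix: compute estimatedMergeBytes once, inside a
// block synchronized on the writer, instead of calling sizeInBytes() later
// from the unsynchronized merge thread. Stand-in types, not real Lucene code.
public class MergeInitSketch {
  static class SegmentInfo {
    private final long sizeInBytes;
    SegmentInfo(long s) { sizeInBytes = s; }
    long sizeInBytes() { return sizeInBytes; } // may touch live files in real Lucene
  }
  static class OneMerge {
    final List<SegmentInfo> segments;
    volatile long estimatedMergeBytes; // snapshot taken under the writer lock
    OneMerge(List<SegmentInfo> segments) { this.segments = segments; }
  }

  final Object writerLock = new Object();

  void mergeInit(OneMerge merge) {
    synchronized (writerLock) { // safe to read the live SegmentInfos here
      long total = 0;
      for (SegmentInfo si : merge.segments) total += si.sizeInBytes();
      merge.estimatedMergeBytes = total; // merge thread reads this later, lock-free
    }
  }

  public static void main(String[] args) {
    MergeInitSketch writer = new MergeInitSketch();
    OneMerge merge = new OneMerge(List.of(new SegmentInfo(100), new SegmentInfo(250)));
    writer.mergeInit(merge);
    System.out.println(merge.estimatedMergeBytes); // 350
  }
}
```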

> don't call SegmentInfo.sizeInBytes for the merging segments
> ---
>
> Key: LUCENE-3051
> URL: https://issues.apache.org/jira/browse/LUCENE-3051
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3051.patch
>
>
> Selckin has been running Lucene's tests on the RT branch, and hit this:
> {noformat}
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Testcase: 
> testDeleteAllSlowly(org.apache.lucene.index.TestIndexWriter):   FAILED
> [junit] Some threads threw uncaught exceptions!
> [junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
> exceptions!
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:535)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
> [junit] 
> [junit] 
> [junit] Tests run: 67, Failures: 1, Errors: 0, Time elapsed: 38.357 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
> -Dtestmethod=testDeleteAllSlowly 
> -Dtests.seed=-4291771462012978364:4550117847390778918
> [junit] The following exceptions were thrown by threads:
> [junit] *** Thread: Lucene Merge Thread #1 ***
> [junit] org.apache.lucene.index.MergePolicy$MergeException: 
> java.io.FileNotFoundException: _4_1.del
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
> [junit] Caused by: java.io.FileNotFoundException: _4_1.del
> [junit]   at 
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:290)
> [junit]   at 
> org.apache.lucene.store.MockDirectoryWrapper.fileLength(MockDirectoryWrapper.java:549)
> [junit]   at 
> org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:287)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3280)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2956)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447)
> [junit] NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, 
> f6=Pulsing(freqCutoff=15), f7=MockFixedIntBlock(blockSize=1606), 
> f8=SimpleText, f9=MockSep, f1=MockVariableIntBlock(baseBlockSize=99), 
> f0=MockFixedIntBlock(blockSize=1606), f3=Pulsing(freqCutoff=15), f2=MockSep, 
> f5=SimpleText, f4=Standard, f=MockFixedIntBlock(blockSize=1606), c=MockSep, 
> termVector=MockRandom, d9=MockFixedIntBlock(blockSize=1606), 
> d8=Pulsing(freqCutoff=15), d5=SimpleText, d4=Standard, d7=MockRandom, 
> d6=MockVariableIntBlock(baseBlockSize=99), d25=MockRandom, d0=MockRandom, 
> c29=MockFixedIntBlock(blockSize=1606), 
> d24=MockVariableIntBlock(baseBlockSize=99), d1=Standard, c28=Standard, 
> d23=SimpleText, d2=MockFixedIntBlock(blockSize=1606), c27=MockRandom, 
> d22=Standard, d3=MockVariableIntBlock(baseBlockSize=99), 
> d21=Pulsing(freqCutoff=15), d20=MockSep, 
> c22=MockFixedIntBlock(blockSize=1606), c21=Pulsing(freqCutoff=15), 
> c20=MockRandom, d29=MockFixedIntBlock(blockSize=1606), c26=Standard, 
> d28=Pulsing(freqCutoff=15), c25=MockRandom, d27=MockRandom, c24=MockSep, 
> d26=MockVariableIntBlock(baseBlockSize=99), c23=SimpleText, e9=MockRandom, 
> e8=MockSep, e7=SimpleText, e6=MockFixedIntBlock(blockSize=1606), 
> e5=Pulsing(freqCutoff=15), c17=MockFixedIntBlock(blockSize=1606), 
> e3=Standard, d12=MockVariableIntBlock(baseBlockSize=99), 
> c16=Pulsing(freqCutoff=15), e4=SimpleText, 
> d11=MockFixedIntBlock(blockSize=1606), c19=MockSep, e1=MockSep, 
> d14=Pulsing(freqCutoff=15), c18=SimpleText, e2=Pulsing(freqCutoff=15), 
> d13=MockSep, e0=MockVariableIntBlock(baseBlockSize=99), d10=Standard, 
> d19=MockVariableIntBlock(baseBlockSize=99), c11=SimpleText, c10=Standard, 
> d16=Pulsing(freqCutoff=15), c13=MockRandom, 
> c12=MockVariableIntBlock(baseBlockSize=99),

[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027058#comment-13027058
 ] 

Michael Busch commented on LUCENE-3023:
---

Just wanted to say: you guys totally rock! Great teamwork here with all the 
work involved in getting the branch merged back. I'm sorry I couldn't help much 
in the last few weeks.

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023-svn-diff.patch, 
> LUCENE-3023-ws-changes.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023_CHANGES.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_iw_iwc_jdoc.patch, 
> LUCENE-3023_simonw_review.patch, LUCENE-3023_svndiff.patch, 
> LUCENE-3023_svndiff.patch, diffMccand.py, diffSources.patch, 
> diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking

2011-04-29 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027054#comment-13027054
 ] 

David Smiley commented on LUCENE-3041:
--

Yes! I enthusiastically support introducing the visitor design pattern into the 
Query API.  I've polled the community on this before and got positive responses 
from a few committers, but I haven't yet had the time to do anything.  It's 
great to see you've gotten the ball rolling, Chris.  

I haven't looked at your patch yet.  Query.rewrite() is definitely a candidate 
for reworking in terms of this new pattern.

> Support Query Visting / Walking
> ---
>
> Key: LUCENE-3041
> URL: https://issues.apache.org/jira/browse/LUCENE-3041
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Chris Male
>Priority: Minor
> Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch, 
> LUCENE-3041.patch
>
>
> Out of the discussion in LUCENE-2868, it could be useful to add a generic 
> Query Visitor / Walker that could be used for more advanced rewriting, 
> optimizations or anything that requires state to be stored as each Query is 
> visited.
> We could keep the interface very simple:
> {code}
> public interface QueryVisitor {
>   Query visit(Query query);
> }
> {code}
> and then use a reflection based visitor like Earwin suggested, which would 
> allow implementors to provide visit methods for just the Querys they are 
> interested in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3053) improve test coverage for Multi*

2011-04-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3053:


Attachment: LUCENE-3053.patch

Updated patch, fixes another false fail in xml-query-parser 
(http://www.selckin.be/trunk-3053-p2-0.txt)

> improve test coverage for Multi*
> 
>
> Key: LUCENE-3053
> URL: https://issues.apache.org/jira/browse/LUCENE-3053
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3053.patch, LUCENE-3053.patch, LUCENE-3053.patch
>
>
> It seems like an easy win that when the test calls newSearcher(), 
> it should sometimes wrap the reader with a SlowMultiReaderWrapper.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3052) PerFieldCodecWrapper.loadTermsIndex concurrency problem

2011-04-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3052.


Resolution: Fixed

Committed a missing sync in the test's codec.

> PerFieldCodecWrapper.loadTermsIndex concurrency problem
> ---
>
> Key: LUCENE-3052
> URL: https://issues.apache.org/jira/browse/LUCENE-3052
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> Selckin's while(1) testing on RT branch hit another error:
> {noformat}
> [junit] Testsuite: org.apache.lucene.TestExternalCodecs
> [junit] Testcase: 
> testPerFieldCodec(org.apache.lucene.TestExternalCodecs):Caused an 
> ERROR
> [junit] (null)
> [junit] java.lang.NullPointerException
> [junit]   at 
> org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.loadTermsIndex(PerFieldCodecWrapper.java:202)
> [junit]   at 
> org.apache.lucene.index.SegmentReader.loadTermsIndex(SegmentReader.java:1005)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:652)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:609)
> [junit]   at 
> org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:276)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2660)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2651)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:381)
> [junit]   at 
> org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
> [junit]   at 
> org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:541)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
> [junit] 
> [junit] 
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.909 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestExternalCodecs 
> -Dtestmethod=testPerFieldCodec 
> -Dtests.seed=-7296204858082494534:5010909751437000758
> [junit] WARNING: test method: 'testPerFieldCodec' left thread running: 
> merge thread: _i(4.0):Cv130 _m(4.0):Cv30 _n(4.0):cv10 into _o
> [junit] RESOURCE LEAK: test method: 'testPerFieldCodec' left 1 thread(s) 
> running
> [junit] NOTE: test params are: codec=PreFlex, locale=zh_TW, 
> timezone=America/Santo_Domingo
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestDemo, TestExternalCodecs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=2,free=104153512,total=125632512
> [junit] -  ---
> [junit] TEST org.apache.lucene.TestExternalCodecs FAILED
> [junit] Exception in thread "Lucene Merge Thread #5" 
> org.apache.lucene.util.ThreadInterruptedException: 
> java.lang.InterruptedException: sleep interrupted
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:505)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
> [junit] Caused by: java.lang.InterruptedException: sleep interrupted
> [junit]   at java.lang.Thread.sleep(Native Method)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:503)
> [junit]   ... 1 more
> {noformat}
> I suspect this is also a trunk issue, but I can't reproduce it yet.
> I think this is happening because the codecs HashMap is changing (via another 
> thread), while .loadTermsIndex is called.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3053) improve test coverage for Multi*

2011-04-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3053:


Attachment: LUCENE-3053.patch

Updated patch: fixes a false fail in TestMatchAllDocsQuery found by selckin: 
http://www.selckin.be/trunk-3053-0.txt


> improve test coverage for Multi*
> 
>
> Key: LUCENE-3053
> URL: https://issues.apache.org/jira/browse/LUCENE-3053
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3053.patch, LUCENE-3053.patch
>
>
> It seems like an easy win that when the test calls newSearcher(), 
> it should sometimes wrap the reader with a SlowMultiReaderWrapper.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3053) improve test coverage for Multi*

2011-04-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026995#comment-13026995
 ] 

Robert Muir commented on LUCENE-3053:
-

Ah, please ignore that one: pretty sure it is LUCENE-3025/LUCENE-2991 all 
over again... it fails on trunk too.

> improve test coverage for Multi*
> 
>
> Key: LUCENE-3053
> URL: https://issues.apache.org/jira/browse/LUCENE-3053
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3053.patch
>
>
> It seems like an easy win that when the test calls newSearcher(), 
> it should sometimes wrap the reader with a SlowMultiReaderWrapper.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3053) improve test coverage for Multi*

2011-04-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026993#comment-13026993
 ] 

Robert Muir commented on LUCENE-3053:
-

I did hit one fail:

{noformat}
ant test -Dtestcase=TestIndexWriterExceptions 
-Dtestmethod=testExceptionsDuringCommit 
-Dtests.seed=-2996541401386755449:-7422779128529852458
{noformat}

Not sure if it's Windows-only, and it's likely unrelated, but for the seed to 
work you probably need to apply this patch...


> improve test coverage for Multi*
> 
>
> Key: LUCENE-3053
> URL: https://issues.apache.org/jira/browse/LUCENE-3053
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3053.patch
>
>
> It seems like an easy win that when the test calls newSearcher(), 
> it should sometimes wrap the reader with a SlowMultiReaderWrapper.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3023:
--

Attachment: LUCENE-3023-ws-changes.patch

Here, finally, are all the whitespace changes in one patch. They will be 
committed, but are not included in the main patch.

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023-svn-diff.patch, 
> LUCENE-3023-ws-changes.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023_CHANGES.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_iw_iwc_jdoc.patch, 
> LUCENE-3023_simonw_review.patch, LUCENE-3023_svndiff.patch, 
> LUCENE-3023_svndiff.patch, diffMccand.py, diffSources.patch, 
> diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3053) improve test coverage for Multi*

2011-04-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3053:


Attachment: LUCENE-3053.patch

Here's a patch. I think I fixed the various false fails, but it would be good 
to 'beast' the tests a few times to see if any are left.

Also tried to make TestRegexpRandom2 meaner...

> improve test coverage for Multi*
> 
>
> Key: LUCENE-3053
> URL: https://issues.apache.org/jira/browse/LUCENE-3053
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3053.patch
>
>
> It seems like an easy win that when the test calls newSearcher(), 
> it should sometimes wrap the reader with a SlowMultiReaderWrapper.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3053) improve test coverage for Multi*

2011-04-29 Thread Robert Muir (JIRA)
improve test coverage for Multi*


 Key: LUCENE-3053
 URL: https://issues.apache.org/jira/browse/LUCENE-3053
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0
 Attachments: LUCENE-3053.patch

It seems like an easy win that when the test calls newSearcher(), 
it should sometimes wrap the reader with a SlowMultiReaderWrapper.
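The idea above can be sketched with stand-in types (these are hypothetical placeholders, not Lucene's real test framework or reader classes): newSearcher() randomly wraps the reader so both the per-segment path and the slow composite (Multi*) path get exercised across randomized test runs.

```java
import java.util.Random;

// Illustrative sketch of random SlowMultiReaderWrapper wrapping in a test's
// newSearcher() helper. All types here are stand-ins for the real Lucene ones.
public class NewSearcherSketch {
  interface Reader {}
  static class DirectoryReader implements Reader {}
  static class SlowMultiReaderWrapper implements Reader {
    final Reader in;
    SlowMultiReaderWrapper(Reader in) { this.in = in; }
  }
  static class IndexSearcher {
    final Reader reader;
    IndexSearcher(Reader reader) { this.reader = reader; }
  }

  static IndexSearcher newSearcher(Reader r, Random random) {
    if (random.nextBoolean()) {
      // Sometimes take the slow composite-reader path to improve Multi* coverage.
      r = new SlowMultiReaderWrapper(r);
    }
    return new IndexSearcher(r);
  }

  public static void main(String[] args) {
    // A real test would use the shared, reproducible test seed here.
    IndexSearcher searcher = newSearcher(new DirectoryReader(), new Random());
    System.out.println(searcher.reader.getClass().getSimpleName());
  }
}
```

Because the choice is driven by the test's random seed, any failure on the slow path is reproducible with the printed seed, which is what makes this kind of coverage cheap to add.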


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3023:
--

Attachment: LUCENE-3023-svn-diff.patch

Here is the final SVN diff. To work around some itches with SVN, the following 
was done:
- reverted everything outside the lucene sub folder
- used the previously created manual diff to get a list of all changed files 
(using the patchutils command lsdiff)
- used "svn -q status | sed 's/^//' > ../svn-files.txt" to get all 
files affected after the merge
- intersected both file lists (the lsdiff one and the svn status one) to find all 
files that are in reality unchanged but still affected by SVN (these are all files 
that were added after branching - a known limitation of SVN: files added after 
branching are "replaced" by merge reintegrate, losing all history). Stored 
those files in unchanged.txt
- used the intersected file list to revert everything: cat ../unchanged.txt | 
xargs svn revert
- finally did a record-only merge again to fix the merge props reverted by the 
previous revert

My checkout is now ready to commit.

If we have some minor problems with the patch, please wait to fix them until 
after the commit. If there are serious problems, we can fix them in the realtime 
branch and merge manually (I can do that later).

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023-svn-diff.patch, LUCENE-3023.patch, 
> LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_CHANGES.patch, 
> LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_simonw_review.patch, 
> LUCENE-3023_svndiff.patch, LUCENE-3023_svndiff.patch, diffMccand.py, 
> diffSources.patch, diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7550 - Failure

2011-04-29 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7550/

1 tests failed.
REGRESSION:  org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2894)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
at java.lang.StringBuffer.append(StringBuffer.java:337)
at 
java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617)
at 
org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93)
at 
org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304)
at 
org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1091)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1023)




Build Log (for compile errors):
[...truncated 9240 lines...]






[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3023:
--

Attachment: LUCENE-3023.patch

I merged the frozen branch again.

Attached is a first patch for reviewing the code changes (not an SVN diff), 
created by the following command between 2 fresh checkouts, one of them after 
"svn merge --reintegrate":

{noformat}
diff -urNb --strip-trailing-cr trunk-lusolr1 trunk-lusolr2 | filterdiff -x 
"*.svn*" --strip 1 --clean > LUCENE-3023.patch
{noformat}

This patch is not intended to be applied; it's more to verify the changes 
(therefore all whitespace changes created by merging were excluded). 

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023.patch, LUCENE-3023_CHANGES.patch, LUCENE-3023_CHANGES.patch, 
> LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_simonw_review.patch, 
> LUCENE-3023_svndiff.patch, LUCENE-3023_svndiff.patch, diffMccand.py, 
> diffSources.patch, diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.




Re: Highlighting in fields other than content

2011-04-29 Thread Tommaso Teofili
I think your issue may depend on the keyword field having stored="false" or
the field type not defining a Tokenizer.
You may find the following useful:
http://wiki.apache.org/solr/FieldOptionsByUseCase
My 2 cents,
Tommaso

2011/4/29 Pavel Kukačka 

> Hello,
>
>I've got a (probably trivial) issue I can't resolve with Solr 3.1:
> I have a document with common fields (title, keywords, content) and I'm
> trying to use highlighting.
>With the content field there is no problem; works normally. However,
> when I search for a document via its keyword, the document is found, but
> the response doesn't have the highlighted snippet - there is only an
> empty node - like this:
> **
> .
> .
> .
> 
>  
> 
> 
> 
>
> As for the highlighting params, I have set:
>hl=on
>hl.fl=*
>
>
> If I just substitute the searchterm for something from the content, the
> resulting response is fine - like this:
> 
> .
> .
> .
> 
>  
>
>  ustanovení těchto VOP, ZOP, smlouvy či   id="highlighting">družstvem na straně jedné a
> Klienty na straně druhé
> na jiného
>
> .
> .
> .
> 
>
> Does anyone see what I've omitted?
>
> Cheers,
> Pavel
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


Re: Highlighting in fields other than content

2011-04-29 Thread Erick Erickson
What is your field definition for "keyword"?
In particular, is it stored?

This page might help.
http://wiki.apache.org/solr/FieldOptionsByUseCase?highlight=%28termvector%29|%28retrieve%29|%28contents%29

Best
Erick

On Fri, Apr 29, 2011 at 8:56 AM, Pavel Kukačka  wrote:
> Hello,
>
>        I've got a (probably trivial) issue I can't resolve with Solr 3.1:
> I have a document with common fields (title, keywords, content) and I'm
> trying to use highlighting.
>        With the content field there is no problem; works normally. However,
> when I search for a document via its keyword, the document is found, but
> the response doesn't have the highlighted snippet - there is only an
> empty node - like this:
> **
> .
> .
> .
> 
>  
> 
> 
> 
>
> As for the highlighting params, I have set:
>        hl=on
>        hl.fl=*
>
>
> If I just substitute the searchterm for something from the content, the
> resulting response is fine - like this:
> 
> .
> .
> .
> 
>  
>    
>  ustanovení těchto VOP, ZOP, smlouvy či   id="highlighting">družstvem na straně jedné a
> Klienty na straně druhé
> na jiného
>    
> .
> .
> .
> 
>
> Does anyone see what I've omitted?
>
> Cheers,
> Pavel
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>




Highlighting in fields other than content

2011-04-29 Thread Pavel Kukačka
Hello,

I've got a (probably trivial) issue I can't resolve with Solr 3.1: 
I have a document with common fields (title, keywords, content) and I'm
trying to use highlighting.
With the content field there is no problem; works normally. However,
when I search for a document via its keyword, the document is found, but
the response doesn't have the highlighted snippet - there is only an
empty node - like this:
**
.
.
.

  




As for the highlighting params, I have set:
hl=on
hl.fl=*


If I just substitute the searchterm for something from the content, the
resulting response is fine - like this:

.
.
.

  

 ustanovení těchto VOP, ZOP, smlouvy či  družstvem na straně jedné a
Klienty na straně druhé 
na jiného

.
.
.


Does anyone see what I've omitted?

Cheers,
Pavel


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2471) Localparams not working with 2 fq parameters using qt=name

2011-04-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026951#comment-13026951
 ] 

Yonik Seeley commented on SOLR-2471:


bq. Is it possible to have two QT parameters in the same call to Solr?

Nope, see the response from Hoss: "qt selects the request handler used, but 
when local params are parsed the handler has already been chosen (there is 
only one handler per request)"


> Localparams not working with 2 fq parameters using qt=name
> --
>
> Key: SOLR-2471
> URL: https://issues.apache.org/jira/browse/SOLR-2471
> Project: Solr
>  Issue Type: Bug
>Reporter: Bill Bell
>
> We are having a problem with the following query. If we have two localparams 
> (using fq) and use QT= it does not work.
> This does not find any results:
> http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax
>  qt=namespec v=$qspec}&fq={!type=dismax qt=dismaxname 
> v=$qname}&q=_val_:"{!type=dismax qt=namespec v=$qspec}" _val_:"{!type=dismax 
> qt=dismaxname 
> v=$qname}"&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score
>  desc&rows=1000&start=0
> This works okay. It returns a few results.
> http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax
>  qf=$qqf v=$qspec}&fq={!type=dismax qt=dismaxname 
> v=$qname}&q=_val_:"{!type=dismax qf=$qqf  v=$qspec}" _val_:"{!type=dismax 
> qt=dismaxname v=$qname}" &qqf=specialties_ngram^1.0 
> specialties_search^2.0&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score
>  desc&rows=1000&start=0
> We would like to use a QT for both terms but it seems there is some kind of 
> bug when using two localparams and dismax filters with QT.




[jira] [Assigned] (LUCENE-3052) PerFieldCodecWrapper.loadTermsIndex concurrency problem

2011-04-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3052:
--

Assignee: Michael McCandless

> PerFieldCodecWrapper.loadTermsIndex concurrency problem
> ---
>
> Key: LUCENE-3052
> URL: https://issues.apache.org/jira/browse/LUCENE-3052
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.0
>
>
> Selckin's while(1) testing on RT branch hit another error:
> {noformat}
> [junit] Testsuite: org.apache.lucene.TestExternalCodecs
> [junit] Testcase: 
> testPerFieldCodec(org.apache.lucene.TestExternalCodecs):Caused an 
> ERROR
> [junit] (null)
> [junit] java.lang.NullPointerException
> [junit]   at 
> org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.loadTermsIndex(PerFieldCodecWrapper.java:202)
> [junit]   at 
> org.apache.lucene.index.SegmentReader.loadTermsIndex(SegmentReader.java:1005)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:652)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:609)
> [junit]   at 
> org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:276)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2660)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2651)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:381)
> [junit]   at 
> org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
> [junit]   at 
> org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:541)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
> [junit] 
> [junit] 
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.909 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestExternalCodecs 
> -Dtestmethod=testPerFieldCodec 
> -Dtests.seed=-7296204858082494534:5010909751437000758
> [junit] WARNING: test method: 'testPerFieldCodec' left thread running: 
> merge thread: _i(4.0):Cv130 _m(4.0):Cv30 _n(4.0):cv10 into _o
> [junit] RESOURCE LEAK: test method: 'testPerFieldCodec' left 1 thread(s) 
> running
> [junit] NOTE: test params are: codec=PreFlex, locale=zh_TW, 
> timezone=America/Santo_Domingo
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestDemo, TestExternalCodecs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=2,free=104153512,total=125632512
> [junit] -  ---
> [junit] TEST org.apache.lucene.TestExternalCodecs FAILED
> [junit] Exception in thread "Lucene Merge Thread #5" 
> org.apache.lucene.util.ThreadInterruptedException: 
> java.lang.InterruptedException: sleep interrupted
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:505)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
> [junit] Caused by: java.lang.InterruptedException: sleep interrupted
> [junit]   at java.lang.Thread.sleep(Native Method)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:503)
> [junit]   ... 1 more
> {noformat}
> I suspect this is also a trunk issue, but I can't reproduce it yet.
> I think this is happening because the codecs HashMap is changing (via another 
> thread), while .loadTermsIndex is called.




[jira] [Commented] (LUCENE-3052) PerFieldCodecWrapper.loadTermsIndex concurrency problem

2011-04-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026950#comment-13026950
 ] 

Michael McCandless commented on LUCENE-3052:


This repro line seems to work:
{noformat}
ant test-core -Dtestcase=TestExternalCodecs 
-Dtests.seed=-7296204858082494534:5010909751437000758 -Dtests.iter=200 
-Dtests.iter.min=1
{noformat}

> PerFieldCodecWrapper.loadTermsIndex concurrency problem
> ---
>
> Key: LUCENE-3052
> URL: https://issues.apache.org/jira/browse/LUCENE-3052
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> Selckin's while(1) testing on RT branch hit another error:
> {noformat}
> [junit] Testsuite: org.apache.lucene.TestExternalCodecs
> [junit] Testcase: 
> testPerFieldCodec(org.apache.lucene.TestExternalCodecs):Caused an 
> ERROR
> [junit] (null)
> [junit] java.lang.NullPointerException
> [junit]   at 
> org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.loadTermsIndex(PerFieldCodecWrapper.java:202)
> [junit]   at 
> org.apache.lucene.index.SegmentReader.loadTermsIndex(SegmentReader.java:1005)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:652)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:609)
> [junit]   at 
> org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:276)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2660)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2651)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:381)
> [junit]   at 
> org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
> [junit]   at 
> org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:541)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
> [junit] 
> [junit] 
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.909 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestExternalCodecs 
> -Dtestmethod=testPerFieldCodec 
> -Dtests.seed=-7296204858082494534:5010909751437000758
> [junit] WARNING: test method: 'testPerFieldCodec' left thread running: 
> merge thread: _i(4.0):Cv130 _m(4.0):Cv30 _n(4.0):cv10 into _o
> [junit] RESOURCE LEAK: test method: 'testPerFieldCodec' left 1 thread(s) 
> running
> [junit] NOTE: test params are: codec=PreFlex, locale=zh_TW, 
> timezone=America/Santo_Domingo
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestDemo, TestExternalCodecs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=2,free=104153512,total=125632512
> [junit] -  ---
> [junit] TEST org.apache.lucene.TestExternalCodecs FAILED
> [junit] Exception in thread "Lucene Merge Thread #5" 
> org.apache.lucene.util.ThreadInterruptedException: 
> java.lang.InterruptedException: sleep interrupted
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:505)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
> [junit] Caused by: java.lang.InterruptedException: sleep interrupted
> [junit]   at java.lang.Thread.sleep(Native Method)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:503)
> [junit]   ... 1 more
> {noformat}
> I suspect this is also a trunk issue, but I can't reproduce it yet.
> I think this is happening because the codecs HashMap is changing (via another 
> thread), while .loadTermsIndex is called.




[jira] [Updated] (LUCENE-3052) PerFieldCodecWrapper.loadTermsIndex concurrency problem

2011-04-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3052:
---

Affects Version/s: 4.0
Fix Version/s: 4.0

> PerFieldCodecWrapper.loadTermsIndex concurrency problem
> ---
>
> Key: LUCENE-3052
> URL: https://issues.apache.org/jira/browse/LUCENE-3052
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
> Fix For: 4.0
>
>
> Selckin's while(1) testing on RT branch hit another error:
> {noformat}
> [junit] Testsuite: org.apache.lucene.TestExternalCodecs
> [junit] Testcase: 
> testPerFieldCodec(org.apache.lucene.TestExternalCodecs):Caused an 
> ERROR
> [junit] (null)
> [junit] java.lang.NullPointerException
> [junit]   at 
> org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.loadTermsIndex(PerFieldCodecWrapper.java:202)
> [junit]   at 
> org.apache.lucene.index.SegmentReader.loadTermsIndex(SegmentReader.java:1005)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:652)
> [junit]   at 
> org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:609)
> [junit]   at 
> org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:276)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2660)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2651)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:381)
> [junit]   at 
> org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
> [junit]   at 
> org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:541)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
> [junit] 
> [junit] 
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.909 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestExternalCodecs 
> -Dtestmethod=testPerFieldCodec 
> -Dtests.seed=-7296204858082494534:5010909751437000758
> [junit] WARNING: test method: 'testPerFieldCodec' left thread running: 
> merge thread: _i(4.0):Cv130 _m(4.0):Cv30 _n(4.0):cv10 into _o
> [junit] RESOURCE LEAK: test method: 'testPerFieldCodec' left 1 thread(s) 
> running
> [junit] NOTE: test params are: codec=PreFlex, locale=zh_TW, 
> timezone=America/Santo_Domingo
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestDemo, TestExternalCodecs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=2,free=104153512,total=125632512
> [junit] -  ---
> [junit] TEST org.apache.lucene.TestExternalCodecs FAILED
> [junit] Exception in thread "Lucene Merge Thread #5" 
> org.apache.lucene.util.ThreadInterruptedException: 
> java.lang.InterruptedException: sleep interrupted
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:505)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
> [junit] Caused by: java.lang.InterruptedException: sleep interrupted
> [junit]   at java.lang.Thread.sleep(Native Method)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:503)
> [junit]   ... 1 more
> {noformat}
> I suspect this is also a trunk issue, but I can't reproduce it yet.
> I think this is happening because the codecs HashMap is changing (via another 
> thread), while .loadTermsIndex is called.




[jira] [Created] (LUCENE-3052) PerFieldCodecWrapper.loadTermsIndex concurrency problem

2011-04-29 Thread Michael McCandless (JIRA)
PerFieldCodecWrapper.loadTermsIndex concurrency problem
---

 Key: LUCENE-3052
 URL: https://issues.apache.org/jira/browse/LUCENE-3052
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Michael McCandless


Selckin's while(1) testing on RT branch hit another error:
{noformat}
[junit] Testsuite: org.apache.lucene.TestExternalCodecs
[junit] Testcase: testPerFieldCodec(org.apache.lucene.TestExternalCodecs):  
Caused an ERROR
[junit] (null)
[junit] java.lang.NullPointerException
[junit] at 
org.apache.lucene.index.PerFieldCodecWrapper$FieldsReader.loadTermsIndex(PerFieldCodecWrapper.java:202)
[junit] at 
org.apache.lucene.index.SegmentReader.loadTermsIndex(SegmentReader.java:1005)
[junit] at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:652)
[junit] at 
org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:609)
[junit] at 
org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:276)
[junit] at 
org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2660)
[junit] at 
org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2651)
[junit] at 
org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:381)
[junit] at 
org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
[junit] at 
org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:541)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
[junit] 
[junit] 
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.909 sec
[junit] 
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestExternalCodecs 
-Dtestmethod=testPerFieldCodec 
-Dtests.seed=-7296204858082494534:5010909751437000758
[junit] WARNING: test method: 'testPerFieldCodec' left thread running: 
merge thread: _i(4.0):Cv130 _m(4.0):Cv30 _n(4.0):cv10 into _o
[junit] RESOURCE LEAK: test method: 'testPerFieldCodec' left 1 thread(s) 
running
[junit] NOTE: test params are: codec=PreFlex, locale=zh_TW, 
timezone=America/Santo_Domingo
[junit] NOTE: all tests run in this JVM:
[junit] [TestDemo, TestExternalCodecs]
[junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
(64-bit)/cpus=8,threads=2,free=104153512,total=125632512
[junit] -  ---
[junit] TEST org.apache.lucene.TestExternalCodecs FAILED
[junit] Exception in thread "Lucene Merge Thread #5" 
org.apache.lucene.util.ThreadInterruptedException: 
java.lang.InterruptedException: sleep interrupted
[junit] at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:505)
[junit] at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
[junit] Caused by: java.lang.InterruptedException: sleep interrupted
[junit] at java.lang.Thread.sleep(Native Method)
[junit] at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:503)
[junit] ... 1 more
{noformat}

I suspect this is also a trunk issue, but I can't reproduce it yet.

I think this is happening because the codecs HashMap is changing (via another 
thread), while .loadTermsIndex is called.




Re: jira issues falling off the radar -- "Next" JIRA version

2011-04-29 Thread Michael McCandless
On Fri, Apr 29, 2011 at 12:12 AM, David Smiley (@MITRE.org)
 wrote:

> (Comments on SOLR-2191 between Mark & I were starting to get off-topic with
> respect to the issue so I am continuing the conversation here)
>
> A lot of JIRA issues seem to fall off the radar, IMO. I'm talking about
> issues that have patches and are basically ready to go.  There are multiple
> ways to address this but at the moment I am going to just bring up one.
> Looking at the versions in JIRA one can assign an issue to
> https://issues.apache.org/jira/browse/SOLR#selectedTab=com.atlassian.jira.plugin.system.project%3Aversions-panel
> I see the version named "Next", with this description: "Placeholder for
> commiters to track issues that are not ready to commit, but seem close
> enough to being ready to warrant focus before the next feature release".
> This version and what it implies is a common pattern in use of JIRA that I
> too use for projects I manage for my employer. It appears that for the 3.1
> release, nobody looked through the issues assigned to "Next", and
> consequently, some issues like SOLR-2191 were forgotten despite being ready
> to go.  Looking through the wiki I see information on how to do a release
> http://wiki.apache.org/solr/HowToRelease and release suggestions but no
> information on what to do in advance of a release.  I also don't see any
> administrative tasks on managing the "Next" version in JIRA.  So I think
> either the "Next" version should be used effectively, or if that isn't going
> to happen then delete this version.

I agree Next is dangerous!

It'd be nice if Jira could auto-magically treat Next as whatever
release really is "next".  EG, say we all agree 3.2 is our next
release, then ideally Jira would treat all Next issues as if they were
marked with 3.2.

But... lacking that, maybe we really shouldn't use Next at all, and
just use 3.2?  Having to step through these issues and move them to
the next release on releasing is also healthy, ie, it's good that we
see/review them, think about why we didn't get it done on the current
release, etc.

> On a related note, I don't know what to make of the "1.5" version, nor what
> to make of issues marked as Closed for "Next".  Some house cleaning is in
> order.

We should clean these up.  Should we just roll them over to 3.2?

Mike

http://blog.mikemccandless.com




[jira] [Commented] (SOLR-2104) DIH special command $deleteDocById dosn't skip the document and doesn't increment the deleted statistics

2011-04-29 Thread Juan Pablo Mora (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026939#comment-13026939
 ] 

Juan Pablo Mora commented on SOLR-2104:
---

In Solr 3.1 it doesn't update the statistics either. I think it is a bug.

> DIH special command $deleteDocById dosn't skip the document and doesn't 
> increment the deleted statistics
> 
>
> Key: SOLR-2104
> URL: https://issues.apache.org/jira/browse/SOLR-2104
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4, 1.4.1
>Reporter: Ephraim Ofir
>Priority: Minor
>
> 1. Not sure it's a bug, but looks like a bug to me - if the query returns any 
> values other than $deleteDocById for the row you want deleted, it deletes the 
> row but also re-adds it with the rest of the data, so in effect the row isn't 
> deleted.  In order to work around this issue, you have to either make sure no 
> data other than $deleteDocById= exists in rows to be deleted or add 
> $skipDoc='true'
> (which I think is a little counter-intuitive, but was the better choice in my 
> case).  My query looks something like:
> SELECT u.id,
>u.name,
>...
>IF(u.delete_flag > 0, u.id, NULL) AS $deleteDocById,
>IF(u.delete_flag > 0, 'true', NULL) AS $skipDoc FROM users_tb u
> 2. $deleteDocById doesn't update the statistics of deleted documents.
> This has 2 downsides, the obvious one is that you don't know if/how many 
> documents were deleted, the not-so-obvious one is that if your import 
> contains only deleted items, it won't be committed automatically by DIH and 
> you'll have to commit it manually.




Code Freeze on realtime_search branch

2011-04-29 Thread Simon Willnauer
Hey folks,

LUCENE-3023 aims to land the considerably large
DocumentsWriterPerThread (DWPT) refactoring on trunk.
During the last weeks we have put a lot of effort into cleaning the
code up, fixing javadocs and running tests locally
as well as on Jenkins. We reached the point where we are able to
create a final patch for review and land this
exciting refactoring on trunk very soon. I committed the CHANGES.TXT
entry (also appended below) a couple of minutes ago so from now on
we freeze the branch for final review (Robert can you create a new
"final" patch and upload to LUCENE-3023).
Any comments should go to [1] or as a reply to this email. If there is
no blocker coming up we plan to reintegrate the
branch and commit it to trunk early next week. For those who want some
background what DWPT does read: [2]

Note: this change will not change the index file format so there is no
need to reindex for trunk users. Yet, I will send a heads up next week
with an
overview of that has changed.

Simon

[1] https://issues.apache.org/jira/browse/LUCENE-3023
[2] http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/


* LUCENE-2956, LUCENE-2573, LUCENE-2324, LUCENE-2555: Changes from
  DocumentsWriterPerThread:

  - IndexWriter now uses a DocumentsWriter per thread when indexing documents.
Each DocumentsWriterPerThread indexes documents in its own private segment,
and the in memory segments are no longer merged on flush.  Instead, each
segment is separately flushed to disk and subsequently merged with normal
segment merging.

  - DocumentsWriterPerThread (DWPT) is now flushed concurrently based on a
FlushPolicy.  When a DWPT is flushed, a fresh DWPT is swapped in so that
indexing may continue concurrently with flushing.  The selected
DWPT flushes all its RAM resident documents to disk.  Note: Segment flushes
don't flush all RAM resident documents but only the documents private to
the DWPT selected for flushing.

  - Flushing is now controlled by FlushPolicy that is called for every add,
update or delete on IndexWriter. By default DWPTs are flushed either on
maxBufferedDocs per DWPT or the global active used memory. Once the active
memory exceeds ramBufferSizeMB only the largest DWPT is selected for
flushing and the memory used by this DWPT is subtracted from the active
memory and added to a flushing memory pool, which can lead to temporarily
higher memory usage due to ongoing indexing.
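The accounting in the bullet above (subtract the largest active DWPT's memory and move it to a flushing pool) can be sketched roughly as follows. This is a hypothetical illustration, not Lucene's actual implementation; the class and field names (FlushAccountingSketch, activeBytes, flushingBytes) are made up:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the DWPT flush accounting described above: once active RAM
// exceeds the buffer limit, the largest DWPT is moved from the "active"
// pool to a "flushing" pool, so total (active + flushing) memory can
// temporarily exceed the configured limit while flushing catches up.
class FlushAccountingSketch {
    static final long RAM_BUFFER_BYTES = 64L * 1024 * 1024; // e.g. 64 MB

    List<Long> activeBytes = new ArrayList<>(); // per-DWPT RAM usage
    long flushingBytes = 0;                     // memory owned by flushing DWPTs

    long activeTotal() {
        long sum = 0;
        for (long b : activeBytes) sum += b;
        return sum;
    }

    /** Called after every add/update/delete; selects the largest DWPT to flush. */
    void maybeTriggerFlush() {
        while (activeTotal() > RAM_BUFFER_BYTES && !activeBytes.isEmpty()) {
            int largest = 0;
            for (int i = 1; i < activeBytes.size(); i++) {
                if (activeBytes.get(i) > activeBytes.get(largest)) largest = i;
            }
            // subtract from active memory, add to the flushing pool
            flushingBytes += activeBytes.remove(largest);
        }
    }
}
```

Picking the largest DWPT frees the most memory per flush, which is why a single flush is usually enough to bring the active total back under the limit.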

  - IndexWriter can now utilize ramBufferSize > 2048 MB. Each DWPT can address
up to 2048 MB of memory, so the ramBufferSize is now bounded by the max
number of DWPTs available in the used DocumentsWriterPerThreadPool.
IndexWriter's net memory consumption can grow far beyond the 2048 MB limit if
the application can use all available DWPTs. To prevent a DWPT from
exhausting its address space, IndexWriter will forcefully flush a DWPT if its
hard memory limit is exceeded. The RAMPerThreadHardLimitMB can be controlled
via IndexWriterConfig and defaults to 1945 MB.
Since IndexWriter flushes DWPTs concurrently, not all memory is released
immediately. Applications should still use a ramBufferSize significantly
lower than the JVM's available heap memory, since under high load multiple
flushing DWPTs can consume substantial transient memory when IO performance
is slow relative to the indexing rate.

  - IndexWriter#commit no longer blocks concurrent indexing while flushing all
'currently' RAM resident documents to disk. Yet, flushes that occur while a
full flush is running are queued and will happen after all DWPTs involved
in the full flush are done flushing. Applications using multiple threads
during indexing that trigger a full flush (e.g. call commit() or open a new
NRT reader) can use significantly more transient memory.

  - IndexWriter#addDocument and IndexWriter#updateDocument can block indexing
threads if the number of active plus the number of flushing DWPTs exceeds a
safety limit. By default this happens if 2 * the maximum number of available
thread states (DWPTPool) is exceeded. This safety limit prevents applications
from exhausting their available memory if flushing can't keep up with
concurrently indexing threads.
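The safety limit in the bullet above reduces to a simple predicate. The sketch below is a hedged illustration with invented names (StallControlSketch, shouldStall), not Lucene's real stall-control class:

```java
// Sketch of the stall-control safety limit described above: indexing
// threads are blocked once the number of active plus flushing DWPTs
// exceeds twice the pool's thread-state count, giving the flushing
// side a chance to catch up with concurrently indexing threads.
class StallControlSketch {
    final int maxThreadStates;

    StallControlSketch(int maxThreadStates) {
        this.maxThreadStates = maxThreadStates;
    }

    /** Returns true when an indexing thread should block (stall). */
    boolean shouldStall(int activeDWPTs, int flushingDWPTs) {
        return activeDWPTs + flushingDWPTs > 2 * maxThreadStates;
    }
}
```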

  - IndexWriter only applies and flushes deletes if the maxBufferedDelTerms
limit is reached during indexing. No segment flushes will be triggered
due to this setting.

  - IndexWriter#flush(boolean, boolean) no longer synchronizes on IndexWriter.
A dedicated flushLock has been introduced to prevent multiple full
flushes from happening concurrently.

  - DocumentsWriter doesn't write shared doc stores anymore.

  (Mike McCandless, Michael Busch, Simon Willnauer)




[jira] [Assigned] (LUCENE-3051) don't call SegmentInfo.sizeInBytes for the merging segments

2011-04-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3051:
--

Assignee: Michael McCandless

> don't call SegmentInfo.sizeInBytes for the merging segments
> ---
>
> Key: LUCENE-3051
> URL: https://issues.apache.org/jira/browse/LUCENE-3051
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
>
> Selckin has been running Lucene's tests on the RT branch, and hit this:
> {noformat}
> [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
> [junit] Testcase: 
> testDeleteAllSlowly(org.apache.lucene.index.TestIndexWriter):   FAILED
> [junit] Some threads threw uncaught exceptions!
> [junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
> exceptions!
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:535)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
> [junit] 
> [junit] 
> [junit] Tests run: 67, Failures: 1, Errors: 0, Time elapsed: 38.357 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
> -Dtestmethod=testDeleteAllSlowly 
> -Dtests.seed=-4291771462012978364:4550117847390778918
> [junit] The following exceptions were thrown by threads:
> [junit] *** Thread: Lucene Merge Thread #1 ***
> [junit] org.apache.lucene.index.MergePolicy$MergeException: 
> java.io.FileNotFoundException: _4_1.del
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
> [junit] Caused by: java.io.FileNotFoundException: _4_1.del
> [junit]   at 
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:290)
> [junit]   at 
> org.apache.lucene.store.MockDirectoryWrapper.fileLength(MockDirectoryWrapper.java:549)
> [junit]   at 
> org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:287)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3280)
> [junit]   at 
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2956)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379)
> [junit]   at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447)
> [junit] NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, 
> f6=Pulsing(freqCutoff=15), f7=MockFixedIntBlock(blockSize=1606), 
> f8=SimpleText, f9=MockSep, f1=MockVariableIntBlock(baseBlockSize=99), 
> f0=MockFixedIntBlock(blockSize=1606), f3=Pulsing(freqCutoff=15), f2=MockSep, 
> f5=SimpleText, f4=Standard, f=MockFixedIntBlock(blockSize=1606), c=MockSep, 
> termVector=MockRandom, d9=MockFixedIntBlock(blockSize=1606), 
> d8=Pulsing(freqCutoff=15), d5=SimpleText, d4=Standard, d7=MockRandom, 
> d6=MockVariableIntBlock(baseBlockSize=99), d25=MockRandom, d0=MockRandom, 
> c29=MockFixedIntBlock(blockSize=1606), 
> d24=MockVariableIntBlock(baseBlockSize=99), d1=Standard, c28=Standard, 
> d23=SimpleText, d2=MockFixedIntBlock(blockSize=1606), c27=MockRandom, 
> d22=Standard, d3=MockVariableIntBlock(baseBlockSize=99), 
> d21=Pulsing(freqCutoff=15), d20=MockSep, 
> c22=MockFixedIntBlock(blockSize=1606), c21=Pulsing(freqCutoff=15), 
> c20=MockRandom, d29=MockFixedIntBlock(blockSize=1606), c26=Standard, 
> d28=Pulsing(freqCutoff=15), c25=MockRandom, d27=MockRandom, c24=MockSep, 
> d26=MockVariableIntBlock(baseBlockSize=99), c23=SimpleText, e9=MockRandom, 
> e8=MockSep, e7=SimpleText, e6=MockFixedIntBlock(blockSize=1606), 
> e5=Pulsing(freqCutoff=15), c17=MockFixedIntBlock(blockSize=1606), 
> e3=Standard, d12=MockVariableIntBlock(baseBlockSize=99), 
> c16=Pulsing(freqCutoff=15), e4=SimpleText, 
> d11=MockFixedIntBlock(blockSize=1606), c19=MockSep, e1=MockSep, 
> d14=Pulsing(freqCutoff=15), c18=SimpleText, e2=Pulsing(freqCutoff=15), 
> d13=MockSep, e0=MockVariableIntBlock(baseBlockSize=99), d10=Standard, 
> d19=MockVariableIntBlock(baseBlockSize=99), c11=SimpleText, c10=Standard, 
> d16=Pulsing(freqCutoff=15), c13=MockRandom, 
> c12=MockVariableIntBlock(baseBlockSize=99), d15=MockSep, d18=SimpleText, 
> c15=MockFixedIntBlock(blockSize=1606), d17=Standard, 
> c14=Pulsing(freqCutoff=15), b3=MockSep, b2=SimpleText, b5

[jira] [Created] (LUCENE-3051) don't call SegmentInfo.sizeInBytes for the merging segments

2011-04-29 Thread Michael McCandless (JIRA)
don't call SegmentInfo.sizeInBytes for the merging segments
---

 Key: LUCENE-3051
 URL: https://issues.apache.org/jira/browse/LUCENE-3051
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0


Selckin has been running Lucene's tests on the RT branch, and hit this:
{noformat}
[junit] Testsuite: org.apache.lucene.index.TestIndexWriter
[junit] Testcase: 
testDeleteAllSlowly(org.apache.lucene.index.TestIndexWriter): FAILED
[junit] Some threads threw uncaught exceptions!
[junit] junit.framework.AssertionFailedError: Some threads threw uncaught 
exceptions!
[junit] at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:535)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175)
[junit] 
[junit] 
[junit] Tests run: 67, Failures: 1, Errors: 0, Time elapsed: 38.357 sec
[junit] 
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter 
-Dtestmethod=testDeleteAllSlowly 
-Dtests.seed=-4291771462012978364:4550117847390778918
[junit] The following exceptions were thrown by threads:
[junit] *** Thread: Lucene Merge Thread #1 ***
[junit] org.apache.lucene.index.MergePolicy$MergeException: 
java.io.FileNotFoundException: _4_1.del
[junit] at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
[junit] at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:472)
[junit] Caused by: java.io.FileNotFoundException: _4_1.del
[junit] at 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:290)
[junit] at 
org.apache.lucene.store.MockDirectoryWrapper.fileLength(MockDirectoryWrapper.java:549)
[junit] at 
org.apache.lucene.index.SegmentInfo.sizeInBytes(SegmentInfo.java:287)
[junit] at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3280)
[junit] at 
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2956)
[junit] at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379)
[junit] at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447)
[junit] NOTE: test params are: codec=RandomCodecProvider: {=SimpleText, 
f6=Pulsing(freqCutoff=15), f7=MockFixedIntBlock(blockSize=1606), f8=SimpleText, 
f9=MockSep, f1=MockVariableIntBlock(baseBlockSize=99), 
f0=MockFixedIntBlock(blockSize=1606), f3=Pulsing(freqCutoff=15), f2=MockSep, 
f5=SimpleText, f4=Standard, f=MockFixedIntBlock(blockSize=1606), c=MockSep, 
termVector=MockRandom, d9=MockFixedIntBlock(blockSize=1606), 
d8=Pulsing(freqCutoff=15), d5=SimpleText, d4=Standard, d7=MockRandom, 
d6=MockVariableIntBlock(baseBlockSize=99), d25=MockRandom, d0=MockRandom, 
c29=MockFixedIntBlock(blockSize=1606), 
d24=MockVariableIntBlock(baseBlockSize=99), d1=Standard, c28=Standard, 
d23=SimpleText, d2=MockFixedIntBlock(blockSize=1606), c27=MockRandom, 
d22=Standard, d3=MockVariableIntBlock(baseBlockSize=99), 
d21=Pulsing(freqCutoff=15), d20=MockSep, c22=MockFixedIntBlock(blockSize=1606), 
c21=Pulsing(freqCutoff=15), c20=MockRandom, 
d29=MockFixedIntBlock(blockSize=1606), c26=Standard, 
d28=Pulsing(freqCutoff=15), c25=MockRandom, d27=MockRandom, c24=MockSep, 
d26=MockVariableIntBlock(baseBlockSize=99), c23=SimpleText, e9=MockRandom, 
e8=MockSep, e7=SimpleText, e6=MockFixedIntBlock(blockSize=1606), 
e5=Pulsing(freqCutoff=15), c17=MockFixedIntBlock(blockSize=1606), e3=Standard, 
d12=MockVariableIntBlock(baseBlockSize=99), c16=Pulsing(freqCutoff=15), 
e4=SimpleText, d11=MockFixedIntBlock(blockSize=1606), c19=MockSep, e1=MockSep, 
d14=Pulsing(freqCutoff=15), c18=SimpleText, e2=Pulsing(freqCutoff=15), 
d13=MockSep, e0=MockVariableIntBlock(baseBlockSize=99), d10=Standard, 
d19=MockVariableIntBlock(baseBlockSize=99), c11=SimpleText, c10=Standard, 
d16=Pulsing(freqCutoff=15), c13=MockRandom, 
c12=MockVariableIntBlock(baseBlockSize=99), d15=MockSep, d18=SimpleText, 
c15=MockFixedIntBlock(blockSize=1606), d17=Standard, 
c14=Pulsing(freqCutoff=15), b3=MockSep, b2=SimpleText, b5=Standard, 
b4=MockRandom, b7=MockVariableIntBlock(baseBlockSize=99), 
b6=MockFixedIntBlock(blockSize=1606), d50=MockFixedIntBlock(blockSize=1606), 
b9=Pulsing(freqCutoff=15), b8=MockSep, d43=MockSep, d42=SimpleText, 
d41=MockFixedIntBlock(blockSize=1606), d40=Pulsing(freqCutoff=15), 
d47=MockVariableIntBlock(baseBlockSize=99), 
d46=MockFixedIntBlock(blockSize=1606), 
b0=MockVariableIntBlock(baseBlockSize=99), d45=Standard, b1

[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026932#comment-13026932
 ] 

Simon Willnauer commented on LUCENE-3023:
-

I committed the CHANGES.TXT patch to the branch. I think we should freeze the 
branch now so Robert can create a last, final patch. We should let that patch 
linger around for a while, yet I plan to commit this to trunk on Monday. Good 
work everybody! 

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_CHANGES.patch, 
> LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_simonw_review.patch, 
> LUCENE-3023_svndiff.patch, LUCENE-3023_svndiff.patch, diffMccand.py, 
> diffSources.patch, diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.




[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3023:
---

Attachment: LUCENE-3023_CHANGES.patch

Small edits to Simon's CHANGES entry.

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_CHANGES.patch, 
> LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_simonw_review.patch, 
> LUCENE-3023_svndiff.patch, LUCENE-3023_svndiff.patch, diffMccand.py, 
> diffSources.patch, diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.




[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3023:
---

Attachment: diffSources.patch

Iteration on diffSources.py -- adds usage line, copyright header.  I think it's 
ready to be committed!

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_iw_iwc_jdoc.patch, 
> LUCENE-3023_simonw_review.patch, LUCENE-3023_svndiff.patch, 
> LUCENE-3023_svndiff.patch, diffMccand.py, diffSources.patch, 
> diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.




[jira] [Updated] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3023:


Attachment: LUCENE-3023_CHANGES.patch

here is my first cut at CHANGES.TXT for landing on trunk. Review would be much 
appreciated.

> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023_CHANGES.patch, LUCENE-3023_iw_iwc_jdoc.patch, 
> LUCENE-3023_simonw_review.patch, LUCENE-3023_svndiff.patch, 
> LUCENE-3023_svndiff.patch, diffMccand.py, diffSources.patch, 
> realtime-TestAddIndexes-3.txt, realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.




[jira] [Commented] (LUCENE-3023) Land DWPT on trunk

2011-04-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026894#comment-13026894
 ] 

Simon Willnauer commented on LUCENE-3023:
-

bq. I put it under a new 'dev-tools/scripts' dir...
+1
Mike, can you add a little doc string to the script explaining what it does and 
how to use it? I think we should also have a wiki page that explains how to 
reintegrate a branch, just like we have one for merging changes into a branch.


> Land DWPT on trunk
> --
>
> Key: LUCENE-3023
> URL: https://issues.apache.org/jira/browse/LUCENE-3023
> Project: Lucene - Java
>  Issue Type: Task
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
> Attachments: LUCENE-3023.patch, LUCENE-3023.patch, LUCENE-3023.patch, 
> LUCENE-3023_iw_iwc_jdoc.patch, LUCENE-3023_simonw_review.patch, 
> LUCENE-3023_svndiff.patch, LUCENE-3023_svndiff.patch, diffMccand.py, 
> diffSources.patch, realtime-TestAddIndexes-3.txt, 
> realtime-TestAddIndexes-5.txt, 
> realtime-TestIndexWriterExceptions-assert-6.txt, 
> realtime-TestIndexWriterExceptions-npe-1.txt, 
> realtime-TestIndexWriterExceptions-npe-2.txt, 
> realtime-TestIndexWriterExceptions-npe-4.txt, 
> realtime-TestOmitTf-corrupt-0.txt
>
>
> With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324 so 
> we can proceed landing the DWPT development on trunk soon. I think one of the 
> bigger issues here is to make sure that all JavaDocs for IW etc. are still 
> correct though. I will start going through that first.
