[jira] Resolved: (LUCENE-1554) Problem with IndexWriter.mergeFinish

2009-03-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1554.


   Resolution: Fixed
Fix Version/s: 2.9

I couldn't get the test to fail, but I can see one code path (if mergeInit hits 
an exception) that would trip the assert incorrectly, so I committed that fix.  
Thanks Scott!

> Problem with IndexWriter.mergeFinish
> 
>
> Key: LUCENE-1554
> URL: https://issues.apache.org/jira/browse/LUCENE-1554
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 2.4
> Reporter: Scott Garland
> Assignee: Michael McCandless
> Fix For: 2.9
>
>
> I'm getting a (very) infrequent assertion failure in IndexWriter.mergeFinish
> from TestIndexWriter.testAddIndexOnDiskFull. The problem occurs during the
> rollback when the merge hasn't been registered. I'm not 100% sure this is the
> correct fix, because it's such an infrequent event.
> {code:java}
>   final synchronized void mergeFinish(MergePolicy.OneMerge merge) throws IOException {
>
>     // Optimize, addIndexes or finishMerges may be waiting
>     // on merges to finish.
>     notifyAll();
>     if (merge.increfDone)
>       decrefMergeSegments(merge);
>     assert merge.registerDone;
>     final SegmentInfos sourceSegments = merge.segments;
>     final int end = sourceSegments.size();
>     for(int i=0;i<end;i++)
>       mergingSegments.remove(sourceSegments.info(i));
>     mergingSegments.remove(merge.info);
>     merge.registerDone = false;
>   }
> {code}
> It should be something like:
> {code:java}
>   final synchronized void mergeFinish(MergePolicy.OneMerge merge) throws IOException {
>
>     // Optimize, addIndexes or finishMerges may be waiting
>     // on merges to finish.
>     notifyAll();
>     if (merge.increfDone)
>       decrefMergeSegments(merge);
>     if (merge.registerDone) {
>       final SegmentInfos sourceSegments = merge.segments;
>       final int end = sourceSegments.size();
>       for(int i=0;i<end;i++)
>         mergingSegments.remove(sourceSegments.info(i));
>       mergingSegments.remove(merge.info);
>       merge.registerDone = false;
>     }
>   }
> {code}
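
The failure mode discussed in this issue (cleanup running for a merge that was never registered) can be illustrated with a minimal, self-contained sketch. The classes OneMerge and MergeTracker below are hypothetical stand-ins, not Lucene's actual classes; only the names mergeFinish, mergingSegments and registerDone are taken from the issue. The sketch shows a guarded cleanup in the style of the proposed change and makes no claim about what was actually committed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for MergePolicy.OneMerge; only registerDone is from the issue.
final class OneMerge {
  final List<String> segments;
  boolean registerDone;

  OneMerge(List<String> segments) {
    this.segments = segments;
  }
}

// Hypothetical stand-in for the merge bookkeeping kept by the writer.
final class MergeTracker {
  final Set<String> mergingSegments = new HashSet<>();

  // Marks the merge's source segments as "currently merging".
  synchronized void registerMerge(OneMerge merge) {
    mergingSegments.addAll(merge.segments);
    merge.registerDone = true;
  }

  // Guarded cleanup: safe to call even if the merge was never registered,
  // for example because its setup threw before registration happened.
  synchronized void mergeFinish(OneMerge merge) {
    notifyAll(); // wake any threads waiting for merges to finish
    if (merge.registerDone) {
      mergingSegments.removeAll(merge.segments);
      merge.registerDone = false;
    }
  }
}

final class MergeFinishDemo {
  public static void main(String[] args) {
    MergeTracker tracker = new MergeTracker();

    OneMerge registered = new OneMerge(Arrays.asList("_0", "_1"));
    tracker.registerMerge(registered);
    tracker.mergeFinish(registered);

    // Simulates a merge whose setup failed before it was registered.
    OneMerge neverRegistered = new OneMerge(Arrays.asList("_2"));
    tracker.mergeFinish(neverRegistered); // nothing to remove, nothing trips

    System.out.println(tracker.mergingSegments.isEmpty()); // prints "true"
  }
}
```

With an unconditional assert in place of the guard, the second mergeFinish call would fail whenever assertions are enabled, even though there is nothing left to clean up.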

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679576#action_12679576
 ] 

Michael McCandless commented on LUCENE-1516:


{quote}
TestIndexReader.testDocsOutOfOrderJIRA140 fails because IW.close
isn't called before dir.close. Is this a bug in the unit test?
{quote}
Yes, this looks like a bug in the test.  This also means when we eventually 
commit this, we'll have to first fix that bug on the back-compat tests branch.

{quote}
TestIndexWriterDelete.testOperationsOnDiskFull fails with
MockRAMDirectory.close (still open files) because IW.close isn't
called.
{quote}
I don't understand: that test looks like it does call IW.close for all IW's 
opened?  (It's a little tricky, because modifier.close gets called the 2nd time 
the for(int x = 0...) loop runs).

{quote}
TestIndexWriter.testImmediateDiskFullWithThreads fails because
IW.close fails on the disk full exception. Should IW.closeInternal ->
SegmentReaderPool.close be placed in the finally clause?
{quote}
Does the 2nd call to close (close(false)) also hit an exception?  Perhaps
modify the test so that if that 2nd close hits an exception, it calls abort?

It's good that you're down to mainly the exceptions-based test failures... 
though I think you should focus more on the bigger structural changes to the 
approach (eg switching back to docMap for merging deletes, adding 
SegmentReaderPool.release, which should write changes to the dir & calling it 
from all places that do a .get(), and the other comments above) before trying 
to get all tests to pass.

> Integrate IndexReader with IndexWriter 
> ---
>
> Key: LUCENE-1516
> URL: https://issues.apache.org/jira/browse/LUCENE-1516
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enable
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 




Re: [VOTE] Release 2.4.1, take 2

2009-03-06 Thread Grant Ingersoll

+1

On Mar 4, 2009, at 5:27 PM, Michael McCandless wrote:



This is a new vote!

I've re-built the release artifacts (to include LUCENE-1552 fix),
derived from revision 750176 on the 2.4 branch. Here are the changes:

 
http://people.apache.org/~mikemccand/staging-area/lucene2.4.1rc2/changes/Changes.html

Please vote to release these artifacts as 2.4.1:

 http://people.apache.org/~mikemccand/staging-area/lucene2.4.1rc2/dist

Here's my +1.

Mike





[jira] Commented: (LUCENE-1536) if a filter can support random access API, we should use it

2009-03-06 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679665#action_12679665
 ] 

Paul Elschot commented on LUCENE-1536:
--

For the skipToButNotNext(), did you mean something like LUCENE-1252?

> if a filter can support random access API, we should use it
> ---
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.9 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
>   * Method high means I use random-access filter API in
> IndexSearcher's main loop.  Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).




[jira] Commented: (LUCENE-1536) if a filter can support random access API, we should use it

2009-03-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679690#action_12679690
 ] 

Michael McCandless commented on LUCENE-1536:



Ahh, that's actually something different but also a neat idea to
explore.

I want a way to skipTo(docID) without having it then internally do a
next() if the docID was not accepted.  Basically a random-access
"accepts(int docID)" API, that's called only on increasing docIDs.
Implementing "accepts" for queries is often a lot simpler than
implementing next/skipTo.

LUCENE-1252 wants a way to expose access to the two constraints within
a single query separately.  EG a phrase search 1) must have all N
terms, and 2) must have them in the right positions.  But if you could
check only 1), and if it passes next check the filter on the search,
and if it still passes go back and check 2), then that could give
better search performance.

I think there's decent room for improving search performance of
complex queries.
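
The two ideas in this comment can be combined in a small, self-contained sketch: a random-access filter check that is only ever called with increasing docIDs, interleaved between a cheap and a costly phase of a single query. The interface names RandomAccessFilter and TwoPhaseMatcher are hypothetical and are not part of Lucene; only the method name accepts(int docID) follows the comment above. The linear scan over all docIDs is a simplification, since a real scorer would advance through postings instead.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical random-access filter; the method name follows "accepts(int docID)".
interface RandomAccessFilter {
  boolean accepts(int docID); // only called with increasing docIDs
}

// Hypothetical two-phase matcher: a cheap check first, a costly check second.
interface TwoPhaseMatcher {
  boolean hasAllTerms(int docID);    // constraint 1: all N terms are present
  boolean positionsMatch(int docID); // constraint 2: terms are in the right positions
}

final class FilteredSearchSketch {
  // Visits candidate docs in increasing order and places the filter check
  // between the cheap and the costly phase of the query.
  static List<Integer> search(int maxDoc, TwoPhaseMatcher matcher, RandomAccessFilter filter) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < maxDoc; doc++) { // simplification: scans every docID
      if (!matcher.hasAllTerms(doc)) continue;    // cheap rejection
      if (!filter.accepts(doc)) continue;         // filter check, no internal next()
      if (!matcher.positionsMatch(doc)) continue; // costly check only for survivors
      hits.add(doc);
    }
    return hits;
  }

  public static void main(String[] args) {
    BitSet bits = new BitSet();
    bits.set(2);
    bits.set(4);
    bits.set(6);
    RandomAccessFilter filter = bits::get;

    TwoPhaseMatcher matcher = new TwoPhaseMatcher() {
      public boolean hasAllTerms(int docID) { return docID % 2 == 0; }
      public boolean positionsMatch(int docID) { return docID != 4; }
    };

    System.out.println(search(8, matcher, filter)); // prints [2, 6]
  }
}
```

The point of the ordering is that the costly position check runs only for documents that already passed both the term check and the filter.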





[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-06 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1516:
-

Attachment: LUCENE-1516.patch

{quote} On an explicit commit(), we should also sweep the pool and
write changes to disk for any SR that has pending changes. {quote}

I created SegmentReaderPool.commitAll which commits changes for all
SRs in the pool in IW.startCommit.

bq. applyDeletes at the top of commitMergedDeletes

Removed

bq. switching back to docMap for merging deletes

Where should I get the numDeletedDocs from? (Used for docUpto +=
docCount - previousReader.numDeletedDocs()) Should the entire
docIdMap be scanned? Is there an expense in cloning the segment
readers besides the extra bitvectors?

* SegmentReaderPool.release is implemented instead of using
reader.decRef. I think you're saying put this patch's decRef logic in
the SRP.release method? 

* Added IW.close to TestIndexReader.testDocsOutOfOrderJIRA140

* TestTransactions may sometimes be failing legitimately during the
prepareCommit. The exception can't be reliably reproduced, but perhaps
another test case can be written that does reproduce it.

* TestIndexWriterDelete.testErrorAfterApplyDeletes fails due to
IW.commit not throwing an expected exception 




[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679747#action_12679747
 ] 

Michael McCandless commented on LUCENE-1516:


Jason can you resync to trunk?




[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679758#action_12679758
 ] 

Michael McCandless commented on LUCENE-1516:



bq. Where should I get the numDeletedDocs from?

SegmentMerger.getDelCounts()

bq. Should the entire docIdMap be scanned?

Yes, when the before/after delCount is different.

bq. Is there an expense in cloning the segment readers besides the extra 
bitvectors?

Copy-on-write of the bitvectors (time and space) is the biggest cost I
think.  Since docMap already has everything we need, I think we should
use it... we could even separate out this change and do it first, if
you want.
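
As an illustration of using a doc map to carry deletions over to a merged segment, here is a minimal sketch. It assumes that the map holds, for each old docID, either its new docID or -1 if the document was already deleted when the merge started; that convention, the class name DocMapSketch and the use of java.util.BitSet in place of Lucene's own bit vectors are all assumptions made for the example.

```java
import java.util.BitSet;

final class DocMapSketch {
  // Builds a map from old docIDs to new ones; docs deleted at merge start map to -1.
  static int[] buildDocMap(int docCount, BitSet deletedAtStart) {
    int[] docMap = new int[docCount];
    int newDoc = 0;
    for (int i = 0; i < docCount; i++) {
      docMap[i] = deletedAtStart.get(i) ? -1 : newDoc++;
    }
    return docMap;
  }

  // Carries over deletes that arrived while the merge was running.
  // docBase is where this source segment's docs start in the merged segment.
  static void carryOverDeletes(int[] docMap, BitSet deletedNow, int docBase, BitSet mergedDeletes) {
    for (int i = 0; i < docMap.length; i++) {
      if (docMap[i] != -1 && deletedNow.get(i)) {
        mergedDeletes.set(docBase + docMap[i]);
      }
    }
  }

  public static void main(String[] args) {
    BitSet atStart = new BitSet();
    atStart.set(1); // doc 1 was deleted before the merge began
    int[] docMap = buildDocMap(4, atStart);

    BitSet now = (BitSet) atStart.clone();
    now.set(3); // doc 3 was deleted while the merge was running

    BitSet mergedDeletes = new BitSet();
    carryOverDeletes(docMap, now, 0, mergedDeletes);
    System.out.println(mergedDeletes); // prints {2}
  }
}
```

For the next source segment, docBase would advance by the number of documents that were live at merge start, which matches the docCount minus numDeletedDocs arithmetic asked about above. The scan is only needed when the delete count differs before and after the merge.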

{quote}
I think you're saying put this patch's decRef logic in
the SRP.release method?
{quote}

Actually the release method should always commit changes, if there are
any, when an SR is removed from the pool.

Then, something higher up (merging, once successful) should clear
changes when it's safe, so that release doesn't save anything.

The "decRef that never commits whenever writer is present" in
DirectoryIndexReader is too low level, I think.
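
The release contract described here can be sketched with a small reference-counted pool. Everything below is hypothetical and is not the patch's SegmentReaderPool: the class names, the commit counter and the choice to evict on the last release are assumptions made for illustration. Only the get/release pairing and the "commit on removal unless changes were cleared" rule come from the discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pooled reader that tracks whether it has uncommitted changes.
final class PooledReader {
  final String segment;
  boolean hasChanges;
  int commits;

  PooledReader(String segment) {
    this.segment = segment;
  }

  // Stands in for writing pending deletes/norms to the directory.
  void commit() {
    commits++;
    hasChanges = false;
  }

  // Called by higher-level code (e.g. after a successful merge) when the
  // pending changes no longer need to be saved.
  void clearChanges() {
    hasChanges = false;
  }
}

// Hypothetical pool; get/release follow the naming used in the discussion.
final class ReaderPoolSketch {
  private final Map<String, PooledReader> readers = new HashMap<>();
  private final Map<String, Integer> refCounts = new HashMap<>();

  synchronized PooledReader get(String segment) {
    PooledReader r = readers.computeIfAbsent(segment, PooledReader::new);
    refCounts.merge(segment, 1, Integer::sum);
    return r;
  }

  // Every get() is paired with a release(); the last release commits any
  // pending changes and removes the reader from the pool.
  synchronized void release(PooledReader r) {
    int remaining = refCounts.merge(r.segment, -1, Integer::sum);
    if (remaining == 0) {
      if (r.hasChanges) {
        r.commit();
      }
      readers.remove(r.segment);
      refCounts.remove(r.segment);
    }
  }

  synchronized int size() {
    return readers.size();
  }
}

final class ReaderPoolDemo {
  public static void main(String[] args) {
    ReaderPoolSketch pool = new ReaderPoolSketch();
    PooledReader r = pool.get("_0");
    r.hasChanges = true;
    pool.release(r);
    System.out.println(r.commits); // prints 1
  }
}
```

Keeping the commit decision in release, rather than in a low-level decRef, means callers that know the changes are moot can call clearChanges first and release then saves nothing.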





[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-03-06 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1516:
-

Attachment: LUCENE-1516.patch

* The patch is updated to trunk. "There are still open files" failures
now occur in most tests (maybe this isn't related to the latest revision).

* mergeMiddle isn't synchronized so if we use a pooled reader
(instead of a frozen clone) couldn't deletes be applied to the SR as
merging is happening? 




Re: Getting tokens from search results. Simple concept

2009-03-06 Thread Mike Klaas

On 5-Mar-09, at 2:42 PM, Chris Hostetter wrote:



> : What I would LOVE is if I could do it in a standard Lucene search like I
> : mentioned earlier.
> : Hit.doc[0].getHitTokenList() :confused:
> : Something like this...
>
> The Query/Scorer APIs don't provide any mechanism for information like
> that to be conveyed back up the call chain -- mainly because it's more
> heavyweight than most people need.
>
> If you have custom Query/Scorer implementations, you can keep track of
> whatever state you want when executing a Query -- in fact the SpanQuery
> family of queries does keep track of exactly the type of info you seem to
> want, and after executing a query, you can ask it for the "Spans" of any
> matching document -- the downside is a loss in performance of query
> execution (because it takes time/memory to keep track of all the matches).


Even then, if I'm not mistaken, spans track token _positions_, not
_offsets_ in the original string.

A reverse text index like Lucene is fast precisely because it doesn't
have to keep track of this information.  I think the best alternative
might be to use term vectors, which are essentially a cache of the
analyzed tokens for a document.
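
To make the distinction between positions and offsets concrete, here is a minimal sketch that tokenizes on whitespace and records both. It does not use any Lucene API; TokenInfo and OffsetSketch are hypothetical names, and the tokenizer is deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token record: position counts tokens, offsets count characters.
final class TokenInfo {
  final String term;
  final int position;
  final int startOffset;
  final int endOffset; // exclusive

  TokenInfo(String term, int position, int startOffset, int endOffset) {
    this.term = term;
    this.position = position;
    this.startOffset = startOffset;
    this.endOffset = endOffset;
  }
}

final class OffsetSketch {
  // Splits on whitespace and records the token position and character offsets.
  static List<TokenInfo> tokenize(String text) {
    List<TokenInfo> tokens = new ArrayList<>();
    int position = 0;
    int i = 0;
    while (i < text.length()) {
      // Skip any run of whitespace before the next token.
      while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
      int start = i;
      // Consume the token itself.
      while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
      if (i > start) {
        tokens.add(new TokenInfo(text.substring(start, i), position++, start, i));
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    String text = "the  united states";
    for (TokenInfo t : tokenize(text)) {
      System.out.println(t.term + " position=" + t.position
          + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
    }
  }
}
```

A position alone says that "united" is the second token; only the offsets say where it sits in the original string, which is what highlighting a hit in the source text requires.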


-Mike




Re: [VOTE] Release 2.4.1, take 2

2009-03-06 Thread Michael Busch

+1

-Michael

On 3/4/09 2:27 PM, Michael McCandless wrote:


This is a new vote!

I've re-built the release artifacts (to include LUCENE-1552 fix),
derived from revision 750176 on the 2.4 branch. Here are the changes:

  
http://people.apache.org/~mikemccand/staging-area/lucene2.4.1rc2/changes/Changes.html 



Please vote to release these artifacts as 2.4.1:

  http://people.apache.org/~mikemccand/staging-area/lucene2.4.1rc2/dist

Here's my +1.

Mike

