date:20090220


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675523#action_12675523
 ] 

jasonrutherglen edited comment on LUCENE-1516 at 2/20/09 5:42 PM:
---

There are concurrency issues to work out.

- IW.getReader returns a cloned read-only reader 
- Removed IW.reopenReader 
- All test methods pass except testAddIndexesAndDoDeletesThreads, which
currently merges indexes concurrently (and fails). In the future the
method will test merging, deleting, and searching concurrently. 
- Concurrent merges fail when "ant test-core" is run 
- DocumentsWriter.applyDeletes deletes again at the SegmentReader level



  was (Author: jasonrutherglen):
There are concurrency issues to work out.

- IW.getReader returns a cloned read-only reader 
- Removed IW.reopenReader 
- All test methods pass except testAddIndexesAndDoDeletesThreads, which
currently merges indexes concurrently (and fails). In the future the
method will test merging, deleting, and searching concurrently. 
- Concurrent merges fail when "ant test-core" is run 
- DocumentsWriter.applyDeletes deletes again at the SegmentReader level
- What happens when a user executes IR.clone(true) on a read only
reader? 

  
> Integrate IndexReader with IndexWriter 
> ---
>
> Key: LUCENE-1516
> URL: https://issues.apache.org/jira/browse/LUCENE-1516
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is that an IndexReader and an IndexWriter cannot be open
> at the same time and perform updates, as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enable
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter


 [ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1516:
-

Attachment: LUCENE-1516.patch

There are concurrency issues to work out.

- IW.getReader returns a cloned read-only reader 
- Removed IW.reopenReader 
- All test methods pass except testAddIndexesAndDoDeletesThreads, which
currently merges indexes concurrently (and fails). In the future the
method will test merging, deleting, and searching concurrently. 
- Concurrent merges fail when "ant test-core" is run 
- DocumentsWriter.applyDeletes deletes again at the SegmentReader level
- What happens when a user executes IR.clone(true) on a read only
reader? 



[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675504#action_12675504
 ] 

Grant Ingersoll commented on LUCENE-1516:
-

OK, I agree.  Let's just mark it as expert/subject to revision and then we're 
good.

We can revisit IndexAccessor separately.




[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675499#action_12675499
 ] 

Michael McCandless commented on LUCENE-1516:


bq. I guess it depends on the performance hit.

It's challenging to implement truly live updates w/ decent
performance: I think we'd need to build the reader impl that can
search DocumentsWriter buffer.

Whereas the approach (patch) here is actually quite simple (all the
hard work was already done -- IndexReader.reopen,
collection/sorting/filtering by segment, etc.).

bq. In other words, my guess is that over time, as the performance proves out, 
it will be the default choice, not "expert".

I agree: realtime search will likely be a popular feature once we
finish it, release it, it proves stable, performant, etc.  Eventually
(maybe soon) it should be made the default.

I think IndexAccessor makes a lot of sense, but it's a big change and
I'd rather not couple it to this issue.  There are many questions to
be hashed out (under a new issue): is it a simple pass-through?  Or
does it manage the lifecycle of the readers for you?  Does it warm new
readers?  Should it emulate "live" update semantics?  Should getReader
get it from the writer if there is one (ie, making realtime search the
"default")?  Etc.



[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675490#action_12675490
 ] 

Michael McCandless commented on LUCENE-1516:


bq. It's hard keeping up with the current proposal in big issues/threads, but I 
don't think anyone is proposing a reader that automatically sees changes... 
i.e. the view of an IndexReader instance will still be fixed.

That's right.  The current proposal is to add one method to IW:

{code}
IndexReader getReader()
{code}

that returns a point-in-time view of the index plus all changes
buffered in IW up until that point.  Then you can reopen that reader
(or call getReader() again, which does the same thing) to quickly get
a new point-in-time reader.

I think the point-in-time semantics is important to keep.

Also, you can't easily emulate point-in-time if we implemented the
"live" approach, but you can easily do the reverse (assuming we can
keep reopen() time fast enough).

EG the IndexAccessor convenience layer could do automatic reopening so
that when you ask it for the reader it always reopens it; this would
emulate "live updates" and hide the lifecycle management.
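
To make the point-in-time semantics concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: `PointInTime`, its nested `Writer`, `Reader`, and `Accessor` classes, and their methods are hypothetical names chosen for illustration. A reader is a frozen snapshot, reopening produces a new snapshot, and an accessor-style layer can emulate "live" reads simply by taking a fresh snapshot on every request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PointInTime {
    /** Toy stand-in for a writer that buffers documents (hypothetical, not IndexWriter). */
    static final class Writer {
        private final List<String> docs = new ArrayList<>();

        synchronized void addDocument(String doc) {
            docs.add(doc);
        }

        /** Copies the current state so the returned reader never sees later changes. */
        synchronized Reader getReader() {
            return new Reader(this, Collections.unmodifiableList(new ArrayList<>(docs)));
        }
    }

    /** Toy point-in-time reader: its view is fixed at creation. */
    static final class Reader {
        private final Writer writer;
        private final List<String> snapshot;

        Reader(Writer writer, List<String> snapshot) {
            this.writer = writer;
            this.snapshot = snapshot;
        }

        int numDocs() {
            return snapshot.size();
        }

        /** Returns a new snapshot; this reader itself is left unchanged. */
        Reader reopen() {
            return writer.getReader();
        }
    }

    /** Hypothetical convenience layer: emulates "live" reads by reopening on every access. */
    static final class Accessor {
        private final Writer writer;

        Accessor(Writer writer) {
            this.writer = writer;
        }

        Reader getReader() {
            return writer.getReader();
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer();
        writer.addDocument("a");
        Reader r1 = writer.getReader();
        writer.addDocument("b");
        Reader r2 = r1.reopen();
        System.out.println(r1.numDocs() + " " + r2.numDocs()); // prints "1 2"
    }
}
```

The sketch copies the whole document list on every snapshot, which a real implementation would avoid; it only illustrates why point-in-time readers can emulate live updates while the reverse is hard.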




[jira] Assigned: (LUCENE-1546) Add IndexReader.flush/close(commitUserData)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1546:
--

Assignee: Michael McCandless

> Add IndexReader.flush/close(commitUserData)
> ---
>
> Key: LUCENE-1546
> URL: https://issues.apache.org/jira/browse/LUCENE-1546
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Assignee: Michael McCandless
> Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1546.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> IndexWriter offers a commit(String commitUserData) method.
> IndexReader can commit as well using the flush/close methods and so
> needs an analogous method that accepts commitUserData.


[jira] Assigned: (LUCENE-1516) Integrate IndexReader with IndexWriter


 [ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1516:
--

Assignee: Michael McCandless


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-20 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675462#action_12675462
 ] 

Yonik Seeley commented on LUCENE-1516:
--

bq. a reader that sees changes as they are made versus having to do the whole 
reopen thing

It's hard keeping up with the current proposal in big issues/threads, but I 
don't think anyone is proposing a reader that automatically sees changes... 
i.e. the view of an IndexReader instance will still be fixed.


[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector


[ 
https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675438#action_12675438
 ] 

Michael McCandless commented on LUCENE-1483:


Hmm -- we didn't deprecate SortComparator/SortComparatorSource with this, but I 
think we should have?  Does that sound right?  If so I can work up a patch...

> Change IndexSearcher multisegment searches to search each individual segment 
> using a single HitCollector
> 
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9
> Reporter: Mark Miller
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1483-backcompat.patch, LUCENE-1483-partial.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, 
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, sortBench.py, 
> sortCollate.py
>
>
> This issue changes how an IndexSearcher searches over multiple segments. The 
> current method of searching multiple segments is to use a MultiSegmentReader 
> and treat all of the segments as one. This causes filters and FieldCaches to 
> be keyed to the MultiReader and makes reopen expensive. If only a few 
> segments change, the FieldCache is still loaded for all of them.
> This patch changes things by searching each individual segment one at a time, 
> but sharing the HitCollector used across each segment. This allows 
> FieldCaches and Filters to be keyed on individual SegmentReaders, making 
> reopen much cheaper. FieldCache loading over multiple segments can be much 
> faster as well - with the old method, all unique terms for every segment are 
> enumerated against each segment - because of the likely logarithmic change in 
> terms per segment, this can be very wasteful. Searching individual segments 
> avoids this cost. The term/document statistics from the multireader are used 
> to score results for each segment.
> When sorting, it's more difficult to use a single HitCollector for each sub 
> searcher. Ordinals are not comparable across segments. To account for this, a 
> new field sort enabled HitCollector is introduced that is able to collect and 
> sort across segments (because of its ability to compare ordinals across 
> segments). This TopFieldCollector class will collect the values/ordinals for 
> a given segment, and upon moving to the next segment, translate any 
> ordinals/values so that they can be compared against the values for the new 
> segment. This is done lazily.
> All in all, the switch seems to provide numerous performance benefits, in 
> both sorted and non-sorted search. We were seeing a good loss on indices with 
> lots of segments (1000?) and certain queue sizes / queries, but the latest 
> results seem to show that's been mostly taken care of (you shouldn't be using 
> such a large queue on such a segmented index anyway).
> * Introduces
> ** MultiReaderHitCollector - a HitCollector that can collect across multiple 
> IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders.
> ** TopFieldCollector - a HitCollector that can compare values/ordinals across 
> IndexReaders and sort on fields.
> ** FieldValueHitQueue - a Priority queue that is part of the 
> TopFieldCollector implementation.
> ** FieldComparator - a new Comparator class that works across IndexReaders. 
> Part of the TopFieldCollector implementation.
> ** FieldComparatorSource - new class to allow for custom Comparators.
> * Alters
> ** IndexSearcher uses a single HitCollector to collect hits against each 
> individual SegmentReader. All the other changes stem from this ;)
> * Deprecates
> ** TopFieldDocCollector
> ** FieldSortedHitQueue
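
The per-segment search described above can be sketched in plain Java as follows. This is a toy model under stated assumptions, not the patch's code: `PerSegmentSearch`, `Segment`, `Collector`, `setNextSegment`, and `search` are hypothetical names. It shows one collector being shared across segments, with the caller announcing each segment's document base so that segment-local document numbers map back to index-wide ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PerSegmentSearch {
    /** Toy segment: values[i] is the field value of segment-local document i. */
    static final class Segment {
        final int[] values;

        Segment(int... values) {
            this.values = values;
        }
    }

    /** A single collector shared by all segments (hypothetical, for illustration). */
    static final class Collector {
        final List<Integer> hits = new ArrayList<>();
        private int docBase;

        /** Called before each segment with the number of documents in earlier segments. */
        void setNextSegment(int docBase) {
            this.docBase = docBase;
        }

        /** Receives a segment-local doc id and records the index-wide id. */
        void collect(int localDoc) {
            hits.add(docBase + localDoc);
        }
    }

    /** Searches the segments one at a time for documents whose value equals target. */
    static List<Integer> search(List<Segment> segments, int target) {
        Collector collector = new Collector();
        int docBase = 0;
        for (Segment segment : segments) {
            collector.setNextSegment(docBase);
            for (int doc = 0; doc < segment.values.length; doc++) {
                if (segment.values[doc] == target) {
                    collector.collect(doc);
                }
            }
            docBase += segment.values.length;
        }
        return collector.hits;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(new Segment(7, 3), new Segment(3, 3, 9));
        System.out.println(search(segments, 3)); // prints [1, 2, 3]
    }
}
```

Because each segment is visited independently, any per-segment cache (the role FieldCache plays in the description) only needs rebuilding for segments that actually changed, which is the reopen saving the issue describes.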


[jira] Updated: (LUCENE-1546) Add IndexReader.flush/close(commitUserData)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1546:
-

Priority: Trivial  (was: Major)


[jira] Updated: (LUCENE-1546) Add IndexReader.flush/close(commitUserData)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1546:
-

Attachment: LUCENE-1546.patch

All tests pass.

- Added IndexReader.flush(userCommitData).  I'm hesitant about adding
IR.close(userCommitData) as IndexWriter.close doesn't have a similar
method.


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675414#action_12675414
 ] 

Grant Ingersoll commented on LUCENE-1516:
-

bq. Right, I'm just saying IndexAccessor will have many methods. And then
you're asking every app to make this switch, on upgrade. It's a lot of
API swapping/noise vs a single added expert method to IW.

Sure, but that is already the case w/ IW/IR anyway.

I agree about the short term noise, but in the long run it seems cleaner.

bq. But this will be an expert/advanced API, a single added method to IW.
I wouldn't expect users to be confused: on upgrade I think most users
will not even notice its existence!

Hmm, I don't agree, but I guess it depends on the performance hit.  If given a 
choice between the semantics of a reader that sees changes as they are made 
versus having to do the whole reopen thing, I'm betting most users will say 
"duh, I want to see my changes right away" and choose the IR that is synced w/ 
the IW, b/c that is what people think is the logical thing to happen and it is 
how DBs work, which many devs. are used to.  As an app developer, if I don't 
have to think about IR lifecycle management, why would I want to as long as it 
performs?  What this patch is offering, AFAICT, is the removal of IR lifecycle 
management from the user.

In other words, my guess is that over time, as the performance proves out, it 
will be the default choice, not "expert".  Now, if you're telling me this is 
going to be significantly slower even when updates are rare, then maybe I would 
stick to the current lifecycle, but if there isn't much difference, I'll take 
the one that pushes the lifecycle complexity down into Lucene instead of in my 
app.


[jira] Created: (LUCENE-1546) Add IndexReader.flush/close(commitUserData)

Add IndexReader.flush/close(commitUserData)
---

 Key: LUCENE-1546
 URL: https://issues.apache.org/jira/browse/LUCENE-1546
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4
Reporter: Jason Rutherglen
 Fix For: 2.9


IndexWriter offers a commit(String commitUserData) method.
IndexReader can commit as well using the flush/close methods and so
needs an analogous method that accepts commitUserData.


[jira] Updated: (LUCENE-1516) Integrate IndexReader with IndexWriter


 [ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1516:
-

Attachment: LUCENE-1516.patch

Ah yes, that was a patch from the old directory, which needs deleting.  Here's the correct 
one.  Sorry about that.

> Integrate IndexReader with IndexWriter 
> ---
>
> Key: LUCENE-1516
> URL: https://issues.apache.org/jira/browse/LUCENE-1516
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enables
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675397#action_12675397
 ] 

Michael McCandless commented on LUCENE-1516:


Or, here's an idea: can we do both?  Put IndexAccessor as an optional
"convenience" layer that simplifies the ctors and expert methods of IW
& IR, but leave public direct access to the ctors & expert methods?
This way on upgrade nobody is forced to migrate to an entirely new yet
purely pass-through API.

Or another idea is to decouple these two discussions: go ahead and add
the single expert method to IW, but as a separate discussion/JIRA work
out how we can simplify overall access/management of IR/IW instances?



[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675394#action_12675394
 ] 

Michael McCandless commented on LUCENE-1516:



{quote}
maybe we don't need all those variants? String, File and Directory are all 
easily enough collapsed down to just Directory.
{code}
new IndexWriter(new Directory(indexFile));
{code}
{quote}

(You'd presumably need to close that Directory).  But, yeah, we may be
able to drop some of them, although I do think they are convenient for
new users of Lucene.  And forcing users to switch to a totally new yet
pass-through API on upgrade, without giving them a one-to-one carryover,
is not very nice.

bq. Additionally, there are no more variants than there already are on the IW 
and IR, right?

Right, I'm just saying IndexAccessor will have many methods.  And then
you're asking every app to make this switch, on upgrade.  It's a lot of
API swapping/noise vs a single added expert method to IW.

{quote}
As for pass-through or not, I think it would just pass-through, at
least initially, but it certainly leaves open the possibility for
reference counting in the future if someone wants to implement that.
{quote}

If we think it'll be more than just pass through, we should try to
hash out, somewhat, what it will & won't do up front (changing it
later is a big change)?  And we should start from LUCENE-390.

{quote}
As someone who teaches people these APIs on a regular basis, I feel
pretty confident in saying that adding an IR to the IW as a public API
is going to confuse a good chunk of people just as the delete stuff on
the IR currently does now.
{quote}

But this will be an expert/advanced API, a single added method to IW.
I wouldn't expect users to be confused: on upgrade I think most users
will not even notice its existence!

bq. You wouldn't ask FileWriter for a FileReader, would you?

I'm not sure that's the right comparison -- Lucene's IW does far more
than a FileWriter.  And the fact that Lucene allows "point in time"
searching (which is very useful and rather unique) is a very big
difference vs FileReader/Writer.

{quote}
Likewise, isn't it just as logical to ask for an IW from an IR?
{quote}

I don't think so: the functionality is not symmetric, because Lucene
allows only one writer open at a time, but many readers (eg on
different commits).  Since a writer is the one making changes, it
makes sense that you'd ask it, right now, to give you a reader
reflecting all changes up to that point.  And call it again later to
get a reader seeing changes after that, etc.
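
The asymmetry can be sketched without any Lucene code. SnapshotWriter and SnapshotReader below are hypothetical toy classes, and copying the document list is only an assumption made to keep the example short; the point is just that each call to getReader yields a read-only, point-in-time view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical writer: the single object that changes the index. */
final class SnapshotWriter {
    private final List<String> docs = new ArrayList<>();

    synchronized void addDocument(String doc) { docs.add(doc); }

    /** Returns a read-only view of everything added up to this call. */
    synchronized SnapshotReader getReader() {
        return new SnapshotReader(
            Collections.unmodifiableList(new ArrayList<>(docs)));
    }
}

/** Hypothetical point-in-time reader: it never sees later changes. */
final class SnapshotReader {
    private final List<String> docs;

    SnapshotReader(List<String> docs) { this.docs = docs; }

    int numDocs() { return docs.size(); }
}
```

Many readers can coexist, each frozen at the moment it was requested, while there is still only one writer to ask.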


[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

2009-02-20 Thread Ning Li (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675390#action_12675390
 ] 

Ning Li commented on LUCENE-1541:
-

When one precision step is given, it is converted to the representation. Then 
no array creation is necessary. But something like TrieUtils.FieldConfiguration 
would be better. Besides the field name and the precision steps, either it 
should also contain a type (long/int) or a subclass is created for each type. 
It can be used both at indexing time and at query time.
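
A rough sketch of what such a configuration object might look like follows. Everything here is hypothetical: the class shape, the TrieValueType enum and the uniform factory are assumptions for illustration, not the contrib API. It also assumes, as a simplification, that the steps must cover the value's bits exactly and that one term is indexed per step.

```java
import java.util.Arrays;

/** Hypothetical value width; the comment above mentions long and int. */
enum TrieValueType {
    INT(32), LONG(64);

    final int bits;

    TrieValueType(int bits) { this.bits = bits; }
}

/** Hypothetical per-field configuration shared by indexing and querying. */
final class FieldConfiguration {
    final String fieldName;
    final TrieValueType type;
    private final int[] precisionSteps;

    /** Variable steps: one entry per indexed term, covering all bits. */
    FieldConfiguration(String fieldName, TrieValueType type, int... precisionSteps) {
        int covered = 0;
        for (int step : precisionSteps) {
            if (step <= 0) {
                throw new IllegalArgumentException("precision step must be positive");
            }
            covered += step;
        }
        if (covered != type.bits) {
            throw new IllegalArgumentException(
                "steps must cover exactly " + type.bits + " bits");
        }
        this.fieldName = fieldName;
        this.type = type;
        this.precisionSteps = precisionSteps.clone();
    }

    /** Single step: converted once into the array representation. */
    static FieldConfiguration uniform(String fieldName, TrieValueType type, int step) {
        if (step <= 0 || type.bits % step != 0) {
            throw new IllegalArgumentException("step must evenly divide " + type.bits);
        }
        int[] steps = new int[type.bits / step];
        Arrays.fill(steps, step);
        return new FieldConfiguration(fieldName, type, steps);
    }

    /** Number of terms indexed per value under this configuration. */
    int termsPerValue() { return precisionSteps.length; }
}
```

Under these assumptions a long field with a uniform step of 8 indexes 8 terms per value and a step of 2 indexes 32, matching the figures in the issue description.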

> Trie range - make trie range indexing more flexible
> ---
>
> Key: LUCENE-1541
> URL: https://issues.apache.org/jira/browse/LUCENE-1541
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.9
>Reporter: Ning Li
>Assignee: Uwe Schindler
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1541.patch
>
>
> In the current trie range implementation, a single precision step is 
> specified. With a large precision step (say 8), a value is indexed in fewer 
> terms (8) but the number of terms for a range can be large. With a small 
> precision step (say 2), the number of terms for a range is smaller but a 
> value is indexed in more terms (32).
> We want to add an option that different precision steps can be set for 
> different precisions. An expert can use this option to keep the number of 
> terms for a range small and at the same time index a value in a small number 
> of terms. See the discussion in LUCENE-1470 that results in this issue.


[jira] Updated: (LUCENE-1545) Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTRE E

2009-02-20 Thread Andreas Hauser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Hauser updated LUCENE-1545:
---

Attachment: AnalyzerTest.java

$ java -Dfile.encoding=UTF-8 -cp lib/lucene-core-2.4-20090219.021329-1.jar:. AnalyzerTest
(mo,0,2,type=)
(chte,3,7,type=)
(m,8,9,type=)
(mo,10,12,type=)
(chte,13,17,type=)
$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE=de_DE.UTF-8
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES=de_DE.UTF-8
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=


> Standard analyzer does not correctly tokenize combining character U+0364 
> COMBINING LATIN SMALL LETTRE E
> ---
>
> Key: LUCENE-1545
> URL: https://issues.apache.org/jira/browse/LUCENE-1545
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 2.4
> Environment: Linux x86_64, Sun Java 1.6
>Reporter: Andreas Hauser
> Fix For: 2.9
>
> Attachments: AnalyzerTest.java
>
>
> Standard analyzer does not correctly tokenize combining character U+0364 
> COMBINING LATIN SMALL LETTER E.
> The word "moͤchte" is incorrectly tokenized into "mo" and "chte"; the combining 
> character is lost.
> Expected result is only one token, "moͤchte".


[jira] Commented: (LUCENE-1545) Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTRE E

2009-02-20 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675381#action_12675381
 ] 

Robert Muir commented on LUCENE-1545:
-

this is an example of why i started messing with LUCENE-1488


[jira] Created: (LUCENE-1545) Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTRE E

2009-02-20 Thread Andreas Hauser (JIRA)

Standard analyzer does not correctly tokenize combining character U+0364 
COMBINING LATIN SMALL LETTRE E
---

 Key: LUCENE-1545
 URL: https://issues.apache.org/jira/browse/LUCENE-1545
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Affects Versions: 2.4
 Environment: Linux x86_64, Sun Java 1.6
Reporter: Andreas Hauser
 Fix For: 2.9


Standard analyzer does not correctly tokenize combining character U+0364 
COMBINING LATIN SMALL LETTER E.
The word "moͤchte" is incorrectly tokenized into "mo" and "chte"; the combining 
character is lost.
Expected result is only one token, "moͤchte".
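
The expected behaviour can be illustrated with plain JDK calls. This is only a sketch of the idea, not how StandardAnalyzer's grammar works and not a proposed fix: CombiningMarkTokenizer is a hypothetical class that treats Unicode marks (general categories Mn, Mc, Me) as token characters alongside letters and digits.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer that keeps combining marks inside a token. */
final class CombiningMarkTokenizer {
    private CombiningMarkTokenizer() {}

    /** True for letters, digits, and Unicode marks (Mn, Mc, Me). */
    static boolean isTokenChar(int codePoint) {
        if (Character.isLetterOrDigit(codePoint)) {
            return true;
        }
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    /** Splits text on every code point that is not a token character. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

U+0364 is a nonspacing mark, so a test based on letters and digits alone rejects it and splits the word at that position, which is the symptom reported here.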


Re: LIA2 on l.a.o/java OK?

2009-02-20 Thread mark harwood


I'm OK with LIA2 on the front page - as Erik suggests it does help lend 
credibility to a project. 
I encounter organisations who are nervous about buying into an open-source 
solution and having books up there on the home page immediately helps establish 
the following:

1) The APIs are stable enough to warrant published documentation
2) There is a reasonable level of adoption in the market
3) There is material available to support new users if they get stuck

>>When new books focused exclusively on Lucene emerge they should get the same 
>>treatment, in my opinion.

So the test-case for this statement would be - what if there was a terrible 
book published? I can't see it happening myself but you have to ask if there is 
some inferred recommendation of quality on any links we provide.







Re: LIA2 on l.a.o/java OK?

2009-02-20 Thread Otis Gospodnetic


I think Erik put it well.  I'd agree with that thinking even if I were not the 
co-author.  Listing LIA2 on the Wiki is good (and done), but my fear is that 
only some people will find it there.  I think showing LIA2 on the main page and 
making it a bit stickier than a regular news item is a way to help Lucene users 
find it quickly.  When new books focused exclusively on Lucene emerge they 
should get the same treatment, in my opinion.

Otis




On Feb 20, 2009, at 6:56 AM, Grant Ingersoll wrote:
> Isn't that what http://wiki.apache.org/lucene-java/Resources is
> for? I like LIA as much as the next person, but if we do it for
> LIA2 then it opens the door for others (http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=Lucene&x=0&y=0)
> which will likely clutter the page quite a bit.

There is precedent. Other books do make it to Apache sites. iBatis
has the Manning cover in the lower-left sidebar: . Wicket has three big book covers: .
Struts more subtly: . ActiveMQ has a news
blurb with big book cover:

As for other books making it there... that'd be fine by me to have a
few book covers shown on the home page. I imagine we won't hear other
authors even asking.

> I just don't think we can imply that LIA2 is the "official book on
> Lucene".

It's the only book dedicated exclusively to Lucene that I'm aware of,
and all of the co-authors are committers/PMC members and active
members of the community. It's about as "official" as it gets.

Books on open source projects lend a great deal of credibility and
I've seen first hand that they are used as deciding factors when
choosing a technology. A book means it is mature and has a good
following.

Personal bias noted - I support putting it on the home page, and also
news blurbs when there is activity, like when it goes to print and is
available in hardcopy.

Erik


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675354#action_12675354
 ] 

Grant Ingersoll commented on LUCENE-1516:
-

Good points, Mike, but maybe we don't need all those variants?  String, File 
and Directory are all easily enough collapsed down to just Directory.
{code}
new IndexWriter(new Directory(indexFile));
{code}

Additionally, there are no more variants than there already are on the IW and 
IR, right?   

As for pass-through or not, I think it would just pass-through, at least 
initially, but it certainly leaves open the possibility for reference counting 
in the future if someone wants to implement that.

As someone who teaches people these APIs on a regular basis, I feel pretty 
confident in saying that adding an IR to the IW as a public API is going to 
confuse a good chunk of people just as the delete stuff on the IR currently 
does now.  You wouldn't ask FileWriter for a FileReader, would you?  I don't 
see why it would be good to ask an IW for an IR, API-wise (I get why we are 
doing this, it makes sense).

Likewise, isn't it just as logical to ask for an IW from an IR?  If I have an 
IR already and I want it to be aware of the writes I want to do, why wouldn't 
we then add IR.getIW()?  And then we can have total circular dependency. 




[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675349#action_12675349
 ] 

Michael McCandless commented on LUCENE-1516:



bq. I suppose if I know I'm not going to be changing my index, I can still just 
get a read-only IR, right?

Right, I think we still want to allow opening a standalone (uncoupled)
reader.

bq. Then, everyone has a single point of entry for both writers and readers and 
all of this stuff can just be done through package private methods on the IW 
and it allows us to change things if we decide otherwise and it means that the 
IW is not coupled with the IR publicly.

I'm torn... the IndexAccessor would need to expose many variants to
carry over all the options we now have (String or File or Directory,
IndexCommit or not, IndexDeletionPolicy or not, create or not).  It
will end up exposing a number of new methods...  and, would it try to
be "smart" (like IndexModifier, and the LuceneIndexAccessor class in
LUCENE-390), keeping track of references to the readers it's handed
out, etc.?  Or is it simply a pass-through to the underlying
open/ctors we have today?

The alternative (as of right now, unless we are missing something
further with these changes) is adding one method to IndexWriter,
getReader, that returns a readOnly IndexReader, "coupled" to the
writer you got it from in that it's able to search un-committed
changes and if you reopen it, writer will materialize all changes and
make them visible to the reopened reader.

I guess so far I don't really see why this small (one method) API
change merits a switch to a whole new accessor API for creating
readers & writers on an index?  Maybe there is a
straw-that-breaks-the-camel's-back argument that I'm missing...



[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675345#action_12675345
 ] 

Michael McCandless commented on LUCENE-1516:


{quote}
The path forward seems to be exposing a cloned readonly reader
from IW.getReader.
{quote}

+1

{quote}
> can't we move away from allowing any changes via IR? (Ie
> deprecate deleteDocuments/setNorms/etc.)

This would simplify things however as a thought experiment how would
the setNorms work if it were a part of IndexWriter?
{quote}

I think it'd look like this?
{code}
IndexWriter.setNorm(Term term, String field, byte norm)
{code}

I.e., the Term identifies the doc(s) you want to set the norm for.

{quote}

> And, clone should not be reopening segments...? 

DirectoryIndexReader.clone(boolean openReadonly) calls
doReopen(SegmentInfos infos, boolean doClone, boolean openReadOnly)
which is an abstract method that in SegmentReader and
MultiSegmentReader reopens the segments? The segment infos for a
ReaderIW is obtained from IW, which is how it knows about the new
segments. Perhaps not desired behavior?
{quote}

OK, I think it does not reopen *existing* segments.  Meaning, if a
segment is in common w/ old and new, it truly clones it (does not
reopen norms nor del).  But if there is a new segment that did not
exist in old, it opens a whole new segment reader?  I'll commit an
assert that this doesn't happen -- if caller passes in "doClone=true"
then caller should not have passed in a segmentInfos with changes?
Else the reader is on thin ice (mismatch what's in RAM vs what
SegmentInfo says).

{quote}
> do we need delete by docID once we have realtime search? I
> think the last compelling reason to keep IR's delete by docID was
> immediacy, but realtime search can give us that, from IW, even when
> deleting by Term or Query? 

Good point! I think we may want to support it but for now it
shouldn't be necessary. I'm thinking of the case where someone is
using the field cache (or some variant), performs some sort of query
on it and then needs to delete based on doc id. What do they do?
Would we expose a callback mechanism where a deleteFrom(IndexReader
ir) method is exposed and deletes occur at the time of the IW's
choosing?
{quote}

Wouldn't delete-by-Query cover this?  Ie one could always make a
Filter implementing the "look @ field cache, do some logic, provide
docIDs to delete", wrap as Query, then delete-by-Query?
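
As a library-free illustration of that idea: the caller hands over the matching logic rather than raw docIDs, and the index resolves the docIDs itself at delete time. ToyIndex, its int[] "field cache" and deleteMatching are hypothetical names for this sketch; in Lucene the predicate would instead be a Filter wrapped as a Query.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical toy index: a field-cache array plus a deleted-docs bit set. */
final class ToyIndex {
    private final int[] priceCache;              // stands in for a field cache
    private final BitSet deleted = new BitSet();

    ToyIndex(int[] priceCache) { this.priceCache = priceCache.clone(); }

    int fieldValue(int docId) { return priceCache[docId]; }

    /** Delete-by-"query": docIDs are resolved here, not by the caller. */
    int deleteMatching(IntPredicate matches) {
        int count = 0;
        for (int doc = 0; doc < priceCache.length; doc++) {
            if (!deleted.get(doc) && matches.test(doc)) {
                deleted.set(doc);
                count++;
            }
        }
        return count;
    }

    boolean isDeleted(int docId) { return deleted.get(docId); }
}
```

Because the docIDs are computed at the moment the delete is applied, the caller never holds docIDs that could go stale.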

{quote}
> It seems like calling reader.reopen() (on reader obtained
> from writer) should basically do the same thing as calling
> writer.getReader(). Ie they are nearly synonyms? (Except for small
> difference in ref counting - I think writer.getReader() should always
> incRef, but reopen only incRefs if it returns a new reader). 

Perhaps ReaderIW.reopen will call IW.getReader underneath instead of
using IR's usual mechanism.
{quote}

Right, that's what I'm thinking.  Once you've obtained reader coupled
to a writer, you can then simply reopen it whenever you want to see
(materialize) changes done by the writer.

We still need a solution for the "warm the just-merged segment"
problem... else we will not be realtime, especially when a big merge
finishes.  It seems like after a merge finishes, it should immediately
1) open a SegmentReader on the new segment, 2) invoke the method you
passed in (or you subclassed -- not sure which), 3) carry over deletes
that materialized during the merge, 4) commit the merge (replace old
segments w/ new one).
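
The four steps can be sketched as a toy. MergedSegmentReader and MergeFinisher are hypothetical classes, the warmer is modelled as a passed-in callback (one of the two options mentioned), and the deletes are assumed to be already expressed in the merged segment's docIDs. Holding the lock during warming is a simplification for brevity, not a recommendation.

```java
import java.util.BitSet;
import java.util.function.Consumer;

/** Hypothetical reader over one freshly merged segment. */
final class MergedSegmentReader {
    final int maxDoc;
    final BitSet deletedDocs = new BitSet();
    boolean warmed;

    MergedSegmentReader(int maxDoc) { this.maxDoc = maxDoc; }
}

/** Hypothetical helper that finishes a merge in the order listed above. */
final class MergeFinisher {
    private MergedSegmentReader live; // the segment visible to searches

    synchronized MergedSegmentReader finishMerge(
            int mergedMaxDoc,
            BitSet deletesDuringMerge,
            Consumer<MergedSegmentReader> warmer) {
        // 1) open a reader on the new segment
        MergedSegmentReader reader = new MergedSegmentReader(mergedMaxDoc);
        // 2) let the application warm it before anyone searches it
        warmer.accept(reader);
        // 3) carry over deletes that arrived while the merge was running
        reader.deletedDocs.or(deletesDuringMerge);
        // 4) commit: only now does the merged segment become visible
        live = reader;
        return reader;
    }

    synchronized MergedSegmentReader liveReader() { return live; }
}
```

The ordering is what matters: searches never observe a merged segment that is cold or missing deletes.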



[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675343#action_12675343
 ] 

Michael McCandless commented on LUCENE-1516:


Jason, I think you need to "svn up".  Or, tell us which revision you're on and 
we can downgrade to that revision before applying the patch.  (We need "svn 
patch"!).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Re: LIA2 on l.a.o/java OK?

2009-02-20 Thread Grant Ingersoll

Isn't that what http://wiki.apache.org/lucene-java/Resources is for?
I like LIA as much as the next person, but if we do it for LIA2 then
it opens the door for others
(http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=Lucene&x=0&y=0),
which will likely clutter the page quite a bit.

Maybe an alternative is to more prominently link the book section of
the Resources page from the main site? I could see adding a "Books on
Lucene" menu item and maybe even having a separate Wiki page for books
if you want.

I'm also totally for any and all authors submitting "News Item"
patches to the front page, so it would be totally appropriate to link
it there (in fact, that will give you more prominence anyway). I just
don't think we can imply that LIA2 is the "official book on Lucene".

-Grant

On Feb 19, 2009, at 9:46 PM, Otis Gospodnetic wrote:

Hello,

Would it be OK to put the book cover and link to LIA2 some place on
lucene.apache.org/java, say below the navigation elements on the
left side of the page? I/we (authors) think it makes sense to do
this -- the book is for the Lucene community after all, but wanted
to check with the rest of java-dev@ before doing something that
could be seen as self-promotion.

Thanks,
Otis


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter


[ 
https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675338#action_12675338
 ] 

Grant Ingersoll commented on LUCENE-1516:
-

Is there ever a need for the normal IR construction anymore?  Or do we always 
just ask for it from the IW (or wherever we choose to expose this, as I still 
don't think it belongs on the IW API wise, but that isn't a big deal right now) 
every time?  I suppose if I know I'm not going to be changing my index, I can 
still just get a read-only IR, right?

API wise, I think we could do something like (with obvious other variations):
{code}
interface IndexAccessor {

  IndexWriter getWriter(Directory dir);

  // returns a read-only reader
  IndexReader getReader(Directory dir);

  // returns the external IR described above
  IndexReader getReader(IndexWriter writer);
}
{code}

Then everyone has a single point of entry for both writers and readers, and 
all of this can be done through package-private methods on the IW.  That lets 
us change things later if we decide otherwise, and it means the IW is not 
publicly coupled with the IR.



[jira] Commented: (LUCENE-1539) Improve Benchmark


[ 
https://issues.apache.org/jira/browse/LUCENE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675336#action_12675336
 ] 

Michael McCandless commented on LUCENE-1539:


bq. In looking over the code, to do the multiple commits using IR we'll need to 
add an IR.flush(String userData) method?

Yes, we should.  Can you open a new issue + patch?

We also have to fix contrib/benchmark to allow specification of a deletion 
policy, and then allow the openReader task to take a string (userData) to 
specify which commit to open.

But: it'd be best if, within a single alg, we could specify a series of commits 
to open, so that we can iterate over the different commit points.  I don't 
think a param to the task allows this?  (But I'm not sure).  If we made it a 
config option then I believe we could specify a sequence which each round would 
advance through.
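
A rough sketch of what "a sequence which each round would advance through"
could look like. This is not contrib/benchmark code: the class name, the
colon-separated format and the wrap-around once the list is exhausted are
all assumptions made for illustration.

```java
/** Hypothetical per-round value list, e.g. the userData of the commit to open. */
final class CommitSequenceSketch {
  private final String[] values;

  /** Parses a colon-separated spec such as "commit1:commit2:commit3". */
  CommitSequenceSketch(String spec) {
    this.values = spec.split(":");
  }

  /** Returns the value for the given round, wrapping around at the end of the list. */
  String forRound(int round) {
    return values[round % values.length];
  }
}
```

With this shape a single alg would not need a task parameter per commit: the
openReader task would ask for the value of the current round.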

> Improve Benchmark
> -
>
> Key: LUCENE-1539
> URL: https://issues.apache.org/jira/browse/LUCENE-1539
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/benchmark
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1539.patch, sortBench2.py, sortCollate2.py
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Benchmark can be improved by incorporating recent suggestions posted
> on java-dev. M. McCandless' Python scripts that execute multiple
> rounds of tests can either be incorporated into the codebase or
> converted to Java.


[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter