[jira] Commented: (LUCENE-2455) Some house cleaning in addIndexes*

Michael McCandless (JIRA) Sun, 16 May 2010 11:06:04 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868018#action_12868018
 ]


Michael McCandless commented on LUCENE-2455:
--------------------------------------------

bq. I've looked into implementing registerIndexes, and that's the approach I'd 
like to take:

This looks good.

Though if the src segments share docStores, you can't do a simple
copy (I think you have to fall back to the resolveExternalSegments
approach for such segments).
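
To make the split concrete, here is a minimal sketch of how the incoming
segments could be partitioned into those that can be copied as-is and those
that need the fallback. SegmentSketch and CopyPlan are hypothetical stand-ins
for illustration, not Lucene classes, and the sketch assumes that a doc store
offset of -1 marks a segment with its own private doc store files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a segment descriptor; not Lucene's SegmentInfo. */
final class SegmentSketch {
    final String name;
    // Assumption of this sketch: -1 means the segment has private doc store
    // files; any other value is its offset into doc store files shared
    // with other segments.
    final int docStoreOffset;

    SegmentSketch(String name, int docStoreOffset) {
        this.name = name;
        this.docStoreOffset = docStoreOffset;
    }

    boolean sharesDocStore() {
        return docStoreOffset != -1;
    }
}

/** Splits segments into "plain file copy" and "needs the merge-based fallback". */
final class CopyPlan {
    final List<SegmentSketch> copyAsIs = new ArrayList<>();
    final List<SegmentSketch> needsFallback = new ArrayList<>();

    static CopyPlan of(List<SegmentSketch> segments) {
        CopyPlan plan = new CopyPlan();
        for (SegmentSketch s : segments) {
            if (s.sharesDocStore()) {
                // Shared doc stores cannot simply be copied under a new name.
                plan.needsFallback.add(s);
            } else {
                plan.copyAsIs.add(s);
            }
        }
        return plan;
    }
}
```

Only the segments in copyAsIs would go through the file copy; the rest would
be handled the way resolveExternalSegments handles them today.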

bq. Does that sound reasonable? Am I missing something?

I think this should work!

If the src segments are an older index rev, I think you are still OK.
They will just remain "old" on copy, and merge will eventually migrate
them forward.

For trunk... you should note in the jdocs that no codec conversion
takes place.  So the CodecProvider used in IW (and later used to read
this index) must know how to provide the codec used by the src
segments.

{quote}
Directory exposes a copyTo(Dir, Collection) which I thought to use. But the
files are copied to the target Dir w/ their current name - while I need to copy
them over w/ their new name.
* Adding rename to Dir feels wrong and dangerous to me
* Adding copyFile(Dir, String old, String new) seems ok
* Adding a variant of copyTo which accepts a Collection of the new names - the
  src and new should align. This also seems ok to me.

I'd like to use Directory for the copy, since impls of Dir may do the copy very
efficiently (e.g. FSDir vs. RAMDir) and I don't want to use IndexInput/Output
for that.

Do you know of another way I can achieve that? I only want to copy the actual 
segment files, w/o .gen and segments_N, so calling SI.files() seems ok?
{quote}

SI.files() should be fine.
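
As an illustration of the renaming step, the sketch below maps each file of a
source segment to its name under the new segment. SegmentFileRenamer is a
hypothetical helper, not part of Lucene, and it assumes every file of a segment
starts with the segment name followed by '.' or '_' (as in _1.fdt or _1_2.del).

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps a segment's file names onto a new segment name. */
final class SegmentFileRenamer {

    /** Returns old name -> new name, in the iteration order of {@code files}. */
    static Map<String, String> renameAll(String oldSegment, String newSegment,
                                         Collection<String> files) {
        Map<String, String> mapping = new LinkedHashMap<>();
        int n = oldSegment.length();
        for (String file : files) {
            // Guard against prefix collisions such as "_1" vs. "_10.fdt".
            boolean owned = file.startsWith(oldSegment)
                    && file.length() > n
                    && (file.charAt(n) == '.' || file.charAt(n) == '_');
            if (!owned) {
                throw new IllegalArgumentException(
                        file + " does not belong to segment " + oldSegment);
            }
            mapping.put(file, newSegment + file.substring(n));
        }
        return mapping;
    }
}
```

A file that fails the prefix check (for example a doc store file named after a
different segment) is rejected rather than silently copied under a wrong name.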

I think falling back to copyFile is best?  Then copyTo could use it.
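
Roughly, the layering could look like the sketch below: copyFile is the
primitive that takes a source and a destination name, and copyTo becomes a
loop over it. FsDirSketch is a hypothetical file-system-backed stand-in written
against java.nio.file, not Lucene's Directory, and the method signatures are
only an assumption about how the proposal might be shaped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;

/** Hypothetical file-system directory; a stand-in, not Lucene's Directory. */
final class FsDirSketch {
    private final Path root;

    FsDirSketch(Path root) {
        this.root = root;
    }

    /** Copies one file into {@code target}, possibly under a different name. */
    void copyFile(FsDirSketch target, String srcName, String destName)
            throws IOException {
        // Files.copy delegates to the file system provider and fails with
        // FileAlreadyExistsException if destName already exists in target.
        Files.copy(root.resolve(srcName), target.root.resolve(destName));
    }

    /** Copies files under their current names by delegating to copyFile. */
    void copyTo(FsDirSketch target, Collection<String> names) throws IOException {
        for (String name : names) {
            copyFile(target, name, name);
        }
    }
}
```

Keeping the rename inside copyFile means an implementation can still pick its
most efficient copy path, while copyTo needs no separate code of its own.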

{quote}
Another question that popped into my head was about consistency of the
incoming Dirs vs. the local one, w.r.t. CFS files - should I worry
about that? I think not because today one can create an index w/ CFS
and then turn it off and some segments will be compound and others
not?
{quote}

I think that's fine, but we should advertise it in the jdocs.


> Some house cleaning in addIndexes*
> ----------------------------------
>
>                 Key: LUCENE-2455
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2455
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Trivial
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2455_3x.patch
>
>
> Today, the use of addIndexes and addIndexesNoOptimize is confusing, 
> especially regarding when to invoke each. Also, addIndexes calls 
> optimize() at the beginning, but only on the target index. It also 
> includes the following jdoc statement, which, as I understand the code, 
> is wrong: _After this completes, the index is optimized._ -- optimize() 
> is called at the beginning and not at the end. 
> On the other hand, addIndexesNoOptimize does not call optimize(), and 
> relies on the MergeScheduler and MergePolicy to handle the merges. 
> After a short discussion about that on the list (Thanks Mike for the 
> clarifications!) I understand that there are really two core differences 
> between the two: 
> * addIndexes supports IndexReader extensions
> * addIndexesNoOptimize performs better
> This issue proposes the following:
> # Clear up the documentation of each, spelling out the pros/cons of 
>   calling them clearly in the javadocs.
> # Rename addIndexesNoOptimize to addIndexes
> # Remove optimize() call from addIndexes(IndexReader...)
> # Document that clearly in both, w/ a recommendation to call optimize() 
>   beforehand on any of the Directories/Indexes if it's a concern. 
> That way, we maintain all the flexibility in the API - 
> addIndexes(IndexReader...) allows for using IR extensions, 
> addIndexes(Directory...) is considered more efficient, by allowing the 
> merges to happen concurrently (depending on MS) and also factors in the 
> MP. So unless you have an IR extension, addIndexes(Directory...) is really the one 
> you should be using. And you have the freedom to call optimize() before 
> each if you care about it, or don't if you don't care. Either way, 
> incurring the cost of optimize() is entirely in the user's hands. 
> BTW, addIndexes(IndexReader...) uses neither the MergeScheduler nor the 
> MergePolicy, but rather calls SegmentMerger directly. This might be 
> another place for improvement. I'll look into it, and if it's not too 
> complicated, I may cover it by this issue as well. If you have any hints 
> that can give me a good head start on that, please don't be shy :). 

