[ 
https://issues.apache.org/jira/browse/LUCENE-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038136#comment-13038136
 ] 

Michael McCandless commented on LUCENE-3126:
--------------------------------------------


bq. Patch does not handle all files well (few tests fail). Apparently, the .del 
file should not be rolled into the .cfs.

Right, .del files never appear inside a CFS.

bq. SegmentMerger.createCompoundFile does this by default, however it's only 
called from code that ensures no deletions exist. Would have been nice if this 
method documented it.

Please add comments to this!  It's non-obvious ;)

bq. Also, I think *.s<num> should not be rolled into .cfs (those are the 
separate norms files). I don't know how to create such files in the first place 
(thought they're of old format, but 3.1 indexes have them also), and 
TestBackCompat fails.

Right, these too only live outside a CFS.  You create them by opening a 
writable IndexReader (I know: confusing!), calling setNorm, and then closing 
it.  They are not only for old indices... 4.0 creates them too.

bq. Is there a way to identify those files? Is it safe to check if the file 
extension starts w/ IndexFileNames.SEPARATE_NORMS_EXTENSION? Feels hacky to me.

Hackish though it seems (I agree) I think that's the only way?  
SegmentInfo.hasSeparateNorms is equally hacky...
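
To make the extension check concrete, here is a rough, self-contained sketch 
of the kind of filter being discussed. CfsFileFilter and its methods are 
hypothetical names, not Lucene API, and the two constants are only assumed to 
mirror IndexFileNames.DELETES_EXTENSION and 
IndexFileNames.SEPARATE_NORMS_EXTENSION. It is slightly stricter than a plain 
startsWith, since it also requires digits after the "s".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene): picks the files that may go into a .cfs. */
final class CfsFileFilter {
    // Assumption: same values as IndexFileNames.DELETES_EXTENSION / SEPARATE_NORMS_EXTENSION.
    static final String DELETES_EXTENSION = "del";
    static final String SEPARATE_NORMS_EXTENSION = "s";

    /** Returns the text after the last '.', or "" when the name has no dot. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** True when the extension is "s" followed by one or more digits, e.g. "_0_1.s0". */
    static boolean isSeparateNorms(String fileName) {
        String ext = extension(fileName);
        int prefix = SEPARATE_NORMS_EXTENSION.length();
        if (!ext.startsWith(SEPARATE_NORMS_EXTENSION) || ext.length() == prefix) {
            return false;
        }
        for (int i = prefix; i < ext.length(); i++) {
            if (!Character.isDigit(ext.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** True for deletion files such as "_0_1.del". */
    static boolean isDeletes(String fileName) {
        return extension(fileName).equals(DELETES_EXTENSION);
    }

    /** Deletes and separate norms stay outside the CFS; everything else may go in. */
    static boolean belongsInCfs(String fileName) {
        return !isDeletes(fileName) && !isSeparateNorms(fileName);
    }

    /** Returns the subset of a segment's files that may be copied into the CFS, in order. */
    static List<String> filesForCfs(List<String> segmentFiles) {
        List<String> result = new ArrayList<>();
        for (String file : segmentFiles) {
            if (belongsInCfs(file)) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("_0.fdt", "_0.fdx", "_0.nrm", "_0_1.del", "_0_1.s0");
        System.out.println(filesForCfs(files));
    }
}
```

With a check like this, the .del and .s<num> files would be copied next to the 
new .cfs as-is rather than rolled into it.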

bq. Another thing, I think in order to avoid shared doc stores (and whatever 
other old-format stuff), since it's only an optimization, that the code should 
copy into CFS only if the segment version is on or after 3.1 (that is 
StringHelper.getVersionComparator().compare(info.getVersion(), "3.1") >= 0).

Shared doc stores, yes, but the separate del docs / norms are produced by all 
versions.
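
For reference, the version gate from the quoted comment could look roughly 
like the sketch below. It is self-contained and does not use Lucene's 
StringHelper.getVersionComparator(); SegmentVersionGate and onOrAfter are 
hypothetical names, and the comparator assumes purely numeric dotted version 
strings such as "3.1". Such a gate would only keep shared doc stores out; the 
separate del / norms files still need their own check.

```java
import java.util.Comparator;

/** Hypothetical helper (not part of Lucene): gates an optimization on the segment version. */
final class SegmentVersionGate {
    /** Compares dotted numeric versions component by component; missing components count as 0. */
    static final Comparator<String> VERSION_COMPARATOR = (a, b) -> {
        String[] left = a.split("\\.");
        String[] right = b.split("\\.");
        int n = Math.max(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int x = i < left.length ? Integer.parseInt(left[i]) : 0;
            int y = i < right.length ? Integer.parseInt(right[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    };

    /** True when segmentVersion is the same as or newer than minimum. */
    static boolean onOrAfter(String segmentVersion, String minimum) {
        return VERSION_COMPARATOR.compare(segmentVersion, minimum) >= 0;
    }

    public static void main(String[] args) {
        for (String version : new String[] {"3.0", "3.1", "4.0"}) {
            System.out.println(version + " eligible: " + onOrAfter(version, "3.1"));
        }
    }
}
```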

More generally: does addIndexes properly refuse to import a too-old index?  We 
should throw IndexFormatTooOldException in this case?  (And, maybe also 
IndexFormatTooNewException?).


> IndexWriter.addIndexes can make any incoming segment into CFS if it isn't 
> already
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-3126
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3126
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3126.patch
>
>
> Today, IW.addIndexes(Directory) does not modify the CFS-mode of the incoming 
> segments. However, if IndexWriter's MP wants to create CFS (in general), 
> there's no reason not to turn the incoming non-CFS segments into CFS. We 
> copy them anyway, and if MP is not against CFS, we should create a CFS out of 
> them.
> Will need to use CFW, not sure it's ready for that w/ current API (I'll need 
> to check), but luckily we're allowed to change it (@lucene.internal).
> This should be done, IMO, even if the incoming segment is large (i.e., passes 
> MP.noCFSRatio) b/c like I wrote above, we anyway copy it. However, if you 
> think otherwise, speak up :).
> I'll take a look at this in the next few days.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

