Hi Oliver,

To me it looks like you are making this much more complicated than it needs to be. It also seems that you have misunderstood join queries, which appears to be the root of your problem. Comments inline:

> My lucene Index is built and stored in a zip file (uncompressed) which is used
> as a read-only Directory.
>
> 1) At lucene indexing time, is it possible to rewrite the index so that some
> fields are only found in some segments Say :
>
> EnglishWords, EnglishVerbs go to Segment 1
> GermanWords, GermanSentences go to Segment 2
> French, frenchWines go to Segment 3 ...

You can create exactly the same index structure manually, without dealing with Lucene internals. Just index every language into a separate index with a separate IndexWriter. As those indexes are read-only, you can call forceMerge(1) after indexing, so each index has exactly one segment, i.e. every language ends up as one single segment. The only difference is that you would need a separate ZIP file for every language (which is probably what you need anyway, because you want to ship "language packs"). Alternatively, you would have to rewrite your ZIP Directory implementation to work on subdirectories inside the ZIP file.

> 2) In what file is the index structure written (number of index,
> docValues...) ? And, is it possible, to tamper in some way with this Say, in a
> Directory implementation...at start of my application, to tell the lucene
> index to use this segment or not

If every language is a separate index, just use "new MultiReader(indexReader1, indexReader2, indexReader3)" to combine them and query the MultiReader. This is the same structure as a single DirectoryReader (which is also handled like a MultiReader internally) and therefore has no speed impact.

> If 1, 2 were possible, I think that it would allow me to ship my index
> in a modular way in my apps (with language packs)
> and do join queries as regular queries, with no speed penalty

The "join" keyword seems to be your main misunderstanding. There is no relation between join queries and multiple indexes. In Lucene, "join" queries join documents of different types within the same index!
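
As a rough sketch of the setup described above (one index per language, each merged down to a single segment, all combined with a MultiReader), something like the following should work. It assumes Lucene 8 or later on the classpath. The class name LanguagePacks, the two helper methods, and the in-memory ByteBuffersDirectory (standing in for your ZIP-backed Directory) are only illustrative assumptions, not part of your setup:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical helper class, for illustration only.
final class LanguagePacks {

    // Builds one "language pack": a separate index with its own IndexWriter,
    // merged down to exactly one segment after indexing.
    static Directory buildPack(String field, String... texts) throws IOException {
        Directory dir = new ByteBuffersDirectory(); // stand-in for a ZIP-backed Directory
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (String text : texts) {
                Document doc = new Document();
                doc.add(new TextField(field, text, Field.Store.YES));
                writer.addDocument(doc);
            }
            writer.forceMerge(1); // the index is read-only afterwards, so one segment is fine
        }
        return dir;
    }

    // Opens every pack, combines the readers with a MultiReader, and counts
    // the documents that contain the given term in the given field.
    static int countHits(String field, String word, Directory... packs) throws IOException {
        IndexReader[] readers = new IndexReader[packs.length];
        for (int i = 0; i < packs.length; i++) {
            readers[i] = DirectoryReader.open(packs[i]);
        }
        // Closing the MultiReader also closes the sub-readers.
        try (MultiReader multi = new MultiReader(readers)) {
            IndexSearcher searcher = new IndexSearcher(multi);
            return searcher.count(new TermQuery(new Term(field, word)));
        }
    }
}
```

Which packs get passed to countHits decides which languages are searched, so leaving out a language pack is just a matter of not opening its reader.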

Querying multiple indexes together is not joining. It is simple and very fast (because this is how Lucene was designed): just use the MultiReader approach from above to query all indexes at the same time. As a MultiReader over many single-segment DirectoryReaders is identical to one large DirectoryReader with n segments, there is no difference at all.

This is something different:

> 3) At lucene indexing time, is it possible to remap the docId values (I saw
> some MergeState.mapDocId method...) Say
> 0 -> 4
> 1 -> 3
> 2 -> 1
> 3 -> 0
> 4 -> 2
>
>
> If 3 is possible, It would allow me to have some sort of
> forward/backward compatibilities with my shipped language packs
> and also to have fast implementations for some id related methods

What do you want to do? Why do you want to do this? (Please refer to the XY problem: <https://people.apache.org/~hossman/#xyproblem>.)

Uwe