Hi Tx,
This is just to close the loop. Thank you very much for your helpful 
suggestions.
This works fine and solves our problem.
Much appreciated.
Clive


      From: kiwi clive <kiwi_cl...@yahoo.com.INVALID>
 To: Trejkaz <trej...@trypticon.org>; Lucene Users Mailing List 
<java-user@lucene.apache.org> 
 Sent: Wednesday, February 1, 2017 7:37 PM
 Subject: Re: How do I write in 3.x format to an upgradeded index using Lucene 
4.10
   
Hi Tx
Thank you for the detailed response, that makes a lot of sense. I feel we may 
have to freeze some old analyzer code as we have indexes that were written with 
Lucene 2.3 analyzers and that is no longer supported. I'll need to do some 
experimentation to see how we go. Further reading has shown that StopFilter 
changed behaviour as of Lucene version 2.9.
Keeping old analyzer code forever is not great but as long as we can co-exist 
with newer indexes, we are in good shape as legacy indexes can be reindexed in 
slow time as necessary.
I'll do some digging!
Many thanks
Clive

      From: Trejkaz <trej...@trypticon.org>
 To: Lucene Users Mailing List <java-user@lucene.apache.org>; kiwi clive 
<kiwi_cl...@yahoo.com> 
 Sent: Wednesday, February 1, 2017 2:53 PM
 Subject: Re: How do I write in 3.x format to an upgradeded index using Lucene 
4.10
  
> If we take our old 3.x index and apply IndexUpgrader to it, we end up with a 
> 4.10 index.
> There are several lucene 4.x files created in the index directory and no 
> errors are thrown.
> However, it appears that the index data is still in the 3.x format, namely it 
> remains:
> "thanks", "coming"
> and not:
> "thanks", <pim>, "coming"

Well, this is a different thing really. The index is in the 4.x
format, but the analysis which was performed remains the 3.x analysis,
because nothing was done to change the postings.

So this whole thing is really just a "make sure to use the same
analyser to query which you used to index" problem. So if you indexed
using a Lucene 3 analyser, then you should be using the same v3
analyser when you query against the index in Lucene 4.
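
A toy sketch of that rule, in plain Java with no Lucene dependency. None
of the names below are Lucene classes; ToyAnalyzer, OLD, NEW and the
"<gap>" placeholder are hypothetical, and the sketch assumes the "<pim>"
in the quoted output marks a position left behind by a removed token. It
only illustrates why a phrase analysed one way may fail to match postings
that were analysed another way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: none of these types are Lucene classes.
class AnalyzerMismatchSketch {

    /** Hypothetical stand-in for an analysis chain. */
    interface ToyAnalyzer {
        List<String> analyze(String text);
    }

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("for", "the"));

    // "Old" chain: stop words are dropped and leave nothing behind.
    static final ToyAnalyzer OLD = text -> tokens(text, false);

    // "New" chain: a dropped stop word leaves a placeholder position.
    static final ToyAnalyzer NEW = text -> tokens(text, true);

    static List<String> tokens(String text, boolean keepGaps) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            } else if (keepGaps) {
                out.add("<gap>"); // assumed meaning of "<pim>" in the thread
            }
        }
        return out;
    }

    // Phrase match: the query tokens must appear as one contiguous run.
    static boolean phraseMatches(List<String> indexed, List<String> query) {
        return Collections.indexOfSubList(indexed, query) >= 0;
    }

    public static void main(String[] args) {
        // The document was indexed with the old chain.
        List<String> indexed = OLD.analyze("Thanks for coming today");

        // Same chain at query time: prints true
        System.out.println(phraseMatches(indexed, OLD.analyze("thanks for coming")));

        // Different chain at query time: prints false
        System.out.println(phraseMatches(indexed, NEW.analyze("thanks for coming")));
    }
}
```

In a real application the equivalent choice would be which analyser
instance is handed to the writer and to the query parser for a given
index.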

So the usual rules apply:
  * Beware of Version.LATEST/LUCENE_CURRENT. Always use the exact
version, and keep using it.
  * If Lucene remove support for some Version you were using, don't
update the Version you're using. Instead, take a copy of the
Tokenizer/TokenFilter you were using from the older version and port
it to work on the new version. Maintain these frozen off analysis
components forever.

But that said, we didn't experience any problems like this going from
3 to 4, only more obscure problems where backwards compatibility was
not maintained in Lucene itself, e.g. places where despite passing in a
Version object, the older behaviour was not maintained. IIRC, the term
length limits being changed was one of these. And in these situations,
for the most part, freezing off a copy of the old behaviour works
fine.

That said, we don't use the "classic" query parser, but rather the
flexible one. And maybe if you're using the classic one, it might have
some misbehaviour around this which we didn't strike by using the
flexible one.

> So we need a way to write documents in 3.x format (no <pim>), to our upgraded 
> indexes,
> new indexes can use native 4.10 format.

It sounds like you just need to use the same analyser you were
previously using, possibly forever...

TX


  

   
