Re: Upgrading Lucene Indices and maintaining same resultset

2015-05-31 Thread Tomoko Uchida
Hi,

> We have a Lucene 3.6-based index set which is quite large and currently
> in use. What will be the upgrade path to (a) 4.x or (b) 5.x? With respect
> to the data migration, etc. What are the steps and is it technically
> possible? I read that 3.x to 5.x is not possible, and throws IndexTooStale
> exceptions. Can we do it in two hops, like from 3.x to 4.x and 4.x to 5.x.

I would do this by re-indexing all the data with the Lucene 5.x-based
application.
Of course, you will need extra disk and memory space (maybe additional
machines), but I think it is safer and easier than upgrading the index
data in two hops.

> If I have a set of documents that have already been indexed with Lucene
> 3.6 and somehow we are able to upgrade to Lucene 4.x (or maybe 5.x), how
> can we make sure that we will get the same set of results? I am not sure,
> but I will check the analyzers and tokenizers used in the 3.6 versions. If
> we could somehow carry over those to 5.x, will we be guaranteed the same
> set of results? Or are there other considerations to get the same set of
> results?

We cannot guarantee the same set of results or rankings for arbitrary
queries when upgrading the Lucene version.
Checking all analysis chains is a good idea. I would also check the top
results for some important queries.

That is:
- Select as many important queries as possible (those most frequently
issued by users, or those with significant business impact on your service)
- For each query, take a diff between the top N results of the 3.6- and
5.x-based applications (N depends on the application or UI)
- Check and make adjustments if there are differences that cannot be ignored
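
The per-query diff in the steps above could be sketched roughly as follows.
This is only an illustration in plain Java with no Lucene dependency: the
class name TopNDiff, its fields, and the compare method are hypothetical.
It assumes each application returns its hits as an ordered list of
application-level unique keys (for example a stored "id" field), since
Lucene's internal document numbers are not stable across a re-index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: compares the top-N keys returned by two search applications for one query. */
final class TopNDiff {

    /** Keys in the old top N that are missing from the new top N. */
    final List<String> dropped = new ArrayList<>();
    /** Keys in the new top N that were not in the old top N. */
    final List<String> added = new ArrayList<>();
    /** Keys present in both top-N lists but at a different rank. */
    final List<String> moved = new ArrayList<>();

    /**
     * @param oldResults ranked keys from the 3.6-based application
     * @param newResults ranked keys from the 5.x-based application
     * @param n          how many top results matter for the application or UI
     */
    static TopNDiff compare(List<String> oldResults, List<String> newResults, int n) {
        // Truncate both rankings to at most n entries.
        List<String> oldTop = oldResults.subList(0, Math.min(n, oldResults.size()));
        List<String> newTop = newResults.subList(0, Math.min(n, newResults.size()));

        TopNDiff diff = new TopNDiff();
        for (int rank = 0; rank < oldTop.size(); rank++) {
            String key = oldTop.get(rank);
            int newRank = newTop.indexOf(key);
            if (newRank < 0) {
                diff.dropped.add(key);
            } else if (newRank != rank) {
                diff.moved.add(key);
            }
        }
        for (String key : newTop) {
            if (!oldTop.contains(key)) {
                diff.added.add(key);
            }
        }
        return diff;
    }

    /** True when both applications returned the same keys in the same order. */
    boolean isIdentical() {
        return dropped.isEmpty() && added.isEmpty() && moved.isEmpty();
    }
}
```

Running this for each selected query and reviewing only the queries where
isIdentical() is false keeps the manual check focused on real differences.
Whether a pure rank change (moved) matters is an application-level decision.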

Regards,
Tomoko




Upgrading Lucene Indices and maintaining same resultset

2015-05-27 Thread Sandeep Khanzode
Hi All,
We have a Lucene 3.6-based index set which is quite large and currently in use.
What would be the upgrade path to (a) 4.x or (b) 5.x with respect to data
migration, etc.? What are the steps, and is it technically possible? I read that
3.x to 5.x is not possible and throws IndexTooStale exceptions. Can we do it
in two hops, i.e. from 3.x to 4.x and then from 4.x to 5.x?

If I have a set of documents that have already been indexed with Lucene 3.6 and
somehow we are able to upgrade to Lucene 4.x (or maybe 5.x), how can we make
sure that we will get the same set of results? I am not sure, but I will check
the analyzers and tokenizers used in the 3.6 versions. If we could somehow
carry those over to 5.x, will we be guaranteed the same set of results? Or are
there other considerations to get the same set of results? - SRK