Hi,

I am currently evaluating whether Zend_Search_Lucene is the right choice
for implementing the following:

I have about 20,000 data sets with about 50 different fields. 20 of
these fields are used for sorting and selecting the data sets. These
fields are not fixed and can be changed at any time by the admins. This
will probably only happen once or twice a month, but it will happen.
Another problem is that not all of these 50 fields are used for all data
sets. In fact, none of the 20,000 data sets will ever need all 50
fields.

Until now, I was thinking of implementing this with a couple of MySQL
tables that hold all these fields, so sorting and selecting shouldn't
be much of a problem. But whenever new fields are added, the tables
need to be altered. Also, most rows will have more empty columns than
filled ones, because none of the data sets uses all 50 fields.

Now I am thinking about implementing the selecting and sorting with
Zend_Search_Lucene. My MySQL table would only have a small number of
columns (primary key, some foreign keys, date columns and a text column
that holds the serialized data of all other fields). This data would be
indexed by Zend_Search_Lucene, and the index would be used for searching
and sorting the data.
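
To make the idea more concrete, this is roughly how I picture the
indexing step. It is an untested sketch: the column names ('id',
'data'), the stored field name 'pk' and the helper functions are made up
for this mail, and I am assuming the Zend Framework autoloader is
registered. Each data set would only contribute the fields it actually
uses:

```php
<?php
// Sketch only. Assumes Zend Framework 1 is on the include path with its
// autoloader registered. Column and field names are hypothetical.

/**
 * Drops empty fields so that each data set only carries the fields it
 * actually uses (plain PHP, no Zend dependency).
 */
function populatedFields(array $fields)
{
    return array_filter($fields, function ($value) {
        return $value !== null && $value !== '';
    });
}

/**
 * Builds one index document from a MySQL row whose 'data' column holds
 * the serialized field array.
 */
function buildDocument(array $row)
{
    $doc = new Zend_Search_Lucene_Document();

    // Stored but not indexed: used to load the full row from MySQL later.
    $doc->addField(
        Zend_Search_Lucene_Field::unIndexed('pk', (string) $row['id'])
    );

    foreach (populatedFields(unserialize($row['data'])) as $name => $value) {
        // Keyword fields are indexed as a single, untokenized term.
        $doc->addField(
            Zend_Search_Lucene_Field::keyword($name, (string) $value)
        );
    }

    return $doc;
}

// Usage (the index path is a placeholder):
// $index = Zend_Search_Lucene::create('/path/to/index');
// $index->addDocument(buildDocument($row));
// $index->commit();
```

With this layout a new admin-defined field would not need a schema
change; the affected data sets would only have to be re-indexed.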

Has anybody built an index with this many fields (20, not 50, because
only 20 are used for searching and sorting) that can change from time to
time and may grow to 40 or even 50 fields over time? Did you run into
any problems?

My queries could be quite complex, using up to 10 fields combined with
AND to get the results. Does this work without any problems? Or should I
limit the number of fields that can be chosen for a single query?
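
The kind of query I mean would look roughly like this with the query
construction API. Again an untested sketch: the helper name and the
field names ('color', 'size', 'price') are invented for illustration,
and the fields are assumed to be indexed as keyword fields:

```php
<?php
// Sketch only; assumes the Zend Framework 1 autoloader is registered.

/**
 * Combines field => value criteria into one query in which every term
 * is required, i.e. the criteria are ANDed.
 */
function buildAndQuery(array $criteria)
{
    $query = new Zend_Search_Lucene_Search_Query_MultiTerm();

    foreach ($criteria as $field => $value) {
        $term = new Zend_Search_Lucene_Index_Term((string) $value, $field);
        $query->addTerm($term, true); // true marks the term as required
    }

    return $query;
}

// Usage (field names are made up):
// $index = Zend_Search_Lucene::open('/path/to/index');
// $query = buildAndQuery(array('color' => 'red', 'size' => 'xl'));
// $hits  = $index->find($query, 'price', SORT_NUMERIC, SORT_ASC);
```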

I am quite worried about performance. I understand that
Zend_Search_Lucene has no real support for pagination. So when I only
want the first 20 results for a query with 5 fields, the whole index
will be searched and all results will be returned. Will performance
really be an issue?
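
As far as I can tell, paging would then come down to slicing the full
hit array myself, something like the sketch below. The helper name
pageOf and the stored field 'pk' are hypothetical, and this is not
tested against a real index:

```php
<?php
// Sketch only; assumes the Zend Framework 1 autoloader is registered.

/**
 * Returns the hits for one page (1-based) of an already computed result.
 */
function pageOf(array $hits, $page, $perPage = 20)
{
    return array_slice($hits, ($page - 1) * $perPage, $perPage);
}

// Usage:
// $hits = $index->find($query);   // every matching document is returned
// foreach (pageOf($hits, 1) as $hit) {
//     echo $hit->pk, "\n";        // stored fields are loaded on access
// }
//
// Zend_Search_Lucene::setResultSetLimit(20) would cap the result set,
// but if I read the manual correctly it returns the first 20 matches
// found, not the best 20.
```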

What do you think of this approach in general?

Thanks for your comments.

Best regards,

Ralf
