Re: Fw: [fw-general] Zend_Search_Lucene questions

Alexander Veremyev Fri, 22 Dec 2006 09:16:11 -0800

Sebi wrote:

OK Alexander, I understand this. How can I manage this situation? I will index all words from text fields (this is the default behavior of the tokenizer, isn't it?), so there will be words like 'and', 'a', 'an', 'than' and many others which will appear in many documents. I know that the MySQL fulltext index has a full list of these common words and excludes them from the index.
Tell me how I can select common terms in an efficient way. Where should I add this? Is there a class which I can extend?
I await your answer.


There are two additional analyzer filters (thanks to Lukas!).

The StopWords filter drops tokens that appear in a given word list, and the ShortWords filter drops tokens shorter than a given length.

Usage example:
---------------------------
// Words to exclude from the index.
$stopWords = array('a', 'an', 'at', 'the', 'and', 'or', 'is', 'am');

$stopWordsFilter = new Zend_Search_Lucene_Analysis_TokenFilter_StopWords($stopWords);

$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();

// Attach the filter, then make this analyzer the default one.
$analyzer->addFilter($stopWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);
---------------------------
// Same as above, but the stop words are read from a file.
// $my_stopwords_file holds the path to your stop-word list.
$stopWordsFilter = new Zend_Search_Lucene_Analysis_TokenFilter_StopWords();
$stopWordsFilter->loadFromFile($my_stopwords_file);

$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();

$analyzer->addFilter($stopWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);
---------------------------

// Drop very short tokens instead of listing them one by one.
$shortWordsFilter = new Zend_Search_Lucene_Analysis_TokenFilter_ShortWords();

$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();

$analyzer->addFilter($shortWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);
---------------------------
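
For intuition only, here is a minimal plain-PHP sketch of what the two filters do to a token stream. It does not use Zend_Search_Lucene and is not how the library implements them: the helper names removeStopWords() and removeShortWords(), the simple regex tokenizer, and the minimum length of 3 are all assumptions made for illustration.

```php
<?php
// Illustrative sketch only: plain PHP, independent of Zend_Search_Lucene.
// removeStopWords() and removeShortWords() are hypothetical helper names.

function removeStopWords(array $tokens, array $stopWords): array
{
    // Flip the list so each lookup is a hash check rather than a linear scan.
    $lookup = array_flip(array_map('strtolower', $stopWords));

    return array_values(array_filter($tokens, function ($token) use ($lookup) {
        return !isset($lookup[strtolower($token)]);
    }));
}

function removeShortWords(array $tokens, int $minLength): array
{
    // Keep only tokens that have at least $minLength characters.
    return array_values(array_filter($tokens, function ($token) use ($minLength) {
        return strlen($token) >= $minLength;
    }));
}

// Lower-case the text and split it on anything that is not a letter or digit.
$text   = 'The cat is at an old barn on ice';
$tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

$tokens = removeStopWords($tokens, array('a', 'an', 'at', 'the', 'and', 'or', 'is', 'am'));
$tokens = removeShortWords($tokens, 3);

print_r($tokens);
```

Here 'the', 'is', 'at' and 'an' are removed by the stop-word list, and 'on' is removed only by the length check, which is why the two filters are often combined on one analyzer.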

I've just updated the documentation (Zend_Search, Extensibility section) and made some small fixes.

Please use the SVN version to work with these filters.


With best regards,
   Alexander Veremyev.
