Re: Searching Diacritics

anorman Mon, 27 Aug 2007 09:48:09 -0700

Can I do this at search time rather than at index time?  Below is the code
that handles the searching; where would I use such a filter?


Thanks for the help!




package search.lucene.search;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

import search.lucene.index.IndexManager;

/**
 * This class is used to search the
 * Lucene index and return search results
 */
public class SearchManager {

    private String searchWord;

    private IndexManager indexManager;

    // Analyzer applied to the query text by the QueryParser
    private Analyzer analyzer;

    public SearchManager(String searchWord) {
        this.searchWord   = searchWord;
        this.indexManager = new IndexManager();
        this.analyzer     = new StandardAnalyzer();
    }

    /**
     * do search
     */
    public List search() {
        List searchResult = new ArrayList();

        // Open the index; on failure the searcher stays null and no search runs
        IndexSearcher indexSearcher = null;
        try {
            indexSearcher = new IndexSearcher(indexManager.getIndexDir());
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

        // Parse the user's input against the "content" field
        QueryParser queryParser = new QueryParser("content", analyzer);
        Query query = null;
        try {
            query = queryParser.parse(searchWord);
        } catch (ParseException e) {
            e.printStackTrace();
        }

        if (null != query && null != indexSearcher) {
            try {
                Hits hits = indexSearcher.search(query);
                for (int i = 0; i < hits.length(); i++) {
                    Document doc = hits.doc(i);
                    System.out.println(doc.get("filename"));

                    // Copy the stored fields of each hit into a result bean
                    SearchResultBean resultBean = new SearchResultBean();
                    resultBean.setXMLId(doc.get("id"));
                    resultBean.setXMLTitle(doc.get("title"));
                    resultBean.setXMLAuthor(doc.get("author"));
                    resultBean.setXMLAbstract(doc.get("abstract"));
                    resultBean.setScore(hits.score(i));

                    searchResult.add(resultBean);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return searchResult;
    }
}
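
To make the goal concrete, here is a minimal stand-alone sketch of what
accent folding has to achieve. It does not use Lucene at all: it assumes
java.text.Normalizer (Java 6 or later) as a stand-in for
ISOLatin1AccentFilter, and the class and method names (AccentFoldingSketch,
fold) are hypothetical. It only shows that an accented and an unaccented
spelling compare equal once both have gone through the same folding step.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFoldingSketch {

    // Hypothetical helper, not part of Lucene: decompose each accented
    // character into base letter + combining mark (NFD), then drop the marks.
    public static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "Am\u00e8rique" is the accented spelling from the original question.
        String indexedTerm = fold("Am\u00e8rique".toLowerCase(Locale.ROOT));
        String queryTerm   = fold("Amerique".toLowerCase(Locale.ROOT));

        // prints: amerique / amerique -> true
        System.out.println(indexedTerm + " / " + queryTerm + " -> "
                + indexedTerm.equals(queryTerm));
    }
}
```

If only the query side were folded, "amerique" would be compared against an
unfolded indexed term and the two would not match, which is why the sketch
folds both strings.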






thomas arni-2 wrote:
> 
> You can extend the StandardAnalyzer.
> The only thing you have to do is override the tokenStream method like
> this:
> 
>   /** Constructs a {@link StandardTokenizer} filtered by a {@link
>   StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */
>   public TokenStream tokenStream(String fieldName, Reader reader) {
>     TokenStream result = new StandardTokenizer(reader);
>     result = new StandardFilter(result);
>     result = new LowerCaseFilter(result);
>     result = new StopFilter(result, stopSet);
>     result = new ISOLatin1AccentFilter(result);
>     return result;
>   }
> 
> 
> anorman wrote:
>> This looks like exactly what I want.  Would I implement this along with
>> another analyzer such as the standard one, or stand-alone?  Does anyone
>> have any code examples of implementing such a thing?
>>
>> Thanks,
>> Albert
>>
>>
>>
>>
>> karl wettin-3 wrote:
>>   
>>> On 27 Aug 2007, at 16:03, anorman wrote:
>>>
>>>     
>>>> I have a searchable index of documents which contain French and
>>>> Spanish diacritics (è, é, À, etc.).  I would like to make the content
>>>> searchable so that when a user searches for a word such as "Amèrique"
>>>> or "Amerique" (without the diacritic), it returns the same results.
>>>>
>>>> Has anyone set up something similar?
>>>>       
>>> ISOLatin1AccentFilter
>>>
>>> -- 
>>> karl
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>
>>>
>>>     
>>
>>   
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Searching-Diacritics-tf4335454.html#a12353022
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


