On 27 Aug 2007, at 20:30, anorman wrote:


I've tried to implement an analyzer that differs only in adding:

result = new ISOLatin1AccentFilter(result);

in the tokenStream method.

Everything appears to work, however with that change my search no longer matches any word containing diacritics. Without the filter it finds words such as "cèdulas" but not "cedulas"; with the filter it finds neither, so it just appears to be stripping those terms out altogether.

Do you use the same analyzer when searching as when creating the index?

--
karl
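
If not, that is the most likely cause: an accent-stripping analyzer applied only at query time produces terms like "cedulas" that can never match the accented terms already stored in the index. A minimal sketch of what has to line up (Lucene 2.x-era API; DiacriticsAnalyzer is a placeholder name for the custom accent-folding analyzer sketched further down in this thread, and indexDir stands for your index location):

// The same analysis chain must be used when writing the index
// and when parsing queries.
Analyzer analyzer = new DiacriticsAnalyzer();

// Index time: document text is tokenized and accent-folded by this analyzer.
IndexWriter writer = new IndexWriter(indexDir, analyzer, true);
// ... writer.addDocument(doc); writer.close();

// Search time: the query text goes through the same accent folding.
QueryParser parser = new QueryParser("content", analyzer);
Query query = parser.parse("cèdulas");   // analyzed to "cedulas", matching the folded index terms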



Any suggestions?




anorman wrote:

Can I do this at search time rather than index time? Below is my code
that is handling the searching, where would I utilize such a filter?

Thanks for the help!




package search.lucene.search;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

import search.lucene.index.IndexManager;

/**
 * This class is used to search the
 * Lucene index and return search results
 */

public class SearchManager {

    private String searchWord;
    private IndexManager indexManager;
    private Analyzer analyzer;

    public SearchManager(String searchWord){
        this.searchWord   = searchWord;
        this.indexManager = new IndexManager();
        this.analyzer     = new StandardAnalyzer();
    }

    /**
     * Run the search and return a list of SearchResultBean objects.
     */
    public List search(){
        List searchResult = new ArrayList();

        IndexSearcher indexSearcher = null;
        try{
            indexSearcher = new IndexSearcher(indexManager.getIndexDir());
        }catch(IOException ioe){
            ioe.printStackTrace();
        }

        QueryParser queryParser = new QueryParser("content", analyzer);
        Query query = null;
        try {
            query = queryParser.parse(searchWord);
        } catch (ParseException e) {
            e.printStackTrace();
        }

        if(null != query && null != indexSearcher){
            try {
                Hits hits = indexSearcher.search(query);
                for(int i = 0; i < hits.length(); i++){

                    Document doc = hits.doc(i);
                    System.out.println(doc.get("filename"));

                    SearchResultBean resultBean = new SearchResultBean();
                    resultBean.setXMLId(doc.get("id"));
                    resultBean.setXMLTitle(doc.get("title"));
                    resultBean.setXMLAuthor(doc.get("author"));
                    resultBean.setXMLAbstract(doc.get("abstract"));
                    resultBean.setScore(hits.score(i));

                    searchResult.add(resultBean);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return searchResult;
    }
}
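
To answer the "where would I utilize such a filter" question: the filter lives inside the analyzer, so the only change needed in this class is where the analyzer is created in the constructor. A sketch, assuming the custom accent-folding analyzer (here called DiacriticsAnalyzer, a hypothetical name) exists and that the index was also built with it:

    public SearchManager(String searchWord){
        this.searchWord   = searchWord;
        this.indexManager = new IndexManager();
        // Swap in the accent-folding analyzer; the index must have been
        // written with the same analyzer for accented terms to match.
        this.analyzer = new DiacriticsAnalyzer();
    }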






thomas arni-2 wrote:

You can write your own Analyzer (modelled on the StandardAnalyzer).
The only thing you have to do is define the tokenStream method like
this:

  /** Constructs a {@link StandardTokenizer} filtered by a {@link StandardFilter},
      a {@link LowerCaseFilter}, a {@link StopFilter} and an
      {@link ISOLatin1AccentFilter}. */
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    result = new ISOLatin1AccentFilter(result);
    return result;
  }
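
A self-contained version of this, written as its own Analyzer rather than as a change to an existing one, might look like the sketch below (the class name DiacriticsAnalyzer and the use of the default English stop words are assumptions):

import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class DiacriticsAnalyzer extends Analyzer {

  // Same default stop words that StandardAnalyzer uses.
  private final Set stopSet = StopFilter.makeStopSet(StopAnalyzer.ENGLISH_STOP_WORDS);

  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, stopSet);
    // Fold accented Latin-1 characters (è, é, À, ...) to their unaccented equivalents.
    result = new ISOLatin1AccentFilter(result);
    return result;
  }
}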


anorman wrote:
This looks like exactly what I want. Would I implement this alongside another analyzer such as the StandardAnalyzer, or stand-alone? Does anyone have any code examples of implementing such a thing?

Thanks,
Albert




karl wettin-3 wrote:

On 27 Aug 2007, at 16:03, anorman wrote:


I have a searchable index of documents which contain French and Spanish
diacritics (è, é, À, etc.). I would like to make the content searchable so
that when a user searches for a word such as "Amèrique" or "Amerique"
(without the diacritic) it returns the same results.

Has anyone set up something similar?

ISOLatin1AccentFilter

--
karl
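
For a quick sense of what the filter does at the token level, a small sketch (Lucene 2.x token API; the input string is just an example):

import java.io.StringReader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class AccentDemo {
  public static void main(String[] args) throws Exception {
    TokenStream ts = new ISOLatin1AccentFilter(
        new LowerCaseFilter(new StandardTokenizer(new StringReader("Amèrique"))));
    Token token = ts.next();
    System.out.println(token.termText());   // prints "amerique"; the accent is folded away
  }
}

So both "Amèrique" and "Amerique" end up as the same term once the filter is part of the analysis chain at index and search time.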



















