Re: English and French documents together / analysis, indexing, searching

Otis Gospodnetic Sun, 23 Jan 2005 22:08:35 -0800

That would be a partial solution.  Accents will no longer be a
problem, but if you use an Analyzer that stems tokens, they will not
really be tokenized properly.  Searches will probably work, but if you
look at the index you will see that some terms were not analyzed
properly.  But it may be sufficient for your needs, so try just accent
removal first.
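
For what it's worth, here is a minimal sketch of the accent-removal
step on its own, using only the JDK (java.text.Normalizer) and no
Lucene classes.  The class and method names (AccentFolding,
foldAccents) are made up for illustration; the assumption is that you
would call the same method on the text inside your custom analyzer at
index time and on the query string at search time, so both sides see
identical terms.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: folds accented Latin letters
// to their unaccented base letters.
final class AccentFolding {

    // Decompose each character into base letter + combining marks (NFD),
    // then drop the combining marks. Letters with no canonical
    // decomposition (for example the "oe" ligature) pass through unchanged.
    static String foldAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "eleve" written with e-acute and e-grave, as it might appear
        // in a French document.
        String indexedTerm = foldAccents("\u00e9l\u00e8ve");
        // The same word typed without accents in a query.
        String queryTerm = foldAccents("eleve");
        System.out.println(indexedTerm + " matches query: "
                + indexedTerm.equals(queryTerm));
    }
}
```

Applying the same folding on both sides is what makes the accented and
unaccented spellings meet in the index.  It does nothing about
stemming, which is the part that remains only partially solved.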


Otis


--- "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> Morus Walter said the following on 1/21/2005 2:14 AM:
> 
> > No. You could do a ( ( french-query ) or ( english-query ) )
> > construct using one query. So query construction would be a bit
> > more complex but querying itself wouldn't change.
> >
> > The first thing I'd do in your case would be to look at the
> > differences in the output of the English and French Snowball
> > stemmers.
> > I don't speak any French, but you might even be able to use both
> > stemmers on all texts.
> >
> > Morus
> >
> 
> I've done some thinking afterwards, and instead of messing with
> complex queries, would it make sense to replace all "special"
> characters such as "é", "è" with "e" during indexing (I suppose by
> writing a custom analyzer) and then during searching parse the query
> and replace all occurrences of special characters (if any) with
> their normal Latin equivalents?
>
> This should produce the required results, no? The index would not
> contain any French characters, and searching for French words would
> still return them, since they were indexed as normal words.
> 
> -pedja
> 
> 
> 

