[ 
https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504152
 ] 

Hoss Man commented on LUCENE-906:
---------------------------------

I don't know much about French, but a few comments...

1) "stopwords" seems like an odd name for what I would think of as a "prefix";
you may want an example in the javadocs to make it clear.

2) Are elisions always lowercase?  I imagine there should be an ignoreCase 
option just like StopFilter has.  (Note that toLowerCase() is hardcoded in the 
next() method, but nothing ensures that the stopwords list is lowercased.)

3) Are there any other characters that can appear between an elision and its 
root word besides '\''?  (I'm particularly wondering about other Unicode 
characters that look like byte 0x27 but are not actually 0x27.)

4) This probably doesn't need to be in its own contrib.  contrib/analyzers 
should be fine.  If elisions are specific to French, then 
contrib/analyzers/src/java/org/apache/lucene/analysis/fr/ makes the most sense; 
otherwise it might make sense to add a new subpackage under analysis, 
"linguistics" perhaps (in contrast to the existing "ngram")?
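
To make points 2 and 3 concrete, here is a rough sketch in plain Java of what I mean. It is not the code from elision.patch and does not use the Lucene TokenFilter API. The class name ElisionSketch, the strip() method, and the particular set of apostrophe-like characters are all hypothetical, and the character list is surely not exhaustive. The idea is just that the prefix list gets lowercased once at construction time when ignoreCase is set, and that more than one code point is accepted as the separator.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the filter from elision.patch.
class ElisionSketch {
    // Assumed (non-exhaustive) set of apostrophe-like separators:
    // U+0027 APOSTROPHE, U+2019 RIGHT SINGLE QUOTATION MARK,
    // U+02BC MODIFIER LETTER APOSTROPHE.
    private static final String APOSTROPHES = "'\u2019\u02BC";

    private final Set<String> prefixes;
    private final boolean ignoreCase;

    ElisionSketch(Set<String> prefixes, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        this.prefixes = new HashSet<>();
        // Normalize the prefix list once, so lookups and the list agree on case.
        for (String p : prefixes) {
            this.prefixes.add(ignoreCase ? p.toLowerCase(Locale.ROOT) : p);
        }
    }

    // Returns the token without its elided prefix, or the token unchanged
    // when the text before the first apostrophe is not a known prefix.
    String strip(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (APOSTROPHES.indexOf(token.charAt(i)) >= 0) {
                String prefix = token.substring(0, i);
                if (ignoreCase) {
                    prefix = prefix.toLowerCase(Locale.ROOT);
                }
                return prefixes.contains(prefix) ? token.substring(i + 1) : token;
            }
        }
        return token; // no apostrophe-like character found
    }
}
```

With prefixes {"l", "d"} and ignoreCase set to true, strip("L'avion") and strip("l\u2019avion") both give "avion", while "aujourd'hui" is left alone because "aujourd" is not in the prefix list. With ignoreCase set to false, "L'avion" is left alone too.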

> Elision filter for simple french analyzing
> ------------------------------------------
>
>                 Key: LUCENE-906
>                 URL: https://issues.apache.org/jira/browse/LUCENE-906
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Mathieu Lecarme
>         Attachments: elision.patch
>
>
> If you don't want to use stemming, StandardAnalyzer misses some French 
> peculiarities like elision.
> "l'avion", which means "the plane", must be tokenized as "avion" (plane).
> This filter could be used with other Latin languages if elision exists.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

