[ https://issues.apache.org/jira/browse/LUCENE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233593#comment-13233593 ]
Jim Regan commented on LUCENE-3883:
-----------------------------------

Great :)

Regarding the initial 'h', I asked Kevin Scannell (among other feathers in his cap, he created the dictionary used in GaelSpell and ran an Irish-language search engine), who said:

"I looked carefully at how often initial h is a prefix vs. not, a while ago. I can send you those data. Non-prefixes might be more common than you'd think in running text because of proper names, English mixed in, etc. So the upshot is that it's a bad idea to strip all initial h's with no hyphen following. As far as h- (with hyphen) goes, it's non-standard but common enough that I'd leave it in the stemmer. It's not like there would be false positives in that case if the hyphen is there."

> Analysis for Irish
> ------------------
>
>                 Key: LUCENE-3883
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3883
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Jim Regan
>            Assignee: Robert Muir
>            Priority: Trivial
>              Labels: analysis, newbie
>         Attachments: LUCENE-3883.patch, LUCENE-3883.patch, irish.sbl
>
>
> Adds analysis for Irish.
> The stemmer is generated from a snowball stemmer. I've sent it to Martin
> Porter, who says it will be added during the week.
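To make the suggested rule concrete, here is a minimal sketch of "strip h- only when the hyphen is present, leave a bare initial h alone". It is not the code from LUCENE-3883.patch or irish.sbl; the class name `HPrefix` and method `strip` are hypothetical, and it assumes tokens have already been lowercased by an earlier filter.

```java
/**
 * Hypothetical helper (not part of Lucene or the attached patch) illustrating
 * the rule discussed above. Assumes the token is already lowercased.
 */
final class HPrefix {
    private HPrefix() {}

    static String strip(String token) {
        // Only the hyphenated form "h-" is treated as a prefix, and only when
        // at least one character follows the hyphen.
        if (token.length() > 2 && token.startsWith("h-")) {
            return token.substring(2);
        }
        // A bare initial 'h' is left alone: in running text it may belong to a
        // proper name, an English word, or the stem itself.
        return token;
    }

    public static void main(String[] args) {
        System.out.println(strip("h-uisce")); // uisce
        System.out.println(strip("hata"));    // hata
        System.out.println(strip("h-"));      // h-
    }
}
```

The check deliberately does not look at which letter follows the hyphen: as noted above, the hyphen itself is unlikely to produce false positives, so the simpler test is enough for this illustration.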