[ https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260638#comment-13260638 ]
Robert Muir commented on LUCENE-4019:
-------------------------------------

It's tough to know for sure. In general, a lot of hunspell dictionaries cannot be parsed. There are a ton of them, under many strange licenses, and they are very large.

A "test scaffolding" of sorts could probably be built to hunt out problems:
* download all the dictionaries you can find
* for each one, use the hunspell command-line tools like munch and unmunch (which applies all the rules), etc. to generate some sort of expected output in .txt format
* for each one, do the same using the hunspell parsing here
* compare the results: when they differ, boil the difference down to a compact .aff/.dic pair, then write a test case, fix, and commit.

> Parsing Hunspell affix rules without regexp condition
> -----------------------------------------------------
>
> Key: LUCENE-4019
> URL: https://issues.apache.org/jira/browse/LUCENE-4019
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 3.6
> Reporter: Luca Cavanna
>
> We found out that some recent Dutch hunspell dictionaries contain suffix or
> prefix rules like the following:
> {code}
> SFX Na N 1
> SFX Na 0 ste
> {code}
> The rule on the second line doesn't contain the 5th parameter, which should
> be the condition (usually a regexp). You can usually see a '.' as condition,
> meaning always (for every character). As explained in LUCENE-3976, the
> readAffix method throws an error. I wonder if we should treat the missing value
> as a kind of default value, like '.'. On the other hand, I haven't found any
> information about this in the spec. Any thoughts?
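A minimal sketch of the default proposed in the issue, assuming a missing condition should behave like '.' (the spec doesn't confirm this, and this is not Lucene's actual readAffix code). The names AffixRuleSketch, AffixRule, parseRule and apply are hypothetical. It also assumes the condition is valid java.util.regex syntax, which holds for the common '.', '[...]' and '[^...]' forms, and it ignores continuation flags such as "ste/XY".

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Lucene's code): parses one .aff rule line such as
 * "SFX Na 0 ste" and falls back to "." when the condition field is missing.
 */
final class AffixRuleSketch {

    /** Fields of one rule line: TYPE FLAG STRIP AFFIX [CONDITION]. */
    static final class AffixRule {
        final String type;      // "SFX" or "PFX"
        final String flag;      // affix flag, e.g. "Na"
        final String strip;     // characters removed from the word ("" for "0")
        final String affix;     // characters added to the word ("" for "0")
        final String condition; // hunspell condition, "." if absent
        final Pattern pattern;  // condition compiled as a Java regex

        AffixRule(String type, String flag, String strip, String affix, String condition) {
            this.type = type;
            this.flag = flag;
            this.strip = strip;
            this.affix = affix;
            this.condition = condition;
            // Suffix conditions must match the end of the word, prefix
            // conditions the start (assumes regex-compatible syntax).
            String regex = type.equals("SFX") ? ".*" + condition : condition + ".*";
            this.pattern = Pattern.compile(regex);
        }

        /** Returns the affixed form of word, or null if the rule does not apply. */
        String apply(String word) {
            if (!pattern.matcher(word).matches()) {
                return null;
            }
            if (type.equals("SFX")) {
                if (!word.endsWith(strip)) return null;
                return word.substring(0, word.length() - strip.length()) + affix;
            }
            if (!word.startsWith(strip)) return null;
            return affix + word.substring(strip.length());
        }
    }

    static AffixRule parseRule(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 4) {
            throw new IllegalArgumentException("Too few fields in affix rule: " + line);
        }
        String strip = fields[2].equals("0") ? "" : fields[2];
        // Drop continuation flags ("ste/XY" -> "ste"); not handled in this sketch.
        String affix = fields[3].split("/", 2)[0];
        if (affix.equals("0")) affix = "";
        // Assumption: a missing 5th field means ".", i.e. the rule always applies.
        String condition = fields.length > 4 ? fields[4] : ".";
        return new AffixRule(fields[0], fields[1], strip, affix, condition);
    }

    public static void main(String[] args) {
        AffixRule rule = parseRule("SFX Na 0 ste");
        System.out.println(rule.condition);      // prints "."
        System.out.println(rule.apply("groot")); // prints "grootste"
    }
}
```

Defaulting in the parser rather than rejecting the line keeps rules like the Dutch ones above usable; an explicit ". " condition and a missing one then produce identical rules.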
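The "compare the results" step of the proposed scaffolding could look roughly like this. It is a sketch under assumptions: both word lists have already been generated (for example expected.txt from hunspell's unmunch and actual.txt from the Java parser), one word per line, both in UTF-8 (many dictionaries declare another encoding via the .aff SET directive, so a real harness would need to honor that). The names ExpansionDiff, Diff, diff and readWords are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: diff the word forms generated by hunspell's own tools
 * against those generated by the Java parser.
 */
final class ExpansionDiff {

    static final class Diff {
        final Set<String> missing; // produced by hunspell, not by the Java parser
        final Set<String> extra;   // produced by the Java parser, not by hunspell

        Diff(Set<String> missing, Set<String> extra) {
            this.missing = missing;
            this.extra = extra;
        }

        boolean isEmpty() {
            return missing.isEmpty() && extra.isEmpty();
        }
    }

    static Diff diff(Set<String> expected, Set<String> actual) {
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(actual);
        Set<String> extra = new TreeSet<>(actual);
        extra.removeAll(expected);
        return new Diff(missing, extra);
    }

    /** Reads one word per line, skipping blank lines (assumes UTF-8). */
    static Set<String> readWords(Path file) throws IOException {
        Set<String> words = new TreeSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim();
            if (!word.isEmpty()) words.add(word);
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> expected;
        Set<String> actual;
        if (args.length == 2) {
            expected = readWords(Path.of(args[0])); // e.g. unmunch output
            actual = readWords(Path.of(args[1]));   // e.g. Java parser output
        } else {
            // Small in-memory demo when no files are given.
            expected = new TreeSet<>(List.of("groot", "grootste"));
            actual = new TreeSet<>(List.of("groot"));
        }
        Diff d = diff(expected, actual);
        System.out.println("missing: " + d.missing); // prints "missing: [grootste]" for the demo
        System.out.println("extra:   " + d.extra);   // prints "extra:   []" for the demo
    }
}
```

Reporting both directions helps narrow a failure down to a compact .aff/.dic: forms only hunspell generates suggest rules the Java parser drops or rejects (as with the missing condition above), while forms only the Java parser generates suggest conditions it applies too loosely.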