[jira] [Updated] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition
[ https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Cavanna updated LUCENE-4019:
---------------------------------
    Attachment: LUCENE-4019.patch

Hi Chris, thanks for your feedback. Here is a new patch that adds an option to enable or disable strict affix parsing; it is enabled by default. I also updated the HunspellStemFilterFactory to expose the new option to Solr.

Parsing Hunspell affix rules without regexp condition
-----------------------------------------------------

               Key: LUCENE-4019
               URL: https://issues.apache.org/jira/browse/LUCENE-4019
           Project: Lucene - Java
        Issue Type: Improvement
        Components: modules/analysis
 Affects Versions: 3.6
          Reporter: Luca Cavanna
          Assignee: Chris Male
       Attachments: LUCENE-4019.patch, LUCENE-4019.patch

We found out that some recent Dutch hunspell dictionaries contain suffix or prefix rules like the following:
{code}
SFX Na N 1
SFX Na 0 ste
{code}
The rule on the second line doesn't contain the 5th parameter, which should be the condition (usually a regexp). The condition is usually '.', meaning always (it matches every character). As explained in LUCENE-3976, the readAffix method throws an error in this case.
I wonder if we should treat the missing value as a kind of default value, like '.'. On the other hand, I haven't found any information about this within the spec. Any thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
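For readers following the thread, the strict/lenient option described in this update can be illustrated with a minimal Java sketch. It is not the code from LUCENE-4019.patch: the class `AffixGroupSketch`, the method `parseGroup`, the `strict` parameter name and the `Rule` holder are all hypothetical, and the sketch assumes a group consists of one header line followed by the number of rule lines the header announces.

```java
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; this is not the code from LUCENE-4019.patch.
class AffixGroupSketch {

    /** One rule line: flag, characters to strip, affix to add, condition. */
    static final class Rule {
        final String flag;
        final String strip;
        final String affix;
        final String condition;

        Rule(String flag, String strip, String affix, String condition) {
            this.flag = flag;
            this.strip = strip;
            this.affix = affix;
            this.condition = condition;
        }
    }

    /**
     * Parses one affix group: a header line such as "SFX Na N 1" followed by
     * as many rule lines as the header's fourth field announces.
     * With strict=true a rule with fewer than 5 elements raises a ParseException;
     * with strict=false it is silently skipped.
     */
    static List<Rule> parseGroup(List<String> lines, boolean strict) throws ParseException {
        String[] header = lines.get(0).trim().split("\\s+");
        int ruleCount = Integer.parseInt(header[3]);
        List<Rule> rules = new ArrayList<>();
        for (int i = 1; i <= ruleCount; i++) {
            String line = lines.get(i);
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 5) {
                if (strict) {
                    // The int argument is formally an error offset; this sketch
                    // uses it for the index of the line within the group.
                    throw new ParseException("Affix rule has fewer than 5 elements: " + line, i);
                }
                continue; // lenient mode: ignore the malformed rule
            }
            rules.add(new Rule(parts[1], parts[2], parts[3], parts[4]));
        }
        return rules;
    }

    public static void main(String[] args) throws ParseException {
        // The Dutch example from the issue description: the rule has no condition.
        List<String> group = Arrays.asList("SFX Na N 1", "SFX Na 0 ste");
        System.out.println(parseGroup(group, false).size()); // prints 0
    }
}
```

With `strict` set to `true`, the same input raises a `ParseException` instead of yielding an empty rule list.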
[jira] [Updated] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition
[ https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Cavanna updated LUCENE-4019:
---------------------------------
    Attachment: LUCENE-4019.patch

Yeah, sorry for my mistakes, I corrected them. And I added the line number to the ParseException. Let me know if there's something more I can do!

Attachments: LUCENE-4019.patch, LUCENE-4019.patch, LUCENE-4019.patch
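As an illustration of reporting the line number in the ParseException, here is a minimal sketch built on `java.io.LineNumberReader`. It is an assumption about how such reporting could look, not the patch itself: the class `LineNumberSketch` and the method `validate` are hypothetical, and the sketch only checks that every rule line announced by an SFX or PFX header has at least 5 elements.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.text.ParseException;

// Hypothetical sketch; this is not the code from LUCENE-4019.patch.
class LineNumberSketch {

    /**
     * Scans affix file content and throws a ParseException that carries the
     * 1-based line number of the first rule with fewer than 5 elements.
     */
    static void validate(String affixText) throws IOException, ParseException {
        LineNumberReader reader = new LineNumberReader(new StringReader(affixText));
        String line;
        while ((line = reader.readLine()) != null) {
            String[] header = line.trim().split("\\s+");
            boolean isAffixHeader = header.length >= 4
                && (header[0].equals("SFX") || header[0].equals("PFX"));
            if (!isAffixHeader) {
                continue; // other directives and blank lines are not checked here
            }
            int ruleCount = Integer.parseInt(header[3]);
            for (int i = 0; i < ruleCount; i++) {
                String rule = reader.readLine();
                if (rule == null || rule.trim().split("\\s+").length < 5) {
                    // getLineNumber() returns the number of lines read so far,
                    // which is the 1-based number of the offending line.
                    throw new ParseException(
                        "Affix rule has fewer than 5 elements: " + rule,
                        reader.getLineNumber());
                }
            }
        }
    }
}
```

`ParseException` formally takes an error offset as its second argument; the sketch uses that slot for the line number, so a caller reads it back with `getErrorOffset()`.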
[jira] [Updated] (LUCENE-4019) Parsing Hunspell affix rules without regexp condition
[ https://issues.apache.org/jira/browse/LUCENE-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luca Cavanna updated LUCENE-4019:
---------------------------------
    Attachment: LUCENE-4019.patch

Small patch: affix rules with fewer than 5 elements are now ignored. I added a specific test with a new affix file containing an example of a rule that is shorter than it should be. Let me know if you would prefer a warning when a rule is skipped; Hunspell does that only with a specific command-line option.

Attachments: LUCENE-4019.patch