krickert commented on PR #1112: URL: https://github.com/apache/opennlp/pull/1112#issuecomment-4802889583
@rzo1 All three addressed (tip `13e46418`).

**Turkish.** Added a `tur` profile: it was the one `SnowballStemmer.ALGORITHM` value that names a language and had no profile. Accent fold is null (Turkish diacritics are distinct letters, like the Nordic profiles), with an inline note that the search analyzer's case fold stays locale-generic, so the dotted/dotless-i pair folds by the Unicode default rather than by Turkish rules. That is a deliberate search-recall choice, not Turkish-correct casing. I also softened the class doc: the covered set is every Snowball algorithm that names a language, i.e. all of them except `PORTER`, which is an English-only algorithm variant rather than a distinct language.

**Magic `size == 21`.** Replaced with an intent-based assertion that derives the expected set from the enum: the algorithms reachable through the profiles must equal `EnumSet.complementOf(EnumSet.of(PORTER))`. It now fails loudly if a future algorithm is added without a profile (or vice versa) instead of silently baking in a count. Renamed the test to `testSupportedLanguagesCoverEverySnowballLanguage`.

**`MissingResourceException` fallback.** Added `testTwoLetterCodeWithNoIso3FallsBackToRawLookup`: a two-letter code with no ISO 639-3 mapping (e.g. `qq`) makes `getISO3Language()` throw, and the test asserts that `forLanguage` catches the exception and falls through to a raw lookup (empty here) rather than propagating it.
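
For readers who want the shape of the intent-based coverage check, here is a minimal, self-contained sketch. It is not the code from the PR: the `Algorithm` enum is a hypothetical four-value stand-in for `SnowballStemmer.ALGORITHM`, and the `PROFILES` map and the `reachable()`/`expected()` helpers are invented for illustration. Only the `EnumSet.complementOf(EnumSet.of(PORTER))` idea comes from the comment above.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class CoverageSketch {
    // Hypothetical stand-in for SnowballStemmer.ALGORITHM, cut down to four values.
    enum Algorithm { DANISH, ENGLISH, PORTER, TURKISH }

    // Hypothetical profile table: language code -> algorithm used by that profile.
    static final Map<String, Algorithm> PROFILES = Map.of(
            "dan", Algorithm.DANISH,
            "eng", Algorithm.ENGLISH,
            "tur", Algorithm.TURKISH);

    // Every algorithm that at least one profile points at.
    static Set<Algorithm> reachable() {
        EnumSet<Algorithm> seen = EnumSet.noneOf(Algorithm.class);
        seen.addAll(PROFILES.values());
        return seen;
    }

    // Expected coverage, derived from the enum itself: everything except PORTER.
    static Set<Algorithm> expected() {
        return EnumSet.complementOf(EnumSet.of(Algorithm.PORTER));
    }

    public static void main(String[] args) {
        // Fails if an enum value gains no profile, or a profile targets PORTER.
        if (!reachable().equals(expected())) {
            throw new AssertionError(
                    "profiles cover " + reachable() + " but expected " + expected());
        }
    }
}
```

Because the expected set is computed from the enum, adding a fifth value to `Algorithm` without a matching `PROFILES` entry makes the comparison fail, which a hard-coded size check would not explain as clearly.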

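The fallback path for two-letter codes can be sketched in the same spirit. This is an assumption-laden illustration, not the PR's `forLanguage`: the signature, the `Optional<String>` return type, and the `PROFILES` table are hypothetical. The only behavior taken from the comment is that `getISO3Language()` throws `MissingResourceException` for a code such as `qq`, and that the lookup should then continue with the raw code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.Optional;

class LookupSketch {
    // Hypothetical profile table keyed by ISO 639-3 code.
    static final Map<String, String> PROFILES = Map.of(
            "eng", "ENGLISH",
            "tur", "TURKISH");

    // Hypothetical signature; the real forLanguage may differ.
    static Optional<String> forLanguage(String code) {
        String key = code.toLowerCase(Locale.ROOT);
        if (key.length() == 2) {
            try {
                // Map a two-letter ISO 639-1 code to its three-letter form.
                key = Locale.forLanguageTag(key).getISO3Language();
            } catch (MissingResourceException e) {
                // No three-letter mapping exists: keep the raw code and look that up.
            }
        }
        return Optional.ofNullable(PROFILES.get(key));
    }

    public static void main(String[] args) {
        System.out.println(forLanguage("tr"));
        System.out.println(forLanguage("qq"));
    }
}
```

In this sketch `tr` resolves through its three-letter form `tur`, while `qq` survives the exception and simply finds no profile.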