krickert commented on PR #1112:
URL: https://github.com/apache/opennlp/pull/1112#issuecomment-4802889583

   @rzo1 All three addressed (tip `13e46418`).
   
   **Turkish.** Added a `tur` profile — it was the one 
`SnowballStemmer.ALGORITHM` value that names a language and had no profile. 
Accent fold is null, as in the Nordic profiles, because Turkish diacritics are 
distinct letters. An inline note records that the search analyzer's case fold 
stays locale-generic: the dotted/dotless-i pair folds by the Unicode default 
rather than by Turkish rules. That is a deliberate search-recall choice, not 
Turkish-correct casing. I also softened the class doc: the covered set is every 
Snowball algorithm that names a language, i.e. all of them except `PORTER`, 
which is an English-only algorithm variant rather than a distinct language.
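
   For reference, the difference between locale-generic and Turkish case 
folding can be sketched with plain JDK calls. This is only an illustration, not 
the analyzer's code: `TurkishCaseFoldDemo`, `foldGeneric`, and `foldTurkish` are 
hypothetical names, and `Locale.ROOT` is assumed here as a stand-in for the 
"locale-generic" fold.

```java
import java.util.Locale;

// Hypothetical demo class; not part of the PR.
final class TurkishCaseFoldDemo {
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Locale-generic fold (assumed stand-in for the analyzer's behaviour).
    static String foldGeneric(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Turkish-correct fold, shown only for contrast.
    static String foldTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // "I" is U+0049; "\u0130" is LATIN CAPITAL LETTER I WITH DOT ABOVE.
        for (String s : new String[] {"I", "\u0130"}) {
            System.out.printf("U+%04X generic=%s turkish=%s%n",
                (int) s.charAt(0), escape(foldGeneric(s)), escape(foldTurkish(s)));
        }
    }

    // Render each char as a code point so combining marks stay visible.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }
}
```

   Under the generic fold, `I` lowercases to `i` and `İ` to `i` plus a 
combining dot (U+0307); under the Turkish locale they lowercase to `ı` and `i` 
respectively.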
   
   **Magic `size == 21`.** Replaced with an intent-based assertion that derives 
the expected set from the enum: the algorithms reachable through the profiles 
must equal `EnumSet.complementOf(EnumSet.of(PORTER))`. It now fails loudly if a 
future algorithm is added without a profile (or vice versa) instead of silently 
baking in a count. Renamed the test to 
`testSupportedLanguagesCoverEverySnowballLanguage`.
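
   The shape of that assertion, as a minimal sketch: the `Algorithm` enum and 
the `PROFILES` map below are hypothetical, cut-down stand-ins for 
`SnowballStemmer.ALGORITHM` and the real profile table, so only the 
`EnumSet.complementOf` pattern should be read as matching the test.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the real enum and profile table live in the PR.
final class CoverageCheckSketch {
    // Cut-down stand-in for SnowballStemmer.ALGORITHM.
    enum Algorithm { ENGLISH, PORTER, SWEDISH, TURKISH }

    // Stand-in profile table: language code -> algorithm.
    static final Map<String, Algorithm> PROFILES = Map.of(
        "eng", Algorithm.ENGLISH,
        "swe", Algorithm.SWEDISH,
        "tur", Algorithm.TURKISH);

    // Every algorithm that some profile actually uses.
    static Set<Algorithm> reachable() {
        EnumSet<Algorithm> used = EnumSet.noneOf(Algorithm.class);
        used.addAll(PROFILES.values());
        return used;
    }

    // Derived from the enum itself: everything except PORTER.
    static Set<Algorithm> expected() {
        return EnumSet.complementOf(EnumSet.of(Algorithm.PORTER));
    }

    public static void main(String[] args) {
        if (!reachable().equals(expected())) {
            throw new AssertionError(
                "profiles cover " + reachable() + " but expected " + expected());
        }
        System.out.println("coverage ok: " + reachable());
    }
}
```

   Adding a constant to the enum without a matching profile changes 
`expected()` but not `reachable()`, so the comparison fails without any 
hard-coded count to update.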
   
   **`MissingResourceException` fallback.** Added 
`testTwoLetterCodeWithNoIso3FallsBackToRawLookup`: a two-letter code with no 
ISO 639-3 mapping (e.g. `qq`) makes `getISO3Language()` throw, and the test 
asserts `forLanguage` catches it and falls through to a raw lookup (empty here) 
rather than propagating.
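
   The fallback behaviour can be sketched as follows. This is an assumption 
about the general shape, not the PR's code: `Iso3FallbackSketch`, its 
`PROFILES` map, and this `forLanguage` signature are hypothetical; only 
`Locale.getISO3Language()` and `MissingResourceException` are real JDK API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.Optional;

// Hypothetical sketch of the catch-and-fall-through lookup.
final class Iso3FallbackSketch {
    // Stand-in profile table keyed by ISO 639-3 code.
    static final Map<String, String> PROFILES = Map.of(
        "eng", "English",
        "tur", "Turkish");

    static Optional<String> forLanguage(String code) {
        String key = code.toLowerCase(Locale.ROOT);
        if (key.length() == 2) {
            try {
                // Throws MissingResourceException when the two-letter code
                // has no three-letter mapping (e.g. "qq").
                key = Locale.forLanguageTag(key).getISO3Language();
            } catch (MissingResourceException e) {
                // Keep the raw code and fall through to the plain lookup.
            }
        }
        return Optional.ofNullable(PROFILES.get(key));
    }

    public static void main(String[] args) {
        System.out.println(forLanguage("tr"));
        System.out.println(forLanguage("qq"));
    }
}
```

   With this table, `tr` resolves through `tur`, while `qq` yields an empty 
result instead of an exception.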
   

