arouel opened a new pull request, #3502: URL: https://github.com/apache/parquet-java/pull/3502
### Rationale for this change

`DictionaryValuesWriter.shouldFallBack()` is called by `FallbackValuesWriter.checkFallback()` after every single value write. The current implementation dispatches a virtual call to `getDictionarySize()` on every invocation, even for duplicate values that do not grow the dictionary. Both `dictionaryByteSize` and the dictionary entry count can only increase when a new entry is added (inside the `if (id == -1)` branch), so the size-exceeded condition can only transition from false to true at that exact point. The per-write virtual dispatch is therefore redundant work in the common case.

### What changes are included in this PR?

Single-file change to `DictionaryValuesWriter.java`:

- Add a private boolean field `dictionarySizeExceeded`, reset in `resetDictionary()`.
- Add a protected method `void checkDictionarySizeLimit(int newDictionarySize)` that sets the flag when `dictionaryByteSize > maxDictionaryByteSize || newDictionarySize > MAX_DICTIONARY_ENTRIES`.
- Each subclass write method (Binary, FixedLenArray, Long, Double, Integer, Float) calls `checkDictionarySizeLimit(id + 1)` after inserting a new dictionary entry.
- `shouldFallBack()` becomes `return dictionarySizeExceeded;`, a simple field read with no virtual dispatch.

### Are these changes tested?

Yes. All existing tests in `TestDictionary` pass without modification, confirming semantic equivalence. The new test `testCheckDictionarySizeLimitExceedsByEntryCount` directly exercises the previously untested `MAX_DICTIONARY_ENTRIES` boundary in `checkDictionarySizeLimit`, verifying correct trigger and reset behavior.

### Are there any user-facing changes?

No. This is an internal optimization with no API, behavior, or configuration changes.

Closes #3501
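
For illustration, the flag-based check described in this PR can be sketched roughly as below. This is a simplified stand-alone model, not the actual parquet-java code: the class name `DictionaryLimitSketch`, the `HashMap`-backed dictionary, the `writeLong` method, and the tiny `MAX_DICTIONARY_ENTRIES` value of 4 are all assumptions made to keep the example short. Only `dictionarySizeExceeded`, `checkDictionarySizeLimit`, `shouldFallBack`, `resetDictionary`, `dictionaryByteSize`, and `maxDictionaryByteSize` are names taken from the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the flag-based fallback check; not the real DictionaryValuesWriter. */
class DictionaryLimitSketch {
    // Hypothetical small cap so the limit is easy to hit in an example.
    static final int MAX_DICTIONARY_ENTRIES = 4;

    private final long maxDictionaryByteSize;
    // Assumption: a plain HashMap stands in for the writer's value-to-id dictionary.
    private final Map<Long, Integer> dictionary = new HashMap<>();
    private long dictionaryByteSize = 0;
    private boolean dictionarySizeExceeded = false;

    DictionaryLimitSketch(long maxDictionaryByteSize) {
        this.maxDictionaryByteSize = maxDictionaryByteSize;
    }

    void writeLong(long value) {
        Integer existing = dictionary.get(value);
        int id = (existing == null) ? -1 : existing;
        if (id == -1) {
            // Only a new entry can grow the dictionary, so only here can a limit be crossed.
            id = dictionary.size();
            dictionary.put(value, id);
            dictionaryByteSize += Long.BYTES;
            checkDictionarySizeLimit(id + 1);
        }
        // A real writer would also record the id for the encoded page; omitted here.
    }

    /** Sets the flag once either the byte-size or the entry-count limit is exceeded. */
    void checkDictionarySizeLimit(int newDictionarySize) {
        if (dictionaryByteSize > maxDictionaryByteSize
                || newDictionarySize > MAX_DICTIONARY_ENTRIES) {
            dictionarySizeExceeded = true;
        }
    }

    /** Plain field read; no size computation on the per-write path. */
    boolean shouldFallBack() {
        return dictionarySizeExceeded;
    }

    void resetDictionary() {
        dictionary.clear();
        dictionaryByteSize = 0;
        dictionarySizeExceeded = false;
    }
}
```

In this model, writing duplicate values never touches the limit check, and `shouldFallBack()` stays false until a write adds the entry that pushes the dictionary past one of the two limits.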
