arouel opened a new pull request, #3502:
URL: https://github.com/apache/parquet-java/pull/3502

   ### Rationale for this change
   
   `DictionaryValuesWriter.shouldFallBack()` is called by 
`FallbackValuesWriter.checkFallback()` after every single value write. The 
current implementation dispatches a virtual call to `getDictionarySize()` on 
every invocation, even for duplicate values that do not grow the dictionary. 
Both `dictionaryByteSize` and the dictionary entry count can only increase when 
a new entry is added (inside the `if (id == -1)` branch), so the size-exceeded 
condition can only transition from false to true at that exact point. The 
per-write virtual dispatch is redundant work for the common case.
   
   ### What changes are included in this PR?
   
   Single file change to DictionaryValuesWriter.java:
   - Add a private boolean `dictionarySizeExceeded` field, reset in 
`resetDictionary()`.
   - Add a protected void `checkDictionarySizeLimit(int newDictionarySize)` 
method that sets the flag when `dictionaryByteSize > maxDictionaryByteSize || 
newDictionarySize > MAX_DICTIONARY_ENTRIES`.
   - Each subclass write method (Binary, FixedLenArray, Long, Double, Integer, 
Float) calls `checkDictionarySizeLimit(id + 1)` after inserting a new 
dictionary entry.
   - `shouldFallBack()` becomes return `dictionarySizeExceeded`, a simple field 
read with no virtual dispatch.
   
   ### Are these changes tested?
   
   Yes. All existing tests in `TestDictionary` pass without modification, 
confirming semantic equivalence. The new test 
`testCheckDictionarySizeLimitExceedsByEntryCount` directly exercises the 
previously untested `MAX_DICTIONARY_ENTRIES` boundary in 
`checkDictionarySizeLimit`, verifying correct trigger and reset behavior.
   
   ### Are there any user-facing changes?
   
   No. This is an internal optimization with no API, behavior, or configuration 
changes.
   
   Closes #3501
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to