arouel opened a new issue, #3501:
URL: https://github.com/apache/parquet-java/issues/3501

   ### Describe the enhancement requested
   
   `DictionaryValuesWriter.shouldFallBack()` is called by 
`FallbackValuesWriter.checkFallback()` after every single value write. The 
current implementation dispatches a virtual call to `getDictionarySize()` on 
every invocation:
   ```java
   public boolean shouldFallBack() {
     return dictionaryByteSize > maxDictionaryByteSize
         || getDictionarySize() > MAX_DICTIONARY_ENTRIES;
   }
   ```
   `getDictionarySize()` is an abstract method overridden in each typed 
subclass (Binary, Long, Double, Integer, Float) to return the backing map's 
`.size()`. Since `shouldFallBack()` is polled after every write, including 
writes of duplicate values that do not grow the dictionary, the virtual 
dispatch and map-size query are redundant work for the common case where most 
values are already in the dictionary.
   Both `dictionaryByteSize` and the dictionary entry count can only increase 
when a new entry is added (inside the `if (id == -1)` branch of each subclass's 
write method). Therefore the size-exceeded condition can only transition from 
`false` to `true` at that exact point.
   
   ### Proposal
   Replace the per-write check with a cached boolean `dictionarySizeExceeded` 
flag. Introduce a `checkDictionarySizeLimit(int newDictionarySize)` method that 
subclass write methods call only when a new dictionary entry is actually added. 
`shouldFallBack()` then returns the cached flag directly, a simple field read 
with no virtual dispatch.
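
A minimal sketch of how the proposal could look, assuming simplified stand-in classes rather than the real parquet-java ones. `SketchDictionaryWriter` and `SketchLongDictionaryWriter` are hypothetical names, the entry limit of 4 is chosen only for illustration, and a `java.util.HashMap` replaces whatever map the typed subclasses actually use, so a `null` lookup plays the role of the `id == -1` branch. Only `dictionarySizeExceeded`, `checkDictionarySizeLimit(int)` and `shouldFallBack()` come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for DictionaryValuesWriter.
abstract class SketchDictionaryWriter {
  // Assumed limit, kept tiny so the transition is easy to observe.
  static final int MAX_DICTIONARY_ENTRIES = 4;

  protected final long maxDictionaryByteSize;
  protected long dictionaryByteSize;

  // Cached result of the size check; only updated when the dictionary grows.
  private boolean dictionarySizeExceeded;

  SketchDictionaryWriter(long maxDictionaryByteSize) {
    this.maxDictionaryByteSize = maxDictionaryByteSize;
  }

  // Subclasses call this only from the "new entry" branch of their write method.
  protected final void checkDictionarySizeLimit(int newDictionarySize) {
    if (dictionaryByteSize > maxDictionaryByteSize
        || newDictionarySize > MAX_DICTIONARY_ENTRIES) {
      dictionarySizeExceeded = true;
    }
  }

  // Plain field read: no virtual call and no map-size query per value.
  public final boolean shouldFallBack() {
    return dictionarySizeExceeded;
  }
}

// Hypothetical typed subclass; encoding of the returned id is omitted.
final class SketchLongDictionaryWriter extends SketchDictionaryWriter {
  private final Map<Long, Integer> dictionary = new HashMap<>();

  SketchLongDictionaryWriter(long maxDictionaryByteSize) {
    super(maxDictionaryByteSize);
  }

  public void writeLong(long v) {
    Integer id = dictionary.get(v);
    if (id == null) { // stands in for the `id == -1` branch
      dictionary.put(v, dictionary.size());
      dictionaryByteSize += Long.BYTES;
      checkDictionarySizeLimit(dictionary.size());
    }
    // Duplicate values skip the size check entirely.
  }
}
```

In this sketch, writing any number of duplicates of four distinct values leaves `shouldFallBack()` returning `false`, and the flag flips when a fifth distinct value is added. A real implementation would presumably also need to clear the flag wherever the dictionary itself is reset; that is not shown here.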
   
   ### Component(s)
   
   Core


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

