arouel opened a new issue, #3501:
URL: https://github.com/apache/parquet-java/issues/3501
### Describe the enhancement requested
`DictionaryValuesWriter.shouldFallBack()` is called by
`FallbackValuesWriter.checkFallback()` after every single value write. The
current implementation dispatches a virtual call to `getDictionarySize()` on
every invocation:
```java
public boolean shouldFallBack() {
  return dictionaryByteSize > maxDictionaryByteSize
      || getDictionarySize() > MAX_DICTIONARY_ENTRIES;
}
```
`getDictionarySize()` is an abstract method overridden in each typed
subclass (Binary, Long, Double, Integer, Float) to return the backing map's
`.size()`. Since `shouldFallBack()` is polled after every write, including
writes of duplicate values that do not grow the dictionary, the virtual
dispatch and map-size query are redundant work for the common case where most
values are already in the dictionary.
Both `dictionaryByteSize` and the dictionary entry count can only increase
when a new entry is added (inside the `if (id == -1)` branch of each subclass's
write method). Therefore the size-exceeded condition can only transition from
`false` to `true` at that exact point.
### Proposal
Replace the per-write check with a cached boolean `dictionarySizeExceeded`
flag. Introduce a `checkDictionarySizeLimit(int newDictionarySize)` method that
subclass write methods call only when a new dictionary entry is actually added.
`shouldFallBack()` then returns the cached flag directly, a simple field read
with no virtual dispatch.
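
A minimal, self-contained sketch of the idea follows. It is not the actual parquet-java code: the `Sketch*` class names, the `HashMap`-backed dictionary, the `writeLong` body, and the configurable `maxDictionaryEntries` constructor parameter are all assumptions made for illustration. Only `dictionarySizeExceeded`, `checkDictionarySizeLimit(int)`, `shouldFallBack()`, `dictionaryByteSize`, and `maxDictionaryByteSize` are names taken from this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DictionaryValuesWriter, reduced to the size check.
abstract class SketchDictionaryWriter {
    protected final long maxDictionaryByteSize;
    // Assumption: made configurable here so the limit is easy to exercise.
    protected final int maxDictionaryEntries;
    protected long dictionaryByteSize;

    // Cached result of the size check; only ever flips from false to true.
    private boolean dictionarySizeExceeded;

    SketchDictionaryWriter(long maxDictionaryByteSize, int maxDictionaryEntries) {
        this.maxDictionaryByteSize = maxDictionaryByteSize;
        this.maxDictionaryEntries = maxDictionaryEntries;
    }

    // Called by subclasses only after a new dictionary entry has been added.
    protected final void checkDictionarySizeLimit(int newDictionarySize) {
        if (dictionaryByteSize > maxDictionaryByteSize
                || newDictionarySize > maxDictionaryEntries) {
            dictionarySizeExceeded = true;
        }
    }

    // Plain field read: no abstract call, no map-size query.
    public boolean shouldFallBack() {
        return dictionarySizeExceeded;
    }
}

// Hypothetical typed subclass; a HashMap replaces the real backing map.
final class SketchLongDictionaryWriter extends SketchDictionaryWriter {
    private final Map<Long, Integer> dictionary = new HashMap<>();

    SketchLongDictionaryWriter(long maxDictionaryByteSize, int maxDictionaryEntries) {
        super(maxDictionaryByteSize, maxDictionaryEntries);
    }

    void writeLong(long value) {
        Integer id = dictionary.get(value);
        if (id == null) {
            // New entry: the only place where either size can grow.
            dictionary.put(value, dictionary.size());
            dictionaryByteSize += Long.BYTES;
            checkDictionarySizeLimit(dictionary.size());
        }
        // The encoded id would be appended to the output here (omitted).
    }
}
```

With a 16-byte limit, writing `1, 1, 2, 2` leaves `shouldFallBack()` at `false` (two entries, 16 bytes), and writing `3` flips it to `true`. A real implementation would presumably also have to clear the flag wherever the dictionary itself is reset; that detail is not covered in this issue.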
### Component(s)
Core
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.