Alexander Petrossian (PAF) created ORC-1553:
-----------------------------------------------
Summary: Reading information from Row group, where there are 0
records of SArg column
Key: ORC-1553
URL: https://issues.apache.org/jira/browse/ORC-1553
Project: ORC
Issue Type: Bug
Affects Versions: 1.9.2
Reporter: Alexander Petrossian (PAF)
We created an .orc file using the Apache ORC library; I cannot provide a
reproducible way to create such a file.
Statistics are present for 100% of the row groups (checked with orc dump).
But when we search that file, we see very strange behavior:
{code}
TRACE org.apache.orc.impl.RecordReaderImpl: Stats = numberOfValues: 0
stringStatistics {
}
hasNull: false
TRACE org.apache.orc.impl.RecordReaderImpl: Setting (EQUALS value 71231231212) to YES_NO_NULL
DEBUG org.apache.orc.impl.RecordReaderImpl: Row group 340000 to 349999 is included.
{code}
If the existing statistics report 0 values, there is obviously no need to
read that row group.
And yet we get a YES_NO_NULL decision, which forces inclusion of that row group
in subsequent operations. That is pointless and bad for performance.
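
To illustrate the decision I would expect, here is a minimal, self-contained
sketch. It is not the actual RecordReaderImpl code: the class
RowGroupPruningSketch and the methods evaluateEquals and isNeeded are
hypothetical names, and the TruthValue enum only mirrors the value names seen
in the log above. Min/max comparison is deliberately left out, and the literal
being searched for is assumed to be non-null.

```java
// Hypothetical sketch of the expected pruning decision; not ORC library code.
final class RowGroupPruningSketch {

    // Local stand-in mirroring the truth-value names that appear in the log.
    enum TruthValue { YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL }

    // Decide an EQUALS predicate on a non-null literal from the row group's
    // numberOfValues and hasNull statistics only (min/max checks omitted).
    static TruthValue evaluateEquals(long numberOfValues, boolean hasNull) {
        if (numberOfValues == 0) {
            // No non-null values at all: the predicate can never be true.
            return hasNull ? TruthValue.NULL : TruthValue.NO;
        }
        // Without min/max we cannot rule a match in or out, so stay conservative.
        return hasNull ? TruthValue.YES_NO_NULL : TruthValue.YES_NO;
    }

    // A row group has to be read only if the predicate might evaluate to true.
    static boolean isNeeded(TruthValue value) {
        switch (value) {
            case NO:
            case NULL:
            case NO_NULL:
                return false;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        // Statistics from the report: numberOfValues = 0, hasNull = false.
        TruthValue decision = evaluateEquals(0L, false);
        System.out.println(decision + " -> read row group: " + isNeeded(decision));
    }
}
```

With the statistics from the log (numberOfValues: 0, hasNull: false), this
sketch yields a decision that lets the row group be skipped, which is the
behavior I would expect from the reader.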