Csaba Ringhofer created IMPALA-13949:
----------------------------------------
Summary: Release plain encoded string buffers earlier if all rows
are dropped
Key: IMPALA-13949
URL: https://issues.apache.org/jira/browse/IMPALA-13949
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Csaba Ringhofer
Currently Impala always attaches the data page buffers of plain encoded strings to
the next row batch, even if all rows of the given page were discarded by the
predicates. This can lead to surprising memory consumption for very selective
queries.
{code}
set mt_dop=0; set num_nodes=1; set batch_size=1;
select l_comment from tpch_parquet.lineitem where l_comment like "%nomatch";
{code}
From the profile:
{code}
- RowBatchBytesEnqueued: 174.75 MB (183239804)
- RowBatchQueuePeakMemoryUsage: 8.06 MB (8453100)
- RowBatchesEnqueued: 29 (29)
- RowsRead: 6.00M (6001215)
- RowsReturned: 0 (0)
{code}
What happens above is that each RowBatch hits AtCapacity() because the attached
buffers reach the 8 MB memory limit, so 29 row batches with 0 rows are returned.
This also has a performance impact, because freeing these buffers happens on a
different thread than the allocation (in the mt_dop=0 case).
https://github.com/apache/impala/blob/f222574f04fc7b94e1ad514d7d720d50d036a226/be/src/exec/parquet/hdfs-parquet-scanner.cc#L2460
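To make the mechanism concrete, here is a minimal, self-contained C++ sketch (not Impala code; the ~8 MB threshold mirrors the row batch at-capacity limit mentioned above, while the page size and page count are arbitrary illustrative values): every page buffer is attached to the current batch even though no rows survive the predicate, and once the attached memory crosses the limit the batch has to be handed off with 0 rows.
{code}
#include <cstdio>

// Simplified stand-in for a row batch that tracks attached buffer memory.
struct FakeRowBatch {
  int num_rows = 0;
  long attached_bytes = 0;
  static constexpr long kAtCapacityMemUsage = 8L * 1024 * 1024;  // ~8 MB limit
  bool AtCapacity() const { return attached_bytes >= kAtCapacityMemUsage; }
};

int main() {
  const long kPageBufferBytes = 6L * 1024 * 1024;  // assumed plain encoded page size
  const int kNumPages = 29;                        // assumed number of data pages
  long total_enqueued = 0;
  int batches_enqueued = 0;

  FakeRowBatch batch;
  for (int page = 0; page < kNumPages; ++page) {
    // All rows of the page are filtered out, but the buffer is still attached.
    batch.attached_bytes += kPageBufferBytes;
    if (batch.AtCapacity()) {
      // The batch must be passed downstream even though num_rows == 0.
      total_enqueued += batch.attached_bytes;
      ++batches_enqueued;
      batch = FakeRowBatch();
    }
  }
  // Flush whatever is left attached to the final batch.
  if (batch.attached_bytes > 0) {
    total_enqueued += batch.attached_bytes;
    ++batches_enqueued;
  }
  printf("empty batches enqueued: %d, bytes enqueued: %ld MB\n",
         batches_enqueued, total_enqueued / (1024 * 1024));
  return 0;
}
{code}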
A solution could be to copy the strings when the predicate is very selective.
What complicates this is that the copy is only useful for plain encoding, not
dictionary encoding, and in theory a single scratch batch can contain rows from
both dictionary and plain encoded pages. Also, as a page may fill multiple row
batches, it is possible that a selective batch is followed by a non-selective
one where attaching the buffer still makes sense.
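A rough sketch of this copy-based direction, with hypothetical names and an assumed selectivity threshold (the real heuristic and its integration point in the scanner would still need to be worked out):
{code}
#include <string>
#include <vector>

struct StringSlice { const char* ptr; int len; };  // stand-in for a string value

// Hypothetical policy: copying only pays off for plain encoded pages, and only
// when very few rows survive (assumed <1% selectivity threshold).
bool ShouldCopyInsteadOfAttach(bool plain_encoded, int surviving_rows,
                               int rows_in_page) {
  if (!plain_encoded) return false;  // per the description, copying only helps plain encoding
  return surviving_rows * 100 < rows_in_page;
}

// Copy the surviving slices into batch-owned memory and repoint them, so the
// page buffer can be released right away instead of being attached.
void CopySurvivors(std::vector<StringSlice>& survivors, std::string& batch_pool) {
  size_t total = batch_pool.size();
  for (const auto& s : survivors) total += s.len;
  batch_pool.reserve(total);  // avoid reallocation so repointed slices stay valid
  for (auto& s : survivors) {
    size_t offset = batch_pool.size();
    batch_pool.append(s.ptr, s.len);
    s.ptr = batch_pool.data() + offset;  // now points into batch-owned memory
  }
}
{code}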
Another thing that could affect this is the small string optimization: it is
currently not applied in the Parquet scanner, but if all surviving rows from a
data page were smallified, the original buffer could still be dropped.
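For illustration, a simplified take on what smallification means here (hypothetical layout, not Impala's actual string value representation): values short enough to be stored inline no longer reference the page buffer, so a page whose surviving values are all smallified would not need its buffer attached at all.
{code}
#include <cstdint>
#include <cstring>

struct SmallableString {
  static constexpr uint32_t kInlineCap = 11;  // assumed inline capacity
  struct External { const char* ptr; uint32_t len; };  // points into the page buffer
  struct Inline { char buf[kInlineCap]; uint8_t len; };  // self-contained copy
  union { External external; Inline inl; };
  bool smallified = false;

  // Returns true if the value no longer references external memory.
  bool Smallify(const char* data, uint32_t len) {
    if (len <= kInlineCap) {
      std::memcpy(inl.buf, data, len);
      inl.len = static_cast<uint8_t>(len);
      smallified = true;
    } else {
      external = External{data, len};  // too long: must keep pointing at the buffer
      smallified = false;
    }
    return smallified;
  }
};
{code}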