Csaba Ringhofer has uploaded this change for review. (http://gerrit.cloudera.org:8080/10793)


Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................

WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Some data errors (for example out-of-range parquet timestamps)
can dominate logs if a table contains a large number of rows with
invalid data. If an error has its own error code, then these errors
are already aggregated for the user in RuntimeState, but the logs
will contain a new line for every occurrence, which is rarely useful.

There is always a trade-off between providing as much information as
possible and keeping the log small. The one chosen here is to log the
first occurrence of an error when it happens, and the number of similar
errors at the end (per query+table+rowgroup+column).
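A minimal C++ sketch of this policy, assuming one counter per
query+table+rowgroup+column as described above. ErrorKey,
FirstAndCountLogger, Record and Flush are hypothetical names chosen for
illustration; they are not the interface of RuntimeState::LogCollector,
and a std::ostream stands in for the real logging backend.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical key: one counter per query+table+rowgroup+column.
struct ErrorKey {
  std::string query_id;
  std::string table;
  int row_group;
  int column;
  bool operator<(const ErrorKey& o) const {
    return std::tie(query_id, table, row_group, column) <
           std::tie(o.query_id, o.table, o.row_group, o.column);
  }
};

// Logs the first occurrence of each error immediately and the total
// number of occurrences once, when Flush() is called at the end.
class FirstAndCountLogger {
 public:
  explicit FirstAndCountLogger(std::ostream& log) : log_(log) {}

  // Returns true only if this call wrote a log line (first occurrence).
  bool Record(const ErrorKey& key, const std::string& message) {
    int& count = counts_[key];
    ++count;
    if (count > 1) return false;
    log_ << message << '\n';
    return true;
  }

  // Emits one summary line per key that occurred more than once.
  void Flush() {
    for (const auto& [key, count] : counts_) {
      if (count > 1) {
        log_ << key.table << " rowgroup " << key.row_group << " column "
             << key.column << ": error occurred " << count << " times\n";
      }
    }
    counts_.clear();
  }

 private:
  std::ostream& log_;
  std::map<ErrorKey, int> counts_;
};
```

Recording the same key three times writes the message once; Flush() then
adds a single summary line reporting 3 occurrences.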

A utility class, RuntimeState::LogCollector, is added to collect logs
"locally" rather than in RuntimeState itself. RuntimeState is shared
between the scanner threads/fragments of a query, so collecting there
would require some kind of locking.

This change should improve the performance of scans hitting many
data errors even if logging is turned off, because counting
already occurred errors no longer needs locking, and the
substitution of parameters into the error string is also avoided.
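Both savings can be sketched as follows, assuming the collector is owned
by a single scanner thread (so the counter can be a plain integer) and
that the error text is built by a callable that runs only on the first
occurrence. LocalErrorCounter and its members are hypothetical names,
not Impala code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical per-reader error counter. Because a single scanner thread
// owns it, the count is a plain integer: no mutex and no atomic.
class LocalErrorCounter {
 public:
  // `format` is invoked only for the first occurrence, so the cost of
  // substituting parameters into the error string is paid once.
  template <typename FormatFn>
  void Record(FormatFn&& format) {
    if (count_++ == 0) first_message_ = format();
  }

  int64_t count() const { return count_; }
  const std::string& first_message() const { return first_message_; }

 private:
  int64_t count_ = 0;
  std::string first_message_;
};
```

Passing a lambda instead of a pre-formatted string is what moves the
formatting cost off the hot path; a std::string argument would be built
on every call even when it is then discarded.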

TODOs:
- add some kind of testing
- change other data errors (like out-of-range Kudu timestamps) to
  use similar logic
- I am not completely satisfied with the interface, so I am open
  to suggestions

Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
4 files changed, 76 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/10793/1
--
To view, visit http://gerrit.cloudera.org:8080/10793

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
