Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................

WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Some data errors (for example out-of-range parquet timestamps) can
dominate logs if a table contains a large number of rows with invalid
data. If an error has its own error code, then its occurrences are
already aggregated for the user in RuntimeState, but the logs still
contain a new line for every occurrence, which is rarely useful.

There is always a compromise to make between providing as much
information as possible and reducing log size. The one chosen here is
to log the first occurrence of the error when it happens and the number
of similar errors at the end (per query+table+rowgroup+column).

Utility class RuntimeState::LogCollector is added to collect logs
"locally", that is, not in RuntimeState, because RuntimeState is shared
between different scanner threads/fragments of a query and using it
would make some kind of locking necessary.

This change should improve the performance of scans hitting many data
errors even if logging is turned off, because counting already
occurred errors no longer needs locking and the substitution of
parameters in the error string is also avoided.

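To make the "log the first occurrence, count the rest" idea concrete, here is a
minimal standalone sketch. It is not the interface added by this patch:
LocalErrorCollector, RecordError, Flush and DemoScan are hypothetical names, the
error code 42 is made up, and a std::vector<std::string> stands in for the real
log. Each scanner thread is assumed to own its collector, so no locking is used.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-scanner log collector; NOT the actual
// RuntimeState::LogCollector interface from this change.
class LocalErrorCollector {
 public:
  // 'sink' receives the log lines; a plain vector stands in for the real log.
  explicit LocalErrorCollector(std::vector<std::string>* sink) : sink_(sink) {
    assert(sink != nullptr);
  }

  // Records one occurrence of 'error_code'. 'make_message' is invoked only for
  // the first occurrence, so later occurrences skip building the error string.
  void RecordError(int error_code,
                   const std::function<std::string()>& make_message) {
    int64_t& count = counts_[error_code];  // value-initialized to 0 if new
    if (count == 0) sink_->push_back(make_message());
    ++count;
  }

  // Emits one summary line per error code seen more than once, then resets.
  void Flush() {
    for (const auto& entry : counts_) {
      if (entry.second > 1) {
        sink_->push_back("error code " + std::to_string(entry.first) + ": " +
                         std::to_string(entry.second - 1) +
                         " similar error(s) not logged");
      }
    }
    counts_.clear();
  }

 private:
  std::vector<std::string>* sink_;
  std::map<int, int64_t> counts_;  // occurrences per error code
};

// Usage sketch: 1000 bad values in one column produce only two log lines.
std::vector<std::string> DemoScan() {
  std::vector<std::string> log;
  LocalErrorCollector collector(&log);
  for (int row = 0; row < 1000; ++row) {
    collector.RecordError(42, [row] {
      return "out-of-range timestamp at row " + std::to_string(row);
    });
  }
  collector.Flush();
  return log;
}
```

In this sketch the message is passed as a callable rather than a string, so the
parameter substitution cost is paid only once per error code between flushes.
Calling Flush() at the end of a row group would give the per-rowgroup+column
granularity described above, assuming one collector per column reader.
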
TODOs:
- add some kind of testing
- change other data errors (like out of range Kudu timestamp) to use a
  similar logic
- I am not completely satisfied with the interface, so I am open to
  suggestions

Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
4 files changed, 76 insertions(+), 14 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/10793/1
--
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>