Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12845 )
Change subject: Re-land IMPALA-5393. Use THREAD_LOCAL state for regexp
......................................................................

Re-land IMPALA-5393. Use THREAD_LOCAL state for regexp

This re-lands commit 6e8c330f40da087ca0d8ba844cd9d97a8e60ff67 which was
reverted in d3428a58d8f54d1a64d5aeb1af3f76b7ffcb53d0. The revert was due
to an assumption that this commit depended on the new version of re2
(which was correctly reverted due to a toolchain issue). In fact this
commit does not depend on any toolchain changes.

Original commit message follows
--------------------------------

This changes the built-in regexp-related UDFs to use THREAD_LOCAL
re2::RE instances instead of FRAGMENT_LOCAL. Although re2::RE is
thread-safe, it achieves that thread safety through a certain amount of
locking. Using thread-local regexps improves performance substantially.

I ran a simple test query:

  select sum(l_linenumber)
  from item_20x
  where length(regexp_extract(l_shipinstruct, '.*', 0)) > 0

on a table with three underlying parquet files (thus getting 3 scanner
threads). Prior to this change, the query took ~60 seconds and burned
2m16sec CPU time. With this change, it took ~19sec and 43s CPU time.

For a query with more scanner threads, the improvement should be even
more dramatic.

The only potential downside of this change is slightly increased memory
consumption by having one RE instance per thread, but the REs themselves
should be small relative to all of the other per-scanner-thread memory.
Change-Id: I9ae0703efeb2429813b2a712f1accf1b0a4a409e
Reviewed-on: http://gerrit.cloudera.org:8080/12845
Reviewed-by: Lars Volker <l...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/exprs/string-functions-ir.cc
1 file changed, 6 insertions(+), 6 deletions(-)

Approvals:
  Lars Volker: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12845
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9ae0703efeb2429813b2a712f1accf1b0a4a409e
Gerrit-Change-Number: 12845
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
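
For readers unfamiliar with the trade-off the commit message describes, the following is a minimal sketch of the general idea, not the code from be/src/exprs/string-functions-ir.cc. It assumes std::regex as a stand-in for re2, and every name in it (SharedMatcher, MatchesThreadLocal, CountMatches) is hypothetical. The explicit mutex only models "thread safety through locking"; it is not a claim about how re2 is implemented internally.

```cpp
#include <cstddef>
#include <mutex>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Shared variant (hypothetical): one compiled pattern for all threads.
// Every match takes the same mutex, so concurrent callers serialize on it.
class SharedMatcher {
 public:
  explicit SharedMatcher(const std::string& pattern) : re_(pattern) {}

  bool Matches(const std::string& input) {
    std::lock_guard<std::mutex> guard(mu_);
    return std::regex_search(input, re_);
  }

 private:
  std::mutex mu_;
  std::regex re_;
};

// Thread-local variant (hypothetical): each thread lazily compiles and caches
// its own copy of the pattern, so matching needs no cross-thread locking.
// The cost is one compiled regex per thread instead of one overall.
bool MatchesThreadLocal(const std::string& pattern, const std::string& input) {
  thread_local bool initialized = false;
  thread_local std::string cached_pattern;
  thread_local std::regex cached_re;
  if (!initialized || cached_pattern != pattern) {
    cached_re = std::regex(pattern);
    cached_pattern = pattern;
    initialized = true;
  }
  return std::regex_search(input, cached_re);
}

// Counts the rows that match `pattern`, striping rows across `num_threads`
// workers (num_threads must be at least 1). Each worker writes only to its
// own slot of `partial`, so the counters need no synchronization either.
std::size_t CountMatches(const std::vector<std::string>& rows,
                         const std::string& pattern,
                         std::size_t num_threads) {
  std::vector<std::size_t> partial(num_threads, std::size_t{0});
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&rows, &pattern, &partial, t, num_threads] {
      for (std::size_t i = t; i < rows.size(); i += num_threads) {
        if (MatchesThreadLocal(pattern, rows[i])) ++partial[t];
      }
    });
  }
  for (std::thread& w : workers) w.join();

  std::size_t total = 0;
  for (std::size_t c : partial) total += c;
  return total;
}
```

With three workers, CountMatches mirrors the three-scanner-thread setup in the test query above: each thread pays the compile cost once and then matches without contention. The sketch makes no attempt to reproduce the reported timings.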