Hello Dimitris Tsirogiannis, Alex Behm, Dan Hecht, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9697 to look at the new patch set (#14). Change subject: IMPALA-6670: refresh lib-cache entries from plan ...................................................................... IMPALA-6670: refresh lib-cache entries from plan When an impalad is in executor-only mode, it receives no catalog updates. As a result, lib-cache entries are never refreshed. A consequence is that udf queries can return incorrect results or may not run due to resolution issues. Both cases are caused by the executor using a stale copy of the lib file. For incorrect results, an old version of the method may be used. Resolution issues can come up if a method is added to a lib file. The solution in this change is to capture the coordinator's view of the lib file's last modified time when planning. This last modified time is then shipped with the plan to executors. Executors must then use both the lib file path and the last modified time as a key for the lib-cache. If the coordinator's last modified time is more recent than the executor's lib-cache entry, then the entry is refreshed. Brief discussion of alternatives: - lib-cache always checks last modified time + easy/local change to lib-cache - adds an fs lookup always. rejected for this reason - keep the last modified time in the catalog - bound on staleness is too loose. consider the case where fn's f1, f2, f3 are created with last modified times of t1, t2, t3. treat the fn's last modified time as a low-watermark; if the cache entry has a more recent time, use it. Such a scheme would allow the version at t2 to persist. An old fn may keep the state from converging to the latest. This could end up with strange cases where different versions of the lib are used across executors for a single query. In contrast, the change in this path relies on the statestore to push versions forward at all coordinators, so will push all versions at all caches forward as well. Testing: - added an e2e custom cluster test Change-Id: Icf740ea8c6a47e671427d30b4d139cb8507b7ff6 --- M be/src/codegen/codegen-callgraph.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/external-data-source-executor.cc M be/src/exprs/agg-fn.cc M be/src/exprs/hive-udf-call.cc M be/src/exprs/scalar-fn-call.cc M be/src/runtime/lib-cache.cc M be/src/runtime/lib-cache.h M be/src/service/fe-support.cc M common/thrift/Frontend.thrift M common/thrift/Types.thrift M common/thrift/generate_error_codes.py M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M testdata/bin/copy-udfs-udas.sh M tests/custom_cluster/test_coordinators.py M tests/test-hive-udfs/src/main/java/org/apache/impala/TestUpdateUdf.java 22 files changed, 433 insertions(+), 83 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/9697/14 -- To view, visit http://gerrit.cloudera.org:8080/9697 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icf740ea8c6a47e671427d30b4d139cb8507b7ff6 Gerrit-Change-Number: 9697 Gerrit-PatchSet: 14 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>