Vuk Ercegovac created IMPALA-6670:
-------------------------------------

             Summary: Executor-only impalads do not refresh their lib-cache entries
                 Key: IMPALA-6670
                 URL: https://issues.apache.org/jira/browse/IMPALA-6670
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Frontend
    Affects Versions: Impala 2.9.0
            Reporter: Vuk Ercegovac

When impalads are only executors, there is no way for their lib-cache entries to be refreshed. As far as I can tell, the version of the cached file will remain the same until the impalad is restarted (and a query with a UDF/UDA that references that file is evaluated on that node).

In contrast, impalads that are both executors and coordinators will receive metadata updates, which will result in the cache entry being refreshed. Even in this mode, there is room for inconsistency (e.g., the jar is updated between coordination and evaluation), but all impalads can be made to converge.

Basic steps to repro:

* Make two jars (I used impala-hive-udfs.jar), one with TestUdf.class and the other with TestUdf.class + ReplaceStringUdf.class

* Clear the state

    drop function scratch.identity(boolean);
    drop function scratch.replace_string(string);

* cp part1.jar to tmp.jar

    hadoop fs -cp -f /test-warehouse/scratch.db/part1.jar /test-warehouse/scratch.db/tmp.jar

* create identity from tmp.jar

    create function scratch.identity(boolean) returns boolean location '/test-warehouse/scratch.db/tmp.jar' symbol='org.apache.impala.TestUdf';

* Run a query on all nodes

    select count(*) from functional.alltypes where scratch.identity(bool_col) = bool_col;

* cp part2.jar to tmp.jar

    hadoop fs -cp -f /test-warehouse/scratch.db/part2.jar /test-warehouse/scratch.db/tmp.jar

* create replace_string function

    create function scratch.replace_string(string) returns string location '/test-warehouse/scratch.db/tmp.jar' symbol='org.apache.impala.ReplaceStringUdf';

* run a query

    select count(*) from functional.alltypes where scratch.replace_string(string_col) = string_col;

When all impalads are both executors and coordinators, the second query works.

When the cluster is started with:

    ./bin/start-impala-cluster.py --num_coordinators=1

the second query always results in:

    WARNINGS: ImpalaRuntimeException: Unable to find class.
    CAUSED BY: ClassNotFoundException: org.apache.impala.ReplaceStringUdf

(each backend still has the previous version of tmp.jar)

Currently, executors do not need metadata other than what is supplied by coordinators in the plan. Libs are excluded from this scheme; each impalad tries to maintain consistency with the lib files stored in the FS as of the time of function creation (little more complicated ...).

One option here is for plans to include lib version information, so that impalads can know when a refresh is needed.
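
To make the proposed option concrete, here is a rough sketch of how an executor could use a lib version carried in the plan to decide when to refresh. This is not Impala's actual lib-cache code: the names `LibCache`, `CacheEntry`, `plan_version`, and the `fetch` callback are all hypothetical, and the version is assumed to be something monotonically increasing such as the file's modification time as seen by the coordinator.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    local_copy: bytes  # stand-in for the locally cached jar contents
    version: int       # version the copy was fetched for (assumed: FS mtime)


class LibCache:
    """Hypothetical executor-side cache keyed by the lib's FS path."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch  # assumed helper that reads the file from the FS
        self._entries: Dict[str, CacheEntry] = {}
        self.fetch_count = 0

    def get(self, path: str, plan_version: int) -> bytes:
        entry = self._entries.get(path)
        # Refresh when nothing is cached or the plan names a newer version.
        if entry is None or entry.version < plan_version:
            self.fetch_count += 1
            entry = CacheEntry(self._fetch(path), plan_version)
            self._entries[path] = entry
        return entry.local_copy


# Mirror the repro: tmp.jar is overwritten between the two queries.
path = "/test-warehouse/scratch.db/tmp.jar"
fs = {path: b"TestUdf"}
cache = LibCache(lambda p: fs[p])

first = cache.get(path, plan_version=1)   # first query: fetches part1 contents
fs[path] = b"TestUdf+ReplaceStringUdf"    # jar replaced on the FS
second = cache.get(path, plan_version=2)  # plan carries newer version: refetch
```

With this scheme an executor never needs its own metadata updates: a plan that references a newer version of tmp.jar is enough to trigger the refresh, while plans carrying the already-cached version keep using the local copy.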