Vuk Ercegovac created IMPALA-6670:
-------------------------------------

             Summary: Executor-only impalads do not refresh their lib-cache entries
                 Key: IMPALA-6670
                 URL: https://issues.apache.org/jira/browse/IMPALA-6670
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Frontend
    Affects Versions: Impala 2.9.0
            Reporter: Vuk Ercegovac


When impalads run only as executors, there is no way for their lib-cache 
entries to be refreshed. As far as I can tell, the cached version of a file 
remains the same until the impalad is restarted (and a query with a UDF/UDA 
that references that file is evaluated on that node).

In contrast, impalads that are both executors and coordinators receive 
metadata updates, which result in the cache entry being refreshed. Even in 
this mode there is room for inconsistency (e.g., the jar is updated between 
coordination and evaluation), but all impalads can be made to converge.

Basic steps to repro:
 * Make two jars (I used impala-hive-udfs.jar), one with TestUdf.class and the 
other with TestUdf.class + ReplaceStringUdf.class
 * Clear the state

    drop function scratch.identity(boolean);
    drop function scratch.replace_string(string);

 * cp part1.jar to tmp.jar

    hadoop fs -cp -f /test-warehouse/scratch.db/part1.jar \
        /test-warehouse/scratch.db/tmp.jar

 * Create identity from tmp.jar

    create function scratch.identity(boolean) returns boolean
    location '/test-warehouse/scratch.db/tmp.jar'
    symbol='org.apache.impala.TestUdf';

 * Run a query on all nodes

    select count(*) from functional.alltypes
    where scratch.identity(bool_col) = bool_col;

 * cp part2.jar to tmp.jar

    hadoop fs -cp -f /test-warehouse/scratch.db/part2.jar \
        /test-warehouse/scratch.db/tmp.jar

 * Create replace_string function

    create function scratch.replace_string(string) returns string
    location '/test-warehouse/scratch.db/tmp.jar'
    symbol='org.apache.impala.ReplaceStringUdf';

 * Run a query

    select count(*) from functional.alltypes
    where scratch.replace_string(string_col) = string_col;
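
For convenience, the repro steps above could be driven from one small script. 
This is only a sketch: it assumes `impala-shell` and `hadoop` are on the PATH 
and that part1.jar and part2.jar are already in /test-warehouse/scratch.db. 
The helper names (`sql`, `copy_jar`, `build_repro_steps`, `run`) are made up 
for illustration. By default it only prints the commands.

```python
import shlex
import subprocess

DB_DIR = "/test-warehouse/scratch.db"


def sql(statement):
    """Command line that runs one statement through impala-shell."""
    return ["impala-shell", "-q", statement]


def copy_jar(src):
    """Command line that overwrites tmp.jar with the given jar."""
    return ["hadoop", "fs", "-cp", "-f", f"{DB_DIR}/{src}", f"{DB_DIR}/tmp.jar"]


def build_repro_steps():
    """Return the repro steps from this report, in order, as argv lists."""
    return [
        sql("drop function scratch.identity(boolean)"),
        sql("drop function scratch.replace_string(string)"),
        copy_jar("part1.jar"),
        sql("create function scratch.identity(boolean) returns boolean "
            f"location '{DB_DIR}/tmp.jar' "
            "symbol='org.apache.impala.TestUdf'"),
        sql("select count(*) from functional.alltypes "
            "where scratch.identity(bool_col) = bool_col"),
        copy_jar("part2.jar"),
        sql("create function scratch.replace_string(string) returns string "
            f"location '{DB_DIR}/tmp.jar' "
            "symbol='org.apache.impala.ReplaceStringUdf'"),
        sql("select count(*) from functional.alltypes "
            "where scratch.replace_string(string_col) = string_col"),
    ]


def run(steps, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for argv in steps:
        print(shlex.join(argv))
        if not dry_run:
            # No check=True: the drops fail on a clean cluster, and the last
            # query is the one under test, so a failure must not abort the run.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run(build_repro_steps())  # dry run: prints the commands only
```

Calling `run(build_repro_steps(), dry_run=False)` would execute the steps 
against whatever cluster `impala-shell` connects to by default.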

When all impalads are both executors and coordinators, the second query works.

With a cluster started using:

./bin/start-impala-cluster.py --num_coordinators=1

the second query always fails with:

WARNINGS: ImpalaRuntimeException: Unable to find class.
 CAUSED BY: ClassNotFoundException: org.apache.impala.ReplaceStringUdf

(each backend still has the previous version of tmp.jar)
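
To make the failure mode concrete, here is a toy model of the behavior 
described in this report. It is not Impala's code: `LibCache`, `Impalad`, and 
`simulate` are hypothetical names, a jar is modeled as a set of class names, 
and the only assumption is the one stated above, that coordinators refresh 
their entry on a metadata update while executor-only nodes never do.

```python
class ClassNotFound(Exception):
    """Stands in for the ClassNotFoundException seen in the repro."""


class LibCache:
    """Per-impalad cache of lib files, keyed by path (toy model)."""

    def __init__(self, fs):
        self.fs = fs        # shared "filesystem": path -> set of class names
        self.entries = {}   # local copies: path -> set of class names

    def refresh(self, path):
        # Re-copy the file from the filesystem into the local cache.
        self.entries[path] = set(self.fs[path])

    def load_class(self, path, symbol):
        if path not in self.entries:
            self.refresh(path)  # first use: copy the file once
        if symbol not in self.entries[path]:
            raise ClassNotFound(symbol)
        return symbol


class Impalad:
    def __init__(self, fs, is_coordinator):
        self.cache = LibCache(fs)
        self.is_coordinator = is_coordinator

    def on_create_function(self, path):
        # Only coordinators see the metadata update for the new function.
        if self.is_coordinator:
            self.cache.refresh(path)


def simulate(coordinator_flags):
    """Replay the repro; return how many nodes fail the second query."""
    jar = "/test-warehouse/scratch.db/tmp.jar"
    test_udf = "org.apache.impala.TestUdf"
    replace_udf = "org.apache.impala.ReplaceStringUdf"

    fs = {jar: {test_udf}}                      # tmp.jar = part1.jar
    nodes = [Impalad(fs, flag) for flag in coordinator_flags]
    for node in nodes:
        node.on_create_function(jar)            # create identity
    for node in nodes:
        node.cache.load_class(jar, test_udf)    # first query, all nodes

    fs[jar] = {test_udf, replace_udf}           # tmp.jar = part2.jar
    for node in nodes:
        node.on_create_function(jar)            # create replace_string

    failures = 0
    for node in nodes:                          # second query, all nodes
        try:
            node.cache.load_class(jar, replace_udf)
        except ClassNotFound:
            failures += 1
    return failures
```

In this model `simulate([True, True, True])` returns 0, while 
`simulate([True, False, False])` returns 2: the executor-only nodes keep 
serving the first copy of tmp.jar.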

Currently, executors do not need metadata other than what is supplied by 
coordinators in the plan. Libs are excluded from this scheme: each impalad 
independently tries to keep its cached copy consistent with the lib file 
stored in the FS as of the time of function creation (it is a little more 
complicated ...).

One option is for plans to include lib version information so that impalads 
can tell when a refresh is needed.
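
A minimal sketch of that option, under stated assumptions: the plan carries a 
(path, version) pair per lib, the version is something monotonically 
increasing such as the file's modification time as seen by the coordinator, 
and the executor refetches whenever its cached copy is older. `LibRef` and 
`VersionedLibCache` are hypothetical names, not existing Impala classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibRef:
    """Lib reference as it might be shipped in a plan (hypothetical)."""
    path: str
    version: int  # assumption: e.g. mtime observed by the coordinator


class VersionedLibCache:
    """Executor-side cache that refreshes when the plan asks for newer libs."""

    def __init__(self, fetch):
        self._fetch = fetch     # callable: path -> (version, payload)
        self._entries = {}      # path -> (version, payload)
        self.fetch_count = 0    # exposed only to make the behavior visible

    def get(self, ref):
        cached = self._entries.get(ref.path)
        # Refetch on first use, or when the cached copy is older than the
        # version the coordinator planned against.
        if cached is None or cached[0] < ref.version:
            self._entries[ref.path] = self._fetch(ref.path)
            self.fetch_count += 1
        return self._entries[ref.path][1]
```

With this check an executor needs no metadata beyond what the plan already 
supplies. It does not remove the window mentioned earlier, where the file 
changes again between coordination and evaluation; it only lets every node 
converge on at least the version the coordinator saw.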

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
