[ https://issues.apache.org/jira/browse/IMPALA-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886206#comment-16886206 ]

ASF subversion and git services commented on IMPALA-8486:
---------------------------------------------------------

Commit 1cd85d1f8a0d772a4cab263cce4f41728f6ebac7 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1cd85d1 ]

IMPALA-8486: fix stale libCache entries in LocalCatalog mode coordinators

In LocalCatalog mode, after a function is dropped, statestored broadcasts
the update to invalidate the cached CatalogObject in each coordinator (if
the coordinator has cached it). However, the current code path does not
trigger libCache to remove the cached JAR/SO file. Suppose we replace the
function file in HDFS with a new one and create the function again using
the same HDFS path. SELECT statements on other coordinators then won't
trigger libCache to refresh the locally cached file. They keep using the
old cached file, which causes errors.
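The failure can be reproduced with a minimal Python model of the pre-fix behavior. The model assumes a per-coordinator cache keyed only by HDFS path, with no hook that fires when a function is dropped. FakeHdfs, NaiveLibCache and the path /udfs/my-udf.jar are hypothetical stand-ins, not Impala's real classes.

```python
# Hypothetical model of the pre-fix behavior: a per-coordinator library
# cache keyed only by HDFS path, with nothing invalidating it on DROP.
# None of these names are Impala's real API.

class FakeHdfs:
    """Stands in for HDFS: maps a path to file contents."""
    def __init__(self):
        self.files = {}

    def put(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


class NaiveLibCache:
    """Copies a library from HDFS once and reuses the local copy forever."""
    def __init__(self, hdfs):
        self.hdfs = hdfs
        self.local_copies = {}

    def get(self, path):
        if path not in self.local_copies:
            self.local_copies[path] = self.hdfs.read(path)
        return self.local_copies[path]


hdfs = FakeHdfs()
path = "/udfs/my-udf.jar"  # hypothetical HDFS path
hdfs.put(path, "Old UDF")

coordinator_cache = NaiveLibCache(hdfs)
first = coordinator_cache.get(path)  # an earlier query loads the old file

# DROP FUNCTION, overwrite the file in place, then CREATE FUNCTION again with
# the same path. Nothing tells this coordinator's cache about the change.
hdfs.put(path, "New UDF")
second = coordinator_cache.get(path)

print(first, "->", second)  # Old UDF -> Old UDF
```

This mirrors the `['Old UDF'] == ['New UDF']` assertion failure quoted in the issue below.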

When a coordinator invalidates its cached CatalogObject of a function,
it should also mark the corresponding libCache entry as "needs refresh".
The next use of the function then checks the last-modified time of the
HDFS file and refreshes the local copy if needed. To achieve this, we
have to propagate the HDFS path of the function along with the full
function name in the minimal topic, so that libCache can target the
cached entry.
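The mechanism described in the paragraph above can be sketched as a small Python model. It is an illustration under stated assumptions, not the actual patch. MinimalFunctionEntry, LibCache.mark_needs_refresh, on_function_invalidated and the mtime values are all hypothetical. The sketch shows two things. First, the minimal-topic entry must carry the HDFS path so the invalidation can locate the cache entry. Second, the "needs refresh" flag costs only one modification-time lookup on the next use, and the file is reloaded only if that time changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimalFunctionEntry:
    """Hypothetical minimal-topic entry: full function name plus HDFS path."""
    full_name: str
    hdfs_path: str


class FakeHdfs:
    """Stands in for HDFS: path -> (contents, modification time)."""
    def __init__(self):
        self.files = {}

    def put(self, path, data, mtime):
        self.files[path] = (data, mtime)

    def read(self, path):
        return self.files[path][0]

    def mtime(self, path):
        return self.files[path][1]


class LibCache:
    """Caches local copies; re-checks the mtime only when flagged."""
    def __init__(self, hdfs):
        self.hdfs = hdfs
        self.entries = {}  # path -> {"data", "mtime", "needs_refresh"}

    def mark_needs_refresh(self, path):
        entry = self.entries.get(path)
        if entry is not None:
            entry["needs_refresh"] = True

    def get(self, path):
        entry = self.entries.get(path)
        if entry is not None and entry["needs_refresh"]:
            if self.hdfs.mtime(path) != entry["mtime"]:
                entry = None  # file changed in HDFS: discard the stale copy
            else:
                entry["needs_refresh"] = False  # unchanged: keep the copy
        if entry is None:
            entry = {"data": self.hdfs.read(path),
                     "mtime": self.hdfs.mtime(path),
                     "needs_refresh": False}
            self.entries[path] = entry
        return entry["data"]


def on_function_invalidated(lib_cache, topic_entry):
    """Hypothetical hook run when a coordinator drops a cached function."""
    lib_cache.mark_needs_refresh(topic_entry.hdfs_path)


hdfs = FakeHdfs()
path = "/udfs/my-udf.jar"  # hypothetical HDFS path
hdfs.put(path, "Old UDF", mtime=100)
cache = LibCache(hdfs)
before = cache.get(path)

hdfs.put(path, "New UDF", mtime=200)  # file replaced in place
on_function_invalidated(cache, MinimalFunctionEntry("default.my_udf", path))
after = cache.get(path)
print(before, "->", after)  # Old UDF -> New UDF
```

Marking the entry, instead of deleting it outright, avoids re-copying the file when the function is recreated over an unchanged file.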

Note that this does not prevent dedicated executors from having stale
libCache entries. Fixing that requires architectural changes, which will
be followed up in IMPALA-8763.

Tests
 - Re-enable test_udf_update_via_drop and test_udf_update_via_create for
LocalCatalog mode.

Change-Id: Ie4812fb8737de3ba6074ffeb9007927bfbbbaf9b
Reviewed-on: http://gerrit.cloudera.org:8080/13849
Reviewed-by: Tim Armstrong <tarmstr...@cloudera.com>
Reviewed-by: Bharath Vissapragada <bhara...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> test_udf_update_via_drop and test_udf_update_via_create fail on local catalog
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-8486
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8486
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.3.0
>            Reporter: Tim Armstrong
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: catalog-v2
>
> {noformat}
>  TestUdfTargeted.test_udf_update_via_drop[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
> tests/query_test/test_udfs.py:541: in test_udf_update_via_drop
>     self._run_query_all_impalads(exec_options, query_stmt, ["New UDF"])
> tests/query_test/test_udfs.py:52: in _run_query_all_impalads
>     assert result.data == expected
> E   assert ['Old UDF'] == ['New UDF']
> E     At index 0 diff: 'Old UDF' != 'New UDF'
> E     Full diff:
> E     - ['Old UDF']
> E     + ['New UDF']
> ----------------------------
> {noformat}
> The tests are checking that the local UDF caches on each impalad get 
> invalidated by a drop/create of a function referencing the HDFS file 
> containing the UDF. The test fails because the local catalog, unlike the 
> regular catalog, doesn't invalidate LibCache entries upon receiving a catalog 
> update.
> I looked at this for long enough to realise that the invalidation mechanism 
> is fundamentally broken - it doesn't work with dedicated executors. It also 
> creates a race between the statestore updates and queries referencing the 
> UDFs - if the queries win the race, then they can incorrectly use the old 
> version that should have been invalidated.
> I think this is a potentially problematic issue because old JAR/SO versions 
> could persist in the cache indefinitely if old versions are overwritten in 
> place.
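The race described above can be illustrated with a deterministic replay of events on a single coordinator. This is a simplified, hypothetical model: the event names and the single-coordinator state are assumptions. The outcome depends only on whether the statestore update is applied before the query runs.

```python
# Hypothetical simulation of the race between the asynchronous statestore
# update and a query that references the UDF on one coordinator.

def run(events):
    """Replays a sequence of events and returns what each query saw."""
    hdfs_file = "Old UDF"
    cached_copy = "Old UDF"  # loaded by an earlier query
    stale = False            # becomes True once the invalidation is applied
    results = []
    for event in events:
        if event == "overwrite_file":
            hdfs_file = "New UDF"
        elif event == "statestore_update":
            stale = True
        elif event == "query":
            if stale:
                cached_copy, stale = hdfs_file, False
            results.append(cached_copy)
    return results


update_first = run(["overwrite_file", "statestore_update", "query"])
query_first = run(["overwrite_file", "query", "statestore_update"])
print(update_first)  # ['New UDF']
print(query_first)   # ['Old UDF']
```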



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)