[ 
https://issues.apache.org/jira/browse/IMPALA-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-6386.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

commit d9b6fd073055b436c7404d49454dc215b2c7a369
Author: Joe McDonnell <joemcdonn...@cloudera.com>
Date: Thu Jan 11 15:09:52 2018 -0800

IMPALA-6386: Invalidate metadata at table level for dataload
 
 Dataload currently executes bin/load-data.py for TPC-H,
 TPC-DS, and functional-query concurrently. One of the final
 steps for bin/load-data.py is to run a global "invalidate
 metadata". Global "invalidate metadata" commands are known
 to cause problems on concurrent systems. See IMPALA-5087.
 For dataload, if TPC-H executes "invalidate metadata" while
 TPC-DS is still creating tables and adding partitions,
 the TPC-DS executor might erroneously believe that a table
 does not exist.
 
 This changes dataload to invalidate metadata at an
 individual table level rather than globally. This
 prevents the concurrency issue.
 
 This also changes the names of some of the intermediate
 SQL files generated by generate-schema-statements.py
 and consumed by load-data.py to make them less confusing.
 
 Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
 Reviewed-on: http://gerrit.cloudera.org:8080/9009
 Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
 Tested-by: Impala Public Jenkins

> Dataload can fail due to "invalidate metadata" concurrent with DDLs
> -------------------------------------------------------------------
>
>                 Key: IMPALA-6386
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6386
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 2.11.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>             Fix For: Impala 2.12.0
>
>
> testdata/bin/create-load-data.sh runs bin/load-data.py on TPC-H, TPC-DS, and 
> functional-query in parallel. One of the final steps of bin/load-data.py is 
> to run a global "invalidate metadata". However, a global "invalidate 
> metadata" is an error-prone operation in a concurrent system. When 
> "invalidate metadata" runs during the DDL statements for another dataset 
> (e.g. TPC-H finishes and runs "invalidate metadata" while TPC-DS is still 
> creating tables and adding partitions), it can lead to errors.
> Thread 1: create external table foo ... ;
> Thread 2: invalidate metadata;
> Thread 1: alter table foo add partition bar; <-- Hits error because it can't find foo
> This is a known issue: IMPALA-5087. This has been seen in my development 
> environment and one automated build, but it is relatively rare.
> Dataload needs to switch to using "invalidate metadata {table_name}" to avoid 
> this issue. This is also a good time to consider using "refresh {table_name}".
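
The switch to table-level invalidation described above could be sketched roughly as follows. This is a minimal illustration, not the code from the actual commit: the helper name table_level_invalidate_statements, the database name "tpch", and the table list are assumptions chosen for the example. It only builds the statement strings; running them against Impala is left out.

```python
# Hypothetical helper (not taken from bin/load-data.py): build one
# "invalidate metadata <db>.<table>" statement per table instead of a
# single global "invalidate metadata".
def table_level_invalidate_statements(db_name, table_names):
    return ["invalidate metadata {0}.{1}".format(db_name, table)
            for table in table_names]


# Illustrative database and table names, assumed for this example.
statements = table_level_invalidate_statements("tpch", ["lineitem", "orders"])
for stmt in statements:
    print(stmt + ";")
```

Because each statement names a single table, a dataset that finishes early only touches metadata for its own tables and leaves tables that another dataset is still creating alone.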



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
