[ https://issues.apache.org/jira/browse/IMPALA-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-6386.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0

commit d9b6fd073055b436c7404d49454dc215b2c7a369
Author: Joe McDonnell <joemcdonn...@cloudera.com>
Date: Thu Jan 11 15:09:52 2018 -0800

IMPALA-6386: Invalidate metadata at table level for dataload

Dataload currently executes bin/load-data.py for TPC-H, TPC-DS, and
functional-query concurrently. One of the final steps of bin/load-data.py
is to run a global "invalidate metadata". Global "invalidate metadata"
commands are known to cause problems on concurrent systems (see IMPALA-5087).
For dataload, if TPC-H executes "invalidate metadata" while TPC-DS is still
creating tables and adding partitions, the TPC-DS executor might erroneously
believe that a table does not exist.

This changes dataload to invalidate metadata at the individual table level
rather than globally, which prevents the concurrency issue.

This also renames some of the intermediate SQL files generated by
generate-schema-statements.py and consumed by load-data.py to make them
less confusing.

Change-Id: Ibc3a6d8a674a0bf6b02069bfe8a5e12034335b1f
Reviewed-on: http://gerrit.cloudera.org:8080/9009
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins

> Dataload can fail due to "invalidate metadata" concurrent with DDLs
> -------------------------------------------------------------------
>
>                 Key: IMPALA-6386
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6386
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 2.11.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>             Fix For: Impala 2.12.0
>
>
> testdata/bin/create-load-data.sh runs bin/load-data.py on TPC-H, TPC-DS, and
> functional-query in parallel. One of the final steps of bin/load-data.py is
> to run a universal "invalidate metadata".
> However, a universal "invalidate metadata" is an error-prone operation in a
> concurrent system. When "invalidate metadata" runs during the DDL statements
> for another dataset (e.g. TPC-H finishes and runs "invalidate metadata" while
> TPC-DS is still creating tables and adding partitions), it can lead to errors:
>
> Thread 1: create external table foo ... ;
> Thread 2: invalidate metadata;
> Thread 1: alter table foo add partition bar;  <-- Hits error because it can't find foo
>
> This is a known issue: IMPALA-5087. It has been seen in my development
> environment and in one automated build, but it is relatively rare.
> Dataload needs to switch to using "invalidate metadata {table_name}" to avoid
> this issue. This is also a good time to consider using "refresh {table_name}".
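The race described in the issue, and why the table-level fix avoids it, can be sketched with a toy model. This is a minimal, hypothetical illustration in Python (the language of bin/load-data.py), not Impala's catalog code: `ToyCatalog`, `replay`, and `table_level_invalidate_statements` are invented names, and the table names are illustrative only. The model assumes that a global invalidate behaves like "snapshot the metastore, then replace the whole cache", which is one plausible way a concurrently created table can be lost; the real mechanism is tracked in IMPALA-5087.

```python
import threading


class ToyCatalog:
    """Hypothetical, heavily simplified metadata cache in front of a metastore.

    NOT Impala's catalog implementation. It only models one property:
    a global invalidate rebuilds the whole cache from a snapshot of the
    metastore, while a table-level invalidate touches a single entry.
    """

    def __init__(self):
        self.metastore = set()  # source of truth for which tables exist
        self.cache = set()      # tables the query engine currently knows about
        self.lock = threading.Lock()

    def create_table(self, name):
        with self.lock:
            self.metastore.add(name)
            self.cache.add(name)

    def alter_table(self, name):
        # DDL on an existing table fails if the cache does not know the table.
        with self.lock:
            if name not in self.cache:
                raise LookupError(f"Table does not exist: {name}")

    def begin_global_invalidate(self):
        # First half of the modeled global invalidate: snapshot the metastore.
        with self.lock:
            return set(self.metastore)

    def finish_global_invalidate(self, snapshot):
        # Second half: replace the entire cache with the (possibly stale) snapshot.
        with self.lock:
            self.cache = snapshot

    def invalidate_table(self, name):
        # Table-level invalidate: reload only the named entry.
        with self.lock:
            if name in self.metastore:
                self.cache.add(name)
            else:
                self.cache.discard(name)


def table_level_invalidate_statements(tables):
    """Hypothetical helper: build one INVALIDATE METADATA statement per table."""
    return [f"INVALIDATE METADATA {t};" for t in tables]


def replay(use_global_invalidate):
    """Replay one fixed interleaving of two concurrent dataload jobs."""
    cat = ToyCatalog()
    cat.create_table("tpch.lineitem")  # TPC-H job has created its table
    # TPC-H job starts its final invalidate (global path takes a snapshot).
    snapshot = cat.begin_global_invalidate() if use_global_invalidate else None
    cat.create_table("tpcds.store_sales")  # TPC-DS job creates a table meanwhile
    if use_global_invalidate:
        cat.finish_global_invalidate(snapshot)  # stale snapshot drops store_sales
    else:
        cat.invalidate_table("tpch.lineitem")   # only TPC-H's own table is touched
    try:
        cat.alter_table("tpcds.store_sales")    # TPC-DS job continues its DDL
        return "ok"
    except LookupError as err:
        return str(err)


print(replay(use_global_invalidate=True))   # Table does not exist: tpcds.store_sales
print(replay(use_global_invalidate=False))  # ok
print(table_level_invalidate_statements(["tpch.lineitem", "tpch.orders"]))
```

Each individual `ToyCatalog` method is atomic under its lock, yet the global path still loses `tpcds.store_sales` because the snapshot and the cache replacement are two separate steps, and the other job's DDL lands in between. The table-level path never rewrites entries it was not asked about, which is the property the fix relies on. `INVALIDATE METADATA [db_name.]table_name` and `REFRESH [db_name.]table_name` are the actual Impala statement forms; how load-data.py emits them is not shown here.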