[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has uploaded a new patch set (#9). Change subject: IMPALA-1702: Enforce table level consistency accross service .. IMPALA-1702: Enforce table level consistency accross service Problems: 1. CatalogServiceCatalog::reset() resets table ID, so same tables can have different ID and different tables can have same table ID before/after this call. Backend could overwrites each table in an 1:1 table ID to table descriptor map. 2. 1 can happen because frontend gets table directly from catalog cache, which is updated concurrently with query analyzing. so Analyzer::getTable() could return different references for the same table when analyzing the same query. This commit removes table Id from Table object; instead frontend assigns a unique Id to each table just before it is sent to backend. It also implements a referecedTables_ cache in Analyzer::globalState_ so that calling Analyzer::getTable() on the same table returns the same references for the same query. Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 --- M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java D fe/src/main/java/org/apache/impala/catalog/TableId.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinTableId.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 25 files changed, 178 insertions(+), 209 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/9 -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 154: public TDescriptorTable toThrift() { > Let's add a comment about table id assignment and that this also checks the Done Line 156: // Maps from base table to its table id used in backend. > in the backend Done http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java: Line 42: import com.google.common.base.Preconditions; > remove Done http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: Line 1128:* 1. All tables sent to backend have a unique Id, assigned in > Let's rework/move this validation as follows: Done Line 1134: private void validateTableIds(Analyzer analyzer, TExecRequest result) > validateTableConsistency()? Done http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java: Line 93: private final MaptableMap_ = Maps.newHashMap(); > Integer Done -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi Xu Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Alex Behm has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 7: (6 comments) http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 154: public TDescriptorTable toThrift() { Let's add a comment about table id assignment and that this also checks the single-table consistency guarantee. Line 156: // Maps from base table to its table id used in backend. in the backend http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java: Line 42: import com.google.common.base.Preconditions; remove http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: Line 1128:* 1. All tables sent to backend have a unique Id, assigned in Let's rework/move this validation as follows: * Let's check the table-level consistency guarantee directly in DescriptorTbl.toThrift(). * Let's add validateTableIds() to PlannerTestBase that checks the uniqueness of table ids in the produced TDescriptorTable * Get rid of this current function and remove references to IMPALA-1702. Line 1134: private void validateTableIds(Analyzer analyzer, TExecRequest result) validateTableConsistency()? http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java: Line 93: private final MaptableMap_ = Maps.newHashMap(); Integer -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi Xu Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 7: (16 comments) http://gerrit.cloudera.org:8080/#/c/4349/6/common/thrift/Types.thrift File common/thrift/Types.thrift: Line 22: typedef i32 TPlanNodeId > We can also move this typedef into a more appropriate place like Descriptor I think it is ok to push it here to stick with others. just make it an int. http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 70: import org.apache.impala.util.DisjointSet; > will remove Done http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 50: public static final int TABLE_SINK_ID = 0; > I know it doesn't matter but we typically do "public static final ..." Done Line 52: private int nextTableId_ = TABLE_SINK_ID + 1; > nextTableId_ Done Line 54: public TupleDescriptor createTupleDescriptor(String debugName) { > I think we can just remove this and inline at the single caller Done Line 158: if (targetTable_ != null) tableIdMap.put(targetTable_, TABLE_SINK_ID); > Maps from base table to its table id. Done Line 165: Integer tableId = tableIdMap.get(table); > Lots of nesting here, can we invert this condition, i.e. Done Line 166: if (table != null && !(table instanceof View) && tableId == null) { > Remember the tableId here so avoid some hash lookups. Done Line 168: tableIdMap.put(table, tableId); > Use this pattern to avoid multiple lookups: Done http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: Line 198: public TTupleDescriptor toThrift(Integer tableId) { > just tableId Done http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 34: import java.util.concurrent.locks.ReentrantReadWriteLock; > will remove Done http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java File fe/src/main/java/org/apache/impala/catalog/HBaseTable.java: Line 665: public TTableDescriptor toThriftDescriptor(int tableId, Set referencedPartitions) { > tableId Done http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 1567: public TTableDescriptor toThriftDescriptor(int tableId, Set referencedPartitions) { > tableId Done http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java File fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java: Line 68: public TTableDescriptor toThriftDescriptor(int tableId, Set referencedPartitions) { > tableId Done http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: Line 38: import org.apache.impala.catalog.KuduTable; > not intentional.. will revert next patch! Done PS6, Line 1105: String db = insertStmt.getTargetTableName().getDb(); : finalizeParams.setTable_db(db == null ? queryCtx.session.database : db); : HdfsTable hdfsTable = (HdfsTable) insertStmt.getTargetTable(); : finalizeParams.setHdfs_base_dir(hdfsTable.getHdfsBaseDir()); : finalizeParams.setStaging_dir( : hdfsTable.getHdfsBaseDir() + "/_impala_insert_staging"); : > will update Done -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Alex Behm has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 6: (12 comments) So much better! http://gerrit.cloudera.org:8080/#/c/4349/6/common/thrift/Types.thrift File common/thrift/Types.thrift: Line 22: typedef i64 TTableId > now i32 is ok I think. We can also move this typedef into a more appropriate place like Descriptors.thrift because nobody else should care about it now. http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 50: static public final long TABLE_SINK_ID = 0; I know it doesn't matter but we typically do "public static final ..." Line 52: private long nextTableId = TABLE_SINK_ID + 1; nextTableId_ change to int? Line 54: private long getNextTableId() { return nextTableId++; } I think we can just remove this and inline at the single caller Line 158: // tableIdMap.get(table) == null iff (table == null || table instanceof View) Maps from base table to its table id. Line 165: if (tupleDesc.isMaterialized()) { Lots of nesting here, can we invert this condition, i.e. if (!tupleDesc.isMaterialized()) continue; Line 166: Table table = tupleDesc.getTable(); Remember the tableId here so avoid some hash lookups. Long tableId = null; if (table != null && !(table instanceof View)) { ... } Line 168: if (!tableIdMap.containsKey(table)) { Use this pattern to avoid multiple lookups: Entry e = map,.get(); if (e == null) { e = nextid; map.put(); } http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: Line 198: public TTupleDescriptor toThrift(Long assignedTableId) { just tableId http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java File fe/src/main/java/org/apache/impala/catalog/HBaseTable.java: Line 665: public TTableDescriptor toThriftDescriptor(long id, Set referencedPartitions) { tableId http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 1567: public TTableDescriptor toThriftDescriptor(long id, Set referencedPartitions) { tableId http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java File fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java: Line 68: public TTableDescriptor toThriftDescriptor(long id, Set referencedPartitions) { tableId (and elsewhere) -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 44: // The output table of an insert query. > I still don't understand what this has to do with partition pruning. The pu Done http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 34: import java.util.concurrent.atomic.AtomicLong; remove next patch Line 141: private final TableLoadingMgr tableLoadingMgr_; > Sorry I got confused, you can remove this sentence. I incorrectly thought t done http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 64: > The metastoreAccessLock_ is unused. Please remove. Done http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 79: protected final ArrayList colsByPos_ = Lists.newArrayList(); > As discussed, thinking about this a little more I think the right fix is to kind of done. what do you think -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has uploaded a new patch set (#6). Change subject: IMPALA-1702: Enforce table level consistency accross service .. IMPALA-1702: Enforce table level consistency accross service Problems: 1. CatalogServiceCatalog::reset() resets table ID, so same tables can have different ID and different tables can have same table ID before/after this call. Backend could overwrites each table in an 1:1 table ID to table descriptor map. 2. 1 can happen because frontend gets table directly from catalog cache, which is updated concurrently with query analyzing. so Analyzer::getTable() could return different references for the same table when analyzing the same query. This commit removes table Id, and keeps a referecedTables_ cache in Analyzer::globalState_ so that calling Analyzer::getTable() on the same table returns the same references for the same query. Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 --- M common/thrift/CatalogObjects.thrift M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableName.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java D fe/src/main/java/org/apache/impala/catalog/TableId.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinTableId.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 28 files changed, 151 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/6 -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Alex Behm has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 44: // List of tables to exclude from partition pruning, and thus all partitions are sent to > The only place that this actually becomes useful is for partition pruning? I still don't understand what this has to do with partition pruning. The purpose of this list is correctly described in the original comment. It is to ship the descriptors of these tables to the BE. If we don't do this, then the BE will not know where to write data to in an insert because the base and partition paths are part of the table metadata. The purpose really is to ship the table metadata to the BE and I don't see relevance to partition pruning. Imo, this should be like you had it before. It should be called targetTable_ or insertTable_ or insertTargetTable_ or something like that. http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 141:* TableId does not exist anymore. Sorry I got confused, you can remove this sentence. I incorrectly thought the @link below referenced the TableId class. http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 64: private static final Object metastoreAccessLock_ = new Object(); > remove which? ?removed INVALID_TABLE_ID. not sure if this is what you wante The metastoreAccessLock_ is unused. Please remove. http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 79: protected final long id_; As discussed, thinking about this a little more I think the right fix is to get rid of this id altogether. The table id is only used for mapping tuple descriptors to tables in the BE - a query local mapping. So I think the better fix is to remove the table id here and instead assign ids on-the-fly (or establish the mapping some other way) in DescriptorTbl.toThrift(). -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has uploaded a new patch set (#5). Change subject: IMPALA-1702: Enforce table level consistency accross service .. IMPALA-1702: Enforce table level consistency accross service Problems: 1. CatalogServiceCatalog::reset() resets table ID, so same tables can have different ID and different tables can have same table ID before/after this call. Backend could overwrites each table in an 1:1 table ID to table descriptor map. 2. 1 can happen because frontend gets table directly from catalog cache, which is updated concurrently with query analyzing. so Analyzer::getTable() could return different references for the same table when analyzing the same query. This commit solves different table having same table ID issue by changing table ID counter to an ever increasing 64 bits integer. It solves same table with different table issue ID by keeping a referecedTables_ cache in Analyzer::globalState_ so that calling Analyzer::getTable() on the same table returns the same references for the same query. Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 --- M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java D fe/src/main/java/org/apache/impala/catalog/TableId.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinTableId.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 26 files changed, 118 insertions(+), 122 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/5 -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 4: (11 comments) http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 294: // Catalog cache has the latest tables from the statestore, in order to use the > The Impalad Catalog has ... Done Line 295: // same table version in a single query, we cache all referenced tables and reuse them > ... query , we cache all referenced tables here. Done Line 2299: if (table == null) { > remove Done http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: Line 187: // SelectStmt (or the BE will be very confused). Assign a unique table ID to it. > The previous longer comment was clearer. How about: Done http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 44: // List of tables to exclude from partition pruning, and thus all partitions are sent to > Afaik, these tables have nothing to do with partition pruning, so the name The only place that this actually becomes useful is for partition pruning? so for "INSERT INTO a PARTITION(b='d') select c from a; if c is in another partition other than partition 'b', then without this partition b's metadata wont be sent. it has no other meaningful usage than this I think. maybe I just change this to a HashSet? Line 181: referencedPartitions = getReferencedPartitions(tbl); > will remove this Done http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 141:* Ever increasing counter for table id {@link Table#id_}. New id is given when > TableId doesn't exist anymore. Done http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 64: private static final Object metastoreAccessLock_ = new Object(); > Please remove while you are here. remove which? ?removed INVALID_TABLE_ID. not sure if this is what you wanted. Line 72: public static final long LOCAL_TABLE_ID = -4; > LOCAL_VIEW_ID is more explicit. Sorry to make you change it again. Done http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/planner/JoinTableId.java File fe/src/main/java/org/apache/impala/planner/JoinTableId.java: Line 25: // Construction only allowed via an IdGenerator. > newline before comment Done http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: Line 1126:* Check that we don't have any duplicate table IDs (see IMPALA-1702). > We should declare IMPALA-1702 as fixed and not mention it here. Instead, we I added something in ImpaladCatalog.java::addTable(). not sure if that is what you suggested.k -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Alex Behm has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 44: // Target table in an insert statement. Is this also the target for UPDATE/DELETE statements? PS3, Line 45: targetTable_ > The reason for this is that previously I tried to add the cache here and wa I'm in favor of this change because "referencedTables" was so generic that it was misleading. We should prefer to be as specific as possible. If we need more abstraction, we can add it later if we need it (but chances are we won't). -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 181: referencedPartitions = getReferencedPartitions(tbl); will remove this -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/4349/3//COMMIT_MSG Commit Message: PS3, Line 12: If backend sees these, : it will overwrites each other in an 1:1 table ID to table : descriptor map. might need a little rephrase. PS3, Line 18: activities operations? PS3, Line 22: ..issue.. (same in next statement too.) http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS3, Line 296: HashMap I think its better to use the parent Map<> class here (that seems to be consistent usage elsewhere in the code). PS3, Line 2282: Returns the Catalog Table object for the given database and table name. :* Adds the table to this analyzer's "missingTbls_" and throws an AnalysisException if :* the table has not yet been loaded in the local catalog cache. :* Throws an AnalysisException if the table or the db does not exist in the Catalog. :* This function does not register authorization requests and does not log access events Please update this to reflect the new logic. http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: PS3, Line 45: targetTable_ Just curious what the rationale behind this clean up. Is it because we use it only for the inserts? I don't know if we intend to use a bunch of referencedTables_ in the future for any other usecase in which case this change should again be undone. Alex, any idea? http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 34: import java.util.concurrent.atomic.AtomicInteger; Unused import now that we switched to Long? -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Alex Behm has posted comments on this change. Change subject: IMPALA-1702: Enforce table level consistency accross service .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 294: // Caches all table used in a single query. Use the same reference instead of getting "Caches all tables referenced in a single query" The comment should explain why we have this local cache. The main motivation is to guarantee single table consistency within a query, i.e., that within one query the same version of a table is used. Note that the catalog cache always contains the latest version of a table. Line 2290: TTableName tblName = new TTableName(dbName, tableName); I'm not sure if this will deal with case-insensitivity properly. Why not use a TableName instead? The equals() and hashCode() definitely do the right thing. Line 2291: Table table = globalState_.referencedTables_.get(tblName); How about this: if (table != null) { // Return query-local version of table. Preconditions.checkState(table.isLoaded()); return table; } http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: Line 141: protected final AtomicLong nextTableId_ = new AtomicLong(0); Add comment that explains the use and behavior of this id, i.e. the id is ever-increasing and only gets reset when the service is re-started. Mention at which points a table gets assigned a new id. http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 65: public static final long INVALID_TABLE_ID = -1; Add comment for these subtly different special ids. Line 67: public static final long ERROR_METADATA_LOADING_ID = -3; I think we should add LOCAL_TABLE_ID and TEST_TABLE_ID as well and describe their uses. http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/common/Id.java File fe/src/main/java/org/apache/impala/common/Id.java: Line 42: return Long.valueOf(id_).hashCode(); revert Line 47: return Long.toString(id_); revert http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: Line 1130 I think we should still maintain something like this to check and document our new guarantees. In addition, it would be good to check our invariants in other places if possible. For example, when we receive an update for a table from the statestore, we should be able to guarantee that the currentTableId <= newTableId iff currentCatalogVersion < catalogVersion. Or something along those lines, I think you get the idea :) Line 1134 There is similar code in a few places in the BE. We should remove that, too. -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service
Huaisi Xu has uploaded a new patch set (#3). Change subject: IMPALA-1702: Enforce table level consistency accross service .. IMPALA-1702: Enforce table level consistency accross service Problems: 1. CatalogServiceCatalog::reset() resets table ID, so same tables can have different ID and different tables can have same table ID before/after this. If backend sees these, it will overwrites each other in an 1:1 table ID to table descriptor map. 2. 1 can happen because frontend gets table directly from catalog cache, which is updated concurrently with other activities, so Analyzer::getTable() can return different references for the same table when analyzing the same query. This commit solves different table having same table ID by changing table ID counter to an ever increasing 64 bits integer. It solves same table with different table ID by keeping a referecedTables_ cache in Analyzer::globalState_ so that calling Analyzer::getTable() on the same table returns the same references for the same query. Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 --- M common/thrift/Types.thrift M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java D fe/src/main/java/org/apache/impala/catalog/TableId.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/common/Id.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 24 files changed, 82 insertions(+), 157 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/3 -- To view, visit http://gerrit.cloudera.org:8080/4349 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Huaisi XuGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Huaisi Xu Gerrit-Reviewer: Tim Armstrong