[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-06 Thread Huaisi Xu (Code Review)
Huaisi Xu has uploaded a new patch set (#9).

Change subject: IMPALA-1702: Enforce table level consistency accross service
..

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit removes table Id from Table object; instead
frontend assigns a unique Id to each table just before it
is sent to backend. It also implements a referecedTables_
cache in Analyzer::globalState_ so that calling
Analyzer::getTable() on the same table returns the same
references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
25 files changed, 178 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-05 Thread Huaisi Xu (Code Review)
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 154:   public TDescriptorTable toThrift() {
> Let's add a comment about table id assignment and that this also checks the
Done


Line 156: // Maps from base table to its table id used in backend.
> in the backend
Done


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

Line 42: import com.google.common.base.Preconditions;
> remove
Done


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1128:* 1. All tables sent to backend have a unique Id, assigned in
> Let's rework/move this validation as follows:
Done


Line 1134:   private void validateTableIds(Analyzer analyzer, TExecRequest 
result)
> validateTableConsistency()?
Done


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 93:   private final Map tableMap_ = 
Maps.newHashMap();
> Integer
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 154:   public TDescriptorTable toThrift() {
Let's add a comment about table id assignment and that this also checks the 
single-table consistency guarantee.


Line 156: // Maps from base table to its table id used in backend.
in the backend


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

Line 42: import com.google.common.base.Preconditions;
remove


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1128:* 1. All tables sent to backend have a unique Id, assigned in
Let's rework/move this validation as follows:

* Let's check the table-level consistency guarantee directly in 
DescriptorTbl.toThrift().

* Let's add validateTableIds() to PlannerTestBase that checks the uniqueness of 
table ids in the produced TDescriptorTable

* Get rid of this current function and remove references to IMPALA-1702.


Line 1134:   private void validateTableIds(Analyzer analyzer, TExecRequest 
result)
validateTableConsistency()?


http://gerrit.cloudera.org:8080/#/c/4349/7/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 93:   private final Map tableMap_ = 
Maps.newHashMap();
Integer


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-05 Thread Huaisi Xu (Code Review)
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 7:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/4349/6/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 22: typedef i32 TPlanNodeId
> We can also move this typedef into a more appropriate place like Descriptor
I think it is ok to push it here to stick with others. just make it an int.


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 70: import org.apache.impala.util.DisjointSet;
> will remove
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 50:   public static final int TABLE_SINK_ID = 0;
> I know it doesn't matter but we typically do "public static final ..."
Done


Line 52:   private int nextTableId_ = TABLE_SINK_ID + 1;
> nextTableId_
Done


Line 54:   public TupleDescriptor createTupleDescriptor(String debugName) {
> I think we can just remove this and inline at the single caller
Done


Line 158: if (targetTable_ != null) tableIdMap.put(targetTable_, 
TABLE_SINK_ID);
> Maps from base table to its table id.
Done


Line 165:   Integer tableId = tableIdMap.get(table);
> Lots of nesting here, can we invert this condition, i.e.
Done


Line 166:   if (table != null && !(table instanceof View) && tableId == 
null) {
> Remember the tableId here so avoid some hash lookups.
Done


Line 168: tableIdMap.put(table, tableId);
> Use this pattern to avoid multiple lookups:
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 198:   public TTupleDescriptor toThrift(Integer tableId) {
> just tableId
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 34: import java.util.concurrent.locks.ReentrantReadWriteLock;
> will remove
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/HBaseTable.java:

Line 665:   public TTableDescriptor toThriftDescriptor(int tableId, Set 
referencedPartitions) {
> tableId
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1567:   public TTableDescriptor toThriftDescriptor(int tableId, Set 
referencedPartitions) {
> tableId
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java:

Line 68:   public TTableDescriptor toThriftDescriptor(int tableId, Set 
referencedPartitions) {
> tableId
Done


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 38: import org.apache.impala.catalog.KuduTable;
> not intentional.. will revert next patch!
Done


PS6, Line 1105:   String db = insertStmt.getTargetTableName().getDb();
  : finalizeParams.setTable_db(db == null ? 
queryCtx.session.database : db);
  : HdfsTable hdfsTable = (HdfsTable) 
insertStmt.getTargetTable();
  : 
finalizeParams.setHdfs_base_dir(hdfsTable.getHdfsBaseDir());
  : finalizeParams.setStaging_dir(
  : hdfsTable.getHdfsBaseDir() + 
"/_impala_insert_staging");
  :  
> will update
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-05 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 6:

(12 comments)

So much better!

http://gerrit.cloudera.org:8080/#/c/4349/6/common/thrift/Types.thrift
File common/thrift/Types.thrift:

Line 22: typedef i64 TTableId
> now i32 is ok I think.
We can also move this typedef into a more appropriate place like 
Descriptors.thrift because nobody else should care about it now.


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 50:   static public final long TABLE_SINK_ID = 0;
I know it doesn't matter but we typically do "public static final ..."


Line 52:   private long nextTableId = TABLE_SINK_ID + 1;
nextTableId_

change to int?


Line 54:   private long getNextTableId() { return nextTableId++; }
I think we can just remove this and inline at the single caller


Line 158: // tableIdMap.get(table) == null iff (table == null || table 
instanceof View)
Maps from base table to its table id.


Line 165:   if (tupleDesc.isMaterialized()) {
Lots of nesting here, can we invert this condition, i.e.

if (!tupleDesc.isMaterialized()) continue;


Line 166: Table table = tupleDesc.getTable();
Remember the tableId here so avoid some hash lookups.

Long tableId = null;
if (table != null && !(table instanceof View)) {
...

}


Line 168:   if (!tableIdMap.containsKey(table)) {
Use this pattern to avoid multiple lookups:

Entry e = map,.get();
if (e == null) {
  e = nextid;
  map.put();
}


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 198:   public TTupleDescriptor toThrift(Long assignedTableId) {
just tableId


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/HBaseTable.java:

Line 665:   public TTableDescriptor toThriftDescriptor(long id, Set 
referencedPartitions) {
tableId


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1567:   public TTableDescriptor toThriftDescriptor(long id, Set 
referencedPartitions) {
tableId


http://gerrit.cloudera.org:8080/#/c/4349/6/fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java:

Line 68:   public TTableDescriptor toThriftDescriptor(long id, Set 
referencedPartitions) {
tableId

(and elsewhere)


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-05 Thread Huaisi Xu (Code Review)
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // The output table of an insert query.
> I still don't understand what this has to do with partition pruning. The pu
Done


http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 34: import java.util.concurrent.atomic.AtomicLong;
remove next patch


Line 141:   private final TableLoadingMgr tableLoadingMgr_;
> Sorry I got confused, you can remove this sentence. I incorrectly thought t
done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 64: 
> The metastoreAccessLock_ is unused. Please remove.
Done


http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 79:   protected final ArrayList colsByPos_ = Lists.newArrayList();
> As discussed, thinking about this a little more I think the right fix is to
kind of done. what do you think


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-05 Thread Huaisi Xu (Code Review)
Huaisi Xu has uploaded a new patch set (#6).

Change subject: IMPALA-1702: Enforce table level consistency accross service
..

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit removes table Id, and keeps a referecedTables_
cache in Analyzer::globalState_ so that calling
Analyzer::getTable() on the same table returns the same
references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/CatalogObjects.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableName.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
28 files changed, 151 insertions(+), 201 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-04 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // List of tables to exclude from partition pruning, and thus all 
partitions are sent to
> The only place that this actually becomes useful is for partition pruning? 
I still don't understand what this has to do with partition pruning. The 
purpose of this list is correctly described in the original comment. It is to 
ship the descriptors of these tables to the BE. 

If we don't do this, then the BE will not know where to write data to in an 
insert because the base and partition paths are part of the table metadata. The 
purpose really is to ship the table metadata to the BE and I don't see 
relevance to partition pruning.

Imo, this should be like you had it before. It should be called targetTable_ or 
insertTable_ or insertTargetTable_ or something like that.


http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 141:* TableId does not exist anymore.
Sorry I got confused, you can remove this sentence. I incorrectly thought the 
@link below referenced the TableId class.


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 64:   private static final Object metastoreAccessLock_ = new Object();
> remove which? ?removed INVALID_TABLE_ID. not sure if this is what you wante
The metastoreAccessLock_ is unused. Please remove.


http://gerrit.cloudera.org:8080/#/c/4349/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 79:   protected final long id_;
As discussed, thinking about this a little more I think the right fix is to get 
rid of this id altogether.

The table id is only used for mapping tuple descriptors to tables in the BE - a 
query local mapping.

So I think the better fix is to remove the table id here and instead assign ids 
on-the-fly (or establish the mapping some other way) in 
DescriptorTbl.toThrift().


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-04 Thread Huaisi Xu (Code Review)
Huaisi Xu has uploaded a new patch set (#5).

Change subject: IMPALA-1702: Enforce table level consistency accross service
..

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this call. Backend could
overwrites each table in an 1:1 table ID to table descriptor
map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
query analyzing. so Analyzer::getTable() could return
different references for the same table when analyzing
the same query.

This commit solves different table having same table ID
issue by changing table ID counter to an ever increasing
64 bits integer. It solves same table with different table
issue ID by keeping a referecedTables_ cache in
Analyzer::globalState_ so that calling Analyzer::getTable()
on the same table returns the same references for the same
query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinTableId.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
26 files changed, 118 insertions(+), 122 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-04 Thread Huaisi Xu (Code Review)
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294: // Catalog cache has the latest tables from the statestore, in 
order to use the
> The Impalad Catalog has ...
Done


Line 295: // same table version in a single query, we cache all referenced 
tables and reuse them
> ... query , we cache all referenced tables here.
Done


Line 2299: if (table == null) {
> remove
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

Line 187:   // SelectStmt (or the BE will be very confused). Assign a 
unique table ID to it.
> The previous longer comment was clearer. How about:
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // List of tables to exclude from partition pruning, and thus all 
partitions are sent to
> Afaik, these tables have nothing to do with partition pruning, so the name 
The only place that this actually becomes useful is for partition pruning? so 
for "INSERT INTO a PARTITION(b='d') select c from a;
if c is in another partition other than partition 'b', then without this 
partition b's metadata wont be sent. it has no other meaningful usage than this 
I think. maybe I just change this to a HashSet?


Line 181:   referencedPartitions = getReferencedPartitions(tbl);
> will remove this
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 141:* Ever increasing counter for table id {@link Table#id_}. New id 
is given when
> TableId doesn't exist anymore.
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 64:   private static final Object metastoreAccessLock_ = new Object();
> Please remove while you are here.
remove which? ?removed INVALID_TABLE_ID. not sure if this is what you wanted.


Line 72:   public static final long LOCAL_TABLE_ID = -4;
> LOCAL_VIEW_ID is more explicit. Sorry to make you change it again.
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/planner/JoinTableId.java
File fe/src/main/java/org/apache/impala/planner/JoinTableId.java:

Line 25:   // Construction only allowed via an IdGenerator.
> newline before comment
Done


http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1126:* Check that we don't have any duplicate table IDs (see 
IMPALA-1702).
> We should declare IMPALA-1702 as fixed and not mention it here. Instead, we
I added something in ImpaladCatalog.java::addTable(). not sure if that is what 
you suggested.k


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-03 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 44:   // Target table in an insert statement.
Is this also the target for UPDATE/DELETE statements?


PS3, Line 45: targetTable_
> The reason for this is that previously I tried to add the cache here and wa
I'm in favor of this change because "referencedTables" was so generic that it 
was misleading. We should prefer to be as specific as possible. If we need more 
abstraction, we can add it later if we need it (but chances are we won't).


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-03 Thread Huaisi Xu (Code Review)
Huaisi Xu has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4349/4/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 181:   referencedPartitions = getReferencedPartitions(tbl);
will remove this


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-10-03 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4349/3//COMMIT_MSG
Commit Message:

PS3, Line 12: If backend sees these,
: it will overwrites each other in an 1:1 table ID to table
: descriptor map.
might need a little rephrase.


PS3, Line 18: activities
operations?


PS3, Line 22:  
..issue.. (same in next statement too.)


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS3, Line 296: HashMap
I think its better to use the parent Map<> class here (that seems to be 
consistent usage elsewhere in the code).


PS3, Line 2282: Returns the Catalog Table object for the given database and 
table name.
  :* Adds the table to this analyzer's "missingTbls_" and 
throws an AnalysisException if
  :* the table has not yet been loaded in the local catalog 
cache.
  :* Throws an AnalysisException if the table or the db does 
not exist in the Catalog.
  :* This function does not register authorization requests and 
does not log access events
Please update this to reflect the new logic.


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

PS3, Line 45: targetTable_
Just curious what the rationale behind this clean up. Is it because we use it 
only for the inserts? I don't know if we intend to use a bunch of 
referencedTables_ in the future for any other usecase in which case this change 
should again be undone. Alex, any idea?


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 34: import java.util.concurrent.atomic.AtomicInteger;
Unused import now that we switched to Long?


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-09-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1702: Enforce table level consistency accross service
..


Patch Set 3:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 294: // Caches all table used in a single query. Use the same 
reference instead of getting
"Caches all tables referenced in a single query"

The comment should explain why we have this local cache. The main motivation is 
to guarantee single table consistency within a query, i.e., that within one 
query the same version of a table is used.

Note that the catalog cache always contains the latest version of a table.


Line 2290: TTableName tblName = new TTableName(dbName, tableName);
I'm not sure if this will deal with case-insensitivity properly. Why not use a 
TableName instead? The equals() and hashCode() definitely do the right thing.


Line 2291: Table table = globalState_.referencedTables_.get(tblName);
How about this:
if (table != null) {
  // Return query-local version of table.
  Preconditions.checkState(table.isLoaded());
  return table;
}


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

Line 141:   protected final AtomicLong nextTableId_ = new AtomicLong(0);
Add comment that explains the use and behavior of this id, i.e. the id is 
ever-increasing and only gets reset when the service is re-started. Mention at 
which points a table gets assigned a new id.


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 65:   public static final long INVALID_TABLE_ID = -1;
Add comment for these subtly different special ids.


Line 67:   public static final long ERROR_METADATA_LOADING_ID = -3;
I think we should add LOCAL_TABLE_ID and TEST_TABLE_ID as well and describe 
their uses.


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/common/Id.java
File fe/src/main/java/org/apache/impala/common/Id.java:

Line 42: return Long.valueOf(id_).hashCode();
revert


Line 47: return Long.toString(id_);
revert


http://gerrit.cloudera.org:8080/#/c/4349/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

Line 1130
I think we should still maintain something like this to check and document our 
new guarantees.

In addition, it would be good to check our invariants in other places if 
possible. For example, when we receive an update for a table from the 
statestore, we should be able to guarantee that the currentTableId <= 
newTableId iff currentCatalogVersion < catalogVersion. Or something along those 
lines, I think you get the idea :)


Line 1134
There is similar code in a few places in the BE. We should remove that, too.


-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1702: Enforce table level consistency accross service

2016-09-30 Thread Huaisi Xu (Code Review)
Huaisi Xu has uploaded a new patch set (#3).

Change subject: IMPALA-1702: Enforce table level consistency accross service
..

IMPALA-1702: Enforce table level consistency accross service

Problems:
1. CatalogServiceCatalog::reset() resets table ID, so same
tables can have different ID and different tables can have
same table ID before/after this. If backend sees these,
it will overwrites each other in an 1:1 table ID to table
descriptor map.

2. 1 can happen because frontend gets table directly
from catalog cache, which is updated concurrently with
other activities, so Analyzer::getTable() can return
different references for the same table when analyzing
the same query.

This commit solves different table having same table ID by
changing table ID counter to an ever increasing 64 bits
integer. It solves same table with different table ID by
keeping a referecedTables_ cache in Analyzer::globalState_
so that calling Analyzer::getTable() on the same table
returns the same references for the same query.

Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
---
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IncompleteTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
D fe/src/main/java/org/apache/impala/catalog/TableId.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
24 files changed, 82 insertions(+), 157 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/4349/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4349
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifad648b72684ae495ec387590ab1bc58ce5b39e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Huaisi Xu 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Tim Armstrong