[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 20:

There is still one thing that is not clear to me. Why is it allowed to do an 
ALTER TABLE with an empty SORT BY clause but not a CREATE TABLE?

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS18, Line 183: # CTAS with partitions and sort by columns
> I'm not sure I'm following here. sort by() cannot have an empty column list
It's a parser error in create but allowed in alter (see ParserTest.java:2271). 
So, there is some inconsistency here.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6495/17//COMMIT_MSG
Commit Message:

PS17, Line 30: SORT BY columns supersede the sortby() hint, which we will 
remove in a
 : subsequent change (IMPALA-5144).
Mention what happens if both are used (e.g. union the columns or something else)


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS17, Line 152: Returns a list of positions of
  :* the sort by columns within the table's list of columns
This function doesn't return anything.


PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether.
  : Preconditions.checkState(!(table instanceof HBaseTable));
This check doesn't belong here. Maybe move it to the analyze() function?


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

PS17, Line 61: HashMap
nit: Map


PS17, Line 87: a
nit: an


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS17, Line 138:  
Mention what happens if both are used (i.e. sort.by.column and sortby() hint).


http://gerrit.cloudera.org:8080/#/c/6495/17/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS17, Line 1: #
Add one or more tests where the sortby hint is used in conjunction with sort by 
clause, if you don't already have one.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

I think the interaction between the hint and clause confused me at some point. 
I'll take another look at the new patch and see if removing the hint will make 
things simpler.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 15:

Quick question. Is there a reason why we don't remove the sortby hint in this 
patch?

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-25 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..


Patch Set 7: Code-Review+2

Rebase and minor test fix. Keep Dan's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-25 Thread Dimitris Tsirogiannis (Code Review)
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6628

to look at the new patch set (#7).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
M www/root.tmpl
17 files changed, 432 insertions(+), 307 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5152: Gather all tables with missing metadata in analysis

2017-04-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5152: Gather all tables with missing metadata in analysis
..


Patch Set 1:

As we discussed offline, see if it makes sense to have a quick pre-analysis 
pass that collects all the base tables that need to load metadata for.

-- 
To view, visit http://gerrit.cloudera.org:8080/6702
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id10a303958557982b26a07c97a9bb30932ab44dd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#6).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
M www/root.tmpl
17 files changed, 433 insertions(+), 307 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6628/5/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS5, Line 279:   /// A map from unique backend ID to the corresponding 
TBackendDescriptor of that backend.
 :   /// Used to track membership updates from the statestore so 
queries can be cancelled
 :   /// when a backend is removed. It's not enough to just cancel 
fragments that are running
 :   /// based on the deletions mentioned in the most recent 
statestore heartbeat; sometimes
 :   /// cancellations are skipped and the statestore, at its 
discretion, may send only
 :   /// a delta of the current membership so we need to compute 
any deletions.
 :   /// TODO: Currently there are multiple locations where cluster 
membership is tracked,
 :   /// here and in the scheduler. This should be consolidated so 
there is a single component
 :   /// (the scheduler?) that tracks this information and calls 
other interested components.
> I think this comment is about the known_backends_ field, no?
Oops, yes.


PS5, Line 290: const BackendDescriptorMap& GetKnownBackends();
> why does this need to be exposed? where is it used?
This is used in ImpalaHttpHandler::BackendsHandler().


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#5).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
M www/root.tmpl
17 files changed, 433 insertions(+), 308 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-19 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 202: // If this is not an executor, don't add it to the new backend 
config.
> this comment just repeats what the code is doing, but doesn't explain why. 
Done


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

PS4, Line 269: backend_config_
> as mentioned earlier, this would be easier to follow if this were renamed t
Done


PS4, Line 283:   /// Map from unique backend id to TBackendDescriptor. Used to 
track the known executors
 :   /// from the statestore. It's important to track both the 
backend ID as well as the
 :   /// TBackendDescriptor so we know what is being removed in a 
given update.
> this comment was very unhelpful at understanding the code because it doesn'
Done


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 388: void ImpalaServer::BackendsUrlCallback(
> I think this should go in impala-http-handler.cc
Done


http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/service/impala-server.h
File be/src/service/impala-server.h:

PS4, Line 284: GetServerRole
> Why deal with the presentation (i.e. string formatting) here, rather than l
Done


http://gerrit.cloudera.org:8080/#/c/6628/4/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS4, Line 41: only_coordinate
> when i read this name I thought that it would start a cluster of only coord
Done


PS4, Line 42: \
> don't think you need these inside ( )
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-18 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:   current_membership_.insert(make_pair(item.key, be_desc));
> What else would we want to rename backend->executor besides backend_config_
There is a number of inner classes in this file (e.g. AddressableAssignment, 
BackendAssignmentInfo, etc) that use the term backend in fields and functions. 
I believe these should be renamed as well. I just want to make sure we agree on 
the scope of this change before I start modifying this file even more.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover 
partition calls.
..


Patch Set 3: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6651/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS3, Line 838: block
file


PS3, Line 1530: block
file


PS3, Line 1529: code
  :* path
function


PS3, Line 1530: , like
  :* adding a new file to an existing partition
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/6651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover 
partition calls.
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6651/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 839: already
remove


PS2, Line 842: loadPartitionBlockMdFromScratc
Refreshing an existing partition calls the refreshFileMetadata() while loading 
a partition from scratch calls loadPartitionBlockMdFromScratch(). Both 
functions load file/block metadata, so maybe we should name the latter 
loadFileMetadataFromScratch()?


-- 
To view, visit http://gerrit.cloudera.org:8080/6651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#4).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
14 files changed, 190 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:   current_membership_.insert(make_pair(item.key, be_desc));
> I see, so you're saying that the "impala-membership" topic does contain all
Essentially the scheduler only manages/uses backends that are executors. 
However,  the term 'backend' is embedded in so many places that we may not want 
to do the rename it in this patch. I did the rename you suggested and added a 
brief comment in StatestoreService.thrift. Let me know if you'd like me to do 
the backend->executor rename now.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 15:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS15, Line 168: commans
typo: commas


PS15, Line 169: of
remove


PS15, Line 170: this will throw an AnalysisException
nit: this function


PS15, Line 172: List
You need to comment on the return value.


PS15, Line 174: if 
(!tblProperties.containsKey(AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS))
 {
long line


PS15, Line 178: if (table instanceof KuduTable) {
Whenever I see this, I keep wondering about HBase. Can you add a precondition 
check or a similar check? whatever is applicable.


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

PS15, Line 44: public static final String TBL_PROP_SORT_BY_COLUMNS = 
"sort.by.columns";
Maybe it makes more sense to put this in AlterTableSetTblProperiesStmt. Not a 
strong preference though.


PS15, Line 93: if (table instanceof HdfsTable) {
What else could it be? You can just do Preconditions.checkState(table 
instanceof HdfsTable); and then cast and remove the if statement.


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS15, Line 768: if (table_ instanceof HBaseTable) return;
  : if (table_ instanceof KuduTable) return;
Why not if (!(table_ instanceof HdfsTable)) return; instead?


PS15, Line 843: if (!columns.isEmpty()) {
  : // We need to make a copy to make the sortByColumns_ 
list mutable.
  : // TODO: Remove this when removing the sortby() hint 
(IMPALA-5157).
  : sortByColumns_ = Lists.newArrayList(sortByColumns_);
  :   }
  :   for (int i = 0; i < columns.size(); ++i) {
  : if (columns.get(i).getName().equals(columnName)) {
  :   sortByExprs_.add(resultExprs_.get(i));
  :   sortByColumns_.add(i);
  :   foundColumn = true;
  :   break;
  : }
  :   }
Not sure I follow what is happening here. If you already have sortByColumns_ do 
you expand them based on the hint or something else? Do we have tests for that?


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS15, Line 284: Iterable
I like using interfaces but is this ever called with anything other than a List?


PS15, Line 350: if (options_.sortByCols != null) {
if (options_.sortByCols == null) return;


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

PS15, Line 77:  
nit: extra space


PS15, Line 297: if (sortByColsSql != null) {
  :   sb.append(String.format("SORT BY (\n  %s\n)\n",
  :   Joiner.on(", \n  ").join(sortByColsSql)));
  : }
haha, I can't visualize the output here.


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS15, Line 456: he/she adds
nit: added


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS15, Line 1881: msTbl.getParameters().put(sortByKey, 
MetaStoreUtil.intersectCsvListWithColumNames(
   : msTbl.getParameters().get( sortByKey), columns));
I find it hard to parse this. Can you split it into two statements? Also, extra 
space after 'get('.


PS15, Line 1915: msTbl.getParameters().put(sortByKey, 
MetaStoreUtil.replaceValueInCsvList(
   :   msTbl.getParameters().get( sortByKey), colName, 
newCol.getColumnName()));
same here


PS15, Line 2198: msTbl.getParameters().put(sortByKey, 
MetaStoreUtil.removeValueFromCsvList(
   :   msTbl.getParameters().get( sortByKey), colName));
and here


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

PS15, Line 218: HashSet rightColNames = Sets.newHashSet();
  : for (TColumn c : rightCols) 
rightColNames.add(c.getColumnName().toLowerCase());
  : List outputList = 

[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 207:   current_membership_.insert(make_pair(item.key, be_desc));
> They aren't yet doing it, but the proposal is for them to hit the statestor
Thanks for the clarification. This patch modified the /backends webpage to 
include all the subscribed backends and added the executor/coordinator flags. 
So, Hue could get that webpage out of any backend and retrieve the 
executor/coordinator information. I attach a screenshot so that you'll see what 
I mean (https://cloudera.box.com/s/rvy496vfk812esl3d17677i0m4wzgdmp). The 
current_membership_ doesn't affect that at all. Let me know if that is 
acceptable or if you prefer the statestore webpage to expose that information 
as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
13 files changed, 181 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.

2017-04-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4943: Speed up block md loading for add/recover 
partition calls.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6651/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS1, Line 919: loadPartitionBlockMdFromScratch
I am not sure this is correct. createAndLoadPartition() is called during 
partition refresh. Why do we want to load metadata from scratch in this case? 
Are you sure we don't regress this case?


-- 
To view, visit http://gerrit.cloudera.org:8080/6651
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/network-util.cc
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
13 files changed, 176 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

Line 61: int Cluster::AddHost(bool has_backend, bool has_datanode, bool 
is_executor) {
> Can you DCHECK(!is_executor || has_backend)?
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

Line 56: /// ranges.
> nit: single line
Done


Line 59:   cluster.AddHost(true, true, false);
> This looks like it can get hard to parse rather quickly - I think even havi
Agree, this isn't pretty. Using a builder may not work well with AddHost(). The 
latter is not merely a factory for hosts, it also populates several Cluster 
fields, like list of backends, etc. I'll leave a TODO for now and give it some 
thought later.


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 32: #include "rapidjson/rapidjson.h"
> still needed?
Done


Line 44: using namespace rapidjson;
> still needed?
Done


Line 204: // If this is not an executor don't add it to the new backend 
config.
> nit: punctuation
You want me to remove the punctuation?


Line 205: if (be_desc.is_executor) {
> Do you need to check __is_set?
Made the fields required, so I guess it's not needed now.


Line 207:   current_membership_.insert(make_pair(item.key, be_desc));
> i thought the plan was to continue to have all nodes of the cluster still b
I am a little confused with the last part of your comment: "That way, e.g. 
clients can still use the membership topic to discover coordinators." I 
couldn't find how that works exactly, can you plz elaborate or give some 
pointers?


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 2000:   vector modes;
> Building this with a stringstream may be easier and would also not need ano
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 272:   rapidjson::Document* document);
> you could forward declare the class and move the include to the .cc file.
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/util/webserver.cc
File be/src/util/webserver.cc:

Line 199:   if (env->impala_server()->IsCoordinator()) 
modes.push_back("Coordinator");
> Same here, maybe use stringstream?
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS1, Line 216:  
> missing word?
Done


Line 254: else:
> elif?
Done


http://gerrit.cloudera.org:8080/#/c/6628/1/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

Line 65:   // True if this is a coordinator node
> What does it mean if these are unset?
I added those as optional because they aren't always set (e.g. during scheduler 
tests). I think it may be better to make them required. What do you think?


http://gerrit.cloudera.org:8080/#/c/6628/1/tests/custom_cluster/test_coordinators.py
File tests/custom_cluster/test_coordinators.py:

Line 94:   operators are executed on 'expected_num_of_executors' 
nodes."""
> nit: indendation
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3040: Fix test caching ddl test

2017-04-13 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3040: Fix test_caching_ddl test
..


Patch Set 2:

(5 comments)

Some comments/questions. No new patch until we answer these first.

http://gerrit.cloudera.org:8080/#/c/6603/2/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

Line 303:   way cache requests are added/dropped/reported (see 
IMPALA-3040), it tries to return
> weird indentation
Done


Line 304:   a stable result by making several attempts to stabilize that 
within a reasonable
> that -> it
Done


Line 311:   max_num_stabilization_attempts = 3
> This is a little strange to me. If the num_requests keep changing, then int
So, you want to exit when no change is detected? I think you'll exit 
prematurely with this code.


Line 311:   max_num_stabilization_attempts = 3
> Actually having a 10s timeout still seems like a good idea
What are the semantics of the 10s timeout? Essentially, do you want an upper 
bound on the number of tries (10/2 = 5 max tries) or something else?


Line 324: num_stabilization_attempts = num_stabilization_attempts + 1
> We hit the max attempts once we detect a change, but then we exit the loop.
I am not sure. Shouldn't we put an upper bound here?


-- 
To view, visit http://gerrit.cloudera.org:8080/6603
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ec4ba5dfae6e90a2bb76e22c93909b05bd78fa4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution

2017-04-13 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6628

Change subject: IMPALA-5147: Add the ability to exclude hosts from query 
execution
..

IMPALA-5147: Add the ability to exclude hosts from query execution

This commit introduces a new startup option, termed 'is_executor',
that determines whether an impalad process can execute query fragments.
The 'is_executor' option determines if a specific host will be included
in the scheduler's backend configuration and hence included in
scheduling decisions.

Testing:
- Added a customer cluster test.
- Added a new scheduler test.

Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M common/thrift/StatestoreService.thrift
M tests/common/custom_cluster_test_suite.py
M tests/custom_cluster/test_coordinators.py
M www/backends.tmpl
12 files changed, 168 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6628
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-3040: Fix test caching ddl test

2017-04-10 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6603

Change subject: IMPALA-3040: Fix test_caching_ddl test
..

IMPALA-3040: Fix test_caching_ddl test

This commmit adds a 30sec timeout on the validation step of
test_caching_ddl test. This test has been flaky and we suspect a race
between the submission of a cache directive removal and the reported
cached directives from the 'hdfs cacheadmin' utility command.

Change-Id: I3ec4ba5dfae6e90a2bb76e22c93909b05bd78fa4
---
M tests/query_test/test_hdfs_caching.py
1 file changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/6603/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6603
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3ec4ba5dfae6e90a2bb76e22c93909b05bd78fa4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables

2017-04-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
..


Patch Set 2:

(6 comments)

Quick pass with some minor comments. I am still trying to wrap my head around 
the concept of KuduPartitionExpr...

http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

PS1, Line 62: int col, PrimitiveType type
Comment on these params.


http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS1, Line 1845: f (fragment.__isset.output_sink && 
fragment.output_sink.__isset.stream_sink
  :   && fragment.output_sink.type == 
TDataSinkType::DATA_STREAM_SINK
  :   && fragment.output_sink.stream_sink.output_partition.type 
== TPartitionType::KUDU)
Add a brief comment.


http://gerrit.cloudera.org:8080/#/c/6559/1/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

PS1, Line 134: 4
2?


PS1, Line 137: 5
3? :)


http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS1, Line 113: // Set in analyze(). Maps exprs in partitionKeyExprs_ to their 
column's position in the
 :   // table, eg. partitionKeyExprs_[i] = 
table_.columns(partitionKeyIdx_[i])
 :   private List partitionKeyIdxs_ = Lists.newArrayList();
Just for my understanding, can you explain why this is needed for this change?


http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

PS1, Line 48: // For kudu partition: expr that calls into the kudu client to 
get the partition number.
:   private Expr partitionExpr_;
: 
:   // Should be called only by the static factory method for Kudu 
partitioned tables.
:   private DataPartition(TPartitionType type, Expr partitionExpr) {
: Preconditions.checkNotNull(partitionExpr);
: Preconditions.checkState(type == TPartitionType.KUDU);
: type_ = type;
: partitionExpr_ = partitionExpr;
: partitionExprs_ = Lists.newArrayList();
:   }
I am not sure I get this. Why not passing a single element list of the case of 
Kudu partitioned?


-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9: Code-Review+1

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS9, Line 376: if (!isHBaseTable) 
Instead of checking this here, it's a bit cleaner to alway call 
analyzeSortByColumns() and move the check there.


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS9, Line 49: Lists.newArrayList()
The caller is not supposed to modify the return values, correct? You may just 
return Collections.emptyList() which is also immutable.


PS9, Line 77: Lists.newArrayList();
Same comment as above.


PS9, Line 89: Lists.newArrayList();
You can still construct and return an ImmutableList. See Guava's 
ImmutableList.builder()


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS9, Line 2374: ParserError("CREATE TABLE Foo LIKE Baz STORED AS TEXTFILE 
LOCATION '/a/b' " +
  : "SORT BY(bar)");
You may want to comment why this results in a parsing error.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 7:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS7, Line 984: Set
There is no KW_SET, right? Update the name?


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS7, Line 73: getTargetTable()
t?


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 2484:* Add a warning that will be displayed to the user. Ignores null 
messages.
Comment that no warning can be issued if warnings have been retrieved?


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

PS7, Line 119: sortByColumns_.
I think this can throw a NPE. Looking at the parser I see that 
CreateTableListStmt can pass null for sortByColumns_ and couldn't find where 
this changes.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS7, Line 296: propertyString
inline


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS7, Line 41: /**
:* Analyzes the 'sort.by.columns' property of 'table'.
:*/
:   public static List analyzeSortByColumns(Table table) 
throws AnalysisException {
: return TablePropertyAnalyzer.analyzeSortByColumns(
: table.getMetaStoreTable().getParameters(), table);
:   }
: 
:   /**
:* Analyzes the 'sort.by.columns' property in 'tblProperties' 
against the columns of
:* 'table'.
:*/
:   public static List analyzeSortByColumns(Map tblProperties,
:   Table table) throws AnalysisException {
: return TablePropertyAnalyzer.analyzeSortByColumns(
: getSortByColumnsFromProperties(tblProperties), table);
:   }
These are used in only one place, right? Maybe inline there?


Line 76: }
Precondition check for HBase?


PS7, Line 127: private static List getSortByColumnsFromProperties(
 :   Map tblProperties) {
You may want to consider if it's better to have a getSortByColumn(Table table) 
function instead of this one. Just a though...


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS7, Line 539: // If the table has a 'sort.by.columns' property and the query 
has a 'noclustered'
 : // hint, we issue a warning during analysis and ignore the 
'noclustered' hint.
That comment doesn't seem relevant here. Remove?


PS7, Line 541: !insertStmt.hasNoClusteredHint()
If insertStmt.hasClusteredHint() is true doesn't that imply that 
hasNoClusteredHint() is false? I thought they are mutually exclusive.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS7, Line 1793: params.sort_by_columns.size() > 0
use isEmpty()


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 526: 
Test for HBase and Kudu?


Line 1054:   }
Same here (HBase + Kudu). You need to put all Kudu tests in a function and 
check for supported.


PS7, Line 1904: most
"must" typo


PS7, Line 1920: // Kudu table tests are in TestCreateManagedKuduTable().
No need for that comment :)


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS7, Line 2347: // Sort by clause
How about some tests with additional table options? Try changing the order in 
which the table options are specified, e.g. location bla sort by ()


http://gerrit.cloudera.org:8080/#/c/6495/7/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS7, Line 207: table
select query


http://gerrit.cloudera.org:8080/#/c/6495/7/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 5:

(5 comments)

Responses to some comments.

http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns 
he/she adds to the
 :* table.
> We could throw one if the number is wrong, but eventually the caller will h
No that's fine. You already mention that this is used only for tests.


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
> Insert hints are not allowed for HBase tables and the analysis will fail. F
If we enforce it at the analysis phase, then adding a Preconditions check here 
is not a bad idea.


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS5, Line 10: ASC NULLS LAST
> No, we cannot change it. I thought about this, but couldn't see why we'd wa
I was thinking more of changing the ASC into DESC but that's fine if this is an 
acceptable limitation.


PS5, Line 39:  DISTRIBUTEDPLAN
: WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, 
PARTITION-KEYS=(year,month)]
: |  partitions=24
: |
: 01:SORT
: |  order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col 
ASC NULLS LAST, bool_col ASC NULLS LAST
: |
: 00:SCAN HDFS [functional.alltypes]
:partitions=24/24 files=24 size=478.45KB
> They make sure that the shuffle/noshuffle hint has been observed, indicated
Good point. I think we can leave them.


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS5, Line 1066:  
> They are part of the output of "describe formatted". Without them the compa
It's probably because of the HMS schema used (maybe they use CHAR(N)). Anyways, 
no need to do something. I just wanted to make sure we don't do anything funky.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 5:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 383:   // any such columns of the source table.
Otherwise? Will the destination table inherit the sort by columns of the source 
table, if any?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS5, Line 64: // Analyze 'sort.by.columns' property.
Comment not applicable. Remove?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS5, Line 127: hasNoClusteredHint_
Similar to the comment on hasShuffleHint, explain the allowed values for 
hasClusteredHint and hasNotClusteredHint.


PS5, Line 763:  
"if any"


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS5, Line 77: return ImmutableList.of();
This is an immutable list whereas the next function returns a mutable one. Make 
it consistent?


PS5, Line 83: If there are any errors, 'error' contains an
:* error string. Otherwise, 'error' is left untouched.
remove


PS5, Line 91: colName
nit: rename to sortByColName? since you also have tableCols it may be good to 
make it explicit which ones you refer to in the for loop.


Line 92: 
nit: remove empty line


PS5, Line 102: and store
 :   // the corresponding result expr in sortByExprs_.
Not sure I follow this part based on the code below.


Line 120: 
nit: remove empty line


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns 
he/she adds to the
 :* table.
Can't we throw an exception instead?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
I assume this works for HBase tables...


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

PS5, Line 521: 'sort.by.columns'='id
How about 'ID' (check for case)?


PS5, Line 1642: AnalyzesOk("create table tbl sort by (int_col,id) like 
functional.alltypes");
Also, add the negative case where sort by cols don't exist in the source table.


PS5, Line 1913: // Kudu table
  : AnalyzesOk("create table tab (i int, x int primary key) 
partition by hash(x) " +
  : "partitions 8 sort by(i) stored as kudu");
  : // Column in sortby hint must not be a Kudu primary key 
column.
  : AnalysisError("create table tab (i int, x int primary key) 
partition by hash(x) " +
  : "partitions 8 sort by(x) stored as kudu", "SORT BY 
column list must not " +
  : "contain primary key column: 'x'");
  : AnalysisError("create table tab (i int, x int, y int, 
primary key (x)) partition " +
  : "by hash(x) partitions 8 sort by(x) stored as kudu", 
"SORT BY column list must " +
  : "not contain primary key column: 'x'");
  : AnalysisError("create table tab (i int, x int, y int, 
primary key(x, y)) partition " +
  : "by hash(y) partitions 8 sort by (i,x) stored as kudu", 
"SORT BY column list " +
  : "must not contain primary key column: 'x'");
  :   }
For Kudu specific tests they need to be in a function that first calls 
TestUtils.assumeKuduIsSupported() (see L1930). Do you have tests for HBase?


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS5, Line 162: create table t sort by
Add one with a partitioned table? Also, a CTAS with a more complex select 
(joins, maybe aggs)


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS5, Line 10: ASC NULLS LAST
Out of curiosity? Can we change the default behavior? If no, are we ok with 
this restriction?


PS5, Line 39:  DISTRIBUTEDPLAN
: WRITE TO HDFS 

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts,
: so the SORT node will contain all partitioning columns first, 
followed
: by the SORT BY columns.
> in general, the commit message shouldn't act as documentation. for things l
That's fine by me. Feel free to simplify the description here and move it to 
the JIRA. I just wanted to understand the implemented semantics instead of 
trying to infer them from the code.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(1 comment)

Lars, a few high level questions before I start the next review iteration. 
Thanks

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts,
: so the SORT node will contain all partitioning columns first, 
followed
: by the SORT BY columns.
Can you expand the comment a little bit on the semantics of this? A few high 
level questions are:
1. What is the connection between this and the sortby() hints? Can I use both? 
What if there is a conflict? Same for clustered hint.
2. Why does sort by enables clustering as well? 
3. Can I specify any table columns (even the partitioning columns) in the SORT 
BY clause or are there some restrictions? 
4. I suppose this works for all table types (including Kudu)?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 2:

(20 comments)

Initial pass, haven't looked at the tests yet.

http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS2, Line 986: // SORT BY columns are stored in the 'sort.by.columns' table 
property.
 : HashMap properties = new HashMap();
 : String columnString = Joiner.on(",").join(col_names);
 : 
properties.put(TablePropertyAnalyzer.TBL_PROP_SORT_BY_COLUMNS, columnString);
 : RESULT = new AlterTableSetTblProperties(table, null, 
TTablePropertyType.TBL_PROPERTY,
 : properties);
 :   :}
I don't see why you need to treat it as another SetTblProperties statement. I 
understand that eventually the sort by columns will be stored as table 
properties but this adds unnecessary complexity to the parser. I would simply 
create a new AlterTableSetSortByColumns statement.


PS2, Line 1143: opt_sort_by_cols:sort_by_cols
  :   KW_LIKE table_name:other_table
  :   opt_comment_val:comment
  :   file_format_create_table_val:file_format location_val:location
nit: indentation


PS2, Line 1182:   opt_sort_by_cols:sort_by_cols opt_comment_val:comment 
row_format_val:row_format serde_properties:serde_props
nit: long line


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS2, Line 162: /**
 :* Analyzes the 'sort.by.columns' property to make sure the 
columns inside the property
 :* occur in the target table and are neither partitioning nor 
primary key columns.
 :*/
 :   public void analyzeSortByColumns() throws AnalysisException {
 : StringBuilder error = new StringBuilder();
 : TablePropertyAnalyzer.analyzeSortByColumns(tblProperties_, 
getTargetTable(), error);
 : if (error.length() > 0) throw new 
AnalysisException(error.toString());
 :   }
Along the lines of my previous comment: a) I would move this completely out of 
this class, b) create a new AlterTableSetSortByColumns and move the analysis 
logic there.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS2, Line 275: Boolean
boolean? why class?


PS2, Line 2486: Preconditions.checkState(!globalState_.warningsRetrieved);
This is kind of harsh :). Do we guarantee somewhere that no one will call 
addWarning after warnings have been retrieved?


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

PS2, Line 119: sortByColumns_.size() > 0
!sortByColumns_.isEmpty()


PS2, Line 179: StringBuilder error = new StringBuilder();
 :   TablePropertyAnalyzer.analyzeSortByColumns(sortByColumns_, 
srcTable, error);
 :   if (error.length() > 0) throw new 
AnalysisException(error.toString());
Why don't you just throw the AnalysisException from the analyzeSortByColumns()?


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS2, Line 382:  sortByExprs_.size() > 0
!sortByExprs_.isEmpty()


PS2, Line 510: sortByExprs_.add(resultExprs_.get(colIdx));
Does this work ok if you have column permutation?


PS2, Line 768: StringBuilder error = new StringBuilder();
 : sortByColumns_ = 
TablePropertyAnalyzer.analyzeSortByColumns(table_, error);
 : if (error.length() > 0) throw new 
AnalysisException(error.toString());
Same comment as before. Move the throw clause inside the analyzeSortByColumns() 
function.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS2, Line 33: /**
:  * This class collects various methods to analyze table 
properties.
:  */
: public class TablePropertyAnalyzer {
You're only analyzing the sort by columns, so why this generic class? Can't we 
move these methods to the TableDef class?


PS2, Line 41: Analyzes the 'sort.by.columns' property of 'table'.
What is the return value?


PS2, Line 86: List partitionCols, List primaryKeyCols
You can simply pass one list which includes the list of column names to exclude 
from the sort columns. If you want a specific error message you can 

[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 6: Code-Review+2

Keep Henry's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-28 Thread Dimitris Tsirogiannis (Code Review)
Hello Marcel Kornacker, Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6344

to look at the new patch set (#6).

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..

IMPALA-4041: Limit catalog and admission control updates to coordinators

With this commit we add the ability to limit catalog updates to a
limited set of coordinator nodes. A new startup option, termed
'is_coordinator' is added to indicate if a node is a coordinator.
Coordinators accept connections through HS2 and Beeswax interfaces
and can also participate in query execution. Non-coordinator nodes
do not receive catalog updates from the statestore, do not initialize
a query scheduler and cannot accept Beeswax and HS2 client connections.

Testing:
- Added a custom cluster test that launches a cluster in which the
number of coordinators is less than the cluster size and runs a number
of smoke queries.
- Successfully run exhaustive tests.

Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/service/query-exec-state.cc
M be/src/testutil/in-process-servers.h
M be/src/util/webserver.cc
M bin/start-impala-cluster.py
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_service.py
A tests/custom_cluster/test_coordinators.py
M www/root.tmpl
19 files changed, 435 insertions(+), 235 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 5:

(6 comments)

Thanks Henry!

http://gerrit.cloudera.org:8080/#/c/6344/5/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS5, Line 180: node can coordinate queries.
> maybe tighten this up:
Done


http://gerrit.cloudera.org:8080/#/c/6344/5/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 949: 
> comment?
Done


http://gerrit.cloudera.org:8080/#/c/6344/5/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

PS5, Line 461: if (!FLAGS_disable_admission_control) {
> again, it seems better to check to see if admission_controller() != nullptr
Done


http://gerrit.cloudera.org:8080/#/c/6344/5/be/src/util/webserver.cc
File be/src/util/webserver.cc:

PS5, Line 192: FLAGS_is_coordinator
> it would be neater to do something like:
Done


PS5, Line 192: Worker
> nit: dunno how much we want to bikeshed this, but I prefer "Executor" to "W
Renamed "worker" to "executor". Also, "coordinator" now is "coordinator + 
executor".


http://gerrit.cloudera.org:8080/#/c/6344/5/www/root.tmpl
File www/root.tmpl:

Line 21:   Impala Server Mode
> This template gets used for catalog and statestored as well. You need to ch
Good point, forgot about it. Fixed it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 648 insertions(+), 304 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS2, Line 64: message(${JAVA_FE_ARGS})
> Looks like debug printing. Remove? (below too)
Why?


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

PS2, Line 67: storageIdtoInt
> (Unrelated to your change). Looks obsolete.
Done


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 119:   FlatBufferBuilder fbb = new FlatBufferBuilder(1);
> Preconditions.checkState(isBlockMdSupported(fs))?
Done


PS2, Line 124: } else {
 : locations = fileSystem.getFileBlockLocations(fileStatus, 
0, fileStatus.getLen());
 :   }
> Do we really need this? Per my understanding, we only fetch LocatedFileStat
If fileStatus is not an instance of LocatedFileStatus we don't have the block 
locations. E.g. during refresh.


Line 141: ListMap hostIndex) {
> Preconditions.checkState(!isBlockMdSupported(fs))
We don't have the filesystem here.


PS2, Line 425: fbFileBlock_.replicaHostIdxs(replicaIdx);
> - replicaHostIdxs is [ushort] as per fbs definition in which case idx shoul
In Java there is no unsigned short. Hence, ushort is stored as in int in 
flatbuffer java-generated code. I like the two masks as it makes the intention 
more explicit.


Line 463:   public static class FileDescStats {
> For supportability purposes, I'm wondering if we should track the remote_ne
That's a good point. I plan on augmenting this in a separate patch along with 
other useful stats.


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 344: =
> nit: spacing
Done


Line 365:   private static String findCommonPrefix(String a, String b) {
> Use Strings.commonPrefix() from guava?
Good point. Didn't know this existed.


Line 1933:   Collections.sort(orderedFds, new Comparator() {
> Can we just make FileDescriptor implement Comparator and ad
How are we going to call decompressFileName that references a structure (file 
name prefixes) in HdfsTable if we do that?


http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 93:   protected boolean storedInImpaladCatalogCache_ = false;
> What is the motivation to track this? A quick look at the places it is used
To avoid initializing data structures that aren't used in the catalog. Tests 
also need to access this because of the different way we have for loading 
metadata.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-24 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 4:

(10 comments)

I am not so sure about removing the catalog and in-flight queries handles. They 
provide a way of visually validating that worker nodes do not receive catalog 
updates and that they don't schedule queries.

http://gerrit.cloudera.org:8080/#/c/6344/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

Line 259
> This line is effectively removed, right? It was added in: https://gerrit.cl
After inspecting the code and also talking to Lars, I believe this removal is 
safe. This change has the following effect. The scheduler has to wait for the 
statestore update that includes the local backend in order to add it to the 
backend config. Before that, the backend_config could be empty and scheduling 
should fail unless exec_at_coord is true. We have tests for these cases.


http://gerrit.cloudera.org:8080/#/c/6344/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1588: LOG(WARNING) << "Failed to convert hostname " << hostname << " 
to IP address";
> log status?
Done


PS4, Line 1601: address
> nit: descriptor
Done


PS4, Line 1604: }
  :   if (status.ok()) {
> else { 
Done


PS4, Line 1909: NULL
> nullptr
Done


PS4, Line 1931: NULL
> nullptr
Done


http://gerrit.cloudera.org:8080/#/c/6344/4/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 436:   // it to the list of known backends and tell the statestore by 
adding it to
> this function isn't telling the statestore anything, it's only adding to to
Done


PS4, Line 438: In
> nit: should be To, not In
Done


http://gerrit.cloudera.org:8080/#/c/6344/4/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

PS4, Line 40: 3
> is there any way to make this default == the value of '-s'? If I run -s1 ri
Hm, not sure how to do that. I removed the check.


http://gerrit.cloudera.org:8080/#/c/6344/4/tests/custom_cluster/test_coordinators.py
File tests/custom_cluster/test_coordinators.py:

Line 83:   client2.close()
> can you add a check that the worker has not received the catalog metadata? 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#4).

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..

IMPALA-4041: Limit catalog and admission control updates to coordinators

With this commit we add the ability to limit catalog updates to a
limited set of coordinator nodes. A new startup option, termed
'is_coordinator' is added to indicate if a node is a coordinator.
Coordinators accept connections through HS2 and Beeswax interfaces
and can also participate in query execution. Non-coordinator nodes
do not receive catalog updates from the statestore, do not initialize
a query scheduler and cannot accept Beeswax and HS2 client connections.

Testing:
- Added a custom cluster test that launches a cluster in which the
number of coordinators is less than the cluster size and runs a number
of smoke queries.
- Successfully run exhaustive tests.

Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/service/query-exec-state.cc
M be/src/testutil/in-process-servers.h
M bin/start-impala-cluster.py
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_service.py
A tests/custom_cluster/test_coordinators.py
16 files changed, 365 insertions(+), 198 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-22 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 3:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

PS3, Line 180: if (FLAGS_disable_admission_control) LOG(INFO) << "Admission 
control is disabled.";
 : if (!FLAGS_disable_admission_control) {
> if (!FLAGS_disable_admission_control) {
Done


http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

Line 200:   /// Registers the request queue topic.
> ... with the statestore.
Done


http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

Line 457:   return 
scheduler_->ComputeScanRangeAssignment(*scheduler_->GetBackendConfig(), 0, 
nullptr,
> long line
Done


http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 336:  std::vector* topic_updates) {
> remove std::
Done


Line 343:   std::vector* topic_updates) {
> remove std::
Done


PS3, Line 1489: Otherwise, this operation
  : // is performed by the Scheduler.
> Can you just ensure that this is always done by this callback, and remove t
Done


Line 1579:   const std::string local_backend_id = exec_env_->subscriber()->id();
> remove std::
Done


Line 1589: LOG(WARNING) << "Failed to convert hostname " << hostname << " 
to IP address";
> but keep going?
Done


Line 1592:   subscriber_topic_updates->push_back(TTopicDelta());
> emplace_back()
Done


Line 1595:   update.topic_entries.push_back(TTopicItem());
> emplace_back()
Done


http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 436:   // backends. If not, add it and tell the statestore. Even though 
the same
> 'and tell the statestore'? doesn't something else do that part?
Done


http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/testutil/in-process-servers.h
File be/src/testutil/in-process-servers.h:

PS3, Line 95: Not owned by this class;
:   /// instead it's owned via shared_ptrs
> Ownership is shared now.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..

IMPALA-4041: Limit catalog and admission control updates to coordinators

With this commit we add the ability to limit catalog updates to a
limited set of coordinator nodes. A new startup option, termed
'is_coordinator' is added to indicate if a node is a coordinator.
Coordinators accept connections through HS2 and Beeswax interfaces
and can also participate in query execution. Non-coordinator nodes
do not receive catalog updates from the statestore, do not initialize
a query scheduler and cannot accept Beeswax and HS2 client connections.

Testing:
- Added a custom cluster test that launches a cluster in which the
number of coordinators is less than the cluster size and runs a number
of smoke queries.
- Successfully run exhaustive tests.

Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/service/query-exec-state.cc
M be/src/testutil/in-process-servers.h
M bin/start-impala-cluster.py
M tests/common/custom_cluster_test_suite.py
M tests/common/impala_service.py
A tests/custom_cluster/test_coordinators.py
16 files changed, 363 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-21 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 2:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

Line 183:   request_pool_service_.get(), metrics_.get(), 
backend_address_));
> no need to pass in backend_address_, you can get that with ExecEnv::GetInst
Hm, I am not sure this is possible. This is the ExecEnv constructor, is it safe 
to call ExecEnv::GetInstance() when AdmissionController and Scheduler are 
initialized here? Maybe I am missing something.


Line 228:   if (FLAGS_use_statestore && statestore_port > 0) {
> let's please not file a jira for something this small, the time involved in
There is already a TODO. Done


Line 349:   if (admission_controller_ != NULL) 
RETURN_IF_ERROR(admission_controller_->Init());
> replace all NULL w/ nullptr in this file (and any other file you touch wher
Done


http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

Line 38: DECLARE_bool(is_coordinator);
> unused
Done


http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

Line 210:   /// Subscription manager
> owned?
Done


http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 341:   bind(mem_fn(::CatalogUpdateCallback), 
this, _1, _2);
> use lambda, unless impractical. you can also inline that in the call.
Done


Line 1484: // Check if the local backend descriptor is in the list of known
> this function is already too long. break this up.
Previously, it was the scheduler's responsibility to update the list of 
backends in the statestore. Given that the scheduler may not start in all the 
nodes, we need this update to happen elsewhere. 

I've extracted this into a separate function which is invoked only for 
non-coordinator nodes.


Line 1903:   // Only for coordinators, initialize the HS2 and Beeswax services.
> rephrase to fix grammar
Done


http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 955: /// Impala server is a coordinator (dictated from the is_coordinator 
flag).
> "indicated by the .. flag"
Done


http://gerrit.cloudera.org:8080/#/c/6344/2/tests/custom_cluster/test_coordinators.py
File tests/custom_cluster/test_coordinators.py:

Line 42:   client = worker.service.create_beeswax_client()
> same for hs2?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Benchmark:
Memory requirement for storing an HDFS table with 250K files is reduced
by 2.5X.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 654 insertions(+), 301 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-20 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..


Patch Set 1:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG
Commit Message:

Line 11: serialization library.
> Any benchmark numbers? Just curious. Also I think it is good to add them he
Done


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt
File common/fbs/CMakeLists.txt:

PS1, Line 24:  
> Trailing white spaces at multiple places in this file.
Done


Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b)
> Can this be moved outside the loop like JAVA_FE_ARGS
Done


http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs
File common/fbs/CatalogObjects.fbs:

PS1, Line 20:  
> extraneous white spaces.
Done


PS1, Line 54: file_name_prefix_idx
> Didn't understand what this prefix means till I read HdfsTable.fileNamePref
Done


Line 70: root_type FbFileDesc;
> What is the significance of this? I'm reading about FlatBuffers for the fir
Yeah, it is mostly used if you read from JSON. Removed.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

Line 83: storageIdGenerator.put(host, new Short(diskId));
> Drive-by comment: using 'new' here is pessimising memory consumption. If yo
Good point thanks. Done


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

PS1, Line 318: REPLICA_HOST_IDX_MASK
> Looks like it used for UNMASKing, rename may be?
It's a bit mask, so I'd rather keep the name as such :)


Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF;
> Document these?
Done


PS1, Line 357: getHostIdx
> how about just making hostIdx_ a short rather than typecasting everytime?
Done


PS1, Line 360: REPLICA_HOST_CACHE_MASK
> Shouldn't this be short rather than int? May be it works fine this way, jus
For convenience. Java short is signed.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 315: Preconditions.checkNotNull(fd);
> I don't think this FileDescriptor.create() can ever return null? If yes, wo
Done


Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' 
and returns a
> I like the idea but I'm wondering if this kind of optimization is a little 
That's not entirely true. Multiple filenames from a single write do have a 
common prefix.


Line 420:   prefixCompressFileName(currentFileName, prevFileName);
> Also I think it may not be optimal to construct the prefixes this way by co
In principal you're right, this is not the best way to compress strings. 
However, due to the way these file names are generated (random) this is not 
meant to be a generic file name compressor. Here we take advantage of the 
pattern that file names of a single write exhibit to avoid storing the common 
prefix multiple times.


http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 96:   protected static EnumSet SUPPORTED_TABLE_TYPES = 
EnumSet.of(
> May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by
Not sure how this ended up here. Removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6408/2/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS2, Line 188: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + 
FILE_PATH,
 :INSERTED_VALUE)
You can import:
from tests.util.filesystem_utils import IS_S3 
and then do here if IS_S3: 


-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..

IMPALA-4041: Limit catalog and admission control updates to coordinators

With this commit we add the ability to limit catalog and admission
control updates to a limited set of coordinator nodes. A new startup
option, termed 'is_coordinator' is added to indicate if a node is a
coordinator. Coordinators accept connections through HS2 and Beeswax
interfaces and can also participate in query execution. Non-coordinator
nodes do not receive catalog and admission control updates from the
statestore and cannot access client connections.

Testing:
- Added a custom cluster test that launches a cluster in which the
number of coordinators is less than the cluster size and runs a number
of smoke queries.
- Successfully run exhaustive tests.

Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/service/query-exec-state.cc
M be/src/testutil/in-process-servers.h
M bin/start-impala-cluster.py
M tests/common/custom_cluster_test_suite.py
A tests/custom_cluster/test_coordinators.py
14 files changed, 287 insertions(+), 129 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6344/1/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

Line 229: Status AdmissionController::Init(StatestoreSubscriber* subscriber) {
> Yeah, we need to at least execute the 'AddPoolUpdates' part of the callback
Done. Extracted the AdmissionController out of the Scheduler.


Line 230:   if (!FLAGS_is_coordinator) return Status::OK();
> This shouldn't even get called if is_coordinator is false. I think you shou
Done


http://gerrit.cloudera.org:8080/#/c/6344/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1888:   if (be_port != 0 && be_server != NULL) {
> move this above line 1844, then return if this isn't a coordinator.
Done


http://gerrit.cloudera.org:8080/#/c/6344/1/be/src/service/impala-server.h
File be/src/service/impala-server.h:

Line 948: /// If beeswax_port != 0 (and fe_server != NULL), creates a 
ThriftServer exporting
> update comments with new semantics.
Done


http://gerrit.cloudera.org:8080/#/c/6344/1/bin/start-impala-cluster.py
File bin/start-impala-cluster.py:

Line 246:   args = "-is_coordinator=true %s" % (args)
> leave this out. it's the default, and we should make sure it actually works
Done


http://gerrit.cloudera.org:8080/#/c/6344/1/tests/custom_cluster/test_coordinators.py
File tests/custom_cluster/test_coordinators.py:

Line 76:   self.execute_query_expect_success(client1, "drop database %s 
cascade" % db_name)
> why don't you add running a query against impalads[2] (and failing), and th
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3

2017-03-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5072: Fix test_recover_partitions on S3
..


Patch Set 1: Code-Review+2

In terms of testing, you may run a private jenkins jobs to test your 
uncommitted changes against s3 using 
impala-auxiliary-tests/jenkins/run_impala_tests.py.

-- 
To view, visit http://gerrit.cloudera.org:8080/6408
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata

2017-03-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6406

Change subject: IMPALA-4029: Reduce memory requirements for storing file 
metadata
..

IMPALA-4029: Reduce memory requirements for storing file metadata

This commit improves the memory requirements for storing file and block
metadata in the catalog and the impalad nodes by using the FlatBuffers
serialization library.

Testing:
Passed an exhaustive tests run.

Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
---
M CMakeLists.txt
A common/fbs/CMakeLists.txt
A common/fbs/CatalogObjects.fbs
M common/thrift/CatalogObjects.thrift
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
14 files changed, 653 insertions(+), 298 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6406
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] Fix typo in Flatbuffers cmake module

2017-03-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: Fix typo in Flatbuffers cmake module
..


Patch Set 1: Code-Review+2

Thanks Henry

-- 
To view, visit http://gerrit.cloudera.org:8080/6398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0786344b5485a92c02a246b543b6acda279e199c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4482, IMPALA-4838: RECOVER PARTITIONS with tpcds.store sales

2017-03-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4482, IMPALA-4838: RECOVER PARTITIONS with 
tpcds.store_sales
..


Patch Set 8: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Harrison Sheinblatt 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Michael Brown 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators

2017-03-10 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6344

Change subject: IMPALA-4041: Limit catalog and admission control updates to 
coordinators
..

IMPALA-4041: Limit catalog and admission control updates to coordinators

With this commit we add the ability to limit catalog and admission
control updates to a limited set of coordinator nodes. A new startup
option, termed 'is_coordinator' is added to indicate if a node is a
coordinator. Coordinators accept connections through HS2 and Beeswax
interfaces and can also participate in query execution. Non-coordinator
nodes do not receive catalog and admission control updates from the
statestore and cannot access client connections.

Testing:
- Added a custom cluster test that launches a cluster in which the
number of coordinators is less than the cluster size and runs a number
of smoke queries.
- Successfully run exhaustive tests.

Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
---
M be/src/scheduling/admission-controller.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/impalad-main.cc
M be/src/testutil/in-process-servers.h
M bin/start-impala-cluster.py
M tests/common/custom_cluster_test_suite.py
A tests/custom_cluster/test_coordinators.py
8 files changed, 144 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6344
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-5028: Lock table in /catalog objects endpoint.

2017-03-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5028: Lock table in /catalog_objects endpoint.
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ad4ce286b8cc169f29b8ddfa215f8949b1c11ff
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..


Patch Set 4: Code-Review+2

Rebase and keep Henry's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Hello Henry Robinson,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6013

to look at the new patch set (#4).

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..

IMPALA-4885: Expose Jvm thread info in web UI

This commit exposes information about JVM threads to the impalad and
catalogd web UIs. This information includes statistics about the number
of threads running in the JVM as well as per-thread stacktraces, monitors and
synchronizers. Total CPU, user CPU and blocked time is also reported per
thread.

Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
---
M be/src/catalog/catalogd-main.cc
M be/src/service/impalad-main.cc
M be/src/statestore/statestored-main.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/thread.cc
M be/src/util/thread.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/JniUtil.java
A www/jvm-threadz.tmpl
M www/threadz.tmpl
A www/threadz_tabs.tmpl
12 files changed, 402 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6013/3/be/src/util/thread.cc
File be/src/util/thread.cc:

PS3, Line 79: args
> long line
Done


PS3, Line 340: void InitThreading() {
 :   DCHECK(thread_manager.get() == nullptr);
 :   thread_manager.reset(new ThreadMgr());
 : }
 : 
 : Status StartThreadInstrumentation(MetricGroup* metrics, 
Webserver* webserver,
 : bool include_jvm_threads) {
 :   DCHECK(metrics != nullptr);
 :   DCHECK(webserver != nullptr);
 :   RETURN_IF_ERROR(thread_manager->StartInstrumentation(metrics));
 :   RegisterUrlCallbacks(include_jvm_threads, webserver);
 :   return Status::OK();
 : }
> If you move these to the end of the file, does that avoid any of the need t
Yes, we don't need to forward declare RegisterUrlCallbacks. Done


PS3, Line 379: args
> long line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5028: Lock table in /catalog objects endpoint.

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-5028: Lock table in /catalog_objects endpoint.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6296/1/tests/webserver/test_web_pages.py
File tests/webserver/test_web_pages.py:

PS1, Line 141: # IMPALA-5028: Test toThrift() of a partitioned table via the 
WebUI code path.
 : self.get_and_check_status(self.CATALOG_OBJECT_URL +
 :   "?object_type=TABLE_name=functional.alltypes", 
"alltypes",
 :   without_ss=True)
The error was thrown when the table was loaded. So I am wondering if we're 
getting the expected coverage with this tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/6296
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3ad4ce286b8cc169f29b8ddfa215f8949b1c11ff
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS2, Line 2661: updateLastDdlTime(msTbl, msClient);
> If we get some kind of exception during a later round of processing, it is 
Unfortunately, the whole operation is far from being atomic but I see your 
point :)


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..


Patch Set 3:

Any more comments on this? Henry, lmk if you're swamped and I will ask someone 
else to take a look.

-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS2, Line 2661: updateLastDdlTime(msTbl, msClient);
Do we need to call this for every batch? I believe it is sufficient to call it 
once at the end.


http://gerrit.cloudera.org:8080/#/c/6275/2/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS2, Line 190: for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : assert not self.has_value(PART_DIR, result.data)
This check may be an overkill since no partitions were added to this table. Why 
don't you simply verify that show partitions doesn't return any rows?


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Zach Amsden 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates

2017-03-06 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6275/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS1, Line 2641: // ifNotExists and needResults are true.
nit: move comment above L2644


PS1, Line 2646: } catch (AlreadyExistsException e) {
  : throw new InternalException(
  : "AlreadyExistsException thrown although 
ifNotExists given", e);
  : }
I don't think we need to handle this case separately. Let the outer catch 
handle it. Besides, we're not doing anything special here.


http://gerrit.cloudera.org:8080/#/c/6275/1/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

PS1, Line 171: test_recover_many_partitions
This test may be too expensive. How much time does it take to run? See my 
comments below for ideas on how to trim it a bit.


PS1, Line 182: # Preload 10% of the partitions with partition(i) = values(i)
 : for i in xrange(1, 700, 10):
 : self.execute_query_expect_success(self.client,
 : "INSERT INTO TABLE %s PARTITION(s='part%d') 
VALUES(%d)" % (FQ_TBL_NAME, i, i))
I think we can live without this part.


PS1, Line 193: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + 
FILE_PATH,
 :INSERTED_VALUE)
It should be sufficient to just create the directory, no?


PS1, Line 196: result = self.execute_query_expect_success(self.client,
 : "SHOW PARTITIONS %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : PART_DIR = "part%d\t" % i
 : if (i % 10 == 1):
 : assert self.has_value(PART_DIR, result.data)
 : else:
 : assert not self.has_value(PART_DIR, result.data)
 : 
 : self.execute_query_expect_success(self.client,
 : "ALTER TABLE %s RECOVER PARTITIONS" % FQ_TBL_NAME)
 : result = self.execute_query_expect_success(self.client,
 : "select c from %s" % FQ_TBL_NAME)
 : for i in xrange(1, 700):
 : if (i % 10 == 1):
 : assert self.has_value(str(i), result.data)
 : else:
 : assert self.has_value(str(-i), result.data)
I would just do: a) show partitions and verify 0, b) recover partitions, c) 
show partitions and verify 700.


-- 
To view, visit http://gerrit.cloudera.org:8080/6275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4966: Add flatbuffers to build

2017-03-01 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4966: Add flatbuffers to build
..


Patch Set 2: Code-Review+2

Rebase, keep Alex's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4998: Fix missing table lock acquisition.

2017-03-01 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4998: Fix missing table lock acquisition.
..


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6177/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

PS5, Line 934: tbl instanceof IncompleteTable
Not related to this change but I believe we load the metadata twice if we call 
refresh on an incomplete table, once during the getOrLoadTable() call and once 
here. If you agree, let's file a jira about it. Let me know..


-- 
To view, visit http://gerrit.cloudera.org:8080/6177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0e0270daf59fce95f1a1520fc5aaf91d3a7b99fe
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#3).

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..

IMPALA-4885: Expose Jvm thread info in web UI

This commit exposes information about JVM threads to the impalad and
catalogd web UIs. This information includes statistics about the number
of threads running in the JVM as well as per-thread stacktraces, monitors and
synchronizers. Total CPU, user CPU and blocked time is also reported per
thread.

Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
---
M be/src/catalog/catalogd-main.cc
M be/src/service/impalad-main.cc
M be/src/statestore/statestored-main.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/thread.cc
M be/src/util/thread.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/JniUtil.java
A www/jvm-threadz.tmpl
M www/threadz.tmpl
A www/threadz_tabs.tmpl
12 files changed, 404 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6013/2/be/src/util/thread.cc
File be/src/util/thread.cc:

PS2, Line 75: void JvmThreadsUrlCallback(const Webserver::ArgumentMap& args, 
Document* doc);
: 
: void ThreadOverviewUrlCallback(bool track_jvm_threads, const 
Webserver::ArgumentMap& args,
: Document* document);
: 
: void RegisterUrlCallbacks(bool track_jvm_threads, Webserver* 
webserver);
: 
: void GetJvmThreadOverview(Document* document);
> put all these in an anonymous namespace to avoid polluting the global symbo
Done


PS2, Line 216: NULL
> nullptr, here and everywhere
Done


PS2, Line 379: DCHECK(document != NULL);
> why would document be null?
Good point. Removed from here and elsewhere.


PS2, Line 384: GetJvmThreadOverview
> move this into ThreadOverviewUrlCallback
Done


PS2, Line 392: LOG(WARNING) << "Couldn't retrieve information about JVM 
threads: "
 :   << status.GetDetail();
> But - put this in the "error" member of document, and it should show on the
Done


PS2, Line 393: << status.GetDetail();
> formatting
Done


http://gerrit.cloudera.org:8080/#/c/6013/2/be/src/util/thread.h
File be/src/util/thread.h:

PS2, Line 194: track_jvm_threads
> include_jvm_threads ?
Done


http://gerrit.cloudera.org:8080/#/c/6013/2/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

Line 758:   // Information about JVM threads
> comment when this might not be set?
Done


http://gerrit.cloudera.org:8080/#/c/6013/2/fe/src/main/java/org/apache/impala/common/JniUtil.java
File fe/src/main/java/org/apache/impala/common/JniUtil.java:

PS2, Line 206: threadCount
> just write response.setTotal_thread_count(threadBean.getThreadCount()) ?
Done


http://gerrit.cloudera.org:8080/#/c/6013/2/www/threadz.tmpl
File www/threadz.tmpl:

PS2, Line 37: #
> nit: is there only one jvm-threads entry? If so, this should be the ? opera
Hm. Doesn't work. For some reason it doesn't populate it when I replace it with 
the ? operator.


http://gerrit.cloudera.org:8080/#/c/6013/2/www/threadz_tabs.tmpl
File www/threadz_tabs.tmpl:

Line 27:  
> nit: trailing whitespace
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4966: Add flatbuffers to build

2017-02-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6180

Change subject: IMPALA-4966: Add flatbuffers to build
..

IMPALA-4966: Add flatbuffers to build

FlatBuffers version 1.6.0 is already included in the toolchain. This
commit adds it to the build system.

Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0
---
M CMakeLists.txt
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
A cmake_modules/FindFlatBuffers.cmake
4 files changed, 57 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/6180/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4902: Copy parameters map in HdfsPartition.toThrift().

2017-02-23 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4902: Copy parameters map in HdfsPartition.toThrift().
..


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic11277ad5512d2431cd3cc791715917c95395ddf
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4902: Copy parameters map in HdfsPartition.toThrift().

2017-02-23 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4902: Copy parameters map in HdfsPartition.toThrift().
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6127/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

PS2, Line 765: One thread
 : // may try to serialize the returned THdfsPartition after 
releasing the table's lock,
 : // and another thread doing DDL may modify the map.
Are we sure this is needed? When do we call HdfsPartition.toThrift() outside of 
HdfsTable.toThrift() which is now protected?


http://gerrit.cloudera.org:8080/#/c/6127/2/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS2, Line 26: import java.util.concurrent.locks.ReentrantReadWriteLock;
I don't think you need that.


-- 
To view, visit http://gerrit.cloudera.org:8080/6127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic11277ad5512d2431cd3cc791715917c95395ddf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..


Patch Set 1:

(10 comments)

Thanks for the suggestions Henry. New screenshots are here:
https://drive.google.com/file/d/0B7VW0hRyzVlKTzZPR3VEVlVEYWc/view?usp=sharing
https://drive.google.com/file/d/0B7VW0hRyzVlKOXowTUM2QWVySm8/view?usp=sharing

http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.cc
File be/src/util/thread.cc:

Line 178:   void ThreadOverviewUrlCallback(bool track_jvm_threads,
> Shouldn't this go above ThreadGroupUrlCallback() to match with the sample j
No, this is the output of the overview callback.


PS1, Line 194: [this, track_jvm_threads] (const Webserver::ArgumentMap& args, 
Document* doc) {
 : this->ThreadOverviewUrlCallback(track_jvm_threads, args, doc)
> Make this a method? MakeUrlCallBack or something, used multiple times.
This has changed.


PS1, Line 209: RegisterUrlCallback
> Rather than adding another top-level link, consider making two tabs on the 
Done


PS1, Line 234: bool track_jvm_threads,
> how about just making this a member of ThreadMgr? Actually - I don't think 
Done


Line 260: << status.GetDetail();
> return, then remove else { }
Done


PS1, Line 262: jvmThreadsVal
> C++ naming styles
power of habit.. :) Done


PS1, Line 327: INFO
> warn/error? How about bubbling this error up to the web endpoint by setting
Changed it to WARN. What is the "error" member?


http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.h
File be/src/util/thread.h:

Line 191: /// the "thread-manager."
> Describe track_jvm_threads in the method comment?
Done


http://gerrit.cloudera.org:8080/#/c/6013/1/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

Line 720:   // Summary of a JVM thread. Includes stacktraces, locked monitors
> Do you think its good to include the locked synchronizers or the lock it is
It already does that.


http://gerrit.cloudera.org:8080/#/c/6013/1/www/jvm-threadz.tmpl
File www/jvm-threadz.tmpl:

PS1, Line 35: Is n
> Native
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..

IMPALA-4885: Expose Jvm thread info in web UI

This commit exposes information about JVM threads to the impalad and
catalogd web UIs. This information includes statistics about the number
of threads running in the JVM as well as per-thread stacktraces, monitors and
synchronizers. Total CPU, user CPU and blocked time is also reported per
thread.

Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
---
M be/src/catalog/catalogd-main.cc
M be/src/service/impalad-main.cc
M be/src/statestore/statestored-main.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/thread.cc
M be/src/util/thread.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/JniUtil.java
A www/jvm-threadz.tmpl
M www/threadz.tmpl
A www/threadz_tabs.tmpl
12 files changed, 401 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 


[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.

2017-02-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4840: Fix REFRESH performance regression.
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..


Patch Set 1:

A publicly available screenshot: 
https://drive.google.com/file/d/0B7VW0hRyzVlKUGdWakNIdlhtbTQ/view?usp=sharing

-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Henry Robinson 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

2017-02-15 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
..


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5980/2/fe/src/main/java/org/apache/impala/util/DisjointSet.java
File fe/src/main/java/org/apache/impala/util/DisjointSet.java:

PS2, Line 49:  private final IdentityHashMap uniqueSets_ =
:   new IdentityHashMap();
nice idea :)


PS2, Line 111: checkState(removedValue == DUMMY_VALUE);
I think checking for not NULL is also ok.


-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..


Patch Set 1:

A screenshot of the new web page with the JVM thread info is here:
https://cloudera.box.com/s/agu113d1rxzfainy860f6oqlizp4v92v

-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI

2017-02-14 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6013

Change subject: IMPALA-4885: Expose Jvm thread info in web UI
..

IMPALA-4885: Expose Jvm thread info in web UI

This commit exposes information about JVM threads to the impalad and
catalogd web UIs. This information includes statistics about the number
of threads running in the JVM as well as per-thread stacktraces, monitors and
synchronizers. Total CPU, user CPU and blocked time is also reported per
thread.

Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
---
M be/src/catalog/catalogd-main.cc
M be/src/service/impalad-main.cc
M be/src/statestore/statestored-main.cc
M be/src/util/jni-util.cc
M be/src/util/jni-util.h
M be/src/util/thread.cc
M be/src/util/thread.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/common/JniUtil.java
A www/jvm-threadz.tmpl
M www/threadz.tmpl
11 files changed, 312 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4897: AnalysisException: specified cache pool does not exist

2017-02-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4897: AnalysisException: specified cache pool does not 
exist
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5972/1/testdata/workloads/functional-query/queries/QueryTest/alter-table-hdfs-caching.test
File 
testdata/workloads/functional-query/queries/QueryTest/alter-table-hdfs-caching.test:

PS1, Line 1: 
Can you add a comment to describe at a high level what this test file is all 
about?


-- 
To view, visit http://gerrit.cloudera.org:8080/5972
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefe61556bc28ae320f3f41fdc930d37b258d970a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4897: AnalysisException: specified cache pool does not exist

2017-02-10 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4897: AnalysisException: specified cache pool does not 
exist
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5972/1/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

PS1, Line 258: if IS_HDFS:
 :   self.run_test_case('QueryTest/alter-table-hdfs-caching', 
vector,
 :   use_db=unique_database, 
multiple_impalad=self._use_multiple_impalad(vector))
I think it may be best to put it in test_hdfs_caching.py? What do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/5972
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iefe61556bc28ae320f3f41fdc930d37b258d970a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Fix merge conflict

2017-02-10 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: Fix merge conflict
..


Patch Set 1: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5967
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia5444d6b44b9aeea18f7861849513a2bde5c881f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI

2017-02-09 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
..


Patch Set 2: Code-Review+2

Here is a screenshot from my dev box for the catalog/memz 
https://cloudera.box.com/s/uug5z4zpkfqhhkgxrrejskjvc9l2hh2u 

It's the same for the impalad/memz as well.

-- 
To view, visit http://gerrit.cloudera.org:8080/5909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI

2017-02-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
..


Patch Set 2: Code-Review+1

Keep Alex's +1

-- 
To view, visit http://gerrit.cloudera.org:8080/5909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI

2017-02-08 Thread Dimitris Tsirogiannis (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5909

to look at the new patch set (#2).

Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
..

IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI

This commit adds heap and non-heap memory usage of the embedded
JVM in the memory metrics and exposes these metrics in /memz web page of
the impalad and catalog web UI.

Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8
---
M be/src/util/default-path-handlers.cc
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M www/memz.tmpl
3 files changed, 110 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/5909/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI

2017-02-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5909/1/be/src/util/default-path-handlers.cc
File be/src/util/default-path-handlers.cc:

Line 163:   document->AddMember("jvm_pool", total, 
document->GetAllocator());
> jvm_total
Done


http://gerrit.cloudera.org:8080/#/c/5909/1/www/memz.tmpl
File www/memz.tmpl:

Line 92: JVM total pool memory usage
> It somewhat confused me, so might be better to omit.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI

2017-02-07 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5909/1/www/memz.tmpl
File www/memz.tmpl:

Line 92: JVM total pool memory usage
> What's the significance of the word "pool" here?
Merely to indicate that we aggregate across all memory pools. Let me know if 
you prefer to remove it.


-- 
To view, visit http://gerrit.cloudera.org:8080/5909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI

2017-02-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5909

Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
..

IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI

This commit adds heap and non-heap memory usage of the embedded
JVM in the memory metrics and exposes these metrics in /memz web page of
the impalad and catalog web UI.

Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8
---
M be/src/util/default-path-handlers.cc
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M www/memz.tmpl
3 files changed, 110 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/5909/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5909
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

2017-02-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 
TABLE ADD PARTITION
..


Patch Set 24: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 24
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

2017-02-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 
TABLE ADD PARTITION
..


Patch Set 23: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION

2017-02-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER 
TABLE ADD PARTITION
..


Patch Set 23:

Yes, looking at it now.

-- 
To view, visit http://gerrit.cloudera.org:8080/4144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables

2017-02-02 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2518: DROP DATABASE CASCADE removes cache directives of 
tables
..


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables

2017-02-02 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2518: DROP DATABASE CASCADE removes cache directives of 
tables
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1279: uncacheTable(removedDb.getTable(tableName));
> Agreed. Might be worth adding a brief comment in one of the existing JIRAs 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] CDH-48291: Fix flaky test TestRequestPoolService

2017-02-02 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has abandoned this change.

Change subject: CDH-48291: Fix flaky test TestRequestPoolService
..


Abandoned

MJ posted a fix for this.

-- 
To view, visit http://gerrit.cloudera.org:8080/5507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I71a036d5b1227508e2175808d7e92837ae247d99
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables

2017-02-02 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has uploaded a new patch set (#2).

Change subject: IMPALA-2518: DROP DATABASE CASCADE removes cache directives of 
tables
..

IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables

This commit fixes an issue where the DROP DATABASE CASCADE statement
will not remove the cache directives of the underlying tables and their
partitions.

Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java
M tests/query_test/test_hdfs_caching.py
3 files changed, 66 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5815/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of tables

2017-02-02 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache 
directives of tables
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5815/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of
> describe the fix not the bug:
Done


http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

Line 1279: uncacheTable(removedDb.getTable(tableName));
> It's a little tricky to figure out, but I think we may have to lock these t
This is quite tricky. Keep in mind that we perform these operations on the 
deleted table objects. So the question is whether we correctly handle dropping 
a table and at the same time altering it. The answer I believe is no. The 
former operation uses the metastoreDdlLock_ while the latter is protected by 
the table lock. In theory, and assuming we had proper hierarchical locking, no 
concurrent access should be allowed on these tables as it would require, in the 
least, a read lock on the db to be dropped. We could acquire the lock on the 
dropped tables but, I'd rather not do it in this patch.


http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java
File fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java:

Line 163:   org.apache.hadoop.hive.metastore.api.Partition part) throws 
ImpalaException {
> tab
Done


http://gerrit.cloudera.org:8080/#/c/5815/1/tests/query_test/test_hdfs_caching.py
File tests/query_test/test_hdfs_caching.py:

Line 212: """The DROP DATABASE CASCADE should properly drop all impacted 
cache directives
> Remove "The"
Done


Line 213:IMPALA-2518"""
> use IMPALA-2518 as a prefix to comment (like we typically do)
Done


Line 215: # Creates `cachedb` database with some cached tables and 
partitions
> Populates the 'cachedb' database...
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-HasComments: Yes


[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

2017-01-31 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
..


Patch Set 13:

John, when do you plan to post a new patch that addresses the last comments? I 
think after that, we should be able to +2 it.

-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page

2017-01-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: [DOCS] Major update to Impala + Kudu page
..


Patch Set 13: Code-Review+1

(5 comments)

I'll let MJ and/or Todd make a final pass.

http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_create_table.xml
File docs/topics/impala_create_table.xml:

PS13, Line 85: [PRIMARY KEY (col_name[, ...]]
This doesn't belong here.


PS13, Line 244: CREATE TABLE [IF NOT EXISTS] 
db_name.]table_name
  :   [COMMENT 'table_comment']
  :   STORED AS KUDU
  :   [TBLPROPERTIES 
('key1'='value1', 
'key2'='value2', ...)]
  : AS
  :   select_statement
This is not correct. You're missing the PRIMARY KEY and PARTITION BY clauses.


PS13, Line 411: impala::username.kudu_t1
nit: I don't think this is a good example. I would be really surprised if 
someone actually created a Kudu table outside of Impala named as 
"impala::username". Pls change the table name.


http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_kudu.xml
File docs/topics/impala_kudu.xml:

PS13, Line 93: kudu.master_addresses
That's the name of the table property. Can you verify that name of the impalad 
configuration param?


PS13, Line 148: There is a many-to-many relationship
  :   between the tablets and tablet servers, managed 
automatically by Kudu.
Isn't more clear to say that each tablet server can store multiple tablets and 
that tablets are replicated across different tablet servers?


-- 
To view, visit http://gerrit.cloudera.org:8080/5649
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: John Russell 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4789: Fix slow metadata loading due to inconsistent paths.

2017-01-27 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4789: Fix slow metadata loading due to inconsistent 
paths.
..


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5743
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4789: Fix slow metadata loading due to inconsistent paths.

2017-01-27 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4789: Fix slow metadata loading due to inconsistent 
paths.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5743/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

PS4, Line 454: private static boolean nullOrEqual(Object a, Object b) {
 : if (a == null && b == null) return true;
 : if (a != null && b != null) return a.equals(b);
 : return false;
 :   }
Use Guava's Objects.equal()? I think it has the same semantics as this one.


-- 
To view, visit http://gerrit.cloudera.org:8080/5743
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


<    1   2   3   4   5   >