[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: There is still one thing that is not clear to me. Why is it allowed to do an ALTER TABLE with an empty SORT BY clause but not a CREATE TABLE? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS18, Line 183: # CTAS with partitions and sort by columns > I'm not sure I'm following here. sort by() cannot have an empty column list It's a parser error in create but allowed in alter (see ParserTest.java:2271). So, there is some inconsistency here. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: (7 comments) http://gerrit.cloudera.org:8080/#/c/6495/17//COMMIT_MSG Commit Message: PS17, Line 30: SORT BY columns supersede the sortby() hint, which we will remove in a : subsequent change (IMPALA-5144). Mention what happens if both are used (e.g. union the columns or something else) http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS17, Line 152: Returns a list of positions of :* the sort by columns within the table's list of columns This function doesn't return anything. PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether. : Preconditions.checkState(!(table instanceof HBaseTable)); This check doesn't belong here. Maybe move it to the analyze() function? http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: PS17, Line 61: HashMap nit: Map PS17, Line 87: a nit: an http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS17, Line 138: Mention what happens if both are used (i.e. sort.by.column and sortby() hint). http://gerrit.cloudera.org:8080/#/c/6495/17/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS17, Line 1: # Add one or more tests where the sortby hint is used in conjunction with sort by clause, if you don't already have one. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: I think the interaction between the hint and clause confused me at some point. I'll take another look at the new patch and see if removing the hint will make things simpler. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 15: Quick question. Is there a reason why we don't remove the sortby hint in this patch? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. Patch Set 7: Code-Review+2 Rebase and minor test fix. Keep Dan's +2 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6628 to look at the new patch set (#7). Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. IMPALA-5147: Add the ability to exclude hosts from query execution This commit introduces a new startup option, termed 'is_executor', that determines whether an impalad process can execute query fragments. The 'is_executor' option determines if a specific host will be included in the scheduler's backend configuration and hence included in scheduling decisions. Testing: - Added a customer cluster test. - Added a new scheduler test. Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d --- M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test-util.h M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/impala-http-handler.cc M be/src/service/impala-http-handler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/util/network-util.cc M be/src/util/webserver.cc M bin/start-impala-cluster.py M common/thrift/StatestoreService.thrift M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_coordinators.py M www/backends.tmpl M www/root.tmpl 17 files changed, 432 insertions(+), 307 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/7 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5152: Gather all tables with missing metadata in analysis
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5152: Gather all tables with missing metadata in analysis .. Patch Set 1: As we discussed offline, see if it makes sense to have a quick pre-analysis pass that collects all the base tables that need to load metadata for. -- To view, visit http://gerrit.cloudera.org:8080/6702 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id10a303958557982b26a07c97a9bb30932ab44dd Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has uploaded a new patch set (#6). Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. IMPALA-5147: Add the ability to exclude hosts from query execution This commit introduces a new startup option, termed 'is_executor', that determines whether an impalad process can execute query fragments. The 'is_executor' option determines if a specific host will be included in the scheduler's backend configuration and hence included in scheduling decisions. Testing: - Added a customer cluster test. - Added a new scheduler test. Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d --- M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test-util.h M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/impala-http-handler.cc M be/src/service/impala-http-handler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/util/network-util.cc M be/src/util/webserver.cc M bin/start-impala-cluster.py M common/thrift/StatestoreService.thrift M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_coordinators.py M www/backends.tmpl M www/root.tmpl 17 files changed, 433 insertions(+), 307 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/6 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/6628/5/be/src/service/impala-server.h File be/src/service/impala-server.h: PS5, Line 279: /// A map from unique backend ID to the corresponding TBackendDescriptor of that backend. : /// Used to track membership updates from the statestore so queries can be cancelled : /// when a backend is removed. It's not enough to just cancel fragments that are running : /// based on the deletions mentioned in the most recent statestore heartbeat; sometimes : /// cancellations are skipped and the statestore, at its discretion, may send only : /// a delta of the current membership so we need to compute any deletions. : /// TODO: Currently there are multiple locations where cluster membership is tracked, : /// here and in the scheduler. This should be consolidated so there is a single component : /// (the scheduler?) that tracks this information and calls other interested components. > I think this comment is about the known_backends_ field, no? Oops, yes. PS5, Line 290: const BackendDescriptorMap& GetKnownBackends(); > why does this need to be exposed? where is it used? This is used in ImpalaHttpHandler::BackendsHandler(). -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has uploaded a new patch set (#5). Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. IMPALA-5147: Add the ability to exclude hosts from query execution This commit introduces a new startup option, termed 'is_executor', that determines whether an impalad process can execute query fragments. The 'is_executor' option determines if a specific host will be included in the scheduler's backend configuration and hence included in scheduling decisions. Testing: - Added a customer cluster test. - Added a new scheduler test. Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d --- M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test-util.h M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/impala-http-handler.cc M be/src/service/impala-http-handler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/util/network-util.cc M be/src/util/webserver.cc M bin/start-impala-cluster.py M common/thrift/StatestoreService.thrift M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_coordinators.py M www/backends.tmpl M www/root.tmpl 17 files changed, 433 insertions(+), 308 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/5 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 202: // If this is not an executor, don't add it to the new backend config. > this comment just repeats what the code is doing, but doesn't explain why. Done http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/scheduling/scheduler.h File be/src/scheduling/scheduler.h: PS4, Line 269: backend_config_ > as mentioned earlier, this would be easier to follow if this were renamed t Done PS4, Line 283: /// Map from unique backend id to TBackendDescriptor. Used to track the known executors : /// from the statestore. It's important to track both the backend ID as well as the : /// TBackendDescriptor so we know what is being removed in a given update. > this comment was very unhelpful at understanding the code because it doesn' Done http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 388: void ImpalaServer::BackendsUrlCallback( > I think this should go in impala-http-handler.cc Done http://gerrit.cloudera.org:8080/#/c/6628/4/be/src/service/impala-server.h File be/src/service/impala-server.h: PS4, Line 284: GetServerRole > Why deal with the presentation (i.e. string formatting) here, rather than l Done http://gerrit.cloudera.org:8080/#/c/6628/4/bin/start-impala-cluster.py File bin/start-impala-cluster.py: PS4, Line 41: only_coordinate > when i read this name I thought that it would start a cluster of only coord Done PS4, Line 42: \ > don't think you need these inside ( ) Done -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 207: current_membership_.insert(make_pair(item.key, be_desc)); > What else would we want to rename backend->executor besides backend_config_ There is a number of inner classes in this file (e.g. AddressableAssignment, BackendAssignmentInfo, etc) that use the term backend in fields and functions. I believe these should be renamed as well. I just want to make sure we agree on the scope of this change before I start modifying this file even more. -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls. .. Patch Set 3: Code-Review+2 (4 comments) http://gerrit.cloudera.org:8080/#/c/6651/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS3, Line 838: block file PS3, Line 1530: block file PS3, Line 1529: code :* path function PS3, Line 1530: , like :* adding a new file to an existing partition remove -- To view, visit http://gerrit.cloudera.org:8080/6651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls. .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6651/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS2, Line 839: already remove PS2, Line 842: loadPartitionBlockMdFromScratc Refreshing an existing partition calls the refreshFileMetadata() while loading a partition from scratch calls loadPartitionBlockMdFromScratch(). Both functions load file/block metadata, so maybe we should name the latter loadFileMetadataFromScratch()? -- To view, visit http://gerrit.cloudera.org:8080/6651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has uploaded a new patch set (#4). Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. IMPALA-5147: Add the ability to exclude hosts from query execution This commit introduces a new startup option, termed 'is_executor', that determines whether an impalad process can execute query fragments. The 'is_executor' option determines if a specific host will be included in the scheduler's backend configuration and hence included in scheduling decisions. Testing: - Added a customer cluster test. - Added a new scheduler test. Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d --- M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test-util.h M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/util/network-util.cc M be/src/util/webserver.cc M bin/start-impala-cluster.py M common/thrift/StatestoreService.thrift M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_coordinators.py M www/backends.tmpl 14 files changed, 190 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/4 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 207: current_membership_.insert(make_pair(item.key, be_desc)); > I see, so you're saying that the "impala-membership" topic does contain all Essentially the scheduler only manages/uses backends that are executors. However, the term 'backend' is embedded in so many places that we may not want to do the rename it in this patch. I did the rename you suggested and added a brief comment in StatestoreService.thrift. Let me know if you'd like me to do the backend->executor rename now. -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 15: (19 comments) http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS15, Line 168: commans typo: commas PS15, Line 169: of remove PS15, Line 170: this will throw an AnalysisException nit: this function PS15, Line 172: List You need to comment on the return value. PS15, Line 174: if (!tblProperties.containsKey(AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS)) { long line PS15, Line 178: if (table instanceof KuduTable) { Whenever I see this, I keep wondering about HBase. Can you add a precondition check or a similar check? whatever is applicable. http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: PS15, Line 44: public static final String TBL_PROP_SORT_BY_COLUMNS = "sort.by.columns"; Maybe it makes more sense to put this in AlterTableSetTblProperiesStmt. Not a strong preference though. PS15, Line 93: if (table instanceof HdfsTable) { What else could it be? You can just do Preconditions.checkState(table instanceof HdfsTable); and then cast and remove the if statement. http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS15, Line 768: if (table_ instanceof HBaseTable) return; : if (table_ instanceof KuduTable) return; Why not if (!(table_ instanceof HdfsTable)) return; instead? PS15, Line 843: if (!columns.isEmpty()) { : // We need to make a copy to make the sortByColumns_ list mutable. : // TODO: Remove this when removing the sortby() hint (IMPALA-5157). : sortByColumns_ = Lists.newArrayList(sortByColumns_); : } : for (int i = 0; i < columns.size(); ++i) { : if (columns.get(i).getName().equals(columnName)) { : sortByExprs_.add(resultExprs_.get(i)); : sortByColumns_.add(i); : foundColumn = true; : break; : } : } Not sure I follow what is happening here. If you already have sortByColumns_ do you expand them based on the hint or something else? Do we have tests for that? http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS15, Line 284: Iterable I like using interfaces but is this ever called with anything other than a List? PS15, Line 350: if (options_.sortByCols != null) { if (options_.sortByCols == null) return; http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: PS15, Line 77: nit: extra space PS15, Line 297: if (sortByColsSql != null) { : sb.append(String.format("SORT BY (\n %s\n)\n", : Joiner.on(", \n ").join(sortByColsSql))); : } haha, I can't visualize the output here. http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS15, Line 456: he/she adds nit: added http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS15, Line 1881: msTbl.getParameters().put(sortByKey, MetaStoreUtil.intersectCsvListWithColumNames( : msTbl.getParameters().get( sortByKey), columns)); I find it hard to parse this. Can you split it into two statements? Also, extra space after 'get('. PS15, Line 1915: msTbl.getParameters().put(sortByKey, MetaStoreUtil.replaceValueInCsvList( : msTbl.getParameters().get( sortByKey), colName, newCol.getColumnName())); same here PS15, Line 2198: msTbl.getParameters().put(sortByKey, MetaStoreUtil.removeValueFromCsvList( : msTbl.getParameters().get( sortByKey), colName)); and here http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java: PS15, Line 218: HashSet rightColNames = Sets.newHashSet(); : for (TColumn c : rightCols) rightColNames.add(c.getColumnName().toLowerCase()); : List outputList =
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 207: current_membership_.insert(make_pair(item.key, be_desc)); > They aren't yet doing it, but the proposal is for them to hit the statestor Thanks for the clarification. This patch modified the /backends webpage to include all the subscribed backends and added the executor/coordinator flags. So, Hue could get that webpage out of any backend and retrieve the executor/coordinator information. I attach a screenshot so that you'll see what I mean (https://cloudera.box.com/s/rvy496vfk812esl3d17677i0m4wzgdmp). The current_membership_ doesn't affect that at all. Let me know if that is acceptable or if you prefer the statestore webpage to expose that information as well. -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. IMPALA-5147: Add the ability to exclude hosts from query execution This commit introduces a new startup option, termed 'is_executor', that determines whether an impalad process can execute query fragments. The 'is_executor' option determines if a specific host will be included in the scheduler's backend configuration and hence included in scheduling decisions. Testing: - Added a customer cluster test. - Added a new scheduler test. Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d --- M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test-util.h M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/util/network-util.cc M be/src/util/webserver.cc M bin/start-impala-cluster.py M common/thrift/StatestoreService.thrift M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_coordinators.py M www/backends.tmpl 13 files changed, 181 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/3 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4943: Speed up block md loading for add/recover partition calls.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4943: Speed up block md loading for add/recover partition calls. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6651/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS1, Line 919: loadPartitionBlockMdFromScratch I am not sure this is correct. createAndLoadPartition() is called during partition refresh. Why do we want to load metadata from scratch in this case? Are you sure we don't regress this case? -- To view, visit http://gerrit.cloudera.org:8080/6651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I331f1f090518f317bcd7df069e480edbd8f039f1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. IMPALA-5147: Add the ability to exclude hosts from query execution This commit introduces a new startup option, termed 'is_executor', that determines whether an impalad process can execute query fragments. The 'is_executor' option determines if a specific host will be included in the scheduler's backend configuration and hence included in scheduling decisions. Testing: - Added a customer cluster test. - Added a new scheduler test. Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d --- M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test-util.h M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/util/network-util.cc M be/src/util/webserver.cc M bin/start-impala-cluster.py M common/thrift/StatestoreService.thrift M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_coordinators.py M www/backends.tmpl 13 files changed, 176 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/2 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. Patch Set 1: (15 comments) http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test-util.cc File be/src/scheduling/scheduler-test-util.cc: Line 61: int Cluster::AddHost(bool has_backend, bool has_datanode, bool is_executor) { > Can you DCHECK(!is_executor || has_backend)? Done http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler-test.cc File be/src/scheduling/scheduler-test.cc: Line 56: /// ranges. > nit: single line Done Line 59: cluster.AddHost(true, true, false); > This looks like it can get hard to parse rather quickly - I think even havi Agree, this isn't pretty. Using a builder may not work well with AddHost(). The latter is not merely a factory for hosts, it also populates several Cluster fields, like list of backends, etc. I'll leave a TODO for now and give it some thought later. http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 32: #include "rapidjson/rapidjson.h" > still needed? Done Line 44: using namespace rapidjson; > still needed? Done Line 204: // If this is not an executor don't add it to the new backend config. > nit: punctuation You want me to remove the punctuation? Line 205: if (be_desc.is_executor) { > Do you need to check __is_set? Made the fields required, so I guess it's not needed now. Line 207: current_membership_.insert(make_pair(item.key, be_desc)); > i thought the plan was to continue to have all nodes of the cluster still b I am a little confused with the last part of your comment: "That way, e.g. clients can still use the membership topic to discover coordinators." I couldn't find how that works exactly, can you plz elaborate or give some pointers? http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 2000: vector modes; > Building this with a stringstream may be easier and would also not need ano Done http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 272: rapidjson::Document* document); > you could forward declare the class and move the include to the .cc file. Done http://gerrit.cloudera.org:8080/#/c/6628/1/be/src/util/webserver.cc File be/src/util/webserver.cc: Line 199: if (env->impala_server()->IsCoordinator()) modes.push_back("Coordinator"); > Same here, maybe use stringstream? Done http://gerrit.cloudera.org:8080/#/c/6628/1/bin/start-impala-cluster.py File bin/start-impala-cluster.py: PS1, Line 216: > missing word? Done Line 254: else: > elif? Done http://gerrit.cloudera.org:8080/#/c/6628/1/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 65: // True if this is a coordinator node > What does it mean if these are unset? I added those as optional because they aren't always set (e.g. during scheduler tests). I think it may be better to make them required. What do you think? http://gerrit.cloudera.org:8080/#/c/6628/1/tests/custom_cluster/test_coordinators.py File tests/custom_cluster/test_coordinators.py: Line 94: operators are executed on 'expected_num_of_executors' nodes.""" > nit: indendation Done -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3040: Fix test caching ddl test
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3040: Fix test_caching_ddl test .. Patch Set 2: (5 comments) Some comments/questions. No new patch until we answer these first. http://gerrit.cloudera.org:8080/#/c/6603/2/tests/query_test/test_hdfs_caching.py File tests/query_test/test_hdfs_caching.py: Line 303: way cache requests are added/dropped/reported (see IMPALA-3040), it tries to return > weird indentation Done Line 304: a stable result by making several attempts to stabilize that within a reasonable > that -> it Done Line 311: max_num_stabilization_attempts = 3 > This is a little strange to me. If the num_requests keep changing, then int So, you want to exit when no change is detected? I think you'll exit prematurely with this code. Line 311: max_num_stabilization_attempts = 3 > Actually having a 10s timeout still seems like a good idea What are the semantics of the 10s timeout? Essentially, do you want an upper bound on the number of tries (10/2 = 5 max tries) or something else? Line 324: num_stabilization_attempts = num_stabilization_attempts + 1 > We hit the max attempts once we detect a change, but then we exit the loop. I am not sure. Shouldn't we put an upper bound here? -- To view, visit http://gerrit.cloudera.org:8080/6603 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ec4ba5dfae6e90a2bb76e22c93909b05bd78fa4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5147: Add the ability to exclude hosts from query execution
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/6628 Change subject: IMPALA-5147: Add the ability to exclude hosts from query execution .. IMPALA-5147: Add the ability to exclude hosts from query execution This commit introduces a new startup option, termed 'is_executor', that determines whether an impalad process can execute query fragments. The 'is_executor' option determines if a specific host will be included in the scheduler's backend configuration and hence included in scheduling decisions. Testing: - Added a customer cluster test. - Added a new scheduler test. Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d --- M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test-util.h M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/util/webserver.cc M bin/start-impala-cluster.py M common/thrift/StatestoreService.thrift M tests/common/custom_cluster_test_suite.py M tests/custom_cluster/test_coordinators.py M www/backends.tmpl 12 files changed, 168 insertions(+), 50 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/6628/1 -- To view, visit http://gerrit.cloudera.org:8080/6628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I5d2ff7f341c9d2b0649e4d14561077e166ad7c4d Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-3040: Fix test caching ddl test
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/6603 Change subject: IMPALA-3040: Fix test_caching_ddl test .. IMPALA-3040: Fix test_caching_ddl test This commmit adds a 30sec timeout on the validation step of test_caching_ddl test. This test has been flaky and we suspect a race between the submission of a cache directive removal and the reported cached directives from the 'hdfs cacheadmin' utility command. Change-Id: I3ec4ba5dfae6e90a2bb76e22c93909b05bd78fa4 --- M tests/query_test/test_hdfs_caching.py 1 file changed, 7 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/6603/1 -- To view, visit http://gerrit.cloudera.org:8080/6603 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3ec4ba5dfae6e90a2bb76e22c93909b05bd78fa4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 2: (6 comments) Quick pass with some minor comments. I am still trying to wrap my head around the concept of KuduPartitionExpr... http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: PS1, Line 62: int col, PrimitiveType type Comment on these params. http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 1845: f (fragment.__isset.output_sink && fragment.output_sink.__isset.stream_sink : && fragment.output_sink.type == TDataSinkType::DATA_STREAM_SINK : && fragment.output_sink.stream_sink.output_partition.type == TPartitionType::KUDU) Add a brief comment. http://gerrit.cloudera.org:8080/#/c/6559/1/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: PS1, Line 134: 4 2? PS1, Line 137: 5 3? :) http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS1, Line 113: // Set in analyze(). Maps exprs in partitionKeyExprs_ to their column's position in the : // table, eg. partitionKeyExprs_[i] = table_.columns(partitionKeyIdx_[i]) : private List partitionKeyIdxs_ = Lists.newArrayList(); Just for my understanding, can you explain why this is needed for this change? http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: PS1, Line 48: // For kudu partition: expr that calls into the kudu client to get the partition number. : private Expr partitionExpr_; : : // Should be called only by the static factory method for Kudu partitioned tables. : private DataPartition(TPartitionType type, Expr partitionExpr) { : Preconditions.checkNotNull(partitionExpr); : Preconditions.checkState(type == TPartitionType.KUDU); : type_ = type; : partitionExpr_ = partitionExpr; : partitionExprs_ = Lists.newArrayList(); : } I am not sure I get this. Why not passing a single element list of the case of Kudu partitioned? -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: Code-Review+1 (5 comments) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS9, Line 376: if (!isHBaseTable) Instead of checking this here, it's a bit cleaner to alway call analyzeSortByColumns() and move the check there. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS9, Line 49: Lists.newArrayList() The caller is not supposed to modify the return values, correct? You may just return Collections.emptyList() which is also immutable. PS9, Line 77: Lists.newArrayList(); Same comment as above. PS9, Line 89: Lists.newArrayList(); You can still construct and return an ImmutableList. See Guava's ImmutableList.builder() http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS9, Line 2374: ParserError("CREATE TABLE Foo LIKE Baz STORED AS TEXTFILE LOCATION '/a/b' " + : "SORT BY(bar)"); You may want to comment why this results in a parsing error. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 7: (18 comments) http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS7, Line 984: Set There is no KW_SET, right? Update the name? http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS7, Line 73: getTargetTable() t? http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 2484:* Add a warning that will be displayed to the user. Ignores null messages. Comment that no warning can be issued if warnings have been retrieved? http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: PS7, Line 119: sortByColumns_. I think this can throw a NPE. Looking at the parser I see that CreateTableListStmt can pass null for sortByColumns_ and couldn't find where this changes. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS7, Line 296: propertyString inline http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS7, Line 41: /** :* Analyzes the 'sort.by.columns' property of 'table'. :*/ : public static List analyzeSortByColumns(Table table) throws AnalysisException { : return TablePropertyAnalyzer.analyzeSortByColumns( : table.getMetaStoreTable().getParameters(), table); : } : : /** :* Analyzes the 'sort.by.columns' property in 'tblProperties' against the columns of :* 'table'. :*/ : public static List analyzeSortByColumns(MaptblProperties, : Table table) throws AnalysisException { : return TablePropertyAnalyzer.analyzeSortByColumns( : getSortByColumnsFromProperties(tblProperties), table); : } These are used in only one place, right? Maybe inline there? Line 76: } Precondition check for HBase? PS7, Line 127: private static List getSortByColumnsFromProperties( : Map tblProperties) { You may want to consider if it's better to have a getSortByColumn(Table table) function instead of this one. Just a though... http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS7, Line 539: // If the table has a 'sort.by.columns' property and the query has a 'noclustered' : // hint, we issue a warning during analysis and ignore the 'noclustered' hint. That comment doesn't seem relevant here. Remove? PS7, Line 541: !insertStmt.hasNoClusteredHint() If insertStmt.hasClusteredHint() is true doesn't that imply that hasNoClusteredHint() is false? I thought they are mutually exclusive. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS7, Line 1793: params.sort_by_columns.size() > 0 use isEmpty() http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 526: Test for HBase and Kudu? Line 1054: } Same here (HBase + Kudu). You need to put all Kudu tests in a function and check for supported. PS7, Line 1904: most "must" typo PS7, Line 1920: // Kudu table tests are in TestCreateManagedKuduTable(). No need for that comment :) http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS7, Line 2347: // Sort by clause How about some tests with additional table options? Try changing the order in which the table options are specified, e.g. location bla sort by () http://gerrit.cloudera.org:8080/#/c/6495/7/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS7, Line 207: table select query http://gerrit.cloudera.org:8080/#/c/6495/7/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 5: (5 comments) Responses to some comments. http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to the :* table. > We could throw one if the number is wrong, but eventually the caller will h No that's fine. You already mention that this is used only for tests. http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs() > Insert hints are not allowed for HBase tables and the analysis will fail. F If we enforce it at the analysis phase, then adding a Preconditions check here is not a bad idea. http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS5, Line 10: ASC NULLS LAST > No, we cannot change it. I thought about this, but couldn't see why we'd wa I was thinking more of changing the ASC into DESC but that's fine if this is an acceptable limitation. PS5, Line 39: DISTRIBUTEDPLAN : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, PARTITION-KEYS=(year,month)] : | partitions=24 : | : 01:SORT : | order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col ASC NULLS LAST, bool_col ASC NULLS LAST : | : 00:SCAN HDFS [functional.alltypes] :partitions=24/24 files=24 size=478.45KB > They make sure that the shuffle/noshuffle hint has been observed, indicated Good point. I think we can leave them. http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: PS5, Line 1066: > They are part of the output of "describe formatted". Without them the compa It's probably because of the HMS schema used (maybe they use CHAR(N)). Anyways, no need to do something. I just wanted to make sure we don't do anything funky. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 5: (20 comments) http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 383: // any such columns of the source table. Otherwise? Will the destination table inherit the sort by columns of the source table, if any? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS5, Line 64: // Analyze 'sort.by.columns' property. Comment not applicable. Remove? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS5, Line 127: hasNoClusteredHint_ Similar to the comment on hasShuffleHint, explain the allowed values for hasClusteredHint and hasNotClusteredHint. PS5, Line 763: "if any" http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS5, Line 77: return ImmutableList.of(); This is an immutable list whereas the next function returns a mutable one. Make it consistent? PS5, Line 83: If there are any errors, 'error' contains an :* error string. Otherwise, 'error' is left untouched. remove PS5, Line 91: colName nit: rename to sortByColName? since you also have tableCols it may be good to make it explicit which ones you refer to in the for loop. Line 92: nit: remove empty line PS5, Line 102: and store : // the corresponding result expr in sortByExprs_. Not sure I follow this part based on the code below. Line 120: nit: remove empty line http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to the :* table. Can't we throw an exception instead? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs() I assume this works for HBase tables... http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: PS5, Line 521: 'sort.by.columns'='id How about 'ID' (check for case)? PS5, Line 1642: AnalyzesOk("create table tbl sort by (int_col,id) like functional.alltypes"); Also, add the negative case where sort by cols don't exist in the source table. PS5, Line 1913: // Kudu table : AnalyzesOk("create table tab (i int, x int primary key) partition by hash(x) " + : "partitions 8 sort by(i) stored as kudu"); : // Column in sortby hint must not be a Kudu primary key column. : AnalysisError("create table tab (i int, x int primary key) partition by hash(x) " + : "partitions 8 sort by(x) stored as kudu", "SORT BY column list must not " + : "contain primary key column: 'x'"); : AnalysisError("create table tab (i int, x int, y int, primary key (x)) partition " + : "by hash(x) partitions 8 sort by(x) stored as kudu", "SORT BY column list must " + : "not contain primary key column: 'x'"); : AnalysisError("create table tab (i int, x int, y int, primary key(x, y)) partition " + : "by hash(y) partitions 8 sort by (i,x) stored as kudu", "SORT BY column list " + : "must not contain primary key column: 'x'"); : } For Kudu specific tests they need to be in a function that first calls TestUtils.assumeKuduIsSupported() (see L1930). Do you have tests for HBase? http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS5, Line 162: create table t sort by Add one with a partitioned table? Also, a CTAS with a more complex select (joins, maybe aggs) http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS5, Line 10: ASC NULLS LAST Out of curiosity? Can we change the default behavior? If no, are we ok with this restriction? PS5, Line 39: DISTRIBUTEDPLAN : WRITE TO HDFS
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts, : so the SORT node will contain all partitioning columns first, followed : by the SORT BY columns. > in general, the commit message shouldn't act as documentation. for things l That's fine by me. Feel free to simplify the description here and move it to the JIRA. I just wanted to understand the implemented semantics instead of trying to infer them from the code. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (1 comment) Lars, a few high level questions before I start the next review iteration. Thanks http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts, : so the SORT node will contain all partitioning columns first, followed : by the SORT BY columns. Can you expand the comment a little bit on the semantics of this? A few high level questions are: 1. What is the connection between this and the sortby() hints? Can I use both? What if there is a conflict? Same for clustered hint. 2. Why does sort by enables clustering as well? 3. Can I specify any table columns (even the partitioning columns) in the SORT BY clause or are there some restrictions? 4. I suppose this works for all table types (including Kudu)? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 2: (20 comments) Initial pass, haven't looked at the tests yet. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 986: // SORT BY columns are stored in the 'sort.by.columns' table property. : HashMapproperties = new HashMap (); : String columnString = Joiner.on(",").join(col_names); : properties.put(TablePropertyAnalyzer.TBL_PROP_SORT_BY_COLUMNS, columnString); : RESULT = new AlterTableSetTblProperties(table, null, TTablePropertyType.TBL_PROPERTY, : properties); : :} I don't see why you need to treat it as another SetTblProperties statement. I understand that eventually the sort by columns will be stored as table properties but this adds unnecessary complexity to the parser. I would simply create a new AlterTableSetSortByColumns statement. PS2, Line 1143: opt_sort_by_cols:sort_by_cols : KW_LIKE table_name:other_table : opt_comment_val:comment : file_format_create_table_val:file_format location_val:location nit: indentation PS2, Line 1182: opt_sort_by_cols:sort_by_cols opt_comment_val:comment row_format_val:row_format serde_properties:serde_props nit: long line http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS2, Line 162: /** :* Analyzes the 'sort.by.columns' property to make sure the columns inside the property :* occur in the target table and are neither partitioning nor primary key columns. :*/ : public void analyzeSortByColumns() throws AnalysisException { : StringBuilder error = new StringBuilder(); : TablePropertyAnalyzer.analyzeSortByColumns(tblProperties_, getTargetTable(), error); : if (error.length() > 0) throw new AnalysisException(error.toString()); : } Along the lines of my previous comment: a) I would move this completely out of this class, b) create a new AlterTableSetSortByColumns and move the analysis logic there. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS2, Line 275: Boolean boolean? why class? PS2, Line 2486: Preconditions.checkState(!globalState_.warningsRetrieved); This is kind of harsh :). Do we guarantee somewhere that no one will call addWarning after warnings have been retrieved? http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: PS2, Line 119: sortByColumns_.size() > 0 !sortByColumns_.isEmpty() PS2, Line 179: StringBuilder error = new StringBuilder(); : TablePropertyAnalyzer.analyzeSortByColumns(sortByColumns_, srcTable, error); : if (error.length() > 0) throw new AnalysisException(error.toString()); Why don't you just throw the AnalysisException from the analyzeSortByColumns()? http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS2, Line 382: sortByExprs_.size() > 0 !sortByExprs_.isEmpty() PS2, Line 510: sortByExprs_.add(resultExprs_.get(colIdx)); Does this work ok if you have column permutation? PS2, Line 768: StringBuilder error = new StringBuilder(); : sortByColumns_ = TablePropertyAnalyzer.analyzeSortByColumns(table_, error); : if (error.length() > 0) throw new AnalysisException(error.toString()); Same comment as before. Move the throw clause inside the analyzeSortByColumns() function. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS2, Line 33: /** : * This class collects various methods to analyze table properties. : */ : public class TablePropertyAnalyzer { You're only analyzing the sort by columns, so why this generic class? Can't we move these methods to the TableDef class? PS2, Line 41: Analyzes the 'sort.by.columns' property of 'table'. What is the return value? PS2, Line 86: List partitionCols, List primaryKeyCols You can simply pass one list which includes the list of column names to exclude from the sort columns. If you want a specific error message you can
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 6: Code-Review+2 Keep Henry's +2 -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Hello Marcel Kornacker, Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6344 to look at the new patch set (#6). Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. IMPALA-4041: Limit catalog and admission control updates to coordinators With this commit we add the ability to limit catalog updates to a limited set of coordinator nodes. A new startup option, termed 'is_coordinator' is added to indicate if a node is a coordinator. Coordinators accept connections through HS2 and Beeswax interfaces and can also participate in query execution. Non-coordinator nodes do not receive catalog updates from the statestore, do not initialize a query scheduler and cannot accept Beeswax and HS2 client connections. Testing: - Added a custom cluster test that launches a cluster in which the number of coordinators is less than the cluster size and runs a number of smoke queries. - Successfully run exhaustive tests. Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler-test.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/service/query-exec-state.cc M be/src/testutil/in-process-servers.h M be/src/util/webserver.cc M bin/start-impala-cluster.py M tests/common/custom_cluster_test_suite.py M tests/common/impala_service.py A tests/custom_cluster/test_coordinators.py M www/root.tmpl 19 files changed, 435 insertions(+), 235 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/6 -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 5: (6 comments) Thanks Henry! http://gerrit.cloudera.org:8080/#/c/6344/5/be/src/service/impala-server.cc File be/src/service/impala-server.cc: PS5, Line 180: node can coordinate queries. > maybe tighten this up: Done http://gerrit.cloudera.org:8080/#/c/6344/5/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 949: > comment? Done http://gerrit.cloudera.org:8080/#/c/6344/5/be/src/service/query-exec-state.cc File be/src/service/query-exec-state.cc: PS5, Line 461: if (!FLAGS_disable_admission_control) { > again, it seems better to check to see if admission_controller() != nullptr Done http://gerrit.cloudera.org:8080/#/c/6344/5/be/src/util/webserver.cc File be/src/util/webserver.cc: PS5, Line 192: FLAGS_is_coordinator > it would be neater to do something like: Done PS5, Line 192: Worker > nit: dunno how much we want to bikeshed this, but I prefer "Executor" to "W Renamed "worker" to "executor". Also, "coordinator" now is "coordinator + executor". http://gerrit.cloudera.org:8080/#/c/6344/5/www/root.tmpl File www/root.tmpl: Line 21: Impala Server Mode > This template gets used for catalog and statestored as well. You need to ch Good point, forgot about it. Fixed it. -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java 14 files changed, 648 insertions(+), 304 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/3 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/6406/2/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS2, Line 64: message(${JAVA_FE_ARGS}) > Looks like debug printing. Remove? (below too) Why? http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: PS2, Line 67: storageIdtoInt > (Unrelated to your change). Looks obsolete. Done http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 119: FlatBufferBuilder fbb = new FlatBufferBuilder(1); > Preconditions.checkState(isBlockMdSupported(fs))? Done PS2, Line 124: } else { : locations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen()); : } > Do we really need this? Per my understanding, we only fetch LocatedFileStat If fileStatus is not an instance of LocatedFileStatus we don't have the block locations. E.g. during refresh. Line 141: ListMap hostIndex) { > Preconditions.checkState(!isBlockMdSupported(fs)) We don't have the filesystem here. PS2, Line 425: fbFileBlock_.replicaHostIdxs(replicaIdx); > - replicaHostIdxs is [ushort] as per fbs definition in which case idx shoul In Java there is no unsigned short. Hence, ushort is stored as in int in flatbuffer java-generated code. I like the two masks as it makes the intention more explicit. Line 463: public static class FileDescStats { > For supportability purposes, I'm wondering if we should track the remote_ne That's a good point. I plan on augmenting this in a separate patch along with other useful stats. http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: PS2, Line 344: = > nit: spacing Done Line 365: private static String findCommonPrefix(String a, String b) { > Use Strings.commonPrefix() from guava? Good point. Didn't know this existed. Line 1933: Collections.sort(orderedFds, new Comparator() { > Can we just make FileDescriptor implement Comparator and ad How are we going to call decompressFileName that references a structure (file name prefixes) in HdfsTable if we do that? http://gerrit.cloudera.org:8080/#/c/6406/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 93: protected boolean storedInImpaladCatalogCache_ = false; > What is the motivation to track this? A quick look at the places it is used To avoid initializing data structures that aren't used in the catalog. Tests also need to access this because of the different way we have for loading metadata. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 4: (10 comments) I am not so sure about removing the catalog and in-flight queries handles. They provide a way of visually validating that worker nodes do not receive catalog updates and that they don't schedule queries. http://gerrit.cloudera.org:8080/#/c/6344/4/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 259 > This line is effectively removed, right? It was added in: https://gerrit.cl After inspecting the code and also talking to Lars, I believe this removal is safe. This change has the following effect. The scheduler has to wait for the statestore update that includes the local backend in order to add it to the backend config. Before that, the backend_config could be empty and scheduling should fail unless exec_at_coord is true. We have tests for these cases. http://gerrit.cloudera.org:8080/#/c/6344/4/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1588: LOG(WARNING) << "Failed to convert hostname " << hostname << " to IP address"; > log status? Done PS4, Line 1601: address > nit: descriptor Done PS4, Line 1604: } : if (status.ok()) { > else { Done PS4, Line 1909: NULL > nullptr Done PS4, Line 1931: NULL > nullptr Done http://gerrit.cloudera.org:8080/#/c/6344/4/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 436: // it to the list of known backends and tell the statestore by adding it to > this function isn't telling the statestore anything, it's only adding to to Done PS4, Line 438: In > nit: should be To, not In Done http://gerrit.cloudera.org:8080/#/c/6344/4/bin/start-impala-cluster.py File bin/start-impala-cluster.py: PS4, Line 40: 3 > is there any way to make this default == the value of '-s'? If I run -s1 ri Hm, not sure how to do that. I removed the check. http://gerrit.cloudera.org:8080/#/c/6344/4/tests/custom_cluster/test_coordinators.py File tests/custom_cluster/test_coordinators.py: Line 83: client2.close() > can you add a check that the worker has not received the catalog metadata? Done -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has uploaded a new patch set (#4). Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. IMPALA-4041: Limit catalog and admission control updates to coordinators With this commit we add the ability to limit catalog updates to a limited set of coordinator nodes. A new startup option, termed 'is_coordinator' is added to indicate if a node is a coordinator. Coordinators accept connections through HS2 and Beeswax interfaces and can also participate in query execution. Non-coordinator nodes do not receive catalog updates from the statestore, do not initialize a query scheduler and cannot accept Beeswax and HS2 client connections. Testing: - Added a custom cluster test that launches a cluster in which the number of coordinators is less than the cluster size and runs a number of smoke queries. - Successfully run exhaustive tests. Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/service/query-exec-state.cc M be/src/testutil/in-process-servers.h M bin/start-impala-cluster.py M tests/common/custom_cluster_test_suite.py M tests/common/impala_service.py A tests/custom_cluster/test_coordinators.py 16 files changed, 365 insertions(+), 198 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/4 -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 3: (12 comments) http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: PS3, Line 180: if (FLAGS_disable_admission_control) LOG(INFO) << "Admission control is disabled."; : if (!FLAGS_disable_admission_control) { > if (!FLAGS_disable_admission_control) { Done http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: Line 200: /// Registers the request queue topic. > ... with the statestore. Done http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/scheduling/scheduler-test-util.cc File be/src/scheduling/scheduler-test-util.cc: Line 457: return scheduler_->ComputeScanRangeAssignment(*scheduler_->GetBackendConfig(), 0, nullptr, > long line Done http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 336: std::vector* topic_updates) { > remove std:: Done Line 343: std::vector* topic_updates) { > remove std:: Done PS3, Line 1489: Otherwise, this operation : // is performed by the Scheduler. > Can you just ensure that this is always done by this callback, and remove t Done Line 1579: const std::string local_backend_id = exec_env_->subscriber()->id(); > remove std:: Done Line 1589: LOG(WARNING) << "Failed to convert hostname " << hostname << " to IP address"; > but keep going? Done Line 1592: subscriber_topic_updates->push_back(TTopicDelta()); > emplace_back() Done Line 1595: update.topic_entries.push_back(TTopicItem()); > emplace_back() Done http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 436: // backends. If not, add it and tell the statestore. Even though the same > 'and tell the statestore'? doesn't something else do that part? Done http://gerrit.cloudera.org:8080/#/c/6344/3/be/src/testutil/in-process-servers.h File be/src/testutil/in-process-servers.h: PS3, Line 95: Not owned by this class; : /// instead it's owned via shared_ptrs > Ownership is shared now. Done -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. IMPALA-4041: Limit catalog and admission control updates to coordinators With this commit we add the ability to limit catalog updates to a limited set of coordinator nodes. A new startup option, termed 'is_coordinator' is added to indicate if a node is a coordinator. Coordinators accept connections through HS2 and Beeswax interfaces and can also participate in query execution. Non-coordinator nodes do not receive catalog updates from the statestore, do not initialize a query scheduler and cannot accept Beeswax and HS2 client connections. Testing: - Added a custom cluster test that launches a cluster in which the number of coordinators is less than the cluster size and runs a number of smoke queries. - Successfully run exhaustive tests. Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/service/query-exec-state.cc M be/src/testutil/in-process-servers.h M bin/start-impala-cluster.py M tests/common/custom_cluster_test_suite.py M tests/common/impala_service.py A tests/custom_cluster/test_coordinators.py 16 files changed, 363 insertions(+), 172 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/3 -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 2: (10 comments) http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: Line 183: request_pool_service_.get(), metrics_.get(), backend_address_)); > no need to pass in backend_address_, you can get that with ExecEnv::GetInst Hm, I am not sure this is possible. This is the ExecEnv constructor, is it safe to call ExecEnv::GetInstance() when AdmissionController and Scheduler are initialized here? Maybe I am missing something. Line 228: if (FLAGS_use_statestore && statestore_port > 0) { > let's please not file a jira for something this small, the time involved in There is already a TODO. Done Line 349: if (admission_controller_ != NULL) RETURN_IF_ERROR(admission_controller_->Init()); > replace all NULL w/ nullptr in this file (and any other file you touch wher Done http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: Line 38: DECLARE_bool(is_coordinator); > unused Done http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/scheduling/admission-controller.h File be/src/scheduling/admission-controller.h: Line 210: /// Subscription manager > owned? Done http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 341: bind(mem_fn(::CatalogUpdateCallback), this, _1, _2); > use lambda, unless impractical. you can also inline that in the call. Done Line 1484: // Check if the local backend descriptor is in the list of known > this function is already too long. break this up. Previously, it was the scheduler's responsibility to update the list of backends in the statestore. Given that the scheduler may not start in all the nodes, we need this update to happen elsewhere. I've extracted this into a separate function which is invoked only for non-coordinator nodes. Line 1903: // Only for coordinators, initialize the HS2 and Beeswax services. > rephrase to fix grammar Done http://gerrit.cloudera.org:8080/#/c/6344/2/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 955: /// Impala server is a coordinator (dictated from the is_coordinator flag). > "indicated by the .. flag" Done http://gerrit.cloudera.org:8080/#/c/6344/2/tests/custom_cluster/test_coordinators.py File tests/custom_cluster/test_coordinators.py: Line 42: client = worker.service.create_beeswax_client() > same for hs2? Done -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Benchmark: Memory requirement for storing an HDFS table with 250K files is reduced by 2.5X. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java 14 files changed, 654 insertions(+), 301 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/2 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 1: (15 comments) http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG Commit Message: Line 11: serialization library. > Any benchmark numbers? Just curious. Also I think it is good to add them he Done http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS1, Line 24: > Trailing white spaces at multiple places in this file. Done Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b) > Can this be moved outside the loop like JAVA_FE_ARGS Done http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: PS1, Line 20: > extraneous white spaces. Done PS1, Line 54: file_name_prefix_idx > Didn't understand what this prefix means till I read HdfsTable.fileNamePref Done Line 70: root_type FbFileDesc; > What is the significance of this? I'm reading about FlatBuffers for the fir Yeah, it is mostly used if you read from JSON. Removed. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 83: storageIdGenerator.put(host, new Short(diskId)); > Drive-by comment: using 'new' here is pessimising memory consumption. If yo Good point thanks. Done http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: PS1, Line 318: REPLICA_HOST_IDX_MASK > Looks like it used for UNMASKing, rename may be? It's a bit mask, so I'd rather keep the name as such :) Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF; > Document these? Done PS1, Line 357: getHostIdx > how about just making hostIdx_ a short rather than typecasting everytime? Done PS1, Line 360: REPLICA_HOST_CACHE_MASK > Shouldn't this be short rather than int? May be it works fine this way, jus For convenience. Java short is signed. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 315: Preconditions.checkNotNull(fd); > I don't think this FileDescriptor.create() can ever return null? If yes, wo Done Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' and returns a > I like the idea but I'm wondering if this kind of optimization is a little That's not entirely true. Multiple filenames from a single write do have a common prefix. Line 420: prefixCompressFileName(currentFileName, prevFileName); > Also I think it may not be optimal to construct the prefixes this way by co In principal you're right, this is not the best way to compress strings. However, due to the way these file names are generated (random) this is not meant to be a generic file name compressor. Here we take advantage of the pattern that file names of a single write exhibit to avoid storing the common prefix multiple times. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 96: protected static EnumSet SUPPORTED_TABLE_TYPES = EnumSet.of( > May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by Not sure how this ended up here. Removed. -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6408/2/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: PS2, Line 188: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + FILE_PATH, :INSERTED_VALUE) You can import: from tests.util.filesystem_utils import IS_S3 and then do here if IS_S3: -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. IMPALA-4041: Limit catalog and admission control updates to coordinators With this commit we add the ability to limit catalog and admission control updates to a limited set of coordinator nodes. A new startup option, termed 'is_coordinator' is added to indicate if a node is a coordinator. Coordinators accept connections through HS2 and Beeswax interfaces and can also participate in query execution. Non-coordinator nodes do not receive catalog and admission control updates from the statestore and cannot access client connections. Testing: - Added a custom cluster test that launches a cluster in which the number of coordinators is less than the cluster size and runs a number of smoke queries. - Successfully run exhaustive tests. Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/service/query-exec-state.cc M be/src/testutil/in-process-servers.h M bin/start-impala-cluster.py M tests/common/custom_cluster_test_suite.py A tests/custom_cluster/test_coordinators.py 14 files changed, 287 insertions(+), 129 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/2 -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/6344/1/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: Line 229: Status AdmissionController::Init(StatestoreSubscriber* subscriber) { > Yeah, we need to at least execute the 'AddPoolUpdates' part of the callback Done. Extracted the AdmissionController out of the Scheduler. Line 230: if (!FLAGS_is_coordinator) return Status::OK(); > This shouldn't even get called if is_coordinator is false. I think you shou Done http://gerrit.cloudera.org:8080/#/c/6344/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1888: if (be_port != 0 && be_server != NULL) { > move this above line 1844, then return if this isn't a coordinator. Done http://gerrit.cloudera.org:8080/#/c/6344/1/be/src/service/impala-server.h File be/src/service/impala-server.h: Line 948: /// If beeswax_port != 0 (and fe_server != NULL), creates a ThriftServer exporting > update comments with new semantics. Done http://gerrit.cloudera.org:8080/#/c/6344/1/bin/start-impala-cluster.py File bin/start-impala-cluster.py: Line 246: args = "-is_coordinator=true %s" % (args) > leave this out. it's the default, and we should make sure it actually works Done http://gerrit.cloudera.org:8080/#/c/6344/1/tests/custom_cluster/test_coordinators.py File tests/custom_cluster/test_coordinators.py: Line 76: self.execute_query_expect_success(client1, "drop database %s cascade" % db_name) > why don't you add running a query against impalads[2] (and failing), and th Done -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 1: Code-Review+2 In terms of testing, you may run a private jenkins jobs to test your uncommitted changes against s3 using impala-auxiliary-tests/jenkins/run_impala_tests.py. -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/6406 Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java 14 files changed, 653 insertions(+), 298 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/1 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] Fix typo in Flatbuffers cmake module
Dimitris Tsirogiannis has posted comments on this change. Change subject: Fix typo in Flatbuffers cmake module .. Patch Set 1: Code-Review+2 Thanks Henry -- To view, visit http://gerrit.cloudera.org:8080/6398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0786344b5485a92c02a246b543b6acda279e199c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4482, IMPALA-4838: RECOVER PARTITIONS with tpcds.store sales
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4482, IMPALA-4838: RECOVER PARTITIONS with tpcds.store_sales .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaae97d1d44201aeeacacdd39adbae35753512950 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David KnuppGerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Harrison Sheinblatt Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4041: Limit catalog and admission control updates to coordinators
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/6344 Change subject: IMPALA-4041: Limit catalog and admission control updates to coordinators .. IMPALA-4041: Limit catalog and admission control updates to coordinators With this commit we add the ability to limit catalog and admission control updates to a limited set of coordinator nodes. A new startup option, termed 'is_coordinator' is added to indicate if a node is a coordinator. Coordinators accept connections through HS2 and Beeswax interfaces and can also participate in query execution. Non-coordinator nodes do not receive catalog and admission control updates from the statestore and cannot access client connections. Testing: - Added a custom cluster test that launches a cluster in which the number of coordinators is less than the cluster size and runs a number of smoke queries. - Successfully run exhaustive tests. Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 --- M be/src/scheduling/admission-controller.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/impalad-main.cc M be/src/testutil/in-process-servers.h M bin/start-impala-cluster.py M tests/common/custom_cluster_test_suite.py A tests/custom_cluster/test_coordinators.py 8 files changed, 144 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/6344/1 -- To view, visit http://gerrit.cloudera.org:8080/6344 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I5f2c74abdbcd60ac050efa323616bd41182ceff3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5028: Lock table in /catalog objects endpoint.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5028: Lock table in /catalog_objects endpoint. .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ad4ce286b8cc169f29b8ddfa215f8949b1c11ff Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 4: Code-Review+2 Rebase and keep Henry's +2 -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Hello Henry Robinson, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6013 to look at the new patch set (#4). Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. IMPALA-4885: Expose Jvm thread info in web UI This commit exposes information about JVM threads to the impalad and catalogd web UIs. This information includes statistics about the number of threads running in the JVM as well as per-thread stacktraces, monitors and synchronizers. Total CPU, user CPU and blocked time is also reported per thread. Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f --- M be/src/catalog/catalogd-main.cc M be/src/service/impalad-main.cc M be/src/statestore/statestored-main.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/JniUtil.java A www/jvm-threadz.tmpl M www/threadz.tmpl A www/threadz_tabs.tmpl 12 files changed, 402 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/4 -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/6013/3/be/src/util/thread.cc File be/src/util/thread.cc: PS3, Line 79: args > long line Done PS3, Line 340: void InitThreading() { : DCHECK(thread_manager.get() == nullptr); : thread_manager.reset(new ThreadMgr()); : } : : Status StartThreadInstrumentation(MetricGroup* metrics, Webserver* webserver, : bool include_jvm_threads) { : DCHECK(metrics != nullptr); : DCHECK(webserver != nullptr); : RETURN_IF_ERROR(thread_manager->StartInstrumentation(metrics)); : RegisterUrlCallbacks(include_jvm_threads, webserver); : return Status::OK(); : } > If you move these to the end of the file, does that avoid any of the need t Yes, we don't need to forward declare RegisterUrlCallbacks. Done PS3, Line 379: args > long line Done -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5028: Lock table in /catalog objects endpoint.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5028: Lock table in /catalog_objects endpoint. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6296/1/tests/webserver/test_web_pages.py File tests/webserver/test_web_pages.py: PS1, Line 141: # IMPALA-5028: Test toThrift() of a partitioned table via the WebUI code path. : self.get_and_check_status(self.CATALOG_OBJECT_URL + : "?object_type=TABLE_name=functional.alltypes", "alltypes", : without_ss=True) The error was thrown when the table was loaded. So I am wondering if we're getting the expected coverage with this tests. -- To view, visit http://gerrit.cloudera.org:8080/6296 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3ad4ce286b8cc169f29b8ddfa215f8949b1c11ff Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS2, Line 2661: updateLastDdlTime(msTbl, msClient); > If we get some kind of exception during a later round of processing, it is Unfortunately, the whole operation is far from being atomic but I see your point :) -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 3: Any more comments on this? Henry, lmk if you're swamped and I will ask someone else to take a look. -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6275/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS2, Line 2661: updateLastDdlTime(msTbl, msClient); Do we need to call this for every batch? I believe it is sufficient to call it once at the end. http://gerrit.cloudera.org:8080/#/c/6275/2/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: PS2, Line 190: for i in xrange(1, 700): : PART_DIR = "part%d\t" % i : assert not self.has_value(PART_DIR, result.data) This check may be an overkill since no partitions were added to this table. Why don't you simply verify that show partitions doesn't return any rows? -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4762: RECOVER PARTITIONS should batch partition updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4762: RECOVER PARTITIONS should batch partition updates .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/6275/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS1, Line 2641: // ifNotExists and needResults are true. nit: move comment above L2644 PS1, Line 2646: } catch (AlreadyExistsException e) { : throw new InternalException( : "AlreadyExistsException thrown although ifNotExists given", e); : } I don't think we need to handle this case separately. Let the outer catch handle it. Besides, we're not doing anything special here. http://gerrit.cloudera.org:8080/#/c/6275/1/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: PS1, Line 171: test_recover_many_partitions This test may be too expensive. How much time does it take to run? See my comments below for ideas on how to trim it a bit. PS1, Line 182: # Preload 10% of the partitions with partition(i) = values(i) : for i in xrange(1, 700, 10): : self.execute_query_expect_success(self.client, : "INSERT INTO TABLE %s PARTITION(s='part%d') VALUES(%d)" % (FQ_TBL_NAME, i, i)) I think we can live without this part. PS1, Line 193: self.filesystem_client.create_file(TBL_LOCATION + PART_DIR + FILE_PATH, :INSERTED_VALUE) It should be sufficient to just create the directory, no? PS1, Line 196: result = self.execute_query_expect_success(self.client, : "SHOW PARTITIONS %s" % FQ_TBL_NAME) : for i in xrange(1, 700): : PART_DIR = "part%d\t" % i : if (i % 10 == 1): : assert self.has_value(PART_DIR, result.data) : else: : assert not self.has_value(PART_DIR, result.data) : : self.execute_query_expect_success(self.client, : "ALTER TABLE %s RECOVER PARTITIONS" % FQ_TBL_NAME) : result = self.execute_query_expect_success(self.client, : "select c from %s" % FQ_TBL_NAME) : for i in xrange(1, 700): : if (i % 10 == 1): : assert self.has_value(str(i), result.data) : else: : assert self.has_value(str(-i), result.data) I would just do: a) show partitions and verify 0, b) recover partitions, c) show partitions and verify 700. -- To view, visit http://gerrit.cloudera.org:8080/6275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7f9334051b11ba8fa16159b7ca67ddc7f2392733 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4966: Add flatbuffers to build
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4966: Add flatbuffers to build .. Patch Set 2: Code-Review+2 Rebase, keep Alex's +2 -- To view, visit http://gerrit.cloudera.org:8080/6180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4998: Fix missing table lock acquisition.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4998: Fix missing table lock acquisition. .. Patch Set 5: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/6177/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: PS5, Line 934: tbl instanceof IncompleteTable Not related to this change but I believe we load the metadata twice if we call refresh on an incomplete table, once during the getOrLoadTable() call and once here. If you agree, let's file a jira about it. Let me know.. -- To view, visit http://gerrit.cloudera.org:8080/6177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0e0270daf59fce95f1a1520fc5aaf91d3a7b99fe Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. IMPALA-4885: Expose Jvm thread info in web UI This commit exposes information about JVM threads to the impalad and catalogd web UIs. This information includes statistics about the number of threads running in the JVM as well as per-thread stacktraces, monitors and synchronizers. Total CPU, user CPU and blocked time is also reported per thread. Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f --- M be/src/catalog/catalogd-main.cc M be/src/service/impalad-main.cc M be/src/statestore/statestored-main.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/JniUtil.java A www/jvm-threadz.tmpl M www/threadz.tmpl A www/threadz_tabs.tmpl 12 files changed, 404 insertions(+), 78 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/3 -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/6013/2/be/src/util/thread.cc File be/src/util/thread.cc: PS2, Line 75: void JvmThreadsUrlCallback(const Webserver::ArgumentMap& args, Document* doc); : : void ThreadOverviewUrlCallback(bool track_jvm_threads, const Webserver::ArgumentMap& args, : Document* document); : : void RegisterUrlCallbacks(bool track_jvm_threads, Webserver* webserver); : : void GetJvmThreadOverview(Document* document); > put all these in an anonymous namespace to avoid polluting the global symbo Done PS2, Line 216: NULL > nullptr, here and everywhere Done PS2, Line 379: DCHECK(document != NULL); > why would document be null? Good point. Removed from here and elsewhere. PS2, Line 384: GetJvmThreadOverview > move this into ThreadOverviewUrlCallback Done PS2, Line 392: LOG(WARNING) << "Couldn't retrieve information about JVM threads: " : << status.GetDetail(); > But - put this in the "error" member of document, and it should show on the Done PS2, Line 393: << status.GetDetail(); > formatting Done http://gerrit.cloudera.org:8080/#/c/6013/2/be/src/util/thread.h File be/src/util/thread.h: PS2, Line 194: track_jvm_threads > include_jvm_threads ? Done http://gerrit.cloudera.org:8080/#/c/6013/2/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: Line 758: // Information about JVM threads > comment when this might not be set? Done http://gerrit.cloudera.org:8080/#/c/6013/2/fe/src/main/java/org/apache/impala/common/JniUtil.java File fe/src/main/java/org/apache/impala/common/JniUtil.java: PS2, Line 206: threadCount > just write response.setTotal_thread_count(threadBean.getThreadCount()) ? Done http://gerrit.cloudera.org:8080/#/c/6013/2/www/threadz.tmpl File www/threadz.tmpl: PS2, Line 37: # > nit: is there only one jvm-threads entry? If so, this should be the ? opera Hm. Doesn't work. For some reason it doesn't populate it when I replace it with the ? operator. http://gerrit.cloudera.org:8080/#/c/6013/2/www/threadz_tabs.tmpl File www/threadz_tabs.tmpl: Line 27: > nit: trailing whitespace Done -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4966: Add flatbuffers to build
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/6180 Change subject: IMPALA-4966: Add flatbuffers to build .. IMPALA-4966: Add flatbuffers to build FlatBuffers version 1.6.0 is already included in the toolchain. This commit adds it to the build system. Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0 --- M CMakeLists.txt M bin/bootstrap_toolchain.py M bin/impala-config.sh A cmake_modules/FindFlatBuffers.cmake 4 files changed, 57 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/6180/1 -- To view, visit http://gerrit.cloudera.org:8080/6180 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I2ca255ddf08ac846b454bfa1470ed67b1338d2b0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4902: Copy parameters map in HdfsPartition.toThrift().
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4902: Copy parameters map in HdfsPartition.toThrift(). .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic11277ad5512d2431cd3cc791715917c95395ddf Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4902: Copy parameters map in HdfsPartition.toThrift().
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4902: Copy parameters map in HdfsPartition.toThrift(). .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6127/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: PS2, Line 765: One thread : // may try to serialize the returned THdfsPartition after releasing the table's lock, : // and another thread doing DDL may modify the map. Are we sure this is needed? When do we call HdfsPartition.toThrift() outside of HdfsTable.toThrift() which is now protected? http://gerrit.cloudera.org:8080/#/c/6127/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS2, Line 26: import java.util.concurrent.locks.ReentrantReadWriteLock; I don't think you need that. -- To view, visit http://gerrit.cloudera.org:8080/6127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic11277ad5512d2431cd3cc791715917c95395ddf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 1: (10 comments) Thanks for the suggestions Henry. New screenshots are here: https://drive.google.com/file/d/0B7VW0hRyzVlKTzZPR3VEVlVEYWc/view?usp=sharing https://drive.google.com/file/d/0B7VW0hRyzVlKOXowTUM2QWVySm8/view?usp=sharing http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.cc File be/src/util/thread.cc: Line 178: void ThreadOverviewUrlCallback(bool track_jvm_threads, > Shouldn't this go above ThreadGroupUrlCallback() to match with the sample j No, this is the output of the overview callback. PS1, Line 194: [this, track_jvm_threads] (const Webserver::ArgumentMap& args, Document* doc) { : this->ThreadOverviewUrlCallback(track_jvm_threads, args, doc) > Make this a method? MakeUrlCallBack or something, used multiple times. This has changed. PS1, Line 209: RegisterUrlCallback > Rather than adding another top-level link, consider making two tabs on the Done PS1, Line 234: bool track_jvm_threads, > how about just making this a member of ThreadMgr? Actually - I don't think Done Line 260: << status.GetDetail(); > return, then remove else { } Done PS1, Line 262: jvmThreadsVal > C++ naming styles power of habit.. :) Done PS1, Line 327: INFO > warn/error? How about bubbling this error up to the web endpoint by setting Changed it to WARN. What is the "error" member? http://gerrit.cloudera.org:8080/#/c/6013/1/be/src/util/thread.h File be/src/util/thread.h: Line 191: /// the "thread-manager." > Describe track_jvm_threads in the method comment? Done http://gerrit.cloudera.org:8080/#/c/6013/1/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: Line 720: // Summary of a JVM thread. Includes stacktraces, locked monitors > Do you think its good to include the locked synchronizers or the lock it is It already does that. http://gerrit.cloudera.org:8080/#/c/6013/1/www/jvm-threadz.tmpl File www/jvm-threadz.tmpl: PS1, Line 35: Is n > Native Done -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. IMPALA-4885: Expose Jvm thread info in web UI This commit exposes information about JVM threads to the impalad and catalogd web UIs. This information includes statistics about the number of threads running in the JVM as well as per-thread stacktraces, monitors and synchronizers. Total CPU, user CPU and blocked time is also reported per thread. Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f --- M be/src/catalog/catalogd-main.cc M be/src/service/impalad-main.cc M be/src/statestore/statestored-main.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/JniUtil.java A www/jvm-threadz.tmpl M www/threadz.tmpl A www/threadz_tabs.tmpl 12 files changed, 401 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/2 -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson
[Impala-ASF-CR] IMPALA-4840: Fix REFRESH performance regression.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4840: Fix REFRESH performance regression. .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I859b9fe93563ba886d0b5db6db42a14c88caada8 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 1: A publicly available screenshot: https://drive.google.com/file/d/0B7VW0hRyzVlKUGdWakNIdlhtbTQ/view?usp=sharing -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet. .. Patch Set 2: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/5980/2/fe/src/main/java/org/apache/impala/util/DisjointSet.java File fe/src/main/java/org/apache/impala/util/DisjointSet.java: PS2, Line 49: private final IdentityHashMapuniqueSets_ = : new IdentityHashMap (); nice idea :) PS2, Line 111: checkState(removedValue == DUMMY_VALUE); I think checking for not NULL is also ok. -- To view, visit http://gerrit.cloudera.org:8080/5980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. Patch Set 1: A screenshot of the new web page with the JVM thread info is here: https://cloudera.box.com/s/agu113d1rxzfainy860f6oqlizp4v92v -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4885: Expose Jvm thread info in web UI
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/6013 Change subject: IMPALA-4885: Expose Jvm thread info in web UI .. IMPALA-4885: Expose Jvm thread info in web UI This commit exposes information about JVM threads to the impalad and catalogd web UIs. This information includes statistics about the number of threads running in the JVM as well as per-thread stacktraces, monitors and synchronizers. Total CPU, user CPU and blocked time is also reported per thread. Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f --- M be/src/catalog/catalogd-main.cc M be/src/service/impalad-main.cc M be/src/statestore/statestored-main.cc M be/src/util/jni-util.cc M be/src/util/jni-util.h M be/src/util/thread.cc M be/src/util/thread.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/common/JniUtil.java A www/jvm-threadz.tmpl M www/threadz.tmpl 11 files changed, 312 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/6013/1 -- To view, visit http://gerrit.cloudera.org:8080/6013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Id497043ab33dcf107a562f0b1ccd5c46095d397f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4897: AnalysisException: specified cache pool does not exist
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4897: AnalysisException: specified cache pool does not exist .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/5972/1/testdata/workloads/functional-query/queries/QueryTest/alter-table-hdfs-caching.test File testdata/workloads/functional-query/queries/QueryTest/alter-table-hdfs-caching.test: PS1, Line 1: Can you add a comment to describe at a high level what this test file is all about? -- To view, visit http://gerrit.cloudera.org:8080/5972 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefe61556bc28ae320f3f41fdc930d37b258d970a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4897: AnalysisException: specified cache pool does not exist
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4897: AnalysisException: specified cache pool does not exist .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5972/1/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: PS1, Line 258: if IS_HDFS: : self.run_test_case('QueryTest/alter-table-hdfs-caching', vector, : use_db=unique_database, multiple_impalad=self._use_multiple_impalad(vector)) I think it may be best to put it in test_hdfs_caching.py? What do you think? -- To view, visit http://gerrit.cloudera.org:8080/5972 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iefe61556bc28ae320f3f41fdc930d37b258d970a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] Fix merge conflict
Dimitris Tsirogiannis has posted comments on this change. Change subject: Fix merge conflict .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia5444d6b44b9aeea18f7861849513a2bde5c881f Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dan HechtGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI .. Patch Set 2: Code-Review+2 Here is a screenshot from my dev box for the catalog/memz https://cloudera.box.com/s/uug5z4zpkfqhhkgxrrejskjvc9l2hh2u It's the same for the impalad/memz as well. -- To view, visit http://gerrit.cloudera.org:8080/5909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI .. Patch Set 2: Code-Review+1 Keep Alex's +1 -- To view, visit http://gerrit.cloudera.org:8080/5909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5909 to look at the new patch set (#2). Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI .. IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI This commit adds heap and non-heap memory usage of the embedded JVM in the memory metrics and exposes these metrics in /memz web page of the impalad and catalog web UI. Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8 --- M be/src/util/default-path-handlers.cc M fe/src/main/java/org/apache/impala/common/JniUtil.java M www/memz.tmpl 3 files changed, 110 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/5909/2 -- To view, visit http://gerrit.cloudera.org:8080/5909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/5909/1/be/src/util/default-path-handlers.cc File be/src/util/default-path-handlers.cc: Line 163: document->AddMember("jvm_pool", total, document->GetAllocator()); > jvm_total Done http://gerrit.cloudera.org:8080/#/c/5909/1/www/memz.tmpl File www/memz.tmpl: Line 92: JVM total pool memory usage > It somewhat confused me, so might be better to omit. Done -- To view, visit http://gerrit.cloudera.org:8080/5909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5909/1/www/memz.tmpl File www/memz.tmpl: Line 92: JVM total pool memory usage > What's the significance of the word "pool" here? Merely to indicate that we aggregate across all memory pools. Let me know if you prefer to remove it. -- To view, visit http://gerrit.cloudera.org:8080/5909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/5909 Change subject: IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI .. IMPALA-4884: Add JVM heap and non-heap usage in metrics and UI This commit adds heap and non-heap memory usage of the embedded JVM in the memory metrics and exposes these metrics in /memz web page of the impalad and catalog web UI. Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8 --- M be/src/util/default-path-handlers.cc M fe/src/main/java/org/apache/impala/common/JniUtil.java M www/memz.tmpl 3 files changed, 110 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/5909/1 -- To view, visit http://gerrit.cloudera.org:8080/5909 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I543d4d428d7240e0f710d67973867162f2fcabc8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION .. Patch Set 24: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 24 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION .. Patch Set 23: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-1670,IMPALA-4141: Support multiple partitions in ALTER TABLE ADD PARTITION .. Patch Set 23: Yes, looking at it now. -- To view, visit http://gerrit.cloudera.org:8080/4144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iddbc951f2931f488f7048c9780260f6b49100750 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila JegesGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1279: uncacheTable(removedDb.getTable(tableName)); > Agreed. Might be worth adding a brief comment in one of the existing JIRAs Done -- To view, visit http://gerrit.cloudera.org:8080/5815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] CDH-48291: Fix flaky test TestRequestPoolService
Dimitris Tsirogiannis has abandoned this change. Change subject: CDH-48291: Fix flaky test TestRequestPoolService .. Abandoned MJ posted a fix for this. -- To view, visit http://gerrit.cloudera.org:8080/5507 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I71a036d5b1227508e2175808d7e92837ae247d99 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables
Dimitris Tsirogiannis has uploaded a new patch set (#2). Change subject: IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables .. IMPALA-2518: DROP DATABASE CASCADE removes cache directives of tables This commit fixes an issue where the DROP DATABASE CASCADE statement will not remove the cache directives of the underlying tables and their partitions. Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java M tests/query_test/test_hdfs_caching.py 3 files changed, 66 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/5815/2 -- To view, visit http://gerrit.cloudera.org:8080/5815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of tables .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/5815/1//COMMIT_MSG Commit Message: Line 7: IMPALA-2518: DROP DATABASE CASCADE doesn't remove cache directives of > describe the fix not the bug: Done http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: Line 1279: uncacheTable(removedDb.getTable(tableName)); > It's a little tricky to figure out, but I think we may have to lock these t This is quite tricky. Keep in mind that we perform these operations on the deleted table objects. So the question is whether we correctly handle dropping a table and at the same time altering it. The answer I believe is no. The former operation uses the metastoreDdlLock_ while the latter is protected by the table lock. In theory, and assuming we had proper hierarchical locking, no concurrent access should be allowed on these tables as it would require, in the least, a read lock on the db to be dropped. We could acquire the lock on the dropped tables but, I'd rather not do it in this patch. http://gerrit.cloudera.org:8080/#/c/5815/1/fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java File fe/src/main/java/org/apache/impala/util/HdfsCachingUtil.java: Line 163: org.apache.hadoop.hive.metastore.api.Partition part) throws ImpalaException { > tab Done http://gerrit.cloudera.org:8080/#/c/5815/1/tests/query_test/test_hdfs_caching.py File tests/query_test/test_hdfs_caching.py: Line 212: """The DROP DATABASE CASCADE should properly drop all impacted cache directives > Remove "The" Done Line 213:IMPALA-2518""" > use IMPALA-2518 as a prefix to comment (like we typically do) Done Line 215: # Creates `cachedb` database with some cached tables and partitions > Populates the 'cachedb' database... Done -- To view, visit http://gerrit.cloudera.org:8080/5815 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83ef5a33e06728c2b3f833a0309d9da64dce7b88 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
Dimitris Tsirogiannis has posted comments on this change. Change subject: [DOCS] Major update to Impala + Kudu page .. Patch Set 13: John, when do you plan to post a new patch that addresses the last comments? I think after that, we should be able to +2 it. -- To view, visit http://gerrit.cloudera.org:8080/5649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
Dimitris Tsirogiannis has posted comments on this change. Change subject: [DOCS] Major update to Impala + Kudu page .. Patch Set 13: Code-Review+1 (5 comments) I'll let MJ and/or Todd make a final pass. http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_create_table.xml File docs/topics/impala_create_table.xml: PS13, Line 85: [PRIMARY KEY (col_name[, ...]] This doesn't belong here. PS13, Line 244: CREATE TABLE [IF NOT EXISTS] db_name.]table_name : [COMMENT 'table_comment'] : STORED AS KUDU : [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] : AS : select_statement This is not correct. You're missing the PRIMARY KEY and PARTITION BY clauses. PS13, Line 411: impala::username.kudu_t1 nit: I don't think this is a good example. I would be really surprised if someone actually created a Kudu table outside of Impala named as "impala::username". Pls change the table name. http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_kudu.xml File docs/topics/impala_kudu.xml: PS13, Line 93: kudu.master_addresses That's the name of the table property. Can you verify that name of the impalad configuration param? PS13, Line 148: There is a many-to-many relationship : between the tablets and tablet servers, managed automatically by Kudu. Isn't more clear to say that each tablet server can store multiple tablets and that tablets are replicated across different tablet servers? -- To view, visit http://gerrit.cloudera.org:8080/5649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4789: Fix slow metadata loading due to inconsistent paths.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4789: Fix slow metadata loading due to inconsistent paths. .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5743 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4789: Fix slow metadata loading due to inconsistent paths.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4789: Fix slow metadata loading due to inconsistent paths. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5743/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: PS4, Line 454: private static boolean nullOrEqual(Object a, Object b) { : if (a == null && b == null) return true; : if (a != null && b != null) return a.equals(b); : return false; : } Use Guava's Objects.equal()? I think it has the same semantics as this one. -- To view, visit http://gerrit.cloudera.org:8080/5743 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8c881b7cb155032b82fba0e29350ca31de388d55 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-HasComments: Yes