[Impala-ASF-CR] IMPALA-7540. Intern most repetitive strings and network addresses in catalog
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11158 ) Change subject: IMPALA-7540. Intern most repetitive strings and network addresses in catalog .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1902/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib3121aefa4391bcb1477d9dba0a49440d7000d26 Gerrit-Change-Number: 11158 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Comment-Date: Sat, 26 Jan 2019 06:36:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7540. Intern most repetitive strings and network addresses in catalog
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11158 ) Change subject: IMPALA-7540. Intern most repetitive strings and network addresses in catalog .. Patch Set 3: (12 comments) Bharath, applied your review comments. Please take another look to see if this is ready to go. http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java File fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java: http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@62 PS2, Line 62:*/ > nit: for each method, should we list the set of fields that we selectively Documentation of this kind tends to get out of date quickly. Looking at the code, it seems clear enough what is being interned. Not sure that maintaining a list of fields is worth the effort. http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@88 PS2, Line 88: public static void internFieldsInPlace(TTable table) { > null check for table Done http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@113 PS2, Line 113:*/ > null check for sd. Done http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@131 PS2, Line 131:* Intern low-cardinality fields in the given FieldSchema. > null check Done http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@155 PS2, Line 155: not like 'impala_%' group by PARAM_VALUE order by count(*) desc limit 100; : // : // In a large catalog from a production install, these represented about 68% of the : // entries. : String val = e.getValue(); : if (val.isEmpty() || : "-1".equals(val) || > Can we create a static HashSet out of these (lowercased) and use something Looks like only five of the values would be in the table. The empty string and the prefix string would still be handled specially. Given only five entries, the above, though clunky, is simpler than, and probably performs as well as, a hash table. http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@172 PS2, Line 172: STRING_INTERNER.int > Maybe move everything to guava interner to be consistent? Native interning Done http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@174 PS2, Line 174: > Preconditions.checkState(ret.size() == parameters.size())? Done http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@188 PS2, Line 188: > can we rename it to something like internString() that we can reuse for all Done http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@198 PS2, Line 198: if (value == null) return null; > This looks risky if someone updates the thrift def for TNetworkAddress :-) The risk is only if fields are added. If fields are removed, this code won't compile and will thus call attention to itself. The note says that not all fields are stubbed out. So, yes, will lead to odd bugs if someone alters one of these after interning. But, given the semantics of metadata, we probably should not be altering objects cached in metadata anyway because they are used by multiple planner threads. http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java File fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java: http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java@35 PS2, Line 35: columnQualifier > just curious, any reason why not this? According to this link: https://www.dummies.com/programming/big-data/hadoop/column-qualifiers-in-the-hbase-data-model/ "Unlike column families, column qualifiers can be virtually unlimited in content, length and number. If you omit the column qualifier, the HBase system will assign one for you. Printable characters are not required, so any type and number of bytes can be used to create a column qualifier." So, looks like the qualifier is high cardinality, low reuse. http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@853 PS2, Line 853: _ = id; : accessLevel_ = accessLevel; > Shouldn't we do it after we remove the incremental stats ? (or) blacklist i Done
[Impala-ASF-CR] IMPALA-7540. Intern most repetitive strings and network addresses in catalog
Paul Rogers has uploaded a new patch set (#3) to the change originally created by Todd Lipcon. ( http://gerrit.cloudera.org:8080/11158 ) Change subject: IMPALA-7540. Intern most repetitive strings and network addresses in catalog .. IMPALA-7540. Intern most repetitive strings and network addresses in catalog This adds interning to a bunch of repeated strings in catalog objects, including: - table name - DB name - owner - column names - input/output formats - parameter keys - common parameter values ("true", "false", etc) - HBase column family names Additionally, it interns TNetworkAddresses, so that each datanode host is only stored once rather than having its own copy in each table. I verified this patch using jxray on the development catalogd and impalad. The following lines are removed entirely from the "duplicate strings" report: Overhead # char[]s # objects Value 164K (0.3%) 2,635 2,635 "127.0.0.1" 97K (0.2%) 1,038 1,038 "__HIVE_DEFAULT_PARTITION__" 95K (0.2%) 1,111 1,111 "transient_lastDdlTime" 92K (0.1%) 1,975 1,975 "d" 70K (0.1%) 997 997"EXTERNAL_TABLE" 56K (< 0.1%)1,201 1,201 "todd" 54K (< 0.1%)998 998"EXTERNAL" 46K (< 0.1%)998 998"TRUE" 44K (< 0.1%)567 567"numFilesErasureCoded" 38K (< 0.1%)612 612"totalSize" 30K (< 0.1%)567 567"numFiles" The following are reduced substantially: Before: 72K (0.1%) 1,543 1,543 "1" After: 47K (< 0.1%)1,009 1,009 "1" A few large strings remain in the report that may be worth addressing, depending on whether we think production catalogs exhibit the same repetitions: 1) Avro schemas, eg: 204K (0.3%) 3 3 "{"fields": [{"type": ["boolean", "null"], "name": "bool_col1"}, {"type": ["int", "null"], "name": "tinyint_col1"}, {"type": ...[length 52429]" (in the development catalog there are multiple tables with the same Avro schema) 2) Partition location suffixes, eg: 144K (0.2%) 1,234 1,234 "many_blocks_num_blocks_per_partition_1" 17K (< 0.1%)230 230"year=2009/month=2" 17K (< 0.1%)230 230"year=2009/month=3" 17K (< 0.1%)230 230"year=2009/month=1" (in the development catalog lots of tables have the same partitioning layout) 3) Unsure (jxray isn't reporting the reference chain, but seems likely to be partition values): 49K (< 0.1%)1,058 1,058 "2010" 28K (< 0.1%)612 612"2009" 27K (< 0.1%)585 585"0" 22K (< 0.1%)71 899"" Change-Id: Ib3121aefa4391bcb1477d9dba0a49440d7000d26 --- A fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java M fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java 5 files changed, 250 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/11158/3 -- To view, visit http://gerrit.cloudera.org:8080/11158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib3121aefa4391bcb1477d9dba0a49440d7000d26 Gerrit-Change-Number: 11158 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. IMPALA-7941: part 2/2: use cgroups memory limit This uses the functionality from part 1 to detect the CGroups memory limit and use it to set a lower process memory limit if needed. min(system memory, cgroups memory limit) is used instead of system memory to determine the memory limit. Behaviour of processes without a memory limit set via CGroups is unchanged. The default behaviour of using 80% of the memory limit detected is still in effect. This seems like an OK default, but may lead to some amount of wasted memory. Modify containers to have a default JVM heap size of 2GB and --mem_limit_includes_jvm, so that the automatically configured memory limit makes more sense. start-impala-cluster.py is modified to exercise all of this. Testing: Tested a containerised cluster manually on my system, which has 32GB of RAM. Here's the breakdown from the memz/ page showing the JVM heap and auto-configured memory limit. Process: Limit=7.31 GB Total=1.94 GB Peak=1.94 GB JVM: max heap size: Total=1.78 GB JVM: non-heap committed: Total=35.56 MB Buffer Pool: Free Buffers: Total=0 Buffer Pool: Clean Pages: Total=0 Buffer Pool: Unused Reservation: Total=0 Control Service Queue: Limit=50.00 MB Total=0 Peak=0 Data Stream Service Queue: Limit=374.27 MB Total=0 Peak=0 Data Stream Manager Early RPCs: Total=0 Peak=0 TCMalloc Overhead: Total=12.20 MB Untracked Memory: Total=121.31 MB Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Reviewed-on: http://gerrit.cloudera.org:8080/12262 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M bin/start-impala-cluster.py M docker/coord_exec/Dockerfile M docker/coordinator/Dockerfile M docker/daemon_entrypoint.sh M docker/executor/Dockerfile 7 files changed, 126 insertions(+), 75 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 04:40:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465 PS5, Line 465: remaining_scan_range_issuances_ This feels a bit of a misnomer as parquet scanner will actually issue more scan ranges although it does so without calling AddDiskIoRanges(). http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79 PS5, Line 79: auto remaining = remaining_scan_range_issuances_.Load(); : if (remaining != 0) { : LOG(INFO) << "XXX Assertion Failed " << remaining << " " << GetStackTrace(); : } DCHECK_EQ(remaining_scan_range_issuances_.Load(), 0) << GetStackTrace(); http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@212 PS5, Line 212: thread_state_.Close(this); I wonder if you can DCHECK remaining_scan_range_issuances_ to 0 after thread_state_.Close() as it seems to wait for all scaner threads to exit. However, I can imagine that we can potentially reach this point due to cancellation or reaching the limit so not all header ranges for the sequence files may have been handled yet ? -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 26 Jan 2019 03:50:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12277 ) Change subject: IMPALA-8062: Call impala-config in single_node_perf_run .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1901/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba Gerrit-Change-Number: 12277 Gerrit-PatchSet: 2 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 03:45:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1900/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 14 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 03:32:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/12277 ) Change subject: IMPALA-8062: Call impala-config in single_node_perf_run .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/12277/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12277/1//COMMIT_MSG@11 PS1, Line 11: variables are set properly. > How did you test? I just ran it on my development machine and checked that it completed as usual. http://gerrit.cloudera.org:8080/#/c/12277/1/bin/single_node_perf_run.py File bin/single_node_perf_run.py: http://gerrit.cloudera.org:8080/#/c/12277/1/bin/single_node_perf_run.py@84 PS1, Line 84: if type(cmd) is list: > Might not be a major issue in practice, but it would be better run to run t Done -- To view, visit http://gerrit.cloudera.org:8080/12277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba Gerrit-Change-Number: 12277 Gerrit-PatchSet: 2 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 03:16:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12277 to look at the new patch set (#2). Change subject: IMPALA-8062: Call impala-config in single_node_perf_run .. IMPALA-8062: Call impala-config in single_node_perf_run This wraps most shell calls in single_node_perf_run.py with a bash shell that first sources impala-config.sh, to make sure environment variables are set properly. Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba --- M bin/single_node_perf_run.py 1 file changed, 20 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/12277/2 -- To view, visit http://gerrit.cloudera.org:8080/12277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba Gerrit-Change-Number: 12277 Gerrit-PatchSet: 2 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12277 ) Change subject: IMPALA-8062: Call impala-config in single_node_perf_run .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1898/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba Gerrit-Change-Number: 12277 Gerrit-PatchSet: 1 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 02:43:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 13: (3 comments) Thanks for the review, please see my inline comments and PS14 http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/coordinator.cc@789 PS13, Line 789: // TODO: Move to host profiles? > This seems like the right thing to do. Maybe file a JIRA to track? We could Filed IMPALA-8126 http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc@151 PS13, Line 151: CpuUserPercentage > Should we add a System prefix to these or similar? Otherwise it could be in How about Host* or Node*? With Sys we have something like SysCpuSysPercentage or SystemCpuSysPercentage. I picked one but am happy to change it to something else, too. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419 PS11, Line 419: Samples : /// are not re-sampled into larger intervals, instead owners of this profile can call : /// ClearChunkedTimeSeriesCounters() to reset the sample buffers of all chunked time : /// series counters > I hadn't thought about this, but I think it's worth doing, just so that we Done -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 13 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 02:45:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Hello Michael Ho, Philip Zeyliger, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12069 to look at the new patch set (#14). Change subject: IMPALA-7694: Add host resource usage metrics to profile .. IMPALA-7694: Add host resource usage metrics to profile This change adds a mechanism to collect host resource usage metrics to profiles. Metric collection can be controlled through a new query option 'RESOURCE_TRACE_RATIO'. It specifies the probability with which metrics collection will be enabled. Collection always happens per query for all executors that run one or more fragment instances of the query. This mechanism adds a new time series counter class that collects all measured values and does not re-sample them. It will re-sample values when printing them into a string profile, preserving up to 64 values, but Thrift profiles will contain the full list of values. We add a new section "Per Node Profiles" to the profile to store and show these values: Per Node Profiles: lv-desktop:22000: CpuIoWaitPercentage (500.000ms): 0, 0 CpuSysPercentage (500.000ms): 1, 1 CpuUserPercentage (500.000ms): 4, 0 - ScratchBytesRead: 0 - ScratchBytesWritten: 0 - ScratchFileUsedBytes: 0 - ScratchReads: 0 (0) - ScratchWrites: 0 (0) - TotalEncryptionTime: 0.000ns - TotalReadBlockTime: 0.000ns This change also uses the aforementioned mechanism to collect CPU usage metrics (user, system, and IO wait time). A future change can then add a tool to decode a Thrift profile and plot the contained usage metrics, e.g. using matplotlib (IMPALA-8123). Such a tool is not included in this change because it will require some reworking of the python dependencies. This change also includes a few minor improvements to make the resulting code more readable: - Extend the PeriodicCounterUpdater to call functions to update global metrics before updating the counters. - Expose the scratch profile within the per node resource usage section. - Improve documentation of the profile counter classes. - Remove synchronization from StreamingSampler. - Remove a few pieces of dead code that otherwise would have required updates. - Factor some code for profile decoding into the Impala python library Testing: This change contains a unit test for the system level metrics collection and e2e tests for the profile changes. Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-state.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/CMakeLists.txt M be/src/util/periodic-counter-updater.cc M be/src/util/periodic-counter-updater.h M be/src/util/pretty-printer.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/streaming-sampler.h A be/src/util/system-state-info-test.cc A be/src/util/system-state-info.cc A be/src/util/system-state-info.h M bin/parse-thrift-profile.py M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Metrics.thrift M common/thrift/RuntimeProfile.thrift A lib/python/impala_py_lib/profiles.py M tests/beeswax/impala_beeswax.py M tests/common/impala_test_suite.py M tests/query_test/test_observability.py 36 files changed, 1,155 insertions(+), 243 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/12069/14 -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 14 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1896/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Sat, 26 Jan 2019 02:28:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7450. Set thread name during refresh/load operations
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11228 ) Change subject: IMPALA-7450. Set thread name during refresh/load operations .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1899/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/11228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7c850d6bb2eedc375ee567c19eb17add335f60c Gerrit-Change-Number: 11228 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 26 Jan 2019 02:38:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7450. Set thread name during refresh/load operations
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11228 ) Change subject: IMPALA-7450. Set thread name during refresh/load operations .. Patch Set 2: (5 comments) Applied code review comments. Added a test case. Bharath, can you give this a review and see if it is ready to go? http://gerrit.cloudera.org:8080/#/c/11228/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11228/1//COMMIT_MSG@25 PS1, Line 25: by jstacking a catalogd while performing some workload. Also added a > k, I'll add a simple test. I think just doing it single-threaded and assert Created an automated, synchronized test. http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/catalog/TableLoader.java File fe/src/main/java/org/apache/impala/catalog/TableLoader.java: http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@105 PS1, Line 105: sw.elapsedTime(TimeUnit.MILLISECONDS) > Can this be rolled into ThreadNameAnnotator (or InstrumentedScope) and log Could, but requires passing the logger into the annotator so the the log message is attributed to the proper class. Also, would have to pass in the log text. Actually, seems simpler to do it as shown here: the worker thread has much more context than the annotator does. http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java File fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java: http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java@26 PS1, Line 26: * reading a jstack. For example, when making calls to external services which might > It's obvious, but perhaps mention that this isn't thread-safe. Added a bit more explanation. http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java@51 PS1, Line 51: thr_.setName(newName_); > nit: I think this could fit on one line? Done http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java@51 PS1, Line 51: .setName(newName_); > When is this possible? (Thread.currentThread() != thr_) Same question. Since the annotator sits in the executing thread itself (not outside), I can't think of a way that the thread can replace itself. In fact, the only way this could happen is if someone made the mistake of setting the thread name in one thread, and calling close in another. Changed to use Preconditions to check for this error case. -- To view, visit http://gerrit.cloudera.org:8080/11228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7c850d6bb2eedc375ee567c19eb17add335f60c Gerrit-Change-Number: 11228 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 26 Jan 2019 02:10:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12118 ) Change subject: IMPALA-7970 : Add support for metastore event based automatic invalidate .. Patch Set 25: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1897/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 Gerrit-Change-Number: 12118 Gerrit-PatchSet: 25 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 26 Jan 2019 02:17:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7450. Set thread name during refresh/load operations
Paul Rogers has uploaded a new patch set (#2) to the change originally created by Todd Lipcon. ( http://gerrit.cloudera.org:8080/11228 ) Change subject: IMPALA-7450. Set thread name during refresh/load operations .. IMPALA-7450. Set thread name during refresh/load operations This adds a small utility class for annotating the current thread's name during potentially long-running operations such as refresh/load. With this change, jstack output now includes useful thread names like: During startup: "main [invalidating metadata - 128/428 dbs complete]" While loading a fresh table: "pool-4-thread-12 [Loading metadata for: foo_db.foo_table] [Loading metadata for all partition(s) of foo_db.foo_table]" Pool refreshing metadata for a particular path: "pool-23-thread-5 [Refreshing file metadata for path: hdfs://nameservice1/path/to/partdir..." This patch is tricky to automate tests for, but I verified it manually by jstacking a catalogd while performing some workload. Also added a simple unit test to verify the thread renaming behavior. Change-Id: Ic7c850d6bb2eedc375ee567c19eb17add335f60c --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/TableLoader.java A fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java A fe/src/test/java/org/apache/impala/util/ThreadNameAnnotatorTest.java 5 files changed, 245 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11228/2 -- To view, visit http://gerrit.cloudera.org:8080/11228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic7c850d6bb2eedc375ee567c19eb17add335f60c Gerrit-Change-Number: 11228 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12277 ) Change subject: IMPALA-8062: Call impala-config in single_node_perf_run .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/12277/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12277/1//COMMIT_MSG@11 PS1, Line 11: variables are set properly. How did you test? http://gerrit.cloudera.org:8080/#/c/12277/1/bin/single_node_perf_run.py File bin/single_node_perf_run.py: http://gerrit.cloudera.org:8080/#/c/12277/1/bin/single_node_perf_run.py@84 PS1, Line 84: cmd = " ".join(cmd) Might not be a major issue in practice, but it would be better run to run the tokens through pipes.quote() to make sure that it doesn't get re-tokenized by the shell in the wrong way. -- To view, visit http://gerrit.cloudera.org:8080/12277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba Gerrit-Change-Number: 12277 Gerrit-PatchSet: 1 Gerrit-Owner: Jim Apple Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 01:48:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run
Jim Apple has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12277 Change subject: IMPALA-8062: Call impala-config in single_node_perf_run .. IMPALA-8062: Call impala-config in single_node_perf_run This wraps most shell calls in single_node_perf_run.py with a bash shell that first sources impala-config.sh, to make sure environment variables are set properly. Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba --- M bin/single_node_perf_run.py 1 file changed, 19 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/12277/1 -- To view, visit http://gerrit.cloudera.org:8080/12277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba Gerrit-Change-Number: 12277 Gerrit-PatchSet: 1 Gerrit-Owner: Jim Apple
[Impala-ASF-CR] [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files .. Patch Set 3: I talked to Pooja about this offline and I feel like, if we can't find a good way to reliably test this automatically, it would be better to merge it and fix a known and potentially severe problem for 3.2 than to hold it back for longer. -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 3 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 01:27:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12257 ) Change subject: IMPALA-8097: mt_dop for all queries via hidden flag .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1895/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 Gerrit-Change-Number: 12257 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 01:42:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12118 ) Change subject: IMPALA-7970 : Add support for metastore event based automatic invalidate .. Patch Set 25: (4 comments) http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java: http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@426 PS25, Line 426: + "invalidated since it does not exist anymore", getFullyQualifiedTblName())); line too long (94 > 90) http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java: http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@481 PS25, Line 481: if (eventsProcessor_.getStatus() != EventProcessorStatus.ACTIVE) eventsProcessor_.start(); line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@725 PS25, Line 725: private void createDatabaseFromImpala(String dbName, String desc) throws ImpalaException { line too long (92 > 90) http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@751 PS25, Line 751: private void createTableFromImpala(String tblName, boolean isPartitioned) throws ImpalaException { line too long (100 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 Gerrit-Change-Number: 12118 Gerrit-PatchSet: 25 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Sat, 26 Jan 2019 01:41:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate
Vihang Karajgaonkar has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/12118 ) Change subject: IMPALA-7970 : Add support for metastore event based automatic invalidate .. IMPALA-7970 : Add support for metastore event based automatic invalidate This change adds support to CatalogD to poll metastore events to issue invalidate on tables automatically. It adds basic infrastructure to poll HMS notifications events at a configurable frequency using a backend config called hms_event_polling_interval_s flag. Currently, it issues invalidate at tables when it received alter events on table and partitions. It also adds tables/databases and removes tables from catalogD when it receives create_table/create_database and drop_table/drop_database events. The default value of hms_event_polling_interval_s is 0 which disables the feature. A non-zero value in seconds of this configuration can be used to enable the feature and set the polling frequency. In order to process each event atomically, this feature relies on version lock in CatalogServiceCatalog. It adds new methods in CatalogServiceCatalog which takes a write lock on version so that readers are blocked until the catalog state is updated based on the events. In case of processing events, the metastore operation is already completed and only catalog state needs to be updated. Hence we do not need to make new metastore calls while processing the events and only version lock is sufficient to serialize updates to the catalog objects based on events. This locking protocol is similar to what is done in case of DDL processing in CatalogOpExecutor except it does not need to take metastoreDdlLock since no metastore operations are needed during event processing. The change also adds a new test class to test the basic functionality for each of the event type which is supported. Note that this feature is still a work in progress and additional improvements will be done in subsequent patches. By default the feature is turned off. Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 --- M be/src/common/global-flags.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java A fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java A fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java A fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java A fe/src/main/java/org/apache/impala/catalog/events/MetastoreNotificationException.java A fe/src/main/java/org/apache/impala/catalog/events/MetastoreNotificationFetchException.java A fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java A fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java M fe/src/test/resources/postgresql-hive-site.xml.template 14 files changed, 2,458 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/12118/25 -- To view, visit http://gerrit.cloudera.org:8080/12118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 Gerrit-Change-Number: 12118 Gerrit-PatchSet: 25 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12168 ) Change subject: IMPALA-6503: Support reading complex types from ORC format files .. Patch Set 9: (10 comments) I looked through the headers and test files initially. I think this makes sense at a high level. I haven't looked at the .cc files in parallel - I think Zoltan will probably get to that before me. http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.h@175 PS9, Line 175: std::list selected_type_ids_; Mention that RowReaderOptions.includeTypes() expects a list (otherwise it's confusing why this is not a vector). Actually it's still confusing why it's not a vector but at least we can blame the ORC library for that decision :) http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@301 PS9, Line 301: VLOG_QUERY << "Add ORC column " << node->getMaximumColumnId() << " for empty tuple " This logging will be too verbose for most purposes - I'd suggest making it VLOG(3) or removing it. http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@363 PS9, Line 363: for (uint64_t id : selected_type_ids) nit: braces for multi-line if http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@395 PS9, Line 395: selected_type_ids_.size() It doesn't really matter here, but with gcc4.9.2 list::size() is actually O(n). We've been burned by that before. http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@680 PS9, Line 680: for (int i = 0; i < total_tuples; ++i) Use braces for multi-line if http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.h@19 PS9, Line 19: #ifndef IMPALA_EXEC_ORC_COLUMN_READERS_H We started using "#pragma once" instead of the traditional include guards in many places - it would be nice to switch. http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.h@61 PS9, Line 61: virtual void UpdateInputBatch(orc::ColumnVectorBatch* orc_batch) = 0; Maybe mention the relationship with ReadValue() - do you need to call it before calling ReadValue()? Maybe it would help to document how callers should use this in a class comment for OrcColumnReader? http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.cc@330 PS9, Line 330: for (OrcColumnReader* child : children_) Need braces around multi-line for loop - here and elsewhere http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.cc@331 PS9, Line 331: RETURN_IF_ERROR(child->ReadValue(row_idx_, tuple, pool)); This probably performs similarly to the previous code with the switch, but ultimately to make this performance we'd want to do batched calls that read multiple values in one try. We changed the Parquet reader to do things that way a while back and it makes it faster automatically and unlocks more optimisation opportunities. Nothing to fix here, just sharing http://gerrit.cloudera.org:8080/#/c/12168/9/tests/query_test/test_nested_types.py File tests/query_test/test_nested_types.py: http://gerrit.cloudera.org:8080/#/c/12168/9/tests/query_test/test_nested_types.py@128 PS9, Line 128: # TODO(IMPALA-6505): support predicates push down on ORC stripe statistics Maybe move this to a string argument to skip() -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 9 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sat, 26 Jan 2019 01:19:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1894/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 01:13:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Andrew Sherman has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. IMPALA-7985: Port RemoteShutdown() to KRPC. The :shutdown command is used to shutdown a remote server. The common case is that a user specifies the impalad to shutdown by specifying a host e.g. :shutdown('host100'). If a user has more than one impalad on a remote host then the form :shutdown(':') can be used to specify the port by which the imapald can be contacted. Prior to IMPALA-7985 this port was the backend port, e.g. :shutdown('host100:22000'). With IMPALA-7985 the port to use is the KRPC port, e.g. :shutdown('host100:27000'). Shutdown is implemented by making an rpc call to the target impalad. This changes the implementation of this call to use KRPC. To aid the user in finding the KRPC port, the KRPC address is added to the /backends section of the debug web page. We attempt to detect the case where :shutdown is pointed at a thrift port (like the backend port) and print an informative message. Documentation of this change will be done in IMPALA-8098. For discussion of why it was chosen to implement this change in an incompatible way, see comments in https://issues.apache.org/jira/browse/IMPALA-7985. TESTING Ran all end-to-end tests. Add a test for /backends to test_web_pages.py. In test_restart_services.py add a call to the old backend port to the test. Some expected error messages were changed in line with what KRPC returns. Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e --- M be/src/runtime/backend-client.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/service/client-request-state.cc M be/src/service/control-service.cc M be/src/service/control-service.h M be/src/service/impala-http-handler.cc M be/src/service/impala-internal-service.cc M be/src/service/impala-internal-service.h M be/src/service/impala-server.cc M be/src/service/impala-server.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift M tests/custom_cluster/test_restart_services.py M tests/webserver/test_web_pages.py M www/backends.tmpl 16 files changed, 220 insertions(+), 144 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12260/2 -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 1: (2 comments) Thanks for comments http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@70 PS1, Line 70: virtual void RemoteShutdown(const class RemoteShutdownParamsPB* req, > Brief comment Done http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@81 PS1, Line 81: Rrpc > typo? I had to stare to see this, thanks -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Sat, 26 Jan 2019 00:54:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 13: (5 comments) My main outstanding concerns are the ones that Michael raised. Another pass revealed a few more questions. http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/coordinator.cc@789 PS13, Line 789: // TODO: Move to host profiles? This seems like the right thing to do. Maybe file a JIRA to track? We could also simplify BackendState::ComputeResourceUtilization() to just use the per-backend counters instead of iterating over fragments. I think there may be some compatibility concerns about removing these - existence of the counters isn't contractual but we don't want to break useful tools if avoidable. For example, I confirmed that Cloudera Manager actually does parse the existing strings (which is a little sad, but understandable given the lack of other counters). http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc@151 PS13, Line 151: CpuUserPercentage Should we add a System prefix to these or similar? Otherwise it could be interpreted as the percentage of cpu used by this query. http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc@230 PS13, Line 230: host_profile_, query_id(), query_options().scratch_limit)); I'm actually super-happy that we get these counters now. http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419 PS11, Line 419: Samples : /// are not re-sampled into larger intervals, instead owners of this profile can call : /// ClearChunkedTimeSeriesCounters() to reset the sample buffers of all chunked time : /// series counters > Yes. If you think that's a problem, we can cap the number of samples at som I hadn't thought about this, but I think it's worth doing, just so that we can reason about the behaviour better. http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py File tests/observability/test_plot_profile_resource_usage.py: http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py@35 PS11, Line 35: > I agree that this should be done through Impyla, but it doesn't expose the I don't want to hold back good work, I can live with this I think. -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 13 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 00:48:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1893/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 00:44:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Hello Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12262 to look at the new patch set (#5). Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. IMPALA-7941: part 2/2: use cgroups memory limit This uses the functionality from part 1 to detect the CGroups memory limit and use it to set a lower process memory limit if needed. min(system memory, cgroups memory limit) is used instead of system memory to determine the memory limit. Behaviour of processes without a memory limit set via CGroups is unchanged. The default behaviour of using 80% of the memory limit detected is still in effect. This seems like an OK default, but may lead to some amount of wasted memory. Modify containers to have a default JVM heap size of 2GB and --mem_limit_includes_jvm, so that the automatically configured memory limit makes more sense. start-impala-cluster.py is modified to exercise all of this. Testing: Tested a containerised cluster manually on my system, which has 32GB of RAM. Here's the breakdown from the memz/ page showing the JVM heap and auto-configured memory limit. Process: Limit=7.31 GB Total=1.94 GB Peak=1.94 GB JVM: max heap size: Total=1.78 GB JVM: non-heap committed: Total=35.56 MB Buffer Pool: Free Buffers: Total=0 Buffer Pool: Clean Pages: Total=0 Buffer Pool: Unused Reservation: Total=0 Control Service Queue: Limit=50.00 MB Total=0 Peak=0 Data Stream Service Queue: Limit=374.27 MB Total=0 Peak=0 Data Stream Manager Early RPCs: Total=0 Peak=0 TCMalloc Overhead: Total=12.20 MB Untracked Memory: Total=121.31 MB Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M bin/start-impala-cluster.py M docker/coord_exec/Dockerfile M docker/coordinator/Dockerfile M docker/daemon_entrypoint.sh M docker/executor/Dockerfile 7 files changed, 126 insertions(+), 75 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/12262/5 -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3677/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 00:03:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 00:03:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12257 ) Change subject: IMPALA-8097: mt_dop for all queries via hidden flag .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG@13 PS3, Line 13: because these are not executable. > Would you mind elaborating for latter references ? Done http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc@206 PS3, Line 206: && state->query_options().mt_dop == 0 > Seems better to do this check before trying to acquire the thread token. Ot Done http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc@a108 PS3, Line 108: > Do you recall what the limitation was ? This check got added in a large patch with a bunch of other stuff: http://gerrit.cloudera.org:8080/4853 I think it was partially because it wasn't tested and nobody had checked that the logic generalised correctly to multiple finstances per backend. Maybe there was also a desire to do optimisations like local aggregation of filters before turning it on. It's a little weird though because we didn't allow joins for mt plans *anyway*, so there can't possibly be an mt plan with runtime filters. Also it's not ideal to toggle the query option as this point rather than during plan generation. http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@161 PS3, Line 161:* - MT_DOP > 0 is not supported for plans with base table joins or table sinks. :* Throws a NotImplementedException if plan validation fails. > Comments need updating Done -- To view, visit http://gerrit.cloudera.org:8080/12257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 Gerrit-Change-Number: 12257 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 26 Jan 2019 00:01:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag
Hello Michael Ho, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12257 to look at the new patch set (#4). Change subject: IMPALA-8097: mt_dop for all queries via hidden flag .. IMPALA-8097: mt_dop for all queries via hidden flag --unlock_mt_dop=true unlocks mt_dop for all queries including joins and inserts. This disables the parallel plans with separate join builds when running standalone, because these are not executable until IMPALA-4224 is implemented. Inserts work without modification - they were disabled because of lack of testing and the possibility for generating many small files with unpartitioned inserts - see IMPALA-8125. Testing: Add custom cluster test that exercise joins, runtime filters and inserts as a sanity check for the flag. Ran exhaustive build. Manually ran TPC-H and TPC-DS tests against a minicluster with mt_dop = 4. Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 --- M be/src/common/global-flags.cc M be/src/exec/blocking-join-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/joins_mt_dop.test A tests/custom_cluster/test_mt_dop.py 12 files changed, 121 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/12257/4 -- To view, visit http://gerrit.cloudera.org:8080/12257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 Gerrit-Change-Number: 12257 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 23:52:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc@357 PS3, Line 357: ot > nit: to Done http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc@405 PS3, Line 405: --mem_limit=" : << PrettyPrinter::PrintBytes(*bytes_limit); > this log line can be confusing to mean that the --mem_limit was set to byte I changed it to be a bit more verbose - print the bytes limit and then the literal --mem_limit string in parentheses to explain where it came from. I0125 15:48:19.728356 31957 exec-env.cc:405] Using process memory limit: 7.31 GB (--mem_limit=7848997273) -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 23:49:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 5: PS4 accidentally included a change I didn't mean to include. -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 23:50:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12244 ) Change subject: IMPALA-8092: Add an admission controller debug page .. Patch Set 3: (4 comments) The code is looking good to me. Main request is testing (which I missed first time around). Does Pooja intend to review this too? http://gerrit.cloudera.org:8080/#/c/12244/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12244/3//COMMIT_MSG@15 PS3, Line 15: - Histogram of the distribution of peak memory used by queries admitted Another thought: an important bit of the AC state is the memory available , admitted and reserved on each host - can we expose this in this patch on in a follow-on patch? http://gerrit.cloudera.org:8080/#/c/12244/3//COMMIT_MSG@21 PS3, Line 21: Missed this first time around, but can we add a test to test_web_pages that loads the /admission webpage (maybe also with a request_pool parameter and also invokes the refresh). It would be good to have just as a sanity check. It also wouldn't be a bad idea to do a manual stress test just to try and flush out any races or crashes - e.g. run a workload against a minicluster and have a script that loads the page in a loop. I did manually look through the code for anything suspicious and it seems fine, but best to be sure. http://gerrit.cloudera.org:8080/#/c/12244/3/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/12244/3/be/src/scheduling/admission-controller.cc@1095 PS3, Line 1095: using namespace rapidjson; Why not include the namespace for the whole function? http://gerrit.cloudera.org:8080/#/c/12244/3/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/12244/3/be/src/service/impala-http-handler.cc@879 PS3, Line 879: form from -- To view, visit http://gerrit.cloudera.org:8080/12244 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766 Gerrit-Change-Number: 12244 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 23:41:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Hello Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12262 to look at the new patch set (#4). Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. IMPALA-7941: part 2/2: use cgroups memory limit This uses the functionality from part 1 to detect the CGroups memory limit and use it to set a lower process memory limit if needed. min(system memory, cgroups memory limit) is used instead of system memory to determine the memory limit. Behaviour of processes without a memory limit set via CGroups is unchanged. The default behaviour of using 80% of the memory limit detected is still in effect. This seems like an OK default, but may lead to some amount of wasted memory. Modify containers to have a default JVM heap size of 2GB and --mem_limit_includes_jvm, so that the automatically configured memory limit makes more sense. start-impala-cluster.py is modified to exercise all of this. Testing: Tested a containerised cluster manually on my system, which has 32GB of RAM. Here's the breakdown from the memz/ page showing the JVM heap and auto-configured memory limit. Process: Limit=7.31 GB Total=1.94 GB Peak=1.94 GB JVM: max heap size: Total=1.78 GB JVM: non-heap committed: Total=35.56 MB Buffer Pool: Free Buffers: Total=0 Buffer Pool: Clean Pages: Total=0 Buffer Pool: Unused Reservation: Total=0 Control Service Queue: Limit=50.00 MB Total=0 Peak=0 Data Stream Service Queue: Limit=374.27 MB Total=0 Peak=0 Data Stream Manager Early RPCs: Total=0 Peak=0 TCMalloc Overhead: Total=12.20 MB Untracked Memory: Total=121.31 MB Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 --- M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M bin/start-impala-cluster.py M docker/coord_exec/Dockerfile M docker/coordinator/Dockerfile M docker/daemon_entrypoint.sh M docker/executor/Dockerfile M www/bootstrap/css/bootstrap.css M www/bootstrap/css/bootstrap.min.css 9 files changed, 133 insertions(+), 77 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/12262/4 -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8114: Deflake test breakpad.py
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12273 ) Change subject: IMPALA-8114: Deflake test_breakpad.py .. IMPALA-8114: Deflake test_breakpad.py A test failed recently in a private build and it looked like the loop in wait_for_num_processes had terminated to early. To make sure that the forked of processes that write the minidumps have actually started, we now sleep for 1 second before entering the wait loop. Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b Reviewed-on: http://gerrit.cloudera.org:8080/12273 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M tests/custom_cluster/test_breakpad.py 1 file changed, 4 insertions(+), 1 deletion(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/12273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b Gerrit-Change-Number: 12273 Gerrit-PatchSet: 3 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 25 Jan 2019 23:41:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8114: Deflake test breakpad.py
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12273 ) Change subject: IMPALA-8114: Deflake test_breakpad.py .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b Gerrit-Change-Number: 12273 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 23:41:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 3: Code-Review+1 (2 comments) Looks good to me. I talked to Bikram, and he's going to finish the review. http://gerrit.cloudera.org:8080/#/c/12262/2/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/12262/2/be/src/runtime/exec-env.cc@364 PS2, Line 364: if (use_commit_limit) { : avail_mem = MemInfo::commit_limit(); > I pulled out this logic into a function since it was getting unwieldy and a Makes sense, that is much cleaner. http://gerrit.cloudera.org:8080/#/c/12262/2/docker/daemon_entrypoint.sh File docker/daemon_entrypoint.sh: http://gerrit.cloudera.org:8080/#/c/12262/2/docker/daemon_entrypoint.sh@45 PS2, Line 45: export JAVA_TOOL_OPTIONS="-Xmx2g $JAVA_TOOL_OPTIONS" > I did see that, but I didn't want to embed test-specific logic into the con Yeah, I figured that this was already on our TODO list somewhere. -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 23:08:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1892/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 13 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 22:48:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7565: Set TAcceptQueueServer connection setup pool to be multi-threaded by default
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12249 ) Change subject: IMPALA-7565: Set TAcceptQueueServer connection_setup_pool to be multi-threaded by default .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12249 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I053120d4c3153ddbe5261acd28388be6cd191908 Gerrit-Change-Number: 12249 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 23:00:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py File tests/observability/test_plot_profile_resource_usage.py: http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py@35 PS11, Line 35: > I kind-of think we should just fix it. I agree that this should be done through Impyla, but it doesn't expose the profile format yet. This test is gone in the latest PS, since it seemed easier to add the plotting tool in a subsequent change. The remaining tests still call get_thrift_profile but that code has been there before. Would you prefer to hold this change until we updated Impyla, switch to using ImpalaHiveServer2Service directly, or move forward and update impyla before adding the plotting tool back? -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 13 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 21:57:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12257 ) Change subject: IMPALA-8097: mt_dop for all queries via hidden flag .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG@13 PS3, Line 13: because these are not executable. Would you mind elaborating for latter references ? http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc File be/src/exec/blocking-join-node.cc: http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc@206 PS3, Line 206: && state->query_options().mt_dop == 0 Seems better to do this check before trying to acquire the thread token. Otherwise, we would leak the token. http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc@a108 PS3, Line 108: Do you recall what the limitation was ? http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@161 PS3, Line 161:* - MT_DOP > 0 is not supported for plans with base table joins or table sinks. :* Throws a NotImplementedException if plan validation fails. Comments need updating -- To view, visit http://gerrit.cloudera.org:8080/12257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15 Gerrit-Change-Number: 12257 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Fri, 25 Jan 2019 21:55:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Hello Michael Ho, Philip Zeyliger, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12069 to look at the new patch set (#13). Change subject: IMPALA-7694: Add host resource usage metrics to profile .. IMPALA-7694: Add host resource usage metrics to profile This change adds a mechanism to collect host resource usage metrics to profiles. Metric collection can be controlled through a new query option 'RESOURCE_TRACE_RATIO'. It specifies the probability with which metrics collection will be enabled. Collection always happens per query for all executors that run one or more fragment instances of the query. This mechanism adds a new time series counter class that collects all measured values and does not re-sample them. It will re-sample values when printing them into a string profile, preserving up to 64 values, but Thrift profiles will contain the full list of values. We add a new section "Per Node Profiles" to the profile to store and show these values: Per Node Profiles: lv-desktop:22000: CpuIoWaitPercentage (500.000ms): 0, 0 CpuSysPercentage (500.000ms): 1, 1 CpuUserPercentage (500.000ms): 4, 0 - ScratchBytesRead: 0 - ScratchBytesWritten: 0 - ScratchFileUsedBytes: 0 - ScratchReads: 0 (0) - ScratchWrites: 0 (0) - TotalEncryptionTime: 0.000ns - TotalReadBlockTime: 0.000ns This change also uses the aforementioned mechanism to collect CPU usage metrics (user, system, and IO wait time). A future change can then add a tool to decode a Thrift profile and plot the contained usage metrics, e.g. using matplotlib (IMPALA-8123). Such a tool is not included in this change because it will require some reworking of the python dependencies. This change also includes a few minor improvements to make the resulting code more readable: - Extend the PeriodicCounterUpdater to call functions to update global metrics before updating the counters. - Expose the scratch profile within the per node resource usage section. - Improve documentation of the profile counter classes. - Remove synchronization from StreamingSampler. - Remove a few pieces of dead code that otherwise would have required updates. - Factor some code for profile decoding into the Impala python library Testing: This change contains a unit test for the system level metrics collection and e2e tests for the profile changes. Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/runtime/coordinator.cc M be/src/runtime/coordinator.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/runtime-state.cc M be/src/service/impala-server.cc M be/src/service/impala-server.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/CMakeLists.txt M be/src/util/periodic-counter-updater.cc M be/src/util/periodic-counter-updater.h M be/src/util/pretty-printer.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/streaming-sampler.h A be/src/util/system-state-info-test.cc A be/src/util/system-state-info.cc A be/src/util/system-state-info.h M bin/parse-thrift-profile.py M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/Metrics.thrift M common/thrift/RuntimeProfile.thrift A lib/python/impala_py_lib/profiles.py M tests/beeswax/impala_beeswax.py M tests/common/impala_service.py M tests/common/impala_test_suite.py M tests/query_test/test_observability.py 37 files changed, 1,117 insertions(+), 260 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/12069/13 -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 13 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 3: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc@357 PS3, Line 357: ot nit: to http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc@405 PS3, Line 405: --mem_limit=" : << PrettyPrinter::PrintBytes(*bytes_limit); this log line can be confusing to mean that the --mem_limit was set to bytes_limit. eg. --mem_limit=10GB, avail_mem = 7GB, so bytes_limit will be 7GB but the log line will say: "Using process memory limit from --mem_limit=7GB" -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 21:09:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 20:45:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. IMPALA-8058: Fallback for HBase key scan range estimation Impala supports "pushing" of HBase key range predicates to HBase so that Impala reads only rows within the target key range. The planner estimates the cardinality of such scans by sampling the rows within the range. However, we have seen cases where sampling returns rows for unknown reasons. The planner then ends up without a good cardinality estimate. (Specifically, the code does a division by zero and produces a huge estimate. See the ticket for details.) Impala appears to use the sampling strategy to compute cardinality because HBase uses generally do not gather table stats. The resulting estimates are often off by 2x or more. This is a problem in tests as it causes cardinality numbers to vary greatly from the expected values. Fortunately, tests do gather HMS stats. There may be cases where users do as well. This fix exploits that fact. This fix: * Creates a fall-back strategy that uses table cardinality from HMS and the selectivity of the key predicates to estimate cardinality when the sampling approach fails. * The fall-back strategy requires tracking the predicates used for HBase keys so that their selectivity can be applied during fall-back calculations. * Moved HBase key calculation out of the SingleNodePlanner into the HBase scan node as suggested by a "TO DO" in the code. Doing so simplified the new code. * In the spirit of IMPALA-7919, adds the key predicates to the HBase scan node in the EXPLAIN output. Testing: * Adds a query context option to disable the normal key sampling to force the use of the fall-back. Used for testing. * Adds a new set of HBase test cases that use the new feature to check plans with the fall-back approach. * Reran all existing tests. * Compared cardinality numbers for the two modes: sampling and HMS using the cardinality features of IMPALA-8021. The two approaches provide different results, but this is mostly due to the missing selectivity estimates for inequality operators. (That's a fix for another time.) Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Reviewed-on: http://gerrit.cloudera.org:8080/12192 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M common/thrift/ImpalaInternalService.thrift M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test A testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test 10 files changed, 511 insertions(+), 116 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 5 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/1891/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 25 Jan 2019 19:56:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8114: Deflake test breakpad.py
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12273 ) Change subject: IMPALA-8114: Deflake test_breakpad.py .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b Gerrit-Change-Number: 12273 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 19:54:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12244 ) Change subject: IMPALA-8092: Add an admission controller debug page .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1890/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12244 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766 Gerrit-Change-Number: 12244 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 19:48:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8114: Deflake test breakpad.py
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12273 ) Change subject: IMPALA-8114: Deflake test_breakpad.py .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3676/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b Gerrit-Change-Number: 12273 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 19:42:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12097 to look at the new patch set (#5). Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_issuances_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change adn 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M
[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12248 ) Change subject: IMPALA-8095: Detailed expression cardinality tests .. Patch Set 5: (5 comments) Just some clarifying questions. Overall makes sense to me. Since this is test-only code, we can merge this and keep improving it as and when we find new bugs or need more fixture specific improvements for writing new tests. http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java: http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@41 PS5, Line 41: See : * also {@link ExprNdvTest}, {@link CardinalityTest}. There seems to be overlap in all of these. Should we consolidate them? Others might find it confusing while figuring out where to add new tests. http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@110 PS5, Line 110: Bug Does it make sense to tag the jiras for all the bugs here? I know you raised a bunch of them, probably we should maintain an epic (cardinality mis-estimates or something) and link them there. Many of them make good fe ramp-up tasks :-) ** I know that is a lot of work, don't want that to block this patch, just some thought. http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@111 PS5, Line 111: // 2 in the shell with SHOW COLUMN STATS alltypes : //verifyTableCol(allTypes, "year", 2, 0); : // Bug: Unit test in Eclipse see the above, unit tests run from the : // command line see the below. Disabling to avoid a flaky test, : // here and below Not super clear what is happening here, clarify may be? http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java: http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java@164 PS5, Line 164: After IMPALA-7659, Impala computes a null count, :* when gathering stats, but the NDV does not include nulls (except for Boolean :* columns) if stats are computed by IMpala, but does include nulls if stats are :* computed by Hive. Haha, this is indeed bizarre. http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/common/FrontendFixture.java File fe/src/test/java/org/apache/impala/common/FrontendFixture.java: http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/common/FrontendFixture.java@164 PS5, Line 164: protected void clearTestDbs() { : for (Db testDb: testDbs_) { : catalog_.removeDb(testDb.getName()); : } : } I did something similar in the testcase patch, except that this is backed by a temporary HMS created from scratch and we totally discard the HMS instance after test. That way we don't need to fake all the HMS table structures like in the methods below. -- To view, visit http://gerrit.cloudera.org:8080/12248 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401 Gerrit-Change-Number: 12248 Gerrit-PatchSet: 5 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 19:25:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 4: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/1889/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 25 Jan 2019 19:21:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@78 PS5, Line 78: if (!(ranges_issued_barrier_.pending() != 0) && initial_ranges_issued_ && progress_.done()) { line too long (95 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 25 Jan 2019 19:16:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/12244 ) Change subject: IMPALA-8092: Add an admission controller debug page .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12244/1/www/admission_controller.tmpl File www/admission_controller.tmpl: http://gerrit.cloudera.org:8080/#/c/12244/1/www/admission_controller.tmpl@283 PS1, Line 283: > This came up because I started running a stress test and was looking at the Done. added the total admitted/rejected/timed out metrics -- To view, visit http://gerrit.cloudera.org:8080/12244 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766 Gerrit-Change-Number: 12244 Gerrit-PatchSet: 2 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 19:09:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page
Hello Pooja Nilangekar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12244 to look at the new patch set (#3). Change subject: IMPALA-8092: Add an admission controller debug page .. IMPALA-8092: Add an admission controller debug page This patch adds a new debug page "admission" that provides the following details about resource pools: - Pool configuration - Relevant pool stats - Queued Queries in order of being queued (local to the coordinator) - Running Queries (local to this coordinator) - Histogram of the distribution of peak memory used by queries admitted to the pool The aforementioned details can also be viewed for a single resource pool using a search string and are also available as a JSON object from the same http endpoint. Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766 --- M be/src/catalog/catalog-server.cc M be/src/rpc/rpc-trace.cc M be/src/runtime/coordinator.cc M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/service/impala-http-handler.cc M be/src/service/impala-http-handler.h M be/src/statestore/statestore.cc M be/src/util/default-path-handlers.cc M be/src/util/logging-support.cc M be/src/util/metrics.cc M be/src/util/thread.cc M be/src/util/webserver-test.cc M be/src/util/webserver.h A www/admission_controller.tmpl 15 files changed, 734 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/12244/3 -- To view, visit http://gerrit.cloudera.org:8080/12244 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766 Gerrit-Change-Number: 12244 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/12097/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12097/3/be/src/exec/hdfs-scan-node-base.cc@452 PS3, Line 452: SCOPED_CLEANUP({ UpdateRemainingScanRangeIssuances(-1); }); > We have MakeScopeExitTrigger in Impala, but it seems like we could just swi Seems better to be consistent, so I switched to MakeScopeExitTrigger. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 25 Jan 2019 19:05:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/12097/4/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/4/be/src/exec/hdfs-scan-node.cc@78 PS4, Line 78: if (!(ranges_issued_barrier_.pending() != 0) && initial_ranges_issued_ && progress_.done()) { line too long (95 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 4 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 25 Jan 2019 18:51:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12097 to look at the new patch set (#4). Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_issuances_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change adn 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M
[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12260 ) Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@70 PS1, Line 70: virtual void RemoteShutdown(const class RemoteShutdownParamsPB* req, Brief comment http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@81 PS1, Line 81: Rrpc typo? I guess it was already this way, I just missed it in the last review. -- To view, visit http://gerrit.cloudera.org:8080/12260 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e Gerrit-Change-Number: 12260 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Fri, 25 Jan 2019 18:23:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12179 ) Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu table .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1888/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081 Gerrit-Change-Number: 12179 Gerrit-PatchSet: 4 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 25 Jan 2019 18:21:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12244 ) Change subject: IMPALA-8092: Add an admission controller debug page .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12244/1/www/admission_controller.tmpl File www/admission_controller.tmpl: http://gerrit.cloudera.org:8080/#/c/12244/1/www/admission_controller.tmpl@283 PS1, Line 283: Queries Currently Running > That might be difficult to get right with the current metrics since we dont This came up because I started running a stress test and was looking at the page to see if queries were actually getting submitted to the pool - and it was hard to tell whether the queries had actually run without switching back over to my stress test terminal. Maybe we could expose the admitted/rejected/timed out metrics? Since in aggregate that would show the number of queries that had "exited" the admission control. -- To view, visit http://gerrit.cloudera.org:8080/12244 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766 Gerrit-Change-Number: 12244 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 17:55:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12179 ) Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu table .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/12179/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/12179/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2419 PS2, Line 2419: if (!oldTbl.getKuduTableName().equals(newKuduTableName)) { > nit: we often structure these as "if (condition) return;" to keep the inden Done http://gerrit.cloudera.org:8080/#/c/12179/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2423 PS2, Line 2423: oldMsTbl.getParameters().put(KuduTable.KEY_TABLE_NAME, newKuduTableName); > I noticed that the order changed but couldn't figure out why. Is there a pa You want to update the msTbl with the value of the new Kudu table, before passing the msTbl to validateKuduTblExists. validateKuduTblExists will take the name of the Kudu table from the msTbl using the key KuduTable.KEY_TABLE_NAME and use it to check if that Kudu table exists. However, I decided to just remove the call to validateKuduTblExists, I don't think its adding any value here because kuduClient.renameTable should guarantee the new table exists. So doing another check just adds an unnecessary RPC call to Kudu. The original run of the tests passed without this change because of IMPALA-8117. The fix for IMPALA-8117 requires some invasive changes to add proper unit testing, so I'm planning on doing it in a separate patch. -- To view, visit http://gerrit.cloudera.org:8080/12179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081 Gerrit-Change-Number: 12179 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 25 Jan 2019 17:44:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12168 ) Change subject: IMPALA-6503: Support reading complex types from ORC format files .. Patch Set 8: (12 comments) Started to look at it. Currently mostly have comments about style. I plan to look at it deeper. http://gerrit.cloudera.org:8080/#/c/12168/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12168/8//COMMIT_MSG@25 PS8, Line 25: Don’t like nit: 'They are not like', or 'They differ from' http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.h@160 PS8, Line 160: orc::RowReaderOptions row_reader_options; nit: put '_' at the end of the variable name. http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc@181 PS8, Line 181: base nit: based http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc@329 PS8, Line 329: *template_tuple = : Tuple::Create(tuple_desc.byte_size(), template_tuple_pool_.get()); nit: put curly braces around it http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc@364 PS8, Line 364: if (id >= node.getColumnId() && id <= node.getMaximumColumnId()) return true; nit: put curly braces around it http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc@681 PS8, Line 681: RETURN_IF_ERROR(AssembleCollection(*child_reader, child_batch_offset + i, : coll_value_builder)); nit: add curly braces around it http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.h File be/src/exec/orc-column-readers.h: http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.h@101 PS8, Line 101: batch_ = dynamic_cast(orc_batch); We avoid doing dynamic casts in release mode. You can do a dynamic_cast inside a DCHECK, then use static_cast here. http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.h@126 PS8, Line 126: int64_t val = batch_->data.data()[row_idx]; Maybe you could add a DCHECK that batch_ is not null. http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc@190 PS8, Line 190: *slot = TimestampValue::FromUnixTimeNanos(secs, nanos, Do we know that orc timestamps are timezone-aware or not? http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc@326 PS8, Line 326: children_[c]->UpdateInputBatch(batch_->fields[children_fields_[c]]); nit: add curly brackets around it since it is a multi-line for stmt. http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc@331 PS8, Line 331: RETURN_IF_ERROR(child->ReadValue(row_idx_, tuple, pool)); nit: add curly brackets around it http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc@395 PS8, Line 395: } else CreateChildForSlot(node, slot_desc); nit: it's against the Google style guide to have braces on only one branch of an if-stmt: https://google.github.io/styleguide/cppguide.html#Conditionals -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 8 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 25 Jan 2019 17:25:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table
Hello Lars Volker, Mike Percy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12179 to look at the new patch set (#4). Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu table .. IMPALA-7640: RENAME on managed Kudu table should rename Kudu table `ALTER TABLE managed_kudu_tbl RENAME TO new_managed_kudu_tbl` will now rename the underlying Kudu table from `managed_kudu_tbl` to `new_managed_kudu_tbl`. This is only triggered for managed tables, for external tables renames do not modify the underlying Kudu table. Testing: * Ran core tests * Updated kudu_alter.test Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 2 files changed, 52 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12179/4 -- To view, visit http://gerrit.cloudera.org:8080/12179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081 Gerrit-Change-Number: 12179 Gerrit-PatchSet: 4 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3675/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 16:40:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 16:40:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12192 ) Change subject: IMPALA-8058: Fallback for HBase key scan range estimation .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce Gerrit-Change-Number: 12192 Gerrit-PatchSet: 3 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 25 Jan 2019 16:39:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12179 ) Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu table .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1887/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081 Gerrit-Change-Number: 12179 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Fri, 25 Jan 2019 15:48:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table
Hello Lars Volker, Mike Percy, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12179 to look at the new patch set (#3). Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu table .. IMPALA-7640: RENAME on managed Kudu table should rename Kudu table `ALTER TABLE managed_kudu_tbl RENAME TO new_managed_kudu_tbl` will now rename the underlying Kudu table from `managed_kudu_tbl` to `new_managed_kudu_tbl`. This is only triggered for managed tables, for external tables renames do not modify the underlying Kudu table. Testing: * Ran core tests * Updated kudu_alter.test Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081 --- M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test 2 files changed, 55 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12179/3 -- To view, visit http://gerrit.cloudera.org:8080/12179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081 Gerrit-Change-Number: 12179 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12262 ) Change subject: IMPALA-7941: part 2/2: use cgroups memory limit .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1886/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12262 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21 Gerrit-Change-Number: 12262 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 Jan 2019 08:33:38 + Gerrit-HasComments: No