[Impala-ASF-CR] IMPALA-7540. Intern most repetitive strings and network addresses in catalog

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11158 )

Change subject: IMPALA-7540. Intern most repetitive strings and network 
addresses in catalog
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1902/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3121aefa4391bcb1477d9dba0a49440d7000d26
Gerrit-Change-Number: 11158
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Sat, 26 Jan 2019 06:36:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7540. Intern most repetitive strings and network addresses in catalog

2019-01-25 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11158 )

Change subject: IMPALA-7540. Intern most repetitive strings and network 
addresses in catalog
..


Patch Set 3:

(12 comments)

Bharath, applied your review comments. Please take another look to see if this 
is ready to go.

http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java
File fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java:

http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@62
PS2, Line 62:*/
> nit: for each method, should we list the set of fields that we selectively
Documentation of this kind tends to get out of date quickly. Looking at the 
code, it seems clear enough what is being interned. Not sure that maintaining a 
list of fields is worth the effort.


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@88
PS2, Line 88:   public static void internFieldsInPlace(TTable table) {
> null check for table
Done


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@113
PS2, Line 113:*/
> null check for sd.
Done


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@131
PS2, Line 131:* Intern low-cardinality fields in the given FieldSchema.
> null check
Done


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@155
PS2, Line 155:  not like 'impala_%' group by PARAM_VALUE order by count(*) desc 
limit 100;
 :   //
 :   // In a large catalog from a production install, these 
represented about 68% of the
 :   // entries.
 :   String val = e.getValue();
 :   if (val.isEmpty() ||
 :   "-1".equals(val) ||
> Can we create a static HashSet out of these (lowercased) and use something
Looks like only five of the values would be in the table. The empty string and 
the prefix string would still be handled specially. Given only five entries, 
the above, though clunky, is simpler than, and probably performs as well as, a 
hash table.


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@172
PS2, Line 172: STRING_INTERNER.int
> Maybe move everything to guava interner to be consistent? Native interning
Done


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@174
PS2, Line 174:
> Preconditions.checkState(ret.size() == parameters.size())?
Done


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@188
PS2, Line 188:
> can we rename it to something like internString() that we can reuse for all
Done


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java@198
PS2, Line 198: if (value == null) return null;
> This looks risky if someone updates the thrift def for TNetworkAddress :-)
The risk is only if fields are added. If fields are removed, this code won't 
compile and will thus call attention to itself.
The note says that not all fields are stubbed out. So, yes, will lead to odd 
bugs if someone alters one of these after interning.

But, given the semantics of metadata, we probably should not be altering 
objects cached in metadata anyway because they are used by multiple planner 
threads.


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java
File fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java:

http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java@35
PS2, Line 35: columnQualifier
> just curious, any reason why not this?
According to this link: 
https://www.dummies.com/programming/big-data/hadoop/column-qualifiers-in-the-hbase-data-model/

"Unlike column families, column qualifiers can be virtually unlimited in 
content, length and number. If you omit the column qualifier, the HBase system 
will assign one for you. Printable characters are not required, so any type and 
number of bytes can be used to create a column qualifier."

So, looks like the qualifier is high cardinality, low reuse.


http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11158/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@853
PS2, Line 853: _ = id;
 : accessLevel_ = accessLevel;
> Shouldn't we do it after we remove the incremental stats ? (or) blacklist i
Done



[Impala-ASF-CR] IMPALA-7540. Intern most repetitive strings and network addresses in catalog

2019-01-25 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded a new patch set (#3) to the change originally created 
by Todd Lipcon. ( http://gerrit.cloudera.org:8080/11158 )

Change subject: IMPALA-7540. Intern most repetitive strings and network 
addresses in catalog
..

IMPALA-7540. Intern most repetitive strings and network addresses in catalog

This adds interning to a bunch of repeated strings in catalog objects,
including:
- table name
- DB name
- owner
- column names
- input/output formats
- parameter keys
- common parameter values ("true", "false", etc)
- HBase column family names

Additionally, it interns TNetworkAddresses, so that each datanode host
is only stored once rather than having its own copy in each table.

I verified this patch using jxray on the development catalogd and
impalad. The following lines are removed entirely from the "duplicate
strings" report:

 Overhead   # char[]s # objects  Value
 164K (0.3%) 2,635   2,635  "127.0.0.1"
 97K (0.2%)  1,038   1,038  "__HIVE_DEFAULT_PARTITION__"
 95K (0.2%)  1,111   1,111  "transient_lastDdlTime"
 92K (0.1%)  1,975   1,975  "d"
 70K (0.1%)  997 997"EXTERNAL_TABLE"
 56K (< 0.1%)1,201   1,201  "todd"
 54K (< 0.1%)998 998"EXTERNAL"
 46K (< 0.1%)998 998"TRUE"
 44K (< 0.1%)567 567"numFilesErasureCoded"
 38K (< 0.1%)612 612"totalSize"
 30K (< 0.1%)567 567"numFiles"

The following are reduced substantially:

Before: 72K (0.1%)  1,543   1,543  "1"
After:  47K (< 0.1%)1,009   1,009  "1"

A few large strings remain in the report that may be worth addressing, depending
on whether we think production catalogs exhibit the same repetitions:

1) Avro schemas, eg:
 204K (0.3%) 3   3  "{"fields": [{"type": ["boolean", "null"], 
"name": "bool_col1"}, {"type": ["int", "null"], "name": "tinyint_col1"}, 
{"type": ...[length 52429]"

(in the development catalog there are multiple tables with the same Avro
schema)

2) Partition location suffixes, eg:
 144K (0.2%) 1,234   1,234  "many_blocks_num_blocks_per_partition_1"
 17K (< 0.1%)230 230"year=2009/month=2"
 17K (< 0.1%)230 230"year=2009/month=3"
 17K (< 0.1%)230 230"year=2009/month=1"

(in the development catalog lots of tables have the same partitioning
layout)

3) Unsure (jxray isn't reporting the reference chain, but seems likely
   to be partition values):
 49K (< 0.1%)1,058   1,058  "2010"
 28K (< 0.1%)612 612"2009"
 27K (< 0.1%)585 585"0"
 22K (< 0.1%)71  899""

Change-Id: Ib3121aefa4391bcb1477d9dba0a49440d7000d26
---
A fe/src/main/java/org/apache/impala/catalog/CatalogInterners.java
M fe/src/main/java/org/apache/impala/catalog/HBaseColumn.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
5 files changed, 250 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/11158/3
--
To view, visit http://gerrit.cloudera.org:8080/11158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3121aefa4391bcb1477d9dba0a49440d7000d26
Gerrit-Change-Number: 11158
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..

IMPALA-7941: part 2/2: use cgroups memory limit

This uses the functionality from part 1 to detect the CGroups memory
limit and use it to set a lower process memory limit if needed.
min(system memory, cgroups memory limit) is used instead of
system memory to determine the memory limit. Behaviour of
processes without a memory limit set via CGroups is unchanged.

The default behaviour of using 80% of the memory limit detected
is still in effect. This seems like an OK default, but may
lead to some amount of wasted memory.

Modify containers to have a default JVM heap size of 2GB and
--mem_limit_includes_jvm, so that the automatically configured
memory limit makes more sense.

start-impala-cluster.py is modified to exercise all of this.

Testing:
Tested a containerised cluster manually on my system, which has
32GB of RAM. Here's the breakdown from the memz/ page showing
the JVM heap and auto-configured memory limit.

  Process: Limit=7.31 GB Total=1.94 GB Peak=1.94 GB
JVM: max heap size: Total=1.78 GB
JVM: non-heap committed: Total=35.56 MB
Buffer Pool: Free Buffers: Total=0
Buffer Pool: Clean Pages: Total=0
Buffer Pool: Unused Reservation: Total=0
Control Service Queue: Limit=50.00 MB Total=0 Peak=0
Data Stream Service Queue: Limit=374.27 MB Total=0 Peak=0
Data Stream Manager Early RPCs: Total=0 Peak=0
TCMalloc Overhead: Total=12.20 MB
Untracked Memory: Total=121.31 MB

Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Reviewed-on: http://gerrit.cloudera.org:8080/12262
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M bin/start-impala-cluster.py
M docker/coord_exec/Dockerfile
M docker/coordinator/Dockerfile
M docker/daemon_entrypoint.sh
M docker/executor/Dockerfile
7 files changed, 126 insertions(+), 75 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 04:40:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465
PS5, Line 465: remaining_scan_range_issuances_
This feels a bit of a misnomer as parquet scanner will actually issue more scan 
ranges although it does so without calling AddDiskIoRanges().


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79
PS5, Line 79: auto remaining = remaining_scan_range_issuances_.Load();
: if (remaining != 0) {
:   LOG(INFO) << "XXX Assertion Failed " << remaining << " " << 
GetStackTrace();
: }
DCHECK_EQ(remaining_scan_range_issuances_.Load(), 0) << GetStackTrace();


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@212
PS5, Line 212: thread_state_.Close(this);
I wonder if you can DCHECK remaining_scan_range_issuances_ to 0 after 
thread_state_.Close() as it seems to wait for all scaner threads to exit. 
However, I can imagine that we can potentially reach this point due to 
cancellation or reaching the limit so not all header ranges for the sequence 
files may have been handled yet ?



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 26 Jan 2019 03:50:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12277 )

Change subject: IMPALA-8062: Call impala-config in single_node_perf_run
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1901/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba
Gerrit-Change-Number: 12277
Gerrit-PatchSet: 2
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 03:45:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12069 )

Change subject: IMPALA-7694: Add host resource usage metrics to profile
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1900/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 14
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 03:32:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run

2019-01-25 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12277 )

Change subject: IMPALA-8062: Call impala-config in single_node_perf_run
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12277/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12277/1//COMMIT_MSG@11
PS1, Line 11: variables are set properly.
> How did you test?
I just ran it on my development machine and checked that it completed as usual.


http://gerrit.cloudera.org:8080/#/c/12277/1/bin/single_node_perf_run.py
File bin/single_node_perf_run.py:

http://gerrit.cloudera.org:8080/#/c/12277/1/bin/single_node_perf_run.py@84
PS1, Line 84:   if type(cmd) is list:
> Might not be a major issue in practice, but it would be better run to run t
Done



--
To view, visit http://gerrit.cloudera.org:8080/12277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba
Gerrit-Change-Number: 12277
Gerrit-PatchSet: 2
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 03:16:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run

2019-01-25 Thread Jim Apple (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12277

to look at the new patch set (#2).

Change subject: IMPALA-8062: Call impala-config in single_node_perf_run
..

IMPALA-8062: Call impala-config in single_node_perf_run

This wraps most shell calls in single_node_perf_run.py with a bash
shell that first sources impala-config.sh, to make sure environment
variables are set properly.

Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba
---
M bin/single_node_perf_run.py
1 file changed, 20 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/12277/2
--
To view, visit http://gerrit.cloudera.org:8080/12277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba
Gerrit-Change-Number: 12277
Gerrit-PatchSet: 2
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12277 )

Change subject: IMPALA-8062: Call impala-config in single_node_perf_run
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1898/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba
Gerrit-Change-Number: 12277
Gerrit-PatchSet: 1
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 02:43:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile

2019-01-25 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12069 )

Change subject: IMPALA-7694: Add host resource usage metrics to profile
..


Patch Set 13:

(3 comments)

Thanks for the review, please see my inline comments and PS14

http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/coordinator.cc@789
PS13, Line 789:   // TODO: Move to host profiles?
> This seems like the right thing to do. Maybe file a JIRA to track? We could
Filed IMPALA-8126


http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc@151
PS13, Line 151: CpuUserPercentage
> Should we add a System prefix to these or similar? Otherwise it could be in
How about Host* or Node*? With Sys we have something like SysCpuSysPercentage 
or SystemCpuSysPercentage. I picked one but am happy to change it to something 
else, too.


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419
PS11, Line 419:  Samples
  :   /// are not re-sampled into larger intervals, instead owners 
of this profile can call
  :   /// ClearChunkedTimeSeriesCounters() to reset the sample 
buffers of all chunked time
  :   /// series counters
> I hadn't thought about this, but I think it's worth doing, just so that we
Done



--
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 13
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 02:45:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile

2019-01-25 Thread Lars Volker (Code Review)
Hello Michael Ho, Philip Zeyliger, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12069

to look at the new patch set (#14).

Change subject: IMPALA-7694: Add host resource usage metrics to profile
..

IMPALA-7694: Add host resource usage metrics to profile

This change adds a mechanism to collect host resource usage metrics to
profiles. Metric collection can be controlled through a new query option
'RESOURCE_TRACE_RATIO'. It specifies the probability with which metrics
collection will be enabled. Collection always happens per query for all
executors that run one or more fragment instances of the query.

This mechanism adds a new time series counter class that collects all
measured values and does not re-sample them. It will re-sample values
when printing them into a string profile, preserving up to 64 values,
but Thrift profiles will contain the full list of values.

We add a new section "Per Node Profiles" to the profile to store and
show these values:

Per Node Profiles:
  lv-desktop:22000:
CpuIoWaitPercentage (500.000ms): 0, 0
CpuSysPercentage (500.000ms): 1, 1
CpuUserPercentage (500.000ms): 4, 0
  - ScratchBytesRead: 0
  - ScratchBytesWritten: 0
  - ScratchFileUsedBytes: 0
  - ScratchReads: 0 (0)
  - ScratchWrites: 0 (0)
  - TotalEncryptionTime: 0.000ns
  - TotalReadBlockTime: 0.000ns

This change also uses the aforementioned mechanism to collect CPU usage
metrics (user, system, and IO wait time).

A future change can then add a tool to decode a Thrift profile and plot
the contained usage metrics, e.g. using matplotlib (IMPALA-8123). Such a
tool is not included in this change because it will require some
reworking of the python dependencies.

This change also includes a few minor improvements to make the resulting
code more readable:
- Extend the PeriodicCounterUpdater to call functions to update global
  metrics before updating the counters.
- Expose the scratch profile within the per node resource usage section.
- Improve documentation of the profile counter classes.
- Remove synchronization from StreamingSampler.
- Remove a few pieces of dead code that otherwise would have required
  updates.
- Factor some code for profile decoding into the Impala python library

Testing: This change contains a unit test for the system level metrics
collection and e2e tests for the profile changes.

Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/CMakeLists.txt
M be/src/util/periodic-counter-updater.cc
M be/src/util/periodic-counter-updater.h
M be/src/util/pretty-printer.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/streaming-sampler.h
A be/src/util/system-state-info-test.cc
A be/src/util/system-state-info.cc
A be/src/util/system-state-info.h
M bin/parse-thrift-profile.py
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Metrics.thrift
M common/thrift/RuntimeProfile.thrift
A lib/python/impala_py_lib/profiles.py
M tests/beeswax/impala_beeswax.py
M tests/common/impala_test_suite.py
M tests/query_test/test_observability.py
36 files changed, 1,155 insertions(+), 243 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/12069/14
-- 
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 14
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12260 )

Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC.
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1896/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Gerrit-Change-Number: 12260
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Sat, 26 Jan 2019 02:28:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7450. Set thread name during refresh/load operations

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11228 )

Change subject: IMPALA-7450. Set thread name during refresh/load operations
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1899/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/11228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7c850d6bb2eedc375ee567c19eb17add335f60c
Gerrit-Change-Number: 11228
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 26 Jan 2019 02:38:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7450. Set thread name during refresh/load operations

2019-01-25 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11228 )

Change subject: IMPALA-7450. Set thread name during refresh/load operations
..


Patch Set 2:

(5 comments)

Applied code review comments. Added a test case. Bharath, can you give this a 
review and see if it is ready to go?

http://gerrit.cloudera.org:8080/#/c/11228/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11228/1//COMMIT_MSG@25
PS1, Line 25: by jstacking a catalogd while performing some workload. Also 
added a
> k, I'll add a simple test. I think just doing it single-threaded and assert
Created an automated, synchronized test.


http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@105
PS1, Line 105: sw.elapsedTime(TimeUnit.MILLISECONDS)
> Can this be rolled into ThreadNameAnnotator (or InstrumentedScope) and log
Could, but requires passing the logger into the annotator so the the log 
message is attributed to the proper class. Also, would have to pass in the log 
text.

Actually, seems simpler to do it as shown here: the worker thread has much more 
context than the annotator does.


http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java
File fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java:

http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java@26
PS1, Line 26:  * reading a jstack. For example, when making calls to external 
services which might
> It's obvious, but perhaps mention that this isn't thread-safe.
Added a bit more explanation.


http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java@51
PS1, Line 51: thr_.setName(newName_);
> nit: I think this could fit on one line?
Done


http://gerrit.cloudera.org:8080/#/c/11228/1/fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java@51
PS1, Line 51: .setName(newName_);
> When is this possible? (Thread.currentThread() != thr_)
Same question. Since the annotator sits in the executing thread itself (not 
outside), I can't think of a way that the thread can replace itself. In fact, 
the only way this could happen is if someone made the mistake of setting the 
thread name in one thread, and calling close in another. Changed to use 
Preconditions to check for this error case.



--
To view, visit http://gerrit.cloudera.org:8080/11228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7c850d6bb2eedc375ee567c19eb17add335f60c
Gerrit-Change-Number: 11228
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sat, 26 Jan 2019 02:10:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
..


Patch Set 25:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1897/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 25
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Sat, 26 Jan 2019 02:17:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7450. Set thread name during refresh/load operations

2019-01-25 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded a new patch set (#2) to the change originally created 
by Todd Lipcon. ( http://gerrit.cloudera.org:8080/11228 )

Change subject: IMPALA-7450. Set thread name during refresh/load operations
..

IMPALA-7450. Set thread name during refresh/load operations

This adds a small utility class for annotating the current thread's name
during potentially long-running operations such as refresh/load. With
this change, jstack output now includes useful thread names like:

During startup:
  "main [invalidating metadata - 128/428 dbs complete]"

While loading a fresh table:
  "pool-4-thread-12 [Loading metadata for: foo_db.foo_table] [Loading
   metadata for all partition(s) of foo_db.foo_table]"

Pool refreshing metadata for a particular path:
  "pool-23-thread-5 [Refreshing file metadata for path:
   hdfs://nameservice1/path/to/partdir..."

This patch is tricky to automate tests for, but I verified it manually
by jstacking a catalogd while performing some workload. Also added a
simple unit test to verify the thread renaming behavior.

Change-Id: Ic7c850d6bb2eedc375ee567c19eb17add335f60c
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
A fe/src/main/java/org/apache/impala/util/ThreadNameAnnotator.java
A fe/src/test/java/org/apache/impala/util/ThreadNameAnnotatorTest.java
5 files changed, 245 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11228/2
--
To view, visit http://gerrit.cloudera.org:8080/11228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic7c850d6bb2eedc375ee567c19eb17add335f60c
Gerrit-Change-Number: 11228
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12277 )

Change subject: IMPALA-8062: Call impala-config in single_node_perf_run
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12277/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12277/1//COMMIT_MSG@11
PS1, Line 11: variables are set properly.
How did you test?


http://gerrit.cloudera.org:8080/#/c/12277/1/bin/single_node_perf_run.py
File bin/single_node_perf_run.py:

http://gerrit.cloudera.org:8080/#/c/12277/1/bin/single_node_perf_run.py@84
PS1, Line 84: cmd = " ".join(cmd)
Might not be a major issue in practice, but it would be better run to run the 
tokens through pipes.quote() to make sure that it doesn't get re-tokenized by 
the shell in the wrong way.



--
To view, visit http://gerrit.cloudera.org:8080/12277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba
Gerrit-Change-Number: 12277
Gerrit-PatchSet: 1
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 01:48:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8062: Call impala-config in single node perf run

2019-01-25 Thread Jim Apple (Code Review)
Jim Apple has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12277


Change subject: IMPALA-8062: Call impala-config in single_node_perf_run
..

IMPALA-8062: Call impala-config in single_node_perf_run

This wraps most shell calls in single_node_perf_run.py with a bash
shell that first sources impala-config.sh, to make sure environment
variables are set properly.

Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba
---
M bin/single_node_perf_run.py
1 file changed, 19 insertions(+), 11 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/12277/1
--
To view, visit http://gerrit.cloudera.org:8080/12277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic7c1b77906a975c37f3b51a0f900ed3536b398ba
Gerrit-Change-Number: 12277
Gerrit-PatchSet: 1
Gerrit-Owner: Jim Apple 


[Impala-ASF-CR] [WIP] IMPALA-6932: Speed up scans for sequence datasets with many files

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11517 )

Change subject: [WIP] IMPALA-6932: Speed up scans for sequence datasets with 
many files
..


Patch Set 3:

I talked to Pooja about this offline and I feel like, if we can't find a good 
way to reliably test this automatically, it would be better to merge it and fix 
a known and potentially severe problem for 3.2 than to hold it back for longer.


--
To view, visit http://gerrit.cloudera.org:8080/11517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 3
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 01:27:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12257 )

Change subject: IMPALA-8097: mt_dop for all queries via hidden flag
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1895/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Gerrit-Change-Number: 12257
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 01:42:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
..


Patch Set 25:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java:

http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java@426
PS25, Line 426: + "invalidated since it does not exist 
anymore", getFullyQualifiedTblName()));
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@481
PS25, Line 481:   if (eventsProcessor_.getStatus() != 
EventProcessorStatus.ACTIVE) eventsProcessor_.start();
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@725
PS25, Line 725:   private void createDatabaseFromImpala(String dbName, String 
desc) throws ImpalaException {
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/12118/25/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@751
PS25, Line 751:   private void createTableFromImpala(String tblName, boolean 
isPartitioned) throws ImpalaException {
line too long (100 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 25
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Sat, 26 Jan 2019 01:41:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate

2019-01-25 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#25). ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
..

IMPALA-7970 : Add support for metastore event based automatic invalidate

This change adds support to CatalogD to poll metastore events to issue
invalidate on tables automatically. It adds basic infrastructure to poll
HMS notifications events at a configurable frequency using a backend
config called hms_event_polling_interval_s flag. Currently, it issues
invalidate at tables when it received alter events on table and
partitions. It also adds tables/databases and removes tables from
catalogD when it receives create_table/create_database and
drop_table/drop_database events. The default value of
hms_event_polling_interval_s is 0 which disables the feature. A
non-zero value in seconds of this configuration can be used to enable
the feature and set the polling frequency.

In order to process each event atomically, this feature relies on
version lock in CatalogServiceCatalog. It adds new methods in
CatalogServiceCatalog which takes a write lock on version so that
readers are blocked until the catalog state is updated based on the
events. In case of processing events, the metastore operation is already
completed and only catalog state needs to be updated. Hence we do not
need to make new metastore calls while processing the events and only
version lock is sufficient to serialize updates to the catalog objects
based on events. This locking protocol is similar to what is done in
case of DDL processing in CatalogOpExecutor except it does not need to
take metastoreDdlLock since no metastore operations are needed during
event processing.

The change also adds a new test class to test the basic functionality
for each of the event type which is supported.

Note that this feature is still a work in progress and additional
improvements will be done in subsequent patches. By default the feature
is turned off.

Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/events/ExternalEventsProcessor.java
A fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventUtils.java
A 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
A 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreNotificationException.java
A 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreNotificationFetchException.java
A fe/src/main/java/org/apache/impala/catalog/events/NoOpEventProcessor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
A 
fe/src/test/java/org/apache/impala/catalog/events/SynchronousHMSEventProcessorForTests.java
M fe/src/test/resources/postgresql-hive-site.xml.template
14 files changed, 2,458 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/12118/25
--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 25
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC format files
..


Patch Set 9:

(10 comments)

I looked through the headers and test files initially. I think this makes sense 
at a high level. I haven't looked at the .cc files in parallel - I think Zoltan 
will probably get to that before me.

http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.h@175
PS9, Line 175:   std::list selected_type_ids_;
Mention that RowReaderOptions.includeTypes() expects a list (otherwise it's 
confusing why this is not a vector). Actually it's still confusing why it's not 
a vector but at least we can blame the ORC library for that decision :)


http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@301
PS9, Line 301: VLOG_QUERY << "Add ORC column " << 
node->getMaximumColumnId() << " for empty tuple "
This logging will be too verbose for most purposes - I'd suggest making it 
VLOG(3) or removing it.


http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@363
PS9, Line 363:   for (uint64_t id : selected_type_ids)
nit: braces for multi-line if


http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@395
PS9, Line 395: selected_type_ids_.size()
It doesn't really matter here, but with gcc4.9.2 list::size() is actually O(n). 
We've been burned by that before.


http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/hdfs-orc-scanner.cc@680
PS9, Line 680: for (int i = 0; i < total_tuples; ++i)
Use braces for multi-line if


http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.h@19
PS9, Line 19: #ifndef IMPALA_EXEC_ORC_COLUMN_READERS_H
We started using "#pragma once" instead of the traditional include guards in 
many places - it would be nice to switch.


http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.h@61
PS9, Line 61:   virtual void UpdateInputBatch(orc::ColumnVectorBatch* 
orc_batch) = 0;
Maybe mention the relationship with ReadValue() - do you need to call it before 
calling ReadValue()? Maybe it would help to document how callers should use 
this in a class comment for OrcColumnReader?


http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.cc@330
PS9, Line 330:   for (OrcColumnReader* child : children_)
Need braces around multi-line for loop - here and elsewhere


http://gerrit.cloudera.org:8080/#/c/12168/9/be/src/exec/orc-column-readers.cc@331
PS9, Line 331: RETURN_IF_ERROR(child->ReadValue(row_idx_, tuple, pool));
This probably performs similarly to the previous code with the switch, but 
ultimately to make this performance we'd want to do batched calls that read 
multiple values in one try.

We changed the Parquet reader to do things that way a while back and it makes 
it faster automatically and unlocks more optimisation opportunities.

Nothing to fix here, just sharing


http://gerrit.cloudera.org:8080/#/c/12168/9/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

http://gerrit.cloudera.org:8080/#/c/12168/9/tests/query_test/test_nested_types.py@128
PS9, Line 128: # TODO(IMPALA-6505): support predicates push down on ORC 
stripe statistics
Maybe move this to a string argument to skip()



--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 26 Jan 2019 01:19:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1894/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 01:13:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.

2019-01-25 Thread Andrew Sherman (Code Review)
Andrew Sherman has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/12260 )

Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC.
..

IMPALA-7985: Port RemoteShutdown() to KRPC.

The :shutdown command is used to shutdown a remote server. The common
case is that a user specifies the impalad to shutdown by specifying a
host e.g. :shutdown('host100'). If a user has more than one impalad on a
remote host then the form :shutdown(':') can be used to
specify the port by which the imapald can be contacted. Prior to
IMPALA-7985 this port was the backend port, e.g.
:shutdown('host100:22000'). With IMPALA-7985 the port to use is the KRPC
port, e.g. :shutdown('host100:27000').

Shutdown is implemented by making an rpc call to the target impalad.
This changes the implementation of this call to use KRPC.

To aid the user in finding the KRPC port, the KRPC address is added to
the /backends section of the debug web page.

We attempt to detect the case where :shutdown is pointed at a thrift
port (like the backend port) and print an informative message.

Documentation of this change will be done in IMPALA-8098.

For discussion of why it was chosen to implement this change in an
incompatible way, see comments in
https://issues.apache.org/jira/browse/IMPALA-7985.

TESTING

Ran all end-to-end tests.
Add a test for /backends to test_web_pages.py.
In test_restart_services.py add a call to the old backend port to the
test. Some expected error messages were changed in line with what KRPC
returns.

Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
---
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/service/client-request-state.cc
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M tests/custom_cluster/test_restart_services.py
M tests/webserver/test_web_pages.py
M www/backends.tmpl
16 files changed, 220 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12260/2
--
To view, visit http://gerrit.cloudera.org:8080/12260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Gerrit-Change-Number: 12260
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.

2019-01-25 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12260 )

Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC.
..


Patch Set 1:

(2 comments)

Thanks for comments

http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@70
PS1, Line 70:   virtual void RemoteShutdown(const class RemoteShutdownParamsPB* 
req,
> Brief comment
Done


http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@81
PS1, Line 81: Rrpc
> typo?
I had to stare to see this, thanks



--
To view, visit http://gerrit.cloudera.org:8080/12260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Gerrit-Change-Number: 12260
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Sat, 26 Jan 2019 00:54:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12069 )

Change subject: IMPALA-7694: Add host resource usage metrics to profile
..


Patch Set 13:

(5 comments)

My main outstanding concerns are the ones that Michael raised. Another pass 
revealed a few more questions.

http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/coordinator.cc@789
PS13, Line 789:   // TODO: Move to host profiles?
This seems like the right thing to do. Maybe file a JIRA to track? We could 
also simplify BackendState::ComputeResourceUtilization() to just use the 
per-backend counters instead of iterating over fragments.

I think there may be some compatibility concerns about removing these - 
existence of the counters isn't contractual but we don't want to break useful 
tools if avoidable.

For example, I confirmed that Cloudera Manager actually does parse the existing 
strings (which is a little sad, but understandable given the lack of other 
counters).


http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc@151
PS13, Line 151: CpuUserPercentage
Should we add a System prefix to these or similar? Otherwise it could be 
interpreted as the percentage of cpu used by this query.


http://gerrit.cloudera.org:8080/#/c/12069/13/be/src/runtime/query-state.cc@230
PS13, Line 230: host_profile_, query_id(), 
query_options().scratch_limit));
I'm actually super-happy that we get these counters now.


http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/12069/11/be/src/util/runtime-profile.h@419
PS11, Line 419:  Samples
  :   /// are not re-sampled into larger intervals, instead owners 
of this profile can call
  :   /// ClearChunkedTimeSeriesCounters() to reset the sample 
buffers of all chunked time
  :   /// series counters
> Yes. If you think that's a problem, we can cap the number of samples at som
I hadn't thought about this, but I think it's worth doing, just so that we can 
reason about the behaviour better.


http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py
File tests/observability/test_plot_profile_resource_usage.py:

http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py@35
PS11, Line 35:
> I agree that this should be done through Impyla, but it doesn't expose the
I don't want to hold back good work, I can live with this I think.



--
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 13
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 00:48:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1893/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 00:44:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Tim Armstrong (Code Review)
Hello Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12262

to look at the new patch set (#5).

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..

IMPALA-7941: part 2/2: use cgroups memory limit

This uses the functionality from part 1 to detect the CGroups memory
limit and use it to set a lower process memory limit if needed.
min(system memory, cgroups memory limit) is used instead of
system memory to determine the memory limit. Behaviour of
processes without a memory limit set via CGroups is unchanged.

The default behaviour of using 80% of the memory limit detected
is still in effect. This seems like an OK default, but may
lead to some amount of wasted memory.

Modify containers to have a default JVM heap size of 2GB and
--mem_limit_includes_jvm, so that the automatically configured
memory limit makes more sense.

start-impala-cluster.py is modified to exercise all of this.

Testing:
Tested a containerised cluster manually on my system, which has
32GB of RAM. Here's the breakdown from the memz/ page showing
the JVM heap and auto-configured memory limit.

  Process: Limit=7.31 GB Total=1.94 GB Peak=1.94 GB
JVM: max heap size: Total=1.78 GB
JVM: non-heap committed: Total=35.56 MB
Buffer Pool: Free Buffers: Total=0
Buffer Pool: Clean Pages: Total=0
Buffer Pool: Unused Reservation: Total=0
Control Service Queue: Limit=50.00 MB Total=0 Peak=0
Data Stream Service Queue: Limit=374.27 MB Total=0 Peak=0
Data Stream Manager Early RPCs: Total=0 Peak=0
TCMalloc Overhead: Total=12.20 MB
Untracked Memory: Total=121.31 MB

Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M bin/start-impala-cluster.py
M docker/coord_exec/Dockerfile
M docker/coordinator/Dockerfile
M docker/daemon_entrypoint.sh
M docker/executor/Dockerfile
7 files changed, 126 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/12262/5
--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3677/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 00:03:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 00:03:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12257 )

Change subject: IMPALA-8097: mt_dop for all queries via hidden flag
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG@13
PS3, Line 13: because these are not executable.
> Would you mind elaborating for latter references ?
Done


http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc@206
PS3, Line 206: && state->query_options().mt_dop == 0
> Seems better to do this check before trying to acquire the thread token. Ot
Done


http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc@a108
PS3, Line 108:
> Do you recall what the limitation was ?
This check got added in a large patch with a bunch of other stuff: 
http://gerrit.cloudera.org:8080/4853

I think it was partially because it wasn't tested and nobody had checked that 
the logic generalised correctly to multiple finstances per backend. Maybe there 
was also a desire to do optimisations like local aggregation of filters before 
turning it on.

It's a little weird though because we didn't allow joins for mt plans *anyway*, 
so there can't possibly be an mt plan with  runtime filters. Also it's not 
ideal to toggle the query option as this point rather than during plan 
generation.


http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@161
PS3, Line 161:* - MT_DOP > 0 is not supported for plans with base table 
joins or table sinks.
 :* Throws a NotImplementedException if plan validation fails.
> Comments need updating
Done



--
To view, visit http://gerrit.cloudera.org:8080/12257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Gerrit-Change-Number: 12257
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 26 Jan 2019 00:01:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag

2019-01-25 Thread Tim Armstrong (Code Review)
Hello Michael Ho, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12257

to look at the new patch set (#4).

Change subject: IMPALA-8097: mt_dop for all queries via hidden flag
..

IMPALA-8097: mt_dop for all queries via hidden flag

--unlock_mt_dop=true unlocks mt_dop for all queries
including joins and inserts.

This disables the parallel plans with separate join builds
when running standalone, because these are not executable
until IMPALA-4224 is implemented. Inserts work without
modification - they were disabled because of lack of
testing and the possibility for generating many small
files with unpartitioned inserts - see IMPALA-8125.

Testing:
Add custom cluster test that exercise joins, runtime filters
and inserts as a sanity check for the flag.

Ran exhaustive build.

Manually ran TPC-H and TPC-DS tests against a minicluster
with mt_dop = 4.

Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
---
M be/src/common/global-flags.cc
M be/src/exec/blocking-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A testdata/workloads/functional-query/queries/QueryTest/joins_mt_dop.test
A tests/custom_cluster/test_mt_dop.py
12 files changed, 121 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/12257/4
--
To view, visit http://gerrit.cloudera.org:8080/12257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Gerrit-Change-Number: 12257
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 23:52:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc@357
PS3, Line 357: ot
> nit: to
Done


http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc@405
PS3, Line 405: --mem_limit="
 :   << PrettyPrinter::PrintBytes(*bytes_limit);
> this log line can be confusing to mean that the --mem_limit was set to byte
I changed it to be a bit more verbose - print the bytes limit and then the 
literal --mem_limit string in parentheses to explain where it came from.

  I0125 15:48:19.728356 31957 exec-env.cc:405] Using process memory limit: 7.31 
GB (--mem_limit=7848997273)



--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 23:49:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 5:

PS4 accidentally included a change I didn't mean to include.


--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 23:50:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12244 )

Change subject: IMPALA-8092: Add an admission controller debug page
..


Patch Set 3:

(4 comments)

The code is looking good to me. Main request is testing (which I missed first 
time around).

Does Pooja intend to review this too?

http://gerrit.cloudera.org:8080/#/c/12244/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12244/3//COMMIT_MSG@15
PS3, Line 15: - Histogram of the distribution of peak memory used by queries 
admitted
Another thought: an important bit of the AC state is the memory available , 
admitted and reserved on each host - can we expose this in this patch on in a 
follow-on patch?


http://gerrit.cloudera.org:8080/#/c/12244/3//COMMIT_MSG@21
PS3, Line 21:
Missed this first time around, but can we add a test to test_web_pages that 
loads the /admission webpage (maybe also with a request_pool parameter and also 
invokes the refresh). It would be good to have just as a sanity check.

It also wouldn't be a bad idea to do a manual stress test just to try and flush 
out any races or crashes - e.g. run a workload against a minicluster and have a 
script that loads the page in a loop. I did manually look through the code for 
anything suspicious and it seems fine, but best to be sure.


http://gerrit.cloudera.org:8080/#/c/12244/3/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/12244/3/be/src/scheduling/admission-controller.cc@1095
PS3, Line 1095: using namespace rapidjson;
Why not include the namespace for the whole function?


http://gerrit.cloudera.org:8080/#/c/12244/3/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/12244/3/be/src/service/impala-http-handler.cc@879
PS3, Line 879: form
from



--
To view, visit http://gerrit.cloudera.org:8080/12244
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766
Gerrit-Change-Number: 12244
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 23:41:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Tim Armstrong (Code Review)
Hello Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12262

to look at the new patch set (#4).

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..

IMPALA-7941: part 2/2: use cgroups memory limit

This uses the functionality from part 1 to detect the CGroups memory
limit and use it to set a lower process memory limit if needed.
min(system memory, cgroups memory limit) is used instead of
system memory to determine the memory limit. Behaviour of
processes without a memory limit set via CGroups is unchanged.

The default behaviour of using 80% of the memory limit detected
is still in effect. This seems like an OK default, but may
lead to some amount of wasted memory.

Modify containers to have a default JVM heap size of 2GB and
--mem_limit_includes_jvm, so that the automatically configured
memory limit makes more sense.

start-impala-cluster.py is modified to exercise all of this.

Testing:
Tested a containerised cluster manually on my system, which has
32GB of RAM. Here's the breakdown from the memz/ page showing
the JVM heap and auto-configured memory limit.

  Process: Limit=7.31 GB Total=1.94 GB Peak=1.94 GB
JVM: max heap size: Total=1.78 GB
JVM: non-heap committed: Total=35.56 MB
Buffer Pool: Free Buffers: Total=0
Buffer Pool: Clean Pages: Total=0
Buffer Pool: Unused Reservation: Total=0
Control Service Queue: Limit=50.00 MB Total=0 Peak=0
Data Stream Service Queue: Limit=374.27 MB Total=0 Peak=0
Data Stream Manager Early RPCs: Total=0 Peak=0
TCMalloc Overhead: Total=12.20 MB
Untracked Memory: Total=121.31 MB

Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M bin/start-impala-cluster.py
M docker/coord_exec/Dockerfile
M docker/coordinator/Dockerfile
M docker/daemon_entrypoint.sh
M docker/executor/Dockerfile
M www/bootstrap/css/bootstrap.css
M www/bootstrap/css/bootstrap.min.css
9 files changed, 133 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/12262/4
--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8114: Deflake test breakpad.py

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12273 )

Change subject: IMPALA-8114: Deflake test_breakpad.py
..

IMPALA-8114: Deflake test_breakpad.py

A test failed recently in a private build and it looked like the loop in
wait_for_num_processes had terminated to early. To make sure that the
forked of processes that write the minidumps have actually started, we
now sleep for 1 second before entering the wait loop.

Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b
Reviewed-on: http://gerrit.cloudera.org:8080/12273
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M tests/custom_cluster/test_breakpad.py
1 file changed, 4 insertions(+), 1 deletion(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b
Gerrit-Change-Number: 12273
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 5: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 25 Jan 2019 23:41:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8114: Deflake test breakpad.py

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12273 )

Change subject: IMPALA-8114: Deflake test_breakpad.py
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b
Gerrit-Change-Number: 12273
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 23:41:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 3: Code-Review+1

(2 comments)

Looks good to me. I talked to Bikram, and he's going to finish the review.

http://gerrit.cloudera.org:8080/#/c/12262/2/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/12262/2/be/src/runtime/exec-env.cc@364
PS2, Line 364:   if (use_commit_limit) {
 : avail_mem = MemInfo::commit_limit();
> I pulled out this logic into a function since it was getting unwieldy and a
Makes sense, that is much cleaner.


http://gerrit.cloudera.org:8080/#/c/12262/2/docker/daemon_entrypoint.sh
File docker/daemon_entrypoint.sh:

http://gerrit.cloudera.org:8080/#/c/12262/2/docker/daemon_entrypoint.sh@45
PS2, Line 45: export JAVA_TOOL_OPTIONS="-Xmx2g $JAVA_TOOL_OPTIONS"
> I did see that, but I didn't want to embed test-specific logic into the con
Yeah, I figured that this was already on our TODO list somewhere.



--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 23:08:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12069 )

Change subject: IMPALA-7694: Add host resource usage metrics to profile
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1892/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 13
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 22:48:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7565: Set TAcceptQueueServer connection setup pool to be multi-threaded by default

2019-01-25 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12249 )

Change subject: IMPALA-7565: Set TAcceptQueueServer connection_setup_pool to be 
multi-threaded by default
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12249
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I053120d4c3153ddbe5261acd28388be6cd191908
Gerrit-Change-Number: 12249
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 23:00:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile

2019-01-25 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12069 )

Change subject: IMPALA-7694: Add host resource usage metrics to profile
..


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py
File tests/observability/test_plot_profile_resource_usage.py:

http://gerrit.cloudera.org:8080/#/c/12069/11/tests/observability/test_plot_profile_resource_usage.py@35
PS11, Line 35:
> I kind-of think we should just fix it.
I agree that this should be done through Impyla, but it doesn't expose the 
profile format yet. This test is gone in the latest PS, since it seemed easier 
to add the plotting tool in a subsequent change. The remaining tests still call 
get_thrift_profile but that code has been there before. Would you prefer to 
hold this change until we updated Impyla, switch to using 
ImpalaHiveServer2Service directly, or move forward and update impyla before 
adding the plotting tool back?



--
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 13
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 21:57:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8097: mt dop for all queries via hidden flag

2019-01-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12257 )

Change subject: IMPALA-8097: mt_dop for all queries via hidden flag
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12257/3//COMMIT_MSG@13
PS3, Line 13: because these are not executable.
Would you mind elaborating for latter references ?


http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc
File be/src/exec/blocking-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/exec/blocking-join-node.cc@206
PS3, Line 206: && state->query_options().mt_dop == 0
Seems better to do this check before trying to acquire the thread token. 
Otherwise, we would leak the token.


http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12257/3/be/src/runtime/coordinator.cc@a108
PS3, Line 108:
Do you recall what the limitation was ?


http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12257/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@161
PS3, Line 161:* - MT_DOP > 0 is not supported for plans with base table 
joins or table sinks.
 :* Throws a NotImplementedException if plan validation fails.
Comments need updating



--
To view, visit http://gerrit.cloudera.org:8080/12257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72f0b02a005e8bf22fd17b8fb5aabf8c0d9b6b15
Gerrit-Change-Number: 12257
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Fri, 25 Jan 2019 21:55:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile

2019-01-25 Thread Lars Volker (Code Review)
Hello Michael Ho, Philip Zeyliger, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12069

to look at the new patch set (#13).

Change subject: IMPALA-7694: Add host resource usage metrics to profile
..

IMPALA-7694: Add host resource usage metrics to profile

This change adds a mechanism to collect host resource usage metrics to
profiles. Metric collection can be controlled through a new query option
'RESOURCE_TRACE_RATIO'. It specifies the probability with which metrics
collection will be enabled. Collection always happens per query for all
executors that run one or more fragment instances of the query.

This mechanism adds a new time series counter class that collects all
measured values and does not re-sample them. It will re-sample values
when printing them into a string profile, preserving up to 64 values,
but Thrift profiles will contain the full list of values.

We add a new section "Per Node Profiles" to the profile to store and
show these values:

Per Node Profiles:
  lv-desktop:22000:
CpuIoWaitPercentage (500.000ms): 0, 0
CpuSysPercentage (500.000ms): 1, 1
CpuUserPercentage (500.000ms): 4, 0
  - ScratchBytesRead: 0
  - ScratchBytesWritten: 0
  - ScratchFileUsedBytes: 0
  - ScratchReads: 0 (0)
  - ScratchWrites: 0 (0)
  - TotalEncryptionTime: 0.000ns
  - TotalReadBlockTime: 0.000ns

This change also uses the aforementioned mechanism to collect CPU usage
metrics (user, system, and IO wait time).

A future change can then add a tool to decode a Thrift profile and plot
the contained usage metrics, e.g. using matplotlib (IMPALA-8123). Such a
tool is not included in this change because it will require some
reworking of the python dependencies.

This change also includes a few minor improvements to make the resulting
code more readable:
- Extend the PeriodicCounterUpdater to call functions to update global
  metrics before updating the counters.
- Expose the scratch profile within the per node resource usage section.
- Improve documentation of the profile counter classes.
- Remove synchronization from StreamingSampler.
- Remove a few pieces of dead code that otherwise would have required
  updates.
- Factor some code for profile decoding into the Impala python library

Testing: This change contains a unit test for the system level metrics
collection and e2e tests for the profile changes.

Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/CMakeLists.txt
M be/src/util/periodic-counter-updater.cc
M be/src/util/periodic-counter-updater.h
M be/src/util/pretty-printer.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/streaming-sampler.h
A be/src/util/system-state-info-test.cc
A be/src/util/system-state-info.cc
A be/src/util/system-state-info.h
M bin/parse-thrift-profile.py
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Metrics.thrift
M common/thrift/RuntimeProfile.thrift
A lib/python/impala_py_lib/profiles.py
M tests/beeswax/impala_beeswax.py
M tests/common/impala_service.py
M tests/common/impala_test_suite.py
M tests/query_test/test_observability.py
37 files changed, 1,117 insertions(+), 260 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/12069/13
--
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 13
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 3: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc@357
PS3, Line 357: ot
nit: to


http://gerrit.cloudera.org:8080/#/c/12262/3/be/src/runtime/exec-env.cc@405
PS3, Line 405: --mem_limit="
 :   << PrettyPrinter::PrintBytes(*bytes_limit);
this log line can be confusing to mean that the --mem_limit was set to 
bytes_limit.
eg. --mem_limit=10GB, avail_mem = 7GB, so bytes_limit will be 7GB but the log 
line will say:
"Using process memory limit from --mem_limit=7GB"



--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 21:09:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 20:45:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

Impala supports "pushing" of HBase key range predicates to HBase so that
Impala reads only rows within the target key range. The planner
estimates the cardinality of such scans by sampling the rows within the
range. However, we have seen cases where sampling returns rows for
unknown reasons. The planner then ends up without a good cardinality
estimate.  (Specifically, the code does a division by zero and produces
a huge estimate.  See the ticket for details.)

Impala appears to use the sampling strategy to compute cardinality
because HBase uses generally do not gather table stats. The resulting
estimates are often off by 2x or more. This is a problem in tests as it
causes cardinality numbers to vary greatly from the expected values.
Fortunately, tests do gather HMS stats. There may be cases where users
do as well. This fix exploits that fact.

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and
  the selectivity of the key predicates to estimate cardinality when the
  sampling approach fails.
* The fall-back strategy requires tracking the predicates used for HBase
  keys so that their selectivity can be applied during fall-back
  calculations.
* Moved HBase key calculation out of the SingleNodePlanner into the
  HBase scan node as suggested by a "TO DO" in the code. Doing so
  simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase
  scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to
  force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check
  plans with the fall-back approach.
* Reran all existing tests.
* Compared cardinality numbers for the two modes: sampling and HMS using
  the cardinality features of IMPALA-8021. The two approaches provide
  different results, but this is mostly due to the missing selectivity
  estimates for inequality operators. (That's a fix for another time.)

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Reviewed-on: http://gerrit.cloudera.org:8080/12192
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 511 insertions(+), 116 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/1891/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:56:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8114: Deflake test breakpad.py

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12273 )

Change subject: IMPALA-8114: Deflake test_breakpad.py
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b
Gerrit-Change-Number: 12273
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:54:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12244 )

Change subject: IMPALA-8092: Add an admission controller debug page
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1890/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12244
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766
Gerrit-Change-Number: 12244
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:48:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8114: Deflake test breakpad.py

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12273 )

Change subject: IMPALA-8114: Deflake test_breakpad.py
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3676/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcd1fbb498c475a1f186f490abaf90b47ecba05b
Gerrit-Change-Number: 12273
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:42:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Philip Zeyliger (Code Review)
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12097

to look at the new patch set (#5).

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_issuances_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change adn 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M 

[Impala-ASF-CR] IMPALA-8095: Detailed expression cardinality tests

2019-01-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12248 )

Change subject: IMPALA-8095: Detailed expression cardinality tests
..


Patch Set 5:

(5 comments)

Just some clarifying questions. Overall makes sense to me. Since this is 
test-only code, we can merge this and keep improving it as and when we find new 
bugs or need more fixture specific improvements for writing new tests.

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@41
PS5, Line 41: See
:  * also {@link ExprNdvTest}, {@link CardinalityTest}.
There seems to be overlap in all of these. Should we consolidate them? Others 
might find it confusing while figuring out where to add new tests.


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@110
PS5, Line 110: Bug
Does it make sense to tag the jiras for all the bugs here? I know you raised a 
bunch of them, probably we should maintain  an epic (cardinality mis-estimates 
or something) and link them there.

Many of them make good fe ramp-up tasks :-)

** I know that is a lot of work, don't want that to block this patch, just some 
thought.


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprCardinalityTest.java@111
PS5, Line 111: // 2 in the shell with SHOW COLUMN STATS alltypes
 : //verifyTableCol(allTypes, "year", 2, 0);
 : // Bug: Unit test in Eclipse see the above, unit tests run 
from the
 : // command line see the below. Disabling to avoid a flaky 
test,
 : // here and below
Not super clear what is happening here, clarify may be?


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java@164
PS5, Line 164: After IMPALA-7659, Impala computes a null count,
 :* when gathering stats, but the NDV does not include nulls 
(except for Boolean
 :* columns) if stats are computed by IMpala, but does include 
nulls if stats are
 :* computed by Hive.
Haha, this is indeed bizarre.


http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/common/FrontendFixture.java
File fe/src/test/java/org/apache/impala/common/FrontendFixture.java:

http://gerrit.cloudera.org:8080/#/c/12248/5/fe/src/test/java/org/apache/impala/common/FrontendFixture.java@164
PS5, Line 164:  protected void clearTestDbs() {
 : for (Db testDb: testDbs_) {
 :   catalog_.removeDb(testDb.getName());
 : }
 :   }
I did something similar in the testcase patch, except that this is backed by a 
temporary HMS created from scratch and we totally discard the HMS instance 
after test. That way we don't need to fake all the HMS table structures like in 
the methods below.



--
To view, visit http://gerrit.cloudera.org:8080/12248
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3da58ee9b0beebeffb170b9430bd36d20dcd2401
Gerrit-Change-Number: 12248
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:25:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 4:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/1889/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:21:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@78
PS5, Line 78:   if (!(ranges_issued_barrier_.pending() != 0) && 
initial_ranges_issued_ && progress_.done()) {
line too long (95 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:16:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page

2019-01-25 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12244 )

Change subject: IMPALA-8092: Add an admission controller debug page
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12244/1/www/admission_controller.tmpl
File www/admission_controller.tmpl:

http://gerrit.cloudera.org:8080/#/c/12244/1/www/admission_controller.tmpl@283
PS1, Line 283:   
> This came up because I started running a stress test and was looking at the
Done. added the total admitted/rejected/timed out metrics



--
To view, visit http://gerrit.cloudera.org:8080/12244
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766
Gerrit-Change-Number: 12244
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:09:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page

2019-01-25 Thread Bikramjeet Vig (Code Review)
Hello Pooja Nilangekar, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12244

to look at the new patch set (#3).

Change subject: IMPALA-8092: Add an admission controller debug page
..

IMPALA-8092: Add an admission controller debug page

This patch adds a new debug page "admission" that provides the
following details about resource pools:
- Pool configuration
- Relevant pool stats
- Queued Queries in order of being queued (local to the coordinator)
- Running Queries (local to this coordinator)
- Histogram of the distribution of peak memory used by queries admitted
  to the pool

The aforementioned details can also be viewed for a single resource
pool using a search string and are also available as a JSON object
from the same http endpoint.

Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766
---
M be/src/catalog/catalog-server.cc
M be/src/rpc/rpc-trace.cc
M be/src/runtime/coordinator.cc
M be/src/scheduling/admission-controller.cc
M be/src/scheduling/admission-controller.h
M be/src/service/impala-http-handler.cc
M be/src/service/impala-http-handler.h
M be/src/statestore/statestore.cc
M be/src/util/default-path-handlers.cc
M be/src/util/logging-support.cc
M be/src/util/metrics.cc
M be/src/util/thread.cc
M be/src/util/webserver-test.cc
M be/src/util/webserver.h
A www/admission_controller.tmpl
15 files changed, 734 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/12244/3
--
To view, visit http://gerrit.cloudera.org:8080/12244
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766
Gerrit-Change-Number: 12244
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12097/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/12097/3/be/src/exec/hdfs-scan-node-base.cc@452
PS3, Line 452:   SCOPED_CLEANUP({ UpdateRemainingScanRangeIssuances(-1); });
> We have MakeScopeExitTrigger in Impala, but it seems like we could just swi
Seems better to be consistent, so I switched to MakeScopeExitTrigger.



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:05:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12097/4/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12097/4/be/src/exec/hdfs-scan-node.cc@78
PS4, Line 78:   if (!(ranges_issued_barrier_.pending() != 0) && 
initial_ranges_issued_ && progress_.done()) {
line too long (95 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 25 Jan 2019 18:51:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Philip Zeyliger (Code Review)
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12097

to look at the new patch set (#4).

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_issuances_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change adn 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M 

[Impala-ASF-CR] IMPALA-7985: Port RemoteShutdown() to KRPC.

2019-01-25 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12260 )

Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@70
PS1, Line 70:   virtual void RemoteShutdown(const class RemoteShutdownParamsPB* 
req,
Brief comment


http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@81
PS1, Line 81: Rrpc
typo?

I guess it was already this way, I just missed it in the last review.



--
To view, visit http://gerrit.cloudera.org:8080/12260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Gerrit-Change-Number: 12260
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Fri, 25 Jan 2019 18:23:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12179 )

Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu 
table
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1888/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081
Gerrit-Change-Number: 12179
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 25 Jan 2019 18:21:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8092: Add an admission controller debug page

2019-01-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12244 )

Change subject: IMPALA-8092: Add an admission controller debug page
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12244/1/www/admission_controller.tmpl
File www/admission_controller.tmpl:

http://gerrit.cloudera.org:8080/#/c/12244/1/www/admission_controller.tmpl@283
PS1, Line 283:   Queries Currently Running
> That might be difficult to get right with the current metrics since we dont
This came up because I started running a stress test and was looking at the 
page to see if queries were actually getting submitted to the pool - and it was 
hard to tell whether the queries had actually run without switching back over 
to my stress test terminal.

Maybe we could expose the admitted/rejected/timed out metrics? Since in 
aggregate that would show the number of queries that had "exited" the admission 
control.



--
To view, visit http://gerrit.cloudera.org:8080/12244
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff055d9709ea1bcc2f492adcde92241b6149f766
Gerrit-Change-Number: 12244
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 17:55:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table

2019-01-25 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12179 )

Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu 
table
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12179/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/12179/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2419
PS2, Line 2419: if (!oldTbl.getKuduTableName().equals(newKuduTableName)) {
> nit: we often structure these as "if (condition) return;" to keep the inden
Done


http://gerrit.cloudera.org:8080/#/c/12179/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2423
PS2, Line 2423:   oldMsTbl.getParameters().put(KuduTable.KEY_TABLE_NAME, 
newKuduTableName);
> I noticed that the order changed but couldn't figure out why. Is there a pa
You want to update the msTbl with the value of the new Kudu table, before 
passing the msTbl to validateKuduTblExists.

validateKuduTblExists will take the name of the Kudu table from the msTbl using 
the key KuduTable.KEY_TABLE_NAME and use it to check if that Kudu table exists.

However, I decided to just remove the call to validateKuduTblExists, I don't 
think its adding any value here because kuduClient.renameTable should guarantee 
the new table exists. So doing another check just adds an unnecessary RPC call 
to Kudu.

The original run of the tests passed without this change because of 
IMPALA-8117. The fix for IMPALA-8117 requires some invasive changes to add 
proper unit testing, so I'm planning on doing it in a separate patch.



--
To view, visit http://gerrit.cloudera.org:8080/12179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081
Gerrit-Change-Number: 12179
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 25 Jan 2019 17:44:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-01-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC format files
..


Patch Set 8:

(12 comments)

Started to look at it. Currently mostly have comments about style.
I plan to look at it deeper.

http://gerrit.cloudera.org:8080/#/c/12168/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12168/8//COMMIT_MSG@25
PS8, Line 25: Don’t like
nit: 'They are not like', or 'They differ from'


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.h
File be/src/exec/hdfs-orc-scanner.h:

http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.h@160
PS8, Line 160:   orc::RowReaderOptions row_reader_options;
nit: put '_' at the end of the variable name.


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc@181
PS8, Line 181: base
nit: based


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc@329
PS8, Line 329: *template_tuple =
 : Tuple::Create(tuple_desc.byte_size(), 
template_tuple_pool_.get());
nit: put curly braces around it


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc@364
PS8, Line 364: if (id >= node.getColumnId() && id <= 
node.getMaximumColumnId()) return true;
nit: put curly braces around it


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/hdfs-orc-scanner.cc@681
PS8, Line 681:   RETURN_IF_ERROR(AssembleCollection(*child_reader, 
child_batch_offset + i,
 :   coll_value_builder));
nit: add curly braces around it


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.h
File be/src/exec/orc-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.h@101
PS8, Line 101: batch_ = dynamic_cast(orc_batch);
We avoid doing dynamic casts in release mode.
You can do a dynamic_cast inside a DCHECK, then use static_cast here.


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.h@126
PS8, Line 126: int64_t val = batch_->data.data()[row_idx];
Maybe you could add a DCHECK that batch_ is not null.


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc
File be/src/exec/orc-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc@190
PS8, Line 190:   *slot = TimestampValue::FromUnixTimeNanos(secs, nanos,
Do we know that orc timestamps are timezone-aware or not?


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc@326
PS8, Line 326: 
children_[c]->UpdateInputBatch(batch_->fields[children_fields_[c]]);
nit: add curly brackets around it since it is a multi-line for stmt.


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc@331
PS8, Line 331: RETURN_IF_ERROR(child->ReadValue(row_idx_, tuple, pool));
nit: add curly brackets around it


http://gerrit.cloudera.org:8080/#/c/12168/8/be/src/exec/orc-column-readers.cc@395
PS8, Line 395:   } else CreateChildForSlot(node, slot_desc);
nit: it's against the Google style guide to have braces on only one branch of 
an if-stmt:
https://google.github.io/styleguide/cppguide.html#Conditionals



--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 25 Jan 2019 17:25:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table

2019-01-25 Thread Sahil Takiar (Code Review)
Hello Lars Volker, Mike Percy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12179

to look at the new patch set (#4).

Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu 
table
..

IMPALA-7640: RENAME on managed Kudu table should rename Kudu table

`ALTER TABLE managed_kudu_tbl RENAME TO new_managed_kudu_tbl` will now
rename the underlying Kudu table from `managed_kudu_tbl` to
`new_managed_kudu_tbl`. This is only triggered for managed tables, for
external tables renames do not modify the underlying Kudu table.

Testing:
* Ran core tests
* Updated kudu_alter.test

Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
2 files changed, 52 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12179/4
--
To view, visit http://gerrit.cloudera.org:8080/12179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081
Gerrit-Change-Number: 12179
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3675/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 16:40:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 16:40:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-25 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 25 Jan 2019 16:39:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12179 )

Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu 
table
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1887/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081
Gerrit-Change-Number: 12179
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Fri, 25 Jan 2019 15:48:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7640: RENAME on managed Kudu table should rename Kudu table

2019-01-25 Thread Sahil Takiar (Code Review)
Hello Lars Volker, Mike Percy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12179

to look at the new patch set (#3).

Change subject: IMPALA-7640: RENAME on managed Kudu table should rename Kudu 
table
..

IMPALA-7640: RENAME on managed Kudu table should rename Kudu table

`ALTER TABLE managed_kudu_tbl RENAME TO new_managed_kudu_tbl` will now
rename the underlying Kudu table from `managed_kudu_tbl` to
`new_managed_kudu_tbl`. This is only triggered for managed tables, for
external tables renames do not modify the underlying Kudu table.

Testing:
* Ran core tests
* Updated kudu_alter.test

Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081
---
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
2 files changed, 55 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12179/3
--
To view, visit http://gerrit.cloudera.org:8080/12179
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77e7583ce93cba8f6e743c4bedd9900ae1fae081
Gerrit-Change-Number: 12179
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-7941: part 2/2: use cgroups memory limit

2019-01-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12262 )

Change subject: IMPALA-7941: part 2/2: use cgroups memory limit
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1886/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9fb4fb936a46fc194a204391d03c07c8c7fba21
Gerrit-Change-Number: 12262
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 Jan 2019 08:33:38 +
Gerrit-HasComments: No