[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has uploaded a new patch set (#2). Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly especially when reading from S3. This patch adds a timer "FooterProcessingTimer" which keeps a track of the average time each split of each scan node spends in reading and processing the parquet footer. Added a new utility counter called MinMaxAvgValueCounter which keeps a track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. This is also displayed in the RuntimeProfile. The RuntimeProfile has also been updated to keep a track of, display and move this new MinMaxAvgValueCounter between nodes through Thrift. A test has been added to test that this counter works fine when there are multiple blocks to scan per node. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 7 files changed, 216 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/2 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3823: Add timer to measure Parquet footer reads
Sailesh Mukil has uploaded a new change for review. http://gerrit.cloudera.org:8080/4371 Change subject: IMPALA-3823: Add timer to measure Parquet footer reads .. IMPALA-3823: Add timer to measure Parquet footer reads It's been observed that Parquet footer reads perform poorly especially when reading from S3. This patch adds a timer "FooterProcessingTimer" which keeps a track of the average time each split of each scan node spends in reading and processing the parquet footer. Added a new utility counter called MinMaxAvgValueCounter which keeps a track of the min, max and average values seen so far from a set of values. This counter is used to calculate the min, max and average time taken to scan and process Parquet footers per query per node. This is also displayed in the RuntimeProfile. The RuntimeProfile has also been updated to keep a track of, display and move this new MinMaxAvgValueCounter between nodes through Thrift. A test has been added to test that this counter works fine when there are multiple blocks to scan per node. Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/RuntimeProfile.thrift M tests/query_test/test_scanners.py 7 files changed, 216 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4371/1 -- To view, visit http://gerrit.cloudera.org:8080/4371 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Icf87bad90037dd0cea63b10c537382ec0f980cbf Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh Mukil
[Impala-ASF-CR] IMPALA-3201: reservation implementation for new buffer pool
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3201: reservation implementation for new buffer pool .. Patch Set 18: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/3993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3201: reservation implementation for new buffer pool
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3201: reservation implementation for new buffer pool .. IMPALA-3201: reservation implementation for new buffer pool This patch implements the reservation bookkeeping logic for the new buffer pool. The reservations are always guaranteed and any buffer/page allocations require a reservation. Reservations are tracked via a hierarchy of ReservationTrackers. Eventually ReservationTrackers should become the main mechanism for memory accounting, once all runtime memory is handled by the buffer pool. In the meantime, query/process limits are enforced and memory is reported through the preexisting MemTracker hierarchy. Reservations are accounted as consumption against a MemTracker because the memory is committed and cannot be used for other purposes. The MemTracker then uses the counters from the ReservationTracker to log information about buffer-pool memory down to the operator level. Testing: Includes basic tests for buffer pool client registration and various reservation operations. Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6 Reviewed-on: http://gerrit.cloudera.org:8080/3993 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins --- M be/CMakeLists.txt A be/src/bufferpool/CMakeLists.txt A be/src/bufferpool/buffer-pool-test.cc A be/src/bufferpool/buffer-pool.cc M be/src/bufferpool/buffer-pool.h A be/src/bufferpool/reservation-tracker-counters.h A be/src/bufferpool/reservation-tracker-test.cc A be/src/bufferpool/reservation-tracker.cc A be/src/bufferpool/reservation-tracker.h M be/src/runtime/mem-tracker.cc M be/src/runtime/mem-tracker.h A be/src/util/dummy-runtime-profile.h M be/src/util/runtime-profile-counters.h 13 files changed, 1,439 insertions(+), 26 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/3993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh
Michael Brown has posted comments on this change. Change subject: IMPALA-2013: Reintroduce steps for checking HBase health in run-hbase.sh .. Patch Set 5: (9 comments) Under what circumstances is more than one zookeeper needed? http://gerrit.cloudera.org:8080/#/c/4348/5//COMMIT_MSG Commit Message: PS5, Line 11: The original script was problematic for a couple of reasons (most : notably because it queried the wrong Zookeeper node) so we stopped : using it during our mini-cluster HBase start up procedure. Note quite right. I didn't use it because it wasn't clear it was needed anymore, the so-called "HBase race" had been fixed. I didn't notice the master vs. rs discrepancy at that time. http://gerrit.cloudera.org:8080/#/c/4348/5/testdata/bin/check-hbase-nodes.py File testdata/bin/check-hbase-nodes.py: PS5, Line 50: parser.add_argument('--timeout', '-t', action='store' I find action='store' to be redundant since that's the default. PS5, Line 52: 'Default is 30 seconds.')) Don't hardcode 30. Use a format string and read from TIMEOUT_SECONDS. PS5, Line 56: 'e.g, 0.0.0.0:5070. Default is localhost:5070.')) Use a format string and read from HDFS_HOST. PS5, Line 60: 'e.g, 0.0.0.0:2181. Default is localhost:2181.')) Use a format string and read from ZK_HOSTS. PS5, Line 66: '/hbase/master and /hbase/rs.') Use a format string. Maybe use '-n '.join(HBASE_NODES) as part of it? PS5, Line 127: hdfs_client.list('/') What does this do? PS5, Line 125: try: : hdfs_client = InsecureClient('http://' + args.hdfs_host) : hdfs_client.list('/') : except requests.exceptions.ConnectionError as e: : msg = 'Could not connect to HDFS web host http://{0} - {1}'.format(args.hdfs_host, e) : LOGGER.error(msg) : sys.exit(1) : : zk_client = connect_to_zookeeper(args.zookeeper_hosts, args.timeout) : errors = sum([check_hbase_node(zk_client, node, args.timeout) for node in args.nodes]) I suggest you make this a method. The __main__ can get the arguments, the errors, and print the error and exit accordingly. It would be better to have a separate method to make a connection and do the query work. PS5, Line 135: if errors: It would be better to have the explicit numerical > 0 comparison. -- To view, visit http://gerrit.cloudera.org:8080/4348 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b81f3cfb6ea0ba7b18ce5fcd5d268f515c8b0c3 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David Knupp Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Ishaan Joshi Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3201: reservation implementation for new buffer pool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3201: reservation implementation for new buffer pool .. Patch Set 18: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/3993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I35cc89e863efb4cc506657bfdaaaf633a10bbab6 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter
Michael Brown has posted comments on this change. Change subject: IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter .. Patch Set 2: Note to committer: there's no GVO path for this, so I suggest you submit along with +2. -- To view, visit http://gerrit.cloudera.org:8080/4357 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3922ca61af59ecd2899c911b1a03e11ab5c26e11 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: stak...@cloudera.com Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: stak...@cloudera.com Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter
Michael Brown has posted comments on this change. Change subject: IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter .. Patch Set 2: Code-Review+1 This needs a committer's +2. -- To view, visit http://gerrit.cloudera.org:8080/4357 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3922ca61af59ecd2899c911b1a03e11ab5c26e11 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: stak...@cloudera.com Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: stak...@cloudera.com Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4028: Trim sentry config file path spaces while impala start.
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4309 to look at the new patch set (#3). Change subject: IMPALA-4028: Trim sentry config file path spaces while impala start. .. IMPALA-4028: Trim sentry config file path spaces while impala start. When sentry config file path is wrong input which end contains spaces, impala start up failed. Trimming sentry config file path spaces and checking file again while the original file has been checked failed will solve this problem. Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634 --- M fe/src/main/java/com/cloudera/impala/authorization/SentryConfig.java 1 file changed, 7 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/4309/3 -- To view, visit http://gerrit.cloudera.org:8080/4309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: davy...@163.com Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: davy...@163.com
[Impala-ASF-CR] IMPALA-4111: backend death tests should not produce minidumps
Lars Volker has posted comments on this change. Change subject: IMPALA-4111: backend death tests should not produce minidumps .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/4353/3/be/src/util/minidump.h File be/src/util/minidump.h: PS3, Line 29: for death test nit: maybe remove or reword like "for example during death test" or "during tests" -- To view, visit http://gerrit.cloudera.org:8080/4353 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I33037f4f77c8188fc2ec46a77083a0a64f6ea404 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes