[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 8: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/383/ -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 8: Code-Review+2 Fix a linker issue - order of boost libraries was wrong -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Hello Impala Public Jenkins, Jim Apple, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6402 to look at the new patch set (#8). Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. IMPALA-5077: add NUMA and current cpu to CpuInfo NUMA info is found using the /sys filesystem. The current CPU can be found using sched_getcpu(), which is supported on all recent Linux kernels (unfortunately CentOS 5 shipped with an older kernel, so we need a fallback). Testing: Confirmed that this built on a range of different Linux distros, including CentOS 5, which is missing support for features like sched_getcpu(). Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 --- M CMakeLists.txt M be/CMakeLists.txt A be/src/common/.gitignore M be/src/common/CMakeLists.txt A be/src/common/config.h.in M be/src/util/cpu-info.cc M be/src/util/cpu-info.h 7 files changed, 168 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/8 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 14: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/382/ -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 7: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/380/ -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 14: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/382/ -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 14: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4996: Single-threaded KuduScanNode .. Patch Set 6: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/377/ -- To view, visit http://gerrit.cloudera.org:8080/6312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 17: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/381/ -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4831: enforce BufferPool reservation invariants
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4831: enforce BufferPool reservation invariants .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4831: enforce BufferPool reservation invariants
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4831: enforce BufferPool reservation invariants .. IMPALA-4831: enforce BufferPool reservation invariants Before this patch ill-behaved code outside BufferPool could violate BufferPool invariants by calling methods on ReservationTracker() such as DecreaseReservation() or ReleaseFrom() or by hooking up Clients and ReservationTrackers in the wrong way (e.g. sharing a ReservationTracker between two Clients). Now each client creates and owns its ReservationTracker and restricts which methods can be called from outside BufferPool. This also reduces the amount of boilerplate code required to set up and tear down a client. Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573 Reviewed-on: http://gerrit.cloudera.org:8080/6313 Reviewed-by: Dan HechtTested-by: Impala Public Jenkins --- M be/src/runtime/bufferpool/buffer-pool-internal.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h M be/src/runtime/bufferpool/reservation-tracker.h M be/src/runtime/bufferpool/suballocator-test.cc M be/src/runtime/bufferpool/suballocator.cc 7 files changed, 163 insertions(+), 150 deletions(-) Approvals: Impala Public Jenkins: Verified Dan Hecht: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 17: Code-Review+2 Forgot to update the alloc test. Made the tiny change to the test. Forwarding the +2. -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has uploaded a new patch set (#17). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Added some EE APPX_MEDIAN() tests on larger datasets that exercise the resize code path. Perf Benchrmark (about 35,000 elements per bucket): SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t BEFORE: 11s067ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 124.726us 124.726us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 29.544us 29.544us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 86.406us 120.372us 3 1 44.00 KB 10.00 MB 04:AGGREGATE31s840ms2s824ms 2.00K -1 1.02 GB 128.00 MB FINALIZE 03:EXCHANGE 31s163ms1s989ms 6.00K -1 0 0 HASH(c1) 01:AGGREGATE33s356ms3s416ms 6.00K -1 1.95 GB 128.00 MB STREAMING 00:SCAN HDFS3 64.962ms 65.490ms 65.54M -1 25.97 MB 64.00 MB tpcds_10_parquet.benchmark AFTER: 9s465ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail 06:AGGREGATE1 73.961us 73.961us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 18.101us 18.101us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 75.795us 83.969us 3 1 44.00 KB 10.00 MB 04:AGGREGATE31s608ms 2s683ms 2.00K -1 1.02 GB 128.00 MB FINALIZE 03:EXCHANGE 3 826.683ms 1s322ms 6.00K -1 0 0 HASH(c1) 01:AGGREGATE32s457ms 2s672ms 6.00K -1 3.14 GB 128.00 MB STREAMING 00:SCAN HDFS3 81.514ms 89.056ms 65.54M -1 25.94 MB 64.00 MB tpcds_10_parquet.benchmark Memory Benchmark (about 12 elements per bucket): SELECT MAX(a) FROM ( SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a FROM tpcds_parquet.store_sales GROUP BY ss_customer_sk) t BEFORE: 7s477ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 114.686us 114.686us1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 18.214us 18.214us3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 147.055us 165.464us3 1 28.00 KB 10.00 MB 04:AGGREGATE32s043ms2s147ms 14.82K -1 4.94 GB 128.00 MB FINALIZE 03:EXCHANGE 3 840.528ms 943.254ms 15.61K -1 0 0 HASH(ss_customer_sk) 01:AGGREGATE31s769ms1s869ms 15.61K -1 5.32 GB 128.00 MB STREAMING 00:SCAN HDFS3 17.941ms 37.109ms 183.59K -1 1.94 MB 16.00 MB tpcds_parquet.store_sales AFTER: 434ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 125.915us 125.915us1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 72.179us 72.179us3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 79.054us 83.385us3 1 28.00 KB 10.00 MB 04:AGGREGATE36.559ms7.669ms 14.82K -1 17.32 MB 128.00 MB FINALIZE 03:EXCHANGE 3 67.370us 85.068us 15.60K -1 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 19.245ms 24.472ms 15.60K -1 9.48 MB 128.00 MB STREAMING 00:SCAN HDFS3 53.173ms 55.844ms 183.59K -1 1.18 MB 16.00 MB tpcds_parquet.store_sales
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Hello Marcel Kornacker, Impala Public Jenkins, Matthew Jacobs, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6025 to look at the new patch set (#17). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 16 elements. Once the buffer becomes full, we reallocate a new buffer with double capacity and copy the original buffer into the new one. We continue doubling the buffer size until the buffer has room for 20,000 elements as before. Testing: Added some EE APPX_MEDIAN() tests on larger datasets that exercise the resize code path. Perf Benchrmark (about 35,000 elements per bucket): SELECT MAX(a) from ( SELECT c1, appx_median(c2) as a FROM benchmark GROUP BY c1) t BEFORE: 11s067ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 124.726us 124.726us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 29.544us 29.544us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 86.406us 120.372us 3 1 44.00 KB 10.00 MB 04:AGGREGATE31s840ms2s824ms 2.00K -1 1.02 GB 128.00 MB FINALIZE 03:EXCHANGE 31s163ms1s989ms 6.00K -1 0 0 HASH(c1) 01:AGGREGATE33s356ms3s416ms 6.00K -1 1.95 GB 128.00 MB STREAMING 00:SCAN HDFS3 64.962ms 65.490ms 65.54M -1 25.97 MB 64.00 MB tpcds_10_parquet.benchmark AFTER: 9s465ms Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail 06:AGGREGATE1 73.961us 73.961us 1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 18.101us 18.101us 3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 75.795us 83.969us 3 1 44.00 KB 10.00 MB 04:AGGREGATE31s608ms 2s683ms 2.00K -1 1.02 GB 128.00 MB FINALIZE 03:EXCHANGE 3 826.683ms 1s322ms 6.00K -1 0 0 HASH(c1) 01:AGGREGATE32s457ms 2s672ms 6.00K -1 3.14 GB 128.00 MB STREAMING 00:SCAN HDFS3 81.514ms 89.056ms 65.54M -1 25.94 MB 64.00 MB tpcds_10_parquet.benchmark Memory Benchmark (about 12 elements per bucket): SELECT MAX(a) FROM ( SELECT ss_customer_sk, APPX_MEDIAN(ss_sold_date_sk) as a FROM tpcds_parquet.store_sales GROUP BY ss_customer_sk) t BEFORE: 7s477ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 114.686us 114.686us1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 18.214us 18.214us3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 147.055us 165.464us3 1 28.00 KB 10.00 MB 04:AGGREGATE32s043ms2s147ms 14.82K -1 4.94 GB 128.00 MB FINALIZE 03:EXCHANGE 3 840.528ms 943.254ms 15.61K -1 0 0 HASH(ss_customer_sk) 01:AGGREGATE31s769ms1s869ms 15.61K -1 5.32 GB 128.00 MB STREAMING 00:SCAN HDFS3 17.941ms 37.109ms 183.59K -1 1.94 MB 16.00 MB tpcds_parquet.store_sales AFTER: 434ms Operator #Hosts Avg Time Max Time#Rows Est. #Rows Peak Mem Est. Peak Mem Detail - 06:AGGREGATE1 125.915us 125.915us1 1 28.00 KB -1.00 B FINALIZE 05:EXCHANGE 1 72.179us 72.179us3 1 0 -1.00 B UNPARTITIONED 02:AGGREGATE3 79.054us 83.385us3 1 28.00 KB 10.00 MB 04:AGGREGATE36.559ms7.669ms 14.82K -1 17.32 MB 128.00 MB FINALIZE 03:EXCHANGE 3 67.370us 85.068us 15.60K -1 0 0 HASH(ss_customer_sk) 01:AGGREGATE3 19.245ms 24.472ms 15.60K -1
[native-toolchain-CR] IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)
Henry Robinson has posted comments on this change. Change subject: IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2) .. Patch Set 2: Lars took a look at the glog breakpad patch, and Matt took a quick look at the glog message listener patch. -- To view, visit http://gerrit.cloudera.org:8080/6372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I43bfe51a7b85f55f75e0bfe9f4eda1eea7856ff5 Gerrit-PatchSet: 2 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)
Henry Robinson has posted comments on this change. Change subject: IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2) .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6372/2/source/gflags/build.sh File source/gflags/build.sh: PS2, Line 33: if [ -e CMakeLists.txt ]; then : wrap cmake -DBUILD_STATIC_LIBS=ON -DCMAKE_INSTALL_PREFIX=$LOCAL_INSTALL -DCMAKE_BUILD_TYPE=RELEASE : else : wrap ./configure --with-pic --prefix=$LOCAL_INSTALL > I assume the CMakeLists.txt path is for the newer version? I think it'd be Would prefer not to use PACKAGE_VERSION, as I find version checking in bash to be pretty error-prone (at least with my skill level in bash). I added a comment to clarify. -- To view, visit http://gerrit.cloudera.org:8080/6372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I43bfe51a7b85f55f75e0bfe9f4eda1eea7856ff5 Gerrit-PatchSet: 2 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2)
Hello Matthew Jacobs, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6372 to look at the new patch set (#3). Change subject: IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2) .. IMPALA-5057: Upgrade gflags (2.2.0) and glog (0.3.4-p2) Upgrade gflags and glog together, glog is dependent on gflags. Fix patch 0001 to install a message listener that no longer worked in glog 0.3.4. Patched branch available for inspection at: https://github.com/henryr/glog/commits/0.3.4-p2 Tested message listener reimplementation with Impala. glog unit tests also pass. Impala breakpad DCHECK test, which exercises the patched path, passes. Change-Id: I43bfe51a7b85f55f75e0bfe9f4eda1eea7856ff5 --- M buildall.sh M source/gflags/build.sh M source/glog/build.sh A source/glog/glog-0.3.4-patches/0001-Allow-glog-to-accept-a-message-listener.patch A source/glog/glog-0.3.4-patches/0002-Preserve-custom-signal-handler-for-dcheck.patch 5 files changed, 284 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/native-toolchain refs/changes/72/6372/3 -- To view, visit http://gerrit.cloudera.org:8080/6372 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I43bfe51a7b85f55f75e0bfe9f4eda1eea7856ff5 Gerrit-PatchSet: 3 Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3203: Part 1: Free list implementation
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/6410 Change subject: IMPALA-3203: Part 1: Free list implementation .. IMPALA-3203: Part 1: Free list implementation We will have a single free list per size class, protected by a SpinLock. Free buffers are stored in a heap, ordered by the memory address of the buffer, to help reduce address space fragmentation. A follow-up patch will use the free lists in BufferPool. Testing: Added unit tests for sanity checks and verification of behaviour that is trickier to check in integration or system tests. The cost will be exercised more thoroughly via BufferPool in Part 2. Performance: Includes a benchmark that demonstrates the scalability of the free lists under concurrency. When measuring pure throughput of free list operations, having a free list per core is significantly faster than a shared free list, or allocating directly from TCMalloc. On 8 cores, if the memory allocated is actually touched then for 64KB+ buffers, memory access is the bottleneck rather than lock contention. This suggests that having a free list per core is more than sufficient (however, we will need to validate this with system concurrency benchmarks once we switch to using this during query execution). Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/free-lists-benchmark.cc M be/src/runtime/bufferpool/CMakeLists.txt M be/src/runtime/bufferpool/buffer-allocator.cc M be/src/runtime/bufferpool/buffer-allocator.h M be/src/runtime/bufferpool/buffer-pool.cc M be/src/runtime/bufferpool/buffer-pool.h A be/src/runtime/bufferpool/free-list-test.cc A be/src/runtime/bufferpool/free-list.h 9 files changed, 860 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6410/2 -- To view, visit http://gerrit.cloudera.org:8080/6410 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia89acfa4efdecb96d3678443b4748932b4133b9b Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 16: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/378/ -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 13: Code-Review+2 Rebased onto https://gerrit.cloudera.org/#/c/6313/ -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5811 to look at the new patch set (#13). Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool Add a copy of BufferedTupleStream that allocates memory from BufferPool. This will replace the original implementation when IMPALA-3200 is completed. The major changes are: * Terminology is updated from "blocks" to "pages" * No small buffer support is needed (hooray!). * BufferedTupleStream needs to do its own tracking of # of rows per page, etc instead of relying on BufferedBlockMgr to do it. A wrapper around PageHandle is used. * Profile counters (unpin, pin, get new block time) are removed - similar counters in the BufferPool client are more useful. * Changed the tuple null indicators so that they are allocated before each tuple, rather than in a block at the start of the page. This slightly reduces the memory density, but greatly simplifies much logic. In particular, it avoids problems with RowIdx and larger pages with offsets that don't fit in 16 bits. * Buffer management of read/write streams uses the new pin-counting support to separate pinning of the read and write pages. This means that the reservation usage of an unpinned read/write stream does not fluctuate and the read/write iterators can always advance without requiring additional reservation. Testing this required some further changes. TestEnv was refactored so it can set up either BufferPool or BufferedBlockMgr. Some BufferPool-related state is added to ExecEnv, QueryState and RuntimeState, but is only created for backend tests that explicitly create a BufferPool. The following is left for future work: * IMPALA-3808 (large row support) is not added. I've added TODOs to the code to places that will require changes. * IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since it requires global changes to operators that accumulate memory. Testing: All of the BufferedTupleStream unit tests are ported to the new implementation, except for ones specifically testing the small buffer functionality. Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 --- M be/src/exec/hash-table-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-tuple-stream-test.cc A be/src/runtime/buffered-tuple-stream-v2-test.cc A be/src/runtime/buffered-tuple-stream-v2.cc A be/src/runtime/buffered-tuple-stream-v2.h A be/src/runtime/buffered-tuple-stream-v2.inline.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc 20 files changed, 3,002 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/13 -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 12: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 7: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/380/ -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/5811/11/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: Line 176: CHECK_CONSISTENCY(); > maybe move these CHECK_CONSISTECY() to the end of the functions so they val We don't check at the exit points of all functions so I wanted to at least check at the entry to some of these functions before they start changing state. Line 270: RETURN_IF_ERROR(PinPageIfNeeded(write_page_, pinned_)); > wouldn't it be better to move this to AdvanceWritePage()? The other caller Done Line 299: // May need to pin the new page for both reading and writing. > Add: "See ExpectedPinCount()." Done Line 374: pinned_ || buffer_pool_client_->reservation()->IncreaseReservationToFit(page_len_); > let's add a comment, something like: Done Line 443: MemTracker* tracker, scoped_ptr* batch, bool* got_rows) { > you don't have to do it now, but I kinda think this function should be in P Yeah, I think we should really remove it: IMPALA-2758 http://gerrit.cloudera.org:8080/#/c/5811/11/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS11, Line 498: and to pin it for reading if the stream's current : /// state requires it. > see comment in cc file on this function. if you take that suggestion, then Done PS11, Line 518: write iterator is > iterators are Done -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Hello Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5811 to look at the new patch set (#12). Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool Add a copy of BufferedTupleStream that allocates memory from BufferPool. This will replace the original implementation when IMPALA-3200 is completed. The major changes are: * Terminology is updated from "blocks" to "pages" * No small buffer support is needed (hooray!). * BufferedTupleStream needs to do its own tracking of # of rows per page, etc instead of relying on BufferedBlockMgr to do it. A wrapper around PageHandle is used. * Profile counters (unpin, pin, get new block time) are removed - similar counters in the BufferPool client are more useful. * Changed the tuple null indicators so that they are allocated before each tuple, rather than in a block at the start of the page. This slightly reduces the memory density, but greatly simplifies much logic. In particular, it avoids problems with RowIdx and larger pages with offsets that don't fit in 16 bits. * Buffer management of read/write streams uses the new pin-counting support to separate pinning of the read and write pages. This means that the reservation usage of an unpinned read/write stream does not fluctuate and the read/write iterators can always advance without requiring additional reservation. Testing this required some further changes. TestEnv was refactored so it can set up either BufferPool or BufferedBlockMgr. Some BufferPool-related state is added to ExecEnv, QueryState and RuntimeState, but is only created for backend tests that explicitly create a BufferPool. The following is left for future work: * IMPALA-3808 (large row support) is not added. I've added TODOs to the code to places that will require changes. * IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since it requires global changes to operators that accumulate memory. Testing: All of the BufferedTupleStream unit tests are ported to the new implementation, except for ones specifically testing the small buffer functionality. Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 --- M be/src/exec/hash-table-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-tuple-stream-test.cc A be/src/runtime/buffered-tuple-stream-v2-test.cc A be/src/runtime/buffered-tuple-stream-v2.cc A be/src/runtime/buffered-tuple-stream-v2.h A be/src/runtime/buffered-tuple-stream-v2.inline.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc 20 files changed, 3,012 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/12 -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Dan Hecht has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Jim Apple has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Hello Jim Apple, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6402 to look at the new patch set (#7). Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. IMPALA-5077: add NUMA and current cpu to CpuInfo NUMA info is found using the /sys filesystem. The current CPU can be found using sched_getcpu(), which is supported on all recent Linux kernels (unfortunately CentOS 5 shipped with an older kernel, so we need a fallback). Testing: Confirmed that this built on a range of different Linux distros, including CentOS 5, which is missing support for features like sched_getcpu(). Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 --- M be/CMakeLists.txt A be/src/common/.gitignore M be/src/common/CMakeLists.txt A be/src/common/config.h.in M be/src/util/cpu-info.cc M be/src/util/cpu-info.h 6 files changed, 167 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/7 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6402/6/be/src/util/cpu-info.cc File be/src/util/cpu-info.cc: Line 183: for (int core = 0; core < max_num_cores_; ++core) core_to_numa_node_[core] = 0; > std::fill Done -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 1: (14 comments) http://gerrit.cloudera.org:8080/#/c/6406/1//COMMIT_MSG Commit Message: Line 11: serialization library. Any benchmark numbers? Just curious. Also I think it is good to add them here. http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CMakeLists.txt File common/fbs/CMakeLists.txt: PS1, Line 24: Trailing white spaces at multiple places in this file. Line 45: set(CPP_ARGS --cpp -o ${BE_OUTPUT_DIR} -b) Can this be moved outside the loop like JAVA_FE_ARGS http://gerrit.cloudera.org:8080/#/c/6406/1/common/fbs/CatalogObjects.fbs File common/fbs/CatalogObjects.fbs: PS1, Line 20: extraneous white spaces. PS1, Line 54: file_name_prefix_idx Didn't understand what this prefix means till I read HdfsTable.fileNamePrefixes_. I think it should be pointed out here. Line 70: root_type FbFileDesc; What is the significance of this? I'm reading about FlatBuffers for the first time, so pardon my ignorance. From what I understand, this isn't required in strongly typed systems. Please correct me if I'm wrong. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: Line 318: private static int REPLICA_HOST_IDX_MASK = 0x7FFF; Document these? PS1, Line 318: REPLICA_HOST_IDX_MASK Looks like it used for UNMASKing, rename may be? PS1, Line 357: getHostIdx how about just making hostIdx_ a short rather than typecasting everytime? PS1, Line 360: REPLICA_HOST_CACHE_MASK Shouldn't this be short rather than int? May be it works fine this way, just want to be sure. http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 315: Preconditions.checkNotNull(fd); I don't think this FileDescriptor.create() can ever return null? If yes, worth removing Preconditions check here and elsewhere. Line 338:* Computes the common prefix between 'fileName' and 'prevFileName' and returns a I like the idea but I'm wondering if this kind of optimization is a little premature. Especially given the filenames generated by Impala writes are typically random UUID style strings and would likely not qualify this optimization. Thoughts? Please correct me if I got this wrong. Line 420: prefixCompressFileName(currentFileName, prevFileName); Also I think it may not be optimal to construct the prefixes this way by comparing the previous name with the current name. May be we should huffman style encoding that minimizes the no. of bits across the entire set of file names? http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 96: protected static EnumSet SUPPORTED_TABLE_TYPES = EnumSet.of( May be remove TableLoader.SUPPORTED_TABLE_TYPES and make it use this one by making it public? -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Dan Hecht has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 11: Code-Review+2 (7 comments) http://gerrit.cloudera.org:8080/#/c/5811/11/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: Line 176: CHECK_CONSISTENCY(); maybe move these CHECK_CONSISTECY() to the end of the functions so they validate the resulting state, but feel free to ignore. Line 270: RETURN_IF_ERROR(PinPageIfNeeded(write_page_, pinned_)); wouldn't it be better to move this to AdvanceWritePage()? The other callers handle it for their specific case (i.e. by calling PrepareForReadInternal(), or not needing it), so this is really just for when we're adding a new write page to an existing stream, no? Line 299: // May need to pin the new page for both reading and writing. Add: "See ExpectedPinCount()." since that's where we really document how we manage reservations/pins. Line 374: pinned_ || buffer_pool_client_->reservation()->IncreaseReservationToFit(page_len_); let's add a comment, something like: // If already pinned, no additional pin is needed (see ExpectedPinCount()). Line 443: MemTracker* tracker, scoped_ptr* batch, bool* got_rows) { you don't have to do it now, but I kinda think this function should be in PHJ. It's really just a composition of buffered-tuple-stream operations, and it's not something that we should be doing generally. http://gerrit.cloudera.org:8080/#/c/5811/11/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS11, Line 498: and to pin it for reading if the stream's current : /// state requires it. see comment in cc file on this function. if you take that suggestion, then update this. PS11, Line 518: write iterator is iterators are -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Jim Apple has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/6402/6/be/src/util/cpu-info.cc File be/src/util/cpu-info.cc: Line 183: for (int core = 0; core < max_num_cores_; ++core) core_to_numa_node_[core] = 0; std::fill -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Hello Jim Apple, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6402 to look at the new patch set (#6). Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. IMPALA-5077: add NUMA and current cpu to CpuInfo NUMA info is found using the /sys filesystem. The current CPU can be found using sched_getcpu(), which is supported on all recent Linux kernels (unfortunately CentOS 5 shipped with an older kernel, so we need a fallback). Testing: Confirmed that this built on a range of different Linux distros, including CentOS 5, which is missing support for features like sched_getcpu(). Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 --- M be/CMakeLists.txt A be/src/common/.gitignore M be/src/common/CMakeLists.txt A be/src/common/config.h.in M be/src/util/cpu-info.cc M be/src/util/cpu-info.h 6 files changed, 166 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/6 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/6402/5/be/src/util/cpu-info.cc File be/src/util/cpu-info.cc: PS5, Line 161: getcpu > should that say sched_getcpu()? Yes - sched_getcpu() is a glibc wrapper around the getcpu() syscall. Line 186: LOG(FATAL) << "Could not find nodes in /sys/devices/system/node"; > are we confident that all distros impala runs on will have this? Oops, I intended to check for the existence of the directory (it's not present if the kernel has NUMA support disabled). Now all errors should be non-fatal and just fall back to assuming one node. -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Pare down ODBC info
Jim Apple has posted comments on this change. Change subject: [DOCS] Pare down ODBC info .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6409/1/docs/topics/impala_odbc.xml File docs/topics/impala_odbc.xml: PS1, Line 39: can be integrate "can be integrated" PS1, Line 42: all been approved by whom? -- To view, visit http://gerrit.cloudera.org:8080/6409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I302d23432d348aa0c9c2837c0f42bc4f2844f9b8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 10: (19 comments) http://gerrit.cloudera.org:8080/#/c/5811/9/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: PS9, Line 322: > !has_read_iterator() ? Done http://gerrit.cloudera.org:8080/#/c/5811/10/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: Line 101: for (const Page& page : pages_) { > will this be too expensive, even in debug builds (and we should make sure t We were already doing a pass for the list in one of the Calc* methods, so it's not likely to be significantly worse. I removed the consistency check from GetNextInternal(), which is likely to be the most commonly-called. I added a CHECK_CONSISTENCY macro to guarantee the calls get removed in release builds. PS10, Line 125: read_page_ == pages_.end() > has_read_iterator() and switch clauses? Done Line 129: } > does it make sense to print the write page as well? It's printed on line 123 PS10, Line 149: // This must be the first call to PrepareForWrite(). > Maybe say, this must be the first iterator? i.e. you can't call this functi Done PS10, Line 158: // This must be the first call to PrepareForWrite(). > same Done PS10, Line 205: if > when Done PS10, Line 211: If > When Done PS10, Line 258: get_extra_reservation > it's unfortunate we need this. i wonder if moving the reservation logic (an Done PS10, Line 272: bool pin_for_read = has_read_iterator() && pinned_; : get_extra_reservation |= pin_for_read; > this is still pretty confusing. if you don't move the reservation logic com I restructured this code and move the reservation logic into the callers. It adds a bit more boilerplate but seems less magical. Line 283: if (pin_for_read) RETURN_IF_ERROR(PinPage(_page)); > see comment above. Done Line 298: int64_t row_size, bool get_extra_reservation, bool* got_page) noexcept { > this wrapper now seems unnecessary and not helpful for readability because Done PS10, Line 339: read_page_ == pages_.end() > !has_read_iterator()? Done Line 379: ResetReadPage(); > this seems to make more sense to do in PrepareForRead() since that's the on Done http://gerrit.cloudera.org:8080/#/c/5811/10/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS10, Line 217: got_buffer > maybe this should be 'got_reservation' now, and explained in terms of reser Works for me - done. PS10, Line 228: got_buffer > same Done PS10, Line 496: get_extra_reservation > this needs explanation, but first see comments in the cc file about it. Reworked this. Line 503: bool* got_page) noexcept WARN_UNUSED_RESULT; > seems like we can put this code in NewWritePage() now. This ended up being restructured a bit - some of the code is in a new function AdvanceWritePage(). PS10, Line 509: got_buffer > got_page? Done -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Tim Armstrong has uploaded a new patch set (#11). Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool Add a copy of BufferedTupleStream that allocates memory from BufferPool. This will replace the original implementation when IMPALA-3200 is completed. The major changes are: * Terminology is updated from "blocks" to "pages" * No small buffer support is needed (hooray!). * BufferedTupleStream needs to do its own tracking of # of rows per page, etc instead of relying on BufferedBlockMgr to do it. A wrapper around PageHandle is used. * Profile counters (unpin, pin, get new block time) are removed - similar counters in the BufferPool client are more useful. * Changed the tuple null indicators so that they are allocated before each tuple, rather than in a block at the start of the page. This slightly reduces the memory density, but greatly simplifies much logic. In particular, it avoids problems with RowIdx and larger pages with offsets that don't fit in 16 bits. * Buffer management of read/write streams uses the new pin-counting support to separate pinning of the read and write pages. This means that the reservation usage of an unpinned read/write stream does not fluctuate and the read/write iterators can always advance without requiring additional reservation. Testing this required some further changes. TestEnv was refactored so it can set up either BufferPool or BufferedBlockMgr. Some BufferPool-related state is added to ExecEnv, QueryState and RuntimeState, but is only created for backend tests that explicitly create a BufferPool. The following is left for future work: * IMPALA-3808 (large row support) is not added. I've added TODOs to the code to places that will require changes. * IMPALA-4179 (remove MarkNeedsDeepCopy()) is not fixed, since it requires global changes to operators that accumulate memory. Testing: All of the BufferedTupleStream unit tests are ported to the new implementation, except for ones specifically testing the small buffer functionality. Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 --- M be/src/exec/hash-table-test.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-tuple-stream-test.cc A be/src/runtime/buffered-tuple-stream-v2-test.cc A be/src/runtime/buffered-tuple-stream-v2.cc A be/src/runtime/buffered-tuple-stream-v2.h A be/src/runtime/buffered-tuple-stream-v2.inline.h M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.cc M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc 20 files changed, 3,010 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5811/11 -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] [DOCS] Pare down ODBC info
Greg Rahn has posted comments on this change. Change subject: [DOCS] Pare down ODBC info .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I302d23432d348aa0c9c2837c0f42bc4f2844f9b8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Pare down ODBC info
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/6409 Change subject: [DOCS] Pare down ODBC info .. [DOCS] Pare down ODBC info Remove the Cloudera-specific portion of the Impala + ODBC page, leaving just the background info about why to use ODBC. Defer to third parties who supply ODBC drivers. Change-Id: I302d23432d348aa0c9c2837c0f42bc4f2844f9b8 --- M docs/topics/impala_odbc.xml 1 file changed, 5 insertions(+), 174 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/6409/1 -- To view, visit http://gerrit.cloudera.org:8080/6409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I302d23432d348aa0c9c2837c0f42bc4f2844f9b8 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-4831: enforce BufferPool reservation invariants
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4831: enforce BufferPool reservation invariants .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/379/ -- To view, visit http://gerrit.cloudera.org:8080/6313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT
Zach Amsden has posted comments on this change. Change subject: IMPALA-4318: Kudu support for CREATE EXTERNAL TABLE AS SELECT .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/6261/6/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: PS6, Line 201: Ingested > how about analyzeExistingKuduTableParams ? Done http://gerrit.cloudera.org:8080/#/c/6261/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: Line 2793: > nit extra line Done http://gerrit.cloudera.org:8080/#/c/6261/6/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: PS6, Line 360: kudu_tbl_name = KuduTestSuite.to_kudu_table_name(unique_database, "foo2") : assert kudu_client.table_exists(kudu_tbl_name) : cursor.execute("SELECT * FROM %s.foo2" % (unique_database)) : assert len(cursor.fetchall()) == 1 : cursor.execute("DROP TABLE %s.foo2" % (unique_database)) : assert kudu_client.table_exists(kudu_tbl_name) > I think you wanna wrap this in a try/finally and make sure to drop the kudu Unless I'm mistaken, the entire unique_database gets dropped at the end of the test. The only reason I'm dropping the tables at all here is to test the external vs. managed table behavior. See for example test_kudu_col_removed, test_kudu_rename_table which don't bother to clean up. -- To view, visit http://gerrit.cloudera.org:8080/6261 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9aa82809a6c0c5e6386827314b7e5b520c1a6633 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. Patch Set 1: Code-Review+2 In terms of testing, you may run a private jenkins jobs to test your uncommitted changes against s3 using impala-auxiliary-tests/jenkins/run_impala_tests.py. -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5072: Fix test recover partitions on S3
Zach Amsden has uploaded a new change for review. http://gerrit.cloudera.org:8080/6408 Change subject: IMPALA-5072: Fix test_recover_partitions on S3 .. IMPALA-5072: Fix test_recover_partitions on S3 On S3, directory creation is a nop, so we have to actually create values to materialize the partitions. Testing: I'm still trying to figure out how to run the tests against S3, but reasonably confident this change will actually fix the problem. Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c --- M tests/metadata/test_recover_partitions.py 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/6408/1 -- To view, visit http://gerrit.cloudera.org:8080/6408 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I30d86e49b000db925a778ff53479b61a075dc88c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach Amsden
[Impala-ASF-CR] IMPALA-4831: enforce BufferPool reservation invariants
Dan Hecht has posted comments on this change. Change subject: IMPALA-4831: enforce BufferPool reservation invariants .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6313 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic5b0c335d6e73250f7e5a3b9ce2f999c5119c573 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Dan Hecht has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 5: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6402/5/be/src/util/cpu-info.cc File be/src/util/cpu-info.cc: PS5, Line 161: getcpu should that say sched_getcpu()? Line 186: LOG(FATAL) << "Could not find nodes in /sys/devices/system/node"; are we confident that all distros impala runs on will have this? why is this fatal, but !found_numa_node is not? why not be consistent? -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 16: Thanks for all the reviews, Alex, Jim, MJ, Mostafa, Marcel! -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 16: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/378/ -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Alex Behm has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG Commit Message: Line 15: as a precaution and for testing purposes. > if we're worried about bugs, we should add more tests. i'm not in favor of I don't think it's unusual to have "feature flags" for disabling invasive changes quickly to unblock users if they do hit a problem - these flags exist because it's infeasible to test all possible scenarios. Let's do more testing and remove the flag. Taras, let's talk about what additional testing we should do. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3748: minimum buffer requirements in planner
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3748: minimum buffer requirements in planner .. Patch Set 13: Updated the explain plan output to use more concise names for estimates and reservations. Removed the doc changes from this patch - will update them in a follow-on. -- To view, visit http://gerrit.cloudera.org:8080/5847 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3748: minimum buffer requirements in planner
Tim Armstrong has uploaded a new patch set (#13). Change subject: IMPALA-3748: minimum buffer requirements in planner .. IMPALA-3748: minimum buffer requirements in planner Compute the minimum buffer requirement for spilling nodes and per-host estimates for the entire plan tree. This builds on top of the existing resource estimation code, which computes the sets of plan nodes that can execute concurrently. This is cleaned up so that the process of producing resource requirements is clearer. It also removes the unused VCore estimates. Fixes various bugs and other issues: * computeCosts() was not called for unpartitioned fragments, so the per-operator memory estimate was not visible. * Nested loop join was not treated as a blocking join. * The TODO comment about union was misleading * Fix the computation for mt_dop > 1 by distinguishing per-instance and per-host estimates. * Always generate an estimate instead of unpredictably returning -1/"unavailable" in many circumstances - there was little rhyme or reason to when this happened. * Remove the special "trivial plan" estimates. With the rest of the cleanup we generate estimates <= 10MB for those trivial plans through the normal code path. I left one bug (IMPALA-4862) unfixed because it is subtle, will affect estimates for many plans and will be easier to review once we have the test infra in place. Testing: Added basic planner tests for resource requirements in both the MT and non-MT cases. Re-enabled the explain_level tests, which appears to be the only coverage for many of these estimates. Removed the complex and brittle test cases and replaced with a couple of much simpler end-to-end tests. Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37 --- M be/src/exec/analytic-eval-node.cc M be/src/exec/partitioned-aggregation-node.h M be/src/exec/partitioned-hash-join-builder.h M be/src/runtime/sorter.cc M be/src/scheduling/query-schedule.cc M be/src/service/query-exec-state.cc M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/common/PrintUtils.java M fe/src/main/java/org/apache/impala/planner/AggregationNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/DataSink.java M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java M fe/src/main/java/org/apache/impala/planner/JoinNode.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java A fe/src/main/java/org/apache/impala/planner/ResourceProfile.java M fe/src/main/java/org/apache/impala/planner/SelectNode.java M fe/src/main/java/org/apache/impala/planner/SingularRowSrcNode.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/main/java/org/apache/impala/planner/SubplanNode.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/java/org/apache/impala/planner/UnnestNode.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test A testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test M
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG Commit Message: Line 15: as a precaution and for testing purposes. > I agree that "blindly" adding query options is bad, but there are so many t if we're worried about bugs, we should add more tests. i'm not in favor of piling on query options as a work-around for missing test coverage (and i don't think we're talking about "testing to death"). query options make the product harder to use. and we do not guarantee that for every query we're able to run at the moment that the runtime behavior (including memory consumption) will never change in the future. that would be unreasonable. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Dan Hecht has posted comments on this change. Change subject: IMPALA-3586: Implement union passthrough .. Patch Set 16: (1 comment) http://gerrit.cloudera.org:8080/#/c/5816/16/be/src/exec/union-node.h File be/src/exec/union-node.h: Line 91: bool const_todo_; > Minor naming thing, I don't feel too strongly but 'todo' seems unusual. Alt Rather than having these variables, how about just saving the index of the first materialize child. Then, these can be simple functions, e.g. HasMorePassthrough(), HasMoreMaterialized(), that is derived from existing state. that way, there's less dynamic state that needs to be updated and reasoned about. -- To view, visit http://gerrit.cloudera.org:8080/5816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
Dan Hecht has posted comments on this change. Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool .. Patch Set 10: (19 comments) http://gerrit.cloudera.org:8080/#/c/5811/9/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: PS9, Line 322: !has_read_iterator() ? http://gerrit.cloudera.org:8080/#/c/5811/10/be/src/runtime/buffered-tuple-stream-v2.cc File be/src/runtime/buffered-tuple-stream-v2.cc: Line 101: for (const Page& page : pages_) { will this be too expensive, even in debug builds (and we should make sure this loop is compiled out of release builds)? PS10, Line 125: read_page_ == pages_.end() has_read_iterator() and switch clauses? Line 129: } does it make sense to print the write page as well? PS10, Line 149: // This must be the first call to PrepareForWrite(). Maybe say, this must be the first iterator? i.e. you can't call this function after starting a read iteration or read/write iteration, either, right? PS10, Line 158: // This must be the first call to PrepareForWrite(). same PS10, Line 205: if when (sorry, i wrote the comment but 'if' sounds confusing since it's not currently pinned) PS10, Line 211: If When PS10, Line 258: get_extra_reservation it's unfortunate we need this. i wonder if moving the reservation logic (and the ResetWritePage() call) to the callers would be simpler? In any case, if we leave it here, how about calling this parameter 'get_read_reservation'? also see next comment. PS10, Line 272: bool pin_for_read = has_read_iterator() && pinned_; : get_extra_reservation |= pin_for_read; this is still pretty confusing. if you don't move the reservation logic completely into the callers, it may still make sense to move the calculation 'pinned_ && has_read_iterator()' into AddRowSlow(). The other callsites are known to be either static true or false. THen you would also move the PinPage() call into the caller (using PinPageIfNeeded()), which seems to make sense since in the case of PrepareForReadWrite(), you really want PrepareForReadInternal() to do that anyway, no? Or at least you could move this after the write iterator is set up and still use PinPageIfNeeded(). Line 283: if (pin_for_read) RETURN_IF_ERROR(PinPage(_page)); see comment above. Line 298: int64_t row_size, bool get_extra_reservation, bool* got_page) noexcept { this wrapper now seems unnecessary and not helpful for readability because NewWritePage() has no other callers. PS10, Line 339: read_page_ == pages_.end() !has_read_iterator()? Line 379: ResetReadPage(); this seems to make more sense to do in PrepareForRead() since that's the only case you can possible have a read iterator, right? and that assumption is built into the reservation management code. http://gerrit.cloudera.org:8080/#/c/5811/10/be/src/runtime/buffered-tuple-stream-v2.h File be/src/runtime/buffered-tuple-stream-v2.h: PS10, Line 217: got_buffer maybe this should be 'got_reservation' now, and explained in terms of reservation. or at least got_page? PS10, Line 228: got_buffer same PS10, Line 496: get_extra_reservation this needs explanation, but first see comments in the cc file about it. Line 503: bool* got_page) noexcept WARN_UNUSED_RESULT; seems like we can put this code in NewWritePage() now. PS10, Line 509: got_buffer got_page? -- To view, visit http://gerrit.cloudera.org:8080/5811 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: partitions DMLs for Kudu tables .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/6037/7/be/src/runtime/data-stream-partitioner.cc File be/src/runtime/data-stream-partitioner.cc: PS7, Line 51: KuduTableDescriptor* table_desc_ = static_cast(table_desc); : sp::shared_ptr client_; : RETURN_IF_ERROR(CreateKuduClient(table_desc_->kudu > probably a codegen opportunity here for later, can you leave a TODO? Done http://gerrit.cloudera.org:8080/#/c/6037/7/be/src/runtime/data-stream-partitioner.h File be/src/runtime/data-stream-partitioner.h: PS7, Line 45: // partition number as parameters. : // TODO: make this cleaner without sacrifi > I'm sorry if I wasn't clear when we spoke last, but I think it is still bet Done PS7, Line 63: > typo Done PS7, Line 87: ++next_unknow > rename to indicate used for R.R. only, e.g. next_unknown_partition_ or some Done -- To view, visit http://gerrit.cloudera.org:8080/6037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#8). Change subject: IMPALA-3742: partitions DMLs for Kudu tables .. IMPALA-3742: partitions DMLs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions the rows according to Kudu's partitioning scheme. A followup patch will deal with sorting. It accomplishes this by inserting an exchange node into the plan before the DML operation. The DataStreamSender then uses a new abstraction, DataStreamPartitioner, that calls into the Kudu client to determine the partition for each row. Testing: - Updated planner tests. - Manually verified the partitioning works as expected. Change-Id: Ic10b3295159354888efcde3df76b0edb24161515 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/runtime/CMakeLists.txt M be/src/runtime/coordinator.cc A be/src/runtime/data-stream-partitioner.cc A be/src/runtime/data-stream-partitioner.h M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-delete.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 23 files changed, 586 insertions(+), 104 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/6037/8 -- To view, visit http://gerrit.cloudera.org:8080/6037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Hello Jim Apple, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6402 to look at the new patch set (#5). Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. IMPALA-5077: add NUMA and current cpu to CpuInfo NUMA info is found using the /sys filesystem. The current CPU can be found using sched_getcpu(), which is supported on all recent Linux kernels (unfortunately CentOS 5 shipped with an older kernel, so we need a fallback). Testing: Confirmed that this built on a range of different Linux distros, including CentOS 5, which is missing support for features like sched_getcpu(). Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 --- M be/CMakeLists.txt A be/src/common/.gitignore M be/src/common/CMakeLists.txt A be/src/common/config.h.in M be/src/util/cpu-info.cc M be/src/util/cpu-info.h 6 files changed, 155 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/5 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6406/1/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 83: storageIdGenerator.put(host, new Short(diskId)); Drive-by comment: using 'new' here is pessimising memory consumption. If you use Short.valueof() or just let Java automatically box it, it will use a built-in cache for values -128 to 127, e.g. see http://docs.oracle.com/javase/7/docs/api/java/lang/Short.html#valueOf(short) -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 3: Code-Review+1 Carry Jim's +1 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/6402/3/be/CMakeLists.txt File be/CMakeLists.txt: PS3, Line 282: > nit: remove space? Done http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/common/CMakeLists.txt File be/src/common/CMakeLists.txt: PS3, Line 53: CONFIGURE_FILE > Does cmake automatically understand that compilation depends on this step? The config file is generated during configuration (i.e. the "cmake ." step), so it will always be generated before compilation. I think the only catch is that if you modify config.h.in without touching a CMakeLists.txt file (very unlikely) then it doesn't trigger regeneration of the file. http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/util/cpu-info.cc File be/src/util/cpu-info.cc: PS3, Line 181: for (; dir_it != fs::directory_iterator(); ++dir_it) { > directory_iterator doesn't work with ranged for syntax? Doesn't look like it: https://github.com/Beman/filesystem-proposal/issues/1 http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/util/cpu-info.h File be/src/util/cpu-info.h: PS3, Line 87: hould : /// be used by Impala. > what does 'should be used' mean? Available for use? I added a bit more explanation - it's usually just the number of online cores but there's a --num_cores command line options. -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4029: Reduce memory requirements for storing file metadata
Dimitris Tsirogiannis has uploaded a new change for review. http://gerrit.cloudera.org:8080/6406 Change subject: IMPALA-4029: Reduce memory requirements for storing file metadata .. IMPALA-4029: Reduce memory requirements for storing file metadata This commit improves the memory requirements for storing file and block metadata in the catalog and the impalad nodes by using the FlatBuffers serialization library. Testing: Passed an exhaustive tests run. Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 --- M CMakeLists.txt A common/fbs/CMakeLists.txt A common/fbs/CatalogObjects.fbs M common/thrift/CatalogObjects.thrift M fe/CMakeLists.txt M fe/pom.xml M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java 14 files changed, 653 insertions(+), 298 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/6406/1 -- To view, visit http://gerrit.cloudera.org:8080/6406 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I483d3cadc9d459f71a310c35a130d073597b0983 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Henry Robinson has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/6402/3/be/CMakeLists.txt File be/CMakeLists.txt: PS3, Line 282: nit: remove space? http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/common/CMakeLists.txt File be/src/common/CMakeLists.txt: PS3, Line 53: CONFIGURE_FILE Does cmake automatically understand that compilation depends on this step? http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/util/cpu-info.cc File be/src/util/cpu-info.cc: PS3, Line 181: for (; dir_it != fs::directory_iterator(); ++dir_it) { directory_iterator doesn't work with ranged for syntax? http://gerrit.cloudera.org:8080/#/c/6402/3/be/src/util/cpu-info.h File be/src/util/cpu-info.h: PS3, Line 87: hould : /// be used by Impala. what does 'should be used' mean? Available for use? -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4996: Single-threaded KuduScanNode .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4996: Single-threaded KuduScanNode .. Patch Set 6: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/377/ -- To view, visit http://gerrit.cloudera.org:8080/6312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6312 to look at the new patch set (#6). Change subject: IMPALA-4996: Single-threaded KuduScanNode .. IMPALA-4996: Single-threaded KuduScanNode This introduces KuduScanNodeMt, the single-threaded version of KuduScanNode that materializes the tuples in GetNext(). KuduScanNodeMt is enabled by the same condition as HdfsScanNodeMt: mt_dop is greater than or equal to 1. To share code between the two implementations, KuduScanNode and KuduScanNodeMt are now subclasses of KuduScanNodeBase, which implements the shared code. The KuduScanner is minimally impacted, as it already had the required GetNext interface. Since the KuduClient is a heavy-weight object, it is now shared at the QueryState level. We try to share the KuduClient as much as possible, but there are times when the KuduClient cannot be shared. Each Kudu table has master addresses stored in the Hive Metastore. We only share KuduClients for tables that have an identical value for the master addresses. In the ideal case, every Kudu table will have the same value, but there is no explicit guarantee of this. The testing for this is a modified version of kudu-scan-node.test run with various mt_dop values. Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a --- M be/src/exec/CMakeLists.txt M be/src/exec/exec-node.cc M be/src/exec/hdfs-scan-node-base.h A be/src/exec/kudu-scan-node-base.cc A be/src/exec/kudu-scan-node-base.h A be/src/exec/kudu-scan-node-mt.cc A be/src/exec/kudu-scan-node-mt.h M be/src/exec/kudu-scan-node.cc M be/src/exec/kudu-scan-node.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java A testdata/workloads/functional-query/queries/QueryTest/mt-dop-kudu.test M tests/query_test/test_mt_dop.py 17 files changed, 558 insertions(+), 143 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/6312/6 -- To view, visit http://gerrit.cloudera.org:8080/6312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4787: Optimize APPX MEDIAN() memory usage
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage .. Patch Set 16: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Jim Apple has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4996: Single-threaded KuduScanNode
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4996: Single-threaded KuduScanNode .. Patch Set 5: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/376/ -- To view, visit http://gerrit.cloudera.org:8080/6312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e4593300e376bc508b78acaea64ffdd2c73a67a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs Previously, exprs used in sorts were evaluated lazily. This can potentially be bad for performance if the exprs are expensive to evaluate, and it can lead to crashes if the exprs are non-deterministic, as this violates assumptions of our sorting algorithm. This patch addresses these issues by materializing ordering exprs. It does so when the expr is non-deterministic (including when it contains a UDF, which we cannot currently know if they are non-deterministic), or when its cost exceeds a threshold (or the cost is unknown). Testing: - Added e2e tests in test_sort.py and test_queries.py. - Updated planner tests. Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e --- M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/analysis/FunctionName.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test A testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M tests/query_test/test_sort.py M tests/query_test/test_udfs.py 12 files changed, 326 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/6322/2 -- To view, visit http://gerrit.cloudera.org:8080/6322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs .. Patch Set 2: (24 comments) http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: Line 212: // True if this expr always returns the same value given the same input, false if it > True if this exprs always returns the same value given the same input. Fals Done Line 213: // does not, or if the behavior is unknown (e.g. UDFs). Set at the end of analyze() and > e.g. Done http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 292: // We currently don't have a way to indicate if a UDF is deterministic, so just always > Ideally we should not conflate the UDF and deterministic concepts. For exam Not sure what you mean by 'two separate members". Maybe Expr.containsNonDeterministicBuiltin and Expr.containsUDF? Also, it doesn't seem like FunctionCallExpr.isNondeterministicBuiltinFn() and Expr.isDeterministic() could ever be guaranteed to return opposite values, since a deterministic function that has non-deterministic parameters would constitute a non-deterministic expr. http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: Line 253: if (!(smap.getLhs().get(i) instanceof SlotRef) > use {} for multi-line if Done http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java File fe/src/main/java/org/apache/impala/analysis/SortInfo.java: Line 39: // All Exprs with cost greater than this will be materialized. Since we don't currently > Remove TODO, instead comment where this value came from, maybe with some ex Done Line 46: > needs a brief comment, in particular why we need to store them Done Line 172:*/ > We should collect the SlotRefs only for non-materialized ordering exprs. Done Line 180: // substOrderBy is a mapping from slot refs in the sort node's input and ordering > update comment to reflect the new contents of the substOrderBy smap Done Line 196: SlotDescriptor origSlotDesc = origSlotRef.getDesc(); > update comment, we are not only substituting slot refs Done Line 208: substituteOrderingExprs(substOrderBy, analyzer); > Say something about the args of this function. Also mention side-effects (p Done Line 210: // Update the tuple descriptor used to materialize the input of the sort. > make this the first sentence in this comment Done Line 211: setMaterializedTupleInfo(sortTupleDesc, sortTupleExprs); > no more hint, right? Done Line 214: } > from the materialized sort exprs to the new SlotRefs Done Line 216: /** > Instead of returning a new smap and combining with the other one, why not p I updated it to take the smap as a parameter, but the problem with returning the materialized exprs to avoid side effects is that we also need to update isAsc and nullsFirst. Line 217:* Materialize ordering exprs that are non-deterministic, are more expensive than a cost > analyzer needed? For 'addSlotDescriptor' Line 219:* 'sortTupleDesc' as their parent. > remove Done Line 228:* exprs that get materialized. > Why TODO? What happens when there are no sort exprs at all left? There's already a lot in this review, and I wasn't sure how easy this would be. I'm happy to take a look if you want. If there are no sort exprs, we still go through the whole sort but every row is considered equal (by TupleRowComparator::Compare"). This is the same behavior as would currently happen, except slightly faster as the literal expressions don't get repeatedly evaluated and compared. I added a test to at least show it works as expected. http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 374: public void testSortExprMaterialization() { > testSortExprMaterialization Done http://gerrit.cloudera.org:8080/#/c/6322/1/testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test File testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test: Line 19 > I think these tests need more examples cheap and expensive exprs to make su I'm not sure how to make that work with PlannerTest, but I added a test to query_tests/test_udfs that works. For UDAs, assuming that you mean something like: select bool_col, sum(min(int_col)) over(order by uda(int_col)) from functional.alltypes group by 1; I believe that will already be materialized, in a SlotRef created in
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. IMPALA-5077: add NUMA and current cpu to CpuInfo NUMA information is found using the /sys filesystem. The current CPU can be found using sched_getcpu(), which is supported on all recent Linux kernels (unfortunately CentOS 5 shipped with an older kernel, so we need a fallback). Testing: Confirmed that this built on a range of different Linux distros, including CentOS 5, which is missing support for features like sched_getcpu(). Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 --- M be/CMakeLists.txt A be/src/common/.gitignore M be/src/common/CMakeLists.txt A be/src/common/config.h.in M be/src/util/cpu-info.cc M be/src/util/cpu-info.h 6 files changed, 154 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/6402/3 -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-5077: add NUMA and current cpu to CpuInfo
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5077: add NUMA and current cpu to CpuInfo .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/6402/2/be/src/util/cpu-info.cc File be/src/util/cpu-info.cc: PS2, Line 161: result > Just inline sched_getcpu() if you aren't going to use result. Done PS2, Line 165: Running > "Built"? Good point - done. Line 183: string filename = dir_it->path().filename().string(); > const Done Line 192: core_to_numa_node_ = new int[max_num_cores_]; > Will ASAN complain about this memory leak? LeakSanitiser would report it as "reachable" but it's easy enough to switch to unique_ptr. -- To view, visit http://gerrit.cloudera.org:8080/6402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0525228a56bcf20c45f78ee1ba1d300c74cf4d05 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes