[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
Alex Behm has posted comments on this change. Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 .. Patch Set 1: Oh sorry, just saw it was -1 -- To view, visit http://gerrit.cloudera.org:8080/6989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
Alex Behm has posted comments on this change. Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 .. Patch Set 1: -Code-Review David, what made you +1? You seemed concerned. -- To view, visit http://gerrit.cloudera.org:8080/6989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
David Knupp has posted comments on this change. Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 .. Patch Set 1: Code-Review-1 -- To view, visit http://gerrit.cloudera.org:8080/6989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Michael Brown Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
David Knupp has posted comments on this change. Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 .. Patch Set 1: s/for that file/for that module/ -- To view, visit http://gerrit.cloudera.org:8080/6989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
David Knupp has posted comments on this change. Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 .. Patch Set 1: So, I'm a little concerned about this. The datetime being installed here isn't the stdlib datetime. It's a thirdparty module from Zope. When you install it, you also get other dependencies... Installing collected packages: zope.interface, pytz, datetime Successfully installed datetime-4.2 pytz-2017.2 zope.interface-4.4.1 And if you look at the comments in the setup.py for that file, this is included: description="""\ This package provides a DateTime data type, as known from Zope 2. Unless you need to communicate with Zope 2 APIs, you're probably better off using Python's built-in datetime module.""".replace('\n', ' ') https://github.com/zopefoundation/DateTime/blob/master/setup.py -- To view, visit http://gerrit.cloudera.org:8080/6989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 5: Code-Review+2 Looks like the GVO hit IMPALA-5358, which is non-deterministic. The fix for it is in GVO. While it is in progress, I'll give it another shot. -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 5: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/640/ -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
Alex Behm has posted comments on this change. Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 .. Patch Set 1: Code-Review+2 Thanks for being brave enough to switch to 16.04 and fix issues with it! -- To view, visit http://gerrit.cloudera.org:8080/6989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: Alex Behm Gerrit-Reviewer: David Knupp Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 9: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/639/ -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.
Alex Behm has posted comments on this change. Change subject: IMPALA-5358: Fix repeatable table sample. .. Patch Set 1: Correct. -- To view, visit http://gerrit.cloudera.org:8080/6985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.
Henry Robinson has posted comments on this change. Change subject: IMPALA-5358: Fix repeatable table sample. .. Patch Set 1: Ok, thanks - I think you mean that inputParts was arbitrarily ordered, so the bug was non-deterministic (but most of the time, gave the correct results in this case). That sounds fine, agree that we would expect current tests to find this. -- To view, visit http://gerrit.cloudera.org:8080/6985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.
Alex Behm has posted comments on this change. Change subject: IMPALA-5358: Fix repeatable table sample. .. Patch Set 1: Thanks, Henry. Can you clarify what you mean by REPEAT() clause? Did you mean REPEATABLE? That clause is mandatory for most tests and the PlannerTest that failed did have a test. The underlying issue is that we store the partitions as a Mapso iterating over the map can give us the partitions in an arbitrary order. I'll keep thinking whether we can have a more direct test, but luckiy our tests did ultimately catch this issue :) -- To view, visit http://gerrit.cloudera.org:8080/6985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5358: Fix repeatable table sample. .. Patch Set 1: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/638/ -- To view, visit http://gerrit.cloudera.org:8080/6985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Alex Behm has posted comments on this change. Change subject: IMPALA-5030: Adds support for NVL2() function .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent TranGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 17: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 17: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/636/ -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Hello Marcel Kornacker, Impala Public Jenkins, Jim Apple, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6950 to look at the new patch set (#17). Change subject: IMPALA-5347: Parquet scanner microoptimizations .. IMPALA-5347: Parquet scanner microoptimizations A mix of microoptimizations that profiling the parquet scanner revealed. All lead to some measurable improvement and added up to significant speedups for various scans. * Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining in all cases. * Apply __restrict__ in a few places so the compiler knows that it is safe to cache values accessed via those pointers * memset() the whole batch instead of the null indicators is cases where it is almost certainly cheaper. * Avoid updating two correlated loop variables in MaterializeValueBatch(). * Avoid unnecessary initialization of often-unused 'val' in ReadSlot(). * Shave a few instructions off the (still very expensive) bit unpacking and dict decoding logic. Performance: Some local TPC-H and targeted-perf benchmarks showed average speedups of ~5%. I did some benchmarks targeted at measuring column materialisation performance using a version of lineitem with duplicated data to make it bigger. These queries all got significantly faster. Dict-encoded DECIMAL: 2.23 -> 1.23s SELECT count(*) FROM biglineitem WHERE l_quantity > 49 Plain-encoded BIGINT: 2.33s -> 1.62s SELECT count(*) FROM biglineitem WHERE l_orderkey != 10 Dict-encoded STRING: 2.73s -> 1.72s SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A' Plain-encoded STRING: 7.07s -> 6.08s (most time spent in Snappy) SELECT count(*) FROM biglineitem WHERE length(l_comment) > 37 Multiple columns: 5.15s -> 3.74s SELECT count(*) FROM biglineitem WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100 Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 --- M be/src/common/compiler-util.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/tuple.h M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util.h M be/src/util/dict-encoding.h M be/src/util/rle-encoding.h 9 files changed, 117 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/17 -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 9: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/635/ -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04
Jim Apple has uploaded a new change for review. http://gerrit.cloudera.org:8080/6989 Change subject: IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 .. IMPALA-5366: datetime missing when upgrading to Ubuntu 16.04 Depending explicitly on datetime in requirements.txt fixes the issue, even though datetime is nominally included in the standard library. Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 --- M infra/python/deps/requirements.txt 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6989/1 -- To view, visit http://gerrit.cloudera.org:8080/6989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I41501d273967dc1d57d366a145766a67b2dbbaa0 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. IMPALA-5259: Add REFRESH FUNCTIONS statement Before this patch, Impala relied on INVALIDATE METADATA to load externally added UDFs from HMS. The problem with this approach is that INVALIDATE METADATA affects all databases and tables in the entire cluster. In this patch, we add a REFRESH FUNCTIONS statement that reloads the functions of a database from HMS. We return a list of updated and removed db functions to the issuing Impalad in order to update its local catalog cache. Testing: - Ran a private build which passed. Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Reviewed-on: http://gerrit.cloudera.org:8080/6878 Reviewed-by: Taras BobrovytskyTested-by: Impala Public Jenkins --- M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M tests/custom_cluster/test_permanent_udfs.py 12 files changed, 323 insertions(+), 63 deletions(-) Approvals: Impala Public Jenkins: Verified Taras Bobrovytsky: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. IMPALA-4923: reduce memory transfer for selective scans Most of the code changes are to restructure things so that the scratch batch's tuple buffer is stored in a separate MemPool from auxiliary memory such as decompression buffers. This part of the change does not change the behaviour of the scanner in itself, but allows us to recycle the tuple buffer without holding onto unused auxiliary memory. The optimisation is implemented in TryCompact(): if enough rows were filtered out during the copy from the scratch batch to the output batch, the fixed-length portions of the surviving rows (if any) are copied to a new, smaller, buffer, and the original, larger, buffer is reused for the next scratch batch. Previously the large buffer was always attached to the output batch, so a large buffer was transferred between threads for every scratch batch processed. In combination with the decompression buffer change in IMPALA-5304, this means that in many cases selective scans don't produce nearly as many empty or near-empty batches and do not attach nearly as much memory to each batch. Performance: Even on an 8 core machine I see some speedup on selective scans. Profiling with "perf top" also showed that time in TCMalloc was reduced - it went from several % of CPU time to a minimal amount. Running TPC-H on the same machine showed a ~5% overall improvement and no regressions. E.g. Q6 got 20-25% faster. I hope to do some additional cluster benchmarking on systems with more cores to verify that the severe performance problems there are fixed, but in the meantime it seems like we have enough evidence that it will at least improve things. Testing: Add a couple of selective scans that exercise the new code paths. Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Reviewed-on: http://gerrit.cloudera.org:8080/6949 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-scratch-tuple-batch.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M testdata/workloads/functional-query/queries/QueryTest/parquet.test 6 files changed, 212 insertions(+), 60 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 16: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/632/ -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.
Henry Robinson has posted comments on this change. Change subject: IMPALA-5358: Fix repeatable table sample. .. Patch Set 1: Code-Review+2 Was there a REPEAT(..) clause that could have picked this up that we could put in a test? Feel free to commit w/o to unblock the tests. -- To view, visit http://gerrit.cloudera.org:8080/6985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Henry Robinson Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Vincent Tran has uploaded a new patch set (#4). Change subject: IMPALA-5030: Adds support for NVL2() function .. IMPALA-5030: Adds support for NVL2() function This change adds support to the function NVL2(expr1, expr2, expr3). This function returns expr2 if expr1 is not null, else it returns expr3. NVL2() is converted to IF() prior to analysis. Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 2 files changed, 29 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/6962/4 -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent TranGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Vincent Tran has posted comments on this change. Change subject: IMPALA-5030: Adds support for NVL2() function .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/6962/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: PS3, Line 100: params.exprs().get(0) > Will this work if the user simply calls nvl2() ? Should this be part of the Done Line 100: plist.add(new IsNullPredicate(params.exprs().get(0), true)); > What if NVL2() is called without params? How about doing: Done PS3, Line 106:} > nit: Wrong indentation. Done -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent TranGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 8: Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/634/ -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5324: Fix version check in EvalDictionaryFilters
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5324: Fix version check in EvalDictionaryFilters .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icc216332171038f74ff1d2ce3066da8167095361 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5324: Fix version check in EvalDictionaryFilters
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5324: Fix version check in EvalDictionaryFilters .. IMPALA-5324: Fix version check in EvalDictionaryFilters Due to a bootstrapping issue with the dictionary filtering code change, the parquet version check used in EvalDictionaryFilters was checking for < 2.10. However, the impala 2.9 parquet contains the appropriate encoding, so this changes the version check to be < 2.9. Change-Id: Icc216332171038f74ff1d2ce3066da8167095361 Reviewed-on: http://gerrit.cloudera.org:8080/6969 Reviewed-by: Alex BehmTested-by: Impala Public Jenkins --- M be/src/exec/hdfs-parquet-scanner.cc 1 file changed, 3 insertions(+), 4 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Icc216332171038f74ff1d2ce3066da8167095361 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Alex Behm has posted comments on this change. Change subject: IMPALA-5030: Adds support for NVL2() function .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6962/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 100: plist.add(new IsNullPredicate(params.exprs().get(0), true)); What if NVL2() is called without params? How about doing: List plist = Lists.newArrayList(params.exprs()); if (!plist.isEmpty()) { plist.set(0, new IsNullPredicate(plist.get(0), true)); } return new FunctionCallExpr("if", plist); -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent TranGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Michael Ho has posted comments on this change. Change subject: IMPALA-5030: Adds support for NVL2() function .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/6962/3/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: PS3, Line 100: params.exprs().get(0) Will this work if the user simply calls nvl2() ? Should this be part of the analysis ? PS3, Line 106:} nit: Wrong indentation. -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent TranGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Vincent Tran has uploaded a new patch set (#3). Change subject: IMPALA-5030: Adds support for NVL2() function .. IMPALA-5030: Adds support for NVL2() function This change adds support to the function NVL2(expr1, expr2, expr3). This function returns expr2 if expr1 is not null, else it returns expr3. NVL2() is converted to IF() prior to analysis. Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 --- M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 2 files changed, 30 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/6962/3 -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent TranGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Vincent Tran
[Impala-ASF-CR] IMPALA-5358: Fix repeatable table sample.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/6985 Change subject: IMPALA-5358: Fix repeatable table sample. .. IMPALA-5358: Fix repeatable table sample. The bug was a simple oversight where we iterated over an unordered list of partitions. Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af --- M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/6985/1 -- To view, visit http://gerrit.cloudera.org:8080/6985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9110751b075430b068b52d7441e5845f86d1b6af Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-5030: Adds support for NVL2() function
Vincent Tran has posted comments on this change. Change subject: IMPALA-5030: Adds support for NVL2() function .. Patch Set 2: (16 comments) http://gerrit.cloudera.org:8080/#/c/6962/1//COMMIT_MSG Commit Message: Line 17 > Clean up commit msg Done http://gerrit.cloudera.org:8080/#/c/6962/2//COMMIT_MSG Commit Message: Line 11: expr3. > Briefly mention that NVL2() is converted to IF before analysis. Done http://gerrit.cloudera.org:8080/#/c/6962/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 329: // Binary predicates must be rewritten to a canonical form for both Kudu predicate > Please look the places that reference BetweenToCompoundRule.INSTANCE. There Abandoned this approach in favor of IF FunctionCallExpr http://gerrit.cloudera.org:8080/#/c/6962/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 85: public static Expr createExpr(FunctionName fnName, FunctionParams params) { > The current approach looks too complicated. I think we should directly tran Done http://gerrit.cloudera.org:8080/#/c/6962/2/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 94: if (fnName.getFnNamePath().get(0).equalsIgnoreCase("nvl2")) { > Please follow the same checks as above for "decode". You could have: Done Line 95: Preconditions.checkArgument(params.size()==3, "NVL2() " > Preconditions checks are like asserts. They are used to enforce code invari Done Line 97: Preconditions.checkArgument( > Same here, inappropriate use of Preconditions check. In addition, the match Done http://gerrit.cloudera.org:8080/#/c/6962/1/fe/src/main/java/org/apache/impala/rewrite/Nvl2ToCaseRule.java File fe/src/main/java/org/apache/impala/rewrite/Nvl2ToCaseRule.java: Line 33 > Provide class comment with examples like in the other ExprRewriteRules Done Line 39 > negate this condition and inline the Nvl2RewriteFunction for brevity Done Line 53 > Remove tabs. Done Line 55 > Why not IF? That seems shorter and more appropriate. Take a look at TupleIs Done http://gerrit.cloudera.org:8080/#/c/6962/2/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: Line 1605: > remove whitespace Done Line 1606: // NVL2() > mention that NVL2() is converted to IF before analysis Done Line 1607: AnalyzesOk("select nvl2(1, 'not null', 'null')"); > Add negative cases to show what error message will be shown, for example: Done http://gerrit.cloudera.org:8080/#/c/6962/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 81: > Remove tabs Done Line 85: RewritesOk("int_col not between float_col and double_col", rule, > Add additional tests: Done -- To view, visit http://gerrit.cloudera.org:8080/6962 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I32b03e9864f46c9c5e482280d1aa676ff7f02644 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vincent TranGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Vincent Tran Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 4: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/626/ -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-5338: Fix Kudu timestamp column default values .. IMPALA-5338: Fix Kudu timestamp column default values While support for TIMESTAMP columns in Kudu tables has been committed (IMPALA-5137), it does not support TIMESTAMP column default values. This supports CREATE TABLE syntax to specify the default values, but more importantly this fixes the loading of Kudu tables that may have had default values set on UNIXTIME_MICROS columns, e.g. if the table was created via the python client. This involves fixing KuduColumn to hide the LiteralExpr representing the default value because it will be a BIGINT if the column type is TIMESTAMP. It is only needed to call toSql() and toStringValue(), so helper functions are added to KuduColumn to encapsulate special logic for TIMESTAMP. TODO: Add support and tests for ALTER setting the default value (when IMPALA-4622 is committed). Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2 --- M be/src/exec/kudu-scanner.cc M be/src/exprs/timestamp-functions-ir.cc M be/src/exprs/timestamp-functions.h M be/src/runtime/timestamp-value.h M be/src/runtime/timestamp-value.inline.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java M fe/src/main/java/org/apache/impala/util/KuduUtil.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test M tests/query_test/test_kudu.py 14 files changed, 245 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/6936/2 -- To view, visit http://gerrit.cloudera.org:8080/6936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5338: Fix Kudu timestamp column default values .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: Line 87: // post-analysis. For TIMESTAMPs those are Kudu UNIXTIME_MICROS, i.e. int64s. > i don't like that this changes its meaning as a result of analysis, because Done http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/main/java/org/apache/impala/util/KuduUtil.java File fe/src/main/java/org/apache/impala/util/KuduUtil.java: Line 167: public static Object getKuduDefaultValue(TExpr defaultValue, > supply function comment Done http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 2403: "ts timestamp not null default cast('2009-01-01 00:00:00' as timestamp)) " + > should we also allow a string constant here (that we then try to convert to Sure I can do that -- To view, visit http://gerrit.cloudera.org:8080/6936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. Patch Set 3: Code-Review+2 Great job on getting your first patch accepted! This looks good to me and I will rebase and merge it shortly. -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Pooja Nilangekar has posted comments on this change. Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. Patch Set 3: > > Patch Set 3: Commit message was updated. -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Pooja Nilangekar has posted comments on this change. Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/6982/1//COMMIT_MSG Commit Message: Line 9: Changed the argument of the message to print the size instead of > We usually try to keep the commit message line lengths to 70 characters per Done Line 10: the memory location. > Can you mention how you tested it over here? Done http://gerrit.cloudera.org:8080/#/c/6982/2//COMMIT_MSG Commit Message: PS2, Line 8: > Nit: The title can exceed 70 chars. We want the title to be on one line so Done Line 16: value of num_bytes and performing a select on a table in parquet format. > It'll also be good to mention why you're not adding automated tests. Done -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Pooja Nilangekar has posted comments on this change. Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. Patch Set 3: > Patch Set 3: Commit message was updated. Done. -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/6982/2//COMMIT_MSG Commit Message: PS2, Line 8: instead of value Nit: The title can exceed 70 chars. We want the title to be on one line so we can see it in the single line commit list on github. I should have mentioned that earlier. Line 16:- num_bytes = *data_size + 1 It'll also be good to mention why you're not adding automated tests. -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4864 Speed up single slot predicates with dictionaries
Zach Amsden has uploaded a new patch set (#9). Change subject: IMPALA-4864 Speed up single slot predicates with dictionaries .. IMPALA-4864 Speed up single slot predicates with dictionaries When dictionaries are present we can pre-evaluate conjuncts against the dictionary values and simply look up the result. Status: ready for review. Seems to be passing all tests, runs with ASAN, and gives expected results. Need to come up with some specific test cases that exercise this functionality and measure performance. Basic idea: since we codegen so early, before we know enough details about the columns to know if they are dict filterable, if we do have dictionary filtering predicates, we codegen a guard around each dictionary filterable predicate evaluation. This guard skips evaluation of the predicate if it has already been evaluated by the dictionary. In this way, we can skip evaluation dynamically for each row group as we learn which columns are dictionary filterable, and then push predicate evaluation into the column reader. Since the branches will remain 100% predictable over the row group, this should give us the fastest way to skip over predicate evaluation without compromising the general case where we may be unable to evaluate against the dictionary. We can even do this with codegen turned off, as a side effect of the way we generate the codegen'd function when dictionary evaluation is enabled. If dictionaries aren't available for some predicates, we automatically fall back to evaluating those predicates in the original order, preserving selectivity. The overhead in this case is a perfectly predictable extra conditional per dictionary predicate. Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hash-join-node.cc M be/src/exec/hdfs-avro-scanner.cc M be/src/exec/hdfs-parquet-scanner-ir.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h M be/src/exec/parquet-scratch-tuple-batch.h M be/src/exec/partitioned-hash-join-node.cc M be/src/runtime/query-state.h M be/src/runtime/row-batch.h M be/src/util/bitmap-test.cc M be/src/util/bitmap.h M be/src/util/dict-encoding.h M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test 20 files changed, 564 insertions(+), 180 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/6726/9 -- To view, visit http://gerrit.cloudera.org:8080/6726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I65981c89e5292086809ec1268f5a273f4c1fe054 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Zach AmsdenGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 8: > Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/630/ This patch depends on a recent commit to the impala-setup repo: https://github.com/awleblang/impala-setup/commit/5d1a054ac0307953cbdf495fe929ee1b84bc5335 The jenkins workers seem to run it once and not run it again for the lifetime of that worker. So the dependency will only get updated once my GVO gets fresh executors. So, I will try GVO-ing this multiple times in hopes that I will get fresh executors with the new dependency installed. -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Alex Behm has posted comments on this change. Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. .. Patch Set 8: (4 comments) http://gerrit.cloudera.org:8080/#/c/6840/6//COMMIT_MSG Commit Message: PS6, Line 30: > run Done http://gerrit.cloudera.org:8080/#/c/6840/6/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS6, Line 304: else { : tableStats_ = new TTableStats(-1); : tableStats_.setTotal_file_bytes(-1); : } > Do you need this? Removed. http://gerrit.cloudera.org:8080/#/c/6840/6/fe/src/main/java/org/apache/impala/catalog/View.java File fe/src/main/java/org/apache/impala/catalog/View.java: PS6, Line 118: tableStats_ = new TTableStats(-1); : tableStats_.setTotal_file_bytes(-1); > Didn't you remove this? Maybe I am confused. My bad. really removed now. http://gerrit.cloudera.org:8080/#/c/6840/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: PS8, Line 724: post-sampling bytes > What is the "post-sampling bytes"? Rephrased. -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Pooja Nilangekar has uploaded a new patch set (#2). Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. IMPALA-5232: Parquet reader error message prints memory address instead of value Changed the argument of the message to print the size instead of the memory location. Testing: Manually tested the error message by modifying the value of num_bytes and performing a select on a table in parquet format. Tested both condition with the following values: - num_bytes = -1 - num_bytes = *data_size + 1 Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/6982/2 -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 8: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/630/ -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. .. Patch Set 8: Code-Review+2 (4 comments) http://gerrit.cloudera.org:8080/#/c/6840/6//COMMIT_MSG Commit Message: PS6, Line 30: run http://gerrit.cloudera.org:8080/#/c/6840/6/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS6, Line 304: else { : tableStats_ = new TTableStats(-1); : tableStats_.setTotal_file_bytes(-1); : } Do you need this? http://gerrit.cloudera.org:8080/#/c/6840/6/fe/src/main/java/org/apache/impala/catalog/View.java File fe/src/main/java/org/apache/impala/catalog/View.java: PS6, Line 118: tableStats_ = new TTableStats(-1); : tableStats_.setTotal_file_bytes(-1); Didn't you remove this? Maybe I am confused. http://gerrit.cloudera.org:8080/#/c/6840/8/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: PS8, Line 724: post-sampling bytes What is the "post-sampling bytes"? -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. Patch Set 11: Code-Review+2 Rebased and fixed the conflicts. Forwarding the +2. -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. Patch Set 11: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/633/ -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Taras Bobrovytsky has uploaded a new patch set (#11). Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. IMPALA-5259: Add REFRESH FUNCTIONS statement Before this patch, Impala relied on INVALIDATE METADATA to load externally added UDFs from HMS. The problem with this approach is that INVALIDATE METADATA affects all databases and tables in the entire cluster. In this patch, we add a REFRESH FUNCTIONS statement that reloads the functions of a database from HMS. We return a list of updated and removed db functions to the issuing Impalad in order to update its local catalog cache. Testing: - Ran a private build which passed. Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b --- M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M tests/custom_cluster/test_permanent_udfs.py 12 files changed, 323 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/11 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6878 to look at the new patch set (#11). Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. IMPALA-5259: Add REFRESH FUNCTIONS statement Before this patch, Impala relied on INVALIDATE METADATA to load externally added UDFs from HMS. The problem with this approach is that INVALIDATE METADATA affects all databases and tables in the entire cluster. In this patch, we add a REFRESH FUNCTIONS statement that reloads the functions of a database from HMS. We return a list of updated and removed db functions to the issuing Impalad in order to update its local catalog cache. Testing: - Ran a private build which passed. Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b --- M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M tests/custom_cluster/test_permanent_udfs.py 12 files changed, 323 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/11 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Hello Impala Public Jenkins, Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6878 to look at the new patch set (#11). Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. IMPALA-5259: Add REFRESH FUNCTIONS statement Before this patch, Impala relied on INVALIDATE METADATA to load externally added UDFs from HMS. The problem with this approach is that INVALIDATE METADATA affects all databases and tables in the entire cluster. In this patch, we add a REFRESH FUNCTIONS statement that reloads the functions of a database from HMS. We return a list of updated and removed db functions to the issuing Impalad in order to update its local catalog cache. Testing: - Ran a private build which passed. Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b --- M common/thrift/CatalogService.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/catalog/Function.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M tests/custom_cluster/test_permanent_udfs.py 12 files changed, 322 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/6878/11 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6982/1//COMMIT_MSG Commit Message: Line 9: Changed the argument of the message to print the size instead of the memory location. We usually try to keep the commit message line lengths to 70 characters per line. This is so that the message fits into the gerrit Web UI's commit message box. (It doesn't matter if only the title goes over 70 characters though) Line 10: Can you mention how you tested it over here? You can say "Testing: Tested manually by doing blah blah" and also give some reasoning as to why you cannot add an automated test for this. You can look at one of my previous commit messages to get a decent idea: https://gerrit.cloudera.org/#/c/6886/ -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 16: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/632/ -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 8: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/631/ -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 8: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6949/7/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: PS7, Line 40: // Number of tuples transferred to output batches (i.e. not filtered by predicates). > I know "output batches" already implies potentially more than one row batch Done -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Hello Michael Ho, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6949 to look at the new patch set (#8). Change subject: IMPALA-4923: reduce memory transfer for selective scans .. IMPALA-4923: reduce memory transfer for selective scans Most of the code changes are to restructure things so that the scratch batch's tuple buffer is stored in a separate MemPool from auxiliary memory such as decompression buffers. This part of the change does not change the behaviour of the scanner in itself, but allows us to recycle the tuple buffer without holding onto unused auxiliary memory. The optimisation is implemented in TryCompact(): if enough rows were filtered out during the copy from the scratch batch to the output batch, the fixed-length portions of the surviving rows (if any) are copied to a new, smaller, buffer, and the original, larger, buffer is reused for the next scratch batch. Previously the large buffer was always attached to the output batch, so a large buffer was transferred between threads for every scratch batch processed. In combination with the decompression buffer change in IMPALA-5304, this means that in many cases selective scans don't produce nearly as many empty or near-empty batches and do not attach nearly as much memory to each batch. Performance: Even on an 8 core machine I see some speedup on selective scans. Profiling with "perf top" also showed that time in TCMalloc was reduced - it went from several % of CPU time to a minimal amount. Running TPC-H on the same machine showed a ~5% overall improvement and no regressions. E.g. Q6 got 20-25% faster. I hope to do some additional cluster benchmarking on systems with more cores to verify that the severe performance problems there are fixed, but in the meantime it seems like we have enough evidence that it will at least improve things. Testing: Add a couple of selective scans that exercise the new code paths. Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-scratch-tuple-batch.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M testdata/workloads/functional-query/queries/QueryTest/parquet.test 6 files changed, 212 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6949/8 -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5232: Parquet reader error message prints memory address instead of value
Pooja Nilangekar has uploaded a new change for review. http://gerrit.cloudera.org:8080/6982 Change subject: IMPALA-5232: Parquet reader error message prints memory address instead of value .. IMPALA-5232: Parquet reader error message prints memory address instead of value Changed the argument of the message to print the size instead of the memory location. Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/6982/1 -- To view, visit http://gerrit.cloudera.org:8080/6982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6a8f424807877c38d6f113db924cec8c68828c70 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja Nilangekar
[Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5338: Fix Kudu timestamp column default values .. Patch Set 1: (1 comment) Responding to Marcel's higher level comment first, I'll address the code comments assuming we don't change approaches. http://gerrit.cloudera.org:8080/#/c/6936/1//COMMIT_MSG Commit Message: Line 19: will be a BIGINT if the column type is TIMESTAMP. It is only > what if you stored the default as a timestamp literal instead (and then con The problem is that then the catalog would need to do the conversion from TimestampValue to unixtime micros. That would get hairy because the conversion function exists as a UDF, and it'd be really weird for the catalog to start calling UDFs. It could be exposed explicitly over JNI but that's also not ideal. This approach does involve some special casing, but I think at least we can be clear about when the conversion happens and what is stored on the KuduColumn class. -- To view, visit http://gerrit.cloudera.org:8080/6936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Alex Behm has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 7: Code-Review+2 Happy with the change after Michael's final comment is addressed -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 8: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/630/ -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 8: Code-Review+2 Updated bootstrap_build.sh to include the libffi-dev dependency after the discussion on the dev list. Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Hello Impala Public Jenkins, Matthew Jacobs, Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6910 to look at the new patch set (#8). Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. IMPALA-5333: Add support for Impala to work with ADLS This patch leverages the AdlFileSystem in Hadoop to allow Impala to talk to the Azure Data Lake Store. This patch has functional changes as well as adds test infrastructure for testing Impala over ADLS. We do not support ACLs on ADLS since the Hadoop ADLS connector does not integrate ADLS ACLs with Hadoop users/groups. For testing, we use the azure-data-lake-store-python client from Microsoft. This client seems to have some consistency issues. For example, a drop table through Impala will delete the files in ADLS, however, listing that directory through the python client immediately after the drop, will still show the files. This behavior is unexpected since ADLS claims to be strongly consistent. Some tests have been skipped due to this limitation with the tag SkipIfADLS.slow_client. Tracked by IMPALA-5335. The azure-data-lake-store-python client also only works on CentOS 6.6 and over, so the python dependencies for Azure will not be downloaded when the TARGET_FILESYSTEM is not "adls". While running ADLS tests, the expectation will be that it runs on a machine that is at least running CentOS 6.6. Note: This is only a test limitation, not a functional one. Clusters with older OSes like CentOS 6.4 will still work with ADLS. Added another dependency to bootstrap_build.sh for the ADLS Python client. Testing: Ran core tests with and without TARGET_FILESYSTEM as 'adls' to make sure that all tests pass and that nothing breaks. Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 --- M bin/bootstrap_build.sh M bin/impala-config.sh M fe/pom.xml M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M infra/python/bootstrap_virtualenv.py A infra/python/deps/adls-requirements.txt M infra/python/deps/compiled-requirements.txt M infra/python/deps/pip_download.py M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl M tests/common/impala_test_suite.py M tests/common/skip.py M tests/custom_cluster/test_hdfs_fd_caching.py M tests/custom_cluster/test_insert_behaviour.py M tests/custom_cluster/test_parquet_max_page_header.py M tests/custom_cluster/test_permanent_udfs.py M tests/data_errors/test_data_errors.py M tests/failure/test_failpoints.py M tests/metadata/test_compute_stats.py M tests/metadata/test_ddl.py M tests/metadata/test_hdfs_encryption.py M tests/metadata/test_hdfs_permissions.py M tests/metadata/test_hms_integration.py M tests/metadata/test_metadata_query_statements.py M tests/metadata/test_partition_metadata.py M tests/metadata/test_refresh_partition.py M tests/metadata/test_views_compatibility.py M tests/query_test/test_compressed_formats.py M tests/query_test/test_hdfs_caching.py M tests/query_test/test_hdfs_fd_caching.py M tests/query_test/test_insert_behaviour.py M tests/query_test/test_insert_parquet.py M tests/query_test/test_join_queries.py M tests/query_test/test_nested_types.py M tests/query_test/test_observability.py M tests/query_test/test_partitioning.py M tests/query_test/test_scanners.py M tests/stress/test_ddl_stress.py A tests/util/adls_util.py M tests/util/filesystem_utils.py 42 files changed, 357 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6910/8 -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Michael Ho has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 7: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/6949/7/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: PS7, Line 40: // Number of tuples transferred to output batches (i.e. not filtered by predicates). I know "output batches" already implies potentially more than one row batch but it doesn't hurt to point that out explicitly here or below so readers can keep in mind that num_tuples_transferred > num_rows_to_commit implies transfer of some tuples to previous row batches. -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5354: INSERT hints for Kudu tables .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/6980/1/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS1, Line 502: !insertStmt.hasNoClusteredHint() : && !ctx_.isSingleNodeExec()) 1 line PS1, Line 504: Expr kuduPartitionExpr = new KuduPartitionExpr(DescriptorTable.TABLE_SINK_ID, : (KuduTable) insertStmt.getTargetTable(), : Lists.newArrayList(insertStmt.getPartitionKeyExprs()), : insertStmt.getPartitionColPos()); how about refactoring this to KuduUtil to avoid duplicate code w/ DistributedPlan which could get out of sync later? -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5354: INSERT hints for Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-5354: INSERT hints for Kudu tables .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6980/1/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test: Line 213: add test with both hints -- To view, visit http://gerrit.cloudera.org:8080/6980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbd1ef977446ffee157ce3ce0b476e1f08a75d05 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Jim Apple has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 15: Code-Review+2 THank you! -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/6950/14/be/src/common/compiler-util.h File be/src/common/compiler-util.h: Line 46: /// Clang is incredibly pedantic about __restrict__ and doesn't offer an option to be > Can you be more specific? Done -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Hello Marcel Kornacker, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6950 to look at the new patch set (#15). Change subject: IMPALA-5347: Parquet scanner microoptimizations .. IMPALA-5347: Parquet scanner microoptimizations A mix of microoptimizations that profiling the parquet scanner revealed. All lead to some measurable improvement and added up to significant speedups for various scans. * Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining in all cases. * Apply __restrict__ in a few places so the compiler knows that it is safe to cache values accessed via those pointers * memset() the whole batch instead of the null indicators is cases where it is almost certainly cheaper. * Avoid updating two correlated loop variables in MaterializeValueBatch(). * Avoid unnecessary initialization of often-unused 'val' in ReadSlot(). * Shave a few instructions off the (still very expensive) bit unpacking and dict decoding logic. Performance: Some local TPC-H and targeted-perf benchmarks showed average speedups of ~5%. I did some benchmarks targeted at measuring column materialisation performance using a version of lineitem with duplicated data to make it bigger. These queries all got significantly faster. Dict-encoded DECIMAL: 2.23 -> 1.23s SELECT count(*) FROM biglineitem WHERE l_quantity > 49 Plain-encoded BIGINT: 2.33s -> 1.62s SELECT count(*) FROM biglineitem WHERE l_orderkey != 10 Dict-encoded STRING: 2.73s -> 1.72s SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A' Plain-encoded STRING: 7.07s -> 6.08s (most time spent in Snappy) SELECT count(*) FROM biglineitem WHERE length(l_comment) > 37 Multiple columns: 5.15s -> 3.74s SELECT count(*) FROM biglineitem WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100 Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 --- M be/src/common/compiler-util.h M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/tuple.h M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util.h M be/src/util/dict-encoding.h M be/src/util/rle-encoding.h 9 files changed, 117 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/15 -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5259: Add REFRESH FUNCTIONS statement
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5259: Add REFRESH FUNCTIONS statement .. Patch Set 10: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6878 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3625c88bb51cca833f3293c224d3f0feb00e6e0b Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Taras Bobrovytsky Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5338: Fix Kudu timestamp column default values
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-5338: Fix Kudu timestamp column default values .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/6936/1//COMMIT_MSG Commit Message: Line 19: will be a BIGINT if the column type is TIMESTAMP. It is only what if you stored the default as a timestamp literal instead (and then convert it to what kudu needs when that's required)? it feels like the current approach requires a bunch of special-casing. http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java: Line 87: // post-analysis. For TIMESTAMPs those are Kudu UNIXTIME_MICROS, i.e. int64s. i don't like that this changes its meaning as a result of analysis, because we're planning on making analysis idempotent (at some point in the future). let's make it a separate variable, and then you can get rid of isanalyzed_ http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/main/java/org/apache/impala/util/KuduUtil.java File fe/src/main/java/org/apache/impala/util/KuduUtil.java: Line 167: public static Object getKuduDefaultValue(TExpr defaultValue, supply function comment http://gerrit.cloudera.org:8080/#/c/6936/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 2403: "ts timestamp not null default cast('2009-01-01 00:00:00' as timestamp)) " + should we also allow a string constant here (that we then try to convert to a timestamp, and if that fails, we complain)? -- To view, visit http://gerrit.cloudera.org:8080/6936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I655910fb4805bb204a999627fa9f68e43ea8aaf2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 7: Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/628/ -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 14: I did a dry run and it turns out clang was unhappy with a couple of the qualifiers. I think clang's implementation of restrict is somewhat broken so as a workaround I made a macro to remove that qualifier for clang. -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (BE)
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5167: Reduce the number of Kudu clients created (BE) .. IMPALA-5167: Reduce the number of Kudu clients created (BE) Creating Kudu clients is very expensive as each will fetch metadata from the Kudu master, so we should minimize the number of Kudu clients that get created. This patch stores a map from Kudu master addressed to Kudu clients in the ExecEnv to be used across the BE for all queries. Another patch will address the FE. This relies on a change on the Kudu side that clears non-covered range entries from the client's cache on table open (fdc022fe6231af20e307012d98c35b16cbfa7b33) Testing: - Ran a stress test on a 10 node cluster: scan of a small Kudu table, 1000 concurrent queries, load on the Kudu master was reduced signficantly, from ~50% cpu to ~5%. (with the FE changes included) - Ran the Kudu e2e tests. - Manually ran a test with concurrent INSERTs and 'ALTER TABLE ADD PARTITION' (which is affected by the Kudu side change mentiond above) and verified correctness. Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Reviewed-on: http://gerrit.cloudera.org:8080/6792 Reviewed-by: Matthew JacobsTested-by: Impala Public Jenkins --- M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-table-sink.h M be/src/exprs/kudu-partition-expr.cc M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h 8 files changed, 73 insertions(+), 69 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5167: Reduce the number of Kudu clients created (BE)
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5167: Reduce the number of Kudu clients created (BE) .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6b0c12a256c33e8ef32315b3736cae2dea2ae705 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: Line 102: ++num_output_batches; > I think the idea was right, but the code wasn't quite implementing the idea I looked again and realised we don't even need to track num_output_batches. num_tuples_transferred > num_to_commit implies that tuples were transferred to a previous batch. It's enough of a simplification it seems worth doing. http://gerrit.cloudera.org:8080/#/c/6949/4/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: PS4, Line 142: Tuple* uncompacted_tuple = row->GetTuple(0); > FOREACH_ROW() ? I looked earlier - the rows aren't committed to the batch yet so FOREACH_ROW doesn't work for this case. PS4, Line 155: } > not needed ? Yep, and also the ones below. -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6949 to look at the new patch set (#7). Change subject: IMPALA-4923: reduce memory transfer for selective scans .. IMPALA-4923: reduce memory transfer for selective scans Most of the code changes are to restructure things so that the scratch batch's tuple buffer is stored in a separate MemPool from auxiliary memory such as decompression buffers. This part of the change does not change the behaviour of the scanner in itself, but allows us to recycle the tuple buffer without holding onto unused auxiliary memory. The optimisation is implemented in TryCompact(): if enough rows were filtered out during the copy from the scratch batch to the output batch, the fixed-length portions of the surviving rows (if any) are copied to a new, smaller, buffer, and the original, larger, buffer is reused for the next scratch batch. Previously the large buffer was always attached to the output batch, so a large buffer was transferred between threads for every scratch batch processed. In combination with the decompression buffer change in IMPALA-5304, this means that in many cases selective scans don't produce nearly as many empty or near-empty batches and do not attach nearly as much memory to each batch. Performance: Even on an 8 core machine I see some speedup on selective scans. Profiling with "perf top" also showed that time in TCMalloc was reduced - it went from several % of CPU time to a minimal amount. Running TPC-H on the same machine showed a ~5% overall improvement and no regressions. E.g. Q6 got 20-25% faster. I hope to do some additional cluster benchmarking on systems with more cores to verify that the severe performance problems there are fixed, but in the meantime it seems like we have enough evidence that it will at least improve things. Testing: Add a couple of selective scans that exercise the new code paths. Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-scratch-tuple-batch.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M testdata/workloads/functional-query/queries/QueryTest/parquet.test 6 files changed, 210 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6949/7 -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function
Greg Rahn has posted comments on this change. Change subject: IMPALA-5326: [DOCS] Document REPLACE() function .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml File docs/topics/impala_string_functions.xml: PS1, Line 890: o > How about replace('hello world', '0', 'z') and just illustrate the original The lowercase folding is just something for the column heading/alias. -- To view, visit http://gerrit.cloudera.org:8080/6979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Alex Behm has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function
Zach Amsden has posted comments on this change. Change subject: IMPALA-5326: [DOCS] Document REPLACE() function .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml File docs/topics/impala_string_functions.xml: PS1, Line 890: o > Maybe best to alias the column/expression or use a different example. This How about replace('hello world', '0', 'z') and just illustrate the original string is returned if nothing matches. Show case sensitivity is not very useful, I think it is sufficient just to mention it. One other notable behavior - patterns which match their replacement do not consider the replacement for further expansion. So replace('foo', 'foo', 'foofoo') results in 'foofoo', not a heap / stack overflow. -- To view, visit http://gerrit.cloudera.org:8080/6979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5324: Fix version check in EvalDictionaryFilters
Alex Behm has posted comments on this change. Change subject: IMPALA-5324: Fix version check in EvalDictionaryFilters .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icc216332171038f74ff1d2ce3066da8167095361 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Alex Behm has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: Line 102: ++num_output_batches; > Turns out it's the incorrect thing to do (found a bug in testing by test_me I think the idea was right, but the code wasn't quite implementing the idea (sorry for the distraction). If we increment the non-empty output batches at the end and modify the check to non_empty_output_batches > 0, then I think it can work. However, even that is suboptimal because what we really want to count are the non-compacted output batches because those are they only ones can still reference the tuple mem. Anyway, seems easy to forget a case and break this code, so let's worry about that potential improvement later. -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function
Greg Rahn has posted comments on this change. Change subject: IMPALA-5326: [DOCS] Document REPLACE() function .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml File docs/topics/impala_string_functions.xml: PS1, Line 890: o > This looks like a bug. It might be worth calling out the oddness here that Maybe best to alias the column/expression or use a different example. This happens because impala lowercases all column aliases. -- To view, visit http://gerrit.cloudera.org:8080/6979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Michael Ho has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/6949/4/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: PS4, Line 142: for (int i = dst_batch->num_rows(); i < end_row; ++i) { FOREACH_ROW() ? PS4, Line 155: inline not needed ? -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5326: [DOCS] Document REPLACE() function
Jim Apple has posted comments on this change. Change subject: IMPALA-5326: [DOCS] Document REPLACE() function .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6979/1/docs/topics/impala_string_functions.xml File docs/topics/impala_string_functions.xml: PS1, Line 890: o This looks like a bug. It might be worth calling out the oddness here that 'o' is caps in the command but lowercase in the column name. -- To view, visit http://gerrit.cloudera.org:8080/6979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib576fba03673bd6708a46f45c8388477b25e34da Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Zach Amsden Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5324: Fix version check in EvalDictionaryFilters
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5324: Fix version check in EvalDictionaryFilters .. Patch Set 2: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/627/ -- To view, visit http://gerrit.cloudera.org:8080/6969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icc216332171038f74ff1d2ce3066da8167095361 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6950 to look at the new patch set (#13). Change subject: IMPALA-5347: Parquet scanner microoptimizations .. IMPALA-5347: Parquet scanner microoptimizations A mix of microoptimizations that profiling the parquet scanner revealed. All lead to some measurable improvement and added up to significant speedups for various scans. * Add ALWAYS_INLINE to hot functions that GCC was mistakenly not inlining in all cases. * Apply __restrict__ in a few places so the compiler knows that it is safe to cache values accessed via those pointers * memset() the whole batch instead of the null indicators is cases where it is almost certainly cheaper. * Avoid updating two correlated loop variables in MaterializeValueBatch(). * Avoid unnecessary initialization of often-unused 'val' in ReadSlot(). * Shave a few instructions off the (still very expensive) bit unpacking and dict decoding logic. Performance: Some local TPC-H and targeted-perf benchmarks showed average speedups of ~5%. I did some benchmarks targeted at measuring column materialisation performance using a version of lineitem with duplicated data to make it bigger. These queries all got significantly faster. Dict-encoded DECIMAL: 2.23 -> 1.23s SELECT count(*) FROM biglineitem WHERE l_quantity > 49 Plain-encoded BIGINT: 2.33s -> 1.62s SELECT count(*) FROM biglineitem WHERE l_orderkey != 10 Dict-encoded STRING: 2.73s -> 1.72s SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A' Plain-encoded STRING: 7.07s -> 6.08s (most time spent in Snappy) SELECT count(*) FROM biglineitem WHERE length(l_comment) > 37 Multiple columns: 5.15s -> 3.74s SELECT count(*) FROM biglineitem WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100 Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/parquet-column-readers.cc M be/src/runtime/tuple.h M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util.h M be/src/util/dict-encoding.h M be/src/util/rle-encoding.h 8 files changed, 107 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/13 -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 11: (2 comments) http://gerrit.cloudera.org:8080/#/c/6950/11/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 979: Status HdfsParquetScanner::ResetScratchBatch() { > Looks like this is only called in a single place, so might as well move the Done http://gerrit.cloudera.org:8080/#/c/6950/11/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: Line 492: for (int i = 0; i < num_tuples; ++i) { > int64_t i since num_tuples is an int64_t Done -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5347: Parquet scanner microoptimizations
Alex Behm has posted comments on this change. Change subject: IMPALA-5347: Parquet scanner microoptimizations .. Patch Set 12: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6950/11/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 979: Status HdfsParquetScanner::ResetScratchBatch() { Looks like this is only called in a single place, so might as well move the two lines there and remove this function. http://gerrit.cloudera.org:8080/#/c/6950/11/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: Line 492: for (int i = 0; i < num_tuples; ++i) { int64_t i since num_tuples is an int64_t -- To view, visit http://gerrit.cloudera.org:8080/6950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 7: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/624/ -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE and ALTER TABLE
John Russell has uploaded a new change for review. http://gerrit.cloudera.org:8080/6981 Change subject: IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE and ALTER TABLE .. IMPALA-5359: [DOCS] Document SORT BY syntax for CREATE TABLE and ALTER TABLE Review part 1: Just tackling the CREATE TABLE syntax and examples for now. Will have separate gerrits for ALTER TABLE and more far-flung places to mention this feature (New Features, Performance Considerations, Parquet File Format). Change-Id: Icd571cd8840368edb327d16d27192458838ef234 --- M docs/topics/impala_create_table.xml 1 file changed, 74 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/6981/1 -- To view, visit http://gerrit.cloudera.org:8080/6981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Icd571cd8840368edb327d16d27192458838ef234 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 4: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/626/ -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5325: Do not update totalHdfsBytes /numHdfsFiles on Catalogd
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5325: Do not update totalHdfsBytes_/numHdfsFiles_ on Catalogd .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I03cc6f9e9b2c03cafb87029ea0802dfdb2745be1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2373: Extrapolate row counts for HDFS tables.
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6840 to look at the new patch set (#6). Change subject: IMPALA-2373: Extrapolate row counts for HDFS tables. .. IMPALA-2373: Extrapolate row counts for HDFS tables. The main idea of this patch is to use table stats to extrapolate the row counts for new/modified partitions. Existing behavior: - Partitions that lack the row count stat are ignored when estimating the cardinality of HDFS scans. Such partitions effectively have an estimated row count of zero. - We always use the row count stats for partitions that have one. The row count may be innaccurate if data in such partitions has changed significantly. Summary of changes: - Enhance COMPUTE STATS to also store the total number of file bytes in the table. - Use the table-level row count and file bytes stats to estimate the number of rows in a scan. - A new impalad startup flag is added to enable/disable the extrapolation behavior. The feature is disabled by default. Note that even with the feature disabled, COMPUTE STATS stores the file bytes so you can enable the feature without having to COMPUTE STATS again. Testing: - Added new FE unit test - Added new EE test Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 --- M be/src/common/global-flags.cc M be/src/exec/catalog-op-executor.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/CatalogObjects.thrift M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java M fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/View.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/ScanNode.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java A fe/src/test/java/org/apache/impala/planner/StatsExtrapolationTest.java M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test A testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test A tests/custom_cluster/test_stats_extrapolation.py M tests/metadata/test_explain.py 27 files changed, 766 insertions(+), 177 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/6840/6 -- To view, visit http://gerrit.cloudera.org:8080/6840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I972c8a03ed70211734631a7dc9085cb33622ebc4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar
[Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5333: Add support for Impala to work with ADLS .. Patch Set 7: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/624/ -- To view, visit http://gerrit.cloudera.org:8080/6910 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6949 to look at the new patch set (#6). Change subject: IMPALA-4923: reduce memory transfer for selective scans .. IMPALA-4923: reduce memory transfer for selective scans Most of the code changes are to restructure things so that the scratch batch's tuple buffer is stored in a separate MemPool from auxiliary memory such as decompression buffers. This part of the change does not change the behaviour of the scanner in itself, but allows us to recycle the tuple buffer without holding onto unused auxiliary memory. The optimisation is implemented in TryCompact(): if enough rows were filtered out during the copy from the scratch batch to the output batch, the fixed-length portions of the surviving rows (if any) are copied to a new, smaller, buffer, and the original, larger, buffer is reused for the next scratch batch. Previously the large buffer was always attached to the output batch, so a large buffer was transferred between threads for every scratch batch processed. In combination with the decompression buffer change in IMPALA-5304, this means that in many cases selective scans don't produce nearly as many empty or near-empty batches and do not attach nearly as much memory to each batch. Performance: Even on an 8 core machine I see some speedup on selective scans. Profiling with "perf top" also showed that time in TCMalloc was reduced - it went from several % of CPU time to a minimal amount. Running TPC-H on the same machine showed a ~5% overall improvement and no regressions. E.g. Q6 got 20-25% faster. I hope to do some additional cluster benchmarking on systems with more cores to verify that the severe performance problems there are fixed, but in the meantime it seems like we have enough evidence that it will at least improve things. Testing: Add a couple of selective scans that exercise the new code paths. Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-scratch-tuple-batch.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M testdata/workloads/functional-query/queries/QueryTest/parquet.test 6 files changed, 211 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/6949/6 -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4923: reduce memory transfer for selective scans
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4923: reduce memory transfer for selective scans .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6949/3/be/src/exec/parquet-scratch-tuple-batch.h File be/src/exec/parquet-scratch-tuple-batch.h: Line 102: void FinalizeTupleTransfer(RowBatch* dst_batch, int num_to_commit) { > Done. Yeah that's a more precise thing to do. Turns out it's the incorrect thing to do (found a bug in testing by test_mem_usage_scaling). The problem is if you have a non-empty output batch, followed by an empty one, this is 1, but it's actually invalid to reuse the tuple memory, because the first batch is still referencing it. -- To view, visit http://gerrit.cloudera.org:8080/6949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3773dc63c498e295a2c1386a15c5e69205e747ea Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes