[Impala-ASF-CR] IMPALA-6948,6962: add end-to-end tests
Vuk Ercegovac has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10291 Change subject: IMPALA-6948,6962: add end-to-end tests .. IMPALA-6948,6962: add end-to-end tests Adds end-to-end tests to validate that following various metadata operations, the catalog state in catalogd and impalads is the same. For IMPALA-6962, catalogd process restart for tests is fixed. Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5 --- M tests/common/impala_cluster.py M tests/common/impala_service.py A tests/custom_cluster/test_metadata_replicas.py 3 files changed, 172 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/10291/1 -- To view, visit http://gerrit.cloudera.org:8080/10291 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic6c5b39e29b2885cd30fede18833cbf23fb755f5 Gerrit-Change-Number: 10291 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6948: Delete catalog update topic entries upon catalog restart
Dimitris Tsirogiannis has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10289 Change subject: IMPALA-6948: Delete catalog update topic entries upon catalog restart .. IMPALA-6948: Delete catalog update topic entries upon catalog restart This commit fixes an issue where the statestore may end up with stale entries in the catalog update topic that do not correspond to the catalog objects stored in the catalog. This may occur if the catalog server restarts and some catalog object (e.g. table) that was known to the catalog before the restart no longer exists in the Hive Metastore after the restart. Fix: The first update for the catalog update topic that is sent by the catalog instructs the statestore to clear any entries it may have on this topic before applying the first update. In that way, we guarantee that the statestore entries are consistent with the catalog objects stored in the catalog server. Any coordinator that detects the catalog restart will receive from the statestore a full topic update that reflects the state of the catalog server. Testing: Added statestore test. Change-Id: I907509bf92da631ece5efd23c275a613ead00e91 Tmp Change-Id: I74a8ade8e498ac35cb56d3775d2c67a86988d9b6 --- M be/src/catalog/catalog-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M tests/statestore/test_statestore.py 6 files changed, 84 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/10289/1 -- To view, visit http://gerrit.cloudera.org:8080/10289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I74a8ade8e498ac35cb56d3775d2c67a86988d9b6 Gerrit-Change-Number: 10289 Gerrit-PatchSet: 1 Gerrit-Owner: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-6337: Fix infinite loop in Impala shell
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/9195 ) Change subject: IMPALA-6337: Fix infinite loop in Impala shell .. Patch Set 11: > Patch Set 11: > > > Patch Set 11: > > > > > Do you know if this bug exists in 0.1.19? We've actually switched to > > > > that version elsewhere in the Impala python infra. > > > https://github.com/apache/impala/blob/master/infra/python/deps/requirements.txt#L65 > > > > Yes, the bug still exists in 0.1.19. However, depending how we use it, > > there may not be a real issue in Python infra. The infinite loop in Impala > > shell is a specific case because of the way we use sqlparse.split. > > Here's my concern. Work has already started on turning the impala-shell into > an actual python package for distributing through PyPI. See: > > https://gerrit.cloudera.org/c/9771/ > https://issues.apache.org/jira/browse/IMPALA-1071 > https://issues.apache.org/jira/browse/IMPALA-6808 > > With that change, we probably won't be relying on an internally-bundled > version of sqlparse anymore, especially if we're uploading impala-shell to > PyPI. Since we've hit several issues with sqlparse and upgrading may not be an easy option, do you think it's better to bundle our own sqlparse even with distributable impala-shell similar to how we deal with native-toolchain? -- To view, visit http://gerrit.cloudera.org:8080/9195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b Gerrit-Change-Number: 9195 Gerrit-PatchSet: 11 Gerrit-Owner: Fredy WijayaGerrit-Reviewer: David Knupp Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 03 May 2018 01:27:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5706: Parallelise read I/O in sorter
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9943 ) Change subject: IMPALA-5706: Parallelise read I/O in sorter .. Patch Set 7: (16 comments) Looking good http://gerrit.cloudera.org:8080/#/c/9943/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9943/7//COMMIT_MSG@39 PS7, Line 39: double-buffering the number of runs in a single merge decreases and I I think we still need data to decide whether the double-buffering is worth it. Even without that the patch is a big improvement to the logic. http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@a1052 PS7, Line 1052: Can we preserve this DCHECK but make it about the next page being pinned, or is it too awkward? We could move it into the if (is_pinned_) and make it something like: DCHECK(page_index + 1 == pages->size() || pages[page_index + 1]->is_pinned()); http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@880 PS7, Line 880: // Attempt to pin the first fixed and var-length pages. Comment needs updating. http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@881 PS7, Line 881: if (fixed_len_pages_.size() > 0) { Might be cleaner to rewrite these as loops, e.g. for (int i = 0; i < max(2, fixed_len_pages_.size()), i++) http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1049 PS7, Line 1049: page_index nit: idx/index inconsistency. I'm fine with either but the inconsistency in the function is a little distracting. http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1052 PS7, Line 1052: DCHECK(next_unpinned_page != nullptr); This DCHECK doesn't seem useful - if we messed up the vector addressing arithmetic we would almost certainly get an invalid but non-NULL pointer. http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1551 PS7, Line 1551: int64_t Sorter::ComputeMinReservation() { Not a big deal but we could reduce the vertical whitespace in this function. I don't mind it personally but some people prefer denser code. http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1682 PS7, Line 1682: for (int i = 0; i < sorted_runs_.size(); ++i) { Maybe use range-based for loop? I find it more readable personally. http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1706 PS7, Line 1706: int pages_needed_for_full_merge = sorted_runs_.size() * PinnedPagesPerRunForMerge(); Couldn't we compute this more accurately by doing a pass over sorted_runs_? E.g. pages_needed = 0 for (run : sorted_runs_) { pages_needed += min(2, run->num_fixed_len_pages()); pages_needed += min(2, run->num_var_len_pages()); } http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1721 PS7, Line 1721: if (max_runs_per_intermediate_merge + 1 >= sorted_runs_.size()) { I think my above comment about accurately computing the requirement for the final merge applies here too. I guess the maximum runs per intermediate merge and the requirement for the final merge are different since the first is an upper bound only. http://gerrit.cloudera.org:8080/#/c/9943/7/be/src/runtime/sorter.cc@1727 PS7, Line 1727: if (max_runs_per_intermediate_merge > sorted_runs_.size() / 2) { I had a bit of trouble reasoning through whether this calculation was correct for edge cases. E.g. if sorted_runs_.size() == 17 and max_runs_per_intermediate_merge == 8, this is false, which seems correct. But if sorted_runs_.size() == 16 and max_runs_per_intermediate_merge == 8, then is false, but it seems like it should be true, because we can: 1. Merge 8 runs into 1. 2. Merge 9 runs into the final output. For me it might be a bit easier to understand if it was instead: max_runs_per_intermediate_merge * 2 >= sorted_runs_.size() Since that avoids the rounding. http://gerrit.cloudera.org:8080/#/c/9943/7/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test File testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test: PS7: IMPALA-4835 probably caused a bunch of conflicts in the planner tests. Hopefully it isn't too much of a pain to regenerate the output - you can copy over the actual output from logs/fe_Tests/PlannerTests (maybe you're already doing that). http://gerrit.cloudera.org:8080/#/c/9943/7/tests/query_test/test_sort.py File tests/query_test/test_sort.py: http://gerrit.cloudera.org:8080/#/c/9943/7/tests/query_test/test_sort.py@45 PS7, Line 45: def test_multiple_mem_limits(self, vector): Are these tests now tuned for a 3-node minicluster? Maybe we should skip validating TotalMergesPerformed if we're not running against such a minicluster - I'm not sure if it will still pass on a one-node local build.
[Impala-ASF-CR] IMPALA-4123 (prep): Parquet column reader cleanup
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9799 ) Change subject: IMPALA-4123 (prep): Parquet column reader cleanup .. Patch Set 6: Code-Review+1 I'm confident that this is ready to be submitted. Dan, do you want to have another look? -- To view, visit http://gerrit.cloudera.org:8080/9799 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibc00352df3a0b2d605f872ae7e43db2dc90faab1 Gerrit-Change-Number: 9799 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 May 2018 01:03:06 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10273 ) Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Gerrit-Change-Number: 10273 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 03 May 2018 01:01:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6337: Fix infinite loop in Impala shell
David Knupp has posted comments on this change. ( http://gerrit.cloudera.org:8080/9195 ) Change subject: IMPALA-6337: Fix infinite loop in Impala shell .. Patch Set 11: > Patch Set 11: > > > Do you know if this bug exists in 0.1.19? We've actually switched to > that > > version elsewhere in the Impala python infra. > > https://github.com/apache/impala/blob/master/infra/python/deps/requirements.txt#L65 > > Yes, the bug still exists in 0.1.19. However, depending how we use it, there > may not be a real issue in Python infra. The infinite loop in Impala shell is > a specific case because of the way we use sqlparse.split. Here's my concern. Work has already started on turning the impala-shell into an actual python package for distributing through PyPI. See: https://gerrit.cloudera.org/c/9771/ https://issues.apache.org/jira/browse/IMPALA-1071 https://issues.apache.org/jira/browse/IMPALA-6808 With that change, we probably won't be relying on an internally-bundled version of sqlparse anymore, especially if we're uploading impala-shell to PyPI. -- To view, visit http://gerrit.cloudera.org:8080/9195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9142f21a888189d351f00ce09baeba123bc0959b Gerrit-Change-Number: 9195 Gerrit-PatchSet: 11 Gerrit-Owner: Fredy WijayaGerrit-Reviewer: David Knupp Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 03 May 2018 00:41:53 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6908: IsConnResetTException() should include ECONNRESET
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10265 ) Change subject: IMPALA-6908: IsConnResetTException() should include ECONNRESET .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc File be/src/rpc/thrift-util.cc: http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc@209 PS1, Line 209: strncmp(PACKAGE_VERSION, "0.9.0", 5)); > As Sailesh asked earlier, do we even need this? Or, couldn't we make it a c Oh missed that earlier comment. We had some static_assert() from line 63-68 so this may not be necessary after all. May be it helps to move the static_assert() closer to this function ? -- To view, visit http://gerrit.cloudera.org:8080/10265 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: I1bb997a833917e5166c9ca192da219f50f4439e2 Gerrit-Change-Number: 10265 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 02 May 2018 23:59:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps
Alex Rodoni has posted comments on this change. ( http://gerrit.cloudera.org:8080/10285 ) Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10285/2/docs/topics/impala_breakpad.xml File docs/topics/impala_breakpad.xml: http://gerrit.cloudera.org:8080/#/c/10285/2/docs/topics/impala_breakpad.xml@79 PS2, Line 79: Cloudera Manager > This should not be here, since we are reviewing the documentation for Apach Done -- To view, visit http://gerrit.cloudera.org:8080/10285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5 Gerrit-Change-Number: 10285 Gerrit-PatchSet: 2 Gerrit-Owner: Alex RodoniGerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 02 May 2018 23:20:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps
Hello Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10285 to look at the new patch set (#3). Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps .. IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5 --- M docs/topics/impala_breakpad.xml 1 file changed, 20 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/10285/3 -- To view, visit http://gerrit.cloudera.org:8080/10285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5 Gerrit-Change-Number: 10285 Gerrit-PatchSet: 3 Gerrit-Owner: Alex RodoniGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10285 ) Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10285/2/docs/topics/impala_breakpad.xml File docs/topics/impala_breakpad.xml: http://gerrit.cloudera.org:8080/#/c/10285/2/docs/topics/impala_breakpad.xml@79 PS2, Line 79: Cloudera Manager This should not be here, since we are reviewing the documentation for Apache Impala. The description should be similar to the one above, and it's up to the users how they set a parameter. -- To view, visit http://gerrit.cloudera.org:8080/10285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5 Gerrit-Change-Number: 10285 Gerrit-PatchSet: 2 Gerrit-Owner: Alex RodoniGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 02 May 2018 23:07:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java: http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java@41 PS6, Line 41: LZO, : LZO_INDEX, //Lzo index file. : LZ4, : ZSTD; > This and related enums (THdfsCompression, the flatbuffer version) are used Filed IMPALA-6963 -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 02 May 2018 23:00:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Hello Zoltan Borok-Nagy, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10165 to look at the new patch set (#7). Change subject: IMPALA-6941: load more text scanner compression plugins .. IMPALA-6941: load more text scanner compression plugins Add extensions for LZ4 and ZSTD (which are supported by Hadoop). Even without a plugin this results in better behaviour because we don't try to treat the files with unknown extensions as uncompressed text. Also allow loading tables containing files with unsupported compression types. There was weird behaviour before we knew of the file extension but didn't support querying the table - the catalog would load the table but the impalad would fail processing the catalog update. The simplest way to fix it is to just allow loading the tables. Switch to always checking plugin version - running mismatched plugin is inherently unsafe. Testing: Positive case where LZO is loaded is exercised. Added coverage for negative case where LZO is disabled. Fixed test gaps: * Querying LZO table with LZO plugin not available. * Interacting with tables with known but unsupported text compressions. * Querying files with unknown compression suffixes (which are treated as uncompressed text). Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 --- M be/src/common/global-flags.cc M be/src/exec/CMakeLists.txt D be/src/exec/hdfs-lzo-text-scanner.cc D be/src/exec/hdfs-lzo-text-scanner.h A be/src/exec/hdfs-plugin-text-scanner.cc A be/src/exec/hdfs-plugin-text-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-text-scanner.cc M be/src/exec/hdfs-text-scanner.h M be/src/util/backend-gflag-util.cc M common/fbs/CatalogObjects.fbs M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/disable-lzo-plugin.test A testdata/workloads/functional-query/queries/QueryTest/unsupported-compression-partitions.test A tests/custom_cluster/test_scanner_plugin.py M tests/metadata/test_partition_metadata.py 20 files changed, 493 insertions(+), 239 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/10165/7 -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 7 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 6: (12 comments) http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h File be/src/exec/hdfs-plugin-text-scanner.h: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@63 PS6, Line 63: HdfsScanner* (*create_scanner_fn) : (HdfsScanNodeBase* scan_node, RuntimeState* state) > Maybe you could create a typedef for the function pointers, e.g.: Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@67 PS6, Line 67: Status (*issue_initial_ranges_fn)( : HdfsScanNodeBase* scan_node, const std::vector& files) > same as above Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@73 PS6, Line 73: SpinLock > Wouldn't be better to use some read/write lock like boost::shared_mutex? Yeah I think that makes sense. I was thinking that the critical section is probably short enough that it doesn't matter in practice but we don't have to reason about that at all with the shared_mutex. http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc File be/src/exec/hdfs-plugin-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@42 PS6, Line 42: "(Advanced) whitelist of HDFS " : "text scanner plugins that Impala will try to dynamically load." > I think you should mention that it is comma-separated, or give an example. I briefly summarized what the plugin does (versus the builtin text parsing). I think that addresses the question but lmk if it doesn't. http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@56 PS6, Line 56: HdfsScanner* (*create_scanner_fn) (HdfsScanNodeBase* scan_node, RuntimeState* state) > You could use the typedef here. Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@71 PS6, Line 71: Status (*issue_initial_ranges_fn)( : HdfsScanNodeBase* scan_node, const std::vector & files) > and here Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@123 PS6, Line 123: "GetImpalaBuildVersion", reinterpret_cast (_plugin_impala_build_version))); > nit: too long line Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@126 PS6, Line 126: stringstream ss; : ss << "Scanner plugin " << plugin_name << " was built against Impala version " :<< get_plugin_impala_build_version() << ", but the running Impala version is " :<< GetDaemonBuildVersion(); > nit: I think it's a little inconsistent that we use stringstream here and S Done. Yeah I got lazy about converting this :) http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc@643 PS6, Line 643: =_ > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc@129 PS6, Line 129: =_ > nit: missing space Done http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java: http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java@41 PS6, Line 41: LZO, : LZO_INDEX, //Lzo index file. : LZ4, : ZSTD; > should we make these (and e.g. file extension) be runtime parameters as wel This and related enums (THdfsCompression, the flatbuffer version) are used in a lot of places, so it would probably require a lot of changes. I don't think it's inherently hard but it would take a while to verify that the fix in each place is correct. At least now we have all the built-in Hadoop decompression codecs: https://github.com/apache/hadoop/tree/a0a276162147e843a5a4e028abdca5b66f5118da/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress so we could just extend this list when a codec is added to Hadoop (which seems to happen at most once a year). http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py@168 PS6, Line 168: #unique_database = 'test_db' > nit: remove this line Done -- To view, visit
[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10279 ) Change subject: IMPALA-6931: reduces races in query expiration tests .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d Gerrit-Change-Number: 10279 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 02 May 2018 22:14:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10279 ) Change subject: IMPALA-6931: reduces races in query expiration tests .. IMPALA-6931: reduces races in query expiration tests Recent tests ran into flakiness when testing query expiration. This change makes two changes: 1) query state is retrieved earlier; a flaky test skipped the expected state. 2) bump the timing; a flaky test had queries expire before it could check them Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d Reviewed-on: http://gerrit.cloudera.org:8080/10279 Reviewed-by: Dan HechtTested-by: Impala Public Jenkins --- M tests/custom_cluster/test_query_expiration.py 1 file changed, 5 insertions(+), 7 deletions(-) Approvals: Dan Hecht: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d Gerrit-Change-Number: 10279 Gerrit-PatchSet: 2 Gerrit-Owner: Vuk Ercegovac Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps
Hello Lars Volker, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10285 to look at the new patch set (#2). Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps .. IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5 --- M docs/topics/impala_breakpad.xml 1 file changed, 20 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/10285/2 -- To view, visit http://gerrit.cloudera.org:8080/10285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5 Gerrit-Change-Number: 10285 Gerrit-PatchSet: 2 Gerrit-Owner: Alex RodoniGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6961: [DOCS] Doc --enable minidump flag to disable minidumps
Alex Rodoni has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10285 Change subject: IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps .. IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5 --- M docs/topics/impala_breakpad.xml 1 file changed, 19 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/10285/1 -- To view, visit http://gerrit.cloudera.org:8080/10285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5 Gerrit-Change-Number: 10285 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni
[Impala-ASF-CR] IMPALA-6949: Add the option to start the minicluster with EC enabled
Taras Bobrovytsky has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/10275 ) Change subject: IMPALA-6949: Add the option to start the minicluster with EC enabled .. IMPALA-6949: Add the option to start the minicluster with EC enabled In this patch we add the "ERASURE_CODING" enviornment variable. If we enable it, a cluster with 5 data nodes will be created during data loading and HDFS will be started with erasure coding enabled. Testing: I ran the core build, and verified that erasure coding gets enabled in HDFS. Many of our EE tests failed however. Change-Id: I397aed491354be21b0a8441ca671232dca25146c --- M bin/impala-config.sh M bin/run-all-tests.sh M testdata/bin/setup-hdfs-env.sh M testdata/cluster/admin M testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl 5 files changed, 28 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/10275/3 -- To view, visit http://gerrit.cloudera.org:8080/10275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I397aed491354be21b0a8441ca671232dca25146c Gerrit-Change-Number: 10275 Gerrit-PatchSet: 3 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-6949: Add the option to start the minicluster with EC enabled
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/10275 ) Change subject: IMPALA-6949: Add the option to start the minicluster with EC enabled .. Patch Set 2: (5 comments) Yes, nested data loading succeeds every time I tried it. http://gerrit.cloudera.org:8080/#/c/10275/2/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/10275/2/bin/impala-config.sh@449 PS2, Line 449: elif [ "${TARGET_FILESYSTEM}" = "hdfs-ec" ]; then > By making this its own TARGET_FILESYSTEM, you end up disabling a bunch of t I don't think it makes sense to make erasure coding its own file system. I made a separate flag for it. http://gerrit.cloudera.org:8080/#/c/10275/2/bin/impala-config.sh@454 PS2, Line 454: echo "Valid values are: hdfs, isilon, s3, local" > Please update? No need to update any more. http://gerrit.cloudera.org:8080/#/c/10275/2/bin/run-all-tests.sh File bin/run-all-tests.sh: http://gerrit.cloudera.org:8080/#/c/10275/2/bin/run-all-tests.sh@71 PS2, Line 71: FE_TEST=false > Add a comment about why? Done http://gerrit.cloudera.org:8080/#/c/10275/2/testdata/bin/setup-hdfs-env.sh File testdata/bin/setup-hdfs-env.sh: http://gerrit.cloudera.org:8080/#/c/10275/2/testdata/bin/setup-hdfs-env.sh@80 PS2, Line 80: hdfs ec -unsetPolicy -path "${HDFS_EC_PATH:=/test-warehouse/}" > Do you know what this does underneath the covers? Do we need to block while This does not convert the existing files in the directory. It only affects the files that will be placed in the directory in the future. This is why it is possible to have some files erasure coded and some not in the same directory. http://gerrit.cloudera.org:8080/#/c/10275/2/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl File testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl: http://gerrit.cloudera.org:8080/#/c/10275/2/testdata/cluster/node_templates/common/etc/hadoop/conf/hdfs-site.xml.tmpl@26 PS2, Line 26: cloudera.erasure_coding.enabled > What's this for? This is needed because running the following command hdfs ec -enablePolicy -policy RS-3-2-1024k Causes this error: RemoteException: enableErasureCodingPolicy is not allowed because cloudera.erasure_coding.enabled=false -- To view, visit http://gerrit.cloudera.org:8080/10275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I397aed491354be21b0a8441ca671232dca25146c Gerrit-Change-Number: 10275 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Wed, 02 May 2018 21:23:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10273 ) Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation .. Patch Set 3: Hit IMPALA-6960 -- To view, visit http://gerrit.cloudera.org:8080/10273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Gerrit-Change-Number: 10273 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 May 2018 21:18:52 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10273 ) Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2402/ -- To view, visit http://gerrit.cloudera.org:8080/10273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Gerrit-Change-Number: 10273 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 May 2018 21:10:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6959: [DOCS] Update to HAProxy configuration sample code
Alex Rodoni has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10284 Change subject: IMPALA-6959: [DOCS] Update to HAProxy configuration sample code .. IMPALA-6959: [DOCS] Update to HAProxy configuration sample code - Changed to deprecated timeouts: contimeout, clitimeout, srvtimeout - Changed the sample timeout values to more realistic values - Added a note that actual timeout values should depend on the user cluster Change-Id: Idff3aa9bbb58c1953cb7e9394ade01c7833c3a34 --- M docs/topics/impala_proxy.xml 1 file changed, 6 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/10284/1 -- To view, visit http://gerrit.cloudera.org:8080/10284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idff3aa9bbb58c1953cb7e9394ade01c7833c3a34 Gerrit-Change-Number: 10284 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Rodoni
[Impala-ASF-CR] IMPALA-6916: Implement COMMENT ON DATABASE
Fredy Wijaya has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/10171 ) Change subject: IMPALA-6916: Implement COMMENT ON DATABASE .. IMPALA-6916: Implement COMMENT ON DATABASE This patch implements updating comment on a database. Syntax: COMMENT ON DATABASE db IS 'comment' Testing: - Added new front-end tests - Ran all front-end tests - Added new end-to-end tests - Ran end-to-end DDL tests Change-Id: Ifcf909c18f97073346f6f603538bf921e69fbb00 --- M common/thrift/CatalogService.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java A fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java A fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/Db.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M tests/metadata/test_ddl.py M tests/metadata/test_ddl_base.py 15 files changed, 275 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/10171/8 -- To view, visit http://gerrit.cloudera.org:8080/10171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifcf909c18f97073346f6f603538bf921e69fbb00 Gerrit-Change-Number: 10171 Gerrit-PatchSet: 8 Gerrit-Owner: Fredy WijayaGerrit-Reviewer: Adam Holley Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Fredy Wijaya
[Impala-ASF-CR] IMPALA-6916: Implement COMMENT ON DATABASE
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/10171 ) Change subject: IMPALA-6916: Implement COMMENT ON DATABASE .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/10171/6/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: http://gerrit.cloudera.org:8080/#/c/10171/6/common/thrift/JniCatalog.thrift@635 PS6, Line 635: // Name of comment to alter. When this field is not set, the comment will be removed. > Comment what happens if the comment is not set Done http://gerrit.cloudera.org:8080/#/c/10171/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10171/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3534 PS6, Line 3534: } > There's a subtle race here with the statestore topic update thread that cal Yes, you're right. The new patch set uses the database lock. -- To view, visit http://gerrit.cloudera.org:8080/10171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifcf909c18f97073346f6f603538bf921e69fbb00 Gerrit-Change-Number: 10171 Gerrit-PatchSet: 8 Gerrit-Owner: Fredy WijayaGerrit-Reviewer: Adam Holley Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Fredy Wijaya Gerrit-Comment-Date: Wed, 02 May 2018 21:01:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10273 ) Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation .. Patch Set 3: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2400/ -- To view, visit http://gerrit.cloudera.org:8080/10273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Gerrit-Change-Number: 10273 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 May 2018 20:51:09 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6908: IsConnResetTException() should include ECONNRESET
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10265 ) Change subject: IMPALA-6908: IsConnResetTException() should include ECONNRESET .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc File be/src/rpc/thrift-util.cc: http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc@209 PS1, Line 209: strncmp(PACKAGE_VERSION, "0.9.0", 5)); > How did the test pass to begin with ? Also, not sure if it warrants a strnc As Sailesh asked earlier, do we even need this? Or, couldn't we make it a compile time check? -- To view, visit http://gerrit.cloudera.org:8080/10265 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: I1bb997a833917e5166c9ca192da219f50f4439e2 Gerrit-Change-Number: 10265 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 02 May 2018 19:36:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10233 ) Change subject: IMPALA-6946: handle negative counts in RLE decoder .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/10233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb Gerrit-Change-Number: 10233 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 May 2018 18:59:08 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6908: IsConnResetTException() should include ECONNRESET
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10265 ) Change subject: IMPALA-6908: IsConnResetTException() should include ECONNRESET .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc File be/src/rpc/thrift-util.cc: http://gerrit.cloudera.org:8080/#/c/10265/1/be/src/rpc/thrift-util.cc@209 PS1, Line 209: strncmp(PACKAGE_VERSION, "0.9.0", 5) = > Oops, yes. Changed and re-ran the test again to make sure it works. How did the test pass to begin with ? Also, not sure if it warrants a strncmp every time we call this function. Can we initialize a boolean once during initialization ? -- To view, visit http://gerrit.cloudera.org:8080/10265 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: I1bb997a833917e5166c9ca192da219f50f4439e2 Gerrit-Change-Number: 10265 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Sailesh Mukil Gerrit-Comment-Date: Wed, 02 May 2018 18:53:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10279 ) Change subject: IMPALA-6931: reduces races in query expiration tests .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2401/ -- To view, visit http://gerrit.cloudera.org:8080/10279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d Gerrit-Change-Number: 10279 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 02 May 2018 18:32:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10279 ) Change subject: IMPALA-6931: reduces races in query expiration tests .. Patch Set 1: agreed. an immediate next step is to break this test into smaller/simpler ones to avoid noise from interactions. would still be useful to test interactions, but for that, we'll want to make this easier to test. -- To view, visit http://gerrit.cloudera.org:8080/10279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d Gerrit-Change-Number: 10279 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Wed, 02 May 2018 18:30:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10279 ) Change subject: IMPALA-6931: reduces races in query expiration tests .. Patch Set 1: Code-Review+2 If this doesn't do the trick we may want to rethink how we can test this feature in a non racy way. -- To view, visit http://gerrit.cloudera.org:8080/10279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d Gerrit-Change-Number: 10279 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Dan Hecht Gerrit-Comment-Date: Wed, 02 May 2018 18:20:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java File fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java: http://gerrit.cloudera.org:8080/#/c/10165/6/fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java@41 PS6, Line 41: LZO, : LZO_INDEX, //Lzo index file. : LZ4, : ZSTD; should we make these (and e.g. file extension) be runtime parameters as well so that you can add a plugin without having to recompile? Okay with me if that's future work, but wondering if there's anything that makes that overly difficult. -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 02 May 2018 18:17:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10116 ) Change subject: IMPALA-6131: Track time of last statistics update in metadata .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py File tests/metadata/test_last_ddl_time_update.py: http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@52 PS7, Line 52: self.substitutions = {'TBL': "%s.%s" % (db_name, tbl_name), 'WAREHOUSE': WAREHOUSE} Mention the supported substitutions in the methods below (and add comments there). http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@56 PS7, Line 56: False, False Is there a reason we don't test changes to both? Do they never change at the same time? http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@76 PS7, Line 76: TestLastDdlTimeUpdate Do you need to specify the class to access TestHelper? http://gerrit.cloudera.org:8080/#/c/10116/7/tests/metadata/test_last_ddl_time_update.py@179 PS7, Line 179: def run_test(self, query, db_name, table_name, Can you move this method into the Helper class now? -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 02 May 2018 18:14:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10233 ) Change subject: IMPALA-6946: handle negative counts in RLE decoder .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/10233/3/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: http://gerrit.cloudera.org:8080/#/c/10233/3/be/src/util/dict-encoding.h@238 PS3, Line 238: static constexpr int32_t DICT_DECODER_BUFFER_SIZE = 128; I added some DCHECKS in this file and ran into a linkage issue. The fix was to move it outside of the class (I don't think there's a way to get static linkage for C++ class static members). http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@326 PS2, Line 326: LIKELY > LIKELY seems to be strangely placed now - both conditions should be include Yeah I agree both conditions should be included - http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@679 PS2, Line 679: int32_t num_repeats = NextNumRepeats(); > DictDecoder::DecodeNextValue also calls NextNumLiterals() and NextNumRep Good catch - fixed - I went through the uint32_t references there and fixed those. http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@681 PS2, Line 681:int32_t num_repeats_to_set = :std::min(num_repeats, num_values_to_consume - num_consumed); > We could change this to signed too and remove the template parameter from s Done http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@694 PS2, Line 694: int32_t num_literals_to_set = : std::min(num_literals, num_values_to_consume - num_consumed); > Same as in line 681. Done -- To view, visit http://gerrit.cloudera.org:8080/10233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb Gerrit-Change-Number: 10233 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 May 2018 17:54:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder
Hello Csaba Ringhofer, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10233 to look at the new patch set (#3). Change subject: IMPALA-6946: handle negative counts in RLE decoder .. IMPALA-6946: handle negative counts in RLE decoder This improves the handling of out-of-range values to avoid hitting various DCHECKs, including the one in the JIRA. repeat_count_ and literal_count_ are int32_ts. Avoid setting them to a negative value directly or by integer overflow. Switch to using uint32_t for "VLQ" encoding, which should be ULEB-128 encoding according to the Parquet standard. This fixes an infinite loop in PutVlqInt() for negative values - the bug was that shifting right sign-extended the signed value. Testing: Added backend test to exercise handling of large literal and repeat counts that don't fit in an int32_t. Before these fixes it could trigger several DCHECKs. Change-Id: If75ef3fb12494209918c100f26407cd93b17addb --- M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h M be/src/util/dict-encoding.h M be/src/util/rle-encoding.h M be/src/util/rle-test.cc 5 files changed, 84 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/10233/3 -- To view, visit http://gerrit.cloudera.org:8080/10233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb Gerrit-Change-Number: 10233 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer
[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10273 ) Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation .. Patch Set 3: Code-Review+2 Had to update some planner tests because of different file sizes on 2.x -- To view, visit http://gerrit.cloudera.org:8080/10273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Gerrit-Change-Number: 10273 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 02 May 2018 16:58:01 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6679,IMPALA-6678: reduce scan reservation
Hello Impala Public Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/10273 to review the following change. Change subject: IMPALA-6679,IMPALA-6678: reduce scan reservation .. IMPALA-6679,IMPALA-6678: reduce scan reservation This has two related changes. IMPALA-6679: defer scanner reservation increases When starting each scan range, check to see how big the initial scan range is (the full thing for row-based formats, the footer for Parquet) and determine whether more reservation would be useful. For Parquet, base the ideal reservation on the actual column layout of each file. This avoids reserving memory that we won't use for the actual files that we're scanning. This also avoid the need to estimate ideal reservation in the planner. We also release scanner thread reservations above the minimum as soon as threads complete, so that resources can be released slightly earlier. IMPALA-6678: estimate Parquet column size for reservation - This change also reduces reservation computed by the planner in certain cases by estimating the on-disk size of column data based on stats. It also reduces the default per-column reservation to 4MB since it appears that < 8MB columns are generally common in practice and the method for estimating column size is biased towards over-estimating. There are two main cases to consider for the performance implications: * Memory is available to improve query perf - if we underestimate, we can increase the reservation so we can do "efficient" 8MB I/Os for large columns. * The ideal reservation is not available - query performance is affected because we can't overlap I/O and compute as much and may do smaller (probably 4MB I/Os). However, we should avoid pathological behaviour like tiny I/Os. When stats are not available, we just default to reserving 4MB per column, which typically is more memory than required. When stats are available, the memory required can be reduced below when some heuristic tell us with high confidence that the column data for most or all files is smaller than 4MB. The stats-based heuristic could reduce scan performance if both the conservative heuristics significantly underestimate the column size and memory is constrained such that we can't increase the scan reservation at runtime (in which case the memory might be used by a different operator or scanner thread). Observability: Added counters to track when threads were not spawned due to reservation and to track when reservation increases are requested and denied. These allow determining if performance may have been affected by memory availability. Testing: Updated test_mem_usage_scaling.py memory requirements and added steps to regenerate the requirements. Loops test for a while to flush out flakiness. Added targeted planner and query tests for reservation calculations and increases. Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842 Reviewed-on: http://gerrit.cloudera.org:8080/9757 Reviewed-by: Tim ArmstrongTested-by: Impala Public Jenkins --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-parquet-scanner-test.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/catalog/ColumnStats.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M testdata/bin/compute-table-stats.sh M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test M testdata/workloads/functional-planner/queries/PlannerTest/partition-pruning.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M
[Impala-ASF-CR] IMPALA-6941: load more text scanner compression plugins
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10165 ) Change subject: IMPALA-6941: load more text scanner compression plugins .. Patch Set 6: (11 comments) LGTM overall, had some minor comments. http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h File be/src/exec/hdfs-plugin-text-scanner.h: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@63 PS6, Line 63: HdfsScanner* (*create_scanner_fn) : (HdfsScanNodeBase* scan_node, RuntimeState* state) Maybe you could create a typedef for the function pointers, e.g.: typedef HdfsScanner* (*CreateScannerFn) (HdfsScanNodeBase* scan_node, RuntimeState* state); CreateScannerFn create_scanner_fn; I think it would be more readable, and also you wouldn't need to copy the whole function prototype to hdfs-plugin-text-scanner.cc http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@67 PS6, Line 67: Status (*issue_initial_ranges_fn)( : HdfsScanNodeBase* scan_node, const std::vector& files) same as above http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.h@73 PS6, Line 73: SpinLock Wouldn't be better to use some read/write lock like boost::shared_mutex? http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc File be/src/exec/hdfs-plugin-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@42 PS6, Line 42: "(Advanced) whitelist of HDFS " : "text scanner plugins that Impala will try to dynamically load." I think you should mention that it is comma-separated, or give an example. http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@56 PS6, Line 56: HdfsScanner* (*create_scanner_fn) (HdfsScanNodeBase* scan_node, RuntimeState* state) You could use the typedef here. http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@71 PS6, Line 71: Status (*issue_initial_ranges_fn)( : HdfsScanNodeBase* scan_node, const std::vector & files) and here http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@123 PS6, Line 123: "GetImpalaBuildVersion", reinterpret_cast (_plugin_impala_build_version))); nit: too long line http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-plugin-text-scanner.cc@126 PS6, Line 126: stringstream ss; : ss << "Scanner plugin " << plugin_name << " was built against Impala version " :<< get_plugin_impala_build_version() << ", but the running Impala version is " :<< GetDaemonBuildVersion(); nit: I think it's a little inconsistent that we use stringstream here and Substitute at other places. http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-scan-node-base.cc@643 PS6, Line 643: =_ nit: missing space http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/10165/6/be/src/exec/hdfs-text-scanner.cc@129 PS6, Line 129: =_ nit: missing space http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10165/6/tests/metadata/test_partition_metadata.py@168 PS6, Line 168: #unique_database = 'test_db' nit: remove this line -- To view, visit http://gerrit.cloudera.org:8080/10165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6 Gerrit-Change-Number: 10165 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 02 May 2018 16:24:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6946: handle negative counts in RLE decoder
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10233 ) Change subject: IMPALA-6946: handle negative counts in RLE decoder .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@326 PS2, Line 326: LIKELY LIKELY seems to be strangely placed now - both conditions should be included in the likely branch, or maybe only the new condition, as current_value_ == value can be often false, for example in dict encoded pages. http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@679 PS2, Line 679: int32_t num_repeats = NextNumRepeats(); DictDecoder::DecodeNextValue also calls NextNumLiterals() and NextNumRepeats(), and expects uint32_t - can you do some cleanup there? http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@681 PS2, Line 681:uint32_t num_repeats_to_set = :std::min(num_repeats, num_values_to_consume - num_consumed); We could change this to signed too and remove the template parameter from std::min, as both arguments and the result would be signed. http://gerrit.cloudera.org:8080/#/c/10233/2/be/src/util/rle-encoding.h@694 PS2, Line 694: uint32_t num_literals_to_set = : std::min(num_literals, num_values_to_consume - num_consumed); Same as in line 681. -- To view, visit http://gerrit.cloudera.org:8080/10233 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If75ef3fb12494209918c100f26407cd93b17addb Gerrit-Change-Number: 10233 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Csaba Ringhofer Gerrit-Comment-Date: Wed, 02 May 2018 14:54:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#7). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/7 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#6). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/6 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6131: Track time of last statistics update in metadata
Hello Lars Volker, Zoltan Borok-Nagy, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10116 to look at the new patch set (#5). Change subject: IMPALA-6131: Track time of last statistics update in metadata .. IMPALA-6131: Track time of last statistics update in metadata The timestamp of the last COMPUTE STATS operation is saved to table property "impala.lastComputeStatsTime". The format is the same as in "transient_lastDdlTime", so the two can be compared to check if the schema has changed since computing statistics. Other changes: - Handling of "transient_lastDdlTime" is simplified - the old logic set it to current time + 1, if the old version was >= current time, to ensure that it is always increased by DDL operations. This was useful in the past, as IMPALA-387 used lastDdlTime to check if partition data needs to be reloaded, but since IMPALA-1480, Impala does not rely on lastDdlTime at all. - Computing / setting stats on HDFS tables no longer increases "transient_lastDdlTime". - When Kudu tables are (re)loaded, it is checked if their HMS representation is up to date, and if it is, then IMetaStoreClient.alter_table() is not called. The old logic always called alter_table() after loading metadata from Kudu. This change was needed to ensure that "transient_lastDdlTime" works similarly in HDFS and Kudu tables, and should also make (re)loading Kudu tables faster. Notes: - Kudu will be able to sync its tables to HMS in the near future (see KUDU-2191), so the Kudu metadata handling in Impala may need to be redesigned. Testing: tests/metadata/test_last_ddl_time_update.py is extended by - also checking "impala.lastComputeStatsTime" - testing more SQL statements - tests for Kudu tables Note that test_last_ddl_time_update.py is ran only in exhaustive testing. Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/metadata/test_last_ddl_time_update.py 6 files changed, 182 insertions(+), 152 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/10116/5 -- To view, visit http://gerrit.cloudera.org:8080/10116 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5 Gerrit-Change-Number: 10116 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6931: reduces races in query expiration tests
Vuk Ercegovac has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10279 Change subject: IMPALA-6931: reduces races in query expiration tests .. IMPALA-6931: reduces races in query expiration tests Recent tests ran into flakiness when testing query expiration. This change makes two changes: 1) query state is retrieved earlier; a flaky test skipped the expected state. 2) bump the timing; a flaky test had queries expire before it could check them Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d --- M tests/custom_cluster/test_query_expiration.py 1 file changed, 5 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/10279/1 -- To view, visit http://gerrit.cloudera.org:8080/10279 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d Gerrit-Change-Number: 10279 Gerrit-PatchSet: 1 Gerrit-Owner: Vuk Ercegovac