[Impala-ASF-CR] IMPALA-4617: remove IsConstant() analysis from be
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4617: remove IsConstant() analysis from be .. Patch Set 10: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/222/ -- To view, visit http://gerrit.cloudera.org:8080/5415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4617: remove IsConstant() analysis from be
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4617: remove IsConstant() analysis from be .. Patch Set 10: Code-Review+2 rebase -- To view, visit http://gerrit.cloudera.org:8080/5415 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I24b4c9d6e8e57fd73d906a9c36200e1a84497b90 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
Dimitris Tsirogiannis has posted comments on this change. Change subject: [DOCS] Major update to Impala + Kudu page .. Patch Set 13: Code-Review+1 (5 comments) I'll let MJ and/or Todd make a final pass. http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_create_table.xml File docs/topics/impala_create_table.xml: PS13, Line 85: [PRIMARY KEY (col_name[, ...]] This doesn't belong here. PS13, Line 244: CREATE TABLE [IF NOT EXISTS] db_name.]table_name : [COMMENT 'table_comment'] : STORED AS KUDU : [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)] : AS : select_statement This is not correct. You're missing the PRIMARY KEY and PARTITION BY clauses. PS13, Line 411: impala::username.kudu_t1 nit: I don't think this is a good example. I would be really surprised if someone actually created a Kudu table outside of Impala named as "impala::username". Pls change the table name. http://gerrit.cloudera.org:8080/#/c/5649/13/docs/topics/impala_kudu.xml File docs/topics/impala_kudu.xml: PS13, Line 93: kudu.master_addresses That's the name of the table property. Can you verify that name of the impalad configuration param? PS13, Line 148: There is a many-to-many relationship : between the tablets and tablet servers, managed automatically by Kudu. Isn't more clear to say that each tablet server can store multiple tablets and that tablets are replicated across different tablet servers? -- To view, visit http://gerrit.cloudera.org:8080/5649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[Impala-ASF-CR] Add .pep8rc for Impala's Python style
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/5829 Change subject: Add .pep8rc for Impala's Python style .. Add .pep8rc for Impala's Python style Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27 --- A .pep8rc 1 file changed, 8 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/5829/1 -- To view, visit http://gerrit.cloudera.org:8080/5829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I65a99fc6e364eb1e49e21d0ba97546bc25f65a27 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen .. Patch Set 4: I think it may have been accidentally disabled by the single node optimisation -- To view, visit http://gerrit.cloudera.org:8080/5732 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4820: avoid writing unencrypted data during write cancellation
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4820: avoid writing unencrypted data during write cancellation .. Patch Set 3: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/5788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I956bae0685e863f30be23634b29aa076394db184 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4820: avoid writing unencrypted data during write cancellation
Tim Armstrong has submitted this change and it was merged. Change subject: IMPALA-4820: avoid writing unencrypted data during write cancellation .. IMPALA-4820: avoid writing unencrypted data during write cancellation The bug was that unencrypted data could be written to disk if the write was cancelled before it completed. This bug was introduced after Impala 2.8.0 was branched in the commit "IMPALA-3202,IMPALA-2079: rework scratch file I/O", so does not appear in any released versions of Impala. The fix is to only start decrypting data after the write is complete. Testing: Added a regression test that reproduced the problem (after adding a delay to the write). Change-Id: I956bae0685e863f30be23634b29aa076394db184 Reviewed-on: http://gerrit.cloudera.org:8080/5788 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong--- M be/src/common/global-flags.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc 4 files changed, 73 insertions(+), 5 deletions(-) Approvals: Impala Public Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I956bae0685e863f30be23634b29aa076394db184 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] Patch references to Cloudera and CDH in Impala tutorial
Laurel Hale has posted comments on this change. Change subject: Patch references to Cloudera and CDH in Impala tutorial .. Patch Set 1: Code-Review-1 (2 comments) The use of keyref to change "CDH" to a Hadoop distribution number didn't seem to work (see line 60) and use of parentheses not necessary in line 1700. http://gerrit.cloudera.org:8080/#/c/5663/1/docs/topics/impala_tutorial.xml File docs/topics/impala_tutorial.xml: Line 60: If you already have a environment set up and just need to add Impala to it, follow the installation This doesn't seem to work. It's showing up as "CDH" in the build. PS1, Line 1700: (Currently, this technique only works for Parquet files.) Parentheses not necessary. -- To view, visit http://gerrit.cloudera.org:8080/5663 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44245b65ce6f247ae8771f582f4b33c3712145ae Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laurel Hale Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4820: avoid writing unencrypted data during write cancellation
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4820: avoid writing unencrypted data during write cancellation .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I956bae0685e863f30be23634b29aa076394db184 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Dan Hecht has posted comments on this change. Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/5389/6/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: Line 677: output_unmatched_batch_->Reset(); > So then, if I understand correctly, you mentioned a build partition with >8 Is there a dcheck we could add to detect the incorrect state? -- To view, visit http://gerrit.cloudera.org:8080/5389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4818: Ensure the same number of tests are run every time
Michael Brown has posted comments on this change. Change subject: IMPALA-4818: Ensure the same number of tests are run every time .. Patch Set 2: Code-Review+1 (1 comment) Maybe this is a candidate for upvoting? http://gerrit.cloudera.org:8080/#/c/5784/2/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: PS2, Line 195: if cls.exploration_strategy() != 'exhaustive': : cls.ImpalaTestMatrix.add_constraint(lambda v: v.get_value('cancel_delay') in [3]) > I don't quite see what you're getting at here. It seems like line 198-199 i This ends up falling under IMPALA-3947. Sorry for the noise. -- To view, visit http://gerrit.cloudera.org:8080/5784 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I22cecfbe7c9a102f788d01eb80aa188579ef6d7e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3524: Don't process spilled partitions with 0 probe rows
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3524: Don't process spilled partitions with 0 probe rows .. Patch Set 9: This passed a core ASAN run: http://sandbox.jenkins.cloudera.com/job/impala-umbrella-build-and-test/7228/ -- To view, visit http://gerrit.cloudera.org:8080/5389 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I175b32dd9031e51218b38c37693ac3e31dfab47b Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4818: Ensure the same number of tests are run every time
Jim Apple has posted comments on this change. Change subject: IMPALA-4818: Ensure the same number of tests are run every time .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5784/2/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: PS2, Line 195: if cls.exploration_strategy() != 'exhaustive': : cls.ImpalaTestMatrix.add_constraint(lambda v: v.get_value('cancel_delay') in [3]) > L196 will rarely if ever get executed, because the workload as derived from I don't quite see what you're getting at here. It seems like line 198-199 in HEAD get executed for every "core" run. In the history of this patch, line 195 was added because tests were too slow. Generally, I'd like to confine this patch to just derandomizing this method and worry about which test should be run later if we see a problem. -- To view, visit http://gerrit.cloudera.org:8080/5784 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I22cecfbe7c9a102f788d01eb80aa188579ef6d7e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] Improvements to "Unknown disk-ID" warning
Bharath Vissapragada has uploaded a new change for review. http://gerrit.cloudera.org:8080/5828 Change subject: Improvements to "Unknown disk-ID" warning .. Improvements to "Unknown disk-ID" warning This commit, - Appends the table name to the "Unknown disk-ID" warning when disk IDs are missing, for supportability. - Removes reference to enabling dfs block metadata configuration, since it doesn't apply anymore. - Removes VolumeId terminology from the runtime profile. Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 --- M be/src/exec/hdfs-scan-node-base.cc M common/thrift/generate_error_codes.py 2 files changed, 5 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/5828/1 -- To view, visit http://gerrit.cloudera.org:8080/5828 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iddb132ff7ad66f3291b93bf9d8061bd0525ef1b2 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen
Dan Hecht has posted comments on this change. Change subject: IMPALA-4705, IMPALA-4779, IMPALA-4780: Fix some Expr bugs with codegen .. Patch Set 4: > Quite a bit longer. 5~6 mins in debug builds on my machine. I think > it's worth it given that it covers pretty much all the built-in > functions. yeah, I'm guessing this was originally disabled due to the long execution time. but i agree we need to get this coverage somehow. are there anythings we can do (maybe in the future) to speed this up? -- To view, visit http://gerrit.cloudera.org:8080/5732 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40fdb035a565ae2f9c9fbf4db48a548653ef7608 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5611 to look at the new patch set (#10). Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. IMPALA-3909: Populate min/max statistics in Parquet writer Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-parquet-table-writer.h A be/src/exec/parquet-column-stats.h M be/src/exec/parquet-common.h M be/src/runtime/string-value.h M be/src/util/dict-test.cc M tests/query_test/test_insert_parquet.py M tests/util/get_parquet_metadata.py 8 files changed, 754 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/5611/10 -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-3909: Populate min/max statistics in Parquet writer
Lars Volker has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer .. Patch Set 9: (17 comments) Thanks for the review. Please see my inline comments and the new PS. http://gerrit.cloudera.org:8080/#/c/5611/6/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: PS6, Line 178: ProcessValue > that was just an off-the-cuff suggestion. if 'append' is more specific, the I tend to favor ProcessValue, since Append (like Encode) leaves out the statistics part. While ProcessValue is vague, it implies that the values is "consumed" afterwards, and that it may be necessary to look at the comment for further details. That being said, how about "ConsumeValue"? Does that imply ownership transfer too much? http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 134: void EncodeColumnStats(ColumnMetaData* meta_data) { > find a better name. 'column stats' is not a thrift concept. these are speci Done Line 236: // Created and set by the derived class. > owner? same for the other pointer members. Done Line 339: int64_t encoded_value_size_; > this seems to be the plain encoding size. even for dict-encoded cols? Yes, the dictionary encoders use it as the size of the encoded values. Should we rename it to plain_encoded_value_size_? Line 347: // Tracks statistics per row group. This gets reset when starting a new file. > hopefully when starting a new row group Yes, Done. Line 643: DCHECK(page_stats_base_ != nullptr); > how does this handle unsupported types? Currently the ColumnStats gets created for all supported types, and the ColumnStats c'tor ensures that the class can only be instantiated for supported types. However, for some types we don't write out the statistics to the resulting parquet file by disabling the Update() logic, thus ensuring that the stats objects stay empty. This allows us to keep the interface of ColumnStats simpler. Line 1028: columns_[i]->EncodeColumnStats(_row_group_->columns[i].meta_data); > where do the row group stats get reset? Line 281 of this PS. http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/hdfs-parquet-table-writer.h File be/src/exec/hdfs-parquet-table-writer.h: Line 103: /// Maximum statistics size. If the combined size of the min and max values of > does this refer to a single thrift Statistics struct? if so, spell that out Done http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: Line 65: void EncodeToThrift(T* parent) const { > this feels more convoluted than it needs to be. i think it would be better Done Line 88: // We explicitly require types to be listed here in order to support column statistics. > i don't understand, i thought those listed types are specifically not suppo We don't update statistics for these types, but creating ColumnStats for them is still supported. This is because we want to support them soon. If we disallow creating them, then the caller will have to deal with only creating ColumnStats for supported column types. Line 90: // follow the ordering semantics of parquet's min/max statistics for the new type. > what are the ordering semantics? (that order as byte sequence == value orde The values should be ordered according to what parquet-mr says (which is hopefully in line with what Hive and Spark do). However, there is currently a lack of specification, especially for logical types, that we're working on clarifying with the parquet-format project. I added a comment explaining where to look for the ordering semantics. Line 97: T>::type; > i find the formatting hard to decipher. please reformat by hand (for instan Done Line 127: // statistics behavior from any implicit behavior of the types? > but shouldn't the stats reflect the behavior of the underlying types. ie, w The stats should follow the ordering semantics defined in parquet-format. Those are currently underspecified, we're working on getting them fixed so we can write stats for other types, too (see my other comments in this review). While I agree that they should ideally be the same, I don't think we can guarantee or enforce that, since ultimately the parquet-format project serves as the reference. Line 148: /// Encodes a single value into an output string using parquet's plain encoding. > 'an output string' makes it sound like this gets converted into a string ty Done Line 159: return encoded_value_size_ < 0 ? ParquetPlainEncoder::ByteSize(v) : > reformat by hand Done http://gerrit.cloudera.org:8080/#/c/5611/9/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: Line 89: static int ByteSize(const T& v) { return sizeof(T); } > does this function make sense at all? why not simply
[Impala-ASF-CR] Release note updates for Impala 2.8
John Russell has posted comments on this change. Change subject: Release note updates for Impala 2.8 .. Patch Set 3: (5 comments) Only minor changes (trailing spaces) to come in next patch set. Right now I'm stuck at "failed to push some refs", so that might not be today. http://gerrit.cloudera.org:8080/#/c/5668/3/docs/topics/impala_incompatible_changes.xml File docs/topics/impala_incompatible_changes.xml: Line 56: > I do not see an answer to this yet. These are all from JIRAs that are marked 'fixed in 2.8.0' and/or coding work completed back in December, before 2.8.0 was branched off. Is there an additional round of checking you are thinking of? The current behavior is as described elsewhere in the docs. The previous behavior (at least for the Kudu-related bullets) is from the impala_kudu fork. PS3, Line 58: Impala Incompatible Changes Introduced in Impala 2.8.x > There was something I saw relating to cross-reference links, but I think th Done Line 1523: > My experience with the CHANGES files in the root directories of open-source Let's continue this discussion as part of the general cleanup of Cloudera-specific info. That is, if we preserve some historical info as-is, we'll have to adjust expectations a little bit for how many instances of Cloudera / CDH / Cloudera Manager remain in the source files. http://gerrit.cloudera.org:8080/#/c/5668/6/docs/topics/impala_incompatible_changes.xml File docs/topics/impala_incompatible_changes.xml: Line 71: such as COMPRESSION, DEFAULT, and ENCODING, that > new spaces at end of lines Done. I have a pre-commit hook that suppresses these but haven't adapted it yet for the Apache repo. http://gerrit.cloudera.org:8080/#/c/5668/6/docs/topics/impala_new_features.xml File docs/topics/impala_new_features.xml: PS6, Line 105: , and each partition is onl > Do we describe the behavior in more detail someplace else? Yes, in the MT_DOP page linked from this bullet point. -- To view, visit http://gerrit.cloudera.org:8080/5668 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7c47f422e509cec6d3eb8aaa82294b584f393aed Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: John Russell Gerrit-Reviewer: Laurel Hale Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Silvius Rus Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
John Russell has uploaded a new patch set (#13). Change subject: [DOCS] Major update to Impala + Kudu page .. [DOCS] Major update to Impala + Kudu page Upgrade with details of latest syntax. Fine-tune discussion of PK and other Kudu notions. The impala_kudu diff looks larger than actual changes to the page, because subtopics got moved around and promoted/demoted (which changes the indentation). Best to review that page start-to-finish. CREATE TABLE details for Impala + Kudu. ALTER TABLE details for Impala + Kudu. Unhide the Impala partitioning + Kudu topic. Mainly a brief intro then a link to delegate details to the main Kudu page, which already has a partitioning subtopic. Include changes to reserved words. Entirely from Kudu integration work. Add Kudu considerations for misc SQL statements. Addressed Todd's and Dimitris's comments for certain files. (Up to the beginning of the "Partitioning" section in impala_kudu.xml.) Added Kudu blurbs to data type topics: - Some aren't supported. - Others are supported but can't go in the primary key. Added walkthrough of renaming internal/external tables. Split out Kudu CREATE TABLE syntax from other file formats. Correct info about CTAS for Kudu tables. Add examples of basic Kudu, external Kudu, and Kudu CTAS. Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c --- M docs/impala_keydefs.ditamap M docs/shared/impala_common.xml M docs/topics/impala_alter_table.xml M docs/topics/impala_array.xml M docs/topics/impala_boolean.xml M docs/topics/impala_char.xml M docs/topics/impala_compute_stats.xml M docs/topics/impala_create_table.xml M docs/topics/impala_decimal.xml M docs/topics/impala_describe.xml M docs/topics/impala_double.xml M docs/topics/impala_drop_table.xml M docs/topics/impala_explain.xml M docs/topics/impala_float.xml M docs/topics/impala_grant.xml M docs/topics/impala_invalidate_metadata.xml M docs/topics/impala_kudu.xml M docs/topics/impala_literals.xml M docs/topics/impala_map.xml M docs/topics/impala_partitioning.xml M docs/topics/impala_refresh.xml M docs/topics/impala_reserved_words.xml M docs/topics/impala_revoke.xml M docs/topics/impala_show.xml M docs/topics/impala_struct.xml M docs/topics/impala_tables.xml M docs/topics/impala_timestamp.xml M docs/topics/impala_truncate_table.xml M docs/topics/impala_varchar.xml 29 files changed, 2,310 insertions(+), 110 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/5649/13 -- To view, visit http://gerrit.cloudera.org:8080/5649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I76dcb948dab08532fe41326b22ef78d73282db2c Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Ambreen Kazi Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] [DOCS] Major update to Impala + Kudu page
John Russell has posted comments on this change. Change subject: [DOCS] Major update to Impala + Kudu page .. Patch Set 7: (18 comments) Addressed all comments, esp. splitting CREATE TABLE syntax for HDFS vs. Kudu. http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_explain.xml File docs/topics/impala_explain.xml: PS7, Line 250: > "in a SCAN KUDU node" Done PS7, Line 251: , and might involve transmitting : non-matching rows that are filtered out on the Impala side. > remove Done http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_kudu.xml File docs/topics/impala_kudu.xml: PS7, Line 76: scan performance close to that of Parquet > Make sure you check with Mostafa about this claim. Done PS7, Line 147: The work is parallelized : across units of computing called tablet servers. > I believe the unit of computing is the tablet not the tablet server, unless The tablet server is the compute resource, the tablet holds the actual data. PS7, Line 149: between the tablets and tablet servers. > I think it would be nice to mention that the user is not responsible for ma Done PS7, Line 168: (CREATE TABLE and ALTER TABLE) > It is weird to mention CREATE without DROP. Please remove this and mention Since this section is only talking about the delta in syntax, I'll leave as-is. PS7, Line 397: : For non-Kudu tables, Impala allows any column to contain NULL : values, because it is not practical to enforce a not null constraint on HDFS : data files that could be prepared using external tools and ETL processes. : : : : : : For example, a table containing geographic information might require the latitude : and longitude coordinates to always be specified. Other attributes might be allowed : to be NULL. For example, a location might not have a designated : place name, its altitude might be unimportant, and its population might be initially : unknown, to be filled in later. : > You may want to swap these two paragraphs. Done PS7, Line 717: PARTITION BY > Impala still uses "PARTITIONED BY" for HDFS tables. Done PS7, Line 727: . By setting : up an effective partitioning scheme for a Kudu table, you can ensure that the work for : a query can be parallelized evenly across the hosts in a cluster. > Remove. Sometimes the goal is to scan as little as possible. You can say th Whole paragraph was hidden per comment on a previous patch set. (Instead, we link people to the Kudu white paper for those kinds of details.) PS7, Line 936: To see the distribution of data in a Kudu table across the underlying buckets and : partitions, use the SHOW TABLE STATS statement. > This is unfortunately not accurate. SHOW TABLE STATS will only show the dis Done. Reworded rather than deleting entirely. PS7, Line 1122: change > "changes" Done PS7, Line 1159: strong consistency for order of operations > I am not sure I know what that means. Done PS7, Line 1159: total : success or total failure of a multi-row statement > This is "atomicity". Maybe just mention atomic multi-row statements. Done PS7, Line 1160: or data that is read while a write : operation is in progress > Isolation. I added the technical terminology into my wording. PS7, Line 1288: Memory Usage for Operations on Kudu Tables : : : : : The Apache Kudu architecture, topology, and data storage techniques result in : different patterns of memory usage for Impala statements than with HDFS-backed tables. : > I don't find this particularly informative and suggest we remove it unless Yes, audience="hidden" earlier in the tag will make it invisible. http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_literals.xml File docs/topics/impala_literals.xml: PS7, Line 408: most Impala tables > Impala tables backed by HDFS or S3? "most" is kind of vague Done http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_revoke.xml File docs/topics/impala_revoke.xml: PS7, Line 115: access to a Kudu table is all or nothing. > "only table-level permissions are enforced in Kudu tables. Column-level per Done http://gerrit.cloudera.org:8080/#/c/5649/7/docs/topics/impala_tables.xml File docs/topics/impala_tables.xml: PS7, Line 293: using the Apache Kudu storage system > "stored in Apache Kudu" Done -- To view, visit
[Impala-ASF-CR] IMPALA-4820: avoid writing unencrypted data during write cancellation
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4820: avoid writing unencrypted data during write cancellation .. Patch Set 3: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/221/ -- To view, visit http://gerrit.cloudera.org:8080/5788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I956bae0685e863f30be23634b29aa076394db184 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4735: Upgrade pytest in python env to version 2.9.2.
Michael Brown has posted comments on this change. Change subject: IMPALA-4735: Upgrade pytest in python env to version 2.9.2. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/5640/7/tests/run-tests.py File tests/run-tests.py: PS7, Line 144: other_tests Maybe use the name explicit_tests ? -- To view, visit http://gerrit.cloudera.org:8080/5640 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40d129e0e63ca5bee126bac6ac923abb3c7e0a67 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: David KnuppGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4818: Ensure the same number of tests are run every time
Michael Brown has posted comments on this change. Change subject: IMPALA-4818: Ensure the same number of tests are run every time .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5784/2/tests/query_test/test_cancellation.py File tests/query_test/test_cancellation.py: PS2, Line 195: if cls.exploration_strategy() != 'exhaustive': : cls.ImpalaTestMatrix.add_constraint(lambda v: v.get_value('cancel_delay') in [3]) L196 will rarely if ever get executed, because the workload as derived from L60 is "tpch", not "functional-query". Typical exhaustive tests tend only to apply to tests whose workload is functional-query, via "--workload_exploration_strategy functional-query:exhaustive". I believe this is so because, unlike the functional database, the tpch database will not exist in all the typical exhaustive formats when doing a data load, which means typical "exhaustive vectors" aren't possible with tpch. However, the queries (L35) here come from tpch, hence the value of get_workload(). Some options: - Remove L195. If tests are quick, this is OK. or - Hack up test dimensions and workload to avoid redundant tests but also truly run various cancel delays when workload_exploration_strategy is functional-query:exhaustive . or - Remove L195-196 if loss of coverage is only negligible -- To view, visit http://gerrit.cloudera.org:8080/5784 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I22cecfbe7c9a102f788d01eb80aa188579ef6d7e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim AppleGerrit-Reviewer: David Knupp Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Michael Brown Gerrit-HasComments: Yes