[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@934 PS4, Line 934: null > nullptr Done http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@942 PS4, Line 942: // 3. If the column type is not string, and the dictionary page is not compressed, > I'm not 100% confident that we have test coverage for this case - we have s I have checked, and this branch is executed if exploration strategy > core. http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@944 PS4, Line 944: ScopedBuffer uncompressed_buffer(parent_->dictionary_pool_->mem_tracker()); > nit: unnecessary blank line Done http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@964 PS4, Line 964: > nit: != nullptr (we prefer explicit checks against null to make it visually Done -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 14 Nov 2017 00:14:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Hello Lars Volker, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8436 to look at the new patch set (#5). Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary constructon, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe do delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 33 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/5 -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 5 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 4: (4 comments) Looks close. http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@934 PS4, Line 934: NULL nullptr http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@942 PS4, Line 942: // 3. If the column type is not string, and the dictionary page is not compressed, I'm not 100% confident that we have test coverage for this case - we have somewhat limited testing of uncompressed parquet (the parquet/none tests are actually snappy-compressed since that's the default). I think the main coverage we have is tests that insert into parquet and read it back. I took a look and it seems like tests/query_test/test_insert_parquet.py should exercise this code path, but could you confirm? E.g. add a log message on each branch and make sure that they appear in the logs. http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@944 PS4, Line 944: nit: unnecessary blank line http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@964 PS4, Line 964: ) nit: != nullptr (we prefer explicit checks against null to make it visually obvious that it's a null check). -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 10 Nov 2017 00:39:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376 PS1, Line 376: virtual bool DictionaryReferencesBuffer() const { return false; } > Add comment. Make pure if possible. Alternatively, you could do something s I have removed this and the other similar functions, and used directly slot_desc_->type().IsVarLenStringType() instead. http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272 PS1, Line 272: virtual bool DictionaryReferencesBuffer() const { > add "override" This function was removed. http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541 PS1, Line 541: some > how about rewording this to "Column readers for types that require ..." This function was removed. http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542 PS1, Line 542: inline bool DictionaryReferencesBufferInline() const { > nit: single line This function was removed. http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945 PS1, Line 945: tmp_buffer > better name Done http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947 PS1, Line 947: int uncompressed_size = decompressor_.get() != nullptr > Move this to L954 Done http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951 PS1, Line 951: if (decompressor_.get() == nullptr && !DictionaryReferencesBuffer()) { > Can you add a brief comment to explain how the 4 different cases are handle Done -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 03 Nov 2017 22:39:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8436 to look at the new patch set (#4). Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary constructon, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe do delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 35 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/4 -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 4 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8436 to look at the new patch set (#3). Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary constructon, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe do delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 36 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/3 -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 3 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8436 to look at the new patch set (#2). Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary constructon, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe do delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h 2 files changed, 38 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/2 -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@959 PS1, Line 959: if (tmp_buffer.TryAllocate(uncompressed_size)) { It turned out that this can violate a dcheck, if uncompressed_size is 0, which seems valid if every value of a column in a partition is NULL. This is the case in a partition of functional_parquet.alltypesagg/tinyint_col, which lead to a crash in one of the tests. I think that there should be a table/column with all NULL values by design to test this case explicitly. -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 03 Nov 2017 00:44:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8436 ) Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376 PS1, Line 376: virtual bool DictionaryReferencesBuffer() const { return false; } Add comment. Make pure if possible. Alternatively, you could do something similar to PageContainsTupleData(). http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272 PS1, Line 272: virtual bool DictionaryReferencesBuffer() const { add "override" http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541 PS1, Line 541: some how about rewording this to "Column readers for types that require ..." http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542 PS1, Line 542: inline bool DictionaryReferencesBufferInline() const { nit: single line http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945 PS1, Line 945: tmp_buffer better name http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947 PS1, Line 947: int uncompressed_size = decompressor_.get() != nullptr Move this to L954 http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951 PS1, Line 951: if (decompressor_.get() == nullptr && !DictionaryReferencesBuffer()) { Can you add a brief comment to explain how the 4 different cases are handled? http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@987 PS1, Line 987: if (DictionaryReferencesBuffer()) { : memcpy(dict_values, data_, data_size); : } Single line :) -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Wed, 01 Nov 2017 19:16:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction
Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8436 Change subject: IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction .. IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction During dictionary constructon, most types are copied from the parquet dictionary page, but StringValues keep pointers to it. In this case, the dictionary page must be kept and attached to the last row batch that references it. In case of other types, it is safe do delete the dictionary page after the construction of the dictionary. This patch contains two optimizations: - dictionary pages are deleted as soon as possible for non string types - in non-compressed and non-string case, an unnecessary copy is avoided Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 --- M be/src/exec/parquet-column-readers.cc M be/src/exec/parquet-column-readers.h 2 files changed, 42 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/1 -- To view, visit http://gerrit.cloudera.org:8080/8436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004 Gerrit-Change-Number: 8436 Gerrit-PatchSet: 1 Gerrit-Owner: Csaba Ringhofer