[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-13 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@934
PS4, Line 934: null
> nullptr
Done


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@942
PS4, Line 942:   // 3. If the column type is not string, and the dictionary 
page is not compressed,
> I'm not 100% confident that we have test coverage for this case - we have s
I have checked, and this branch is executed if exploration strategy > core.


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@944
PS4, Line 944:   ScopedBuffer 
uncompressed_buffer(parent_->dictionary_pool_->mem_tracker());
> nit: unnecessary blank line
Done


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@964
PS4, Line 964:
> nit: != nullptr (we prefer explicit checks against null to make it visually
Done



--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 14 Nov 2017 00:14:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-13 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#5).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 33 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/5
--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..


Patch Set 4:

(4 comments)

Looks close.

http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@934
PS4, Line 934: NULL
nullptr


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@942
PS4, Line 942:   // 3. If the column type is not string, and the dictionary 
page is not compressed,
I'm not 100% confident that we have test coverage for this case - we have 
somewhat limited testing of uncompressed parquet (the parquet/none tests are 
actually snappy-compressed since that's the default).

I think the main coverage we have is tests that insert into parquet and read it 
back. I took a look and it seems like tests/query_test/test_insert_parquet.py 
should exercise this code path, but could you confirm? E.g. add a log message 
on each branch and make sure that they appear in the logs.


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@944
PS4, Line 944:
nit: unnecessary blank line


http://gerrit.cloudera.org:8080/#/c/8436/4/be/src/exec/parquet-column-readers.cc@964
PS4, Line 964: )
nit: != nullptr (we prefer explicit checks against null to make it visually 
obvious that it's a null check).



--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 10 Nov 2017 00:39:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-03 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376
PS1, Line 376:   virtual bool DictionaryReferencesBuffer() const { return 
false; }
> Add comment. Make pure if possible. Alternatively, you could do something s
I have removed this and the other similar functions, and used directly 
slot_desc_->type().IsVarLenStringType() instead.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272
PS1, Line 272:   virtual bool DictionaryReferencesBuffer() const {
> add "override"
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541
PS1, Line 541: some
> how about rewording this to "Column readers for types that require ..."
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542
PS1, Line 542:   inline bool DictionaryReferencesBufferInline() const {
> nit: single line
This function was removed.


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945
PS1, Line 945: tmp_buffer
> better name
Done


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947
PS1, Line 947:   int uncompressed_size = decompressor_.get() != nullptr
> Move this to L954
Done


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951
PS1, Line 951:   if (decompressor_.get() == nullptr && 
!DictionaryReferencesBuffer()) {
> Can you add a brief comment to explain how the 4 different cases are handle
Done



--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Fri, 03 Nov 2017 22:39:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-03 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#4).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 35 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/4
--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-03 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#3).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
1 file changed, 36 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/3
--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-03 Thread Csaba Ringhofer (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8436

to look at the new patch set (#2).

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
2 files changed, 38 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/2
--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-02 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@959
PS1, Line 959:   if (tmp_buffer.TryAllocate(uncompressed_size)) {
It turned out that this can violate a dcheck, if uncompressed_size  is 0, which 
seems valid if every value of a column in a partition is NULL. This is the case 
in a partition of functional_parquet.alltypesagg/tinyint_col, which lead to a 
crash in one of the tests. I think that there should be a table/column with all 
NULL values by design to test this case explicitly.



--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Fri, 03 Nov 2017 00:44:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-11-01 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8436 )

Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.h@376
PS1, Line 376:   virtual bool DictionaryReferencesBuffer() const { return 
false; }
Add comment. Make pure if possible. Alternatively, you could do something 
similar to PageContainsTupleData().


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@272
PS1, Line 272:   virtual bool DictionaryReferencesBuffer() const {
add "override"


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@541
PS1, Line 541: some
how about rewording this to "Column readers for types that require ..."


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@542
PS1, Line 542:   inline bool DictionaryReferencesBufferInline() const {
nit: single line


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@945
PS1, Line 945: tmp_buffer
better name


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@947
PS1, Line 947:   int uncompressed_size = decompressor_.get() != nullptr
Move this to L954


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@951
PS1, Line 951:   if (decompressor_.get() == nullptr && 
!DictionaryReferencesBuffer()) {
Can you add a brief comment to explain how the 4 different cases are handled?


http://gerrit.cloudera.org:8080/#/c/8436/1/be/src/exec/parquet-column-readers.cc@987
PS1, Line 987: if (DictionaryReferencesBuffer()) {
 :   memcpy(dict_values, data_, data_size);
 : }
Single line :)



--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 01 Nov 2017 19:16:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

2017-10-31 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8436


Change subject: IMPALA-6054: Parquet dictionary pages should be freed on 
dictionary construction
..

IMPALA-6054: Parquet dictionary pages should be freed on dictionary construction

During dictionary constructon, most types are copied from the parquet
dictionary page, but StringValues keep pointers to it. In this case,
the dictionary page must be kept and attached to the last row batch
that references it. In case of other types, it is safe do delete
the dictionary page after the construction of the dictionary.

This patch contains two optimizations:
- dictionary pages are deleted as soon as possible for non string types
- in non-compressed and non-string case, an unnecessary copy is avoided

Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
2 files changed, 42 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8436/1
--
To view, visit http://gerrit.cloudera.org:8080/8436
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4d9d5f4da1028d961155dafdac0028a1c3641004
Gerrit-Change-Number: 8436
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer