Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
......................................................................


Patch Set 5:

(39 comments)

Will finish buffered-tuple-stream-v2.cc after you reply to the pin count 
management question.

http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/buffered-tuple-stream-v2.cc
File be/src/runtime/buffered-tuple-stream-v2.cc:

PS5, Line 181: write_page_->is_pinned()
when do we have an unpinned write page? what does that mean?
oh, i guess we can get into that state if we return at 190? any way to simplify 
these possible states?


Line 189:   *got_page = 
buffer_pool_client_->reservation()->IncreaseReservationToFit(page_len);
shouldn't this come with IMPALA-3208? or do we need this already?


Line 198:   write_end_ptr_ = new_page.data() + page_len;
would be slightly clearer to compute this based on write_page_


Line 234:     DCHECK(&*read_page_ != write_page_);
here and above: DCHECK_NE/EQ


Line 246:       bytes_in_mem_ -= read_page_->len();
UnpinPage()?


Line 256:   if (!read_page_->is_pinned()) {
is this check sufficient? couldn't the page by pinned due to it also being the 
write page. in which case shouldn't we increment the pin count? i.e. shouldn't 
this check by !is_pinned_?

in fact, is using page->is_pinned() ever the right thing given that we don't 
know whether it is pinned as the read or write page?

oh, i guess we're careful to handle the read_page==write_page case in all 
pinning/unpinning places? i.e the stream only takes one pin count per page even 
if used as both read and write page, is that right? is this explained anywhere? 

(wasn't the plan to just take a pin for each iterator? does that not work?)


Line 300:     bytes_in_mem_ += pages_.front().len();
maybe factor this into PinPage() to match UnpinPage()


PS5, Line 324: is_pinned
i guess this use is probably okay


http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

Line 65: //
///


PS5, Line 66: In the pinned mode all pages are
            : /// pinned and the data is fully in memory.
seems redundant with the next sentence


PS5, Line 74: the
to


PS5, Line 74: PrepareForWrite
shouldn't that be PrepareForRead()?


Line 75: /// pass over the stream.
this should go away soon after switching to BufferPool, right? i.e. buffers can 
be attached to the row-batches rather than using needs-deep-copy mechanism.
if so, how about a TODO (and reference IMPALA-4179)?


Line 121: ///  <---12b---> <---12b---> <-----5b---->
i guess these examples all assume non-nullable?


Line 127: ///   caller does not need to copy if it is streaming.
are these listed in order of precedence? i.e. if a stream is both delete on 
read and pinned, the page is deleted, right?


Line 139: ///   or free up other memory before retrying.
nit:indentation of some of these paragraphs is inconsistent


Line 152: /// layout described above.
maybe a TODO/JIRA for what we had discussed with unifying AddRow() and 
AllocateRow() by evaluating expression?


Line 199:   ///    Returns false and sets 'status' to an error.
nit: indentations of these bullets are inconsistent


Line 262:   /// rows will remain valid until the stream is unpinned, destroyed, 
etc.
let's also put TODO IMPALA-4179 to make it easier to keep straight of where 
we're headed.


Line 267:   /// stream remains pinned.
does this interface require the stream to be pinned? if so, let's be explicit 
about that. if not, why not?


PS5, Line 370: offset
this isn't really the offset.


Line 374:   uint8_t* read_end_ptr_;
given we store number of rows in the page, what is this used for? oh, i see 
this is the physical end, but only used in dchecks. how about removing this and 
just use data() + len() in those dchecks?


Line 380:   uint8_t* write_end_ptr_;
same, and this one looks dead anyway.


Line 382:   /// Number of rows returned to the caller from GetNext().
... since PrepareForRead() was called?
i.e. this doesn't accumulate over passes, right?


Line 386:   int read_page_idx_;
this looks dead


Line 393:   int num_pinned_;
this doesn't looked used either, except for DebugString() which iterates the 
list anyway, so how about removing. or do the exec nodes really need to know 
that?


Line 407:   /// to AddRow() must occur before calling PrepareForRead() and 
subsequent calls to
or AllocateRow()


PS5, Line 414:   
nit extra space


PS5, Line 419: This changes the
             :   /// memory management of the stream
this sentence isn't very helpful


PS5, Line 454: vailable
available?
but does this try to increase reservation? etc


PS5, Line 463: blocks
is this talking about blocking on disk in order to repin a page? or something 
else?


Line 517:   int CalcNumPinned();
won't need this if we get rid of num_pinned_


http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/buffered-tuple-stream-v2.inline.h
File be/src/runtime/buffered-tuple-stream-v2.inline.h:

PS5, Line 29: null_indicator_bytes_per_row
NullIndicatorBytesPerRow


http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

Line 242: }
when does this get destructed?


http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

PS5, Line 104: buffer_reservation
this could use an explanation


http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

PS5, Line 153: the
that?


http://gerrit.cloudera.org:8080/#/c/5811/5/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

PS5, Line 226: freing
freeing


PS5, Line 227: deleted when freeing resources
garbled


Line 232:   /// consistent between buffers and I/O buffers.
in addition to making it consistent, i think we should also revisit this mark 
flush resources transfer and the ownership model once we make the switch. seems 
like we should be able to simplify it further.


-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to