[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream

2017-07-28 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has posted comments on this change.

Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS1, Line 335: SkipStreamIfEosr();
> Just a passing comment - if this DCHECK fails, all it will tell us via the 
I uploaded another patch but didn't publish it. I've moved the checks to a 
single function and I am now checking both conditions.


http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

PS1, Line 332: 
> consider case where current_idx_ < 0 as well?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream

2017-07-28 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has uploaded a new patch set (#3).

Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream
..

Build a ConcatenatedStreams wrapper for ScannerContext::Stream

The ConcatenatedStreams class keeps track of a vector of streams which
can be utilized by the BaseScalarColumnReader to read transparently
from a list of streams. The ConcatenatedStreams would hide the
different ScanRanges from the caller and provide the abstraction of
a single stream. The class uses a list of streams and a index to point
to the current active stream. The class moves to the next stream once
the current stream reaches its end of stream.

Testing: The existing test_scanners.py test was used to validate the
correctness of the ConcatenatedStreams.

Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
4 files changed, 111 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7513/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Pooja Nilangekar 


[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream

2017-07-28 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has uploaded a new patch set (#2).

Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream
..

Build a ConcatenatedStreams wrapper for ScannerContext::Stream

The ConcatenatedStreams class keeps track of a vector of streams which
can be utilized by the BaseScalarColumnReader to read transparently
from a list of streams. The ConcatenatedStreams would hide the
different ScanRanges from the caller and provide the abstraction of
a single stream. The class uses a list of streams and a index to point
to the current active stream. The class moves to the next stream once
the current stream reaches its end of stream.

Testing: The existing test_scanners.py test was used to validate the
correctness of the ConcatenatedStreams.

Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
4 files changed, 110 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7513/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Pooja Nilangekar 


[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream

2017-07-28 Thread Henry Robinson (Code Review)
Henry Robinson has posted comments on this change.

Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS1, Line 335: DCHECK(ValidateCurrentIndex());
Just a passing comment - if this DCHECK fails, all it will tell us via the logs 
is that ValidateCurrentIndex() was false, because the DCHECK will just print 
the condition that failed. 

This is useful for debugging, but not as useful as knowing *how* it failed (off 
by one error? was the idx badly initialized?). So my suggestion is to have 
ValidateCurrentIndex() look more like this:

   void ValidateCurrentIndex() {
 DCHECK_LE(0, current_idx_);
 DCHECK_GT(streams_.size(), current_idx_);
   }

The compiler will figure out in release mode that the method call does nothing, 
and should be able to optimize it out.


http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

PS1, Line 332: return current_idx_
consider case where current_idx_ < 0 as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/7513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream

2017-07-27 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has posted comments on this change.

Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream
..


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/7513/1//COMMIT_MSG
Commit Message:

Line 16: 
> It's best to document the TODO in the ConcatenatedStreams class itself, sin
Done


http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS1, Line 330:   SkipStreamIfEosr();
> Is there a reason why you cannot DCHECK here, too?
Done


PS1, Line 346:   uint8_t** buffer, int64_t* out_len, Status* status, bool peek) 
{
 :   SkipStreamIfEosr();
 :   r
> This block looks like a candidate for a method that increments current_idx_
Done


http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

PS1, Line 219: boundary_buffer_
> This buffer looks like a good candidate in case you want to stitch together
Should I look into this in a separate change?


PS1, Line 273: Concatenates a list of streams by reading them sequentia
> "Concatenates a list of streams by reading them sequentially." ?
Done


PS1, Line 274: s a wrapper around the functions which are
 :   /// utilized by BaseScalarColumnReader.
> This is a detail of how the class can be used. Try to describe only what it
Done


Line 287: 
> pass by reference
The scope of the vector is causing issues for call by reference.


Line 290: 
> I don't think you need to explicitly define it as empty, since that's the d
Done


PS1, Line 295: ed_len byt
> missing word?
Done


PS1, Line 299: 
> I think you can just say "the current stream", here and elsewhere.
Current stream could be misleading since incase the current stream has hit 
eosr, the current index is advanced to ensure that the bytes are read from a 
following stream which has not hit eosr.


PS1, Line 300: 
> Shouldn't it say "the last stream" since you're concatenating them in order
The current index would reach the last stream only if every stream before it 
has hit eosr. So I was thinking 'every stream' makes it clear.


PS1, Line 313: whic
> from?
Done


Line 317: int current_idx_;
> If you already know all the streams when creating the class, then you could
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream

2017-07-27 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has uploaded a new patch set (#2).

Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream
..

Build a ConcatenatedStreams wrapper for ScannerContext::Stream

The ConcatenatedStreams class keeps track of a vector of streams which
can be utilized by the BaseScalarColumnReader to read transparently
from a list of streams. The ConcatenatedStreams would hide the
different ScanRanges from the caller and provide the abstraction of
a single stream. The class uses a list of streams and a index to point
to the current active stream. The class moves to the next stream once
the current stream reaches its end of stream.

Testing: The existing test_scanners.py test was used to validate the
correctness of the ConcatenatedStreams.

Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
4 files changed, 110 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7513/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream

2017-07-26 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream
..


Patch Set 1:

(13 comments)

Looks good to me overall, please see my comments.

http://gerrit.cloudera.org:8080/#/c/7513/1//COMMIT_MSG
Commit Message:

Line 16: TODO: Currently, the ConcatenatedStreams don't support reads which span
It's best to document the TODO in the ConcatenatedStreams class itself, since 
that's where users of the class would look for limitations.


http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

PS1, Line 330:   if (!ValidateCurrentIndex()) return -1;
Is there a reason why you cannot DCHECK here, too?


PS1, Line 346: while (streams_[current_idx_]->eosr()) {
 : if (!GetNextStream()) break;
 :   }
This block looks like a candidate for a method that increments current_idx_ 
until it the current stream is not at eosr or it's at the last one. Maybe 
SkipStreamIfEosr()? There's probably a better name. :)


http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

PS1, Line 219: boundary_buffer_
This buffer looks like a good candidate in case you want to stitch together 
reads over stream boundaries at some point, using the buffer of either the 
current or the next stream.


PS1, Line 273: Encapsulates a list of concatenated streams which can be
"Concatenates a list of streams by reading them sequentially." ?


PS1, Line 274: For columns where none of the pages can be
 :   /// skipped, the list contains only one stream.
This is a detail of how the class can be used. Try to describe only what it 
does, but mention as a limitation / TODO that the caller must make sure to not 
read across page boundaries.


Line 287: ConcatenatedStreams(const vector streams)
pass by reference


Line 290: ~ConcatenatedStreams() {}
I don't think you need to explicitly define it as empty, since that's the 
default.


PS1, Line 295:   all the 
missing word?


PS1, Line 299: the first stream
I think you can just say "the current stream", here and elsewhere.


PS1, Line 300: every stream
Shouldn't it say "the last stream" since you're concatenating them in order?


PS1, Line 313: into
from?


Line 317: void AddStream(Stream* stream) { streams_.push_back(stream); }
If you already know all the streams when creating the class, then you could 
only use the ConcatenatedStreams(const vector&) ctor. That would 
remove the possible states that the class is in (can read more, eosr).


-- 
To view, visit http://gerrit.cloudera.org:8080/7513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream

2017-07-26 Thread Pooja Nilangekar (Code Review)
Pooja Nilangekar has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7513

Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream
..

Build a ConcatenatedStreams wrapper for ScannerContext::Stream

The ConcatenatedStreams class keeps track of a vector of streams which
can be utilized by the BaseScalarColumnReader to read transparently
from a list of streams. The ConcatenatedStreams would hide the
different ScanRanges from the caller and provide the abstraction of
a single stream. The class uses a list of streams and a index to point
to the current active stream. The class moves to the next stream once
the current stream reaches its end of stream.
TODO: Currently, the ConcatenatedStreams don't support reads which span
across multiple streams. Thus the maximum bytes you can read at any
given instance is dictated by the number of bytes remaining in the
current active stream.

Testing: The existing test_scanners.py test was used to validate the
correctness of the ConcatenatedStreams.

Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
4 files changed, 125 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7513/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7513
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Lars Volker