[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream
Pooja Nilangekar has posted comments on this change. Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS1, Line 335: SkipStreamIfEosr(); > Just a passing comment - if this DCHECK fails, all it will tell us via the I uploaded another patch but didn't publish it. I've moved the checks to a single function and I am now checking both conditions. http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: PS1, Line 332: > consider case where current_idx_ < 0 as well? Done -- To view, visit http://gerrit.cloudera.org:8080/7513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Pooja Nilangekar Gerrit-HasComments: Yes
[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream
Pooja Nilangekar has uploaded a new patch set (#3). Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream .. Build a ConcatenatedStreams wrapper for ScannerContext::Stream The ConcatenatedStreams class keeps track of a vector of streams which can be utilized by the BaseScalarColumnReader to read transparently from a list of streams. The ConcatenatedStreams would hide the different ScanRanges from the caller and provide the abstraction of a single stream. The class uses a list of streams and a index to point to the current active stream. The class moves to the next stream once the current stream reaches its end of stream. Testing: The existing test_scanners.py test was used to validate the correctness of the ConcatenatedStreams. Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h 4 files changed, 111 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7513/3 -- To view, visit http://gerrit.cloudera.org:8080/7513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Pooja Nilangekar
[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream
Pooja Nilangekar has uploaded a new patch set (#2). Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream .. Build a ConcatenatedStreams wrapper for ScannerContext::Stream The ConcatenatedStreams class keeps track of a vector of streams which can be utilized by the BaseScalarColumnReader to read transparently from a list of streams. The ConcatenatedStreams would hide the different ScanRanges from the caller and provide the abstraction of a single stream. The class uses a list of streams and a index to point to the current active stream. The class moves to the next stream once the current stream reaches its end of stream. Testing: The existing test_scanners.py test was used to validate the correctness of the ConcatenatedStreams. Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h 4 files changed, 110 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7513/2 -- To view, visit http://gerrit.cloudera.org:8080/7513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Pooja Nilangekar
[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream
Henry Robinson has posted comments on this change. Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS1, Line 335: DCHECK(ValidateCurrentIndex()); Just a passing comment - if this DCHECK fails, all it will tell us via the logs is that ValidateCurrentIndex() was false, because the DCHECK will just print the condition that failed. This is useful for debugging, but not as useful as knowing *how* it failed (off by one error? was the idx badly initialized?). So my suggestion is to have ValidateCurrentIndex() look more like this: void ValidateCurrentIndex() { DCHECK_LE(0, current_idx_); DCHECK_GT(streams_.size(), current_idx_); } The compiler will figure out in release mode that the method call does nothing, and should be able to optimize it out. http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: PS1, Line 332: return current_idx_ consider case where current_idx_ < 0 as well? -- To view, visit http://gerrit.cloudera.org:8080/7513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Pooja Nilangekar Gerrit-HasComments: Yes
[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream
Pooja Nilangekar has posted comments on this change. Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream .. Patch Set 2: (13 comments) http://gerrit.cloudera.org:8080/#/c/7513/1//COMMIT_MSG Commit Message: Line 16: > It's best to document the TODO in the ConcatenatedStreams class itself, sin Done http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS1, Line 330: SkipStreamIfEosr(); > Is there a reason why you cannot DCHECK here, too? Done PS1, Line 346: uint8_t** buffer, int64_t* out_len, Status* status, bool peek) { : SkipStreamIfEosr(); : r > This block looks like a candidate for a method that increments current_idx_ Done http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: PS1, Line 219: boundary_buffer_ > This buffer looks like a good candidate in case you want to stitch together Should I look into this in a separate change? PS1, Line 273: Concatenates a list of streams by reading them sequentia > "Concatenates a list of streams by reading them sequentially." ? Done PS1, Line 274: s a wrapper around the functions which are : /// utilized by BaseScalarColumnReader. > This is a detail of how the class can be used. Try to describe only what it Done Line 287: > pass by reference The scope of the vector is causing issues for call by reference. Line 290: > I don't think you need to explicitly define it as empty, since that's the d Done PS1, Line 295: ed_len byt > missing word? Done PS1, Line 299: > I think you can just say "the current stream", here and elsewhere. Current stream could be misleading since incase the current stream has hit eosr, the current index is advanced to ensure that the bytes are read from a following stream which has not hit eosr. PS1, Line 300: > Shouldn't it say "the last stream" since you're concatenating them in order The current index would reach the last stream only if every stream before it has hit eosr. So I was thinking 'every stream' makes it clear. PS1, Line 313: whic > from? Done Line 317: int current_idx_; > If you already know all the streams when creating the class, then you could Done -- To view, visit http://gerrit.cloudera.org:8080/7513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Pooja Nilangekar Gerrit-HasComments: Yes
[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream
Pooja Nilangekar has uploaded a new patch set (#2). Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream .. Build a ConcatenatedStreams wrapper for ScannerContext::Stream The ConcatenatedStreams class keeps track of a vector of streams which can be utilized by the BaseScalarColumnReader to read transparently from a list of streams. The ConcatenatedStreams would hide the different ScanRanges from the caller and provide the abstraction of a single stream. The class uses a list of streams and a index to point to the current active stream. The class moves to the next stream once the current stream reaches its end of stream. Testing: The existing test_scanners.py test was used to validate the correctness of the ConcatenatedStreams. Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h 4 files changed, 110 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7513/2 -- To view, visit http://gerrit.cloudera.org:8080/7513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Lars Volker
[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream
Lars Volker has posted comments on this change. Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream .. Patch Set 1: (13 comments) Looks good to me overall, please see my comments. http://gerrit.cloudera.org:8080/#/c/7513/1//COMMIT_MSG Commit Message: Line 16: TODO: Currently, the ConcatenatedStreams don't support reads which span It's best to document the TODO in the ConcatenatedStreams class itself, since that's where users of the class would look for limitations. http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS1, Line 330: if (!ValidateCurrentIndex()) return -1; Is there a reason why you cannot DCHECK here, too? PS1, Line 346: while (streams_[current_idx_]->eosr()) { : if (!GetNextStream()) break; : } This block looks like a candidate for a method that increments current_idx_ until it the current stream is not at eosr or it's at the last one. Maybe SkipStreamIfEosr()? There's probably a better name. :) http://gerrit.cloudera.org:8080/#/c/7513/1/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: PS1, Line 219: boundary_buffer_ This buffer looks like a good candidate in case you want to stitch together reads over stream boundaries at some point, using the buffer of either the current or the next stream. PS1, Line 273: Encapsulates a list of concatenated streams which can be "Concatenates a list of streams by reading them sequentially." ? PS1, Line 274: For columns where none of the pages can be : /// skipped, the list contains only one stream. This is a detail of how the class can be used. Try to describe only what it does, but mention as a limitation / TODO that the caller must make sure to not read across page boundaries. Line 287: ConcatenatedStreams(const vectorstreams) pass by reference Line 290: ~ConcatenatedStreams() {} I don't think you need to explicitly define it as empty, since that's the default. PS1, Line 295: all the missing word? PS1, Line 299: the first stream I think you can just say "the current stream", here and elsewhere. PS1, Line 300: every stream Shouldn't it say "the last stream" since you're concatenating them in order? PS1, Line 313: into from? Line 317: void AddStream(Stream* stream) { streams_.push_back(stream); } If you already know all the streams when creating the class, then you could only use the ConcatenatedStreams(const vector &) ctor. That would remove the possible states that the class is in (can read more, eosr). -- To view, visit http://gerrit.cloudera.org:8080/7513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] Build a ConcatenatedStreams wrapper for ScannerContext::Stream
Pooja Nilangekar has uploaded a new change for review. http://gerrit.cloudera.org:8080/7513 Change subject: Build a ConcatenatedStreams wrapper for ScannerContext::Stream .. Build a ConcatenatedStreams wrapper for ScannerContext::Stream The ConcatenatedStreams class keeps track of a vector of streams which can be utilized by the BaseScalarColumnReader to read transparently from a list of streams. The ConcatenatedStreams would hide the different ScanRanges from the caller and provide the abstraction of a single stream. The class uses a list of streams and a index to point to the current active stream. The class moves to the next stream once the current stream reaches its end of stream. TODO: Currently, the ConcatenatedStreams don't support reads which span across multiple streams. Thus the maximum bytes you can read at any given instance is dictated by the number of bytes remaining in the current active stream. Testing: The existing test_scanners.py test was used to validate the correctness of the ConcatenatedStreams. Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/parquet-column-readers.h M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h 4 files changed, 125 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/7513/1 -- To view, visit http://gerrit.cloudera.org:8080/7513 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I026f7a005ba4d6fc412ca0884f4fa0459da9c885 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pooja NilangekarGerrit-Reviewer: Lars Volker