Kim Jin Chul has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8747 )

Change subject: IMPALA-5993: Fix the file offset in value parsing error
......................................................................


Patch Set 2: Code-Review-1

(6 comments)

Preparing a testcase is still ongoing locally.

http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h@390
PS1, Line 390:   ///  - error_in_row is an out bool.  It is set to true if any 
field had parse errors
> Can you add comments for the new parameters here?
Done


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h@405
PS1, Line 405:   /// TODO: revisit this
> nit: the line wrapping looks odd, can you wrap at 90 chars?
Yes, the line 406 cannot be appended to the line 405 due to the limitation. Let 
me change it like below:

uint8_t* error_in_row, int64_t* error_offset, int64_t* curr_offset)<LF>
WARN_UNUSED_RESULT;


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@251
PS1, Line 251:     error_offsets[i] = *curr_offset;
> isn't fields[i].start the same as curr_offset?
fields[i].start might be used to calculate offset, but fields[i].start and 
curr_offset are not completely same. I think we don't have a start pointer of a 
file. We may need a ptr variable to keep the start of a file.
Do you think the following is better than this?

int64_t* start_of_file) {
...
error_offsets[i] = fields[i].start - start_of_file;

This is can be applied on this file, but it cannot be applied to sequence 
scanner impl. because sequence scanner reads a record(see line 313) and the 
calculation of pointer seems to be invalid:
https://gerrit.cloudera.org/#/c/8747/1/be/src/exec/hdfs-sequence-scanner.cc


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@668
PS1, Line 668: seError() {
> I am confused because this function still exists unchanged. If your code do
My change uses LogColumnParseError instead of LogRowParseError, but 
LogRowParseError is still necessary. I think it cannot be replaced by 
LogColumnParseError. Please see
https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-text-scanner.cc#L806


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@674
PS1, Line 674: Error(const int line,
> I agree that it is much easier to understand. One nit: we should print out
Done. Yes, I agree "file offset" is more clear.


http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@683
PS1, Line 683: char* data, int len) {
             :   if (state_->LogHasSpace() || state_->abort_on_error()) {
             :     stringstream ss;
> I don't think this change adds a big improvement, but I'm also OK with keep
Done



--
To view, visit http://gerrit.cloudera.org:8080/8747
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f
Gerrit-Change-Number: 8747
Gerrit-PatchSet: 2
Gerrit-Owner: Kim Jin Chul <jinc...@gmail.com>
Gerrit-Reviewer: Kim Jin Chul <jinc...@gmail.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Dec 2017 00:18:01 +0000
Gerrit-HasComments: Yes

Reply via email to