[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Abandoned Let's abandon for now to clean up old reviews -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 8 Gerrit-Owner: Jinchul Kim Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Jinchul Kim Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 8: @Csaba, thanks for your review. I have been busy recently an now I have room again. Let me provide a new patch set soon. Have a good day! -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 10 Apr 2018 10:49:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 1: (12 comments) I looked through the change. I can't say that I completely understand the design of the C++ part, so my comments for that part are simple local ones. http://gerrit.cloudera.org:8080/#/c/8747/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8747/7//COMMIT_MSG@11 PS7, Line 11: hit an error, it always prints the end of the file as the nit: hits http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h@365 PS7, Line 365: typo: calculated http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h@373 PS7, Line 373: ers to ex typo: difference http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h@373 PS7, Line 373: aths in Parquet scanner. Returns the status Do you mean "taking read_bytes as argument instead of calculating it"? http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.h@375 PS7, Line 375: Status ScannerDebugAction() WARN_UNUSED_RESULT { nit: remove extra space http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.cc@497 PS7, Line 497: // In this branch, we've just materialized slots not referenced by any conjunct. Checking LogHasSpace() is not necessary here - LogError() checks if there is space for the log anyway, so the check is only useful to avoid the runtime cost of creating the error message if it would be dropped anyway. This is important if the logic is called a lot of time, for example for millions of rows. This part will be called only once (per fragment), as the function returns on the first error. http://gerrit.cloudera.org:8080/#/c/8747/7/be/src/exec/hdfs-scanner.cc@675 PS7, Line 675: stream_->filename(), line, pos, offset); : state_->LogError(ErrorMsg(TErrorCode::GENERAL, s)); : } The Substitute should be moved inside the if block - as I mentioned in line 497, the reason to call LogHasSpace() is to avoid the runtime cost of creating the error message. http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@847 PS7, Line 847: : : : : : : : : Can you explain why this function is needed? Other scanner test classes do not seem to do similar initialization. http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@875 PS7, Line 875: This step could be skipped, if the the table was created first (which creates an HDFS directory) and the data was copied afterwards. http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@890 PS7, Line 890: Why not use the default location ( /test-warehouse/$db_name.db/$table_name )? This line could be removed in that case. Note that assuming that the databases are in /test-warehouse/ makes the tests less flexible, but other tests already rely on that, and the tests are expected to run on the Impala minicluster - many tests would fail on a different cluster, for example planner tests are sensitive to many cluster parameters. So I think that it is not a goal at the moment to make the tests runnable on other clusters. If it will be a goal, then a general solution should be created (probably in ImpalaTestSuite), not custom solutions for different tests. http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@893 PS7, Line 893: : nit: indentation and spaces around '=' In Impala, new blocks are indented with 2 spaces, while continuations of broken lines are intended with 4 spaces (most of the times...). This is a difference compared to PEP8, but makes the python code more consistent with C++ and Java code in Impala. See the python part of https://cwiki.apache.org/confluence/display/IMPALA/Impala+Style+Guide for the list differences from PEP8. http://gerrit.cloudera.org:8080/#/c/8747/7/tests/query_test/test_scanners.py@899 PS7, Line 899: I would prefer to use self.execute_query(). The result it returns has a .log member, which can be split by "\n" to get the list of warnings. Another way to do this is to create a .test file in testdata/workloads, list the warnings in the " ERRORS" section and run it with self.run_test_case(). See https://github.com/apache/impala/blob/89b29d3a89cca041c541476e25435dc0d34344c1/tests/common/test_result_verifier.py#L314 for the way it processes the logs. -- To view, visit
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8747 to look at the new patch set (#8). Change subject: IMPALA-5993: Fix the file offset in value parsing error .. IMPALA-5993: Fix the file offset in value parsing error This is to fix the file offset in value parsing error messages when scanning text files. When the text scanner hit an error, it always prints the end of the file as the offset, even if the error occurs in the middle of the file. This change also contains printing errors column-wise instead of row-wise. Testing: Add two test cases: - TestWrongFileOffset.test_invalid_int - TestWrongFileOffset.test_invalid_int_gzip Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f --- M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M testdata/data/README A testdata/data/invalid_int.csv A testdata/data/invalid_int.csv.gz M tests/query_test/test_scanners.py 8 files changed, 177 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8747/8 -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 8 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 7: Hi Lars, You may be busy. It would be great if you review this change again. Thanks! Regards, Jinchul -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 7 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 09 Feb 2018 00:06:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 7: (25 comments) Hi Lars, Sorry I thought I already replied your comments, but I didn't submit a botton when I answered them. Now the latest patch set is ready for your review. Thanks for your notification. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@368 PS6, Line 368: /// to the error. > Explain what error_file_offsets does in comment Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382 PS6, Line 382: r. Ret > Wouldn't this report an invalid value (column inside row)? Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382 PS6, Line 382: /// readers to exercise various failure paths in Parquet scanner. Returns the status > It seems confusing to me that there is now a LogColumnParseError and Report Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@391 PS6, Line 391: > Start comments with three /// like the other lines. Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@393 PS6, Line 393: tuple (e.g. fields[0] ma > No need to mention its type since it's the same as in the declaration itsel Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@408 PS6, Line 408: /// the reason that the out error parameters are typed uint8_t instead of bool. We need > This looks like it could have some performance impact. Did you do any perf Done, the computation is moved to the error reporting function. By the way, HdfsSequenceScanner::ProcessRange needs the computation because I could not find a way to apply the computation logic. See line#337 https://gerrit.cloudera.org/#/c/8747/7/be/src/exec/hdfs-sequence-scanner.cc http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@233 PS6, Line 233: uint8_t* error_fields, uint8_t* error_in_row) { > nit: wrap at 90 characters. Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@254 PS6, Line 254: return EvalConjuncts(tuple_row); > Is +1 for the delimiter? If so, can you add a brief comment? Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@578 PS6, Line 578: DCHECK_EQ(replaced, 1); > Please indent function calls 4 characters, here and elsewhere. Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@759 PS6, Line 759: > nit: line wrapping. Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@761 PS6, Line 761: > Why does this not check state_->HasLogSpace() but the method below does? Done http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@768 PS6, Line 768: > I think the message was clearer before the change. Done http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README@136 PS6, Line 136: signed_integer_logical_types > Can you think of a name that captures the nature of the problems? Most of t Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@30 PS6, Line 30: from parquet.ttypes import ConvertedType > I think elsewhere we just use "os.path.join()" when we need it, so for cons Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@841 PS6, Line 841: : @classmethod > nit: missing word? Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@842 PS6, Line 842: @classmethod : def get_workload(cls): > Rephrase to say what it should do, not what the problem was. Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@852 PS6, Line 852: self.hdfs_client = get_hdfs_client_from_conf(hdfs_conf) > If you change this to "if p.c.o...: " it will also cover the empty string. Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@860 PS6, Line 860: "invalid_int": > I think it might be cleaner if every test prepared their own test data sepa Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@862 PS6, Line 862: "dir_path": os.path.join(tmpdir.strpath, "invalid_int"), > This will break with parallel test runs. Use a tmpdir fixture for pytest in Done http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@876 PS6, Line 876: put > nit: putting Done http://gerrit.cloudera.o
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 7: Hi Jinchul. Thanks for uploading a new Patch Set. Is this ready for another round of reviews? If so, could you please mark the comments you addressed as done? Thanks, Lars -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 7 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 16 Jan 2018 17:18:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8747 to look at the new patch set (#7). Change subject: IMPALA-5993: Fix the file offset in value parsing error .. IMPALA-5993: Fix the file offset in value parsing error This is to fix the file offset in value parsing error messages when scanning text files. When the text scanner hit an error, it always prints the end of the file as the offset, even if the error occurs in the middle of the file. This change also contains printing errors column-wise instead of row-wise. Testing: Add two test cases: - TestWrongFileOffset.test_invalid_int - TestWrongFileOffset.test_invalid_int_gzip Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f --- M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M testdata/data/README A testdata/data/invalid_int.csv A testdata/data/invalid_int.csv.gz M tests/query_test/test_scanners.py 8 files changed, 177 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8747/7 -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 7 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 6: (25 comments) http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@368 PS6, Line 368: /// This is called from WriteAlignedTuples. Explain what error_file_offsets does in comment http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382 PS6, Line 382: column Wouldn't this report an invalid value (column inside row)? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@382 PS6, Line 382: /// Utility function to append an error message for an invalid column. It seems confusing to me that there is now a LogColumnParseError and ReportColumnParseError. Do we need both? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@391 PS6, Line 391: // - error_offsets is an out array. error_offsets[i] will be set to file offset Start comments with three /// like the other lines. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@393 PS6, Line 393: is an out 8 byte integer No need to mention its type since it's the same as in the declaration itself. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.h@408 PS6, Line 408: uint8_t* error_in_row, int64_t* error_file_offsets, int64_t* curr_file_offset) This looks like it could have some performance impact. Did you do any perf experiments? Have you considered not setting error_file_offsets inside WriteCompleteTuple and computing it from the field lengths during error reporting? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@233 PS6, Line 233: uint8_t* error_fields, uint8_t* error_in_row, nit: wrap at 90 characters. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@254 PS6, Line 254: *curr_offset += len + 1; Is +1 for the delimiter? If so, can you add a brief comment? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@578 PS6, Line 578: builder.CreateAdd(builder.CreateZExt(len, codegen->GetType(TYPE_BIGINT)), Please indent function calls 4 characters, here and elsewhere. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@759 PS6, Line 759: "Error parsing row: file: $0, (file offset: $1)", nit: line wrapping. http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@761 PS6, Line 761: state_->LogError(ErrorMsg(TErrorCode::GENERAL, s)); Why does this not check state_->HasLogSpace() but the method below does? http://gerrit.cloudera.org:8080/#/c/8747/6/be/src/exec/hdfs-scanner.cc@768 PS6, Line 768: at I think the message was clearer before the change. http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/8747/6/testdata/data/README@136 PS6, Line 136: problematic_rows_impala_5993 Can you think of a name that captures the nature of the problems? Most of the files in here are somewhat problematic. It could be "invalid_int.csv" (if it is an invalid int column). http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@30 PS6, Line 30: from os.path import join as join_path I think elsewhere we just use "os.path.join()" when we need it, so for consistency it would be good to do the same here. http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@841 PS6, Line 841: Tests that wrong file offset in value parsing error messages when : scanning text files. nit: missing word? http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@842 PS6, Line 842: scanning text files. When the text scanner hits an error, it always prints the end of : the file as the offset, even if the error occurs in the middle of the file. Rephrase to say what it should do, not what the problem was. http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@852 PS6, Line 852: if pytest.config.option.namenode_http_address is None: If you change this to "if p.c.o...: " it will also cover the empty string. We often return at the end of the if-branch and put the on the top level. You can move the initialization of hdfs_loc up to achieve that. http://gerrit.cloudera.org:8080/#/c/8747/6/tests/query_test/test_scanners.py@860 PS6, Line 860: def prepare(self): I think it might be cleaner if every test prepared their own test data separately. http://gerrit.cloudera.org:8080/#/c/8747/6/test
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8747 to look at the new patch set (#5). Change subject: IMPALA-5993: Fix the file offset in value parsing error .. IMPALA-5993: Fix the file offset in value parsing error This is to fix the file offset in value parsing error messages when scanning text files. When the text scanner hit an error, it always prints the end of the file as the offset, even if the error occurs in the middle of the file. This change also contains: - Print errors column-wise instead of row-wise - Adopt more user friendly message format Testing: Add two test cases: - TestWrongFileOffset.test_parsing_wrong_text - TestWrongFileOffset.test_parsing_wrong_gzip_text Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f --- M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M testdata/data/README A testdata/data/problematic_rows_impala_5993.csv A testdata/data/problematic_rows_impala_5993.csv.gz M tests/query_test/test_scanners.py 8 files changed, 300 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8747/5 -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 5 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 4: Code-Review-1 Lars, as I mentioned in the JIRA, let me provide a new patch set. -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Sun, 10 Dec 2017 12:07:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 4: Can you adjust the logging details according to the discussion in the Jira? -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Fri, 08 Dec 2017 19:29:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8747 to look at the new patch set (#4). Change subject: IMPALA-5993: Fix the file offset in value parsing error .. IMPALA-5993: Fix the file offset in value parsing error This is to fix the file offset in value parsing error messages when scanning text files. When the text scanner hit an error, it always prints the end of the file as the offset, even if the error occurs in the middle of the file. This change also contains: - Print errors column-wise instead of row-wise - Adopt more user friendly message format Testing: Add two test cases: - TestWrongFileOffset.test_parsing_wrong_text - TestWrongFileOffset.test_parsing_wrong_gzip_text Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f --- M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M testdata/data/README A testdata/data/problematic_rows_impala_5993.csv A testdata/data/problematic_rows_impala_5993.csv.gz M tests/query_test/test_scanners.py 8 files changed, 303 insertions(+), 53 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8747/4 -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 4 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@264 PS1, Line 264: // define i1 @WriteCompleteTuple(%"class.impala::HdfsScanner"* %this, : // %"class.impala::MemPool"* %pool, : // %"struct.impala::FieldLocation"* %fields, : // %"class.impala::Tuple"* %tuple, : // %"class.impala::TupleRow"* %tuple_row, : // %"class.impala::Tuple"* %template, : // i8* %error_fields, : // i8* %error_in_row, : // i64* %error_file_offsets, : // i64* %curr_file_offset) #35 { : // entry: : // %tuple_ptr = bitcast %"class.impala::Tuple"* %tuple to : // <{ %"struct.impala::StringValue", %"struct.impala::StringValue", i32, i32, i8 }>* : // call void @_ZN6impala11HdfsScanner9InitTupleEPNS_5TupleES2_.4( : // %"class.impala::HdfsScanner"* %this, %"class.impala::Tuple"* %template, : // %"class.impala::Tuple"* %tuple) : // %0 = bitcast %"class.impala::TupleRow"* %tuple_row to : // <{ %"struct.impala::StringValue", %"struct.impala::StringValue", i32, i32, i8 }>** : // %1 = getelementptr inbounds <{ %"struct.impala::StringValue", : // %"struct.impala::StringValue", i32, i32, i8 }>*, <{ %"struct.impala::StringValue", : // %"struct.impala::StringValue", i32, i32, i8 }>** %0, i32 0 : // store <{ %"struct.impala::StringValue", %"struct.impala::StringValue", i32, i32, : // i8 }>* %tuple_ptr, <{ %"struct.impala::StringValue", : // %"struct.impala::StringValue", i32, i32, i8 }>** %1 : // br label %parse : // : // parse:; preds = %entry : // %data_ptr = getelementptr inbounds %"struct.impala::FieldLocation", : // %"struct.impala::FieldLocation"* %fields, i32 0, i32 0 : // %len_ptr = getelementptr inbounds %"struct.impala::FieldLocation", : // %"struct.impala::FieldLocation"* %fields, i32 0, i32 1 : // %slot_error_ptr = getelementptr inbounds i8, i8* %error_fields, i32 0 : // %slot_error_file_offset_ptr = getelementptr inbounds i64, i64* %error_file_offsets, : // i32 0 : // %data = load i8*, i8** %data_ptr : // %len = load i32, i32* %len_ptr : // %len_lt_zero = icmp slt i32 %len, 0 : // %ones_compliment_len = xor i32 %len, -1 : // %positive_len = add i32 %ones_compliment_len, 1 : // %select_positive_len = select i1 %len_lt_zero, i32 %positive_len, i32 %len : // %2 = call i1 @WriteSlot(<{ %"struct.impala::StringValue", : // %"struct.impala::StringValue", i32, i32, i8 }>* %tuple_ptr, i8* %data, : // i32 %select_positive_len) : // %slot_parse_error = xor i1 %2, true : // %error_in_row1 = or i1 false, %slot_parse_error : // %3 = zext i1 %slot_parse_error to i8 : // store i8 %3, i8* %slot_error_ptr : // %curr_file_offset2 = load i64, i64* %curr_file_offset : // store i64 %curr_file_offset2, i64* %slot_error_file_offset_ptr : // %4 = zext i32 %select_positive_len to i64 : // %5 = add i64 %4, 1 : // %6 = add i64 %curr_file_offset2, %5 : // store i64 %6, i64* %curr_file_offset : // > Yes, please update it. We generally try to maintain these as we come across Done -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 05 Dec 2017 00:27:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8747 to look at the new patch set (#3). Change subject: IMPALA-5993: Fix the file offset in value parsing error .. IMPALA-5993: Fix the file offset in value parsing error This is to fix the file offset in value parsing error messages when scanning text files. When the text scanner hit an error, it always prints the end of the file as the offset, even if the error occurs in the middle of the file. Testing: Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f --- M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M testdata/data/README 5 files changed, 186 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8747/3 -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 3 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 2: Code-Review-1 (6 comments) Preparing a testcase is still ongoing locally. http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h@390 PS1, Line 390: /// - error_in_row is an out bool. It is set to true if any field had parse errors > Can you add comments for the new parameters here? Done http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h@405 PS1, Line 405: /// TODO: revisit this > nit: the line wrapping looks odd, can you wrap at 90 chars? Yes, the line 406 cannot be appended to the line 405 due to the limitation. Let me change it like below: uint8_t* error_in_row, int64_t* error_offset, int64_t* curr_offset) WARN_UNUSED_RESULT; http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@251 PS1, Line 251: error_offsets[i] = *curr_offset; > isn't fields[i].start the same as curr_offset? fields[i].start might be used to calculate offset, but fields[i].start and curr_offset are not completely same. I think we don't have a start pointer of a file. We may need a ptr variable to keep the start of a file. Do you think the following is better than this? int64_t* start_of_file) { ... error_offsets[i] = fields[i].start - start_of_file; This is can be applied on this file, but it cannot be applied to sequence scanner impl. because sequence scanner reads a record(see line 313) and the calculation of pointer seems to be invalid: https://gerrit.cloudera.org/#/c/8747/1/be/src/exec/hdfs-sequence-scanner.cc http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@668 PS1, Line 668: seError() { > I am confused because this function still exists unchanged. If your code do My change uses LogColumnParseError instead of LogRowParseError, but LogRowParseError is still necessary. I think it cannot be replaced by LogColumnParseError. Please see https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/exec/hdfs-text-scanner.cc#L806 http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@674 PS1, Line 674: Error(const int line, > I agree that it is much easier to understand. One nit: we should print out Done. Yes, I agree "file offset" is more clear. http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@683 PS1, Line 683: char* data, int len) { : if (state_->LogHasSpace() || state_->abort_on_error()) { : stringstream ss; > I don't think this change adds a big improvement, but I'm also OK with keep Done -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 2 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Tue, 05 Dec 2017 00:18:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8747 to look at the new patch set (#2). Change subject: IMPALA-5993: Fix the file offset in value parsing error .. IMPALA-5993: Fix the file offset in value parsing error This is to fix the file offset in value parsing error messages when scanning text files. When the text scanner hit an error, it always prints the end of the file as the offset, even if the error occurs in the middle of the file. Testing: Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f --- M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc M testdata/data/README 5 files changed, 62 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8747/2 -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 2 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@668 PS1, Line 668: stream_->file_offset() > Sorry there was my mistake in the last sentence.I meant that please let me I am confused because this function still exists unchanged. If your code does not rely on it, can you remove it? -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 1 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 04 Dec 2017 23:32:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@668 PS1, Line 668: stream_->file_offset() > I think there are two approaches to fix: Sorry there was my mistake in the last sentence.I meant that please let me know some reasons if you would prefer the first one than the second one. The fix for the first one will read stream line by line. -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 1 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 04 Dec 2017 21:20:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@668 PS1, Line 668: stream_->file_offset() > Yes, that was the main reason for opening the Jira. Are you going to fix th I think there are two approaches to fix: 1. Still rely on the function by reading stream incrementally 2. Track offset instead of using the function My proposal adopts the second one because I thought ahead of the full scanning was intended such as better performace. I implemented the second one on this patch set. If you think there is no reason reading the stream incremenally, let me try to fix the problem with the first one. Is there your preference? Please let me know some reasons if you prefer the second one than the first one. -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 1 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 04 Dec 2017 21:12:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 1: (7 comments) Thank you for working on this. Please see my inline comments, especially regarding the additional variables to track the file offset. http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h@390 PS1, Line 390: /// - error_in_row is an out bool. It is set to true if any field had parse errors Can you add comments for the new parameters here? http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.h@405 PS1, Line 405: uint8_t* error_in_row, int64_t* error_offset, nit: the line wrapping looks odd, can you wrap at 90 chars? http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@251 PS1, Line 251: error_offsets[i] = *curr_offset; isn't fields[i].start the same as curr_offset? http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@264 PS1, Line 264: // define i1 @WriteCompleteTuple(%"class.impala::HdfsScanner"* %this, : // %"class.impala::MemPool"* %pool, : // %"struct.impala::FieldLocation"* %fields, : // %"class.impala::Tuple"* %tuple, : // %"class.impala::TupleRow"* %tuple_row, : // %"class.impala::Tuple"* %template, : // i8* %error_fields, i8* %error_in_row) { : // entry: : // %tuple_ptr = bitcast %"class.impala::Tuple"* %tuple : //to <{ %"struct.impala::StringValue", i8 }>* : // %tuple_ptr1 = bitcast %"class.impala::Tuple"* %template : // to <{ %"struct.impala::StringValue", i8 }>* : // %int8_ptr = bitcast <{ %"struct.impala::StringValue", i8 }>* %tuple_ptr to i8* : // %null_bytes_ptr = getelementptr i8, i8* %int8_ptr, i32 16 : // call void @llvm.memset.p0i8.i64(i8* %null_bytes_ptr, i8 0, i64 1, i32 0, i1 false) : // %0 = bitcast %"class.impala::TupleRow"* %tuple_row : //to <{ %"struct.impala::StringValue", i8 }>** : // %1 = getelementptr <{ %"struct.impala::StringValue", i8 }>*, : // <{ %"struct.impala::StringValue", i8 }>** %0, i32 0 : // store <{ %"struct.impala::StringValue", i8 }>* %tuple_ptr, : // <{ %"struct.impala::StringValue", i8 }>** %1 : // br label %parse : // : // parse:; preds = %entry : // %data_ptr = getelementptr %"struct.impala::FieldLocation", : //%"struct.impala::FieldLocation"* %fields, i32 0, i32 0 : // %len_ptr = getelementptr %"struct.impala::FieldLocation", : // %"struct.impala::FieldLocation"* %fields, i32 0, i32 1 : // %slot_error_ptr = getelementptr i8, i8* %error_fields, i32 0 : // %data = load i8*, i8** %data_ptr : // %len = load i32, i32* %len_ptr : // %2 = call i1 @WriteSlot(<{ %"struct.impala::StringValue", i8 }>* %tuple_ptr, : // i8* %data, i32 %len) : // %slot_parse_error = xor i1 %2, true : // %error_in_row2 = or i1 false, %slot_parse_error : // %3 = zext i1 %slot_parse_error to i8 : // store i8 %3, i8* %slot_error_ptr : // %4 = call %"class.impala::ScalarExprEvaluator"* @GetConjunctCtx( : //%"class.impala::HdfsScanner"* %this, i32 0) : // %conjunct_eval = call i16 @"impala::Operators::Eq_StringVal_StringValWrapper"( : //%"class.impala::ScalarExprEvaluator"* %4, %"class.impala::TupleRow"* %tuple_row) : // %5 = ashr i16 %conjunct_eval, 8 : // %6 = trunc i16 %5 to i8 : // %val = trunc i8 %6 to i1 : // br i1 %val, label %parse3, label %eval_fail : // : // parse3: ; preds = %parse : // %7 = zext i1 %error_in_row2 to i8 : // store i8 %7, i8* %error_in_row : // ret i1 true : // : // eval_fail:; preds = %parse : // ret i1 false : // } > Should I ma
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@264 PS1, Line 264: // define i1 @WriteCompleteTuple(%"class.impala::HdfsScanner"* %this, : // %"class.impala::MemPool"* %pool, : // %"struct.impala::FieldLocation"* %fields, : // %"class.impala::Tuple"* %tuple, : // %"class.impala::TupleRow"* %tuple_row, : // %"class.impala::Tuple"* %template, : // i8* %error_fields, i8* %error_in_row) { : // entry: : // %tuple_ptr = bitcast %"class.impala::Tuple"* %tuple : //to <{ %"struct.impala::StringValue", i8 }>* : // %tuple_ptr1 = bitcast %"class.impala::Tuple"* %template : // to <{ %"struct.impala::StringValue", i8 }>* : // %int8_ptr = bitcast <{ %"struct.impala::StringValue", i8 }>* %tuple_ptr to i8* : // %null_bytes_ptr = getelementptr i8, i8* %int8_ptr, i32 16 : // call void @llvm.memset.p0i8.i64(i8* %null_bytes_ptr, i8 0, i64 1, i32 0, i1 false) : // %0 = bitcast %"class.impala::TupleRow"* %tuple_row : //to <{ %"struct.impala::StringValue", i8 }>** : // %1 = getelementptr <{ %"struct.impala::StringValue", i8 }>*, : // <{ %"struct.impala::StringValue", i8 }>** %0, i32 0 : // store <{ %"struct.impala::StringValue", i8 }>* %tuple_ptr, : // <{ %"struct.impala::StringValue", i8 }>** %1 : // br label %parse : // : // parse:; preds = %entry : // %data_ptr = getelementptr %"struct.impala::FieldLocation", : //%"struct.impala::FieldLocation"* %fields, i32 0, i32 0 : // %len_ptr = getelementptr %"struct.impala::FieldLocation", : // %"struct.impala::FieldLocation"* %fields, i32 0, i32 1 : // %slot_error_ptr = getelementptr i8, i8* %error_fields, i32 0 : // %data = load i8*, i8** %data_ptr : // %len = load i32, i32* %len_ptr : // %2 = call i1 @WriteSlot(<{ %"struct.impala::StringValue", i8 }>* %tuple_ptr, : // i8* %data, i32 %len) : // %slot_parse_error = xor i1 %2, true : // %error_in_row2 = or i1 false, %slot_parse_error : // %3 = zext i1 %slot_parse_error to i8 : // store i8 %3, i8* %slot_error_ptr : // %4 = call %"class.impala::ScalarExprEvaluator"* @GetConjunctCtx( : //%"class.impala::HdfsScanner"* %this, i32 0) : // %conjunct_eval = call i16 @"impala::Operators::Eq_StringVal_StringValWrapper"( : //%"class.impala::ScalarExprEvaluator"* %4, %"class.impala::TupleRow"* %tuple_row) : // %5 = ashr i16 %conjunct_eval, 8 : // %6 = trunc i16 %5 to i8 : // %val = trunc i8 %6 to i1 : // br i1 %val, label %parse3, label %eval_fail : // : // parse3: ; preds = %parse : // %7 = zext i1 %error_in_row2 to i8 : // store i8 %7, i8* %error_in_row : // ret i1 true : // : // eval_fail:; preds = %parse : // ret i1 false : // } Should I maintain this? When I dump IR, this comment has not been updated for a while. I would like to remove the old IR completely because it is meaningless if we spend our precious time. -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 1 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 04 Dec 2017 15:16:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has posted comments on this change. ( http://gerrit.cloudera.org:8080/8747 ) Change subject: IMPALA-5993: Fix the file offset in value parsing error .. Patch Set 1: Code-Review-1 (4 comments) As I commented, this change does not include a test which is ongoing locally. By the way, I would like to get your early review for the kernel code change. Please check my comments. Thanks. http://gerrit.cloudera.org:8080/#/c/8747/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8747/1//COMMIT_MSG@14 PS1, Line 14: Testing: A custom test will be introduced at tests/custom_cluster/test_impala_5993.py. The test contains some hdfs commands which come from tests/util/hdfs_util.py and table will be created by 'create table stored as textfile location' query. http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@668 PS1, Line 668: stream_->file_offset() The offset can point to EOF because the data is read ahead of time via ScannerContext :: Stream :: GetBuffer in impala::HdfsTextScanner::FillByteBuffer. Therefore, file_offset cannot be used to print the problematic offset. http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@674 PS1, Line 674: Error parsing row: file: $0, (line: $1, pos: $2, offset: $3) I think the refined message format is more user friendly. Let me provide another change if you think this improvement cannot be in a bug fix. http://gerrit.cloudera.org:8080/#/c/8747/1/be/src/exec/hdfs-scanner.cc@683 PS1, Line 683: "Error converting column at the index: " :<< desc->col_pos() - scan_node_->num_partition_keys() :<< " (type: " << desc->type() << ")"; I think the refined message format is more user friendly. Let me provide another change if you think this improvement cannot be in a bug fix. -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 1 Gerrit-Owner: Kim Jin Chul Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 04 Dec 2017 15:09:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5993: Fix the file offset in value parsing error
Kim Jin Chul has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8747 Change subject: IMPALA-5993: Fix the file offset in value parsing error .. IMPALA-5993: Fix the file offset in value parsing error This is to fix the file offset in value parsing error messages when scanning text files. When the text scanner hit an error, it always prints the end of the file as the offset, even if the error occurs in the middle of the file. Testing: Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f --- M be/src/exec/hdfs-scanner-ir.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-sequence-scanner.cc 4 files changed, 49 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8747/1 -- To view, visit http://gerrit.cloudera.org:8080/8747 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4a4ac3199ffe12fcce0bf792b3e6ce529b9af61f Gerrit-Change-Number: 8747 Gerrit-PatchSet: 1 Gerrit-Owner: Kim Jin Chul