Zihao Ye has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21039 )

Change subject: IMPALA-12786: Optimize count(*) for JSON scans
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser-test.cc
File be/src/exec/json/json-parser-test.cc:

http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser-test.cc@207
PS8, Line 207: TestSkip(R"("abc\")", TYPE_STRING, 
kParseErrorStringMissQuotationMark);
> Add negative/positive testcase against unicode and/or encoded unicode?
Done


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h
File be/src/exec/json/json-parser.h:

http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@253
PS8, Line 253: 
https://github.com/Tencent/rapidjson/blob/master/include/rapidjson/reader.h#L539
> nit: Use permalink
Done


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@269
PS8, Line 269: inline bool Consume(char expect) {
> nit: Please add comment about what this function do.
Done


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@282
PS8, Line 282: /// The following function is used to skip a specific JSON 
value. It maintains logic
             :   /// consistent with rapidjson, consuming the stream and 
returning true upon successfully
             :   /// skipping the specified value, or returning false and 
setting the respective error
             :   /// code if an error is encountered.
> nit: Consider mentioning this RFC about valid JSON values.
Done


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.h@293
PS8, Line 293: bool SkipValue();
> rapidjson::GenericReader has ParseHex4 to handle unicode. Will that be a co
I believe this won't be a concern. As mentioned in the commit message, 
erroneous encoding does not affect our counting of JSON objects. We do not 
require the actual data of the string values, as long as it is a correctly 
formatted string value (with unescaped starting and ending double quotes) then 
we are able to correctly skip over it. Therefore, we do not need ParseHex4 to 
specifically handle Unicode, because it does not affect our recognition of the 
ending double quotes for strings.


http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.cc
File be/src/exec/json/json-parser.cc:

http://gerrit.cloudera.org:8080/#/c/21039/8/be/src/exec/json/json-parser.cc@332
PS8, Line 332: template<class Stream>
             : bool JsonSkipper<Stream>::SkipNumber() {
> rapidjson::GenericReader seems to handle NaN and Infinity. This one does no
Done, and I have created another ticket IMPALA-12957 to track the feature 
support for Inf & NaN.



--
To view, visit http://gerrit.cloudera.org:8080/21039
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I97ff097661c3c577aeafeeb1518408ce7a8a255e
Gerrit-Change-Number: 21039
Gerrit-PatchSet: 8
Gerrit-Owner: Zihao Ye <eyiz...@163.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <eyiz...@163.com>
Gerrit-Comment-Date: Fri, 29 Mar 2024 09:07:19 +0000
Gerrit-HasComments: Yes

Reply via email to