Zihao Ye has posted comments on this change. ( http://gerrit.cloudera.org:8080/21039 )
Change subject: IMPALA-12786: Optimize count(*) for JSON scans ...................................................................... Patch Set 10: (5 comments) Thanks for the review! http://gerrit.cloudera.org:8080/#/c/21039/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21039/10//COMMIT_MSG@53 PS10, Line 53: +--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+--------+-----------+ > Nice improvement! Do you also get chance to test on nested data set, e.g. t I haven't tested on nested datasets yet, mainly because there isn't a sufficiently large JSON-formatted nested dataset available. The load_nested.py file currently does not support generating JSON format. Do we have any other available scripts that can generate it? http://gerrit.cloudera.org:8080/#/c/21039/10/be/src/exec/json/json-parser-test.cc File be/src/exec/json/json-parser-test.cc: http://gerrit.cloudera.org:8080/#/c/21039/10/be/src/exec/json/json-parser-test.cc@103 PS10, Line 103: } : else { > nit: please put "else" to the line of "}" Done http://gerrit.cloudera.org:8080/#/c/21039/10/be/src/exec/json/json-parser-test.cc@193 PS10, Line 193: TestSkip(R"({"a":null, "b":[1,true,false]})", TYPE_OBJECT); > Can we add more nested cases? E.g. Done, and that will fail due to the nested object missing key names. http://gerrit.cloudera.org:8080/#/c/21039/10/be/src/exec/json/json-parser-test.cc@247 PS10, Line 247: EXPECT_OK(js.Scan(max_rows, &num_rows)); > Can we also check row count from this is the same as the final 'row_count'? Did you referring to checking whether the row count results obtained from Scan() and Count() are consistent? If so, the EXPECT_EQ(row_count, js.row_count()) below has already checked this. http://gerrit.cloudera.org:8080/#/c/21039/10/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/21039/10/be/src/service/query-options.cc@1259 PS10, Line 1259: query_options->__set_enable_tuple_cache(enable_tuple_cache); > missing "break" after this Done -- To view, visit http://gerrit.cloudera.org:8080/21039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I97ff097661c3c577aeafeeb1518408ce7a8a255e Gerrit-Change-Number: 21039 Gerrit-PatchSet: 10 Gerrit-Owner: Zihao Ye <eyiz...@163.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com> Gerrit-Reviewer: Zihao Ye <eyiz...@163.com> Gerrit-Comment-Date: Tue, 14 May 2024 08:00:06 +0000 Gerrit-HasComments: Yes