Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 )
Change subject: IMPALA-9042: Milestone 1: properly scan files that has full ACID schema ...................................................................... Patch Set 8: (6 comments) Thanks for adding so many tests! Left some comments for the tests. The changes for path resolution looks good to me. Haven't gone through the changes related to data load. I'll look into them tomorrow if still needs. http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15395/8//COMMIT_MSG@7 PS8, Line 7: IMPALA-9042 Change this to IMPALA-9484? http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@194 PS8, Line 194: "hive.acid.version" nit: make this a constant? http://gerrit.cloudera.org:8080/#/c/15395/8/be/src/exec/hdfs-orc-scanner.cc@197 PS8, Line 197: but file " : "is not nit: explain that it doesn't have metadata "hive.acid.version" = "2"? http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py File tests/query_test/test_nested_types.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_nested_types.py@213 PS8, Line 213: base_table = "functional_orc_def.complextypestbl_non_transactional" Do we have test coverage on reading nested types from full-ACID partitioned table? http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners.py@207 PS8, Line 207: functional_orc_def I think we should get the db name by "functional" + vector.get_value('table_format').db_suffix() in case that the compression is not deflate, but snappy or others in exhaustive test. http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/15395/8/tests/query_test/test_scanners_fuzz.py@181 PS8, Line 181: self.run_stmt_in_hive("insert into %s.%s select * from %s.%s" % (fuzz_db, : fuzz_table, src_db, src_table)) I'm not sure if this copies the data of complextypestbl correctly since I previously encountered a name resolution issue for struct fields: https://issues.apache.org/jira/browse/IMPALA-9410?focusedCommentId=17060613&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17060613 Could you verify it? If the issue still exists, we can copy data from the parquet table. -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com> Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Tue, 31 Mar 2020 10:10:37 +0000 Gerrit-HasComments: Yes