Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )
Change subject: IMPALA-9741: Support querying Iceberg table by impala ...................................................................... Patch Set 13: (10 comments) Thanks for applying the changes and for the tests. Unfortunately last time the gerrit "New UI" hid some files from me... http://gerrit.cloudera.org:8080/#/c/16143/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16143/13//COMMIT_MSG@38 PS13, Line 38: custom cluster test test_iceberg.py The tests are added to test_scanners.py http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/data/README@503 PS13, Line 503: Supported query icebreg table by impala Please update this to reflect the current title http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/data/README@504 PS13, Line 504: Apache Iceberg is an open table format for huge analytic datasets. : Iceberg avoids unpleasant surprises. Schema evolution works and won’t inadvertently : un-delete data. Users don’t need to know about partitioning to get fast queries. : And Iceberg was designed to solve correctness problems in eventually-consistent cloud : object stores. Please provide a description about the tables in the iceberg directory, not about Iceberg itself. Also, please mention how the data was generated, e.g. what tool and what version. http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/datasets/functional/functional_schema_template.sql@2868 PS13, Line 2868: STORED AS ICEBERG Don't you need to provide a partition spec? Also, please add a SHOW PARTITIONS test for it in one of the test files. http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test File testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test: http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test@3 PS13, Line 3: Supported query icebreg table by impala Please update this http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test@42 PS13, Line 42: 14 You could add a RUNTIME_PROFILE section to check that partition pruning worked. E.g. https://github.com/apache/impala/blob/76e4a17fb379bb232618dccb4ad3504dbe5c945c/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test#L29 http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test@77 PS13, Line 77: SELECT count(*) from iceberg_non_partitioned Please add non-count queries as well. After you are convinced that the queries produce the correct results, you can use the following command to re-generate this file: bin/impala-py.test --update_results tests/query_test/test_scanners.py::TestIceberg::test_iceberg You'll find the generated .test file in 'logs/ee_tests/' Alternatively, you can set these in the Impala shell: set write_delimited=true; set delimiter=,; Then Impala shell will produce an output that can be copy-pasted here. http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test@83 PS13, Line 83: ==== Please add SHOW FILES for each table. Also, you could add an iceberg-negative.test that'd check we get proper error messages for features not supported for Iceberg. E.g.: https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/acid-negative.test http://gerrit.cloudera.org:8080/#/c/16143/13/tests/custom_cluster/test_iceberg.py File tests/custom_cluster/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/16143/13/tests/custom_cluster/test_iceberg.py@23 PS13, Line 23: class TestCreatingIcebergTable(CustomClusterTestSuite): This test could be put into tests/query_test, instead of tests/custom_cluster. And it could just inherit from ImpalaTestSuite. This way we wouldn't need to restart the Impala cluster whenever this test is being run. http://gerrit.cloudera.org:8080/#/c/16143/13/tests/custom_cluster/test_iceberg.py@30 PS13, Line 30: @pytest.mark.execute_serially I think there's no need for this annotation. Since this test is using a unique database it can be run in parallel with other tests. -- To view, visit http://gerrit.cloudera.org:8080/16143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006 Gerrit-Change-Number: 16143 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Sun, 19 Jul 2020 11:34:42 +0000 Gerrit-HasComments: Yes
