[ https://issues.apache.org/jira/browse/IMPALA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Csaba Ringhofer updated IMPALA-14365:
-------------------------------------
    Description:
It would be nice to rethink the test set and infrastructure in Impala.
A few reasons why this is timely now:
- Python 2->3 migration of EE tests is nearly ready (IMPALA-8508)
- Beeswax protocol deprecation (IMPALA-12095) is nearly finished and tests can assume using HS2 (or hs2-http)
- test vector set generation was wrong, the fix leads to many more tests in exhaustive (IMPALA-13125)
- workload handling is confusing and was often misunderstood: IMPALA-3947
- our tests are simply slow (>5h to merge a commit)

Some testing decisions were made more than a decade ago, and priorities have also shifted since that time. Some examples:
- Impala is mainly used with Parquet files, while some rarely used file formats have a high footprint in the exhaustive test vector set (e.g. sequence files, rc files)
- HBase is rarely used through Impala while it has a large impact on dataload and tests
- Hive ACID tests have a huge footprint while there is little active development around it
- there is a lot of development around Iceberg, and compared to that the Iceberg testdata and test set seem small
- compatibility is tested with Hive while in practice reading Spark-generated data seems more common

  was:
It would be nice to rethink the test set and infrastructure in Impala.
A few reasons why this is timely now:
- Python 2->3 migration of EE tests is nearly ready (IMPALA-8508)
- Beeswax protocol deprecation (IMPALA-12095) is nearly finished and tests can assume using HS2 (or hs2-http)
- test vector set generation was wrong, the fix leads to many more tests in exhaustive (IMPALA-8508)
- workload handling is confusing and was often misunderstood: IMPALA-3947
- our tests are simply slow (>5h to merge a commit)

Some testing decisions were made more than a decade ago, and priorities have also shifted since that time.
Some examples:
- Impala is mainly used with Parquet files, while some rarely used file formats have a high footprint in the exhaustive test vector set (e.g. sequence files, rc files)
- HBase is rarely used through Impala while it has a large impact on dataload and tests
- Hive ACID tests have a huge footprint while there is little active development around it
- there is a lot of development around Iceberg, and compared to that the Iceberg testdata and test set seem small
- compatibility is tested with Hive while in practice reading Spark-generated data seems more common

> Test framework cleanup 2025
> ---------------------------
>
>                 Key: IMPALA-14365
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14365
>             Project: IMPALA
>          Issue Type: Epic
>          Components: Infrastructure, Test
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> It would be nice to rethink the test set and infrastructure in Impala.
> A few reasons why this is timely now:
> - Python 2->3 migration of EE tests is nearly ready (IMPALA-8508)
> - Beeswax protocol deprecation (IMPALA-12095) is nearly finished and tests
> can assume using HS2 (or hs2-http)
> - test vector set generation was wrong, the fix leads to many more tests in
> exhaustive (IMPALA-13125)
> - workload handling is confusing and was often misunderstood: IMPALA-3947
> - our tests are simply slow (>5h to merge a commit)
> Some testing decisions were made more than a decade ago, and priorities have
> also shifted since that time. Some examples:
> - Impala is mainly used with Parquet files, while some rarely used file
> formats have a high footprint in the exhaustive test vector set (e.g.
> sequence files, rc files)
> - HBase is rarely used through Impala while it has a large impact on dataload
> and tests
> - Hive ACID tests have a huge footprint while there is little active
> development around it
> - there is a lot of development around Iceberg, and compared to that the
> Iceberg testdata and test set seem small
> - compatibility is tested with Hive while in practice reading Spark-generated
> data seems more common