[ 
https://issues.apache.org/jira/browse/IMPALA-14365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Ringhofer updated IMPALA-14365:
-------------------------------------
    Description: 
It would be nice to rethink the test set and infrastructure in Impala.
A few reasons why this is timely now:
- Python 2->3 migration of EE tests is nearly ready (IMPALA-8508)
- Beeswax protocol deprecation (IMPALA-12095) is nearly finished, so tests can 
assume HS2 (or hs2-http)
- test vector set generation was wrong, and the fix leads to many more tests in 
exhaustive runs (IMPALA-13125)
- workload handling is confusing and was often misunderstood: IMPALA-3947
- our tests are simply slow (>5h to merge a commit)

Some testing decisions were made more than a decade ago, and priorities have 
also shifted since that time. Some examples:
- Impala is mainly used with Parquet files, while some rarely used file formats 
have a high footprint in the exhaustive test vector set (e.g. sequence files, 
rc files)
- HBase is rarely used through Impala while it has a large impact on dataload 
and tests
- Hive ACID tests have a huge footprint while there is little active 
development around it
- there is a lot of development around Iceberg, and compared to that the Iceberg 
testdata and test set seem small
- compatibility is tested with Hive while in practice reading Spark-generated 
data seems more common


  was:
It would be nice to rethink the test set and infrastructure in Impala.
A few reasons why this is timely now:
- Python 2->3 migration of EE tests is nearly ready (IMPALA-8508)
- Beeswax protocol deprecation (IMPALA-12095) is nearly finished, so tests can 
assume HS2 (or hs2-http)
- test vector set generation was wrong, and the fix leads to many more tests in 
exhaustive runs (IMPALA-8508)
- workload handling is confusing and was often misunderstood: IMPALA-3947
- our tests are simply slow (>5h to merge a commit)

Some testing decisions were made more than a decade ago, and priorities have 
also shifted since that time. Some examples:
- Impala is mainly used with Parquet files, while some rarely used file formats 
have a high footprint in the exhaustive test vector set (e.g. sequence files, 
rc files)
- HBase is rarely used through Impala while it has a large impact on dataload 
and tests
- Hive ACID tests have a huge footprint while there is little active 
development around it
- there is a lot of development around Iceberg, and compared to that the Iceberg 
testdata and test set seem small
- compatibility is tested with Hive while in practice reading Spark-generated 
data seems more common



> Test framework cleanup 2025
> ---------------------------
>
>                 Key: IMPALA-14365
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14365
>             Project: IMPALA
>          Issue Type: Epic
>          Components: Infrastructure, Test
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> It would be nice to rethink the test set and infrastructure in Impala.
> A few reasons why this is timely now:
> - Python 2->3 migration of EE tests is nearly ready (IMPALA-8508)
> - Beeswax protocol deprecation (IMPALA-12095) is nearly finished, so tests 
> can assume HS2 (or hs2-http)
> - test vector set generation was wrong, and the fix leads to many more tests 
> in exhaustive runs (IMPALA-13125)
> - workload handling is confusing and was often misunderstood: IMPALA-3947
> - our tests are simply slow (>5h to merge a commit)
>
> Some testing decisions were made more than a decade ago, and priorities have 
> also shifted since that time. Some examples:
> - Impala is mainly used with Parquet files, while some rarely used file 
> formats have a high footprint in the exhaustive test vector set (e.g. 
> sequence files, rc files)
> - HBase is rarely used through Impala while it has a large impact on dataload 
> and tests
> - Hive ACID tests have a huge footprint while there is little active 
> development around it
> - there is a lot of development around Iceberg, and compared to that the 
> Iceberg testdata and test set seem small
> - compatibility is tested with Hive while in practice reading Spark-generated 
> data seems more common



