--- title: "Week 3" date: 2023-06-15 --- ### Model training
Basic [model training]( https://github.com/baolef/libreoffice-ci/blob/main/train.py) pipeline is completed with [testselect]( https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py) model. Further optimization is needed to reduce memory and time cost, together with performance. Currently, [testselect]( https://github.com/baolef/libreoffice-ci/blob/main/models/testselect.py) is trained on a subset of size 16384 (containing training and testing set) of the full dataset of size 122019 due to memory cost, and it has reached a failure recall of 91.4% and saving 90% of unit test computational cost. Its detailed confusion matrix is shown below: | | Fail (Predicted) | Pass (Predicted) | |---------------|------------------|------------------| | Fail (Actual) | 480 | 45 | | Pass (Actual) | 556910 | 5045893 |