Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
On Saturday, 25 June 2022 01:57:24 PDT Volker Hilsheimer wrote:
> Perhaps this is a good time to discuss whether we should move Qt Network
> into its own repository. This would make qtbase integrations less exposed
> to network failure, which - even without certificates expiring - are a fact
> of life. And qtbase integrations already suffer from plenty of flakiness.
> And that an operational issue might require patches to merge and to get
> cherry picked, which might take several attempts, each taking several
> hours, just amplifies that problem further.
>
> Conceptually, we have made that kind of change before (when taking Qt
> Positioning out of the qtlocation repo). But there are some challenges.
>
> One challenge is that several of our Qt Core tests are using networking
> features (tests outside of tests/auto/network that include
> network-settings.h: tst_qdir, tst_qdiriterator, tst_qfile, tst_qfileinfo,
> tst_qiodevice, tst_qtextstream, tst_qfiledialog2). Without having looked
> into the details, I’d assume that we might not need an actual server to
> test many of those codepaths (or that those tests can be moved into a
> qtnetwork repo, ie. QTextStream::stillOpenWhenAtEnd doesn’t seem to test
> QTextStream, which never closes a QIODevice).

Personally, I'd prefer if those Core tests didn't use Networking. The majority of them aren't actually using QtNetwork, they are the Windows portion that deals with the SMB server provided by the Network Test Server. So the issue isn't that of QtNetwork, but of the NTS and would remain anyway. That would leave a few tests like QTextStream that use QTcpSocket for some particular QIODevice sequential condition, but which could be replaced with an identical condition with a different class, like QProcess.

But what's the gain? This looks like a lot of effort to me, particularly if we don't move the UNC path tests in the file classes.

Not looking scientifically at it, but from memory, the network test server and the networking tests haven't been the majority of spurious failures in the CI. They're a big contributor, but not the majority. From a random sampling of test failures in the past week, I see:

Non-test failures:
* general CI failures - "failed to acquire machine" [1]
* sccache network failures [2]
* licensing issues with the INTEGRITY compiler
* timeouts [3]
* weird unexplained failures like [4] or [5]

Test failures:
* flaky tests on timing (QMutex, QDeadlineTimer, etc.)
* QFSModel on macOS on ARM [6]
* a std::filesystem unexplored issue on Windows [7]
* some widget issues like [8] or [9]

And yes, network test failures in
https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655473411

But they are nowhere near the majority, or even the plurality. The CI general failures, sccache failures and timeouts appear to be far more common and deserve more attention.

Even among pure test failures the network ones don't appear to be the largest contributor. So I have to ask: is the effort worth the benefit?

[1] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655663797
[2] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1656078094
[3] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1656083019
[4] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655995816
[5] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1656009936
[6] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1654781141
[7] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1654295531
[8] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1655717505
[9] https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1647034101

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering

___
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
> On 25 Jun 2022, at 18:33, Thiago Macieira wrote:
>
> But they are nowhere near the majority, even the plurality. The CI general
> failures, sccache failures and timeouts appear to be far more common and
> deserve more attention.
>
> Even among pure test failures the network ones don't appear to be the largest
> contributor. So I have to ask: is the effort worth the benefit?

It’s not going to be a silver bullet, and I agree that there are other sources of flakiness that are likely larger contributors to failing integrations.

Anecdotally, it seems that every single patch that I was involved in during the last couple of weeks was blocked in some branch by either a tst_qtcpsocket or a tst_qnetworkreply failure (or both [1]). Perhaps these things are related - if the network is unstable enough, then we might either get an sccache failure, or a Qt Network test failure.

[1] https://codereview.qt-project.org/c/qt/qtbase/+/417978

However, and again anecdotally, qtbase seems to suffer from the dependency on a stable network a lot more than other repositories. Maybe because it’s large enough for any glitch to likely hit either a build or a test run. In which case making qtbase smaller, and especially taking away those tests that might take a lot of time, might improve things as well.

So, I think that it might be worth it. Cleaning up the QtCore tests that don’t just test file I/O with UNC paths (*) seems like an almost trivial refactoring - move the relevant tests to a test case in qtnetwork. The git work is also mostly a mechanical exercise, repeating what we did with Qt Positioning. We might not have to solve any hard engineering problem to take at least a little step forward.

Whereas making sccache fault tolerant, or making the CI system run test VMs with certain performance guarantees, or generally writing reliable tests that nevertheless depend on non-deterministic subsystems, seem like much harder engineering problems (that are still worth trying).

Volker

(*) As for the UNC stuff - it seems that we are testing only string parsing code. We are not taking care of any of the actual network traffic or S
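The size argument in the message above (a larger repository gives any glitch more chances to hit a build or a test run) can be illustrated with a back-of-the-envelope model. This is only a sketch: it assumes every network-dependent step fails independently with the same small probability, which real CI runs certainly do not guarantee, and the function name and the numbers are hypothetical, not measurements from Coin.

```python
def failure_probability(p_step: float, n_steps: int) -> float:
    """Probability that at least one of n_steps fails.

    Assumption (for illustration only): each step fails independently
    with the same probability p_step.
    """
    return 1.0 - (1.0 - p_step) ** n_steps


if __name__ == "__main__":
    # Hypothetical per-step glitch rate; not a measured value.
    p = 0.001
    for n in (10, 100, 1000):
        print(f"{n:5d} network-dependent steps -> "
              f"{failure_probability(p, n):.3f} chance of a failed run")
```

Under these assumptions the failure chance grows quickly with the number of network-dependent steps, which is the intuition behind making qtbase smaller; it says nothing about which source of flakiness actually dominates.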
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
On Sunday, 26 June 2022 06:38:30 PDT Volker Hilsheimer wrote:
> (*) As for the UNC stuff - it seems that we are testing only string parsing
> code. We are not taking care of any of the actual network traffic or SMB
> protocol. So do we need to access a share from a remote server at all?
> Would it be an option to create and share a folder on the Windows VMs
> running those tests during provisioning, and then use '\\$(COMPUTERNAME)’?
> That works for me on a local VM at least, all QFile tests pass (and we
> could probably even enable tst_QFile::largeUncFileSupport and simplify
> tst_QFile::writeLargeDataBlock_data) after running this as admin:

This is a very good idea.

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
> On 26. Jun 2022, at 15:38, Volker Hilsheimer wrote:
>
> Whereas making sccache fault tolerant

I proposed a simple approach to get rid of sccache failures here:
https://bugreports.qt.io/browse/COIN-740?focusedCommentId=651260&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-651260
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
Hello!

While I don't necessarily object to splitting network out on a fundamental level, I also find it to be questionable given the effort. If it can be done with little effort, including cherry-picking fixes backwards, even to 5.15, then it might be something to be investigated.

Off the top of my head, I can only think of the fact that network and IO tend to be more interconnected compared to various other parts of Qt, so historically it may have been useful to keep those together. But it's likely less important these days.

As for what is flaky and what is not, I'm not sure it's useful to compare. Of course, I always notice when something I haven't remotely touched causes my integrations to fail, and I expect it's the same for others :)

Mårten

PS. [7] from Thiago's message was not a std::filesystem issue, it was tst_QThread crashing on exit (probably a real failure from the integration). I have seen the filesystem issue before but it only ever happens for the first run, and never fails twice. Possibly granting it the title of "flakiest test" in the database :) But since it only happens once or not at all I've been unable to debug it.
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
> On 26 Jun 2022, at 17:35, Thiago Macieira wrote:
>
> On Sunday, 26 June 2022 06:38:30 PDT Volker Hilsheimer wrote:
>> (*) As for the UNC stuff - it seems that we are testing only string parsing
>> code. We are not taking care of any of the actual network traffic or SMB
>> protocol. So do we need to access a share from a remote server at all?
>> Would it be an option to create and share a folder on the Windows VMs
>> running those tests during provisioning, and then use '\\$(COMPUTERNAME)’?
>> That works for me on a local VM at least, all QFile tests pass (and we
>> could probably even enable tst_QFile::largeUncFileSupport and simplify
>> tst_QFile::writeLargeDataBlock_data) after running this as admin:
>
> This is a very good idea.

This is now implemented in:

https://codereview.qt-project.org/c/qt/qt5/+/418785 (provisioning script)
https://codereview.qt-project.org/c/qt/qtbase/+/418799 (use UNC paths to local shares in tests)

Works nicely on a local minicoin Windows VM (except tst_qdiriterator, for the unrelated reason that the test tries to create a directory structure in the source tree of the test).

Volker
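To make the local-share idea concrete, the sketch below shows how a test helper could derive a UNC path that points back at the machine it runs on, using the COMPUTERNAME environment variable mentioned in the thread. It is an illustration in Python rather than the code from the two changes linked above; the helper name, the share name "testshare" and the fallback host are hypothetical.

```python
import os
from typing import Optional


def local_unc_path(share: str, *parts: str, host: Optional[str] = None) -> str:
    r"""Build \\HOST\share\part1\part2 for a share on the local machine.

    host defaults to the COMPUTERNAME environment variable, which Windows
    sets; "localhost" is only a fallback assumed for this sketch.
    """
    if host is None:
        host = os.environ.get("COMPUTERNAME", "localhost")
    return "\\\\" + "\\".join([host, share, *parts])


if __name__ == "__main__":
    # "testshare" is a made-up share name, not the one used in provisioning.
    print(local_unc_path("testshare", "tmp", "file.txt"))
```

Because the path resolves to the local VM, a test using it still exercises the UNC parsing code without depending on the remote Network Test Server being reachable.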
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
> If it can be done with little effort, including cherry-picking fixes
> backwards, even to 5.15, then it might be something to be investigated.

Well, at least with qt positioning we need to handle all the cherry-picks to 5.15 manually, because the cherry-pick bot obviously does not detect that those need to go to qtlocation.git repo.

So, if we split network now, we will need to care about manually picking to 6.4, 6.3, 6.2 and 5.15.

Best regards,
Ivan
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
On Mon, Jun 27, 2022 at 11:40:15AM +, Ivan Solovev wrote:
> So, if we split network now, we will need to care about manually picking to
> 6.4, 6.3, 6.2 and 5.15.

or just make the bot recognize a syntax like `qtbase(6.4 6.3 6.2 5.15)`?

fwiw, qtrepotools//bin/git-qt-cherry-pick should probably be updated and put to use. in fact, it might even make the use of a magic syntax in the pick-bot superfluous.
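As a rough illustration of how little parsing the suggested `qtbase(6.4 6.3 6.2 5.15)` syntax would need, here is a minimal sketch. The syntax is only a proposal from the message above; the function name, the regular expression and the error handling are assumptions made for this example and do not describe the actual cherry-pick bot.

```python
import re
from typing import List, Tuple

# Hypothetical grammar: <repo>(<branch> <branch> ...), branches separated
# by whitespace. Nothing here is taken from the real pick-bot.
_TARGET_RE = re.compile(r"^\s*(?P<repo>[\w.-]+)\((?P<branches>[^()]*)\)\s*$")


def parse_pick_target(text: str) -> Tuple[str, List[str]]:
    """Split 'repo(branch branch ...)' into the repo name and branch list."""
    match = _TARGET_RE.match(text)
    if match is None:
        raise ValueError(f"not a repo(branches) pick target: {text!r}")
    return match.group("repo"), match.group("branches").split()


if __name__ == "__main__":
    repo, branches = parse_pick_target("qtbase(6.4 6.3 6.2 5.15)")
    print(repo, branches)
```

A bot that accepted such a target would still need to apply the change to a different repository layout, which is the harder part and is not covered by this sketch.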
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
> Well, at least with qt positioning we need to handle all the cherry-picks to
> 5.15 manually

Which is probably fine for positioning, I don't expect there to be too many patches getting picked back. But for a given period I would say it's not unusual for half of the patches going to Network to be cherry-picked.

Mårten
Re: [Development] Splitting Qt Network out of qtbase (was: QtBase network failures)
On Saturday, 25 June 2022 09:33:42 PDT Thiago Macieira wrote:
> * licensing issues with the INTEGRITY compiler

And now I have a URL for this one:
https://testresults.qt.io/coin/integration/qt/qtbase/tasks/1656777463

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering