Bug#908678: Update on the security-tracker git discussion
Hi,

On Tue, 6 Aug 2019 08:28:43 +0200 Salvatore Bonaccorso wrote:
> Thanks for keeping track and following up.
>
> On Tue, Aug 06, 2019 at 08:05:11AM +0200, Bastian Blank wrote:
> > Moin
> >
> > On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> > > On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > > > p.s.: The question is whether we should also split the other supported file types (DSA, DTSA, ...) while at it.
> > > We can axe out DTSA/* while we're at it.
> > > For DSA/list (and DLA/list) we can initially keep it as a single file; it can still be split later on if necessary.
> >
> > Following up to
> >
> > | Please provide a plan how and when to fix this before 2019-06-30.
> >
> > It is now one month later. Please provide the plan.
>
> The items in https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1 need to be further detailed and then sorted/prioritized. After that, the actual implementation work to make the split possible in the tracker and the other tooling needs to happen. We cannot depend on a non-functional instance for the day-to-day work, so basically all of the above will need to be ported in some sensible way.
>
> Progress is slow due to time constraints from other day-to-day tasks.
>
> Still, if it is going to be too much of a burden for the salsa admins and needs to be fast, then the only option I see is to temporarily switch away from salsa to gitlab or another host (github will not work) and then move back once the split has finally happened.

It seems difficult to make one big switch, probably because it's not easy to know about and test all the scripts involved.
Considering a more gradual approach, is there anything preventing us from switching to the rewritten repository and splitting/merging the file with git hooks, something like:

diff --git a/conf/post-merge b/conf/post-merge
new file mode 100755
index 00..a9991c1cc9
--- /dev/null
+++ b/conf/post-merge
@@ -0,0 +1,3 @@
+#!/bin/sh
+echo "post-merge"
+[ -f data/CVE/1999.list ] && cat data/CVE/*.list > data/CVE/list
diff --git a/conf/pre-commit b/conf/pre-commit
index 767e478e36..12e781e97d 100755
--- a/conf/pre-commit
+++ b/conf/pre-commit
@@ -5,3 +5,4 @@ set -e
 
 exec 1>&2
 make check-syntax
+bin/split-by-year.py

?

Cheers!
Sylvain
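The pre-commit hook in the proposal calls a bin/split-by-year.py script whose contents are not shown in the thread. A minimal sketch of what such a split/merge pair might do, assuming that each entry in data/CVE/list starts at column 0 with "CVE-<year>-..." and that its package/NOTE lines are indented. The function names split_by_year, merge_years and write_split are hypothetical, not existing security-tracker code:

```python
import re
from pathlib import Path

# Assumption: an entry starts at column 0 with "CVE-<year>-..." and its
# continuation lines (package lines, NOTEs, ...) are indented.
ENTRY_RE = re.compile(r"^CVE-(\d{4})-")


def split_by_year(text):
    """Split the text of data/CVE/list into {year: chunk}, preserving entry order."""
    chunks = {}  # plain dicts keep insertion order (Python 3.7+)
    year = None
    for line in text.splitlines(keepends=True):
        match = ENTRY_RE.match(line)
        if match:
            year = match.group(1)
        if year is None:
            raise ValueError("content before the first CVE entry: %r" % line)
        chunks.setdefault(year, []).append(line)
    return {y: "".join(lines) for y, lines in chunks.items()}


def merge_years(chunks, newest_first=True):
    """Concatenate per-year chunks back into a single list text."""
    return "".join(chunks[y] for y in sorted(chunks, reverse=newest_first))


def write_split(chunks, directory):
    """Write each chunk to <directory>/<year>.list (the $year.list naming)."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for year, chunk in chunks.items():
        (out / f"{year}.list").write_text(chunk, encoding="utf-8")
```

A shell glob such as data/CVE/*.list expands in sorted (ascending) order, so the post-merge hook's cat puts the oldest year first; merge_years makes the direction explicit so it can be matched to whatever ordering the combined file is expected to have. The split/merge round trip is only exact if the original list is already grouped by year.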
Bug#908678: Update on the security-tracker git discussion
Hi Bastian,

Thanks for keeping track and following up.

On Tue, Aug 06, 2019 at 08:05:11AM +0200, Bastian Blank wrote:
> Moin
>
> On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> > On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > > p.s.: The question is whether we should also split the other supported file types (DSA, DTSA, ...) while at it.
> > We can axe out DTSA/* while we're at it.
> > For DSA/list (and DLA/list) we can initially keep it as a single file; it can still be split later on if necessary.
>
> Following up to
>
> | Please provide a plan how and when to fix this before 2019-06-30.
>
> It is now one month later. Please provide the plan.

The items in https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1 need to be further detailed and then sorted/prioritized. After that, the actual implementation work to make the split possible in the tracker and the other tooling needs to happen. We cannot depend on a non-functional instance for the day-to-day work, so basically all of the above will need to be ported in some sensible way.

Progress is slow due to time constraints from other day-to-day tasks.

Still, if it is going to be too much of a burden for the salsa admins and needs to be fast, then the only option I see is to temporarily switch away from salsa to gitlab or another host (github will not work) and then move back once the split has finally happened.

Regards,
Salvatore
Bug#908678: Update on the security-tracker git discussion
Moin

On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > p.s.: The question is whether we should also split the other supported file types (DSA, DTSA, ...) while at it.
> We can axe out DTSA/* while we're at it.
> For DSA/list (and DLA/list) we can initially keep it as a single file; it can still be split later on if necessary.

Following up to

| Please provide a plan how and when to fix this before 2019-06-30.

It is now one month later. Please provide the plan.

Bastian

-- 
We do not colonize. We conquer. We rule. There is no other way for us.
    -- Rojan, "By Any Other Name", stardate 4657.5
Bug#908678: Update on the security-tracker git discussion
On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> p.s.: The question is whether we should also split the other supported file types (DSA, DTSA, ...) while at it.

We can axe out DTSA/* while we're at it.
For DSA/list (and DLA/list) we can initially keep it as a single file; it can still be split later on if necessary.

Cheers,
Moritz
Bug#908678: Update on the security-tracker git discussion
Hi,

On Mon, Jun 24, 2019 at 01:57:36PM +0200, Salvatore Bonaccorso wrote:
> Hi,
>
> On Sun, Jun 09, 2019 at 01:48:58PM +0200, Salvatore Bonaccorso wrote:
> > On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> > > Notes on possible CVE/list splits
> > > ---------------------------------
> > [...]
> >
> > After a face-to-face conversation with Daniel, he suggested creating a priority list out of that. We will follow up on that here with a link (ideally as a gitlab task list) once we have made up our minds on it.
>
> The plan was initially to do that within that week. Due to some other issues (Debian-related and otherwise) this was not possible. The plan still holds to prioritize these tasks, so that people wanting to contribute have something to tackle.

So I'm starting to track those here, to be able to track work on them better and more easily:

https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1

(but the items still need to be reshuffled and consolidated).

Basically, before the switch, the two major topics (the security-tracker code base itself, and the tools involved in the workflow for triaging/updating CVEs) need to be adapted to a split-repo situation, which puts many of the items into the first group anyway, but not all. So it is slow, still work in progress. On a personal note, it would be nice to have some dedicated time for only this, but ...

Regards,
Salvatore

p.s.: The question is whether we should also split the other supported file types (DSA, DTSA, ...) while at it.
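Much of the adaptation described here comes down to tools that currently open data/CVE/list and would instead need to find the right per-year file. A minimal sketch of such a lookup, assuming the data/CVE/$year.list layout of Daniel's converted mirror and that an entry's CVE id is the first word of an unindented line. cve_list_path and cve_present are hypothetical helpers, not existing security-tracker functions:

```python
import re
from pathlib import Path

# Matches assigned CVE ids such as CVE-2019-10160 (placeholder ids are not handled).
CVE_ID_RE = re.compile(r"^CVE-(\d{4})-\d{4,}$")


def cve_list_path(cve_id, data_dir="data"):
    """Return the per-year list file that should hold cve_id (hypothetical layout)."""
    match = CVE_ID_RE.match(cve_id)
    if not match:
        raise ValueError("not a CVE id: %r" % cve_id)
    return Path(data_dir) / "CVE" / f"{match.group(1)}.list"


def cve_present(cve_id, data_dir="data"):
    """Check whether cve_id has an entry, reading only its year's file."""
    path = cve_list_path(cve_id, data_dir)
    if not path.exists():
        return False
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            # Entries start at column 0; indented lines belong to the previous entry.
            if line[:1].isspace():
                continue
            if line.split(None, 1)[:1] == [cve_id]:
                return True
    return False
```

A presence check in the spirit of bin/gen-{DSA,DLA} could call cve_present() before referencing a CVE; whether the real scripts would be restructured this way is not something the thread states.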
Bug#908678: Update on the security-tracker git discussion
Hi,

On Sun, Jun 09, 2019 at 01:48:58PM +0200, Salvatore Bonaccorso wrote:
> On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> > Notes on possible CVE/list splits
> > ---------------------------------
> [...]
>
> After a face-to-face conversation with Daniel, he suggested creating a priority list out of that. We will follow up on that here with a link (ideally as a gitlab task list) once we have made up our minds on it.

The plan was initially to do that within that week. Due to some other issues (Debian-related and otherwise) this was not possible. The plan still holds to prioritize these tasks, so that people wanting to contribute have something to tackle.

Regards,
Salvatore
Bug#908678: Update on the security-tracker git discussion
On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> Notes on possible CVE/list splits
> ---------------------------------
[...]

After a face-to-face conversation with Daniel, he suggested creating a priority list out of that. We will follow up on that here with a link (ideally as a gitlab task list) once we have made up our minds on it.

Regards,
Salvatore
Bug#908678: Update on the security-tracker git discussion
Hi Salvatore,

On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> Hi,
>
> On Thu, Jun 06, 2019 at 06:11:53PM +0200, Salvatore Bonaccorso wrote:
> > Hi Daniel,
> >
> > On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> > > On 06.06.19 at 07:31, Salvatore Bonaccorso wrote:
> > > > Could you again point me to your split-up variant mirror?
> > >
> > > https://git.faster-it.de/debian_security_security-tracker_split_files/
> >
> > Thanks!
> >
> > While starting to look at it, could you change the splitting to $year.list instead of list.$year? I know this comes from the initial script which was committed. It is, though, more intuitive to work with $year.something than something.$year in this context.
>
> Thanks to Daniel for providing the converted repository (now with the files named the other way around, as $year.list, which is more intuitive and looks saner (to me)), which gets updated regularly; this is an extremely good basis.
>
> Below are some thoughts I started having during the last few days; please note the list might not yet be complete. Please also try not to push/force us too much: while we understand the issue and see that something has to happen, whatever the solution is (split, move somewhere else), we regularly have more serious issues popping up that we want and need to look at. But we acknowledge and also see the salsa admins' point of view.
>
> That said, here is what I have at the moment; some items are easy, some will/might be more involved.
>
> Notes on possible CVE/list splits
> ---------------------------------
>
> - Workflows on the files themselves by the most active users. The file is often kept open for cross-checking issues, as all issues are in one file. But this will "just" need other ways of dealing with the situation by the persons working most on it.
> - Code of the security-tracker service and the python modules themselves, which currently rely on the data/*/list formats (DSA, DLA, CVE, ...). This could probably be split up and use data/*/*.list.
> - Externally called but included in the code: the update script which fetches the MITRE list and integrates all needed changes (see further below).
> - bin/bts-update (called from scripts/update-CVE-assignments in the cron of the security-tracker-service) operates on data/CVE/list and keeps track of the already tagged bugs by comparing with an 'oldlist'. The oldlist is copied on each run on soriano.debian.org as a 'state' file, similar to logrotate's state file (cron).
> - bin/check-new-issues: parsing of TODO and the checks for new issues are also based on the existence and parsing of 'data/CVE/list'. After a split, the interactive commands should still be able to navigate through the items.
> - bin/check-syntax: checks the syntax of the various lists based on the security-tracker parser for the lists. make check-syntax from the Makefile, the pre-commit hook and the CI tests all use this script for the syntax check. It also depends on CVEFile from python/bugs.py. Relevant here is the check-syntax target in the Makefile. In SVN times this actually only tested the syntax of the changed files, but now it just runs make check-syntax.
> - bin/compare-nvd-cve reads from data/CVE/list; this is probably easier to adapt, and it is basically used in an "experimental" Makefile target, update-compare-nvd. AFAICS it just reads the information, so it should be easy to adapt to any split setup.
> - bin/gen-{DSA,DLA}: use data/CVE/list for a sanity check of the presence of the CVE.
> - bin/get-todo-items (this script is currently not working correctly and its functionality is already implemented via the web view, so we need to consider whether we actually still need it).
> - bin/inject-embedded-code-copies (experimental script, not actively used).
> - bin/rejected-with-info relies on data/CVE/list directly, but will potentially be easy to adapt in a split setup.
> - bin/setup-repo: checks for data/CVE/list just to make sure it's the right repo.
> - bin/report-vuln uses CVEFile (from python/bugs.py).
> - bin/update and bin/updatelist: parse the DSA/DTSA/DLA lists and data/CVE/list, adding new entries from the MITRE feed and cross-references for the DSAs/DLAs to a new data/CVE/list, which the cronjob on soriano then commits. This is one of the tools processing those files; in a split setup it will need to continue to work.
> - bin/update-db (triggered by a Makefile target to update the security.db sqlite database).
> - bin/update-nvd (possibly depends on the CVE lists via the modules it uses, but not directly).
> - data/config.json contains the sources for the CVE, DSA, DLA and extended lists. Currently the path is a path component starting from data, e.g. for the CVE files the path is '/CVE/list'. See also "Setting up an extended instance" in the documentation.
> - lib/python/bugs.py contains the classes CVEFile, DSAFile, CVEExte
Bug#908678: Update on the security-tracker git discussion
Hi,

On Thu, Jun 06, 2019 at 06:11:53PM +0200, Salvatore Bonaccorso wrote:
> Hi Daniel,
>
> On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> > On 06.06.19 at 07:31, Salvatore Bonaccorso wrote:
> > > Could you again point me to your split-up variant mirror?
> >
> > https://git.faster-it.de/debian_security_security-tracker_split_files/
>
> Thanks!
>
> While starting to look at it, could you change the splitting to $year.list instead of list.$year? I know this comes from the initial script which was committed. It is, though, more intuitive to work with $year.something than something.$year in this context.

Thanks to Daniel for providing the converted repository (now with the files named the other way around, as $year.list, which is more intuitive and looks saner (to me)), which gets updated regularly; this is an extremely good basis.

Below are some thoughts I started having during the last few days; please note the list might not yet be complete. Please also try not to push/force us too much: while we understand the issue and see that something has to happen, whatever the solution is (split, move somewhere else), we regularly have more serious issues popping up that we want and need to look at. But we acknowledge and also see the salsa admins' point of view.

That said, here is what I have at the moment; some items are easy, some will/might be more involved.

Notes on possible CVE/list splits
---------------------------------

- Workflows on the files themselves by the most active users. The file is often kept open for cross-checking issues, as all issues are in one file. But this will "just" need other ways of dealing with the situation by the persons working most on it.
- Code of the security-tracker service and the python modules themselves, which currently rely on the data/*/list formats (DSA, DLA, CVE, ...). This could probably be split up and use data/*/*.list.
- Externally called but included in the code: the update script which fetches the MITRE list and integrates all needed changes (see further below).
- bin/bts-update (called from scripts/update-CVE-assignments in the cron of the security-tracker-service) operates on data/CVE/list and keeps track of the already tagged bugs by comparing with an 'oldlist'. The oldlist is copied on each run on soriano.debian.org as a 'state' file, similar to logrotate's state file (cron).
- bin/check-new-issues: parsing of TODO and the checks for new issues are also based on the existence and parsing of 'data/CVE/list'. After a split, the interactive commands should still be able to navigate through the items.
- bin/check-syntax: checks the syntax of the various lists based on the security-tracker parser for the lists. make check-syntax from the Makefile, the pre-commit hook and the CI tests all use this script for the syntax check. It also depends on CVEFile from python/bugs.py. Relevant here is the check-syntax target in the Makefile. In SVN times this actually only tested the syntax of the changed files, but now it just runs make check-syntax.
- bin/compare-nvd-cve reads from data/CVE/list; this is probably easier to adapt, and it is basically used in an "experimental" Makefile target, update-compare-nvd. AFAICS it just reads the information, so it should be easy to adapt to any split setup.
- bin/gen-{DSA,DLA}: use data/CVE/list for a sanity check of the presence of the CVE.
- bin/get-todo-items (this script is currently not working correctly and its functionality is already implemented via the web view, so we need to consider whether we actually still need it).
- bin/inject-embedded-code-copies (experimental script, not actively used).
- bin/rejected-with-info relies on data/CVE/list directly, but will potentially be easy to adapt in a split setup.
- bin/setup-repo: checks for data/CVE/list just to make sure it's the right repo.
- bin/report-vuln uses CVEFile (from python/bugs.py).
- bin/update and bin/updatelist: parse the DSA/DTSA/DLA lists and data/CVE/list, adding new entries from the MITRE feed and cross-references for the DSAs/DLAs to a new data/CVE/list, which the cronjob on soriano then commits. This is one of the tools processing those files; in a split setup it will need to continue to work.
- bin/update-db (triggered by a Makefile target to update the security.db sqlite database).
- bin/update-nvd (possibly depends on the CVE lists via the modules it uses, but not directly).
- data/config.json contains the sources for the CVE, DSA, DLA and extended lists. Currently the path is a path component starting from data, e.g. for the CVE files the path is '/CVE/list'. See also "Setting up an extended instance" in the documentation.
- lib/python/bugs.py contains the classes CVEFile, DSAFile, CVEExtendFile.
- lib/python/debian_support.py: defines the getconfig function reading data/config.json.
- lib/python/security_db.py, via getSources, gets the configuration of where to read the CVE, DSA, DLA and extended information from, as defined in config.json.

Regards,
Salvatore
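The bin/check-syntax item notes that in SVN times only the changed files were syntax-checked. With per-year files that idea becomes attractive again, since a commit usually touches only one or two years. A minimal sketch of selecting the staged list files from a pre-commit hook, assuming it runs inside the git checkout; the function names and the LIST_PATTERNS globs are hypothetical, and how bin/check-syntax would accept a file list is not assumed here:

```python
import fnmatch
import subprocess

# Hypothetical patterns: the legacy single-file lists plus split per-year files.
# Note that fnmatch's "*" also matches "/", which is acceptable for this sketch.
LIST_PATTERNS = ("data/*/list", "data/*/*.list")


def staged_files():
    """Return the paths staged for commit (added, copied or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


def lists_to_check(paths):
    """Keep only tracker list files, so untouched years can be skipped."""
    return sorted(
        path for path in paths
        if any(fnmatch.fnmatchcase(path, pattern) for pattern in LIST_PATTERNS)
    )
```

In a hook this would be used as lists_to_check(staged_files()), with the result handed to whatever syntax checker the hook runs; an empty result means no list file changed and the check could be skipped.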
Bug#908678: Update on the security-tracker git discussion
Hi Daniel,

On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> On 06.06.19 at 07:31, Salvatore Bonaccorso wrote:
> > Could you again point me to your split-up variant mirror?
>
> https://git.faster-it.de/debian_security_security-tracker_split_files/

Thanks!

While starting to look at it, could you change the splitting to $year.list instead of list.$year? I know this comes from the initial script which was committed. It is, though, more intuitive to work with $year.something than something.$year in this context.

Thanks in advance!

Regards,
Salvatore
Bug#908678: Update on the security-tracker git discussion
On 06.06.19 at 07:31, Salvatore Bonaccorso wrote:
> Could you again point me to your split-up variant mirror?

https://git.faster-it.de/debian_security_security-tracker_split_files/
Bug#908678: Update on the security-tracker git discussion
Hi Daniel,

On Thu, Jan 24, 2019 at 12:23:31PM +0100, Daniel Lange wrote:
> Zobel brought up the security-tracker git discussion in the #debian-security IRC channel again, and I'd like to record a few of the items touched on there for those who were not present:
>
> DLange has been running a mirror of the git repo with split files for three months. This is based on anarcat's scripts published previously in this bug. The rewriting mirror repo works flawlessly. All history is retained, sans gpg commit signatures.
>
> Corsac noted that "redoing the tooling is a pain", and anarcat and DLange reiterated that we are willing to help fix the tools. But we need a commitment from the security team that the migration to a split-file repo is wanted. And we need a prioritized list of tools that need to be enabled for split files.
>
> The discussion reiterated that "moving elsewhere" doesn't really fix the underlying git-usage issue. So while this would take load off salsa, it would not improve clone times and would hamper collaboration with Debian people outside the security team.
>
> Still, to gain some data, DLange tried to push the security-tracker repo to GitHub. This bails out because the history contains a file > 100MB (a hard limit for GitHub):
>
> remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
> [..]
> remote: error: File data/CVE/allitems.html is 111.44 MB; this exceeds GitHub's file size limit of 100.00 MB
>
> So we would have to rewrite history to push to GitHub. The commits from 2017-12-29 that introduce "data/CVE/allitems.html" and drop it again would need to be modified. Technically, all commits after these would have to be rewritten as well. I have not tested whether GitHub supports refs/replace substitutes, which would be a workaround.
>
> As noticeable on Salsa, and per https://gitlab.com/gitlab-com/support-forum/issues/230, GitLab does not enforce per-file size limits. But the pain of hosting and using this repo is not really different on any GitLab instance.
>
> So that means self-hosting of a non-split-file repo would probably have to be on a security DSA machine or similar.
>
> Again, as said above, discussion participants outside the security team would prefer a commitment to split the offending data/CVE/list file into annual chunks, enable the tooling and stay on salsa.

I was planning to take some time in the next few days to re-evaluate your findings. As this was missing from my previous reply: thanks, Daniel, for your time so far and for the summary above. Thanks as well for your effort in finding a solution that retains the history.

Could you again point me to your split-up variant mirror?

Regards,
Salvatore
Bug#908678: Update on the security-tracker git discussion
Zobel brought up the security-tracker git discussion in the #debian-security IRC channel again, and I'd like to record a few of the items touched on there for those who were not present:

DLange has been running a mirror of the git repo with split files for three months. This is based on anarcat's scripts published previously in this bug. The rewriting mirror repo works flawlessly. All history is retained, sans gpg commit signatures.

Corsac noted that "redoing the tooling is a pain", and anarcat and DLange reiterated that we are willing to help fix the tools. But we need a commitment from the security team that the migration to a split-file repo is wanted. And we need a prioritized list of tools that need to be enabled for split files.

The discussion reiterated that "moving elsewhere" doesn't really fix the underlying git-usage issue. So while this would take load off salsa, it would not improve clone times and would hamper collaboration with Debian people outside the security team.

Still, to gain some data, DLange tried to push the security-tracker repo to GitHub. This bails out because the history contains a file > 100MB (a hard limit for GitHub):

remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
[..]
remote: error: File data/CVE/allitems.html is 111.44 MB; this exceeds GitHub's file size limit of 100.00 MB

So we would have to rewrite history to push to GitHub. The commits from 2017-12-29 that introduce "data/CVE/allitems.html" and drop it again would need to be modified. Technically, all commits after these would have to be rewritten as well. I have not tested whether GitHub supports refs/replace substitutes, which would be a workaround.

As noticeable on Salsa, and per https://gitlab.com/gitlab-com/support-forum/issues/230, GitLab does not enforce per-file size limits. But the pain of hosting and using this repo is not really different on any GitLab instance.

So that means self-hosting of a non-split-file repo would probably have to be on a security DSA machine or similar.

Again, as said above, discussion participants outside the security team would prefer a commitment to split the offending data/CVE/list file into annual chunks, enable the tooling and stay on salsa.
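The oversized data/CVE/allitems.html only showed up when the push was attempted. A minimal sketch for listing oversized blobs anywhere in a local clone's history before pushing, piping git rev-list --objects --all into git cat-file --batch-check. The 100 MiB default threshold is an assumption about how GitHub's "100.00 MB" limit is measured, and the function names are hypothetical:

```python
import subprocess

# Assumption: treat the hosting limit as 100 MiB; adjust if the host measures differently.
DEFAULT_LIMIT = 100 * 1024 * 1024
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(output, limit=DEFAULT_LIMIT):
    """Return (path, size) for blobs larger than limit, biggest first."""
    hits = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # commits and trees are not subject to the per-file limit
        size = int(parts[2])
        path = parts[3] if len(parts) == 4 else ""
        if size > limit:
            hits.append((path, size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def large_blobs(repo=".", limit=DEFAULT_LIMIT):
    """List oversized blobs reachable from any ref in repo."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # With %(rest), cat-file echoes the path that rev-list printed after each object id.
    report = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-check=" + BATCH_FORMAT],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(report, limit)
```

Running large_blobs() in a clone lists each offending path with its size, which also identifies which paths a history rewrite would have to drop.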