Bug#908678: Update on the security-tracker git discussion

2020-10-02 Thread Sylvain Beucler
Hi,

On Tue, 6 Aug 2019 08:28:43 +0200 Salvatore Bonaccorso wrote:
> Thanks for keeping track and following up.
> 
> On Tue, Aug 06, 2019 at 08:05:11AM +0200, Bastian Blank wrote:
> > Moin
> > 
> > On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> > > On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > > > p.s.: Question is if we should do a split as well for the other types of
> > > >   files which are supported (DSA, TDSA, ...) while at it.
> > > We can axe out DTSA/* while we're at it.
> > > For DSA/list (and DLA/list) we can initially keep it as a single file, it 
> > > can
> > > still be split later on if necessary.
> > 
> > Following up to 
> > 
> > | Please provide a plan how and when to fix this before 2019-06-30.
> > 
> > We have now one month later.  Please provide the plan.
> 
> The items in
> https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1
> needs further detailed and then sorted/prioritized. Later actual
> implementation work on making the split possible on tracker and other
> tooling side needs to happen. We cannot depend on a non-functional
> instance for the day to day work, so all of the above basically will
> need to be ported in some sensible way.
> 
> Progress is slow due to other time limitations in day to day tasks.
> 
> Still if it is going to be too much burden for salsa admin and needs
> to be fast, then I only see that we temporarily switch away from salsa
> to gitlab or another hosting (github will not work) and then move back
> once the split has finally happened.

It seems a bit difficult to make a big switch, probably because it's not
easy to know and test all the various involved scripts.

Considering a more progressive approach, is there something preventing
us from switching to the rewritten repository and split/merging the
file, something like:

diff --git a/conf/post-merge b/conf/post-merge
new file mode 100755
index 00..a9991c1cc9
--- /dev/null
+++ b/conf/post-merge
@@ -0,0 +1,3 @@
+#!/bin/sh
+echo "post-merge"
+[ -f data/CVE/1999.list ] && cat data/CVE/*.list > data/CVE/list
diff --git a/conf/pre-commit b/conf/pre-commit
index 767e478e36..12e781e97d 100755
--- a/conf/pre-commit
+++ b/conf/pre-commit
@@ -5,3 +5,4 @@ set -e
 exec 1>&2

 make check-syntax
+bin/split-by-year.py

?
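
For illustration only, here is a minimal Python sketch of what a split
step such as bin/split-by-year.py could do. It assumes that every entry
in data/CVE/list starts with a header line beginning with CVE-<year>-
and that the indented lines after it belong to that entry. The names
split_by_year and write_chunks are hypothetical and not taken from the
actual script.

```python
import os
import re
from collections import defaultdict

# Assumption: an entry header starts with "CVE-<4-digit year>-";
# every other line belongs to the most recent header above it.
HEADER = re.compile(r"^CVE-(\d{4})-")


def split_by_year(lines):
    """Group the lines of a combined list into per-year chunks."""
    chunks = defaultdict(list)
    year = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            year = match.group(1)
        if year is None:
            raise ValueError("content before first CVE header: %r" % line)
        chunks[year].append(line)
    return dict(chunks)


def write_chunks(chunks, directory):
    """Write each chunk to <directory>/<year>.list."""
    for year, lines in chunks.items():
        path = os.path.join(directory, year + ".list")
        with open(path, "w", encoding="utf-8") as handle:
            handle.writelines(lines)
```

Note that cat data/CVE/*.list concatenates the chunks in glob order,
i.e. oldest year first, which may differ from the order of the
original single file.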

Cheers!
Sylvain



Bug#908678: Update on the security-tracker git discussion

2019-08-06 Thread Salvatore Bonaccorso
Hi Bastian,

Thanks for keeping track and following up.

On Tue, Aug 06, 2019 at 08:05:11AM +0200, Bastian Blank wrote:
> Moin
> 
> On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> > On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > > p.s.: Question is if we should do a split as well for the other types of
> > >   files which are supported (DSA, TDSA, ...) while at it.
> > We can axe out DTSA/* while we're at it.
> > For DSA/list (and DLA/list) we can initially keep it as a single file, it 
> > can
> > still be split later on if necessary.
> 
> Following up to 
> 
> | Please provide a plan how and when to fix this before 2019-06-30.
> 
> We have now one month later.  Please provide the plan.

The items in
https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1
need to be detailed further and then sorted/prioritized. After that, the
actual implementation work to make the split possible on the tracker
and other tooling side needs to happen. We cannot depend on a
non-functional instance for the day-to-day work, so all of the above
basically will need to be ported in some sensible way.

Progress is slow due to other time limitations in day to day tasks.

Still if it is going to be too much burden for salsa admin and needs
to be fast, then I only see that we temporarily switch away from salsa
to gitlab or another hosting (github will not work) and then move back
once the split has finally happened.

Regards,
Salvatore



Bug#908678: Update on the security-tracker git discussion

2019-08-06 Thread Bastian Blank
Moin

On Tue, Jul 02, 2019 at 01:38:10PM +0200, Moritz Muehlenhoff wrote:
> On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> > p.s.: Question is if we should do a split as well for the other types of
> >   files which are supported (DSA, TDSA, ...) while at it.
> We can axe out DTSA/* while we're at it.
> For DSA/list (and DLA/list) we can initially keep it as a single file, it can
> still be split later on if necessary.

Following up to 

| Please provide a plan how and when to fix this before 2019-06-30.

We are now one month later.  Please provide the plan.

Bastian

-- 
We do not colonize.  We conquer.  We rule.  There is no other way for us.
-- Rojan, "By Any Other Name", stardate 4657.5



Bug#908678: Update on the security-tracker git discussion

2019-07-02 Thread Moritz Muehlenhoff
On Tue, Jul 02, 2019 at 01:25:43PM +0200, Salvatore Bonaccorso wrote:
> p.s.: Question is if we should do a split as well for the other types of
>   files which are supported (DSA, TDSA, ...) while at it.

We can axe out DTSA/* while we're at it.

For DSA/list (and DLA/list) we can initially keep it as a single file, it can
still be split later on if necessary.

Cheers,
Moritz



Bug#908678: Update on the security-tracker git discussion

2019-07-02 Thread Salvatore Bonaccorso
Hi,

On Mon, Jun 24, 2019 at 01:57:36PM +0200, Salvatore Bonaccorso wrote:
> Hi,
> 
> On Sun, Jun 09, 2019 at 01:48:58PM +0200, Salvatore Bonaccorso wrote:
> > On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> > > Notes on possible CVE/list splits
> > > -
> > [...]
> > 
> > After a face-to-face conversation with Daniel, Daniel suggested to
> > create a priority list out of that, we will followup with that to that
> > (ideally as gitlab task-list) here with a link once we have made our
> > minds on it.
> 
> The plan was initially to do that in that week. Due to some other
> issues (Debian related, and other) this was not possible. The plan
> still holds to prioritize these tasks so that people wanting to help
> contribute have something to tackle.

So I'm starting to track those here, to better/more easily track work
on them:
https://salsa.debian.org/security-tracker-team/security-tracker-service/issues/1
(but the items still need to be reshuffled and consolidated). Basically,
before the switch, the two major topics (the security-tracker code base
itself, and the tools involved in the workflow for triaging/updating
CVEs) need to be adapted to a split repo situation, which makes many of
the items go into the first group anyway, but not all.

So: slow, but still work in progress.

On a personal note, it would be nice to have some dedicated time for
this only, but ...

Regards,
Salvatore

p.s.: The question is whether we should split the other types of
  supported files (DSA, DTSA, ...) as well while at it.



Bug#908678: Update on the security-tracker git discussion

2019-06-24 Thread Salvatore Bonaccorso
Hi,

On Sun, Jun 09, 2019 at 01:48:58PM +0200, Salvatore Bonaccorso wrote:
> On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> > Notes on possible CVE/list splits
> > -
> [...]
> 
> After a face-to-face conversation with Daniel, Daniel suggested to
> create a priority list out of that, we will followup with that to that
> (ideally as gitlab task-list) here with a link once we have made our
> minds on it.

The plan was initially to do that in that week. Due to some other
issues (Debian related, and other) this was not possible. The plan
still holds to prioritize these tasks so that people wanting to help
contribute have something to tackle.

Regards,
Salvatore



Bug#908678: Update on the security-tracker git discussion

2019-06-09 Thread Salvatore Bonaccorso
On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> Notes on possible CVE/list splits
> -
[...]

After a face-to-face conversation with Daniel, he suggested creating
a priority list out of that. We will follow up on that here with a
link (ideally to a GitLab task list) once we have made up our minds
on it.

Regards,
Salvatore



Bug#908678: Update on the security-tracker git discussion

2019-06-09 Thread Guido Günther
Hi Salvatore,
On Sat, Jun 08, 2019 at 06:29:24PM +0200, Salvatore Bonaccorso wrote:
> Hi,
> 
> On Thu, Jun 06, 2019 at 06:11:53PM +0200, Salvatore Bonaccorso wrote:
> > Hi Daniel,
> > 
> > On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> > > Am 06.06.19 um 07:31 schrieb Salvatore Bonaccorso:
> > > > Could you again point me to your splitted up variant mirror?
> > > 
> > > https://git.faster-it.de/debian_security_security-tracker_split_files/
> > 
> > Thanks!
> > 
> > While starting to look at it, could you change the splitting to
> > $year.list instead of list.$year? I know this comes from the initial
> > script which was commited. It is though more intuitive working with
> > $work.something than something.$year in this context.
> 
> Thanks to Daniel for providing the converted repository (with list
> named as well the other way around as $year.list, which is more
> intuitive, and looks saner (to me)) which get updated regularly, this
> helps as a extremly good basis.
> 
> Below are some thoughs which I started thinking of during the last few
> days, please not it might not yet be complete. Please as well try to
> not push/force us too much -- whilst we understand the issue, and see
> that something whatever the solution is (split, move somewhere else)
> -- we have regularly more serious issues popping up we want and need
> to look at those. But we acknowledge and see als well salsa admin
> point of view.
> 
> That said, here is what I have at the moment, some are easy, some
> will/might be more involving.
> 
> Notes on possible CVE/list splits
> -
> 
> - workflows on files itself by most active users. Often kept open
>   cross-checking issues all issues in one file. But this will "just"
>   need other ways to deal with the situation by the persons working
>   most on it.
> - Code of security-tracker service and python modules itself which
>   currently rely on the data/*/list formats (DSA, DLA, CVE, ...) This
>   could probably be split up and use data/*/*.list
> - Externally called but included in code: update script which fetches
>   MITRE list and integrates all needed changes (see further below).
> - bin/bts-update (called from scripts/update-CVE-assignments in cron of
>   the securiy-tracker-services) operates based on data/CVE/list and
>   keeps track of the already tagged bugs by comparing with an 'oldlist'.
>   The oldlist is copied on a run on soriano.debian.org as 'state' file
>   similar to logroate's statefile (cron).
> - bin/check-new-issues: parsing of TODO and checks for the new issues is
>   as well based on 'data/CVE/list' existence and parsing. After a split
>   up the interactive commands should still be able to navigate trough
>   the items.
> - bin/check-syntax: Check syntax of the various lists based on the security-
>   tracker parser for the lists. make check-syntax from the Makefile, pre-
>   commit hook or C/I tests are all using this script for syntax check.
>   Depends on CVEfile as well from python/bugs.py. Relevant here is the
>   check-syntax target from the Makefile. At SVN times this was actually
>   only testing the syntax of the changed files, but now it just runs
>   make check-syntax.
> - bin/compare-nvd-cve reads from data/CVE/list and this is probably
>   easier to adapt and it's used basically in a "experimental" target in
>   Makefile for update-compare-nvd target. AFAICS this is just reading
>   the information should be easy to adapt to any split up setup.
> - bin/gen-{DSA,DLA}: Used the data/CVE/list for sanity check for
>   presence of the CVE.
> - bin/get-todo-items (this script is currently not working correctly and
>   it's implemented already via the webview, so need to consider if we
>   actually still need it).
> - bin/inject-embedded-code-copies (experimental script, not
>   actively used)
> - bin/rejected-with-info relies on data/CVE/list directly, but will be
>   potentially easily adaptable in a splited setup.
> - bin/setup-repo: checks for data/CVE/list just to make sure it's the
>   right repo.
> - bin/report-vuln uses CVEFile (from python/bugs.py).
> - bin/update and bin/updatelist: Parses DSA/DTSA/DLA list and
>   data/CVE/list adding new entries from MITRE feed and crossreferences
>   for the DSA/DLA's to a new data/CVE/list which then in the cronjob on
>   soriano will be committed. That is one processing those files in a
>   splitted setup this will need continue to work.
> - bin/update-db (Used triggered by Makefile target to update security.db
>   sqlite database).
> - bin/update-nvd (possibly dependency on the CVE lists via the used
>   modules but not directly).
> - data/config.json contains the sources for CVE, DSA, DLA and extended
>   lists. Currently path thus will be a path component starting from
>   data, e.g. for CVE files path is '/CVE/list'. See as well "Setting up
>   an extended instance" in the documentation.
> - lib/python/bugs.py contains the classes CVEFile, DSAFile,
>   

Bug#908678: Update on the security-tracker git discussion

2019-06-08 Thread Salvatore Bonaccorso
Hi,

On Thu, Jun 06, 2019 at 06:11:53PM +0200, Salvatore Bonaccorso wrote:
> Hi Daniel,
> 
> On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> > Am 06.06.19 um 07:31 schrieb Salvatore Bonaccorso:
> > > Could you again point me to your splitted up variant mirror?
> > 
> > https://git.faster-it.de/debian_security_security-tracker_split_files/
> 
> Thanks!
> 
> While starting to look at it, could you change the splitting to
> $year.list instead of list.$year? I know this comes from the initial
> script which was commited. It is though more intuitive working with
> $work.something than something.$year in this context.

Thanks to Daniel for providing the converted repository (with the lists
now named the other way around, as $year.list, which is more intuitive
and looks saner to me), which gets updated regularly. This helps as an
extremely good basis.

Below are some thoughts which I started thinking of during the last few
days; please note that the list might not yet be complete. Please also
try not to push/force us too much: whilst we understand the issue, and
see that something needs to be done, whatever the solution is (split,
move somewhere else), we regularly have more serious issues popping up,
and we want and need to look at those. But we acknowledge and see the
salsa admins' point of view as well.

That said, here is what I have at the moment; some items are easy, some
will/might be more involved.

Notes on possible CVE/list splits
---------------------------------

- Workflows on the files themselves by the most active users. The file
  is often kept open for cross-checking issues, with all issues in one
  file. But this will "just" need other ways of dealing with the
  situation for the people working on it most.
- Code of the security-tracker service and the python modules
  themselves, which currently rely on the data/*/list formats (DSA,
  DLA, CVE, ...). This could probably be split up and use
  data/*/*.list.
- Externally called but included in code: update script which fetches
  MITRE list and integrates all needed changes (see further below).
- bin/bts-update (called from scripts/update-CVE-assignments in cron of
  the security-tracker-service) operates based on data/CVE/list and
  keeps track of the already tagged bugs by comparing with an 'oldlist'.
  The oldlist is copied during a run on soriano.debian.org as a 'state'
  file, similar to logrotate's statefile (cron).
- bin/check-new-issues: parsing of TODO and the checks for new issues
  are likewise based on the existence and parsing of 'data/CVE/list'.
  After a split the interactive commands should still be able to
  navigate through the items.
- bin/check-syntax: checks the syntax of the various lists based on the
  security-tracker parser for the lists. make check-syntax from the
  Makefile, the pre-commit hook and the CI tests all use this script
  for the syntax check. It also depends on CVEFile from python/bugs.py.
  Relevant here is the check-syntax target from the Makefile. In SVN
  times this actually only tested the syntax of the changed files, but
  now it just runs make check-syntax.
- bin/compare-nvd-cve reads from data/CVE/list and is basically only
  used in an "experimental" Makefile target, update-compare-nvd.
  AFAICS it just reads the information, so it should be easy to adapt
  to any split setup.
- bin/gen-{DSA,DLA}: use data/CVE/list for a sanity check on the
  presence of the CVE.
- bin/get-todo-items (this script is currently not working correctly and
  it's implemented already via the webview, so need to consider if we
  actually still need it).
- bin/inject-embedded-code-copies (experimental script, not
  actively used)
- bin/rejected-with-info relies on data/CVE/list directly, but will
  potentially be easy to adapt to a split setup.
- bin/setup-repo: checks for data/CVE/list just to make sure it's the
  right repo.
- bin/report-vuln uses CVEFile (from python/bugs.py).
- bin/update and bin/updatelist: parse the DSA/DTSA/DLA lists and
  data/CVE/list, adding new entries from the MITRE feed and
  cross-references for the DSAs/DLAs to a new data/CVE/list, which is
  then committed by the cronjob on soriano. That is, the processing of
  those files will need to continue to work in a split setup.
- bin/update-db (triggered by a Makefile target to update the
  security.db sqlite database).
- bin/update-nvd (possibly dependency on the CVE lists via the used
  modules but not directly).
- data/config.json contains the sources for CVE, DSA, DLA and extended
  lists. Currently each path is a path component starting from
  data, e.g. for CVE files the path is '/CVE/list'. See as well "Setting
  up an extended instance" in the documentation.
- lib/python/bugs.py contains the classes CVEFile, DSAFile,
  CVEExtendFile.
- lib/python/debian_support.py: defines the getconfig function reading
  data/config.json.
- lib/python/security_db.py, via getSources get the configuration from
  where to read CVE, DSA, DLA, Extends information defined in
  config.json.
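
As a rough illustration of the data/*/list to data/*/*.list change
noted above, a reader could accept either layout during a transition.
This is only a sketch: the helper names list_files and read_lines are
hypothetical and do not exist in lib/python, and preferring the split
files whenever they are present is an assumption.

```python
import glob
import os


def list_files(data_dir, kind):
    """Return the list files for one kind (e.g. "CVE"), in either layout.

    Split files (<year>.list) win if present; otherwise fall back to
    the single 'list' file. Returns an empty list if neither exists.
    """
    split = sorted(glob.glob(os.path.join(data_dir, kind, "*.list")))
    if split:
        return split
    single = os.path.join(data_dir, kind, "list")
    return [single] if os.path.exists(single) else []


def read_lines(data_dir, kind):
    """Yield all lines of one kind, file after file."""
    for path in list_files(data_dir, kind):
        with open(path, encoding="utf-8") as handle:
            yield from handle
```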

Regards,
Salvatore



Bug#908678: Update on the security-tracker git discussion

2019-06-06 Thread Salvatore Bonaccorso
Hi Daniel,

On Thu, Jun 06, 2019 at 08:35:47AM +0200, Daniel Lange wrote:
> Am 06.06.19 um 07:31 schrieb Salvatore Bonaccorso:
> > Could you again point me to your splitted up variant mirror?
> 
> https://git.faster-it.de/debian_security_security-tracker_split_files/

Thanks!

While starting to look at it, could you change the splitting to
$year.list instead of list.$year? I know this comes from the initial
script which was committed. It is, though, more intuitive to work with
$year.something than something.$year in this context.

Thanks already!

Regards,
Salvatore



Bug#908678: Update on the security-tracker git discussion

2019-06-06 Thread Daniel Lange

On 06.06.19 at 07:31, Salvatore Bonaccorso wrote:

> Could you again point me to your split-up variant mirror?


https://git.faster-it.de/debian_security_security-tracker_split_files/



Bug#908678: Update on the security-tracker git discussion

2019-01-24 Thread Daniel Lange
Zobel brought up the security-tracker git discussion in the 
#debian-security irc channel again and I'd like to record a few of the 
items touched there for others that were not present:


DLange has been running a mirror of the git repo with split files for 
three months. This is based on anarcat's scripts published previously in 
this bug. The rewriting mirror repo works flawlessly. All history is 
retained sans gpg commit signatures.


Corsac noted that "redoing the tooling is a pain" and anarcat and DLange 
reiterated that we are willing to help fix the tools. But we need a 
commitment from the security-team that the migration to a split file 
repo is wanted. And we need a prioritized list of tools that need to be 
split-files enabled.


The discussion reiterated that "moving elsewhere" doesn't really fix the 
underlying git-usage issue. So while this would take load off salsa, it 
would not improve clone times and would hamper collaboration with Debian 
people outside the security team.


Still - to gain some data - DLange tried to push the security-tracker 
repo to GitHub. This bails out as the history contains a file > 100MB 
(hard limit for GitHub):


remote: error: GH001: Large files detected. You may want to try Git 
Large File Storage - https://git-lfs.github.com.

[..]
remote: error: File data/CVE/allitems.html is 111.44 MB; this exceeds 
GitHub's file size limit of 100.00 MB


So we would have to re-write history for pushing to GitHub. Commits from 
2017-12-29 that introduce "data/CVE/allitems.html" and drop it again 
would need to be modified. Technically all commits after these have to 
be re-written as well. I have not tested whether GitHub supports 
refs/replace substitutes which would be a work-around.
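
To see in advance which blobs in the history would trip such a limit, 
something along these lines could be used. This is a sketch, not a 
tested tool: the function names are hypothetical, and the limit is 
assumed to be 100 MiB in bytes. It relies on git rev-list --objects 
--all and git cat-file --batch-check with a format string.

```python
import subprocess

# Assumption: the "100.00 MB" limit is counted as 100 MiB.
LIMIT = 100 * 1024 * 1024


def parse_batch_check(output, limit=LIMIT):
    """Return (size, path) for each blob larger than limit, largest first.

    Expects lines of the form "<type> <sha> <size> <path>".
    """
    large = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        size = int(parts[2])
        if size > limit:
            path = parts[3] if len(parts) > 3 else ""
            large.append((size, path))
    return sorted(large, reverse=True)


def large_blobs(repo="."):
    """List oversized blobs anywhere in the history of repo."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True).stdout
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True).stdout
    return parse_batch_check(sizes)
```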


As can be seen on Salsa, and per 
https://gitlab.com/gitlab-com/support-forum/issues/230, GitLab does not 
enforce per-file size limits.
But the pain of hosting and using this repo is not really different for 
any GitLab instance.


So that means self-hosting of a non-split-file repo would probably have 
to be on a security DSA machine or similar.


Again, as said above, discussion participants outside the security team 
would prefer a commitment to split the offending data/CVE/list file into 
annual chunks, enable the tooling and stay on salsa.