Bug#908678: Testing the filter-branch scripts
On 2018-11-12 12:22:58, Antoine Beaupré wrote:

> I'll start a run on the whole history to see if I can find any problems,
> as soon as a first clone finishes resolving those damn deltas. ;)

The Python job finished successfully here after 10 hours. I did some
tests on the new git repository. Cloning the repository from scratch
takes around 2 minutes (the original repo: 21 minutes). It is 145MB
while the original repo is 1.6GB. Running git annotate on
data/CVE/list.2018 takes about 26 seconds, while it takes basically
forever to annotate the original data/CVE/list. (It's been running for
10 minutes here already.)

So that's about it. I have not done a thorough job of checking the
actual *integrity* of the results. It's difficult, considering CVE
identifiers are not sequential in the data/CVE/list file, so a naive
diff like this will fail:

$ diff -u <(cat ../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999} ) data/CVE/list | diffstat
 list |106562 +--
 1 file changed, 53281 insertions(+), 53281 deletions(-)

But at least the numbers add up: it looks like no line is lost. And
indeed, it looks like all CVEs add up:

$ diff -u <(cat ../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999} | grep ^CVE | sort -n ) <( grep ^CVE data/CVE/list | sort -n ) | diffstat
 0 files changed

A cursory look at the diff seems to indicate it is clean, however.

I looked at splitting that file per CVE. That did not scale and just
created new problems. But splitting by *year* seems like a very
effective change, and I think it would be worth pursuing that idea.

A.

-- 
There is no cloud, it's just someone else's computer.
 - Chris Watterson
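For what it's worth, the same CVE-set comparison can also be expressed in a few lines of Python, which makes it easier to print exactly which identifiers differ when the sets do *not* match. This is just a sketch of the idea, exercised on toy data rather than the actual repositories; the helper name and sample lines are illustrative:

```python
def cve_ids(lines):
    """extract CVE identifiers: the first token of lines starting with CVE-

    bytes in, bytes out on purpose: the old history contains non-UTF-8
    data, so decoding would be a mistake here."""
    return sorted(line.split()[0] for line in lines if line.startswith(b"CVE-"))

# toy stand-ins for data/CVE/list.2005, data/CVE/list.2006 and the big file
yearly = [
    [b"CVE-2005-3276 (The sys_get_thread_area function ...)\n", b"\tTODO: check\n"],
    [b"CVE-2006-0001 (...)\n"],
]
combined = [b"CVE-2006-0001 (...)\n",
            b"CVE-2005-3276 (The sys_get_thread_area function ...)\n",
            b"\tTODO: check\n"]

split_ids = sorted(sum((cve_ids(year) for year in yearly), []))
missing = set(cve_ids(combined)) - set(split_ids)
assert not missing, "lost CVEs: %r" % missing
```

Against the real repositories one would feed `cve_ids()` the lines of the per-year files and of data/CVE/list instead of these inline samples.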
Bug#908678: Testing the filter-branch scripts
On 2018-11-10 18:56:01, Daniel Lange wrote:

> Antoine,
>
> thank you very much for your filter-branch scripts.

you're welcome! glad it can be of use.

> I tested each:
>
> 1) the golang version:
> It completes after 3h36min:
>
> # git filter-branch --tree-filter '/split-by-year' HEAD
> Rewrite a09118bf0a33f3721c0b8f6880c4cbb1e407a39d (68282/68286) (12994 seconds
> passed, remaining 0 predicted)
> Ref 'refs/heads/master' was rewritten
>
> But it doesn't Close() the os.OpenFile handles so ...
> all data/CVE/list. files are 0 bytes long. Sic!

Well. That explains part of the performance difference. ;)

There were multiple problems with the golang source - variable shadowing
and, yes, a missing Close(). Surprisingly, the fixed version is *slower*
than the equivalent Python code, taking about one second per run or 1102
seconds for the last 1000 commits. I'm at a loss as to how I managed to
make go run slower than Python here (and can't help but think C would
have been easier, again). Probably poor programming on my part. New
version attached.

[...]

> 2.1) the Python version
> You claim #!/usr/bin/python3 in the shebang, so I tried that first:
>
> # git filter-branch --tree-filter '/usr/bin/python3
> /__pycache__/split-by-year.cpython-35.pyc' HEAD
> Rewrite 990d3c4bbb49308fb3de1e0e91b9ba5600386f8a (1220/68293) (41 seconds
> passed, remaining 2254 predicted)
> Traceback (most recent call last):
>   File "split-by-year.py", line 13, in <module>
>   File "/usr/lib/python3.5/codecs.py", line 321, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 5463:
> invalid start byte
> tree filter failed: /usr/bin/python3 /__pycache__/split-by-year.cpython-35.pyc

I suspected this would be a problem, but didn't find any occurrence in
the shallow clone so I forgot about it. Note that the golang version
takes great care to treat the data as binary...
> The offending commit is:
> * 990d3c4bbb - Rename sarge-checks data to something not specific to sarge,
>   since we're working on etch now.
>   Sorry for the probable annoyance, but it had to be done. (13 years ago)
>   [Joey Hess]
>
> There will be many more like this, so for Python3
> this needs to be made unicode-agnostic.

... so I rewrote the thing to handle only binary and tested it against
that version of the file. It seems to work fine.

> Notice I compiled the .py to .pyc which makes it
> much faster and thus well usable.

Interesting. I didn't see much difference in performance in my
benchmarks on average, but the worst-case run did improve by 150ms, so I
guess this is worth the trouble. For those who didn't know (like me),
this means running:

python -m compileall bin/split-by-year.py

whenever the .py file changes (right?).

> 2.2) Python, when a string was a string .. Python2
> Your code is actually Python2, so why not give that a try:
>
> # git filter-branch --tree-filter '/usr/bin/python2 /split-by-year.pyc' HEAD
> Rewrite b59da20b82011ffcfa6c4a453de9df58ee036b2c (2516/68293) (113 seconds
> passed, remaining 2954 predicted)
> Traceback (most recent call last):
>   File "split-by-year.py", line 18, in <module>
>     yearly = 'data/CVE/list.{:d}'.format(year)
> NameError: name 'year' is not defined
> tree filter failed: /usr/bin/python2 /split-by-year.pyc
>
> The offending commit is:
> * b59da20b82 - claim (13 years ago) [Moritz Muehlenhoff]
> | diff --git a/data/CVE/list b/data/CVE/list
> | index 7b5d1d21d6..cdf0b74dd0 100644
> | --- a/data/CVE/list
> | +++ b/data/CVE/list
> | @@ -1,3 +1,4 @@
> | +begin claimed by jmm
> |  CVE-2005-3276 (The sys_get_thread_area function in process.c in Linux 2.6 before ...)
> |  	TODO: check
> |  CVE-2005-3275 (The NAT code (1) ip_nat_proto_tcp.c and (2) ip_nat_proto_udp.c in ...)
> | @@ -34,6 +35,7 @@ CVE-2005-3260 (Multiple cross-site scripting (XSS) vulnerabilities in ...)
> |  	TODO: check
> |  CVE-2005-3259 (Multiple SQL injection vulnerabilities in versatileBulletinBoard (vBB) ...)
> |  	TODO: check
> | +end claimed by jmm
> |  CVE-2005- [Insecure caching of user id in mantis]
> |  	- mantis (bug #330682; unknown)
> |  CVE-2005- [Filter information disclosure in mantis]
>
> As you see the line "+begin claimed by jmm" breaks the too simplistic
> parser logic.
> Unfortunately dry-running against a current version of data/CVE/list
> such errors do not show up.
> The "violations" of the file format are transient and buried in history.

Hmm... That's a trickier one. I guess we could just pretend that line
doesn't exist and drop it from history... But I chose to buffer it and
treat it like the CVE line so it gets attached to the right file. See if
it does what you expect:

git cat-file -p b59da20b82:data/CVE/list > data/CVE/list.b59da20b82
split-by-year.py data/CVE/list.b59da20b82

Performance-wise, I shaved off a surprising 60ms by enclosing all the
code in a function.
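The buffering approach described above can be sketched as follows. This is an illustration of the idea, not the actual attached script: lines that are neither CVE headers nor ordinary continuation lines (like "begin claimed by jmm") are held back and flushed into whichever yearly bucket the *next* CVE line selects, and everything stays as bytes so the non-UTF-8 commits survive:

```python
import re

CVE_YEAR = re.compile(rb"^CVE-(\d{4})-")

def split_lines_by_year(lines):
    """distribute data/CVE/list lines into per-year buckets (all bytes)"""
    buckets, pending, year = {}, [], None
    for line in lines:
        match = CVE_YEAR.match(line)
        if match:
            year = match.group(1).decode("ascii")
            # flush buffered marker lines together with the CVE that follows,
            # so they land in the same yearly file
            buckets.setdefault(year, []).extend(pending)
            pending = []
            buckets[year].append(line)
        elif year is None or line.startswith(b"begin claimed"):
            # marker line (or preamble before any CVE): attach to the next CVE
            pending.append(line)
        else:
            buckets[year].append(line)  # continuation line of the current CVE
    return buckets

buckets = split_lines_by_year([
    b"begin claimed by jmm\n",
    b"CVE-2005-3276 (The sys_get_thread_area function ...)\n",
    b"\tTODO: check\n",
])
assert buckets["2005"][0] == b"begin claimed by jmm\n"
```

How "end claimed by" lines should be attached (previous or next CVE) is a judgment call; here they simply stay with the current year.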
Bug#908678: Some more thoughts and some tests on the security-tracker git repo
On 2018-11-09 16:05:06, Antoine Beaupré wrote:

> 2. do a crazy filter-branch to send commits to the right
>    files. considering how long an initial clone takes, i can't even
>    begin to imagine how long *that* would take. but it would be the
>    most accurate simulation.
>
> Short of that, I think it's somewhat dishonest to compare a clean
> repository with split files against a repository with history over 14
> years and thousands of commits. Intuitively, I think you're right and
> that "sharding" the data in yearly packets would help a lot git's
> performance. But we won't know until we simulate it, and if we hit that
> problem again 5 years from now, all that work will have been for
> nothing. (Although it *would* give us 5 years...)

So I've done that crazy filter-branch, on a shallow clone (1000
commits). The original clone is about 30MB, but the split repo is only
4MB. Cloning the original repo takes a solid 30+ seconds:

[1221]anarcat@curie:src130$ time git clone file://$PWD/security-tracker-1000.orig security-tracker-1000.orig-test
Cloning into 'security-tracker-1000.orig-test'...
remote: Enumerating objects: 5291, done.
remote: Counting objects: 100% (5291/5291), done.
remote: Compressing objects: 100% (1264/1264), done.
remote: Total 5291 (delta 3157), reused 5291 (delta 3157)
Receiving objects: 100% (5291/5291), 8.80 MiB | 19.47 MiB/s, done.
Resolving deltas: 100% (3157/3157), done.
64.35user 0.44system 0:34.32elapsed 188%CPU (0avgtext+0avgdata 200056maxresident)k
0inputs+58968outputs (0major+48449minor)pagefaults 0swaps

Cloning the split repo takes less than a second:

[1223]anarcat@curie:src$ time git clone file://$PWD/security-tracker-1000-filtered security-tracker-1000-filtered-test
Cloning into 'security-tracker-1000-filtered-test'...
remote: Enumerating objects: 2214, done.
remote: Counting objects: 100% (2214/2214), done.
remote: Compressing objects: 100% (1190/1190), done.
remote: Total 2214 (delta 936), reused 2214 (delta 936)
Receiving objects: 100% (2214/2214), 1.25 MiB | 22.78 MiB/s, done.
Resolving deltas: 100% (936/936), done.
0.25user 0.04system 0:00.38elapsed 79%CPU (0avgtext+0avgdata 8200maxresident)k
0inputs+8664outputs (0major+3678minor)pagefaults 0swaps

So this is clearly a win, and I think it would be possible to rewrite
the history using the filter-branch command. Commit IDs would change,
but we would keep all commits, so annotate and all that good stuff would
still work.

The split-by-year bash script was too slow for my purposes: it was
taking a solid 15 seconds for each run, which meant it would have taken
9 *days* to process the entire repository. So I tried to see if this
could be optimized, so we could split the file while keeping history
without having to shut down the whole system for days.

I first rewrote it in Python, which processed the 1000 commits in 801
seconds. This gives an estimate of 15 hours for the 68278 commits I had
locally. Concerned about the Python startup time, I then tried golang,
which processed the tree in 262 seconds, giving a final estimate of 4.8
hours.

Attached are both implementations, for those who want to reproduce my
results. Note that they differ from the original implementation in that
they have to (naturally) remove the data/CVE/list file itself, otherwise
it's kept in history. Here's how to call it:

git -c commit.gpgSign=false filter-branch --tree-filter '/home/anarcat/src/security-tracker/bin/split-by-year.py data/CVE/list' HEAD

Also observe how all gpg commit signatures are (obviously) lost. I have
explicitly disabled signing because those signatures actually take a
long time to compute...

I haven't tested if a graft would improve performance, but I suspect it
would not, given the sheer size of the repository that would effectively
need to be carried over anyways.

A.
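For reference, the core of such a tree filter can be sketched as below. This is an illustration of the approach rather than the actual attachment, but it shows the two details discussed in this thread: the file is read as bytes (the old history is not valid UTF-8), and data/CVE/list is removed at the end so it disappears from the rewritten trees:

```python
import os
import re

def tree_filter(listfile="data/CVE/list"):
    """run once per commit by `git filter-branch --tree-filter`:
    split the big file into per-year files, then delete it"""
    if not os.path.exists(listfile):
        return  # commits that predate data/CVE/list
    buckets = {}
    year = "unknown"  # bucket for lines before the first CVE header
    with open(listfile, "rb") as f:  # bytes: history contains non-UTF-8 commits
        for line in f:
            match = re.match(rb"^CVE-(\d{4})-", line)
            if match:
                year = match.group(1).decode("ascii")
            buckets.setdefault(year, []).append(line)
    for year, lines in buckets.items():
        with open("%s.%s" % (listfile, year), "wb") as out:
            out.writelines(lines)
    os.remove(listfile)  # otherwise the unsplit file survives in history
```

Run under filter-branch, the working tree of each rewritten commit then contains only the data/CVE/list.YYYY files.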
-- 
Man really attains the state of complete humanity when he produces,
without being forced by physical need to sell himself as a commodity.
 - Ernesto "Che" Guevara

package main

import (
	"bufio"
	"bytes"
	"io"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	file, err := os.Open("data/CVE/list")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	var (
		line     []byte
		cve      []byte
		year     uint64
		year_str string
		target   *os.File
		header   bool
		ok       bool
	)
	fds := make(map[uint64]*os.File, 20)
	scanner := bufio.NewReader(file)
	for {
		line, err = scanner.ReadBytes('\n')
		if err == io.EOF && len(line) == 0 {
			break
		} else if err != nil && err != io.EOF {
			log.Fatal(err)
		}
		if bytes.HasPrefix(line, []byte("CVE-")) {
			cve = line
			year_str = strings.Split(string(line), "-")[1]
			year, _ = strconv.ParseUint(year_str, 0, 0)
			header = true
		} else {
			// assign, do not redeclare: a `:=` here would shadow
			// target and leave the outer variable untouched
			if target, ok = fds[year]; !ok {
				target, err = os.OpenFile("data/CVE/list."+year_str,
					os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
				if err != nil {
					log.Fatal(err)
				}
				fds[year] = target
			}
			if header {
				target.Write(cve)
				header = false
			}
			target.Write(line)
		}
		if err == io.EOF {
			break
		}
	}
	// close all the yearly files: the missing Close() was what
	// produced the zero-byte files reported earlier in the thread
	for _, fd := range fds {
		fd.Close()
	}
	os.Remove("data/CVE/list")
}
Bug#908678: Some more thoughts and some tests on the security-tracker git repo
On 2018-09-26 14:56:16, Daniel Lange wrote:

[...]

> In any case, a repo with just the split files but no maintained history
> clones in ~12s in the above test setup. It also brings the (bare) repo
> down from 3,3GB to 189MB. So the issue is really the data/CVE/list file.

So I've looked into that problem as well, four months ago:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2

In there I proposed splitting the data/CVE/list file into "one file per
CVE". In retrospect, that was a rather naive approach and yielded all
sorts of problems: there were so many files that it created problems
even for the shell (argument list too long).

I hadn't thought of splitting things in "one *file* per year". That
could really help! Unfortunately, it's hard to simulate what it would
look like *14 years* from now (yes, that's how old that repo is
already). I can think of two ways to simulate that:

 1. generate commits to recreate all files from scratch: parse
    data/CVE/list, split it up into chunks, and add each CVE in one
    separate commit. it's not *exactly* how things are done now, but it
    should be a close enough approximation

 2. do a crazy filter-branch to send commits to the right
    files. considering how long an initial clone takes, i can't even
    begin to imagine how long *that* would take. but it would be the
    most accurate simulation.

Short of that, I think it's somewhat dishonest to compare a clean
repository with split files against a repository with history over 14
years and thousands of commits. Intuitively, I think you're right and
that "sharding" the data in yearly packets would help a lot git's
performance. But we won't know until we simulate it, and if we hit that
problem again 5 years from now, all that work will have been for
nothing. (Although it *would* give us 5 years...)

> That said, data/DSA/list is 14575 lines. That seems to not bother git
> too much yet. Still if things get re-structured, this file may be worth
> a look, too.
Yeah, I haven't had trouble with that one yet either.

> To me the most reasonable path forward unfortunately looks like start a
> new repo for 2019+ and "just" import the split files or single-record
> files as mentioned by pabs but not the git/svn/cvs history. The old
> repo would - of course - stay around but frozen at a deadline.

In any case, I personally don't think history over those files is that
critical. We rarely dig into that history because it's so expensive...
Any "git annotate" takes forever in this repo, and running *that* over
data/CVE/list takes tens of minutes.

That said, once we pick a solution, we *could* craft a magic
filter-branch that *would* keep history. It might be worth eating that
performance cost then. I'll run some tests to see if I can make sense of
such a filter.

> Corsac also mentioned on IRC that the repo could be hosted outside of
> Gitlab. That would reduce the pressure for some time.
> But cgit and other git frontends (as well as backends) we tested also
> struggle with the repo (which is why my company, Faster IT GmbH, used
> the security-tracker repo as a very welcome test case in the first
> place).
> So that would buy time but not be a solution long(er) term.

Agreed. I think the benefits of hosting on gitlab outweigh the trouble
of rearchitecting our datastore. As I said, it's not just gitlab that's
struggling with a 17MB text file: git itself has trouble dealing with it
as well, and I am often frustrated by that in my work...

A.

-- 
You are absolutely deluded, if not stupid, if you think that a worldwide
collection of software engineers who can't write operating systems or
applications without security holes, can then turn around and suddenly
write virtualization layers without security holes.
 - Theo de Raadt
Re: Dealing with renamed source packages during CVE triaging
On 2018-06-15 10:27:45, Moritz Muehlenhoff wrote:

> On Fri, Jun 15, 2018 at 04:34:14PM +1000, Brian May wrote:
>> Moritz Muehlenhoff writes:
>>
>> > On Wed, Jun 13, 2018 at 05:19:40PM +1000, Brian May wrote:

[...]

>> That generates a report of all packages that we need to check. I assume
>> we would need some way of marking packages that we have checked and
>> found to be not affected, so we can get a list of packages that need
>> immediate attention and don't repeatedly check the same package multiple
>> times. How should we do this? Maybe another file in the security tracker
>> repository?
>
> Maybe start with the script initially and see whether it's useful as an
> approach in general. State tracking can be discussed/added later.

Maybe the same principle applies as with the approach I considered: we
could have a --stop argument that would consider entries up to a certain
CVE number and ignore the rest of the file.

> Lots of the false positives will result from crappy/outdated entries
> in embedded-code-copies, so fixing those up will drastically reduce
> false positives.

If the embedded-code-copies file is used more systematically, with a
semi-automated script, in the triaging process, we'll be more inclined
to keep it up to date, so I think it would actually help there too...

bam: do you want me to start working on that script or were you working
on this already?

Thanks for the feedback,

A.

-- 
Ils versent un pauvre miel sur leurs mots pourris et te parlent de
pénurie Et sur ta faim, sur tes amis, ils aiguisent leur appétit
 - Richard Desjardins, La maison est ouverte
Re: Dealing with renamed source packages during CVE triaging
I've finalized a prototype during my research on this problem, which I
have detailed on GitLab, as it's really code that should be merged. It
would also benefit from wider attention considering it affects more than
LTS now. Anyways, the MR is here:

https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/4

Comments are welcome there or here. For what it's worth, I reused
Lamby's crude parser because I wanted to get the prototype out the
door. I am also uncertain that a full parser can reliably recreate the
CVE/list file as-is without introducing inconsistent diffs...

I also drifted into the core datastructures of the security tracker, and
wondered if it would be better to split up our large CVE/list file now
that we're using git. I had mixed results. For those interested, it is
documented here:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2

Cheers!

a.

-- 
If it's important for you, you'll find a way.
If it's not, you'll find an excuse.
 - Unknown
Re: Dealing with renamed source packages during CVE triaging
On 2018-06-08 03:29:38, Brian May wrote:

> Antoine Beaupré writes:
>
>> Right now, it seems that all scripts that hammer at those files do so
>> with their own ad-hoc parsing code. Is that the recommended way of
>> chopping those files up? Or is there a better parsing library out there?
>
> It sounds like we could really do with a good parsing library. Maybe one
> that supports making changes too.
>
> I could make a start on this.

As I mentioned in the other thread, I am uncertain where to go from
here. Some scripts use JSON, others parse the files by hand... I also
found out yesterday, after writing this, that there is *already* a
parsing library in the security tracker. It can parse
{CVE,DSA,DLA}/list files and lives in lib/python/bugs.py, but it's
somewhat coupled with the sqlite database - I'm not sure it's usable
standalone.

But yeah, maybe clarifying all this stuff would help, for sure... I
would recommend not writing yet another library from scratch however, as
we probably have a dozen such parsers already and it's confusing enough
as it is. ;)

a.

-- 
L'ennui avec la grande famille humaine, c'est que tout le monde veut en
être le père.
 - Mafalda
Re: Dealing with renamed source packages during CVE triaging
Sorry for resurrecting this old thread, but I've been looking at how to
deal with renamed packages in CVE triaging again.

When we last talked about this, we observed how we were sometimes
missing packages during triage, e.g. `tiff3` that was present in
wheezy. That's not an issue anymore since wheezy is gone, but the
problem occurs more broadly in other packages. In fact, it seems to me
this is similar to the broader problem of embedded code copies: we could
generalize renamed packages to the embedded code copies problem.

We have a database of those in data/embedded-code-copies already,
although I'm not sure how up to date that file actually is, nor how it
is currently used in the workflow. It seems to me any database of
renames we could build would clearly overlap with the
embedded-code-copies file, so I figured I would write a script (Python,
we already have Perl and bash ones...) to start with. I have tried to
upload this in a fork on salsa but gave up as the push (of a single
commit!) was stuck "resolving deltas"... Anyways, here's the snippet:

https://salsa.debian.org/anarcat/security-tracker/snippets/70

The next step is to figure out how to actually modify the data/CVE/list
file to introduce the changes. Considering the large number of packages
in the embedded-code-copies file, I am not sure we want to retroactively
change all previous entries. jmm suggested we run a cronjob that would
keep track of where it is in history, which would resolve this nicely.

One question that remains is what, exactly, to add in the CVE
metadata. One problem we faced last time we looked at this is that we
needed to add an entry like:

	SOURCEPACKAGE

... which would (e.g.) get triaged to:

	SOURCEPACKAGE
	[wheezy] SOURCEPACKAGE

(or whatever) later on. This requires inside knowledge of the suites and
their packages, something I find surprisingly hard to do in the security
tracker.
With embedded-code-copies, we will have to add something for all the
other source packages, e.g.:

	OTHERSOURCE

Right now, it seems that all scripts that hammer at those files do so
with their own ad-hoc parsing code. Is that the recommended way of
chopping those files up? Or is there a better parsing library out there?

Thanks for any advice,

A.
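To illustrate what "ad-hoc parsing" means in practice, the pattern most of these scripts follow is roughly this. It is a generic sketch, not any particular script from the repository, and the record layout is simplified:

```python
def parse_cve_list(lines):
    """group data/CVE/list-style text into {id, notes} records

    a line starting with CVE- opens a record; the indented lines
    below it are kept as that record's annotations."""
    entries = []
    for line in lines:
        if line.startswith("CVE-"):
            entries.append({"id": line.split()[0], "notes": []})
        elif entries and line.strip():
            entries[-1]["notes"].append(line.strip())
    return entries

entries = parse_cve_list([
    "CVE-2005-3276 (The sys_get_thread_area function ...)",
    "\tTODO: check",
    "CVE-2005- [Insecure caching of user id in mantis]",
    "\t- mantis (bug #330682; unknown)",
])
assert entries[0]["id"] == "CVE-2005-3276"
assert entries[1]["notes"] == ["- mantis (bug #330682; unknown)"]
```

As the thread shows, such parsers silently mishandle format "violations" buried in history (like the "begin claimed by" markers), which is one argument for a single shared parsing library.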
Re: pulling in other vulnerability databases
On 2018-01-26 00:31:19, Ben Hutchings wrote:

> On Thu, 2018-01-25 at 10:17 -0500, Antoine Beaupré wrote:
> [...]
>> > OS vendors (RH/SUSE)
>> > Upstream projects (Xen, Linux etc)
>>
>> I believe those already follow the CVE process and eventually converge
>> over doing the right thing. So I am not really concerned about those
>> people.
> [...]
>
> Linux has a security contact (secur...@kernel.org), but this is only
> used for reporting bugs and discussing how to fix them; CVE assignments
> are left to distributions, DWF, etc. Many security fixes don't get
> discussed there anyway.
>
> I would estimate that less than half of security fixes in Linux
> actually get CVE IDs.

Well that's just disturbing. I am not sure, however, that I can
meaningfully change this by ... er... say writing the kernel mailing
lists, unfortunately.

I haven't got a reply from Snyk.io (yet?) by the way. I suspect I may
not get anything at all...

Any other ideas as to the next steps in general here?

a.

-- 
May your trails be crooked, winding, lonesome, dangerous, leading to the
most amazing view. May your mountains rise into and above the clouds.
 - Edward Abbey
Re: pulling in other vulnerability databases
On 2018-01-25 16:27:58, Moritz Mühlenhoff wrote:

> Antoine Beaupré wrote:
>> So, regarding the first two (and similar), someone needs to teach those
>> folks about proper security tracking here... ;) Should I contact them
>> directly?
>
> Who in particular? Node and Snyk? Sure, go ahead.

I contacted snyk.io with mitre and the secteam in cc. I'm not sure what
to say to the nodesecurity.io folks: there's a lot of stuff in there,
and I'm not sure it affects us so much. I couldn't find a package
without a CVE that is also in Debian in the first few pages of their
advisory pages...

pabs, did you have any issues in mind that were problematic here
specifically? in any case, feel free to reuse the message I sent to snyk
as a template for others...

a.
Re: improving the report-vuln script
Those changes all look good to me. Note that I worked on a separate git
repo and have more granular patches for my changes if that's useful.

Thanks!

-- 
I know where I am going, and I know the truth, and I do not have to be
what you want me to be. I am free to be what I want.
 - Muhammad Ali
Re: improving the report-vuln script
On 2017-03-31 06:07:08, Salvatore Bonaccorso wrote:

> Hi Antoine,
>
> At the top of the this mail: fist of all thanks for your work on the
> report-vuln script. I use it extensively in it's current form in a two
> staged workflow. 1. generate a template for reportbug, then feed it
> int via reportbug -i ... and do some cleanups before sending the mail.

Interesting! I couldn't figure out how to use reportbug productively...
surely it would be nice to have a better tip for this in the file
itself...

> On Thu, Mar 30, 2017 at 03:04:39PM -0400, Antoine Beaupré wrote:

[...]

>> --affected AFFECTED affected version (default: unspecified)
>
> This one might be a bit confusing: Would you change it to --version?
> Does this makes sense? or is --version more confusing because
> interpreted as to show version of report-vuln? So we might leave it as
> --affected.

I found --version confusing, especially since there was already
"include_version" in the code (which was even more confusing, because it
actually meant "include blanks"). Furthermore, --version, by convention,
shows the script version number and users may not expect it to actually
run the script (a bit like --help).

>> --severity SEVERITY severity (default: grave)
>
> I'm not sure: I think severity important as default would be safer.

Sure. I just kept the existing default.

> The severity is often choosed after evaluation really to grave. But I
> think it should always be a deliberate choice to set a RC severity. So
> I would tend to set default severity to important. This might be
> disputed though. The current script actually (if --no-blanks is set)
> set it already to grave. So I'm proposing a change here.

I agree with the proposed change, makes sense. It's one of the reasons I
added the flag in the first place - I didn't want to file RC bugs for
issues that were no-dsa...

>> --no-cc, --cc add X-Debbugs-CC header to
>> --cc-list CCLIST list of addres to add in CC (default:
>
> Typo: address? (but see below).

Typical.
:)

>> ['t...@security.debian.org', 'secure-testing-
>> t...@lists.alioth.debian.org'])
>>
>> I was not sure in which case multiple CVEs could be used - should we
>> report multiple CVEs in a single bug? that seems like bad
>> practice... IMHO, each bug should have its own CVE...
>
> It always depends! There are cases (and sometimes prferences ;-))
> where one fills multiple CVEs in one bugreport. This is perfectly
> fine. My approach for example is: if I know all reported CVEs are with
> known fixes, all supported versions in lower sites are affected as
> well for the same set of CVEs, then one bugeport is fine. Otherwise I
> try to split them up addording to the found/triaged versions as well
> making individual bugs for individual CVEs. But supporting reporting
> mutliple CVEs is perfectly fine.

Okay, this is the heuristic I was looking for. How about we explain this
in the "epilog" of the usage or at least somewhere?

>> Ideally, this script would interactively submit the bug and wait for
>> confirmation, but it seems this is something that is notoriously hard to
>> do in the Debian BTS - I personnally haven't found a way to promptly get
>> a bug identifier when submitting bugs other than waiting for the
>> autoreply to kick in, which can take some time.
>
> Possibly not feasable in the current form.

Right.

>> Since this is not something the LTS team directly uses all the time -
>> it's more something for the main security team - I figured it would be
>> safer to submit it for review here, especially since the changes are
>> rather large. Let me know if there's a better place to discuss this if
>> this isn't appropriate.
>
> Appreciated. I will try to use your new version and report back to
> you, can you please though wait before commiting.

No problem, of course. I wish I could have pushed this to a feature
branch or something... ;)

>> Attached is the patch for the new features and a fresh copy which is
>> only twice as long as the diff...
>
> Thanks.
Handy :) let's see...

>> --- ../secure-testing/bin/report-vuln	2017-03-30 14:17:31.262515489 -0400
>> +++ report-vuln	2017-03-30 14:53:50.614772451 -0400
>> @@ -19,6 +19,7 @@
>>  #
>>  # export http_proxy if you need to use an http proxy to report bugs
>>
>> +import argparse
>>  import sys, re, urllib, os
>>
>>  temp_id = re.compile('(?:CVE|cve)\-[0-9]{4}-')
>> @@ -112,7 +113,7 @@
>>
>>  	return ret + '\n'
>>
>> -def gen_text(pkg, cveid, include_version = False, severity = 'FILLINSEVERITY'):
improving the report-vuln script
Hi,

I had some spare time today and I figured I would look at the backlog of
unreported vulnerabilities here:

https://security-tracker.debian.org/tracker/status/unreported

This is one of the housekeeping tasks we were told was useful for the
security team here:

https://wiki.debian.org/LTS/Development#Housekeeping_Tasks

The instructions for using the report-vuln script weren't quite
accurate: first, it can't really be used with reportbug, from what I can
tell, at least not in a useful way. Reportbug will still prompt for all
its usual stuff, which makes it less effective. Furthermore, the
recommended use (in the script itself) of Mutt doesn't add the necessary
headers, and I found the "blanks" argument rather confusing.

Therefore, I have rewritten parts of the script to make it easier to
automate. It's now possible to specify the severity and affected
versions directly on the commandline, and the usage is much clearer.

Before:

$ ./bin/report-vuln -h
./bin/report-vuln [--no-blanks]

Now:

usage: report-vuln [-h] [--no-blanks] [--affected AFFECTED]
                   [--severity SEVERITY] [--no-cc] [--cc-list CCLIST]
                   pkg cve [cve ...]

positional arguments:
  pkg                  affected package
  cve                  relevant CVE for this issue, may be used multiple time
                       if the issue has multiple CVEs

optional arguments:
  -h, --help           show this help message and exit
  --no-blanks, --blanks
                       include blank fields to be filled (default: False)
  --affected AFFECTED  affected version (default: unspecified)
  --severity SEVERITY  severity (default: grave)
  --no-cc, --cc        add X-Debbugs-CC header to
  --cc-list CCLIST     list of addres to add in CC (default:
                       ['t...@security.debian.org', 'secure-testing-
                       t...@lists.alioth.debian.org'])

I was not sure in which case multiple CVEs could be used - should we
report multiple CVEs in a single bug? that seems like bad practice...
IMHO, each bug should have its own CVE...
Ideally, this script would interactively submit the bug and wait for confirmation, but it seems this is something that is notoriously hard to do in the Debian BTS - I personally haven't found a way to promptly get a bug identifier when submitting bugs other than waiting for the autoreply to kick in, which can take some time. Since this is not something the LTS team directly uses all the time - it's more something for the main security team - I figured it would be safer to submit it for review here, especially since the changes are rather large. Let me know if there's a better place to discuss this if this isn't appropriate. Attached is the patch for the new features and a fresh copy which is only twice as long as the diff...

--- ../secure-testing/bin/report-vuln	2017-03-30 14:17:31.262515489 -0400
+++ report-vuln	2017-03-30 14:53:50.614772451 -0400
@@ -19,6 +19,7 @@
 #
 # export http_proxy if you need to use an http proxy to report bugs

+import argparse
 import sys, re, urllib, os

 temp_id = re.compile('(?:CVE|cve)\-[0-9]{4}-')
@@ -112,7 +113,7 @@

     return ret + '\n'

-def gen_text(pkg, cveid, include_version = False, severity = 'FILLINSEVERITY'):
+def gen_text(pkg, cveid, blanks = False, severity = 'FILLINSEVERITY', affected=None, cc=False, cclist=None):
     vuln_suff = 'y'
     cve_suff = ''
     time_w = 'was'
@@ -124,8 +125,13 @@
         time_w = 'were'

     header = '''Package: %s\n''' % (pkg)
-    if include_version:
-        header += 'Version: FILLINAFFECTEDVERSION\n'
+    if affected is None:
+        if blanks:
+            header += "Version: FILLINAFFECTEDVERSION\n"
+    else:
+        header += "Version: %s\n" % affected
+    if cc and len(cclist) > 0:
+        header += "X-Debbugs-CC: %s\n" % " ".join(cclist)
     header += '''Severity: %s
 Tags: security
@@ -160,31 +166,54 @@
     print '\nhttps://security-tracker.debian.org/tracker/source-package/%s' % (pkg)
     print '(issues without CVE id are assigned a TEMP one, but it may change over time)\n'
-    if not include_version:
-        print '''Please adjust the affected versions in the BTS as needed.\n'''
+    if not blanks:
+        print '''\nPlease adjust the affected versions in the BTS as needed.\n'''

 def error(msg):
     print 'error: ' + msg
     sys.exit(1)

-def usage():
-    print sys.argv[0], '[--no-blanks] '
-    sys.exit(0)
+class NegateAction(argparse.Action):
+    '''add a toggle flag to argparse
+
+    this is similar to 'store_true' or 'store_false', but allows
+    arguments prefixed with --no to disable the default. the default
+    is set depending on the first argument - if it starts with the
+    negative form (defined by default as '--no'), the default is False,
+    otherwise True.
+    '''
+
+    negative = '--no'
+
+    def __init__(self, option_strings, *args, **kwargs):
+        '''set default depending on the first argument'''
+        default = not
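Since the patch above is cut off right at the interesting part, here is a standalone sketch of the same --foo/--no-foo toggle idea, written in Python 3 syntax. The class below is my own illustration, not the code from the attached patch (only the NegateAction name comes from it):

```python
import argparse

class NegateAction(argparse.Action):
    """Toggle flag: '--blanks' stores True, '--no-blanks' stores False.

    Illustrative sketch only -- not the implementation from the
    report-vuln patch (which is truncated above), just the same idea.
    """

    def __init__(self, option_strings, *args, **kwargs):
        # the default depends on which spelling is listed first:
        # if the first option string starts with '--no', default to False
        kwargs['default'] = not option_strings[0].startswith('--no')
        kwargs['nargs'] = 0  # the flag consumes no arguments
        super().__init__(option_strings, *args, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # '--blanks' sets True, '--no-blanks' sets False
        setattr(namespace, self.dest, not option_string.startswith('--no'))

parser = argparse.ArgumentParser()
parser.add_argument('--blanks', '--no-blanks', dest='blanks',
                    action=NegateAction,
                    help='include FILLIN blanks in the template')
print(parser.parse_args([]).blanks)               # True (default)
print(parser.parse_args(['--no-blanks']).blanks)  # False
```

This keeps both spellings in --help output while storing a single boolean, which is what the --no-blanks handling in the patch is after.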
Bug#761945: fixing links for DLAs in the security tracker
On 2017-03-30 08:38:07, Salvatore Bonaccorso wrote: > Hi Antoine, > > On Wed, Mar 29, 2017 at 03:49:31PM -0400, Antoine Beaupré wrote: >> On 2017-03-29 17:02:44, Salvatore Bonaccorso wrote: >> > Hi Antoine, >> >> Hi! >> >> > If you want to look at this part: There is a ./parse-dla.pl script in >> > the webwml CVS, which is used to import the DLAs (this is an >> > analogous script to parse-advisory.pl which is used to import the >> > DSAs). >> >> I see... The scripts are in /english/security for anyone looking. And if >> people are (like me) thinking "... wat.. CVS?" then yes, we are still >> using this: >> >> https://www.debian.org/devel/website/using_cvs >> >> My cvs commandline finger memory is *definitely* still there though, so >> that works for me. :) >> >> > The "manual" steps one would perform are roughly: >> > >> > ./parse-dla.pl $message >> > cvs add $year/dla-$nr.{wml,data} >> > cvs commit -m '[DLA $nr] $source security update' >> >> Is this something the security team performs as part of the DSA release >> process? Or is this something the debian-www people do? I guess you need write access to the repository and I see that *you* do, but is this expected from everyone working on releasing public advisories, the same way we need access to the security tracker? > No it's not something we do as part of a regular DSA releasing > process, and as well not expected to do so, as the websites are under > debian-www "domain" (and btw, they do a great job!). But often, when I > have time I do as well the import (but as you will see from cvs log, > not always). For the security team the current process is: preparing > the DSA (packages, tracker work, text, releasing packages), send out > the advisory (at this stage our work is basically done). Okay, so this is just something the www team needs to catch up with then... 
>> And to import older entries, we'll need the original templates, which we >> deliberately did *not* commit anywhere, so they are basically available >> only as mailing list archives, and thus hard to find automatically. > > But given the debian-lts-announce is archived, shouldn't it be > relatively easy to first grab all announces from > https://lists.debian.org/debian-lts-announce/ then check which one > need to still be imported, extract the mail and do the import? Or do I > misunderstand you? I was assuming I was just web-browsing, but it's true I can probably access the mailboxes and grep through this directly. It's still a pain in the butt. ;) >> I foresee difficulties in importing the missing data... >> >> Here's the bits that are missing: >> >> * the last DLA on the website is DLA-445-2, which is basically the last >>DLA before squeeze support ended and wheezy was handed over >> >> * among those 445 DLAs, there are actually 31 missing: >> >>webwml$ cd english/security/; find -name 'dla-*.wml' | wc -l >>424 >> >> * even worse, it seems there are at least 20 advisories missing from >>the website because regression uploads hide advisories, because our >>naming convention differs from DSA ("DLA-XXX-N", where XXX is the >>original advisory and N are regression updates) > > I do not understand this point. What do you mean by hiding? For DSA's > as well only one https://www.debian.org/security/$year/dsa-$nr is ever > visible as well (and it depends if the text has been then updated > according to a regression update or not, and in DLAs cases I guess > just only the last iteration might have been imported, not the initial > -1 one). Here's an example. 
DLA-445-1 was a Squid upload to squeeze, announced here: https://lists.debian.org/debian-lts-announce/2016/02/msg00037.html it caused a regression, which was fixed in DLA-445-2, announced here: https://lists.debian.org/debian-lts-announce/2016/03/msg1.html You can see both DLAs in the sectracker: https://security-tracker.debian.org/tracker/DLA-445-1 https://security-tracker.debian.org/tracker/DLA-445-2 (Note, BTW, that the regression update doesn't refer to the previous DLA or the CVE, you just need to know the convention to figure that out.) On the website, you only see the regression update: https://www.debian.org/security/2016/dla-445 That is the equivalent of DLA-445-2. I am not sure that DLA-445-1 is anywhere. I guess that's another bug to report as well? [...] >> > having something on the debian- side which does this >> > automatically, once a DSA or DLA arrives would help surely the >> > debian-www team who
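The "hiding" described above is mechanical: the website keeps one page per base advisory number, so every tracker id maps onto it by dropping the "-N" revision suffix. A hypothetical helper to illustrate (the function is my invention; only the URL layout comes from the examples in this mail):

```python
import re

def dla_website_url(dla_id, year):
    """Map a tracker-style DLA id to its www.debian.org URL.

    Illustrative sketch: the website drops the '-N' revision suffix,
    so DLA-445-1 and DLA-445-2 both land on .../2016/dla-445, which
    is why a regression update 'hides' the original advisory.
    """
    m = re.match(r'DLA-(\d+)(?:-\d+)?$', dla_id)
    if m is None:
        raise ValueError('not a DLA identifier: %r' % (dla_id,))
    return 'https://www.debian.org/security/%d/dla-%s' % (year, m.group(1))

print(dla_website_url('DLA-445-1', 2016))
print(dla_website_url('DLA-445-2', 2016))
# both print https://www.debian.org/security/2016/dla-445
```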
Bug#761945: fixing links for DLAs in the security tracker
On 2017-03-29 17:02:44, Salvatore Bonaccorso wrote: > Hi Antoine, Hi! > If you want to look at this part: There is a ./parse-dla.pl script in > the webwml CVS, which is used to import the DLAs (this is an > analogous script to parse-advisory.pl which is used to import the > DSAs). I see... The scripts are in /english/security for anyone looking. And if people are (like me) thinking "... wat.. CVS?" then yes, we are still using this: https://www.debian.org/devel/website/using_cvs My cvs commandline finger memory is *definitely* still there though, so that works for me. :) > The "manual" steps one would perform are roughly: > > ./parse-dla.pl $message > cvs add $year/dla-$nr.{wml,data} > cvs commit -m '[DLA $nr] $source security update' Is this something the security team performs as part of the DSA release process? Or is this something the debian-www people do? I guess you need write access to the repository and I see that *you* do, but is this expected from everyone working on releasing public advisories, the same way we need access to the security tracker? And to import older entries, we'll need the original templates, which we deliberately did *not* commit anywhere, so they are basically available only as mailing list archives, and thus hard to find automatically. I foresee difficulties in importing the missing data... 
Here are the bits that are missing: * the last DLA on the website is DLA-445-2, which is basically the last DLA before squeeze support ended and wheezy was handed over * among those 445 DLAs, there are actually 31 missing: webwml$ cd english/security/; find -name 'dla-*.wml' | wc -l 424 * even worse, it seems there are at least 20 advisories missing from the website because regression uploads hide advisories, because our naming convention differs from DSA ("DLA-XXX-N", where XXX is the original advisory and N are regression updates) $ grep DLA- data/DLA/list | sed 's/.* DLA-//;s/ .*//' | sort -n | sed '/445-2/,$d' | wc -l 465 * the canonical list has 928 advisories: secure-testing$ grep DLA- data/DLA/list | wc -l 928 So, lots of work there. > The background work leading to that was done by Frank Lichtenheld in > #762255. Great to see that! It does seem problematic to import regression updates however. > having something on the debian- side which does this > automatically, once a DSA or DLA arrives would help surely the > debian-www team who then "only" have to do the translations and fix > obvious mistakes. OTOH keep in mind: When the debian- team imports > a DSA or DLA they may need to do some adjustments so, I'm not sure if > it's liked to have the automatism, since sometimes before cvs commit > some changes need to be done on the .wml file. It looks like this is something that should be discussed with the www people... Maybe a bug against www.debian.org? This raises the question, however: wouldn't it be simpler to import those advisories in the security tracker directly? At least, we should figure out why the imports have ceased after wheezy-LTS started... > Writing the above a bit in a hurry let me know if unclear what I > meant. Thanks for the response! A. -- What this country needs is more unemployed politicians. - Angela Davis
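The grep/sed pipelines above count advisories; the same cross-check between the tracker's data/DLA/list and the webwml tree can be sketched in a few lines of Python. This is illustrative only: the helper does not exist anywhere, and the list-line format it assumes ('[24 Feb 2016] DLA-445-1 squid3 - security update') is inferred from the pipelines, not verified:

```python
import re

def missing_dlas(list_lines, wml_names):
    """Return DLA numbers recorded in the tracker but absent on the website.

    Illustrative sketch, not an existing tool. Assumes data/DLA/list
    lines look like '[24 Feb 2016] DLA-445-1 squid3 - security update'
    (an assumption consistent with the grep/sed pipelines above), and
    compares on the base number, since the website keeps at most one
    dla-NNN.wml per advisory regardless of '-N' regression revisions.
    """
    tracker = set()
    for line in list_lines:
        m = re.match(r'\[[^]]*\] DLA-(\d+)-\d+', line)
        if m:
            tracker.add(int(m.group(1)))
    website = set()
    for name in wml_names:
        m = re.match(r'dla-(\d+)\.wml$', name)
        if m:
            website.add(int(m.group(1)))
    return sorted(tracker - website)
```

Fed the real data/DLA/list and the basenames from the find command above, this would enumerate exactly which advisories still need importing, instead of just counting them.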
Bug#761945: fixing links for DLAs in the security tracker
On 2017-03-29 07:29:06, Salvatore Bonaccorso wrote: > Hi, > > On Wed, Mar 29, 2017 at 06:28:49AM +0200, Salvatore Bonaccorso wrote: >> Hi, >> >> On Tue, Mar 28, 2017 at 10:16:52PM +, Holger Levsen wrote: >> > On Tue, Mar 28, 2017 at 10:35:34PM +0200, Moritz Muehlenhoff wrote: >> > > Well, you don't have a web site comparable to >> > > https://www.debian.org/security/2017/dsa-3796, so where should >> > > it possibly link to? >> > >> > I guess it's time to create this "web site" then :) >> >> See as well https://bugs.debian.org/761945 (and respective clones for >> debian-). > > The security-tracker side of this has been implemented now, Paul Wise > did the corresponding work. But around 400 DLA's are not yet imported > so many links will show a page not found. > > A working example: > https://security-tracker.debian.org/tracker/DLA-55-1 or > https://security-tracker.debian.org/tracker/DLA-400-1 So I guess the next steps are for the LTS team: 1. update the documentation so that updating the website is part of the workflow 2. import old DLA advisories into the websites I can try and complete this by the end of the week. > p.s.: generally: for changes to the security-tracker, please do not use > debian-lts but rather the security-tracker list (or even > better/depending on case via bugreports). Understood. I wasn't sure, but will try to keep that in mind in the future... A. -- Rock journalism is people who can't write, interviewing people who can't talk, in order to provide articles for people who can't read. - Frank Zappa
Bug#776738: security-tracker: ignoring end-of-life packages (was: squeeze/wheezy updates of Redmine (+ long term state of redmine packaging))
On 2016-01-01 11:30:36, Raphael Hertzog wrote: > Hi, > > On Thu, 31 Dec 2015, Antoine Beaupré wrote: >> > I have thus pushed the attached patch to the git repository of >> > debian-security-support. Ccing the security team to inform them >> > of this change. >> >> Understood. >> >> Would it make sense to fix the secure-testing tracker to show those? It >> would have stopped me right there, if it would have shown the older >> versions were unsupported... > > Yes, I agree that the tracker should know which version is end-of-life > and no longer supported... and we should not have to re-add this information > every time. > > The only downside is that we would be less likely to properly record when > issues do not apply to the unsupported packages. > > In fact I would have expected to have a bug report open here: > https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=security-tracker;dist=unstable > > But I don't see anything that looks close yet. Maybe we should start by > requesting this first... Well, there are a few similar issues filed: * #772961: Please display no-dsa/end-of-life tags in "open issues" table in "source package view" * #776738: end-of-life issues clutter the list of open issues * #780892: please show unsupported packages as unsupported instead of unimportant #776738 seems to be the most relevant to me, so I added it in CC. The security team (in CC) also privately suggested a way this could be implemented, but I will let them paste that here. A. -- O gentilshommes, la vie est courte. Si nous vivons, nous vivons pour marcher sur la tête des rois. - William Shakespeare