Bug#908678: Testing the filter-branch scripts

2018-11-13 Thread Antoine Beaupré
On 2018-11-12 12:22:58, Antoine Beaupré wrote:
> I'll start a run on the whole history to see if I can find any problems,
> as soon as a first clone finishes resolving those damn deltas. ;)

The Python job finished successfully here after 10 hours.

I did some tests on the new git repository. Cloning the repository from
scratch takes around 2 minutes (the original repo: 21 minutes). It is
145MB while the original repo is 1.6GB.

Running git annotate on data/CVE/list.2018 takes about 26 seconds, while
it takes basically forever to annotate the original data/CVE/list. (It's
been running for 10 minutes here already.)

So that's about it. I have not done a thorough job of checking the
actual *integrity* of the results. It's difficult, considering CVE
identifiers are not sequential in the data/CVE/list file, so a naive
diff like this will fail:

$ diff -u <(cat 
../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999}
 ) data/CVE/list | diffstat
 list |106562 +--
 1 file changed, 53281 insertions(+), 53281 deletions(-)

But at least the numbers add up: it looks like no line is lost. And
indeed, it looks like all CVEs add up:

$ diff -u <(cat 
../security-tracker-full-test-filtered-bis/data/CVE/list.{2019,2018,2017,2016,2015,2014,2013,2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999}
 | grep ^CVE | sort -n ) <( grep ^CVE data/CVE/list | sort -n  ) | diffstat
 0 files changed
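Since the identifiers are not in a stable order, comparing the CVE lines as
multisets is really the check that matters. The pipeline above can be
sketched in Python like this (a rough sketch; file contents are passed in
as strings):

```python
from collections import Counter

def cve_ids(text):
    # count the CVE identifiers (first column) in a CVE/list blob;
    # a Counter also catches accidental duplicates, not just lost lines
    return Counter(line.split()[0] for line in text.splitlines()
                   if line.startswith("CVE-"))

def same_cves(original, split_files):
    # True when the per-year files together hold exactly the same
    # CVE lines, with the same multiplicities, as the original file
    combined = Counter()
    for blob in split_files:
        combined += cve_ids(blob)
    return combined == cve_ids(original)

# toy data standing in for data/CVE/list and the data/CVE/list.YEAR files
original = "CVE-2018-0001 (foo ...)\n\tTODO: check\nCVE-2017-0002 (bar ...)\n"
split = ["CVE-2018-0001 (foo ...)\n", "CVE-2017-0002 (bar ...)\n"]
print(same_cves(original, split))  # prints True
```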

A cursory look at the diff seems to indicate it is clean, however.

I looked at splitting that file per CVE. That did not scale and just
created new problems. But splitting by *year* seems like a very
efficient approach, and I think it would be worth pursuing that idea
further.

A.

-- 
There is no cloud, it's just someone else's computer.
   - Chris Watterson



Bug#908678: Testing the filter-branch scripts

2018-11-12 Thread Antoine Beaupré
On 2018-11-10 18:56:01, Daniel Lange wrote:
> Antoine,
>
> thank you very much for your filter-branch scripts.

you're welcome! glad it can be of use.

> I tested each:
>
> 1) the golang version:
> It completes after 3h36min:
>
> # git filter-branch --tree-filter '/split-by-year' HEAD
> Rewrite a09118bf0a33f3721c0b8f6880c4cbb1e407a39d (68282/68286) (12994 seconds 
> passed, remaining 0 predicted)
> Ref 'refs/heads/master' was rewritten
>
> But it doesn't Close() the os.OpenFile handles so ...
> all data/CVE/list. files are 0 bytes long. Sic!

Well. That explains part of the performance difference. ;)

There were multiple problems with the golang source - variable shadowing
and, yes, a missing Close(). Surprisingly, the fixed version is
*slower* than the equivalent Python code, taking about one second per
run or 1102 seconds for the last 1000 commits. I'm at a loss as to how I
managed to make go run slower than Python here (and can't help but think
C would have been easier, again). Probably poor programming on my
part. New version attached.

[...]

> 2.1) the Python version
> You claim #!/usr/bin/python3 in the shebang, so I tried that first:
>
> # git filter-branch --tree-filter '/usr/bin/python3 
> /__pycache__/split-by-year.cpython-35.pyc' HEAD
> Rewrite 990d3c4bbb49308fb3de1e0e91b9ba5600386f8a (1220/68293) (41 seconds 
> passed, remaining 2254 predicted)
>   Traceback (most recent call last):
>   File "split-by-year.py", line 13, in 
>   File "/usr/lib/python3.5/codecs.py", line 321, in decode
> (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 5463: 
> invalid start byte
> tree filter failed: /usr/bin/python3 /__pycache__/split-by-year.cpython-35.pyc

I suspected this would be a problem, but didn't find any occurrence in
the shallow clone so I forgot about it. Note that the golang version
takes great care to treat the data as binary...

> The offending commit is:
> * 990d3c4bbb - Rename sarge-checks data to something not specific to sarge, 
> since we're working on etch now.
>   Sorry for the probable annoyance, but it had to be done. (13 years ago) 
> [Joey Hess]
>
> There will be many more like this, so for Python3
> this needs to be made unicode-agnostic.

... so I rewrote the thing to handle only binary and tested it against
that version of the file. It seems to work fine.

> Notice I compiled the .py to .pyc which makes it
> much faster and thus well usable.

Interesting. I didn't see much difference in performance in my
benchmarks on average, but the worst-case run did improve by 150ms, so I
guess this is worth the trouble. For those who didn't know (like me)
this means running:

python -m compileall bin/split-by-year.py

whenever the .py file changes (right?).

> 2.2) Python, when a string was a string .. Python2
> Your code is actually Python2, so why not give that a try:
>
> # git filter-branch --tree-filter '/usr/bin/python2 /split-by-year.pyc' HEAD
> Rewrite b59da20b82011ffcfa6c4a453de9df58ee036b2c (2516/68293) (113 seconds 
> passed, remaining 2954 predicted)
>   Traceback (most recent call last):
>   File "split-by-year.py", line 18, in 
> yearly = 'data/CVE/list.{:d}'.format(year)
> NameError: name 'year' is not defined
> tree filter failed: /usr/bin/python2 /split-by-year.pyc
>
> The offending commit is:
> * b59da20b82 - claim (13 years ago) [Moritz Muehlenhoff]
> | diff --git a/data/CVE/list b/data/CVE/list
> | index 7b5d1d21d6..cdf0b74dd0 100644
> | --- a/data/CVE/list
> | +++ b/data/CVE/list
> | @@ -1,3 +1,4 @@
> | +begin claimed by jmm
> |  CVE-2005-3276 (The sys_get_thread_area function in process.c in Linux 2.6 
> before ...)
> |   TODO: check
> |  CVE-2005-3275 (The NAT code (1) ip_nat_proto_tcp.c and (2) 
> ip_nat_proto_udp.c in ...)
> | @@ -34,6 +35,7 @@ CVE-2005-3260 (Multiple cross-site scripting (XSS) 
> vulnerabilities in ...)
> |   TODO: check
> |  CVE-2005-3259 (Multiple SQL injection vulnerabilities in 
> versatileBulletinBoard (vBB) ...)
> |   TODO: check
> | +end claimed by jmm
> |  CVE-2005- [Insecure caching of user id in mantis]
> |   - mantis  (bug #330682; unknown)
> |  CVE-2005- [Filter information disclosure in mantis]
>
> As you see, the line "+begin claimed by jmm" breaks the too-simplistic 
> parser logic.
> Unfortunately, when dry-running against a current version of 
> data/CVE/list, such errors do not show up.
> The "violations" of the file format are transient and buried in history.

Hmm... That's a trickier one. I guess we could just pretend that line
doesn't exist and drop it from history... But I chose to buffer it and
treat it like the CVE line so it gets attached to the right file. See if
it does what you expect.

   git cat-file -p b59da20b82:data/CVE/list > data/CVE/list.b59da20b82
   split-by-year.py data/CVE/list.b59da20b82
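For the record, that buffering approach can be sketched like this (an
illustrative Python sketch, not the attached script itself: everything
stays bytes because of the non-UTF-8 commits, and stray lines like "begin
claimed by jmm" ride along to the next CVE entry's year):

```python
def split_by_year(data):
    # Split the binary content of data/CVE/list into {year: bytes}.
    # Lines are kept as bytes throughout, since history contains
    # non-UTF-8 commits. Lines that neither start a CVE entry nor
    # continue one (no leading whitespace) are buffered and attached
    # to the next CVE entry seen.
    out = {}
    pending = []          # buffered lines waiting for the next CVE
    year = None
    for line in data.splitlines(keepends=True):
        if line.startswith(b"CVE-"):
            year = line.split(b"-")[1].decode("ascii")
            out.setdefault(year, []).extend(pending)
            pending = []
            out[year].append(line)
        elif line[:1].isspace() and year is not None:
            out[year].append(line)   # continuation of the current entry
        else:
            pending.append(line)     # e.g. "begin claimed by jmm"
    return {y: b"".join(lines) for y, lines in out.items()}
```

A driver would then write each blob out to data/CVE/list.YEAR and remove
the original file, as the attached scripts do.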

Performance-wise, I shaved off a surprising 60ms by enclosing all the
code in a function.

Bug#908678: Some more thoughts and some tests on the security-tracker git repo

2018-11-09 Thread Antoine Beaupré
On 2018-11-09 16:05:06, Antoine Beaupré wrote:
>  2. do a crazy filter-branch to send commits to the right
> files. considering how long an initial clone takes, i can't even
> begin to imagine how long *that* would take. but it would be the
> most accurate simulation.
>
> Short of that, I think it's somewhat dishonest to compare a clean
> repository with split files against a repository with history over 14
> years and thousands of commits. Intuitively, I think you're right and
> that "sharding" the data in yearly packets would help a lot git's
> performance. But we won't know until we simulate it, and if we hit that
> problem again 5 years from now, all that work will have been for
> nothing. (Although it *would* give us 5 years...)

So I've done that crazy filter-branch, on a shallow clone (1000
commits). The original clone is about 30MB, but the split repo is only
4MB.

Cloning the original repo takes a solid 30+ seconds:

[1221]anarcat@curie:src130$ time git clone 
file://$PWD/security-tracker-1000.orig security-tracker-1000.orig-test
Cloning into 'security-tracker-1000.orig-test'...
remote: Enumerating objects: 5291, done.
remote: Counting objects: 100% (5291/5291), done.
remote: Compressing objects: 100% (1264/1264), done.
remote: Total 5291 (delta 3157), reused 5291 (delta 3157)
Receiving objects: 100% (5291/5291), 8.80 MiB | 19.47 MiB/s, done.
Resolving deltas: 100% (3157/3157), done.
64.35user 0.44system 0:34.32elapsed 188%CPU (0avgtext+0avgdata 
200056maxresident)k
0inputs+58968outputs (0major+48449minor)pagefaults 0swaps

Cloning the split repo takes less than a second:

[1223]anarcat@curie:src$ time git clone 
file://$PWD/security-tracker-1000-filtered security-tracker-1000-filtered-test
Cloning into 'security-tracker-1000-filtered-test'...
remote: Enumerating objects: 2214, done.
remote: Counting objects: 100% (2214/2214), done.
remote: Compressing objects: 100% (1190/1190), done.
remote: Total 2214 (delta 936), reused 2214 (delta 936)
Receiving objects: 100% (2214/2214), 1.25 MiB | 22.78 MiB/s, done.
Resolving deltas: 100% (936/936), done.
0.25user 0.04system 0:00.38elapsed 79%CPU (0avgtext+0avgdata 8200maxresident)k
0inputs+8664outputs (0major+3678minor)pagefaults 0swaps

So this is clearly a win, and I think it would be possible to rewrite
the history using the filter-branch command. Commit IDs would change,
but we would keep all commits and so annotate and all that good stuff
would still work.

The split-by-year bash script was too slow for my purposes: it was
taking a solid 15 seconds for each run, which meant it would have taken
9 *days* to process the entire repository.

So I tried to see if this could be optimized, so that we could split the
file while keeping history without having to shut down the whole system
for days. I first rewrote it in Python, which processed the 1000 commits
in 801 seconds. This gives an estimate of 15 hours for the 68278 commits
I had locally. Concerned about the Python startup time, I then tried
golang, which processed the tree in 262 seconds, giving a final estimate
of 4.8 hours.

Attached are both implementations, for those who want to reproduce my
results. Note that they differ from the original implementation in that
they have to (naturally) remove the data/CVE/list file itself otherwise
it's kept in history.

Here's how to call it:

git -c commit.gpgSign=false filter-branch --tree-filter 
'/home/anarcat/src/security-tracker/bin/split-by-year.py data/CVE/list' HEAD

Also observe how all gpg commit signatures are (obviously) lost. I have
explicitly disabled that because those actually take a long time to
compute...

I haven't tested if a graft would improve performance, but I suspect it
would not, given the sheer size of the repository that would effectively
need to be carried over anyways.

A.

-- 
Man really attains the state of complete humanity when he produces,
without being forced by physical need to sell himself as a commodity.
- Ernesto "Che" Guevara
package main

import (
	"bufio"
	"bytes"
	"io"
	"log"
	"os"
	"strconv"
	"strings"
)

func main() {
	file, err := os.Open("data/CVE/list")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	var (
		line     []byte
		header   [][]byte // lines seen before the first CVE sets the year
		year     uint64
		year_str string
		target   *os.File
		ok       bool
	)
	fds := make(map[uint64]*os.File, 20)
	scanner := bufio.NewReader(file)
	for {
		line, err = scanner.ReadBytes('\n')
		if len(line) > 0 {
			if bytes.HasPrefix(line, []byte("CVE-")) {
				year_str = strings.Split(string(line), "-")[1]
				year, _ = strconv.ParseUint(year_str, 10, 64)
			}
			if year == 0 {
				// buffer lines until the first CVE tells us the year
				header = append(header, line)
			} else {
				// plain assignment here: the original `:=` shadowed
				// target, which was one of the bugs mentioned above
				if target, ok = fds[year]; !ok {
					target, err = os.OpenFile("data/CVE/list."+year_str,
						os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
					if err != nil {
						log.Fatal(err)
					}
					fds[year] = target
					for _, h := range header {
						target.Write(h)
					}
					header = nil
				}
				target.Write(line)
			}
		}
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
	}
	// the missing Close() was the bug that left all list.* files empty
	for _, fd := range fds {
		fd.Close()
	}
	os.Remove("data/CVE/list")
}

Bug#908678: Some more thoughts and some tests on the security-tracker git repo

2018-11-09 Thread Antoine Beaupré
On 2018-09-26 14:56:16, Daniel Lange wrote:

[...]

> In any case, a repo with just the split files but no maintained history clones
> in ~12s in the above test setup. It also brings the (bare) repo down from 
> 3.3GB
> to 189MB. So the issue is really the data/CVE/list file.

So I've looked into that problem as well, four months ago:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2

In there I proposed splitting the data/CVE/list file into "one file per
CVE". In retrospect, that was a rather naive approach and yielded all
sorts of problems: there were so many files that it created problems even
for the shell (argument list too long).

I hadn't thought of splitting things in "one *file* per year". That
could really help! Unfortunately, it's hard to simulate what it would
look like *14 years* from now (yes, that's how old that repo is
already).

I can think of two ways to simulate that:

 1. generate commits to recreate all files from scratch: parse
data/CVE/list, split it up into chunks, and add each CVE in one
separate commit. it's not *exactly* how things are done now, but it
should be a close enough approximation

 2. do a crazy filter-branch to send commits to the right
files. considering how long an initial clone takes, i can't even
begin to imagine how long *that* would take. but it would be the
most accurate simulation.
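Option 1 could be approximated by chunking data/CVE/list into records and
planning one commit per record; a hypothetical sketch (the function names
and tuple layout are mine):

```python
def records(list_text):
    # chunk data/CVE/list into one record per CVE: a header line
    # plus its continuation lines, up to the next CVE header
    rec = []
    for line in list_text.splitlines(keepends=True):
        if line.startswith("CVE-") and rec:
            yield "".join(rec)
            rec = []
        rec.append(line)
    if rec:
        yield "".join(rec)

def plan_commits(list_text):
    # one simulated commit per record, appended to data/CVE/list.YEAR;
    # yields (path, text_to_append, commit_message) steps
    for rec in records(list_text):
        year = rec.split("-")[1]
        yield ("data/CVE/list." + year, rec, rec.splitlines()[0])
```

A small driver would then replay each step with `git add`/`git commit` to
build the synthetic history.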

Short of that, I think it's somewhat dishonest to compare a clean
repository with split files against a repository with history over 14
years and thousands of commits. Intuitively, I think you're right and
that "sharding" the data in yearly packets would help a lot git's
performance. But we won't know until we simulate it, and if we hit that
problem again 5 years from now, all that work will have been for
nothing. (Although it *would* give us 5 years...)

> That said, data/DSA/list is 14575 lines. That seems to not bother git too much
> yet. Still if things get re-structured, this file may be worth a look, too.

Yeah, I haven't had trouble with that one yet either.

> To me the most reasonable path forward unfortunately looks like start a new 
> repo
> for 2019+ and "just" import the split files or single-record files as 
> mentioned
> by pabs but not the git/svn/cvs history. The old repo would - of course - stay
> around but frozen at a deadline.

In any case, I personally don't think history over those files is that
critical. We rarely dig into that history because it's so
expensive... Any "git annotate" takes forever in this repo, and running
*that* over data/CVE/list takes tens of minutes.

That said, once we pick a solution, we *could* craft a magic
filter-branch that *would* keep history. It might be worth eating that
performance cost then. I'll run some tests to see if I can make sense of
such a filter.

> Corsac also mentioned on IRC that the repo could be hosted outside of Gitlab.
> That would reduce the pressure for some time.
> But cgit and other git frontends (as well as backends) we tested also struggle
> with the repo (which is why my company, Faster IT GmbH, used the 
> security-tracker
> repo as a very welcome test case in the first place).
> So that would buy time but not be a solution long(er) term.

Agreed. I think the benefits of hosting on gitlab outweigh the trouble
of rearchitecting our datastore. As I said, it's not just gitlab
that's struggling with a 17MB text file: git itself has trouble dealing
with it as well, and I am often frustrated by that in my work...

A.

-- 
You are absolutely deluded, if not stupid, if you think that a
worldwide collection of software engineers who can't write operating
systems or applications without security holes, can then turn around
and suddenly write virtualization layers without security holes.
- Theo de Raadt



Re: Dealing with renamed source packages during CVE triaging

2018-06-20 Thread Antoine Beaupré
On 2018-06-15 10:27:45, Moritz Muehlenhoff wrote:
> On Fri, Jun 15, 2018 at 04:34:14PM +1000, Brian May wrote:
>> Moritz Muehlenhoff  writes:
>> 
>> > On Wed, Jun 13, 2018 at 05:19:40PM +1000, Brian May wrote:

[...]

>> That generates a report of all packages that we need to check. I assume
>> we would need some way of marking packages that we have checked and
>> found to be not affected, so we can get a list of packages that need
>> immediate attention and don't repeatedly check the same package multiple
>> times. How should we do this? Maybe another file in the security tracker
>> repository?
>
> Maybe start with the script initially and see whether it's useful as an
> approach in general. State tracking can be discussed/added later.

Maybe the same principle applies as with the approach I considered. We
could have a --stop argument that would consider entries up to a certain
CVE number and ignore the rest of the file.
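Such a --stop argument could simply truncate the parse once the named
entry has been fully emitted; a hypothetical sketch (the flag and its
semantics are my reading of the paragraph above):

```python
def entries_until(lines, stop=None):
    # yield lines of data/CVE/list, stopping once the entry named by
    # `stop` (e.g. "CVE-2005-3275") has been fully emitted, including
    # its continuation lines
    in_stop = False
    for line in lines:
        if line.startswith("CVE-"):
            if in_stop:
                return            # the stop entry is finished
            in_stop = stop is not None and line.split()[0] == stop
        yield line
```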

> Lots of the false positives will result from crappy/outdated entries
> in embedded-code-copies, so fixing those up will drastically reduce
> false positives.

If the embedded-code-copies file is used more systematically in the
triaging process, with a semi-automated script, we'll be more inclined
to keep it up to date, so I think it would actually help with that as
well...

bam: do you want me to start working on that script or were you working
on this already?

Thanks for the feedback,

A.

-- 
Ils versent un pauvre miel sur leurs mots pourris et te parlent de pénurie
Et sur ta faim, sur tes amis, ils aiguisent leur appétit
- Richard Desjardins, La maison est ouverte



Re: Dealing with renamed source packages during CVE triaging

2018-06-08 Thread Antoine Beaupré
I've finalized a prototype during my research on this problem, which I
have detailed on GitLab, as it's really code that should be merged. It
would also benefit from wider attention considering it affects more than
LTS now. Anyways, the MR is here:

https://salsa.debian.org/security-tracker-team/security-tracker/merge_requests/4

Comments are welcome there or here.

For what it's worth, I reused Lamby's crude parser because I wanted to
get the prototype out the door. I am also uncertain that a full parser
can reliably recreate the CVE/list file as-is without introducing
inconsistent diffs...

I also drifted into the core data structures of the security tracker, and
wondered if it would be better to split up our large CVE/list file now
that we're using git. I had mixed results. For those interested, it is
documented here:

https://salsa.debian.org/security-tracker-team/security-tracker/issues/2

Cheers!

a.
-- 
If it's important for you, you'll find a way.
If it's not, you'll find an excuse.
- Unknown



Re: Dealing with renamed source packages during CVE triaging

2018-06-08 Thread Antoine Beaupré
On 2018-06-08 03:29:38, Brian May wrote:
> Antoine Beaupré  writes:
>
>> Right now, it seems that all scripts that hammer at those files do so
>> with their own ad-hoc parsing code. Is that the recommended way of
>> chopping those files up? Or is there a better parsing library out there?
>
> It sounds like we could really do with a good parsing library. Maybe one
> that supports making changes too.
>
> I could make a start on this.

As I mentioned in the other thread, I am uncertain where to go from
here. Some scripts use JSON, others parse the files by hand... I also
found out yesterday after writing this that there is *already* a parsing
library in the security tracker. It can parse {CVE,DSA,DLA}/list files
and lives in lib/python/bugs.py, but it's somewhat coupled with the
sqlite database - I'm not sure it's usable standalone.

But yeah, maybe clarifying all this stuff would help, for sure... I
would recommend not writing yet another library from scratch however, as
we probably have a dozen such parsers already and it's confusing enough
as it is. ;)

a.
-- 
L'ennui avec la grande famille humaine, c'est que tout le monde veut
en être le père.
- Mafalda



Re: Dealing with renamed source packages during CVE triaging

2018-06-07 Thread Antoine Beaupré
Sorry for resurrecting this old thread, but I've been looking at how to
deal with renamed packages in CVE triaging again. When we last talked
about this, we observed how we were sometimes missing packages during
triage, e.g. `tiff3` that was present in wheezy. That's not an issue
anymore since wheezy is gone, but the problem occurs more broadly in
other packages.

In fact, it seems to me this is similar to the broader problem of embedded
code copies: we could generalize renamed packages to the embedded code
copies problem. We have a database of those in data/embedded-code-copies
already, although I'm not sure how up to date that file actually is, nor
how it is currently used in the workflow.

It seems to me any database of renames we could build would clearly
overlap with the embedded-code-copies file, so I figured I would write a
prototype (in Python; we already have Perl and bash ones...) to start
with. I have tried to upload this in a fork on salsa but gave up as the
push (of a single commit!) was stuck "resolving deltas"... Anyways,
here's the snippet:

https://salsa.debian.org/anarcat/security-tracker/snippets/70

The next step is to figure out how to actually modify the data/CVE/list
file to introduce the changes. Considering the large number of packages
in the embedded-code-copies file, I am not sure we want to retroactively
change all previous entries. jmm suggested we run a cronjob that would
keep track of where it is in history which would resolve this nicely.
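That cronjob idea boils down to persisting a cursor of the last commit
processed; a rough sketch (the state-file layout and function names are
assumptions of mine):

```python
import subprocess

def commits_after(history, cursor):
    # history: all commit ids, oldest first; cursor: last processed id
    # (or None on the first run); returns the ids still to process
    if cursor is None or cursor not in history:
        return list(history)
    return list(history[history.index(cursor) + 1:])

def run_once(repo, state_file):
    # one cron run: list history with git, skip what the cursor already
    # covers, and advance the cursor (sketch; error handling omitted)
    history = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", "HEAD"],
        capture_output=True, text=True, check=True).stdout.split()
    try:
        cursor = open(state_file).read().strip() or None
    except FileNotFoundError:
        cursor = None
    todo = commits_after(history, cursor)
    if todo:
        with open(state_file, "w") as f:
            f.write(todo[-1] + "\n")
    return todo
```

Each run would then only triage the commits returned by run_once().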

One question that remains is what, exactly, to add in the CVE
metadata. One problem we faced last we looked at this is that we needed
to add an entry like:

   SOURCEPACKAGE 

... which would (e.g.) get triaged to:

   SOURCEPACKAGE 
   [wheezy] SOURCEPACKAGE  (or whatever)

... later on. This requires inside knowledge of the suites and their
packages, something I find surprisingly hard to do in the security
tracker.

With embedded-code-copies, we will have to add something for all the
other source packages, e.g.:

   OTHERSOURCE 

Right now, it seems that all scripts that hammer at those files do so
with their own ad-hoc parsing code. Is that the recommended way of
chopping those files up? Or is there a better parsing library out there?
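For illustration, the ad-hoc parsing these scripts keep reimplementing is
usually some variant of this (a sketch; the tuple layout is mine, not the
tracker's):

```python
def parse_cve_list(text):
    # parse the CVE/list format into (header, annotation_lines) pairs:
    # an unindented line starts an entry, indented lines annotate it
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        if line[0].isspace():
            if entries:
                entries[-1][1].append(line.strip())
        else:
            entries.append((line, []))
    return entries
```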

Thanks for any advice,

A.



Re: pulling in other vulnerability databases

2018-01-26 Thread Antoine Beaupré
On 2018-01-26 00:31:19, Ben Hutchings wrote:
> On Thu, 2018-01-25 at 10:17 -0500, Antoine Beaupré wrote:
> [...]
>> > OS vendors (RH/SUSE)
>> > Upstream projects (Xen, Linux etc)
>> 
>> I believe those already follow the CVE process and eventually converge
>> over doing the right thing. So I am not really concerned about those
>> people.
> [...]
>
> Linux has a security contact (secur...@kernel.org), but this is only
> used for reporting bugs and discussing how to fix them; CVE assignments
> are left to distributions, DWF, etc.  Many security fixes don't get
> discussed there anyway.
>
> I would estimate that less than half of security fixes in Linux
> actually get CVE IDs.

Well that's just disturbing. I am not sure, however, that I can
meaningfully change this by ... er... say writing the kernel mailing
lists, unfortunately.

I haven't got a reply from Snyk.io (yet?) by the way. I suspect I may
not get anything at all... Any other ideas as to the next steps in
general here?

a.

-- 
May your trails be crooked, winding, lonesome, dangerous, leading to
the most amazing view. May your mountains rise into and above the
clouds.
- Edward Abbey



Re: pulling in other vulnerability databases

2018-01-25 Thread Antoine Beaupré
On 2018-01-25 16:27:58, Moritz Mühlenhoff wrote:
> Antoine Beaupré wrote:
>> So, regarding the first two (and similar), someone needs to teach those
>> folks about proper security tracking here... ;) Should I contact them
>> directly?
>
> Who in particular? Node and Snyk? Sure, go ahead.

I contacted snyk.io with mitre and the secteam in cc.

I'm not sure what to say to nodesecurity.io folks: there's a lot of
stuff in there, and I'm not sure it affects us so much. I couldn't find
a package without a CVE that is also in Debian in the first few pages of
their advisory pages...

pabs, did you have any issues in mind that were problematic here
specifically?

in any case, feel free to reuse the message I sent to snyk as a template
for others...

a.



Re: improving the report-vuln script

2017-03-31 Thread Antoine Beaupré

Those changes all look good to me. Note that I worked on a separate git
repo and have more granular patches for my changes if that's useful.

Thanks!

-- 
I know where I am going, and I know the truth,
and I do not have to be what you want me to be.
I am free to be what I want.
 - Muhammad Ali



Re: improving the report-vuln script

2017-03-31 Thread Antoine Beaupré
On 2017-03-31 06:07:08, Salvatore Bonaccorso wrote:
> Hi Antoine,
>
> At the top of this mail: first of all, thanks for your work on the
> report-vuln script. I use it extensively in its current form in a two-
> staged workflow: 1. generate a template for reportbug, then feed it
> in via reportbug -i ... and do some cleanups before sending the mail.

Interesting! I couldn't figure out how to use reportbug
productively... surely it would be nice to have a better tip for this in
the file itself...

> On Thu, Mar 30, 2017 at 03:04:39PM -0400, Antoine Beaupré wrote:

[...]

>>   --affected AFFECTED   affected version (default: unspecified)
>
> This one might be a bit confusing: Would you change it to --version?
> Does this make sense? Or is --version more confusing because it could be
> interpreted as showing the version of report-vuln? So we might leave it
> as --affected.

I found --version confusing, especially since there was already
"include_version" in the code (which was even more confusing, because it
actually meant "include blanks").

Furthermore, --version, by convention, shows the script version number
and users may not expect it to actually run the script (a bit like
--help).

>>   --severity SEVERITY   severity (default: grave)
>
> I'm not sure: I think severity important as default would be safer.

Sure. I just kept the existing default.

> The severity is often only chosen to be grave after real evaluation. But I
> think it should always be a deliberate choice to set an RC severity. So
> I would tend to set the default severity to important. This might be
> disputed though. The current script actually (if --no-blanks is set)
> sets it already to grave. So I'm proposing a change here.

I agree with the proposed change, makes sense. It's one of the reasons I
added the flag in the first place - I didn't want to file RC bugs for
issues that were no-dsa...

>>   --no-cc, --cc add X-Debbugs-CC header to
>>   --cc-list CCLIST  list of addres to add in CC (default:
>
> Typo: address? (but see below).

Typical. :)

>> ['t...@security.debian.org', 'secure-testing-
>> t...@lists.alioth.debian.org'])
>> 
>> I was not sure in which case multiple CVEs could be used - should we
>> report multiple CVEs in a single bug? that seems like bad
>> practice... IMHO, each bug should have its own CVE...
>
> It always depends! There are cases (and sometimes preferences ;-))
> where one files multiple CVEs in one bug report. This is perfectly
> fine. My approach for example is: if I know all reported CVEs are with
> known fixes, and all supported versions in lower suites are affected as
> well for the same set of CVEs, then one bug report is fine. Otherwise I
> try to split them up according to the found/triaged versions as well,
> making individual bugs for individual CVEs. But supporting reporting
> multiple CVEs is perfectly fine.

Okay, this is the heuristic I was looking for. How about we explain this
in the "epilog" of the usage or at least somewhere?

>> Ideally, this script would interactively submit the bug and wait for
>> confirmation, but it seems this is something that is notoriously hard to
>> do in the Debian BTS - I personally haven't found a way to promptly get
>> a bug identifier when submitting bugs other than waiting for the
>> autoreply to kick in, which can take some time.
>
> Possibly not feasible in the current form.

Right.

>> Since this is not something the LTS team directly uses all the time -
>> it's more something for the main security team - I figured it would be
>> safer to submit it for review here, especially since the changes are
>> rather large. Let me know if there's a better place to discuss this if
>> this isn't appropriate.
>
> Appreciated. I will try to use your new version and report back to
> you; can you please wait before committing, though?

No problem, of course. I wish I could have pushed this to a feature
branch or something... ;)

>> Attached is the patch for the new features and a fresh copy which is
>> only twice as long as the diff...
>
> Thanks. Handy :)

let's see...

>> --- ../secure-testing/bin/report-vuln2017-03-30 14:17:31.262515489 
>> -0400
>> +++ report-vuln  2017-03-30 14:53:50.614772451 -0400
>> @@ -19,6 +19,7 @@
>>  #
>>  # export http_proxy if you need to use an http proxy to report bugs
>>  
>> +import argparse
>>  import sys, re, urllib, os
>>  
>>  temp_id = re.compile('(?:CVE|cve)\-[0-9]{4}-')
>> @@ -112,7 +113,7 @@
>>  
>>  return ret + '\n'
>>  
>> -def gen_text(pkg, cveid, include_version = False, severity = 
>> 'FILLINSEVERITY'):

improving the report-vuln script

2017-03-30 Thread Antoine Beaupré
Hi,

I had some spare time today and I figured I would look at the backlog of
unreported vulnerabilities here:

https://security-tracker.debian.org/tracker/status/unreported

This is one of the housekeeping tasks we were told was useful for the
security team here:

https://wiki.debian.org/LTS/Development#Housekeeping_Tasks

The instructions for using the report-vuln script weren't quite
accurate: first, it can't really be used with reportbug, from what I can
tell, at least not in a useful way. Reportbug will still prompt for all
its usual stuff, which makes it less effective.

Furthermore, the recommended use (in the script itself) of Mutt doesn't
add the necessary headers, and I found the "blanks" argument rather
confusing.

Therefore, I have rewritten parts of the script to make it easier to
automate. It's now possible to specify the severity and affected
versions directly on the commandline, and the usage is much clearer.

Before:

$ ./bin/report-vuln -h
./bin/report-vuln [--no-blanks]  

Now:

usage: report-vuln [-h] [--no-blanks] [--affected AFFECTED]
   [--severity SEVERITY] [--no-cc] [--cc-list CCLIST]
   pkg cve [cve ...]

positional arguments:
  pkg   affected package
  cve   relevant CVE for this issue, may be used multiple time
if the issue has multiple CVEs

optional arguments:
  -h, --helpshow this help message and exit
  --no-blanks, --blanks
include blank fields to be filled (default: False)
  --affected AFFECTED   affected version (default: unspecified)
  --severity SEVERITY   severity (default: grave)
  --no-cc, --cc add X-Debbugs-CC header to
  --cc-list CCLIST  list of addres to add in CC (default:
['t...@security.debian.org', 'secure-testing-
t...@lists.alioth.debian.org'])

I was not sure in which case multiple CVEs could be used - should we
report multiple CVEs in a single bug? that seems like bad
practice... IMHO, each bug should have its own CVE...

Ideally, this script would interactively submit the bug and wait for
confirmation, but it seems this is something that is notoriously hard to
do in the Debian BTS - I personally haven't found a way to promptly get
a bug identifier when submitting bugs other than waiting for the
autoreply to kick in, which can take some time.

Since this is not something the LTS team directly uses all the time -
it's more something for the main security team - I figured it would be
safer to submit it for review here, especially since the changes are
rather large. Let me know if there's a better place to discuss this if
this isn't appropriate.

Attached is the patch for the new features and a fresh copy which is
only twice as long as the diff...

--- ../secure-testing/bin/report-vuln	2017-03-30 14:17:31.262515489 -0400
+++ report-vuln	2017-03-30 14:53:50.614772451 -0400
@@ -19,6 +19,7 @@
 #
 # export http_proxy if you need to use an http proxy to report bugs
 
+import argparse
 import sys, re, urllib, os
 
 temp_id = re.compile('(?:CVE|cve)\-[0-9]{4}-')
@@ -112,7 +113,7 @@
 
 	return ret + '\n'
 
-def gen_text(pkg, cveid, include_version = False, severity = 'FILLINSEVERITY'):
+def gen_text(pkg, cveid, blanks = False, severity = 'FILLINSEVERITY', affected=None, cc=False, cclist=None):
 	vuln_suff = 'y'
 	cve_suff = ''
 	time_w = 'was'
@@ -124,8 +125,13 @@
 		time_w = 'were'
 	
 	header = '''Package: %s\n''' % (pkg)
-	if include_version:
-		header += 'Version: FILLINAFFECTEDVERSION\n'
+	if affected is None:
+		if blanks:
+			header += "Version: FILLINAFFECTEDVERSION\n"
+	else:
+		header += "Version: %s\n" % affected
+	if cc and len(cclist) > 0:
+		header += "X-Debbugs-CC: %s\n" % " ".join(cclist)
 	header += '''Severity: %s
 Tags: security
 
@@ -160,31 +166,54 @@
 		print '\nhttps://security-tracker.debian.org/tracker/source-package/%s' % (pkg)
 		print '(issues without CVE id are assigned a TEMP one, but it may change over time)\n'
 
-	if not include_version:
-		print '''Please adjust the affected versions in the BTS as needed.\n'''
+	if not blanks:
+		print '''\nPlease adjust the affected versions in the BTS as needed.\n'''
 
 def error(msg):
 	print 'error: ' + msg
 	sys.exit(1)
 
-def usage():
-	print sys.argv[0], '[--no-blanks]  '
-	sys.exit(0)
+class NegateAction(argparse.Action):
+	'''add a toggle flag to argparse
+
+	this is similar to 'store_true' or 'store_false', but allows
+	arguments prefixed with --no to disable the default. the default
+	is set depending on the first argument - if it starts with the
+	negative form (defined by default as '--no'), the default is False,
+	otherwise True.
+	'''
+
+	negative = '--no'
+
+	def __init__(self, option_strings, *args, **kwargs):
+		'''set default depending on the first argument'''
+		default = not 
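The diff is cut off above. For the curious, here is a self-contained
sketch of how such a NegateAction toggle could be completed - an
assumed reconstruction for illustration, not the actual patch:

```python
import argparse

class NegateAction(argparse.Action):
    '''toggle flag: the plain form stores True, the '--no' form False

    the default is inferred from the first option string: if it starts
    with the negative prefix, the default is False, otherwise True
    '''
    negative = '--no'

    def __init__(self, option_strings, dest, **kwargs):
        kwargs.setdefault('nargs', 0)  # the flag takes no argument
        kwargs.setdefault('default',
                          not option_strings[0].startswith(self.negative))
        super(NegateAction, self).__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # store False when the option was invoked in its '--no-...' form
        setattr(namespace, self.dest,
                not option_string.startswith(self.negative))

parser = argparse.ArgumentParser()
parser.add_argument('--no-blanks', '--blanks', dest='blanks',
                    action=NegateAction,
                    help='include blank fields to be filled')
```

Listing `--no-blanks` first makes the default False, matching the
"(default: False)" shown in the help output above.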

Bug#761945: fixing links for DLAs in the security tracker

2017-03-30 Thread Antoine Beaupré
On 2017-03-30 08:38:07, Salvatore Bonaccorso wrote:
> Hi Antoine,
>
> On Wed, Mar 29, 2017 at 03:49:31PM -0400, Antoine Beaupré wrote:
>> On 2017-03-29 17:02:44, Salvatore Bonaccorso wrote:
>> > Hi Antoine,
>> 
>> Hi!
>> 
>> > If you want to look at this part: There is a ./parse-dla.pl script in
>> > the webwml CVS, which is used to import the DLAs (this is an
>> > analogeous script to parse-advisory.pl which is used to import the
>> > DSAs).
>> 
>> I see... The scripts are in /english/security for anyone looking. And if
>> people are (like me) thinking "... wat.. CVS?" then yes, we are still
>> using this:
>> 
>> https://www.debian.org/devel/website/using_cvs
>> 
>> My cvs commandline finger memory is *definitely* still there though, so
>> that works for me. :)
>> 
>> > The "manual" steps one would perform are roughly:
>> >
>> > ./parse-dla.pl $message
>> > cvs add $year/dla-$nr.{wml,data}
>> > cvs commit -m '[DLA $nr] $source security update'
>> 
>> Is this something the security team performs as part of the DSA release
>> process? Or is this something the debian-www people do? I guess you need
>> write access to the repository and I see that *you* do, but is this
>> expected from everyone working on releasing public advisories, the same
>> way we need access to the security tracker?
>
> No it's not something we do as part of a regular DSA releasing
> process, and as well not expected to do so, as the websites are under
> debian-www "domain" (and btw, they do a great job!). But often, when I
> have time I do as well the import (but as you will see from cvs log,
> not always). For the security team the current process is: preparing
> the DSA (packages, tracker work, text, releasing packages), send out
> the advisory (at this stage our work is basically done).

Okay, so this is just something the www team needs to catch up on then...

>> And to import older entries, we'll need the original templates, which we
>> deliberately did *not* commit anywhere, so they are basically available
>> only as mailing list archives, and thus hard to find automatically.
>
> But given the debian-lts-announce is archived, shouldn't it be
> relatively easy to first grab all announces from
> https://lists.debian.org/debian-lts-announce/ then check which ones
> still need to be imported, extract the mail and do the import? Or do I
> misunderstand you?

I was assuming I was just web-browsing, but it's true I can probably
access the mailboxes and grep through this directly. It's still a pain
in the butt. ;)
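A rough sketch of that kind of cross-check - comparing advisory
numbers in the tracker's data/DLA/list with the dla-*.wml pages
present in webwml - could look like this (file layout assumed from
this discussion; untested against the real trees):

```python
import re

def tracker_advisories(list_text):
    # base advisory numbers mentioned in data/DLA/list
    # ('DLA-445-2' -> 445)
    return {int(m.group(1))
            for m in re.finditer(r'\bDLA-(\d+)-\d+\b', list_text)}

def website_advisories(wml_filenames):
    # base numbers that have a dla-NNN.wml page under english/security
    found = set()
    for name in wml_filenames:
        m = re.match(r'dla-(\d+)\.wml$', name)
        if m:
            found.add(int(m.group(1)))
    return found

def not_yet_imported(list_text, wml_filenames):
    # advisories known to the tracker but missing from the website
    return sorted(tracker_advisories(list_text)
                  - website_advisories(wml_filenames))
```

Feeding it the mailbox archives instead of data/DLA/list would work
the same way, as long as the DLA-XXX-N identifiers can be extracted.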

>> I foresee difficulties in importing the missing data...
>> 
>> Here's the bits that are missing:
>> 
>>  * the last DLA on the website is DLA-445-2, which is basically the last
>>DLA before squeeze support ended and wheezy was handed over
>> 
>>  * among those 445 DLAs, there are actually 31 missing:
>> 
>>webwml$ cd english/security/; find -name 'dla-*.wml' | wc -l
>>424
>> 
>>  * even worse, it seems there are at least 20 advisories missing from
>>the website because regression uploads hide advisories, because our
>>naming convention differs from DSA ("DLA-XXX-N", where XXX is the
>>original advisory and N are regression updates)
>
> I do not understand this point. What do you mean by hinding? For DSA's
> as well only one https://www.debian.org/security/$year/dsa-$nr is ever
> visible as well (and it depends if the text has been then updated
> according to a regression update or not, and in DLAs cases I guess
> just only the last iteration might has been imported, not the initial
> -1 one).

Here's an example. DLA-445-1 was a Squid upload to squeeze, announced
here:

https://lists.debian.org/debian-lts-announce/2016/02/msg00037.html

it caused a regression, which was fixed in DLA-445-2, announced here:

https://lists.debian.org/debian-lts-announce/2016/03/msg1.html

You can see both DLAs in the sectracker:

https://security-tracker.debian.org/tracker/DLA-445-1
https://security-tracker.debian.org/tracker/DLA-445-2

(Note, BTW, that the regression update doesn't refer to the previous DLA
or the CVE, you just need to know the convention to figure that out.)

On the website, you only see the regression update:

https://www.debian.org/security/2016/dla-445

That is the equivalent of DLA-445-2. I am not sure that DLA-445-1 is
visible anywhere.

I guess that's another bug to report as well?

[...]

>> > having something on the debian- side which does this
>> > automatically, once a DSA or DLA arrives would help surely the
>> > debian-www team who 

Bug#761945: fixing links for DLAs in the security tracker

2017-03-29 Thread Antoine Beaupré
On 2017-03-29 17:02:44, Salvatore Bonaccorso wrote:
> Hi Antoine,

Hi!

> If you want to look at this part: There is a ./parse-dla.pl script in
> the webwml CVS, which is used to import the DLAs (this is an
> analogeous script to parse-advisory.pl which is used to import the
> DSAs).

I see... The scripts are in /english/security for anyone looking. And if
people are (like me) thinking "... wat.. CVS?" then yes, we are still
using this:

https://www.debian.org/devel/website/using_cvs

My cvs commandline finger memory is *definitely* still there though, so
that works for me. :)

> The "manual" steps one would perform are roughly:
>
> ./parse-dla.pl $message
> cvs add $year/dla-$nr.{wml,data}
> cvs commit -m '[DLA $nr] $source security update'

Is this something the security team performs as part of the DSA release
process? Or is this something the debian-www people do? I guess you need
write access to the repository and I see that *you* do, but is this
expected from everyone working on releasing public advisories, the same
way we need access to the security tracker?

And to import older entries, we'll need the original templates, which we
deliberately did *not* commit anywhere, so they are basically available
only as mailing list archives, and thus hard to find automatically.

I foresee difficulties in importing the missing data...

Here's the bits that are missing:

 * the last DLA on the website is DLA-445-2, which is basically the last
   DLA before squeeze support ended and wheezy was handed over

 * among those 445 DLAs, there are actually 31 missing:

   webwml$ cd english/security/; find -name 'dla-*.wml' | wc -l
   424

 * even worse, it seems there are at least 20 advisories missing from
   the website because regression uploads hide advisories, because our
   naming convention differs from DSA ("DLA-XXX-N", where XXX is the
   original advisory and N are regression updates)

   $ grep DLA- data/DLA/list | sed 's/.* DLA-//;s/ .*//' | sort -n | sed '/445-2/,$d' | wc -l
   465

 * the canonical list has 928 advisories:

   secure-testing$ grep DLA- data/DLA/list | wc -l 
   928

So, lots of work there.
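The "hidden by regression uploads" gap can be probed mechanically:
since the website keys pages by the base number only, every advisory
with more than one revision hides all but the latest. A small sketch
(identifier format assumed from the convention described above):

```python
import re
from collections import defaultdict

def hidden_revisions(dla_ids):
    '''map base advisory numbers to the revisions hidden on the website

    the website stores a single dla-XXX page per base number, so when
    an advisory has several 'DLA-XXX-N' revisions, only the latest one
    is visible and the earlier ones are effectively hidden
    '''
    revisions = defaultdict(list)
    for dla in dla_ids:
        m = re.match(r'DLA-(\d+)-(\d+)$', dla)
        if m:
            revisions[int(m.group(1))].append(int(m.group(2)))
    # keep every revision except the highest, for multi-revision DLAs
    return {nr: sorted(revs)[:-1]
            for nr, revs in revisions.items() if len(revs) > 1}
```

Running this over the identifiers in data/DLA/list would enumerate
exactly which advisories (like DLA-445-1) have no page of their own.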

> The background work leading to that was done by Frank Lichtenheld in
> #762255.

Great to see that! It does seem problematic to import regression updates
however.

> having something on the debian- side which does this
> automatically, once a DSA or DLA arrives would help surely the
> debian-www team who then "only" have to do the translations and fix
> obvious mistakes. OTOH keep in mind: When the debian- team imports
> a DSA or DLA they may need to do some adjustments so, I'm not sure if
> it's liked to have the automatism, since sometimes before cvs commit
> some changes need to be done on the .wml file. 

It looks like this is something that should be discussed with the www
people... Maybe a bug against www.debian.org?

This raises the question, however: wouldn't it be simpler to import
those advisories into the security tracker directly?

At least, we should figure out why the imports have ceased after
wheezy-LTS started...

> Writing the above a bit in a hurry let me know if unclear what I
> meant.

Thanks for the response!

A.

-- 
What this country needs is more unemployed politicians.
- Angela Davis



Bug#761945: fixing links for DLAs in the security tracker

2017-03-29 Thread Antoine Beaupré
On 2017-03-29 07:29:06, Salvatore Bonaccorso wrote:
> Hi,
>
> On Wed, Mar 29, 2017 at 06:28:49AM +0200, Salvatore Bonaccorso wrote:
>> Hi,
>> 
>> On Tue, Mar 28, 2017 at 10:16:52PM +, Holger Levsen wrote:
>> > On Tue, Mar 28, 2017 at 10:35:34PM +0200, Moritz Muehlenhoff wrote:
>> > > Well, you don't have a web site comparable to 
>> > > https://www.debian.org/security/2017/dsa-3796, so where should
>> > > it possibly link to?
>> >  
>> > I guess it's time to create this "web site" then :)
>> 
>> See as well https://bugs.debian.org/761945 (and respective clones for
>> debian-).
>
> The security-tracker side of this has been implemented now, Paul Wise
> did the corresponding work. But around 400 DLAs are not yet imported,
> so many links will show a page not found.
>
> A working example:
> https://security-tracker.debian.org/tracker/DLA-55-1 or
> https://security-tracker.debian.org/tracker/DLA-400-1

So I guess the next steps are for the LTS team:

 1. update the documentation so that updating the website is part of the
 workflow

 2. import old DLA advisories into the websites

I can try and complete this by the end of the week.

> p.s.: generally: for changes to the security-tracker, please do not use
>   debian-lts but rather the security-tracker list (or even
>   better/depending on case via bugreports).

Understood. I wasn't sure, but will try to keep that in mind in the
future...

A.

-- 
Rock journalism is people who can't write, interviewing people who can't
talk, in order to provide articles for people who can't read.
- Frank Zappa



Bug#776738: security-tracker: ignoring end-of-life packages (was: squeeze/wheezy updates of Redmine (+ long term state of redmine packaging))

2016-01-01 Thread Antoine Beaupré
On 2016-01-01 11:30:36, Raphael Hertzog wrote:
> Hi,
>
> On Thu, 31 Dec 2015, Antoine Beaupré wrote:
>> > I have thus pushed the attached patch to the git repository of
>> > debian-security-support. Ccing the security team to inform them
>> > of this change.
>> 
>> Understood.
>> 
>> Would it make sense to fix the secure-testing tracker to show those? It
>> would have stopped me right there, if it would have shown the older
>> versions were unsupported...
>
> Yes, I agree that the tracker should know which version is end-of-life
> and no longer supported... and we should not have to re-add this information
> every time.
>
> The only downside is that we would be less likely to properly record when
> issues do not apply to the unsupported packages.
>
> In fact I would have expected to have a bug report open here:
> https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=security-tracker;dist=unstable
>
> But I don't see anything that looks close yet. Maybe we should start by
> requesting this first...

Well, there are a few similar issues filed:

 * #772961: Please display no-dsa/end-of-life tags in "open issues"
   table in "source package view"

 * #776738: end-of-life issues clutter the list of open issues

 * #780892: please show unsupported packages as unsupported instead of
   unimportant

#776738 seems to be the most relevant to me, so I added it in CC. The
security team (also in CC) privately suggested a way this could be
implemented, but I will let them paste that here.

A.
-- 
O gentilshommes, la vie est courte.
Si nous vivons, nous vivons 
pour marcher sur la tête des rois.
- William Shakespeare