Re: Mapping Reproducibility Bug Reports to Commits

2021-11-15 Thread Johannes Schauer Marin Rodrigues
Hi Muhammad,

others already explained how packaging VCS are (sadly) basically a free-for-all
in Debian and that you will probably not get anything better than some
heuristics.  I wanted to add some more ideas to the ones that were already
presented. So in addition to what was already said you can also try any of the
following:

1. If the packaging is on salsa and the commit contains a "closes: " line,
   then the bug will contain a message like this one which will let you
   directly identify the commit that fixed the bug:
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=907352#39

2. If the changelog entry only closes the reproducible bug and nothing else,
   then you can use snapshot.d.o and then debdiff the version that closed the
   bug with the version before that. This method will work even for packages
   that are not using any VCS.

3. If the changelog closes multiple bugs but also points out *who* closed the
   reproducible bug and that person changed nothing else according to
   d/changelog then it's also easy to find the commit. This of course only
   works if the package does use a VCS and if your tools can detect and
   understand the specific packaging style that was used.

4. There were GSoC projects involving reproducible builds. For example Maria
   Valentina Marin Rodrigues contributed back in 2015, and if you find a commit
   of hers in packaging repos it will be fixing a reproducible builds bug. There
   might be more GSoC students for which you can apply a similar approach.
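
For option 2, one way to check whether an upload closes only the bug of
interest is to collect every bug number named in the changelog entry. The
following is a minimal sketch, assuming the text of a single changelog entry
has already been extracted; the regular expression follows the Closes pattern
documented in Debian Policy, and the function names are made up for
illustration:

```python
import re

# Closes pattern as documented in Debian Policy (matched case-insensitively).
CLOSES_RE = re.compile(
    r"closes:\s*(?:bug)?\#?\s?\d+(?:,\s*(?:bug)?\#?\s?\d+)*",
    re.IGNORECASE,
)
BUG_NUMBER_RE = re.compile(r"\d+")


def closed_bugs(entry_text):
    """Return the set of bug numbers closed by one changelog entry."""
    bugs = set()
    for match in CLOSES_RE.finditer(entry_text):
        # Each match is a whole "Closes: #1, #2" clause; pull out the numbers.
        bugs.update(int(n) for n in BUG_NUMBER_RE.findall(match.group(0)))
    return bugs


def closes_only(entry_text, bug):
    """True if the entry closes `bug` and no other bug."""
    return closed_bugs(entry_text) == {bug}
```

If closes_only() holds for the upload that fixed the bug, a debdiff between
that version and the previous one is a reasonable candidate for the fix.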

Just my 2c.

Thanks!

cheers, josch



Re: Mapping Reproducibility Bug Reports to Commits

2021-11-14 Thread peter green


> I am a researcher at the University of Waterloo, conducting a project to
> study reproducibility issues in Debian packages.
>
> The first step for me is to link each Reproducibility-related bug at this
> link:
> https://bugs.debian.org/cgi-bin/pkgreport.cgi?usertag=reproducible-bui...@lists.alioth.debian.org
> to the corresponding commit that fixed the bug.
>
> However, I am unable to find an explicit way of doing so programmatically.
> Please assist.


There is no explicit link.

Most (but not all) Debian packages are maintained in a VCS, and there are
fields in the source package that identify the location and type of the VCS
(almost certainly git nowadays), but there are multiple different workflows
in use. git-buildpackage is the most common and normally uses a
"patches-unapplied" git tree, but there is also dgit, which uses a
"patches-applied" git tree. Git trees may or may not contain the upstream
source. At least one language community uses a system where the git tree
stores files that are used to generate the Debian packaging rather than the
final Debian packaging itself.

Also, maintainer practices for structuring commits vary: some maintainers
update the changelog at the same time as making the actual changes, others
update the changelog in a batch later.

Sometimes bugs aren't even closed from the changelog at all but are instead
closed by the maintainer after the upload, particularly if the maintainer is
not sure whether a change will fix the bug.

With all that said, it's probably doable to develop heuristics that map bug
numbers to commits in most cases. An outline might be:

* Check if the package has a VCS and the relevant changelog can be found in
  said VCS. If there is no VCS, give up and refer the bug for human attention.
* Map the bug number to a changelog line. If there is no such mapping, give
  up and refer the bug for human attention.
* Determine which commit added the changelog line (e.g. with git blame) and
  see if there are actual code changes in that commit.
* If so, take it as the probable commit; if not, search backwards a bit for
  a commit message that matches the changelog line.
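
The outline above could be sketched roughly as follows. This is only an
illustration under several assumptions: the packaging repository is already
cloned, debian/changelog sits at the usual path, and the Closes clause is on
the same line as the bug number (wrapped clauses are not handled). All
function names are hypothetical; the git invocations are plain git blame and
git diff-tree.

```python
import re
import subprocess
from pathlib import Path

CHANGELOG = "debian/changelog"


def find_closing_lines(changelog_text, bug):
    """Return 1-based line numbers whose Closes clause names `bug`."""
    pattern = re.compile(r"closes:[^)\n]*?(?<!\d)%d(?!\d)" % bug, re.IGNORECASE)
    return [
        number
        for number, line in enumerate(changelog_text.splitlines(), start=1)
        if pattern.search(line)
    ]


def blame_command(line_no, path=CHANGELOG):
    """Build a git blame call restricted to a single changelog line."""
    return ["git", "blame", "--porcelain", "-L", f"{line_no},{line_no}", "--", path]


def commit_from_blame(porcelain_output):
    """The first token of porcelain blame output is the commit hash."""
    return porcelain_output.split(None, 1)[0]


def files_changed_command(sha):
    """Build a git diff-tree call listing the files touched by a commit."""
    return ["git", "diff-tree", "--root", "--no-commit-id", "--name-only", "-r", sha]


def has_changes_outside_changelog(changed_files, path=CHANGELOG):
    """True if the commit touched anything besides the changelog."""
    return any(name != path for name in changed_files)


def probable_commit(repo_dir, bug):
    """Return a candidate commit hash, or None to refer the bug to a human."""
    text = Path(repo_dir, CHANGELOG).read_text(encoding="utf-8")
    lines = find_closing_lines(text, bug)
    if not lines:
        return None  # no changelog line mentions the bug
    blame = subprocess.run(
        blame_command(lines[0]), cwd=repo_dir,
        capture_output=True, text=True, check=True,
    ).stdout
    sha = commit_from_blame(blame)
    changed = subprocess.run(
        files_changed_command(sha), cwd=repo_dir,
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    if has_changes_outside_changelog(changed):
        return sha
    return None  # changelog-only commit: earlier commits still need searching
```

The last step of the outline (searching backwards for a matching commit
message) is deliberately left out, since how far back to look and how to match
messages depends on the maintainer's habits.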

Another option, having guessed a range of commits from the changelog and/or
from comparing the VCS to the source packages, may be to run a bisection.
This would likely require some effort to detect what workflow is in use,
though.



Re: Mapping Reproducibility Bug Reports to Commits

2021-11-14 Thread Andrey Rahmatullin
On Sun, Nov 14, 2021 at 05:53:24PM +, Muhammad Hassan wrote:
> Hi all,
> 
> I am a researcher at the University of Waterloo, conducting a project to 
> study reproducibility issues in Debian packages.
> 
> The first step for me is to link each Reproducibility-related bug at this 
> link: 
> https://bugs.debian.org/cgi-bin/pkgreport.cgi?usertag=reproducible-bui...@lists.alioth.debian.org
>  to the corresponding commit that fixed the bug.
This task implies that packaging for all Debian packages is stored in some
VCS, those repos are public, specific commits are marked as fixing
specific bugs and probably, depending on the specifics of the task, that
those commits are granular. None of this is true. The best you can do is find
which uploads closed those bugs, using their "fixed in version" data.


-- 
WBR, wRAR




Mapping Reproducibility Bug Reports to Commits

2021-11-14 Thread Muhammad Hassan
Hi all,

I am a researcher at the University of Waterloo, conducting a project to study 
reproducibility issues in Debian packages.

The first step for me is to link each Reproducibility-related bug at this link: 
https://bugs.debian.org/cgi-bin/pkgreport.cgi?usertag=reproducible-bui...@lists.alioth.debian.org
 to the corresponding commit that fixed the bug.

However, I am unable to find an explicit way of doing so programmatically.
Please assist.



Regards,

Muhammad Hassan
M.Math (Computer Science) || University of Waterloo


(+1) 548 994 4717