Bug#1042947: UDD: create a duck importer

2023-10-26 Thread Baptiste Beauplat
Hi Lucas,

On Wed, 2023-10-25 at 20:57 +0200, Lucas Nussbaum wrote:
> Just checking: did you make progress on this?

Sort of.

I could not see a clean way to add this feature without a total rewrite
of duck. So that's what I've started, and I'm making steady progress on
that front.

However, I have not started working on the JSON interface
implementation just yet.

I'll keep you posted once I have a working version of that.

Best,
-- 
Baptiste Beauplat





Bug#1042947: UDD: create a duck importer

2023-10-25 Thread Lucas Nussbaum
Hi Baptiste,

Just checking: did you make progress on this?

Lucas



Bug#1042947: UDD: create a duck importer

2023-08-07 Thread Lucas Nussbaum
Hi Baptiste,

On 07/08/23 at 22:07 +0200, Baptiste Beauplat wrote:
> [...]
> Let me know what you think of it.

That would be perfect!

In the context of UDD, I will probably implement that as two tables:
- one to store the mapping between source packages and URLs
  (source, version, url, type, filename, line_number)
  which would be updated when a new source version gets uploaded
- one to store the status of URLs
  (url, type, result, timestamp of last check)
  which would be updated with a retry policy to be defined

I would not use (filename, line_number) in the input of the URL
testing part. The reason for that design is that it makes it easy to
gather the status for several versions of the package (testing +
unstable + experimental, for example) without duplicating the checks
for URLs.
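
To make the two-table idea concrete, here is a rough sketch using
Python's sqlite3 module as a stand-in for UDD's database. The table
names (duck_package_urls, duck_url_status), the column types, the
version strings and the timestamp are all assumptions for illustration;
only the column lists come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE duck_package_urls (   -- hypothetical table name
    source      TEXT NOT NULL,
    version     TEXT NOT NULL,
    url         TEXT NOT NULL,
    type        TEXT NOT NULL,
    filename    TEXT,
    line_number INTEGER
);
CREATE TABLE duck_url_status (     -- hypothetical table name
    url        TEXT NOT NULL,
    type       TEXT NOT NULL,
    result     TEXT,
    last_check TEXT,
    PRIMARY KEY (url, type)
);
""")

url = "https://salsa.debian.org/rfrancoise/tmux.git"

# Two versions of the same source package reference the same URL.
# The version strings are made up for the example.
conn.executemany(
    "INSERT INTO duck_package_urls VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("tmux", "1.0-1", url, "VCS-Git", "debian/control", 10),
        ("tmux", "1.0-2", url, "VCS-Git", "debian/control", 10),
    ],
)

# The URL is checked once, independently of filename and line number.
conn.execute(
    "INSERT INTO duck_url_status VALUES (?, ?, ?, ?)",
    (url, "VCS-Git", '{"state": 0, "detail": "Informative message"}',
     "2023-08-08T00:00:00Z"),
)

# Both versions pick up the single stored check result through a join.
rows = conn.execute("""
    SELECT p.source, p.version, s.result
    FROM duck_package_urls AS p
    JOIN duck_url_status AS s ON s.url = p.url AND s.type = p.type
    ORDER BY p.version
""").fetchall()
status_count = conn.execute(
    "SELECT COUNT(*) FROM duck_url_status").fetchone()[0]
```

Keying the status table on (url, type) is what lets several package
versions share one check.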

Lucas



Bug#1042947: UDD: create a duck importer

2023-08-07 Thread Baptiste Beauplat
Hi Lucas,

On 2023-08-03 10:30, Lucas Nussbaum wrote:
> duck-as-a-service (duck.debian.net) has been broken for a long time, and
> the corresponding UDD importer is broken as well (see #949009, #963887).
> In the meantime, duck continued evolving (was rewritten?) and is now
> checking a lot more places for URLs.
> 
> It would probably be useful to re-create a way to provide duck results
> as a service, based on UDD, similarly to what is done for upstream or
> lintian data.
> 
> Ideally, this would be done in cooperation with the duck maintainer to
> do the following changes:
> - in duck, separate the logic to get URLs from sources, from the logic
>   to check those URLs (for example, allow dumping a list of URLs, and
>   also using a list of URLs as source)
> - in duck, provide machine-readable outputs (JSON?)

Currently duck has two features which can help us:

- The `-n` switch, which gets all URLs and prints them to stdout
- The `-l filename` switch, which takes a file with one URL per line
  and checks them

Theoretically, the only thing missing is a `--json` switch, which would
change the output from console/text to JSON.

But, as I see it, the `-l` argument is limited in two respects:

- It provides only the URL, losing the checker type, which is used to
  select what kind of validation will be performed.

  For instance, https://salsa.debian.org/rfrancoise/tmux.git of type
  VCS-Git would be tested as a standard URL in the `-l` context, instead
  of as a git repository.

- It requires a file
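
To illustrate the first limitation, here is a minimal Python sketch of
type-based checker selection. It is not duck's actual code: the
function names and the CHECKERS mapping are hypothetical, and the
checkers are placeholders that only return a label.

```python
from typing import Callable, Dict, Optional


def check_generic_url(url: str) -> str:
    # Placeholder: a real checker would issue an HTTP request here.
    return "generic check of " + url


def check_git_repository(url: str) -> str:
    # Placeholder: a real checker would talk to the git remote here.
    return "git check of " + url


# Hypothetical mapping from checker type to validation routine.
CHECKERS: Dict[str, Callable[[str], str]] = {
    "VCS-Git": check_git_repository,
}


def select_checker(entry_type: Optional[str]) -> Callable[[str], str]:
    # Without a type (the `-l` situation), fall back to the generic check.
    if entry_type is None:
        return check_generic_url
    return CHECKERS.get(entry_type, check_generic_url)


url = "https://salsa.debian.org/rfrancoise/tmux.git"
with_type = select_checker("VCS-Git")(url)
without_type = select_checker(None)(url)
```

With the type available the git-specific routine is picked; with the
URL alone, only the generic one can be.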

I'm thinking of implementing a new JSON-specific input format
(`--input-json`?), carrying both pieces of information, which would be
read from stdin instead of a file.

The format would be as simple as:

```json
[
   {"type": "VCS-Git",
    "url": "https://salsa.debian.org/rfrancoise/tmux.git",
    "filename": "debian/control",  # optional key
    "line_number": 10},    # optional key
   ...
]
```

Following this logic, the output format would be the same as the input
format, so that `duck --json -n | duck --input-json` works.
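
As a rough sketch of the consuming side, the following Python function
parses such a list from any text stream (standard input in the pipe
case) and checks that the two mandatory keys are present. The name
read_entries and the error messages are assumptions, and the sketch
expects strict JSON, so without the `#` annotations and the `...`
placeholder used in the example above.

```python
import io
import json
from typing import Any, Dict, List, TextIO

REQUIRED_KEYS = ("type", "url")


def read_entries(stream: TextIO) -> List[Dict[str, Any]]:
    """Parse a JSON array of URL entries and validate the required keys."""
    entries = json.load(stream)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of entries")
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"entry {index} is missing keys: {missing}")
    return entries


# Usage with an in-memory stream; a command-line tool would pass sys.stdin.
sample = io.StringIO(
    '[{"type": "VCS-Git",'
    ' "url": "https://salsa.debian.org/rfrancoise/tmux.git",'
    ' "filename": "debian/control", "line_number": 10}]'
)
entries = read_entries(sample)
```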

The JSON result would hold an additional dictionary for each URL entry,
named "result", described as follows:

```json
[
   {"type": "VCS-Git",
    "url": "https://salsa.debian.org/rfrancoise/tmux.git",
    "filename": "debian/control",  # optional key
    "line_number": 10,             # optional key
    "result": {
        "state": 0,  # 0 for OK, 1 for Error, 2 for Information
        "detail": "Informative message",
        "certainty": "possible"  # optional key
    }},
   ...
]
```
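
A minimal Python sketch of how a checker could attach that "result"
dictionary to an input entry. The State enum and the with_result helper
are hypothetical names; only the keys and the 0/1/2 state values come
from the format above.

```python
import json
from enum import IntEnum
from typing import Any, Dict, Optional


class State(IntEnum):
    OK = 0
    ERROR = 1
    INFORMATION = 2


def with_result(entry: Dict[str, Any], state: State, detail: str,
                certainty: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of the entry extended with a "result" dictionary."""
    result: Dict[str, Any] = {"state": int(state), "detail": detail}
    if certainty is not None:
        # "certainty" is optional, so it is only emitted when provided.
        result["certainty"] = certainty
    return {**entry, "result": result}


entry = {"type": "VCS-Git",
         "url": "https://salsa.debian.org/rfrancoise/tmux.git"}
checked = with_result(entry, State.OK, "Informative message", "possible")
serialized = json.dumps([checked])
```

Because the input entry is copied unchanged, the output stays a
superset of the input format.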

Let me know what you think of it.

> Then UDD could process source packages to extract URLs, check those URLs
> on a regular basis (similarly to what is done for lintian), and
> publish/export the results in all relevant places.

Best,
-- 
Baptiste Beauplat





Bug#1042947: UDD: create a duck importer

2023-08-03 Thread Lucas Nussbaum
Package: qa.debian.org
Severity: wishlist
User: qa.debian@packages.debian.org
Usertags: udd

duck-as-a-service (duck.debian.net) has been broken for a long time, and
the corresponding UDD importer is broken as well (see #949009, #963887).
In the meantime, duck continued evolving (was rewritten?) and is now
checking a lot more places for URLs.

It would probably be useful to re-create a way to provide duck results
as a service, based on UDD, similarly to what is done for upstream or
lintian data.

Ideally, this would be done in cooperation with the duck maintainer to
do the following changes:
- in duck, separate the logic to get URLs from sources, from the logic
  to check those URLs (for example, allow dumping a list of URLs, and
  also using a list of URLs as source)
- in duck, provide machine-readable outputs (JSON?)

Then UDD could process source packages to extract URLs, check those URLs
on a regular basis (similarly to what is done for lintian), and
publish/export the results in all relevant places.

- Lucas