[udd] injecting debtags information

2008-09-17 Thread Stefano Zacchiroli
[ Following a suggestion by Lucas, we are moving to the debian-qa
  mailing list the discussions about UDD, the Ultimate Debian Database;
  shout if you object to this choice. ]

Any objection about injecting debtags information into UDD?

It would be really helpful for writing queries that filter packages
based on their tags.

Besides that, the obvious question is of course how to do it. AFAIK,
tags are currently not tied to any particular version or suite of
(binary) packages, so a table keyed by (binary) package name would be
enough.

That said, we have of course the choice between just having one tuple
per package, with a field containing all associated tags (e.g.,
comma-separated), and having one tuple for each (package, tag) pair.

I tend to prefer the latter, as we would then have the different tags of
each package already expressed in relational algebra, without needing to
resort to sub-string machinery.
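
A minimal sketch of the second option, using SQLite from Python purely
for illustration. The table name debtags, its column names, and the
sample rows are assumptions made for this example; nothing in this
thread fixes the actual UDD schema.

```python
import sqlite3

# Hypothetical schema: one row per (package, tag) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE debtags ("
    " package TEXT NOT NULL,"
    " tag TEXT NOT NULL,"
    " PRIMARY KEY (package, tag))"
)

# Illustrative rows only, not real debtags data.
rows = [
    ("mutt", "role::program"),
    ("mutt", "mail::user-agent"),
    ("vim", "role::program"),
    ("vim", "use::editing"),
]
conn.executemany("INSERT INTO debtags (package, tag) VALUES (?, ?)", rows)


def packages_with_tag(conn, tag):
    """Return the sorted names of packages carrying exactly this tag."""
    cur = conn.execute(
        "SELECT package FROM debtags WHERE tag = ? ORDER BY package", (tag,)
    )
    return [name for (name,) in cur]


def packages_with_all_tags(conn, tags):
    """Return the sorted names of packages carrying every tag in `tags`."""
    tags = list(tags)
    placeholders = ", ".join("?" for _ in tags)
    cur = conn.execute(
        "SELECT package FROM debtags"
        f" WHERE tag IN ({placeholders})"
        " GROUP BY package"
        " HAVING COUNT(DISTINCT tag) = ?"
        " ORDER BY package",
        (*tags, len(set(tags))),
    )
    return [name for (name,) in cur]
```

With one row per pair, filtering on a tag is a plain equality test; the
comma-separated layout would need a sub-string match on the tag field
instead.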

To the debtags people: is there a daily-updated source we can download
to get up-to-date debtags data?

Cheers.

-- 
Stefano Zacchiroli -*- PhD in Computer Science \ PostDoc @ Univ. Paris 7
[EMAIL PROTECTED],pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
I'm still an SGML person,this newfangled /\ All one has to do is hit the
XML stuff is so ... simplistic  -- Manoj \/ right keys at the right time



Re: [udd] injecting debtags information

2008-09-17 Thread Lucas Nussbaum
On 17/09/08 at 20:39 +0200, Stefano Zacchiroli wrote:
> That said, we have of course the choice between just having one tuple
> per package, with a field containing all associated tags (e.g.,
> comma-separated), and having one tuple for each (package, tag) pair.
> 
> I tend to prefer the latter, as we would then have the different tags of
> each package already expressed in relational algebra, without needing to
> resort to sub-string machinery.

me too.
-- 
| Lucas Nussbaum
| [EMAIL PROTECTED]   http://www.lucas-nussbaum.net/ |
| jabber: [EMAIL PROTECTED] GPG: 1024D/023B3F4F |



Re: [udd] injecting debtags information

2008-09-18 Thread Enrico Zini
On Wed, Sep 17, 2008 at 08:39:55PM +0200, Stefano Zacchiroli wrote:

> To the debtags people: what would be a daily updated source of
> information we can download to have up to date debtags data?

At the moment, these two (that need to be merged together):

  svn://svn.debian.org/debtags/tagdb/tagdb
  svn://svn.debian.org/svn/secure-testing/data/package-tags

They are not updated daily: the debtags/tagdb one is updated whenever I
do a manual review, and the secure-testing one is updated whenever the
security team commits anything to it.

I have scripts that take data from both sources, merge them and upload
it as tag overrides.  I have been meaning for a long time to also
publish the merged dataset somewhere, but I've never had the excuse to
push me to do it.  This could be the excuse: let me know if you have
special requirements, otherwise I was thinking of publishing it
similarly to what you find in http://debtags.debian.org/tags/
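
As a rough illustration of the merge step described above: the sketch
below unions two package-to-tags mappings. It is not the actual script
(which is not shown in this thread), and it assumes both sources have
already been parsed into that shape. The function name merge_tags and
the sample tags are hypothetical.

```python
def merge_tags(*sources):
    """Union several {package: set_of_tags} mappings into a new mapping.

    The inputs are left untouched; each package ends up with the union
    of the tags it has in any source.
    """
    merged = {}
    for source in sources:
        for package, tags in source.items():
            merged.setdefault(package, set()).update(tags)
    return merged


# Made-up sample data, only to show the shape of the inputs.
reviewed = {"vim": {"role::program"}, "mutt": {"role::program"}}
extra = {"vim": {"example::extra-tag"}}
merged = merge_tags(reviewed, extra)
```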


Ciao,

Enrico

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <[EMAIL PROTECTED]>



Re: [udd] injecting debtags information

2008-09-18 Thread Stefano Zacchiroli
On Thu, Sep 18, 2008 at 11:01:58AM +0100, Enrico Zini wrote:
> At the moment, these two (that need to be merged together):
> 
>   svn://svn.debian.org/debtags/tagdb/tagdb

This one does not exist, but there is
  svn://svn.debian.org/debtags/tagdb/tags
which I'm quite sure is what you meant. The format looks obvious ...

>   svn://svn.debian.org/svn/secure-testing/data/package-tags

.. while the format of this one is obscure to me, as I don't see any
tags in it, but only something like sections.

> They are not updated daily: the debtags/tagdb one is updated whenever I
> do a manual review, and the secure-testing one is updated whenever the
> security team commits anything to it.

This is not a problem: we just need something we can download routinely,
and if you update it less frequently it is not a big deal. After all, we
are all happily living with sporadically updated tag overrides in
aptitude, and it is just fine.

> I have scripts that take data from both sources, merge them and upload
> it as tag overrides.  I have been meaning for a long time to also
> publish the merged dataset somewhere, but I've never had the excuse to
> push me to do it.  This could be the excuse: let me know if you have
> special requirements, otherwise I was thinking of publishing it
> similarly to what you find in http://debtags.debian.org/tags/

Yes, the merged data set would be wonderful.

It looks like http://debtags.alioth.debian.org/tags/tags-current.gz
has the very same format as the first URL you mentioned, so maybe we can
start injecting tags-current.gz and switch to the URL of the merged
dataset as soon as it is available. Would that be OK with you?
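
A sketch of how such a file could be turned into (package, tag) pairs
ready for injection. It assumes each line has the form
"package: tag1, tag2, ..." (possibly with several comma-separated
package names on the left); that format is an assumption here, not
something spelled out in this thread, and the function names are
hypothetical.

```python
import gzip
import io


def parse_tag_lines(lines):
    """Yield (package, tag) pairs from lines like 'pkg: tag1, tag2'."""
    for line in lines:
        # Split on the first ':' only, since tags themselves contain '::'.
        left, sep, right = line.strip().partition(":")
        if not sep:
            continue  # blank or malformed line: no package/tag separator
        packages = [p.strip() for p in left.split(",") if p.strip()]
        tags = [t.strip() for t in right.split(",") if t.strip()]
        for package in packages:
            for tag in tags:
                yield package, tag


def read_gzipped_tags(fileobj):
    """Decompress a gzipped tag file object and return its pairs as a list."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as text:
        return list(parse_tag_lines(text))


# In-memory stand-in for a downloaded file; the content is made up.
sample = gzip.compress(
    b"mutt: role::program, mail::user-agent\nvim: role::program\n"
)
pairs = read_gzipped_tags(io.BytesIO(sample))
```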

Is there any other drawback in polling tags-current.gz rather than the
tags file from svn?

Finally, my understanding is that the only tags coming from
testing-security are those related to the security support of a given
package, and that all other tags are coming from the debtags alioth
project. Is this correct?

Cheers.

-- 
Stefano Zacchiroli -*- PhD in Computer Science \ PostDoc @ Univ. Paris 7
[EMAIL PROTECTED],pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
I'm still an SGML person,this newfangled /\ All one has to do is hit the
XML stuff is so ... simplistic  -- Manoj \/ right keys at the right time



Re: [udd] injecting debtags information

2008-09-18 Thread Enrico Zini
On Thu, Sep 18, 2008 at 04:42:02PM +0200, Stefano Zacchiroli wrote:
> On Thu, Sep 18, 2008 at 11:01:58AM +0100, Enrico Zini wrote:
> > At the moment, these two (that need to be merged together):
> >   svn://svn.debian.org/debtags/tagdb/tagdb
> This one does not exist, but there is
>   svn://svn.debian.org/debtags/tagdb/tags
> which I'm quite sure is what you meant. The format looks obvious ...

Yes indeed, sorry.

> >   svn://svn.debian.org/svn/secure-testing/data/package-tags
> .. while the format of this one is obscure to me, as I don't see any
> tags in it, but only something like sections.

I have a script that parses it, and it has a bit of magic inside, so I
suggest you don't reimplement it and instead get the parsed, merged data
from me.


> > I have scripts that take data from both sources, merge them and upload
> > it as tag overrides.  I have been meaning for a long time to also
> > publish the merged dataset somewhere, but I've never had the excuse to
> > push me to do it.  This could be the excuse: let me know if you have
> > special requirements, otherwise I was thinking of publishing it
> > similarly to what you find in http://debtags.debian.org/tags/
> Yes, the merged data set would be wonderful.

Ok, I'll do it and let you know when I'm done.  Ping me on Jabber if you
don't see me doing it, as I'm in a rather busy period atm.

> It looks like that http://debtags.alioth.debian.org/tags/tags-current.gz
> has the very same format of the first URL you mentioned, so maybe we can
> start injecting tags-current.gz and switch to the URL of the merged
> dataset as soon as it is available. Would it be OK with you?

It is ok, but I suggest you inject from the svn version instead.

> Any other drawback in polling against tags-current.gz rather than
> against tags from svn?

Yes: tags-current.gz is unreviewed, and it gets whatever is added by
anonymous people on the debtags web tagging interface.  The version in
subversion instead is manually reviewed, and more reliable.
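
To see what the difference amounts to in practice, one could compare
the two datasets once both are available as (package, tag) pairs. The
sketch below is only an illustration under that assumption; the
function name and the sample pairs are hypothetical.

```python
def unreviewed_only(reviewed, unreviewed):
    """Return sorted (package, tag) pairs found only in `unreviewed`."""
    return sorted(set(unreviewed) - set(reviewed))


# Made-up pairs standing in for the svn and tags-current.gz datasets.
from_svn = [("vim", "role::program")]
from_web = [("vim", "role::program"), ("vim", "use::editing")]
pending = unreviewed_only(from_svn, from_web)
```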

> Finally, my understanding is that the only tags coming from
> testing-security are those related to the security support of a given
> package, and that all other tags are coming from the debtags alioth
> project. Is this correct?

It is indeed correct.


Ciao,

Enrico

-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <[EMAIL PROTECTED]>

