On Wed, Aug 05, 2009 at 12:05:53PM +0200, Steffen Moeller wrote:
> Andreas Tille wrote:
> > On Wed, Aug 05, 2009 at 11:38:02AM +0900, Charles Plessy wrote:
> >> I have been thinking a bit on the issue. How about the following workflow:
> >>
> >> - Create a new file with a “Name: contents” field syntax in the Debian
> >>   source packages, for “online meta-data” that typically requires internet
> >>   access to be useful.
> >
> > Sounds reasonable.
>
> I agree.
>
> > Could we somehow prototype what we want to achieve?
> Could you pair that with an incremental implementation plan? And ask for help
> where you want help?

Dear all,

it took some time, but I now have a more concrete proposal.

First of all, let's summarise the situation. We want to integrate some metadata in our “web sentinels”, like ‘http://debian-med.alioth.debian.org/tasks/bio’. The simplest way to create these pages is to centralise all the information in the Ultimate Debian Database (http://udd.debian.org/). Typical metadata are bibliographic information or a registration URL. The UDD is fed with tables that have to be deposited in a trusted location. The issue is how to prepare the tables with data collected by multiple package maintainers.

What I propose is to have a special file in the source packages for gathering all potentially useful information, debian/upstream-metadata.yaml. Unlike debian/control, this file would not contribute data to the Packages.gz files of the Debian archive.

I think that enough source packages are managed in version control systems that we can use them as the main source of our data. This makes debian/upstream-metadata.yaml available independently of the Debian archive and, more importantly, allows the metadata to be updated without uploading the package, but in a way that only the maintainers can do the update, which keeps things under control.

The missing piece of the puzzle is then an aggregator that would collect the information from the source packages and prepare tables for the UDD. I am drafting such a program at http://upstream-metadata.debian.net/. Currently, it does not do much:

http://upstream-metadata.debian.net/<package>/ALL gets debian/upstream-metadata.yaml if the package is in a Subversion server that is available to ‘debcheckout’. Luckily, most of our packages are.

http://upstream-metadata.debian.net/<package>/<key> gives the content of the metadata for one key.
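
To make the two URL forms concrete, here is a minimal Python sketch. It is not the aggregator's actual code: the helper names (metadata_url, parse_flat_metadata, lookup) are hypothetical, and the parser only handles flat “Name: contents” lines, whereas a real implementation would presumably use a proper YAML parser.

```python
# Hypothetical sketch; names below are illustrative assumptions,
# not taken from the program at upstream-metadata.debian.net.
BASE_URL = "http://upstream-metadata.debian.net"


def metadata_url(package, key="ALL"):
    """Build the URL for the whole file (key='ALL') or for a single key."""
    return f"{BASE_URL}/{package}/{key}"


def parse_flat_metadata(text):
    """Parse flat 'Name: contents' lines into a dict.

    Assumption: only a simple subset of YAML (one scalar per line) is used.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, contents = line.partition(":")
        if sep:  # ignore lines that do not contain a colon
            fields[name.strip()] = contents.strip()
    return fields


def lookup(text, key):
    """Return every field for 'ALL', else the value of one key (or None)."""
    fields = parse_flat_metadata(text)
    if key == "ALL":
        return fields
    return fields.get(key)


# Sample content for illustration only.
sample = "PMID: 19505943\n"
print(metadata_url("samtools", "PMID"))
print(lookup(sample, "PMID"))
```

Splitting on the first colon only means that values which themselves contain colons, such as URLs, are kept whole.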
For instance, http://upstream-metadata.debian.net/samtools/PMID gives the PubMed identification number for the article describing SamTools, 19505943. This is the proof of principle for data retrieval.

Then, we need to construct the tables. I plan to have the program store the results in a BerkeleyDB database, and to make it output tables at regular intervals, for instance daily. The internal database would be updated in two ways. First, updates could be pushed with commit hooks when package maintainers commit changes to debian/upstream-metadata.yaml. It could be as simple as having a URL that triggers an update, and using wget or curl to activate the aggregator. Second, normal read access could trigger an update if the record is getting old.

In summary, I propose to store metadata in YAML format in the source packages, retrieve and store it in a central place using a web agent through the VCS in which the source packages are stored, and periodically output tables for the UDD, which keeps a central role for the generation of our web sentinel pages.

The proof of principle presented above is only a few lines of code, but I would prefer to discuss the idea further before putting more time into it. Lastly, I have accumulated a dozen debian/upstream-metadata.yaml files in the packages I maintain, so that meaningful tests are doable for table generation later. I do not remember the list by heart, but it contains seaview, bwa, clustalw, clustalx, perlprimer, samtools, and most of the packages I have updated recently.

Since I am quite inexperienced in programming, help is of course most welcome.

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan