Hello Andrius,

You politely and confidently invited comments.

1.) Thanks!
2.) I like your use of the concise "TL;DR" (i.e., "Too Long; Didn't Read").
3.) Sometimes I shorten it even more, to "TLDR".
4.) How is the PDB Chemical Component Dictionary licensed, if at all? Is it in
    the public domain?
5.) Could it somehow complement a big computer spreadsheet summarizing life
    span experiments, where a column named "intervention" contains the names
    of molecules, like "vitamin c" and "glycine"[1]?
6.) Could Debian packages have version numbers based on the date the CCD was
    downloaded?

Thanks,
Kingsley

[1] World's biggest collection of the results of life span experiments
    https://kingsleymorse.ch/life_extension.html#preprocessed_life_span_data

On 06/23/2023 09:31, Andrius Merkys wrote:
> Hello,
>
> TL;DR: I propose packaging frequently updated PDB Chemical Component
> Dictionary. Reasons, technical solutions and limitations below.
>
> PDB Chemical Component Dictionary (CCD) [1] is a single file (~400 MB
> uncompressed) collection of small molecule components found in PDB entries.
> It is used by at least a couple of Debian packages: openstructure, which
> needs it as a build dependency, and libcifpp.
>
> For openstructure I have resorted to putting some version of the CCD in
> debian/ directory to fulfill the build requirement and then provide it as
> /usr/share/openstructure/components.cif.gz. However, due to this CCD is not
> updated as frequently as it is released. Moreover, large-sized debian/
> directories are frowned upon. Therefore I would like to outsource the CCD.
>
> libcifpp package provides a cron task which keeps an up-to-date CCD in its
> cache directory, which is good as Debian-packaged CCD file would stay static
> between Debian releases. However, this does not help building openstructure
> due to network access constraint.
>
> I propose packaging CCD as a separate source package. It does not have
> version, thus update date would have to be used instead.
> I have hacked together a watch file to check for new versions, but it fails on
> mk-origtargz step:
>
> version=4
> opts="downloadurlmangle=s|status.*|monomers/components.cif.gz|,filenamemangle=s|(\d+)/$|ccd-$1.gz|" \
> https://files.wwpdb.org/pub/pdb/data/status/ \
> https://files.wwpdb.org/pub/pdb/data/status/(\d+)/
>
> Thus the tarball would have to be produced by get-orig-source target in
> debian/rules unless there are other solutions.
>
> Here I would like to ask for comments and suggestions. I am aware that
> packaging large and frequently updated data files is not usual practice, but
> I believe that doing so would both resolve problems with building
> openstructure and benefit users needing a stable CCD version.
>
> [1] https://www.wwpdb.org/data/ccd
>
> Best wishes,
> Andrius
>

-- 
Time is the fire in which we all burn.