Hi Maarten,

On 2021-09-09 17:54, Maarten L. Hekkelman wrote:
> On 09-09-2021 at 15:14, Andrius Merkys wrote:
>>> But I would not mind having a system wide service to update data files
>>> like these. Perhaps with a log with version info, so you can look up
>>> what version was used at what date.
>> Indeed, it would be nice to find a generic solution, but this might be
>> tricky. There are conflicting needs of stability (no updates), freshness
>> (updates every day) and multi-user support (no updates and updates
>> everyday all at once on the same machine). The only solution I can think
>> of now is keeping all the downloaded versions with version/date in their
>> names like:
>>
>> /var/cache/pdb/components/components-20210814.cif.gz
>> /var/cache/pdb/components/components-20210820.cif.gz
>> /var/cache/pdb/components/components-20210826.cif.gz
>> ...
>> (maybe /var/cache/pdb/components/components.cif.gz symlink to the latest)
>>
>> Then a user would use environment variable, say, PDB_COMPONENTS to point
>> to a file with version in its name should they need a specific stable
>> database, and would use /var/cache/pdb/components/components.cif.gz
>> should they need the most up-to-date one.
>>
>> Does this sound reasonable?
>
> I think a bit more is required, when looking at the FAIR principles[1] I
> can see a few other issues coming up. What would be nice is to have e.g.
> a JSON file along with the data containing a hash, download date and
> other meta data for the data files available. Then if you store the hash
> (and perhaps more meta data) for the data file along with your results,
> you can always recover what version of the datafile was used.
>
> In the PDB-REDO database we're trying to do this for e.g. the version of
> all the tools used to create a record.

I agree that an additional persistent download log would be beneficial.
I would prefer a flat comma-separated or tab-separated value list, since
that is simple to read and to append to, but the format is more a matter
of taste :)

> [1] https://en.wikipedia.org/wiki/FAIR_data

Best,
Andrius
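
P.S. To make the log idea a bit more concrete, here is a rough sketch in
Python. It only illustrates the scheme discussed above and is not an
existing tool. PDB_COMPONENTS and the /var/cache/pdb/components/ path come
from this thread; the function names, the three-column layout (UTC
timestamp, file name, SHA-256) and the log location are assumptions of mine.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone
from pathlib import Path

# Path proposed in this thread: a symlink pointing at the latest download.
DEFAULT_COMPONENTS = Path("/var/cache/pdb/components/components.cif.gz")


def resolve_components(environ=os.environ, default=DEFAULT_COMPONENTS):
    """Return the file pinned via PDB_COMPONENTS, else the 'latest' symlink."""
    pinned = environ.get("PDB_COMPONENTS")
    return Path(pinned) if pinned else Path(default)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_download(log_path, data_path, when=None):
    """Append one tab-separated row: UTC timestamp, file name, SHA-256."""
    # 'when' is assumed to be a UTC datetime; the 'Z' suffix relies on that.
    when = when or datetime.now(timezone.utc)
    row = [
        when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        Path(data_path).name,
        sha256_of(data_path),
    ]
    with open(log_path, "a", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerow(row)
    return row


def lookup_by_hash(log_path, sha256):
    """Return the first log row whose hash matches, or None if absent."""
    with open(log_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) == 3 and row[2] == sha256:
                return row
    return None
```

An updater would call log_download() after each fetch, and anyone who
stored the hash next to their results could later call lookup_by_hash()
to find out which dated file they had used.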