Hi Tiffany and list,

I think dumps are reasonable for regular backups, but not a good
choice for creating more long-term archives; so I guess it depends a
bit on the goal. Changes in versions, options, encoding, and database
engines can make it difficult to import SQL dumps accurately. I think
plain-text formats (CSV) are still the best long-term archive option
-- they are easy to version and compress, and ubiquitous -- but far
from perfect: in particular, a round trip of db -> csv -> db will
likely not preserve data types (boolean / int / char, etc.)
accurately. Storing the types as 'metadata' alongside the CSV can
help, but is somewhat manual. I'm not convinced that we have a good
performant, compressible, cross-platform, widely established
file-based exchange format available at this time (cue comments about
JSON, HDF5, or Parquet).
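
For example, here's a minimal sketch in Python (stdlib only; the
database name "survey.db" and table name "plots" are made up for
illustration) of exporting a table to CSV while saving the declared
schema as a sidecar file, so a later import can restore the column
types that the CSV alone would flatten to text:

    import csv
    import sqlite3

    con = sqlite3.connect("survey.db")   # hypothetical database
    cur = con.cursor()

    # Save the declared schema as sidecar 'metadata', since the CSV
    # itself records no types.
    with open("plots_schema.sql", "w") as f:
        cur.execute("SELECT sql FROM sqlite_master WHERE name = 'plots'")
        f.write(cur.fetchone()[0] + ";\n")

    # Export the rows; every value becomes a plain string in the CSV.
    with open("plots.csv", "w", newline="") as f:
        cur.execute("SELECT * FROM plots")
        writer = csv.writer(f)
        writer.writerow([d[0] for d in cur.description])  # header row
        writer.writerows(cur.fetchall())

    con.close()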

A somewhat separate issue is whether such files need a git-like tool
to manage versions. IMHO the goal is really to preserve each dump in
a way that doesn't risk accidentally overwriting a previous version
and that captures some basic metadata (a timestamp); a file-naming
convention can provide that, and git may not be necessary (given both
the potentially large size of data dumps and the often compelling
case for compressing these files in a binary format).
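
As a sketch of what I mean (again Python stdlib; "survey.db" is
hypothetical), each dump gets a UTC timestamp baked into its name, so
an old version is never overwritten, and gzip keeps the text dump
small:

    import gzip
    import sqlite3
    from datetime import datetime, timezone

    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    con = sqlite3.connect("survey.db")

    # iterdump() yields the same SQL statements as the sqlite3
    # command-line shell's .dump command.
    with gzip.open(f"survey_{stamp}.sql.gz", "wt") as f:
        for line in con.iterdump():
            f.write(line + "\n")

    con.close()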

I'm really no expert in any of this though, so I'm sharing this as
much to learn where it goes wrong as to offer solid advice!

Cheers,

Carl

On Fri, Aug 10, 2018 at 12:08 PM Bennet Fauber <ben...@umich.edu> wrote:

> Tiffany,
>
> You might experiment with some smallish databases.  The order of
> records may well change from dump to dump, which can make the
> apparent differences between any two dumps look much larger than
> the actual differences really are.
>
> Good luck!
>
>
> On Fri, Aug 10, 2018 at 12:49 PM Tiffany A. Timbers via discuss
> <discuss@lists.carpentries.org> wrote:
> >
> > Thanks all for your input - very helpful! Dav - happy for you to
> > question the general strategy. As I said, I know very little about
> > this. In my case it's a smallish, simple SQLite database with ~8
> > tables, so dumping/transaction logs/etc. might work well and easily.
> > But if there's a better and different strategy for checkpointing
> > SQLite databases, I'd love to learn.
> >
> > Thanks!
> > Tiffany
-- 

http://carlboettiger.info
