No recommendation here, but I just came across this tool for
including/diffing sqlite binaries in git repos
<https://github.com/cannadayr/git-sqlite>, which is clearly relevant.

(It came up in a HN discussion for another related application
<https://news.ycombinator.com/item?id=17865687>.)

On Fri, Aug 10, 2018 at 1:28 PM Carl Boettiger via discuss <
discuss@lists.carpentries.org> wrote:

> Hi Tiffany!, list,
>
> I think dumps are reasonable for regular backups, but not a good choice
> for creating more long-term archives; so guess it depends a bit on the
> goal.  Changes in versions, options, encoding, and database engines can
> make it difficult to import SQL dumps accurately.  I think text-file
> formats (csv) are still the best long-term archive option -- they are easy
> to version and compress and ubiquitous, but far from a perfect option -- in
> particular, a round-trip db -> csv -> db may likely not preserve data types
> (boolean / int / char etc) accurately. Storing this as 'metadata' can help
> but is somewhat manual. I'm not convinced that we have a good performant,
> compressable, cross-platform, widely established file-based exchange format
> available at this time (queue comments about json, hdf5, or parquet).
>
> A somewhat separate issue is whether such files need a git-like tool to
> manage versions.  IMHO the goal is really to preserve each dump in a way
> that doesn't risk accidental overwriting of a previous version and captures
> some basic metadata (timestamp); something a file-naming convention can
> provide and git may not be necessary (given both the potentially large size
> of data dumps and the often compelling case to compress these files in a
> binary format).
>
> I'm really no expert in any of this though, so sharing this as much to
> learn where it goes wrong rather than as solid advice!
>
> Cheers,
>
> Carl
>
> On Fri, Aug 10, 2018 at 12:08 PM Bennet Fauber <ben...@umich.edu> wrote:
>
>> Tiffany,
>>
>> You might experiment with some smallish databases.  The order of
>> records may well change significantly from dump to dump, making the
>> apparent differences and the actual differences between any two dumps
>> appear much larger than they really are.
>>
>> Good luck!
>>
>>
>> On Fri, Aug 10, 2018 at 12:49 PM Tiffany A. Timbers via discuss
>> <discuss@lists.carpentries.org> wrote:
>> >
>> > Thanks all for your input - very helpful! Dav - happy for you to
>> questions the general strategy. As I said, I know very little about this.
>> In my case its a smallish, simple SQLite database with ~ 8 tables. So
>> dumping/transaction logs/etc might work well and easily. But if there's a
>> better and different strategy for checkpointing SQLite databases, I'd love
>> to learn.
>> >
>> > Thanks!
>> > Tiffany
>> > The Carpentries / discuss / see discussions + participants + delivery
>> options Permalink
>>
>> ------------------------------------------
>> The Carpentries: discuss
>> Permalink:
>> https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mb0ae3c22005b6cfbf5866889
>> Delivery options:
>> https://carpentries.topicbox.com/groups/discuss/subscription
>>
> --
>
> http://carlboettiger.info
> *The Carpentries <https://carpentries.topicbox.com/latest>* / discuss /
> see discussions <https://carpentries.topicbox.com/groups/discuss> +
> participants <https://carpentries.topicbox.com/groups/discuss/members> + 
> delivery
> options <https://carpentries.topicbox.com/groups/discuss/subscription>
> Permalink
> <https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Md590217388b52aadab77edf2>
>

------------------------------------------
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-M5d95bbf9e8825d3a09fe7cb8
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription

Reply via email to