Re: [sqlite] Reduce database file size
On Sun, 11 Apr 2010 14:09:21 -0400 Nikolaus Rathwrote: > Using rsync is unfortunately not possible, I'm stuck with HTTP and > FTP. > > I tried a few PPM compressors, but even though the compression ratio > is amazing, I'm not desperate enough to invest *that* much CPU time > yet. > > I may take at smaz though, thanks for the pointer. > You can try bbb and paq8. You can find them at http://cs.fit.edu/~mmahoney/compression/ and other compressors. Please, note that they are very slow (depending on options it can take hours) At http://mattmahoney.net/dc/text.html you can look for more compression programs. PeaZip has a gui for some of this compression programs http://peazip.sourceforge.net/ and works under windows and linux (but hasn't bbb). ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Reduce database file size
If the subject is transferring data from a web server to a central database, I like to refer to a feature of an open source package that I published myself. It is only a side-side feature, resulting from the general design, but it was added with just this in mind. This feature, 'Connecting over the internet', is described in http:// packages.python.org/sqmediumlite/#5.4 (this is only for Python users). Op 12-apr-2010, om 6:57 heeft Roger Binns het volgende geschreven: > Nikolaus Rath wrote: >> I see. I gues I got confused by the help that refers to the >> parameter as >> 'cmd': > > The help shows that a list of TABLE is taken in the same syntax as > SQLite's shell uses. Also note that the help is formatted for > documenting interactive usage rather than API usage. > >> That works, thanks! But why is it *that* slow? > > Because it is written in Python and SQLite's is in C. I've made no > effort to optimise the Python - please send me a sample database and > I'll see what can be done. > > Additionally the Python supports more functionality which does have an > effect on speed. For example you can choose what encoding is used. > > Roger > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Reduce database file size
Nikolaus Rath wrote: > I see. I gues I got confused by the help that refers to the parameter as > 'cmd': The help shows that a list of TABLE is taken in the same syntax as SQLite's shell uses. Also note that the help is formatted for documenting interactive usage rather than API usage. > That works, thanks! But why is it *that* slow? Because it is written in Python and SQLite's is in C. I've made no effort to optimise the Python - please send me a sample database and I'll see what can be done. Additionally the Python supports more functionality which does have an effect on speed. For example you can choose what encoding is used. Roger ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Reduce database file size
Roger Binnswrites: > On 04/11/2010 11:09 AM, Nikolaus Rath wrote: >> Yes, you remember correctly. Actually that would be a perfect solution. >> But how do I use it? It seems to me that I need to pass some argument to >> Shell.command_dump(), because the following just produces an empty file: >> >> >> import apsw >> ofh = open('dump.txt', 'w') >> db = apsw.Connection('test.db') >> s = apsw.Shell(db=db, stdout=ofh) >> s.command_dump('.dump') > > That is asking it to dump the tables named '.', 'd', 'u', 'm' and 'p' :-) I see. I gues I got confused by the help that refers to the parameter as 'cmd': | command_dump(self, cmd) | dump ?TABLE? [TABLE...]: Dumps all or specified tables in SQL text format | | The table name is treated as like pattern so you can use % as > I'd suggest the more orthodox way of invoking commands which is less likely > to catch you out like this. > >s.process_command(".dump") That works, thanks! But why is it *that* slow? [0] vostro:~/tmp$ cat dump.py import apsw db = apsw.Connection('test.db') ofh = open('dump-py.sql', 'w') s = apsw.Shell(db=db, stdout=ofh) s.process_command('.dump') [0] vostro:~/tmp$ cat dump.sh cat
Re: [sqlite] Reduce database file size
On 04/11/2010 11:09 AM, Nikolaus Rath wrote: > Yes, you remember correctly. Actually that would be a perfect solution. > But how do I use it? It seems to me that I need to pass some argument to > Shell.command_dump(), because the following just produces an empty file: > > > import apsw > ofh = open('dump.txt', 'w') > db = apsw.Connection('test.db') > s = apsw.Shell(db=db, stdout=ofh) > s.command_dump('.dump') That is asking it to dump the tables named '.', 'd', 'u', 'm' and 'p' :-) I'd suggest the more orthodox way of invoking commands which is less likely to catch you out like this. s.process_command(".dump") Roger ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Reduce database file size
Roger Binnswrites: > On 04/10/2010 03:06 PM, Nikolaus Rath wrote: >> However, I noticed that if I dump the entire database into a text file >> with the SQLite shell and then compress the text file, the result is >> significantly smaller than the "stripped" compressed database: > > Have you tried different page sizes? Yes, but the effect isn't very big: Page Size: DB: Compressed DB: 512 152 MB21 MB 1024147 MB19 MB 4096143 MB18 MB 10240 147 MB19 MB > You could also do the transmission using rsync with compression which may > turn out to transfer even less. > > It may also be worthwhile looking to other compression mechanisms. For > example smaz works well on English text because the compression dictionary > is prebuilt rather than generated for the specific data. A prebuilt > dictionary type mechanism may work well for you. You may also find a PPM > compressor even better. Using rsync is unfortunately not possible, I'm stuck with HTTP and FTP. I tried a few PPM compressors, but even though the compression ratio is amazing, I'm not desperate enough to invest *that* much CPU time yet. I may take at smaz though, thanks for the pointer. >> Alternatively, is there an easy way to dump and recover the DB using the >> standard API rather than the SQLite shell? (Obviously I could write a >> dump program myself, but I'd be nice if there is a solution that >> requires less work). > > You are using Python and APSW IIRC. APSW includes a Python based shell > that has dumping and restore code and can be used programmatically or > interactively. Yes, you remember correctly. Actually that would be a perfect solution. But how do I use it? It seems to me that I need to pass some argument to Shell.command_dump(), because the following just produces an empty file: import apsw ofh = open('dump.txt', 'w') db = apsw.Connection('test.db') s = apsw.Shell(db=db, stdout=ofh) s.command_dump('.dump') Thanks, -Nikolaus -- »Time flies like an arrow, fruit flies like a Banana.« PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Reduce database file size
On 04/10/2010 03:06 PM, Nikolaus Rath wrote: > However, I noticed that if I dump the entire database into a text file > with the SQLite shell and then compress the text file, the result is > significantly smaller than the "stripped" compressed database: Have you tried different page sizes? You could also do the transmission using rsync with compression which may turn out to transfer even less. It may also be worthwhile looking to other compression mechanisms. For example smaz works well on English text because the compression dictionary is prebuilt rather than generated for the specific data. A prebuilt dictionary type mechanism may work well for you. You may also find a PPM compressor even better. > Alternatively, is there an easy way to dump and recover the DB using the > standard API rather than the SQLite shell? (Obviously I could write a > dump program myself, but I'd be nice if there is a solution that > requires less work). You are using Python and APSW IIRC. APSW includes a Python based shell that has dumping and restore code and can be used programmatically or interactively. Roger ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Reduce database file size
On 10 Apr 2010, at 11:06pm, Nikolaus Rath wrote: > However, I noticed that if I dump the entire database into a text file > with the SQLite shell and then compress the text file, the result is > significantly smaller than the "stripped" compressed database: > > Full database: 146 MB > Without Custom Indices: 117 MB > Compressed: 13 MB > > Dumped DB: 181 MB > Compressed: 6.8 MB I assume you're using lzma -9 for both of these. You might try other compressors besides lzma but you're already getting very good compression. > Is there a way to strip even more redundancies from the DB for the > transfer (e.g. the automatically created indices for primary keys)? I think those are inherent in how SQLite works. You can't strip them from its format. > Alternatively, is there an easy way to dump and recover the DB using the > standard API rather than the SQLite shell? (Obviously I could write a > dump program myself, but I'd be nice if there is a solution that > requires less work). I think you've already worked out the best way to do it using sqlite and SQL formats. And of course if you're writing your own dump program you're probably better off using TSV format than raw SQL commands. Unless you have BLOBs in your file. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Reduce database file size
Hello, I would like to make an SQLite database as small as possible to transfer it over a slow link. Currently I am dropping all custom indices, run VACUUM and compress the file with lzma -9. However, I noticed that if I dump the entire database into a text file with the SQLite shell and then compress the text file, the result is significantly smaller than the "stripped" compressed database: Full database: 146 MB Without Custom Indices: 117 MB Compressed: 13 MB Dumped DB: 181 MB Compressed: 6.8 MB Is there a way to strip even more redundancies from the DB for the transfer (e.g. the automatically created indices for primary keys)? Alternatively, is there an easy way to dump and recover the DB using the standard API rather than the SQLite shell? (Obviously I could write a dump program myself, but I'd be nice if there is a solution that requires less work). Best, -Nikolaus -- »Time flies like an arrow, fruit flies like a Banana.« PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users