Re: [sqlite] Reduce database file size

2010-04-15 Thread Eduardo
On Sun, 11 Apr 2010 14:09:21 -0400
Nikolaus Rath  wrote:

> Using rsync is unfortunately not possible, I'm stuck with HTTP and
> FTP.
> 
> I tried a few PPM compressors, but even though the compression ratio
> is amazing, I'm not desperate enough to invest *that* much CPU time
> yet.
> 
> I may take a look at smaz though, thanks for the pointer.
> 


You can try bbb and paq8, which you can find (along with other
compressors) at http://cs.fit.edu/~mmahoney/compression/.
Please note that they are very slow (depending on the options,
compression can take hours).

At http://mattmahoney.net/dc/text.html you can look for more
compression programs.

PeaZip (http://peazip.sourceforge.net/) provides a GUI for some of
these compression programs and works under Windows and Linux (but it
doesn't include bbb).


Re: [sqlite] Reduce database file size

2010-04-12 Thread Edzard Pasma
If the subject is transferring data from a web server to a central
database, I would like to point to a feature of an open source package
that I published myself. It is only a side feature, resulting from the
general design, but it was added with just this in mind. This feature,
'Connecting over the internet', is described at
http://packages.python.org/sqmediumlite/#5.4 (Python users only).

Op 12-apr-2010, om 6:57 heeft Roger Binns het volgende geschreven:

> Nikolaus Rath wrote:
>> I see. I guess I got confused by the help that refers to the
>> parameter as 'cmd':
>
> The help shows that a list of TABLE is taken in the same syntax as
> SQLite's shell uses.  Also note that the help is formatted for
> documenting interactive usage rather than API usage.
>
>> That works, thanks! But why is it *that* slow?
>
> Because it is written in Python and SQLite's shell is in C.  I've made no
> effort to optimise the Python - please send me a sample database and
> I'll see what can be done.
>
> Additionally the Python version supports more functionality, which does have an
> effect on speed.  For example you can choose what encoding is used.
>
> Roger



Re: [sqlite] Reduce database file size

2010-04-11 Thread Roger Binns
Nikolaus Rath wrote:
> I see. I guess I got confused by the help that refers to the parameter as
> 'cmd':

The help shows that a list of TABLE is taken in the same syntax as 
SQLite's shell uses.  Also note that the help is formatted for 
documenting interactive usage rather than API usage.

> That works, thanks! But why is it *that* slow?

Because it is written in Python and SQLite's shell is in C.  I've made no 
effort to optimise the Python - please send me a sample database and 
I'll see what can be done.

Additionally the Python version supports more functionality, which does have an 
effect on speed.  For example you can choose what encoding is used.

Roger


Re: [sqlite] Reduce database file size

2010-04-11 Thread Nikolaus Rath
Roger Binns  writes:
> On 04/11/2010 11:09 AM, Nikolaus Rath wrote:
>> Yes, you remember correctly. Actually that would be a perfect solution.
>> But how do I use it? It seems to me that I need to pass some argument to
>> Shell.command_dump(), because the following just produces an empty file:
>>
>>
>> import apsw
>> ofh = open('dump.txt', 'w')
>> db = apsw.Connection('test.db')
>> s = apsw.Shell(db=db, stdout=ofh)
>> s.command_dump('.dump')
>
> That is asking it to dump the tables named '.', 'd', 'u', 'm' and 'p' :-)

I see. I guess I got confused by the help that refers to the parameter as
'cmd':

 |  command_dump(self, cmd)
 |  dump ?TABLE? [TABLE...]: Dumps all or specified tables in SQL text format
 |  
 |  The table name is treated as like pattern so you can use % as

> I'd suggest the more orthodox way of invoking commands, which is less likely
> to catch you out like this.
>
>s.process_command(".dump")

That works, thanks! But why is it *that* slow?

[0] vostro:~/tmp$ cat dump.py
import apsw

db = apsw.Connection('test.db')
ofh = open('dump-py.sql', 'w')
s = apsw.Shell(db=db, stdout=ofh)
s.process_command('.dump')

[0] vostro:~/tmp$ cat dump.sh
cat 

Re: [sqlite] Reduce database file size

2010-04-11 Thread Roger Binns
On 04/11/2010 11:09 AM, Nikolaus Rath wrote:
> Yes, you remember correctly. Actually that would be a perfect solution.
> But how do I use it? It seems to me that I need to pass some argument to
> Shell.command_dump(), because the following just produces an empty file:
>
>
> import apsw
> ofh = open('dump.txt', 'w')
> db = apsw.Connection('test.db')
> s = apsw.Shell(db=db, stdout=ofh)
> s.command_dump('.dump')

That is asking it to dump the tables named '.', 'd', 'u', 'm' and 'p' :-)

I'd suggest the more orthodox way of invoking commands, which is less likely
to catch you out like this.

   s.process_command(".dump")

Roger


Re: [sqlite] Reduce database file size

2010-04-11 Thread Nikolaus Rath
Roger Binns  writes:
> On 04/10/2010 03:06 PM, Nikolaus Rath wrote:
>> However, I noticed that if I dump the entire database into a text file
>> with the SQLite shell and then compress the text file, the result is
>> significantly smaller than the "stripped" compressed database:
>
> Have you tried different page sizes?

Yes, but the effect isn't very big:

Page Size    DB        Compressed DB
512          152 MB    21 MB
1024         147 MB    19 MB
4096         143 MB    18 MB
10240        147 MB    19 MB
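
(Each row was measured after rebuilding the file with the given page size,
roughly like the sketch below; the file name is a placeholder.)

import apsw

db = apsw.Connection('test.db')
cur = db.cursor()
cur.execute('PRAGMA page_size = 4096')
# The new page size only takes effect once the database file is rebuilt.
cur.execute('VACUUM')
db.close()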

> You could also do the transmission using rsync with compression which may 
> turn out to transfer even less.
>
> It may also be worthwhile looking into other compression mechanisms.  For 
> example smaz works well on English text because the compression dictionary 
> is prebuilt rather than generated for the specific data.  A prebuilt 
> dictionary type mechanism may work well for you.  You may also find a PPM 
> compressor even better.

Using rsync is unfortunately not possible, I'm stuck with HTTP and FTP.

I tried a few PPM compressors, but even though the compression ratio is
amazing, I'm not desperate enough to invest *that* much CPU time yet.

I may take a look at smaz though, thanks for the pointer.

>> Alternatively, is there an easy way to dump and recover the DB using the
>> standard API rather than the SQLite shell? (Obviously I could write a
>> dump program myself, but it'd be nice if there were a solution that
>> requires less work).
>
> You are using Python and APSW IIRC. APSW includes a Python-based shell
> that has dumping and restore code and can be used programmatically or
> interactively.

Yes, you remember correctly. Actually that would be a perfect solution.
But how do I use it? It seems to me that I need to pass some argument to
Shell.command_dump(), because the following just produces an empty file:


import apsw
ofh = open('dump.txt', 'w')
db = apsw.Connection('test.db')
s = apsw.Shell(db=db, stdout=ofh)
s.command_dump('.dump')


Thanks,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C



Re: [sqlite] Reduce database file size

2010-04-10 Thread Roger Binns
On 04/10/2010 03:06 PM, Nikolaus Rath wrote:
> However, I noticed that if I dump the entire database into a text file
> with the SQLite shell and then compress the text file, the result is
> significantly smaller than the "stripped" compressed database:

Have you tried different page sizes?

You could also do the transmission using rsync with compression which may 
turn out to transfer even less.

It may also be worthwhile looking into other compression mechanisms.  For 
example smaz works well on English text because the compression dictionary 
is prebuilt rather than generated for the specific data.  A prebuilt 
dictionary type mechanism may work well for you.  You may also find a PPM 
compressor even better.
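
(As a rough illustration of the prebuilt-dictionary idea, newer Python
versions expose zlib's preset-dictionary support through the zdict
argument; the dictionary below is just a placeholder for byte strings you
expect to recur in the dump.)

import zlib

# Placeholder preset dictionary: in practice, fill it with strings that
# recur in the data, e.g. repeated INSERT prefixes.
preset = b"INSERT INTO mytable VALUES(" * 4

def compress(data):
    c = zlib.compressobj(level=9, zdict=preset)
    return c.compress(data) + c.flush()

def decompress(blob):
    d = zlib.decompressobj(zdict=preset)
    return d.decompress(blob) + d.flush()

sample = b"INSERT INTO mytable VALUES(1,'abc');\n" * 100
assert decompress(compress(sample)) == sample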

> Alternatively, is there an easy way to dump and recover the DB using the
> standard API rather than the SQLite shell? (Obviously I could write a
> dump program myself, but it'd be nice if there were a solution that
> requires less work).

You are using Python and APSW IIRC.  APSW includes a Python-based shell that 
has dumping and restore code and can be used programmatically or interactively.
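
(A minimal sketch of the programmatic route, with file names as
placeholders; the restore side simply replays the dumped SQL through a
connection.)

import apsw

# Dump an existing database to SQL text using the shell class.
src = apsw.Connection('test.db')
with open('dump.sql', 'w') as ofh:
    apsw.Shell(db=src, stdout=ofh).process_command('.dump')
src.close()

# Restore into a fresh database; APSW cursors accept multiple statements.
dst = apsw.Connection('restored.db')
dst.cursor().execute(open('dump.sql').read())
dst.close()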

Roger


Re: [sqlite] Reduce database file size

2010-04-10 Thread Simon Slavin

On 10 Apr 2010, at 11:06pm, Nikolaus Rath wrote:

> However, I noticed that if I dump the entire database into a text file
> with the SQLite shell and then compress the text file, the result is
> significantly smaller than the "stripped" compressed database:
> 
> Full database: 146 MB
> Without Custom Indices: 117 MB
> Compressed: 13 MB
> 
> Dumped DB: 181 MB
> Compressed: 6.8 MB

I assume you're using lzma -9 for both of these.  You might try other 
compressors besides lzma but you're already getting very good compression.

> Is there a way to strip even more redundancies from the DB for the
> transfer (e.g. the automatically created indices for primary keys)?

I think those are inherent in how SQLite works.  You can't strip them from its 
format.

> Alternatively, is there an easy way to dump and recover the DB using the
> standard API rather than the SQLite shell? (Obviously I could write a
> dump program myself, but it'd be nice if there were a solution that
> requires less work).

I think you've already worked out the best way to do it using sqlite and SQL 
formats.  And of course if you're writing your own dump program you're probably 
better off using TSV format than raw SQL commands.  Unless you have BLOBs in 
your file.
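
(For example, a single table could be written out as tab-separated values
with a few lines of Python; the names below are placeholders, and this
assumes values contain no embedded tabs or newlines.)

import apsw

con = apsw.Connection('test.db')
with open('mytable.tsv', 'w') as out:
    for row in con.cursor().execute('SELECT * FROM mytable'):
        out.write('\t'.join('' if v is None else str(v) for v in row) + '\n')
con.close()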

Simon.


[sqlite] Reduce database file size

2010-04-10 Thread Nikolaus Rath
Hello,

I would like to make an SQLite database as small as possible to transfer
it over a slow link. Currently I drop all custom indices, run VACUUM and
compress the file with lzma -9.
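
(Concretely, the preparation step looks roughly like the sketch below;
the file name is a placeholder and the lzma invocation may differ on
other systems.)

import subprocess
import apsw

db = apsw.Connection('test.db')
cur = db.cursor()

# Drop only the explicitly created indices; the ones SQLite creates
# automatically for PRIMARY KEY / UNIQUE constraints cannot be dropped.
names = [name for (name,) in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL")]
for name in names:
    cur.execute('DROP INDEX "%s"' % name)

cur.execute('VACUUM')
db.close()

# Compress the result (lzma -9, as above).
subprocess.check_call(['lzma', '-9', 'test.db'])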

However, I noticed that if I dump the entire database into a text file
with the SQLite shell and then compress the text file, the result is
significantly smaller than the "stripped" compressed database:

Full database: 146 MB
Without Custom Indices: 117 MB
Compressed: 13 MB

Dumped DB: 181 MB
Compressed: 6.8 MB


Is there a way to strip even more redundancies from the DB for the
transfer (e.g. the automatically created indices for primary keys)?

Alternatively, is there an easy way to dump and recover the DB using the
standard API rather than the SQLite shell? (Obviously I could write a
dump program myself, but it'd be nice if there were a solution that
requires less work).


Best,

   -Nikolaus

-- 
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C
