Long ago, when developing fts1, I experimented with using zlib
compression as part of the implementation.  It fell by the wayside
because it didn't provide enough of a performance improvement (I
needed an order of magnitude and didn't get it), and because of
licensing issues (fts1/2/3 are part of core SQLite, which does not
include zlib).

Chromium already has zlib, and I don't think there's any particular
reason not to hack our version of fts to support it.  Looking at my
October history file, I get the following (numbers are in megabytes):

ls -lh History\ Index\ 2009-10
# -rw-r--r--@ 1 shess  eng    66M Nov 24 09:38 History Index 2009-10
.../sqlite3 History\ Index\ 2009-10
select round(sum(length(c0url)+length(c1title)+length(c2body))/1024.0/1024.0,2)
from pages_content;
# 34.9
select round(sum(length(compress(c0url))+length(compress(c1title))+length(compress(c2body)))/1024.0/1024.0,2)
from pages_content;
# 12.29
select round(sum(length(block))/1024.0/1024.0,2) from pages_segments;
# 24.6
select round(sum(length(compress(block)))/1024.0/1024.0,2) from pages_segments;
# 14.3
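
For anyone who wants to try the same kind of measurement without a
custom sqlite3 build, here is a rough Python sketch.  It assumes
compress() is just a thin wrapper around zlib (stock SQLite has no
such function), and it runs against a small made-up in-memory table
that borrows the pages_content column names, not a real history file.
sql_compress, make_sample and measure are hypothetical names.

```python
import sqlite3
import zlib


def sql_compress(value):
    # Assumed behavior of compress(): zlib-deflate the value, pass NULL through.
    if value is None:
        return None
    if isinstance(value, str):
        value = value.encode("utf-8")
    return zlib.compress(value)


def make_sample():
    # Made-up rows in a plain table that borrows the pages_content column names.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table pages_content (c0url, c1title, c2body)")
    body = "the quick brown fox jumps over the lazy dog " * 200
    rows = [("http://example.com/page/%d" % i, "Example page %d" % i, body)
            for i in range(50)]
    conn.executemany("insert into pages_content values (?, ?, ?)", rows)
    return conn


def measure(conn):
    # Returns (raw, compressed) totals using the same sums as the queries above.
    conn.create_function("compress", 1, sql_compress)
    raw = conn.execute(
        "select sum(length(c0url)+length(c1title)+length(c2body)) "
        "from pages_content").fetchone()[0]
    packed = conn.execute(
        "select sum(length(compress(c0url))+length(compress(c1title))"
        "+length(compress(c2body))) from pages_content").fetchone()[0]
    return raw, packed


if __name__ == "__main__":
    raw, packed = measure(make_sample())
    print("raw:", raw, "compressed:", packed)
```

The sample body is highly repetitive, so the ratio it reports says
nothing about real history data; only the numbers above do.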

pages_segments is the fts index.  Since it is consulted very
frequently, I'd be slightly nervous about compressing it.
pages_content is the document data, which is hit after the index (or
when doing a lookup by document id), so compressing it shouldn't have
much performance impact.
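
To make the idea concrete, here is a minimal sketch of compressing
only the document data: deflate on insert, inflate on lookup by
document id, and leave the index alone.  This is not how fts stores
things internally; the content table, open_store, put and get are all
hypothetical stand-ins, and a real change would have to live inside
our copy of the fts code.

```python
import sqlite3
import zlib


def open_store():
    # Hypothetical plain table standing in for the pages_content data.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table content ("
        "docid integer primary key, c0url blob, c1title blob, c2body blob)")
    return conn


def put(conn, docid, url, title, body):
    # Compress each column separately on the way in.
    packed = [zlib.compress(s.encode("utf-8")) for s in (url, title, body)]
    conn.execute("insert into content values (?, ?, ?, ?)", [docid] + packed)


def get(conn, docid):
    # Decompress on the way out; this is the only place that pays the cost.
    row = conn.execute(
        "select c0url, c1title, c2body from content where docid = ?",
        (docid,)).fetchone()
    if row is None:
        return None
    return tuple(zlib.decompress(blob).decode("utf-8") for blob in row)


if __name__ == "__main__":
    store = open_store()
    put(store, 1, "http://example.com/", "Example", "some page text " * 100)
    print(get(store, 1)[:2])
```

Decompression only happens in get(), after the index has already
picked the matching document ids, which is why I'd expect the cost to
stay off the hot path.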

Does this seem like a win worth pursuing?

-scott

-- 
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev