Re: [chromium-dev] SQLite compression in history database.
Due to bugs we've seen users with 10GB history files, which may
contribute to complaints.  http://code.google.com/p/chromium/issues/detail?id=24947

Even if compression ends up being pretty slow, you could imagine using
it for our archived history (history more than a month old).

On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium) wrote:
> I'm all for it.  I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.
>
> -- Elliot
>
> On Tue, Nov 24, 2009 at 10:13 AM, Scott Hess wrote:
>> Long ago when developing fts1, I experimented with using zlib
>> compression as part of the implementation.  It fell by the wayside
>> because it really didn't provide enough performance improvement (I
>> needed an order of magnitude, it didn't provide it), and because of
>> licensing issues (fts1/2/3 are part of core SQLite, which does not
>> include zlib).
>>
>> [...]
>>
>> Does this seem like a win worth pursuing?
>>
>> -scott

--
Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe:
http://groups.google.com/group/chromium-dev
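[Editor's note: stock SQLite later grew hooks for exactly the idea discussed in this thread. The FTS4 `compress=` and `uncompress=` table options (added to SQLite after this thread, in the FTS4 module) name a pair of user-defined SQL functions that are applied when the content table is written and read. A minimal sketch with Python's sqlite3 and zlib follows; it assumes the underlying SQLite build has FTS4 enabled, and the function names `zip`/`unzip` are arbitrary choices, not part of the API.]

```python
# Sketch: transparent content compression via FTS4's compress=/uncompress=
# table options, backed by zlib.  Tokenization happens on the uncompressed
# text, so MATCH queries are unaffected; only stored content shrinks.
import sqlite3
import zlib

def zip_udf(v):
    """zlib-compress TEXT values; pass other types through unchanged."""
    if not isinstance(v, str):
        return v
    return zlib.compress(v.encode("utf-8"))

def unzip_udf(v):
    """Inverse of zip_udf: decompress BLOBs back to TEXT."""
    if not isinstance(v, bytes):
        return v
    return zlib.decompress(v).decode("utf-8")

conn = sqlite3.connect(":memory:")
conn.create_function("zip", 1, zip_udf)
conn.create_function("unzip", 1, unzip_udf)

# Both options must be given together; each names a registered SQL function.
conn.execute("CREATE VIRTUAL TABLE pages USING fts4"
             "(body, compress=zip, uncompress=unzip)")
conn.execute("INSERT INTO pages (body) VALUES (?)",
             ("the quick brown fox " * 50,))

# Queries are unchanged; decompression happens behind the scenes.
row = conn.execute("SELECT body FROM pages WHERE pages MATCH 'fox'").fetchone()
```

Reads that touch the content table (including this SELECT) round-trip through `unzip`, so `row[0]` is the original uncompressed text.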
Re: [chromium-dev] SQLite compression in history database.
On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium) wrote:
> I'm all for it.  I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.

Part of the reason for this is bugs like
http://code.google.com/p/chromium/issues/detail?id=24946 .  Shouldn't
we fix these first?

> -- Elliot
>
> On Tue, Nov 24, 2009 at 10:13 AM, Scott Hess wrote:
>> Long ago when developing fts1, I experimented with using zlib
>> compression as part of the implementation.  It fell by the wayside
>> because it really didn't provide enough performance improvement (I
>> needed an order of magnitude, it didn't provide it), and because of
>> licensing issues (fts1/2/3 are part of core SQLite, which does not
>> include zlib).
>>
>> [...]
>>
>> Does this seem like a win worth pursuing?
>>
>> -scott
Re: [chromium-dev] SQLite compression in history database.
I'm all for it.  I vaguely remember people complaining about the size
of our history files, and most of my history files are over 50M.

-- Elliot

On Tue, Nov 24, 2009 at 10:13 AM, Scott Hess wrote:
> Long ago when developing fts1, I experimented with using zlib
> compression as part of the implementation.  It fell by the wayside
> because it really didn't provide enough performance improvement (I
> needed an order of magnitude, it didn't provide it), and because of
> licensing issues (fts1/2/3 are part of core SQLite, which does not
> include zlib).
>
> [...]
>
> Does this seem like a win worth pursuing?
>
> -scott
[chromium-dev] SQLite compression in history database.
Long ago when developing fts1, I experimented with using zlib
compression as part of the implementation.  It fell by the wayside
because it really didn't provide enough performance improvement (I
needed an order of magnitude, it didn't provide it), and because of
licensing issues (fts1/2/3 are part of core SQLite, which does not
include zlib).

Chromium already has zlib, and I don't think there's any particular
reason not to hack our version of fts to support it.  Looking at my
October history file, I get the following (numbers are in megabytes):

ls -lh History\ Index\ 2009-10
# -rw-r--r--@ 1 shess  eng  66M Nov 24 09:38 History Index 2009-10
.../sqlite3 History\ Index\ 2009-10
select round(sum(length(c0url)+length(c1title)+length(c2body))/1024.0/1024.0,2)
  from pages_content;
# 34.9
select round(sum(length(compress(c0url))+length(compress(c1title))+length(compress(c2body)))/1024.0/1024.0,2)
  from pages_content;
# 12.29
select round(sum(length(block))/1024.0/1024.0,2) from pages_segments;
# 24.6
select round(sum(length(compress(block)))/1024.0/1024.0,2) from pages_segments;
# 14.3

pages_segments is the fts index.  Since it is consulted very
frequently, I'd be slightly nervous about compressing it.
pages_content is the document data, which is hit after the index (or
when doing a lookup by document id), so compressing it shouldn't have
much performance impact.

Does this seem like a win worth pursuing?

-scott
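[Editor's note: the `compress()` SQL function in the transcript above is not a stock SQLite built-in; the session presumably used a build carrying zlib support. The same measurement can be reproduced elsewhere by registering a zlib-backed user-defined function. A minimal sketch in Python, using a throwaway in-memory table as a stand-in for the `pages_content` column names from the thread:]

```python
# Sketch: register a zlib-backed compress() SQL function so the
# size-measurement queries from the message above can be run as-is.
import sqlite3
import zlib

def compress_udf(v):
    """zlib-compress TEXT or BLOB values; pass other types through."""
    if isinstance(v, str):
        v = v.encode("utf-8")
    if not isinstance(v, bytes):
        return v
    return zlib.compress(v)

conn = sqlite3.connect(":memory:")
conn.create_function("compress", 1, compress_udf)

# Stand-in for one column of the pages_content table in the thread.
conn.execute("CREATE TABLE pages_content (c2body TEXT)")
conn.execute("INSERT INTO pages_content VALUES (?)", ("spam " * 1000,))

# Same shape as the queries in the transcript (bytes, not megabytes).
raw, packed = conn.execute(
    "SELECT sum(length(c2body)), sum(length(compress(c2body)))"
    " FROM pages_content").fetchone()
print(raw, packed)  # highly repetitive text compresses well
```

Against a real history file, point `sqlite3.connect()` at the database path and run Scott's queries unchanged; the ratio, not the absolute numbers, is the interesting output.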