Re: [chromium-dev] SQLite compression in history database.
Due to bugs, we've seen users with 10 GB history files, which may contribute to the complaints:
http://code.google.com/p/chromium/issues/detail?id=24947

Even if compression ends up being pretty slow, you could imagine using it for our archived history (history more than a month old).

On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium) wrote:
> I'm all for it. I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.
>
> -- Elliot

--
Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe:
http://groups.google.com/group/chromium-dev
Re: [chromium-dev] SQLite compression in history database.
On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium) wrote:
> I'm all for it. I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.

Part of the reason for this is bugs like
http://code.google.com/p/chromium/issues/detail?id=24946 .
Shouldn't we fix these first?
Re: [chromium-dev] SQLite compression in history database.
I'm all for it. I vaguely remember people complaining about the size
of our history files, and most of my history files are over 50M.

-- Elliot

On Tue, Nov 24, 2009 at 10:13 AM, Scott Hess wrote:
> Long ago when developing fts1, I experimented with using zlib
> compression as part of the implementation. It fell by the wayside
> because it really didn't provide enough performance improvement (I
> needed an order of magnitude, it didn't provide it), and because of
> licensing issues (fts1/2/3 are part of core SQLite, which does not
> include zlib).
>
> Chromium already has zlib, and I don't think there's any particular
> reason not to hack our version of fts to support it. Looking at my
> October history file, I get the following (numbers are in megabytes):
>
> ls -lh History\ Index\ 2009-10
> # -rw-r--r--@ 1 shess eng 66M Nov 24 09:38 History Index 2009-10
> .../sqlite3 History\ Index\ 2009-10
> select
> round(sum(length(c0url)+length(c1title)+length(c2body))/1024.0/1024.0,2)
> from pages_content;
> # 34.9
> select
> round(sum(length(compress(c0url))+length(compress(c1title))+length(compress(c2body)))/1024.0/1024.0,2)
> from pages_content;
> # 12.29
> select round(sum(length(block))/1024.0/1024.0,2) from pages_segments;
> # 24.6
> select round(sum(length(compress(block)))/1024.0/1024.0,2) from
> pages_segments;
> # 14.3
>
> pages_segments is the fts index. Since it is consulted very
> frequently, I'd be slightly nervous about compressing it.
> pages_content is the document data, which is hit after the index (or
> when doing a lookup by document id), so compressing it shouldn't have
> much performance impact.
>
> Does this seem like a win worth pursuing?
>
> -scott
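Scott's measurement reports pages_content going from 34.9 MB to 12.29 MB under compress(), roughly 65% smaller. For readers who want to try the same kind of comparison, here is a minimal Python sketch. It is not the code used in the thread. It assumes that a zlib-backed SQL function named compress is registered by hand (the stock sqlite3 shell does not provide one), and it runs against a small in-memory toy table that only borrows the column names c0url, c1title, and c2body from the queries above. The helper names build_demo_db and measure are hypothetical.

```python
import sqlite3
import zlib


def compress(value):
    """Assumed stand-in for the thread's compress(): zlib at the default level."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.encode("utf-8")
    return zlib.compress(value)


def build_demo_db():
    """Create an in-memory toy table that mimics pages_content (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name used in the thread.
    conn.create_function("compress", 1, compress)
    conn.execute("CREATE TABLE pages_content (c0url TEXT, c1title TEXT, c2body TEXT)")
    body = "SQLite full-text search keeps the document text in a content table. " * 50
    rows = [
        ("http://example.com/page/%d" % i, "Example page %d" % i, body)
        for i in range(20)
    ]
    conn.executemany("INSERT INTO pages_content VALUES (?, ?, ?)", rows)
    return conn


def measure(conn):
    """Return (raw_size, compressed_size) summed over all rows.

    Mirrors the queries in the thread. Note that length() on TEXT counts
    characters, which equals bytes only for ASCII data such as this toy set.
    """
    raw, packed = conn.execute(
        """
        SELECT
          sum(length(c0url) + length(c1title) + length(c2body)),
          sum(length(compress(c0url)) + length(compress(c1title))
              + length(compress(c2body)))
        FROM pages_content
        """
    ).fetchone()
    return raw, packed


if __name__ == "__main__":
    conn = build_demo_db()
    raw, packed = measure(conn)
    print("raw=%d bytes, compressed=%d bytes" % (raw, packed))
```

The toy body text is highly repetitive, so the ratio it prints says nothing about real history data. Short values such as URLs can even grow slightly under zlib because of its fixed per-value overhead, which is one reason to measure on an actual history file before deciding.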