Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Evan Martin
Due to bugs, we've seen users with 10GB history files, which may
contribute to the complaints.
  http://code.google.com/p/chromium/issues/detail?id=24947

Even if compression ends up being pretty slow, you could imagine using
it for our archived history (history more than a month old).

On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium) wrote:
> I'm all for it. I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.
>
> -- Elliot
>
> On Tue, Nov 24, 2009 at 10:13 AM, Scott Hess wrote:
>> Long ago when developing fts1, I experimented with using zlib
>> compression as part of the implementation.  It fell by the wayside
>> because it really didn't provide enough performance improvement (I
>> needed an order of magnitude, and it didn't deliver that), and because of
>> licensing issues (fts1/2/3 are part of core SQLite, which does not
>> include zlib).
>>
>> Chromium already has zlib, and I don't think there's any particular
>> reason not to hack our version of fts to support it.  Looking at my
>> October history file, I get the following (numbers are in megabytes):
>>
>> ls -lh History\ Index\ 2009-10
>> # -rw-r--r--@ 1 shess  eng    66M Nov 24 09:38 History Index 2009-10
>> .../sqlite3 History\ Index\ 2009-10
>> select round(sum(length(c0url)+length(c1title)+length(c2body))/1024.0/1024.0, 2)
>>   from pages_content;
>> # 34.9
>> select round(sum(length(compress(c0url))+length(compress(c1title))
>>                  +length(compress(c2body)))/1024.0/1024.0, 2)
>>   from pages_content;
>> # 12.29
>> select round(sum(length(block))/1024.0/1024.0, 2) from pages_segments;
>> # 24.6
>> select round(sum(length(compress(block)))/1024.0/1024.0, 2) from pages_segments;
>> # 14.3
>>
>> pages_segments is the fts index.  Since it is consulted very
>> frequently, I'd be slightly nervous about compressing it.
>> pages_content is the document data, which is hit after the index (or
>> when doing a lookup by document id), so compressing it shouldn't have
>> much performance impact.
>>
>> Does this seem like a win worth pursuing?
>>
>> -scott
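
For context: compress() in the session above is not a built-in of the stock
sqlite3 shell, so the measurement presumably relied on a zlib-backed
user-defined function loaded into that session. The numbers imply roughly
22 MB of savings on pages_content (34.9 to 12.29) and about 10 MB on
pages_segments (24.6 to 14.3) out of the 66 MB file. A minimal sketch of
such a function, assuming only the public sqlite3_create_function API and
the zlib Chromium already bundles (names here are illustrative, not actual
Chromium code):

#include <vector>

#include "sqlite3.h"
#include "zlib.h"

// compress(X): returns X deflated with zlib, or NULL if X is NULL.
static void CompressFunc(sqlite3_context* ctx, int argc, sqlite3_value** argv) {
  (void)argc;  // Always 1; registered with nArg = 1 below.
  if (sqlite3_value_type(argv[0]) == SQLITE_NULL) {
    sqlite3_result_null(ctx);
    return;
  }
  const unsigned char* in =
      static_cast<const unsigned char*>(sqlite3_value_blob(argv[0]));
  const uLong in_len = static_cast<uLong>(sqlite3_value_bytes(argv[0]));

  // compressBound() is zlib's worst-case output size for in_len input bytes.
  std::vector<unsigned char> out(compressBound(in_len));
  uLongf out_len = static_cast<uLongf>(out.size());
  if (compress(out.data(), &out_len, in, in_len) != Z_OK) {
    sqlite3_result_error(ctx, "zlib compress failed", -1);
    return;
  }
  sqlite3_result_blob(ctx, out.data(), static_cast<int>(out_len),
                      SQLITE_TRANSIENT);
}

// Registers compress() on an open database handle, e.g. right after
// sqlite3_open() and before running the size queries above.
int RegisterCompress(sqlite3* db) {
  return sqlite3_create_function(db, "compress", 1, SQLITE_UTF8,
                                 /*pApp=*/nullptr, CompressFunc,
                                 /*xStep=*/nullptr, /*xFinal=*/nullptr);
}

Actually storing pages_content compressed would also need the matching
uncompress() on the read path, and presumably some way to distinguish old
uncompressed rows from new compressed ones.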


Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Nico Weber
On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium) wrote:
> I'm all for it. I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.

Part of the reason for this is bugs like
http://code.google.com/p/chromium/issues/detail?id=24946. Shouldn't
we fix those first?


Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Elliot Glaysher (Chromium)
I'm all for it. I vaguely remember people complaining about the size
of our history files, and most of my history files are over 50M.

-- Elliot

-- 
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
http://groups.google.com/group/chromium-dev