Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Evan Martin
Because of bugs, we've seen users with 10 GB history files, which may
contribute to the complaints.
  http://code.google.com/p/chromium/issues/detail?id=24947

Even if compression ends up being pretty slow, you could imagine using
it for our archived history (history more than a month old).

On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium)
 wrote:
> I'm all for it. I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.
>
> -- Elliot

-- 
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
http://groups.google.com/group/chromium-dev


Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Nico Weber
On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium)
 wrote:
> I'm all for it. I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.

Part of the reason for this is bugs like
http://code.google.com/p/chromium/issues/detail?id=24946 . Shouldn't
we fix these first?



Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Elliot Glaysher (Chromium)
I'm all for it. I vaguely remember people complaining about the size
of our history files, and most of my history files are over 50M.

-- Elliot



[chromium-dev] SQLite compression in history database.

2009-11-24 Thread Scott Hess
Long ago, when developing fts1, I experimented with using zlib
compression as part of the implementation.  It fell by the wayside
because it didn't provide enough of a performance improvement (I
needed an order of magnitude, and it didn't deliver that), and because
of licensing issues (fts1/2/3 are part of core SQLite, which does not
include zlib).

Chromium already has zlib, and I don't think there's any particular
reason not to hack our version of fts to support it.  Looking at my
October history file, I get the following (numbers are in megabytes):

ls -lh History\ Index\ 2009-10
# -rw-r--r--@ 1 shess  eng    66M Nov 24 09:38 History Index 2009-10
.../sqlite3 History\ Index\ 2009-10
select round(sum(length(c0url)+length(c1title)+length(c2body))/1024.0/1024.0,2)
from pages_content;
# 34.9
select 
round(sum(length(compress(c0url))+length(compress(c1title))+length(compress(c2body)))/1024.0/1024.0,2)
from pages_content;
# 12.29
select round(sum(length(block))/1024.0/1024.0,2) from pages_segments;
# 24.6
select round(sum(length(compress(block)))/1024.0/1024.0,2) from pages_segments;
# 14.3
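
The measurements above rely on a compress() SQL function, which is not
one of SQLite's built-in functions, so the shell used here presumably had
a zlib-backed one registered. A minimal sketch of how the pages_content
comparison could be reproduced, assuming Python's sqlite3 and zlib modules
and a plain stand-in table (the helper names sql_compress and size_report
and the sample rows are hypothetical, not taken from Chromium):

```python
import sqlite3
import zlib


def sql_compress(value):
    """zlib-compress a TEXT or BLOB value; NULL stays NULL."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.encode("utf-8")
    return zlib.compress(value)


def size_report(conn):
    """Return (raw_len, packed_len) summed over pages_content.

    Note: length() on TEXT counts characters, on BLOB it counts bytes,
    so the two numbers are only directly comparable for ASCII-ish text.
    """
    # Register the user-defined SQL function under the name used above.
    conn.create_function("compress", 1, sql_compress)
    raw = conn.execute(
        "SELECT sum(length(c0url) + length(c1title) + length(c2body)) "
        "FROM pages_content"
    ).fetchone()[0]
    packed = conn.execute(
        "SELECT sum(length(compress(c0url)) + length(compress(c1title)) "
        "+ length(compress(c2body))) FROM pages_content"
    ).fetchone()[0]
    return raw, packed


# Stand-in data: a plain table with the same column names, not a real
# fts table, filled with made-up rows.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE pages_content (c0url TEXT, c1title TEXT, c2body TEXT)"
)
demo.executemany(
    "INSERT INTO pages_content VALUES (?, ?, ?)",
    [
        ("http://example.com/%d" % i, "Example page %d" % i,
         "lorem ipsum dolor sit amet " * 200)
        for i in range(50)
    ],
)
raw_len, packed_len = size_report(demo)
print("raw:", raw_len, "compressed:", packed_len)
```

Short values such as URLs can come out slightly larger after zlib because
of its fixed per-value overhead, so the overall saving depends on how much
of the data is body text.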

pages_segments is the fts index.  Since it is consulted very
frequently, I'd be slightly nervous about compressing it.
pages_content is the document data, which is hit after the index (or
when doing a lookup by document id), so compressing it shouldn't have
much performance impact.
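
To illustrate the split described above, here is a rough sketch in which
only the document data is stored compressed and is decompressed on a
lookup by document id, while any index would be left untouched. It assumes
Python's sqlite3 and zlib modules; the table content_z and the helpers
open_store, put_body and get_body are hypothetical and do not reflect how
fts itself would be modified:

```python
import sqlite3
import zlib


def sql_compress(value):
    """zlib-compress a TEXT or BLOB value; NULL stays NULL."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.encode("utf-8")
    return zlib.compress(value)


def sql_uncompress(blob):
    """Inverse of sql_compress, assuming the original value was UTF-8 text."""
    if blob is None:
        return None
    return zlib.decompress(blob).decode("utf-8")


def open_store():
    """Open an in-memory database with both SQL functions registered."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("compress", 1, sql_compress)
    conn.create_function("uncompress", 1, sql_uncompress)
    conn.execute(
        "CREATE TABLE content_z (docid INTEGER PRIMARY KEY, body BLOB)"
    )
    return conn


def put_body(conn, docid, body):
    """Store the document text compressed."""
    conn.execute(
        "INSERT INTO content_z VALUES (?, compress(?))", (docid, body)
    )


def get_body(conn, docid):
    """Fetch and decompress one document by id; None if it is missing."""
    row = conn.execute(
        "SELECT uncompress(body) FROM content_z WHERE docid = ?", (docid,)
    ).fetchone()
    return row[0] if row else None
```

The decompression cost here is paid once per fetched document, which is
why it should matter less than compressing blocks that are read on every
query.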

Does this seem like a win worth pursuing?

-scott
