Robert Haas wrote:
>On Fri, Jan 2, 2009 at 3:23 PM, Stephen R. van den Berg <s...@cuci.nl> wrote:
>> Three things:
>> a. Shouldn't it in theory be possible to have a decompression algorithm
>>   which is IO-bound because it decompresses faster than the disk can
>>   supply the data?  (On common current hardware).
>> b. Has the current algorithm been carefully benchmarked and/or optimised
>>   and/or chosen to fit the IO-bound target as closely as possible?
>> c. Are there any well-known pitfalls/objections which would prevent me from
>>   changing the algorithm to something more efficient (read: IO-bound)?

>Any compression algorithm is going to require you to decompress the
>entire string before extracting a substring at a given offset.  When
>the data is uncompressed, you can jump directly to the offset you want
>to read.  Even if the compression algorithm requires no overhead at
>all, it's going to make the location of the data nondeterministic, and
>therefore force additional disk reads.

That shouldn't be insurmountable:
- I currently have difficulty imagining applications that actually perform
  lots of substring extractions from large compressible fields.
  The most likely candidate would be a table containing large
  tsearch-indexed text fields, but those are unlikely to participate in
  many substring extractions.
- Even if substring operations were likely, I could envision a compressed
  format which compresses the data in independent chunks of, say, 64KB;
  each chunk can then be decompressed on its own, giving random access
  (see the sketch below).
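
To make that second point concrete, here is a minimal sketch of such a
format.  It is not PostgreSQL/TOAST code: it uses plain zlib
(compress2()/uncompress()) instead of pglz, and the names ChunkedBlob,
chunked_compress() and chunked_substring() are invented here for
illustration.  The input is compressed in independent 64KB chunks, a small
directory records where each compressed chunk starts, and a substring is
served by inflating only the chunks that overlap the requested range.

/*
 * Sketch only -- not PostgreSQL/TOAST code.  Uses plain zlib instead of
 * pglz; ChunkedBlob and both functions are names invented for this
 * example.  Error handling and return-code checks are omitted for
 * brevity.  Build with: cc sketch.c -lz
 */
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define CHUNK_RAW 65536             /* uncompressed bytes per chunk */

typedef struct
{
    size_t  nchunks;                /* number of chunks */
    size_t  rawlen;                 /* total uncompressed length */
    size_t *offsets;                /* nchunks+1 offsets into 'data' */
    unsigned char *data;            /* concatenated compressed chunks */
} ChunkedBlob;

/* Compress 'src' chunk by chunk, recording where each chunk starts. */
static ChunkedBlob *
chunked_compress(const unsigned char *src, size_t srclen)
{
    ChunkedBlob *blob = malloc(sizeof(ChunkedBlob));
    size_t nchunks = (srclen + CHUNK_RAW - 1) / CHUNK_RAW;
    size_t pos = 0;

    blob->nchunks = nchunks;
    blob->rawlen = srclen;
    blob->offsets = malloc((nchunks + 1) * sizeof(size_t));
    blob->data = malloc(compressBound(CHUNK_RAW) * nchunks);

    for (size_t i = 0; i < nchunks; i++)
    {
        uLongf clen = compressBound(CHUNK_RAW);
        size_t raw = (i + 1) * CHUNK_RAW <= srclen ?
            CHUNK_RAW : srclen - i * CHUNK_RAW;

        blob->offsets[i] = pos;
        compress2(blob->data + pos, &clen,
                  src + i * CHUNK_RAW, raw, Z_DEFAULT_COMPRESSION);
        pos += clen;
    }
    blob->offsets[nchunks] = pos;
    return blob;
}

/*
 * Copy 'len' bytes starting at 'offset' into 'out', inflating only the
 * chunks that overlap the requested range.  Assumes 0 < len and
 * offset + len <= blob->rawlen.
 */
static void
chunked_substring(const ChunkedBlob *blob, size_t offset, size_t len,
                  unsigned char *out)
{
    size_t first = offset / CHUNK_RAW;
    size_t last = (offset + len - 1) / CHUNK_RAW;
    unsigned char chunk[CHUNK_RAW];

    for (size_t i = first; i <= last; i++)
    {
        uLongf rawlen = CHUNK_RAW;
        size_t from = (i == first) ? offset - i * CHUNK_RAW : 0;
        size_t upto = (i == last) ? offset + len - i * CHUNK_RAW : CHUNK_RAW;

        uncompress(chunk, &rawlen,
                   blob->data + blob->offsets[i],
                   blob->offsets[i + 1] - blob->offsets[i]);
        memcpy(out, chunk + from, upto - from);
        out += upto - from;
    }
}

int
main(void)
{
    size_t n = 200000;              /* spans four 64KB chunks */
    unsigned char *src = malloc(n);
    unsigned char out[16];
    ChunkedBlob *blob;

    for (size_t i = 0; i < n; i++)
        src[i] = (unsigned char) ('a' + i % 7);

    blob = chunked_compress(src, n);
    chunked_substring(blob, 130000, sizeof(out), out);  /* inflates 1 chunk */
    return memcmp(out, src + 130000, sizeof(out)) != 0; /* 0 on success */
}

The directory costs one offset per 64KB of raw data, and a substring read
then decompresses an amount of data proportional to the substring rather
than to the whole datum; the price is slightly worse compression, because
chunks cannot reference each other.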
-- 
Sincerely,
           Stephen R. van den Berg.

"Always remember that you are unique.  Just like everyone else."
