Re: [HACKERS] Custom compression methods

2017-11-07 Thread Ildus Kurbangaliev
On Thu, 2 Nov 2017 23:02:34 +0800
Craig Ringer  wrote:

> On 2 November 2017 at 17:41, Ildus Kurbangaliev
>  wrote:
> 
> > In this patch compression methods is suitable for MAIN and EXTENDED
> > storages like in current implementation in postgres. Just instead
> > only of LZ4 you can specify any other compression method.  
> 
> We've had this discussion before.
> 
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
> 
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.ga12...@alap2.anarazel.de
> 
> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.
> 

Thank you for the link, I didn't see that thread when I looked over
mailing lists. I read it briefly, and I can address few things
relating to my patch.

Most concerns have been related with legal issues.
Actually that was the reason I did not include any new compression
algorithms to my patch. Unlike that patch mine only provides syntax
and is just a way to give the users use their own compression algorithms
and deal with any legal issues themselves.

I use only one unused bit in header (there's still one free ;), that's
enough to determine that data is compressed or not.

I did found out that pg_upgrade doesn't work properly with my patch,
soon I will send fix for it.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom compression methods

2017-11-05 Thread Tom Lane
Robert Haas  writes:
> A basic problem here is that, as proposed, DROP COMPRESSION METHOD may
> break your database irretrievably.  If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover.  The
> only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).

> Both of those alternatives sound fairly unpleasant to me, but I'm not
> exactly sure what to recommend in terms of how to make it better.
> Ideally anything we expose as an SQL command should have a DROP
> command that undoes whatever CREATE did and leaves the database in an
> intact state, but that seems hard to achieve in this case.

If the use of a compression method is tied to specific data types and/or
columns, then each of those could have a dependency on the compression
method, forcing a type or column drop if you did DROP COMPRESSION METHOD.
That would leave no reachable data using the removed compression method.
So that part doesn't seem unworkable on its face.

IIRC, the bigger concerns in the last discussion had to do with
replication, ie, can downstream servers make sense of the data.
Maybe that's not any worse than the issues you get with non-core
index AMs, but I'm not sure.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom compression methods

2017-11-05 Thread Adam Brusselback
> If there's no data compressed
> using the compression method you dropped, everything is cool -
> otherwise everything is broken and there's no way to recover.
> The only obvious alternative is to disallow DROP altogether (or make it
> not really DROP).

Wouldn't whatever was using the compression method have something
marking which method was used? If so, couldn't we just scan if there is
any data using it, and if so disallow the drop, or possibly an option to allow
the drop and rewrite the table either uncompressed, or with the default
compression method?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom compression methods

2017-11-05 Thread Robert Haas
On Sun, Nov 5, 2017 at 2:22 PM, Oleg Bartunov  wrote:
>> IIRC there were some concerns about what happened with pg_upgrade,
>> with consuming precious toast bits, and a few other things.
>
> yes, pg_upgrade may be a problem.

A basic problem here is that, as proposed, DROP COMPRESSION METHOD may
break your database irretrievably.  If there's no data compressed
using the compression method you dropped, everything is cool -
otherwise everything is broken and there's no way to recover.  The
only obvious alternative is to disallow DROP altogether (or make it
not really DROP).

Both of those alternatives sound fairly unpleasant to me, but I'm not
exactly sure what to recommend in terms of how to make it better.
Ideally anything we expose as an SQL command should have a DROP
command that undoes whatever CREATE did and leaves the database in an
intact state, but that seems hard to achieve in this case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom compression methods

2017-11-05 Thread Oleg Bartunov
On Thu, Nov 2, 2017 at 6:02 PM, Craig Ringer  wrote:
> On 2 November 2017 at 17:41, Ildus Kurbangaliev
>  wrote:
>
>> In this patch compression methods is suitable for MAIN and EXTENDED
>> storages like in current implementation in postgres. Just instead only
>> of LZ4 you can specify any other compression method.
>
> We've had this discussion before.
>
> Please read the "pluggable compression support" thread. See you in a
> few days ;) sorry, it's kinda long.
>
> https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.ga12...@alap2.anarazel.de
>

the proposed patch provides "pluggable" compression and let's user
decide by their own which algorithm to use.
The postgres core doesn't responsible for any patent problem.


> IIRC there were some concerns about what happened with pg_upgrade,
> with consuming precious toast bits, and a few other things.

yes, pg_upgrade may be a problem.

>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom compression methods

2017-11-02 Thread Craig Ringer
On 2 November 2017 at 17:41, Ildus Kurbangaliev
 wrote:

> In this patch compression methods is suitable for MAIN and EXTENDED
> storages like in current implementation in postgres. Just instead only
> of LZ4 you can specify any other compression method.

We've had this discussion before.

Please read the "pluggable compression support" thread. See you in a
few days ;) sorry, it's kinda long.

https://www.postgresql.org/message-id/flat/20130621000900.GA12425%40alap2.anarazel.de#20130621000900.ga12...@alap2.anarazel.de

IIRC there were some concerns about what happened with pg_upgrade,
with consuming precious toast bits, and a few other things.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom compression methods

2017-11-02 Thread Ildus Kurbangaliev
On Wed, 1 Nov 2017 17:05:58 -0400
Peter Eisentraut  wrote:

> On 9/12/17 10:55, Ildus Kurbangaliev wrote:
> >> The patch also includes custom compression method for tsvector
> >> which is used in tests.
> >>
> >> [1]
> >> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com
> >>   
> > Attached rebased version of the patch. Added support of pg_dump, the
> > code was simplified, and a separate cache for compression options
> > was added.  
> 
> I would like to see some more examples of how this would be used, so
> we can see how it should all fit together.
> 
> So far, it's not clear to me that we need a compression method as a
> standalone top-level object.  It would make sense, perhaps, to have a
> compression function attached to a type, so a type can provide a
> compression function that is suitable for its specific storage.

In this patch compression methods is suitable for MAIN and EXTENDED
storages like in current implementation in postgres. Just instead only
of LZ4 you can specify any other compression method. 

Idea is not to change compression for some types, but give the user and
extension developers opportunity to change how data in some attribute
will be compressed because they know about it more than database itself.

> 
> The proposal here is very general: You can use any of the eligible
> compression methods for any attribute.  That seems very complicated to
> manage.  Any attribute could be compressed using either a choice of
> general compression methods or a type-specific compression method, or
> perhaps another type-specific compression method.  That's a lot.  Is
> this about packing certain types better, or trying out different
> compression algorithms, or about changing the TOAST thresholds, and
> so on?

It is about extensibility of postgres, for example if you
need to store a lot of time series data you can create an extension that
stores array of timestamps in more optimized way, using delta encoding
or something else. I'm not sure that such specialized things should be
in core.

In case of array of timestamps in could look like this:

CREATE EXTENSION timeseries; -- some extension that provides compression
method

Extension installs a compression method:

CREATE OR REPLACE FUNCTION timestamps_compression_handler(INTERNAL)
RETURNS COMPRESSION_HANDLER AS 'MODULE_PATHNAME',
'timestamps_compression_handler' LANGUAGE C STRICT;

CREATE COMPRESSION METHOD cm1 HANDLER timestamps_compression_handler;

And user can specify it in his table:

CREATE TABLE t1 (
time_series_data timestamp[] COMPRESSED cm1;
)

I think generalization of some method to a type is not a good idea. For
some attribute you could be happy with builtin LZ4, for other you can
need more compressibility and so on.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Custom compression methods

2017-11-01 Thread Peter Eisentraut
On 9/12/17 10:55, Ildus Kurbangaliev wrote:
>> The patch also includes custom compression method for tsvector which
>> is used in tests.
>>
>> [1]
>> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com
> Attached rebased version of the patch. Added support of pg_dump, the
> code was simplified, and a separate cache for compression options was
> added.

I would like to see some more examples of how this would be used, so we
can see how it should all fit together.

So far, it's not clear to me that we need a compression method as a
standalone top-level object.  It would make sense, perhaps, to have a
compression function attached to a type, so a type can provide a
compression function that is suitable for its specific storage.

The proposal here is very general: You can use any of the eligible
compression methods for any attribute.  That seems very complicated to
manage.  Any attribute could be compressed using either a choice of
general compression methods or a type-specific compression method, or
perhaps another type-specific compression method.  That's a lot.  Is
this about packing certain types better, or trying out different
compression algorithms, or about changing the TOAST thresholds, and so on?

Ideally, we would like something that just works, with minimal
configuration and nudging.  Let's see a list of problems to be solved
and then we can discuss what the right set of primitives might be to
address them.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers