Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-04-12 Thread Jim Nasby

On 4/8/15 4:58 PM, Tim Uckun wrote:

So is there a third and even faster way of doing this?


Please don't top-post.


On Wed, Apr 8, 2015 at 4:08 PM, Adrian Klaver mailto:adrian.kla...@aklaver.com>> wrote:

On 04/07/2015 07:49 PM, Tim Uckun wrote:

I understand that there is overhead involved in parsing the
strings and
such.  The amount of overhead was surprising to me but that's
another
matter.  What I am really puzzled about is the difference
between the
statements

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
($1).*' USING NEW ;

and

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||
' VALUES
(($1).*)' USING NEW ;



Offhand I would say because in the first case you are doing a SELECT
and in the second you are just doing a substitution.



They both do string interpolation but one is significantly
faster than
the other.   Is there a third and even faster way?

I am using RDS so I can't really do stored procs in C.


There are only 3 ways you can "steer" data into the correct partition:

1) Outside the database (ie: the application itself knows what partition 
table to use)

2) Rules
3) Triggers

#1 will almost always be faster (I say almost because if something has 
to decide row-by-row it may be faster for the database to do it).


#2 and #3 depend on the workload. Rules are parsed once PER STATEMENT, 
so if you're using a single INSERT to load a lot of rows they might well 
be faster than triggers. But as mentioned, they're a real PITA to use 
and they don't work at all with COPY.


When it comes to triggers, you will get different performance depending 
on the trigger language used as well as how you write the trigger 
itself. Presumably a trigger function written in C will be faster than 
anything else. I would expect plpgsql to be the next fastest after that, 
but that's just a guess.


As for the difference between SELECT and VALUES above, that's probably 
due to a difference in parsing or in the number of transforms the NEW 
has to go through in the executor. My guess is that because SELECT is 
much more general purpose than VALUES it's both more expensive to parse 
as well as more expensive to execute. If you really want to know for 
certain, connect gdb to a database backend on your laptop/desktop, fire 
off an insert (to load plpgsql into backend memory), set a gdb 
breakpoint on exec_stmt_block(), and see what's different between the 
two use cases.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-04-08 Thread Tim Uckun
So is there a third and even faster way of doing this?

On Wed, Apr 8, 2015 at 4:08 PM, Adrian Klaver 
wrote:

> On 04/07/2015 07:49 PM, Tim Uckun wrote:
>
>> I understand that there is overhead involved in parsing the strings and
>> such.  The amount of overhead was surprising to me but that's another
>> matter.  What I am really puzzled about is the difference between the
>> statements
>>
>> EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
>> ($1).*' USING NEW ;
>>
>> and
>>
>>EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES
>> (($1).*)' USING NEW ;
>>
>
>
> Offhand I would say because in the first case you are doing a SELECT and
> in the second you are just doing a substitution.
>
>
>
>> They both do string interpolation but one is significantly faster than
>> the other.   Is there a third and even faster way?
>>
>> I am using RDS so I can't really do stored procs in C.
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-04-07 Thread Adrian Klaver

On 04/07/2015 07:49 PM, Tim Uckun wrote:

I understand that there is overhead involved in parsing the strings and
such.  The amount of overhead was surprising to me but that's another
matter.  What I am really puzzled about is the difference between the
statements

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
($1).*' USING NEW ;

and

   EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES
(($1).*)' USING NEW ;



Offhand I would say because in the first case you are doing a SELECT and 
in the second you are just doing a substitution.




They both do string interpolation but one is significantly faster than
the other.   Is there a third and even faster way?

I am using RDS so I can't really do stored procs in C.




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-04-07 Thread Tim Uckun
I understand that there is overhead involved in parsing the strings and
such.  The amount of overhead was surprising to me but that's another
matter.  What I am really puzzled about is the difference between the
statements

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
($1).*' USING NEW ;

and

  EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES
(($1).*)' USING NEW ;

They both do string interpolation but one is significantly faster than the
other.   Is there a third and even faster way?

I am using RDS so I can't really do stored procs in C.


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-04-07 Thread Jim Nasby

On 3/12/15 8:15 AM, Tomas Vondra wrote:

On 12.3.2015 04:57, Tim Uckun wrote:

I am using postgres 9.4, the default install with "brew install
postgres, no tuning at all.  BTW if I use postgres.app application the
benchmarks run twice as slow!


I have no idea what brew or postgres.app is. But I strongly recommend
you to do some tuning.

   https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server



Why do you think there is such dramatic difference between

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
($1).*' USING NEW ;

and

  EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES(
($1).*)' USING NEW ;

One is thirty percent faster than the other.  Also is there an even
better way that I don't know about.


Because processing dynamic SQL commands (i.e. EXECUTE '...') is simply
more expensive, as it needs to do more stuff (on every execution). There
are reasons for that, but you may think of it as regular queries vs.
prepared statements.

Prepared statements are parsed and planned once, regular query needs to
be parsed and planned over and over again.


BTW, if you're that concerned about performance you could probably do a 
lot better than a plpgsql trigger by creating one in C. There's an 
enormous amount of code involved just in parsing and starting a plpgsql 
trigger, and then it's going to have to re-parse the dynamic SQL for 
every single row, whereas a C trigger could avoid almost all of that.


Rules are likely to be even faster (at least until you get to a fairly 
large number of partitions), but as Thomas mentioned they're very tricky 
to use. The critical thing to remember with them is they're essentially 
hacking on the original query itself.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-03-12 Thread Tomas Vondra
On 12.3.2015 04:57, Tim Uckun wrote:
> I am using postgres 9.4, the default install with "brew install
> postgres, no tuning at all.  BTW if I use postgres.app application the
> benchmarks run twice as slow!

I have no idea what brew or postgres.app is. But I strongly recommend
you to do some tuning.

  https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

> 
> Why do you think there is such dramatic difference between
> 
> EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
> ($1).*' USING NEW ;
> 
> and
> 
>  EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES(
> ($1).*)' USING NEW ;
> 
> One is thirty percent faster than the other.  Also is there an even
> better way that I don't know about.

Because processing dynamic SQL commands (i.e. EXECUTE '...') is simply
more expensive, as it needs to do more stuff (on every execution). There
are reasons for that, but you may think of it as regular queries vs.
prepared statements.

Prepared statements are parsed and planned once, regular query needs to
be parsed and planned over and over again.


-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-03-11 Thread Tim Uckun
I am using postgres 9.4, the default install with "brew install postgres,
no tuning at all.  BTW if I use postgres.app application the benchmarks run
twice as slow!

Why do you think there is such dramatic difference between

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
($1).*' USING NEW ;

and

 EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES(
($1).*)' USING NEW ;

One is thirty percent faster than the other.  Also is there an even better
way that I don't know about.


On Thu, Mar 12, 2015 at 10:42 AM, Tomas Vondra  wrote:

> On 11.3.2015 21:43, Tim Uckun wrote:
> > Hey I hate to bump my post but I would really appreciate some input
> > on this benchmark. I am very alarmed that adding a very simple
> > partitioning trigger slows the insert speed by an order of magnitude.
> > Is there any way to speed this up?
>
> I think to actually give you a meaningful response, we really need more
> details about your configuration - what PostgreSQL version are you
> using, what configuration have you changed from the defaults and such.
>
> Anyway, you're right that triggers are not cheap. The numbers I get on
> the development version with a bit of tuning look like this:
>
> INSERT (direct)1.5 sec
> INSERT 4.0 sec
> INSERT (EXECUTE)  11.5 sec
>
> So it's ~ what you've measured. Rules have the lowest overhead, but also
> there's a lot of tricky places.
>
> There's not much you can do, except for inserting the data directly into
> the right partition (without any triggers).
>
>
> --
> Tomas Vondrahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-03-11 Thread Tomas Vondra
On 11.3.2015 21:43, Tim Uckun wrote:
> Hey I hate to bump my post but I would really appreciate some input
> on this benchmark. I am very alarmed that adding a very simple
> partitioning trigger slows the insert speed by an order of magnitude.
> Is there any way to speed this up?

I think to actually give you a meaningful response, we really need more
details about your configuration - what PostgreSQL version are you
using, what configuration have you changed from the defaults and such.

Anyway, you're right that triggers are not cheap. The numbers I get on
the development version with a bit of tuning look like this:

INSERT (direct)1.5 sec
INSERT 4.0 sec
INSERT (EXECUTE)  11.5 sec

So it's ~ what you've measured. Rules have the lowest overhead, but also
there's a lot of tricky places.

There's not much you can do, except for inserting the data directly into
the right partition (without any triggers).


-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-03-11 Thread Tim Uckun
Hey I hate to bump my post but I would really appreciate some input on this
benchmark. I am very alarmed that adding a very simple partitioning trigger
slows the insert speed by an order of magnitude.  Is there any way to speed
this up?

On Mon, Mar 9, 2015 at 11:44 AM, Tim Uckun  wrote:

> I wrote a quick benchmarking script to test various partitioning
> strategies. The code is here.
>
> https://gist.github.com/timuckun/954ab6bdce36fa14bc1c
>
> I was astonished at the variability of the timings between the different
> variations.
>
> The test data contained 270K records.  I did a simple insert into without
> any triggers, with three different trigger variations and with a rule.  The
> results were like this
>
> clean  0.00   0.00   0.00 (  3.119498)
> func_1  0.00   0.00   0.00 (  7.435094)
> func_2  0.00   0.00   0.00 ( 28.427617)
> func_3  0.00   0.00   0.00 ( 18.348554)
> Rule   0.00   0.00   0.00 (  2.901931)
>
> A clean insert 3.1 seconds,  putting a rule took less time!
>
> A simple insert into table_name values (NEW.*) doubled the time it takes
> to insert the records.  Using an EXECUTE with an interpolated table name
> took TEN TIMES LONGER!. Making a slight change to the EXECUTE took a third
> off the execution time WTF???
>
> This has left me both baffled and curious.  If changing little things like
> this makes a huge difference what else can I do to make this even faster?
>
> Would using a different language help? Is Javasscript, Python or Perl
> faster?  Is there some other syntax I can use?  I tried this
>
> EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES ('
> || NEW.* || ')' but that gave me an error.
>
>
>


[GENERAL] Benchmarking partitioning triggers and rules

2015-03-08 Thread Tim Uckun
I wrote a quick benchmarking script to test various partitioning
strategies. The code is here.

https://gist.github.com/timuckun/954ab6bdce36fa14bc1c

I was astonished at the variability of the timings between the different
variations.

The test data contained 270K records.  I did a simple insert into without
any triggers, with three different trigger variations and with a rule.  The
results were like this

clean  0.00   0.00   0.00 (  3.119498)
func_1  0.00   0.00   0.00 (  7.435094)
func_2  0.00   0.00   0.00 ( 28.427617)
func_3  0.00   0.00   0.00 ( 18.348554)
Rule   0.00   0.00   0.00 (  2.901931)

A clean insert 3.1 seconds,  putting a rule took less time!

A simple insert into table_name values (NEW.*) doubled the time it takes to
insert the records.  Using an EXECUTE with an interpolated table name took
TEN TIMES LONGER!. Making a slight change to the EXECUTE took a third off
the execution time WTF???

This has left me both baffled and curious.  If changing little things like
this makes a huge difference what else can I do to make this even faster?

Would using a different language help? Is Javasscript, Python or Perl
faster?  Is there some other syntax I can use?  I tried this

EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES ('  ||
NEW.* || ')' but that gave me an error.