Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Vitalii Tymchyshyn
There is switch-like sql case:
39.6.2.4. Simple CASE

CASE search-expression
WHEN expression [, expression [ ... ]] THEN
  statements
  [ WHEN expression [, expression [ ... ]] THEN
  statements
... ]
  [ ELSE
  statements ]
END CASE;

It should work like a C switch statement.
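In a row-level partitioning trigger, this simple CASE could dispatch on the day. A sketch, using the quotes_* table names from this thread (the function body is illustrative only, not tested):

```sql
CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- Simple CASE: compare one expression against a list of candidate values.
    CASE cast(NEW.received_time AS date)
        WHEN DATE '2012-09-10' THEN
            INSERT INTO quotes_2012_09_10 VALUES (NEW.*);
        WHEN DATE '2012-09-11' THEN
            INSERT INTO quotes_2012_09_11 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for %', NEW.received_time;
    END CASE;
    RETURN NULL;  -- the row has already been routed to a child table
END;
$$ LANGUAGE plpgsql;
```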

Also, for bulk inserts, have you tried "for each statement" triggers instead
of "for each row"?
This would issue one INSERT per partition and would not be fast in the
single-row-insert case, but it can give you a benefit for huge inserts.
It should look like
insert into quotes_2012_09_10 select * from new where
cast(new.received_time as date) = '2012-09-10' ;
insert into quotes_2012_09_11 select * from new where
cast(new.received_time as date) = '2012-09-11' ;
...
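As noted later in this thread, PostgreSQL at the time (9.2) provided no NEW relation for statement-level triggers, so this could not actually be written. With the transition tables added in PostgreSQL 10, a hedged sketch of the idea would be:

```sql
CREATE OR REPLACE FUNCTION quotes_redistribute()
RETURNS trigger AS $$
BEGIN
    -- "newrows" is the transition table holding all rows of the statement.
    INSERT INTO quotes_2012_09_10
        SELECT * FROM newrows WHERE cast(received_time AS date) = '2012-09-10';
    INSERT INTO quotes_2012_09_11
        SELECT * FROM newrows WHERE cast(received_time AS date) = '2012-09-11';
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER quotes_redistribute_trg
    AFTER INSERT ON quotes
    REFERENCING NEW TABLE AS newrows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE quotes_redistribute();
```

Note that with inheritance partitioning an AFTER trigger cannot redirect the rows, so copies would remain in the parent; in practice the declarative partitioning of PostgreSQL 10+ is the cleaner answer to this whole problem.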

2012/12/27 Stephen Frost 

> * Jeff Janes (jeff.ja...@gmail.com) wrote:
> > If the main goal is to make it faster, I'd rather see all of plpgsql get
> > faster, rather than just a special case of partitioning triggers.  For
> > example, right now a CASE statement with 100 branches is about the same
> > speed as an equivalent list of 100 elsif.  So it seems to be doing a
> > linear search, when it could be doing a hash that should be a lot faster.
>
> That's a nice thought, but I'm not sure that it'd really be practical.
> CASE statements in plpgsql are completely general and really behave more
> like an if/elsif tree than a C-style switch() statement or similar.  For
> one thing, the expression need not use the same variables, could be
> complex multi-variable conditionals, etc.
>
> Figuring out that you could build a dispatch table for a given CASE
> statement and then building it, storing it, and remembering to use it,
> wouldn't be cheap.
>
> On the other hand, I've actually *wanted* a simpler syntax on occasion.
> I have no idea if there'd be a way to make it work, but this would be
> kind of nice:
>
> CASE OF x -- or whatever
>   WHEN 1 THEN blah blah
>   WHEN 2 THEN blah blah
>   WHEN 3 THEN blah blah
> END
>
> which would be possible to build into a dispatch table by looking at the
> type of x and the literals used in the overall CASE statement.  Even so,
> there would likely be some number of WHEN conditions required before
> it'd actually be more efficient to use, though perhaps getting rid of
> the expression evaluation (if that'd be possible) would make up for it.
>
> Thanks,
>
> Stephen
>



-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Vitalii Tymchyshyn
BTW: if "select count(*) from new" is fast, you can even choose the strategy
in the trigger depending on the insert size.


2012/12/28 Vitalii Tymchyshyn 

> There is switch-like sql case:
> 39.6.2.4. Simple CASE
>
> CASE search-expression
> WHEN expression [, expression [ ... ]] THEN
>   statements
>   [ WHEN expression [, expression [ ... ]] THEN
>   statements
> ... ]
>   [ ELSE
>   statements ]
> END CASE;
>
> It should work like a C switch statement.
>
> Also, for bulk insert, have you tried "for each statement" triggers
> instead of "for each row"?
> This would issue one INSERT per partition and would not be fast in the
> single-row-insert case, but it can give you a benefit for huge inserts.
> It should look like
> insert into quotes_2012_09_10 select * from new where
> cast(new.received_time as date) = '2012-09-10' ;
> insert into quotes_2012_09_11 select * from new where
> cast(new.received_time as date) = '2012-09-11' ;
> ...
>
> 2012/12/27 Stephen Frost 
>
>> * Jeff Janes (jeff.ja...@gmail.com) wrote:
>> > If the main goal is to make it faster, I'd rather see all of plpgsql get
>> > faster, rather than just a special case of partitioning triggers.  For
>> > example, right now a CASE statement with 100 branches is about the same
>> > speed as an equivalent list of 100 elsif.  So it seems to be doing a
>> > linear search, when it could be doing a hash that should be a lot faster.
>>
>> That's a nice thought, but I'm not sure that it'd really be practical.
>> CASE statements in plpgsql are completely general and really behave more
>> like an if/elsif tree than a C-style switch() statement or similar.  For
>> one thing, the expression need not use the same variables, could be
>> complex multi-variable conditionals, etc.
>>
>> Figuring out that you could build a dispatch table for a given CASE
>> statement and then building it, storing it, and remembering to use it,
>> wouldn't be cheap.
>>
>> On the other hand, I've actually *wanted* a simpler syntax on occasion.
>> I have no idea if there'd be a way to make it work, but this would be
>> kind of nice:
>>
>> CASE OF x -- or whatever
>>   WHEN 1 THEN blah blah
>>   WHEN 2 THEN blah blah
>>   WHEN 3 THEN blah blah
>> END
>>
>> which would be possible to build into a dispatch table by looking at the
>> type of x and the literals used in the overall CASE statement.  Even so,
>> there would likely be some number of WHEN conditions required before
>> it'd actually be more efficient to use, though perhaps getting rid of
>> the expression evaluation (if that'd be possible) would make up for it.
>>
>> Thanks,
>>
>> Stephen
>>
>
>
>
> --
> Best regards,
>  Vitalii Tymchyshyn
>



-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Stephen Frost
Vitalii,

* Vitalii Tymchyshyn (tiv...@gmail.com) wrote:
> There is switch-like sql case:
[...]
> It should work like a C switch statement.

It does and it doesn't.  It behaves generally like a C switch statement,
but is much more flexible and therefore can't be optimized like a C
switch statement can be.

Thanks,

Stephen




Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Pavel Stehule
Hello

>
> Also, for bulk insert, have you tried "for each statement" triggers instead
> of "for each row"?
> This would issue one INSERT per partition and would not be fast in the
> single-row-insert case, but it can give you a benefit for huge inserts.
> It should look like
> insert into quotes_2012_09_10 select * from new where cast(new.received_time
> as date) = '2012-09-10' ;
> insert into quotes_2012_09_11 select * from new where cast(new.received_time
> as date) = '2012-09-11' ;
> ...

It has only one problem - PostgreSQL does not provide the NEW and OLD
relations for statement-level triggers.

Regards

Pavel


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Vitalii Tymchyshyn
Why so? The basic form "case lvalue when rvalue then out ... end" is much
like a switch.
The "case when condition then out ... end" form is a different, more complex
beast, but the first one is essentially a switch. If it is currently
transformed into "case when lvalue = rvalue1 then out1 when lvalue = rvalue2
then out2 ... end", then this can be optimized, and that would benefit many
users, not only the ones that use partitioning.
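Whether the simple form actually gets a fast path is measurable. A hedged micro-benchmark sketch (names invented, branches elided for brevity):

```sql
-- A 100-branch simple CASE; comparing its timing against an equivalent
-- IF/ELSIF chain shows whether a linear search is being done, since a
-- hash/jump-table dispatch would not slow down as the match moves later.
CREATE OR REPLACE FUNCTION case_bench(x integer)
RETURNS integer AS $$
BEGIN
    CASE x
        WHEN 0 THEN RETURN 0;
        WHEN 1 THEN RETURN 1;
        -- ... branches 2 through 98 elided ...
        WHEN 99 THEN RETURN 99;
        ELSE RETURN -1;
    END CASE;
END;
$$ LANGUAGE plpgsql;

-- Matching the last branch repeatedly maximizes the cost of a linear scan:
-- EXPLAIN ANALYZE SELECT case_bench(99) FROM generate_series(1, 100000);
```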


2012/12/28 Stephen Frost 

> Vitalii,
>
> * Vitalii Tymchyshyn (tiv...@gmail.com) wrote:
> > There is switch-like sql case:
> [...]
> > It should work like C switch statement.
>
> It does and it doesn't.  It behaves generally like a C switch statement,
> but is much more flexible and therefore can't be optimized like a C
> switch statement can be.
>
> Thanks,
>
> Stephen
>



-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Vitalii Tymchyshyn
It's a pity. Why is it not listed in the "Compatibility" section of the
CREATE TRIGGER documentation? I think this makes "for each statement"
triggers incompatible with SQL99.


2012/12/28 Pavel Stehule 

> Hello
>
> >
> > Also, for bulk insert, have you tried "for each statement" triggers
> instead
> > of "for each row"?
> > This would issue one INSERT per partition and would not be fast in the
> > single-row-insert case, but it can give you a benefit for huge inserts.
> > It should look like
> > insert into quotes_2012_09_10 select * from new where
> cast(new.received_time
> > as date) = '2012-09-10' ;
> > insert into quotes_2012_09_11 select * from new where
> cast(new.received_time
> > as date) = '2012-09-11' ;
> > ...
>
> It has only one problem - PostgreSQL does not provide the NEW and OLD
> relations for statement-level triggers.
>
> Regards
>
> Pavel
>



-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Pavel Stehule
2012/12/28 Vitalii Tymchyshyn :
> Why so? The basic form "case lvalue when rvalue then out ... end" is much
> like a switch.
> The "case when condition then out ... end" form is a different, more complex
> beast, but the first one is essentially a switch. If it is currently
> transformed into "case when lvalue = rvalue1 then out1 when lvalue = rvalue2
> then out2 ... end", then this can be optimized, and that would benefit many
> users, not only the ones that use partitioning.

please, look at the plpgsql source code. PL/pgSQL is very simple and has no
arithmetic unit of its own - everything is transformed into SELECTs, with no
optimization. But it is really short and maintainable.

These SELECTs are evaluated only when necessary - but they are evaluated by
the PostgreSQL expression executor, not by PL/pgSQL directly - PL/pgSQL
cannot process constants by itself.

So any enhancement needs a PL/pgSQL redesign, and I am not sure this use case
has a clear benefit, because expression evaluation is only one part of the
partitioning-trigger bottleneck. Moreover, if you need really fast code, you
can write your own trigger in C - it will be 10x faster than any optimized
PL/pgSQL code. And using C triggers in PostgreSQL is not terrible work.

Using plpgsql row triggers for partitioning is not a good idea - it is just a
workaround from my perspective, and we should fix the source of the problem -
the missing native support.

Regards

Pavel Stehule



>
>
> 2012/12/28 Stephen Frost 
>>
>> Vitalii,
>>
>> * Vitalii Tymchyshyn (tiv...@gmail.com) wrote:
>> > There is switch-like sql case:
>> [...]
>> > It should work like C switch statement.
>>
>> It does and it doesn't.  It behaves generally like a C switch statement,
>> but is much more flexible and therefore can't be optimized like a C
>> switch statement can be.
>>
>> Thanks,
>>
>> Stephen
>
>
>
>
> --
> Best regards,
>  Vitalii Tymchyshyn




Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Jeff Janes
On Friday, December 28, 2012, Vitalii Tymchyshyn wrote:

> There is switch-like sql case:
> 39.6.2.4. Simple CASE
>
> CASE search-expression
> WHEN expression [, expression [ ... ]] THEN
>   statements
>   [ WHEN expression [, expression [ ... ]] THEN
>   statements
> ... ]
>   [ ELSE
>   statements ]
> END CASE;
>
> It should work like C switch statement.
>
>
I had thought that too, but the catch is that the target expressions do not
need to be constants when the function is created.  Indeed, they can even
be volatile.

CREATE OR REPLACE FUNCTION foo(x integer)
RETURNS integer AS $$
BEGIN
case x
when 0 then return -5;
when (random()*10)::integer then return 1;
when (random()*10)::integer then return 2;
when (random()*10)::integer then return 3;
when (random()*10)::integer then return 4;
when (random()*10)::integer then return 5;
when (random()*10)::integer then return 6;
when (random()*10)::integer then return 7;
when (random()*10)::integer then return 8;
when (random()*10)::integer then return 9;
when (random()*10)::integer then return 10;
else return -6;
end case;
END;
$$ LANGUAGE plpgsql;

Cheers,

Jeff



Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Stephen Frost
2012/12/28 Vitalii Tymchyshyn :
> Why so? Basic form "case lvalue when rvalue then out ... end" is much like
> switch.

Sorry, to be honest, I missed that distinction and didn't expect that to
work as-is, yet apparently it does.  Does it currently perform the same
as an if/elsif tree or is it implemented to actually use a table lookup?

* Pavel Stehule (pavel.steh...@gmail.com) wrote:
> please, look at the plpgsql source code. PL/pgSQL is very simple and has no
> arithmetic unit of its own - everything is transformed into SELECTs, with no
> optimization. But it is really short and maintainable.

I was thinking we'd actually do this for all CASE statements, those in
plpgsql and those in regular SQL, if it's possible to do.  Hopefully
it'd be possible to do easily in plpgsql once the SQL-level CASE is
done.

> These SELECTs are evaluated only when necessary - but they are evaluated by
> the PostgreSQL expression executor, not by PL/pgSQL directly - PL/pgSQL
> cannot process constants by itself.

Right, but I wonder if we could pass the entire CASE tree to the
executor, with essentially pointers to the code blocks which will be
executed, and get back a function which we can call over and over that
takes whatever the parameter is and returns the 'right' pointer?

> So any enhancement needs a PL/pgSQL redesign, and I am not sure this use
> case has a clear benefit, because expression evaluation is only one part of
> the partitioning-trigger bottleneck. Moreover, if you need really fast code,
> you can write your own trigger in C - it will be 10x faster than any
> optimized PL/pgSQL code. And using C triggers in PostgreSQL is not terrible
> work.

It's quite a bit of work for people who don't know C or are
(understandably) concerned about writing things which can easily
segfault the entire backend.

> Using plpgsql row triggers for partitioning is not a good idea - it is just
> a workaround from my perspective, and we should fix the source of the
> problem - the missing native support.

I agree that native partitioning would certainly be nice.  I was really
hoping that was going to happen for 9.3, but it seems unlikely now
(unless I've missed something).

Thanks,

Stephen




Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Stephen Frost
* Jeff Janes (jeff.ja...@gmail.com) wrote:
> I had thought that too, but the catch is that the target expressions do not
> need to be constants when the function is created.  Indeed, they can even
> be volatile.

Right, any optimization in this regard would only work in certain instances -
e.g. when the 'WHEN' components are all constants and the data type is
something we can manage, etc.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Pavel Stehule
2012/12/28 Stephen Frost :
> 2012/12/28 Vitalii Tymchyshyn :
>> Why so? The basic form "case lvalue when rvalue then out ... end" is much
>> like a switch.
>
> Sorry, to be honest, I missed that distinction and didn't expect that to
> work as-is, yet apparently it does.  Does it currently perform the same
> as an if/elsif tree or is it implemented to actually use a table lookup?

both IF and CASE have very similar implementations - a table lookup is not
used; there is no special path for searching for constants

>
> * Pavel Stehule (pavel.steh...@gmail.com) wrote:
>> please, look at the plpgsql source code. PL/pgSQL is very simple and has no
>> arithmetic unit of its own - everything is transformed into SELECTs, with
>> no optimization. But it is really short and maintainable.
>
> I was thinking we'd actually do this for all CASE statements, those in
> plpgsql and those in regular SQL, if it's possible to do.  Hopefully
> it'd be possible to do easily in plpgsql once the SQL-level CASE is
> done.
>

I am not sure - the SQL CASE is not specially optimized either :(

I see only one possible way: do almost all of the work when the CASE
statement is parsed and bypass the executor - this could work, but I am
afraid it could slow down the first execution and some use cases where there
are not too many paths, because we would have to check all paths before
execution.




Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Jeff Janes
On Thursday, December 20, 2012, Scott Marlowe wrote:

>
> 3: Someone above mentioned rules being faster than triggers.  In my
> experience they're WAY slower than triggers but maybe that was just on
> the older pg versions (8.3 and lower) we were doing this on.  I'd be
> interested in seeing some benchmarks if rules have gotten faster or I
> was just doing it wrong.
>

It apparently depends on how you use them.

To load 1e6 rows into the parent, redistributing to 100 partitions (rows
evenly distributed over partitions) using RULEs, it took 14.5 seconds using
an "insert into foo select * from foo_tmp" (not counting the time it took to
prepopulate foo_tmp via \copy).
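The RULE variant being timed presumably needs one rule per partition, along these lines (a reconstruction with an invented partition key, not the actual test script):

```sql
-- Rewrite inserts into the parent "foo" so that matching rows go to the
-- child table instead; one such rule exists per partition.
CREATE RULE foo_insert_17 AS
    ON INSERT TO foo
    WHERE NEW.part_key = 17
    DO INSTEAD
        INSERT INTO foo_17 VALUES (NEW.*);
```

Because rules rewrite the query itself, a single "insert ... select" is rewritten once into 100 set-based inserts, which is why the bulk case fares well while per-row inserts pay the full rewrite cost on every statement.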

This is about 25% faster than the 18.4 seconds it took to load the same data
via \copy using a plpgsql trigger structured as nested IF ... ELSE ... END IF
blocks that do a binary search over the partitions.

However if I didn't use \copy or "insert into...select", but rather used a
Perl loop invoking normal single-row inserts (but all in a single
transaction) with DBD::Pg, then the RULEs took 596 seconds, an astonishing
seven times slower than the 83 seconds it took the previously mentioned
plpgsql trigger to do the same thing.

This was under 9.1.7.

In 9.2.2, it seems to get 3 times worse yet for RULEs in the insert loop.
 But that result seems hard to believe, so I am repeating it.

Cheers

Jeff



RES: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Luciano Ernesto da Silva
UNSUBSCRIBE

 


Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

2012-12-28 Thread Pavel Stehule
Hello

2012/12/28 Luciano Ernesto da Silva :
> UNSUBSCRIBE
>
> [...]
>
> On Thursday, December 20, 2012, Scott Marlowe wrote:
>
>
> 3: Someone above mentioned rules being faster than triggers. In my
> experience they're WAY slower than triggers but maybe that was just on
> the older pg versions (8.3 and lower) we were doing this on. I'd be
> interested in seeing some benchmarks if rules have gotten faster or I
> was just doing it wrong.
>
>

I am not sure, but I expect that the speed or slowness of rules depends
primarily on the number of partitions - more significantly than for triggers.

Regards

Pavel

>
> [...]

