Re: [GENERAL] Performance appending to an array column

2017-09-22 Thread Imre Samu
>I also tried cstore_fdw for this, but my queries
>(building a 2-D histogram) were taking 4+ seconds,
>compared to 500ms  using arrays.
> ...
> but maybe I could write my own extension

Have you checked the new TimescaleDB extension? [
https://github.com/timescale/timescaledb ]
"TimescaleDB is packaged as a PostgreSQL extension and released under the
Apache 2 open-source license."

"TimescaleDB is an open-source database designed to make SQL scalable for
time-series data.
It is engineered up from PostgreSQL, providing automatic partitioning
across time and space (partitioning key), as well as full SQL support."

and it has a built-in histogram function:
https://docs.timescale.com/latest/api/api-timescaledb#histogram
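
For example, something along these lines (a sketch; it assumes a
hypothetical "conditions" table with device_id and temperature columns):

-- histogram() is an aggregate returning per-group bucket counts as an array
SELECT device_id,
       histogram(temperature, 0.0, 40.0, 10)
  FROM conditions
 GROUP BY device_id;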

Regards,
Imre





2017-09-21 23:05 GMT+02:00 Paul A Jungwirth :

> > It's going to suck big-time :-(.
>
> Ha ha that's what I thought, but thank you for confirming. :-)
>
> > We ended up keeping
> > the time series data outside the DB; I doubt the conclusion would be
> > different today.
>
> Interesting. That seems a little radical to me, but I'll consider it
> more seriously now. I also tried cstore_fdw for this, but my queries
> (building a 2-D histogram) were taking 4+ seconds, compared to 500ms
> using arrays. Putting everything into regular files gives up filtering
> and other SQL built-ins, but maybe I could write my own extension to
> load regular files into Postgres arrays, sort of getting the best of
> both worlds.
>
> Anyway, thanks for sharing your experience!
>
> Yours,
> Paul
>
>
>


Re: [GENERAL] Performance appending to an array column

2017-09-21 Thread Thomas Kellerer

Paul A Jungwirth wrote on 21.09.2017 at 23:05:

but maybe I could write my own extension to
load regular files into Postgres arrays, sort of getting the best of
both worlds.


There is a foreign data wrapper for that:

   https://github.com/adunstan/file_text_array_fdw

but it's pretty old and seems un-maintained.
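
For comparison, the in-core file_fdw uses the same foreign-table pattern;
a sketch with made-up file and column names (file_text_array_fdw reportedly
exposes each line as a single text[] column instead):

CREATE EXTENSION file_fdw;
CREATE SERVER flat_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE measurements_csv (
    series_id int,
    val       float8
) SERVER flat_files
  OPTIONS (filename '/path/to/measurements.csv', format 'csv');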











Re: [GENERAL] Performance appending to an array column

2017-09-21 Thread Paul A Jungwirth
> It's going to suck big-time :-(.

Ha ha that's what I thought, but thank you for confirming. :-)

> We ended up keeping
> the time series data outside the DB; I doubt the conclusion would be
> different today.

Interesting. That seems a little radical to me, but I'll consider it
more seriously now. I also tried cstore_fdw for this, but my queries
(building a 2-D histogram) were taking 4+ seconds, compared to 500ms
using arrays. Putting everything into regular files gives up filtering
and other SQL built-ins, but maybe I could write my own extension to
load regular files into Postgres arrays, sort of getting the best of
both worlds.
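
For what it's worth, the 2-D histogram itself is cheap to express once the
arrays are loaded; roughly like this (a sketch, with table and column names
made up):

SELECT width_bucket(x, 0.0, 100.0, 50) AS xbin,
       width_bucket(y, 0.0, 100.0, 50) AS ybin,
       count(*) AS n
  FROM measurements m,
       unnest(m.xs, m.ys) AS t(x, y)   -- pairwise unnest of the two arrays
 GROUP BY 1, 2
 ORDER BY 1, 2;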

Anyway, thanks for sharing your experience!

Yours,
Paul




Re: [GENERAL] Performance appending to an array column

2017-09-21 Thread Tom Lane
Paul A Jungwirth  writes:
> I'm considering a table structure where I'd be continuously appending
> to long arrays of floats (10 million elements or more). Keeping the
> data in arrays gives me much faster SELECT performance vs keeping it
> in millions of rows.

> But since these arrays keep growing, I'm wondering about the UPDATE
> performance.

It's going to suck big-time :-(.  You'd be constantly replacing all
of a multi-megabyte toasted field.  Even if the UPDATE speed per se
seemed tolerable, this would be pretty nasty in terms of the
vacuuming overhead and/or bloat it would impose.
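
(A quick way to watch that happen, sketched against the "measurements"
table from your example: check the size of the table's TOAST relation,
where the big arrays live, before and after a batch of appends.)

SELECT pg_size_pretty(pg_total_relation_size(reltoastrelid)) AS toast_size
  FROM pg_class
 WHERE relname = 'measurements' AND reltoastrelid <> 0;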

My very first use of Postgres, twenty years ago, involved time series
data which perhaps is much like what you're doing.  We ended up keeping
the time series data outside the DB; I doubt the conclusion would be
different today.  I seem to recall having heard about a commercial fork
of PG that is less bad for this type of data, but the community code
is not the weapon you want.

regards, tom lane




[GENERAL] Performance appending to an array column

2017-09-21 Thread Paul A Jungwirth
I'm considering a table structure where I'd be continuously appending
to long arrays of floats (10 million elements or more). Keeping the
data in arrays gives me much faster SELECT performance vs keeping it
in millions of rows.

But since these arrays keep growing, I'm wondering about the UPDATE
performance. I was reading this commit message about improving
performance of *overwriting* individual array elements, and I was
wondering if there is anything similar for growing an array column?:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1dc5ebc9077ab742

Is there a faster way to append to an array than just this?:

UPDATE measurements
SET vals = vals || ARRAY[5.0, 4.2, 9.9]::float[]
;

Thanks!
Paul




Re: [GENERAL] Performance with high correlation in group by on PK

2017-09-15 Thread Alban Hertroys
On 8 September 2017 at 00:23, Jeff Janes  wrote:
> On Tue, Aug 29, 2017 at 1:20 AM, Alban Hertroys  wrote:
>>
>> On 28 August 2017 at 21:32, Jeff Janes  wrote:
>> > On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys 
>> > wrote:
>> >>
>> >> Hi all,
>> >>
>> >> It's been a while since I actually got to use PG for anything serious,
>> >> but we're finally doing some experimentation @work now to see if it is
>> >> suitable for our datawarehouse. So far it's been doing well, but there
>> >> is a particular type of query I run into that I expect we will
>> >> frequently use and that's choosing a sequential scan - and I can't
>> >> fathom why.
>> >>
>> >> This is on:
>> >>
>> >>
>> >> The query in question is:
>> >> select "VBAK_MANDT", max("VBAK_VBELN")
>> >>   from staging.etl1_vbak
>> >>  group by "VBAK_MANDT";
>> >>
>> >> This is the header-table for another detail table, and in this case
>> >> we're already seeing a seqscan. The thing is, there are 15M rows in
>> >> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
>> >> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
>> >> we only have 1 at the moment!).
>> >
>> >
>> > You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't
>> > currently detect and implement them automatically, but you can use a
>> > recursive CTE to get it to work.  There are some examples at
>> > https://wiki.postgresql.org/wiki/Loose_indexscan
>>
>> Thanks Jeff, that's an interesting approach. It looks very similar to
>> correlated subqueries.
>>
>> Unfortunately, it doesn't seem to help with my issue. The CTE is
>> indeed fast, but when querying the results from the 2nd level of the
>> PK with the CTE results, I'm back at a seqscan on pdw2_vbak again.
>
>
> Something like this works:
>
> create table foo as select trunc(random()*5) as col1, random() as col2 from
> generate_series(1,1);
> create index on foo (col1, col2);
> vacuum analyze foo;
>
>
> with recursive t as (
>select * from (select col1, col2 from foo order by col1 desc, col2 desc
> limit 1) asdfsaf
> union all
>   select
>  (select col1 from foo where foo.col1 < t.col1 order by col1 desc, col2
> desc limit 1) as col1,
>  (select col2 from foo where foo.col1 < t.col1 order by col1 desc, col2
> desc limit 1) as col2
>from t where t.col1 is not null
> )
> select * from t where t is not null;
>
> It is pretty ugly that you need one subquery in the select list for each
> column to be returned.  Maybe someone can find a way to avoid that part.  I
> tried using lateral joins to get around it, but couldn't make that work.
>
> Cheers,
>
> Jeff

Thanks Jeff. That does indeed look ugly.

Since we're dealing with a 4GL language (FOCUS) that translates to
SQL, I don't think we'll attempt your workaround, even though we can
use SQL directly in our reports if we want to.

But, I just remembered giving someone else in a similar situation some
advice on this very list; Obviously, when my first primary key field
is not very selective, I should change the order of the fields in the
PK!

But let's first enjoy the weekend.

Alban.
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.




Re: [GENERAL] Performance with high correlation in group by on PK

2017-09-07 Thread Jeff Janes
On Tue, Aug 29, 2017 at 1:20 AM, Alban Hertroys  wrote:

> On 28 August 2017 at 21:32, Jeff Janes  wrote:
> > On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys 
> wrote:
> >>
> >> Hi all,
> >>
> >> It's been a while since I actually got to use PG for anything serious,
> >> but we're finally doing some experimentation @work now to see if it is
> >> suitable for our datawarehouse. So far it's been doing well, but there
> >> is a particular type of query I run into that I expect we will
> >> frequently use and that's choosing a sequential scan - and I can't
> >> fathom why.
> >>
> >> This is on:
> >>
> >>
> >> The query in question is:
> >> select "VBAK_MANDT", max("VBAK_VBELN")
> >>   from staging.etl1_vbak
> >>  group by "VBAK_MANDT";
> >>
> >> This is the header-table for another detail table, and in this case
> >> we're already seeing a seqscan. The thing is, there are 15M rows in
> >> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
> >> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
> >> we only have 1 at the moment!).
> >
> >
> > You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't
> > currently detect and implement them automatically, but you can use a
> > recursive CTE to get it to work.  There are some examples at
> > https://wiki.postgresql.org/wiki/Loose_indexscan
>
> Thanks Jeff, that's an interesting approach. It looks very similar to
> correlated subqueries.
>
> Unfortunately, it doesn't seem to help with my issue. The CTE is
> indeed fast, but when querying the results from the 2nd level of the
> PK with the CTE results, I'm back at a seqscan on pdw2_vbak again.
>

Something like this works:

create table foo as select trunc(random()*5) as col1, random() as col2 from
generate_series(1,1);
create index on foo (col1, col2);
vacuum analyze foo;


with recursive t as (
   select * from (select col1, col2 from foo order by col1 desc, col2 desc
limit 1) asdfsaf
union all
  select
 (select col1 from foo where foo.col1 < t.col1 order by col1 desc, col2
desc limit 1) as col1,
 (select col2 from foo where foo.col1 < t.col1 order by col1 desc, col2
desc limit 1) as col2
   from t where t.col1 is not null
)
select * from t where t is not null;

It is pretty ugly that you need one subquery in the select list for each
column to be returned.  Maybe someone can find a way to avoid that part.  I
tried using lateral joins to get around it, but couldn't make that work.

Cheers,

Jeff


Re: [GENERAL] Performance with high correlation in group by on PK

2017-08-29 Thread Alban Hertroys
On 28 August 2017 at 21:32, Jeff Janes  wrote:
> On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys  wrote:
>>
>> Hi all,
>>
>> It's been a while since I actually got to use PG for anything serious,
>> but we're finally doing some experimentation @work now to see if it is
>> suitable for our datawarehouse. So far it's been doing well, but there
>> is a particular type of query I run into that I expect we will
>> frequently use and that's choosing a sequential scan - and I can't
>> fathom why.
>>
>> This is on:
>>
>>
>> The query in question is:
>> select "VBAK_MANDT", max("VBAK_VBELN")
>>   from staging.etl1_vbak
>>  group by "VBAK_MANDT";
>>
>> This is the header-table for another detail table, and in this case
>> we're already seeing a seqscan. The thing is, there are 15M rows in
>> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
>> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
>> we only have 1 at the moment!).
>
>
> You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't
> currently detect and implement them automatically, but you can use a
> recursive CTE to get it to work.  There are some examples at
> https://wiki.postgresql.org/wiki/Loose_indexscan

Thanks Jeff, that's an interesting approach. It looks very similar to
correlated subqueries.

Unfortunately, it doesn't seem to help with my issue. The CTE is
indeed fast, but when querying the results from the 2nd level of the
PK with the CTE results, I'm back at a seqscan on pdw2_vbak again.

Just the CTE plan is in skipScan-top.sql.
The complete plan is in skipScan-full.sql

Note: I cloned the original etl1_vbak table into a new
pdw2_vbak table that has varchars instead of chars, which reduced
the table size to just over half the original's size. Hence the
different table names, but the behaviour for this particular issue is
the same between them.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

warehouse=# explain (analyze, buffers)
with recursive t as (
select min("VBAK_MANDT") as "VBAK_MANDT" from staging.pdw2_vbak
union all
select (select min("VBAK_MANDT") as "VBAK_MANDT" from staging.pdw2_vbak 
where "VBAK_MANDT" > t."VBAK_MANDT") from t where t."VBAK_MANDT" is not null
) select "VBAK_MANDT" from t;

   QUERY PLAN  
-
 CTE Scan on t  (cost=98.31..100.33 rows=101 width=32) (actual 
time=0.031..0.054 rows=2 loops=1)
   Buffers: shared hit=9
   CTE t
 ->  Recursive Union  (cost=0.73..98.31 rows=101 width=32) (actual 
time=0.029..0.052 rows=2 loops=1)
   Buffers: shared hit=9
   ->  Result  (cost=0.73..0.74 rows=1 width=32) (actual 
time=0.029..0.029 rows=1 loops=1)
 Buffers: shared hit=5
 InitPlan 3 (returns $1)
   ->  Limit  (cost=0.56..0.73 rows=1 width=32) (actual 
time=0.026..0.027 rows=1 loops=1)
 Buffers: shared hit=5
 ->  Index Only Scan using pdw2_vbak_pkey on 
pdw2_vbak pdw2_vbak_1  (cost=0.56..2375293.75 rows=14214332 width=32) 
(actual time=0.024..0.024 rows=1 loops=1)
   Index Cond: ("VBAK_MANDT" IS NOT NULL)
   Heap Fetches: 1
   Buffers: shared hit=5
   ->  WorkTable Scan on t t_1  (cost=0.00..9.56 rows=10 width=32) 
(actual time=0.009..0.010 rows=0 loops=2)
 Filter: ("VBAK_MANDT" IS NOT NULL)
 Rows Removed by Filter: 0
 Buffers: shared hit=4
 SubPlan 2
   ->  Result  (cost=0.93..0.94 rows=1 width=32) (actual 
time=0.015..0.015 rows=1 loops=1)
 Buffers: shared hit=4
 InitPlan 1 (returns $3)
   ->  Limit  (cost=0.56..0.93 rows=1 width=32) (actual 
time=0.013..0.013 rows=0 loops=1)
 Buffers: shared hit=4
 ->  Index Only Scan using pdw2_vbak_pkey 
on pdw2_vbak  (cost=0.56..1732075.91 rows=4738111 width=32) (actual 
time=0.012..0.012 rows=0 loops=1)
   Index Cond: (("VBAK_MANDT" IS NOT NULL) 
AND ("VBAK_MANDT" > t_1."VBAK_MANDT"))
   Heap Fetches: 0
   Buffers: shared hit=4
 Planning time: 0.346 ms
 Execution time: 0.100 ms
(30 rows)
warehouse=# explain (analyze, buffers)
with recursive t as (
select min("VBAK_MANDT") as "VBAK_MANDT" from staging.pdw2_vbak
union all
select (select min("VBAK_MANDT") as 

Re: [GENERAL] Performance with high correlation in group by on PK

2017-08-28 Thread Jeff Janes
On Mon, Aug 28, 2017 at 5:22 AM, Alban Hertroys  wrote:

> Hi all,
>
> It's been a while since I actually got to use PG for anything serious,
> but we're finally doing some experimentation @work now to see if it is
> suitable for our datawarehouse. So far it's been doing well, but there
> is a particular type of query I run into that I expect we will
> frequently use and that's choosing a sequential scan - and I can't
> fathom why.
>
> This is on:
>
>
> The query in question is:
> select "VBAK_MANDT", max("VBAK_VBELN")
>   from staging.etl1_vbak
>  group by "VBAK_MANDT";
>
> This is the header-table for another detail table, and in this case
> we're already seeing a seqscan. The thing is, there are 15M rows in
> the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
> "VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
> we only have 1 at the moment!).
>

You need an "index skip-scan" or "loose index scan".  PostgreSQL doesn't
currently detect and implement them automatically, but you can use a
recursive CTE to get it to work.  There are some examples at
https://wiki.postgresql.org/wiki/Loose_indexscan
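
Adapted to your table, the pattern would look roughly like this (an
untested sketch):

WITH RECURSIVE mandts AS (
    SELECT min("VBAK_MANDT") AS mandt FROM staging.etl1_vbak
  UNION ALL
    SELECT (SELECT min("VBAK_MANDT") FROM staging.etl1_vbak
             WHERE "VBAK_MANDT" > mandts.mandt)
      FROM mandts
     WHERE mandts.mandt IS NOT NULL
)
SELECT mandt AS "VBAK_MANDT",
       (SELECT max("VBAK_VBELN") FROM staging.etl1_vbak
         WHERE "VBAK_MANDT" = mandts.mandt) AS max_vbeln
  FROM mandts
 WHERE mandt IS NOT NULL;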

Cheers,

Jeff


Re: [GENERAL] Performance with high correlation in group by on PK

2017-08-28 Thread Alban Hertroys
On 28 August 2017 at 14:22, Alban Hertroys  wrote:

> This is on:
Just noticed I forgot to paste this in:
warehouse=# select version();
 version
--
 PostgreSQL 9.6.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)


--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.




[GENERAL] Performance with high correlation in group by on PK

2017-08-28 Thread Alban Hertroys
Hi all,

It's been a while since I actually got to use PG for anything serious,
but we're finally doing some experimentation @work now to see if it is
suitable for our datawarehouse. So far it's been doing well, but there
is a particular type of query I run into that I expect we will
frequently use and that's choosing a sequential scan - and I can't
fathom why.

This is on:


The query in question is:
select "VBAK_MANDT", max("VBAK_VBELN")
  from staging.etl1_vbak
 group by "VBAK_MANDT";

This is the header-table for another detail table, and in this case
we're already seeing a seqscan. The thing is, there are 15M rows in
the table (disk usage is 15GB), while the PK is on ("VBAK_MANDT",
"VBAK_VBELN") with very few distinct values for "VBAK_MANDT" (in fact,
we only have 1 at the moment!).

Explain analyze says the following about this query:
warehouse=# explain (analyze, buffers) select "VBAK_MANDT",
max("VBAK_VBELN") from staging.etl1_vbak group by "VBAK_MANDT";
  QUERY PLAN
--
 HashAggregate  (cost=1990054.08..1990054.09 rows=1 width=36) (actual
time=38723.602..38723.602 rows=1 loops=1)
   Group Key: "VBAK_MANDT"
   Buffers: shared hit=367490 read=1409344
   ->  Seq Scan on etl1_vbak  (cost=0.00..1918980.72 rows=14214672
width=15) (actual time=8.886..31317.283 rows=14214672 loops=1)
 Buffers: shared hit=367490 read=1409344
 Planning time: 0.126 ms
 Execution time: 38723.674 ms
(7 rows)

As you can see, a sequential scan. The statistics seem quite correct.

If I change the query to select a single value of "VBAK_MANDT" we get:
warehouse=# explain (analyze, buffers) select max("VBAK_VBELN") from
staging.etl1_vbak where "VBAK_MANDT" = '300';

   QUERY PLAN
---
 Result  (cost=1.37..1.38 rows=1 width=32) (actual time=14.911..14.911
rows=1 loops=1)
   Buffers: shared hit=2 read=3
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.56..1.37 rows=1 width=11) (actual
time=14.907..14.908 rows=1 loops=1)
   Buffers: shared hit=2 read=3
   ->  Index Only Scan Backward using etl1_vbak_pkey on
etl1_vbak  (cost=0.56..11498362.31 rows=14214672 width=11) (actual
time=14.906..14.906 rows=1 loops=1)
 Index Cond: (("VBAK_MANDT" = '300'::bpchar) AND
("VBAK_VBELN" IS NOT NULL))
 Heap Fetches: 1
 Buffers: shared hit=2 read=3
 Planning time: 0.248 ms
 Execution time: 14.945 ms
(11 rows)

That is more in line with my expectations.

Oddly enough, adding "MANDT_VBAK" and the group by back into that last
query, the result is a seqscan again.

For "VBAK_MANDT" we see these statistics:
Null fraction: 0
Average width: 4
Distinct values: 1
Most common values: {300}
Most common frequencies: {1}
Histogram bounds:
Correlation: 1

The table definition is:
  Table "staging.etl1_vbak"
   Column    |     Type      | Modifiers
-------------+---------------+-----------
 VBAK_MANDT  | character(3)  | not null
 VBAK_VBELN  | character(10) | not null
 VBAK_ERDAT  | date  | not null
 VBAK_ERZET  | character(6)  | not null
 VBAK_ERNAM  | character(12) | not null
 VBAK_ANGDT  | date  | not null
 VBAK_BNDDT  | date  | not null
 VBAK_AUDAT  | date  | not null
...
 VBAK_MULTI  | character(1)  | not null
 VBAK_SPPAYM | character(2)  | not null
Indexes:
"etl1_vbak_pkey" PRIMARY KEY, btree ("VBAK_MANDT", "VBAK_VBELN")
"idx_etl1_vbak_erdat" btree ("VBAK_ERDAT")

A final remark: The table definition was generated by our
reporting/ETL software, hence the case-sensitive column names and the
use of the character type instead of varchar (or text).

What can be done to help the planner choose a smarter plan?
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.




Re: [GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Merlin Moncure
On Mon, Jun 19, 2017 at 2:29 PM, Rob Nikander  wrote:
> I'm wondering about the tradeoffs, specifically: is it possible to update
> one piece of a jsonb value without having to rewrite the entire field? There
> are cases where that data field was getting pretty big (500kb). Would you
> expect any obvious performance differences between these two options?

You are basically asking, what are the relative
advantages/disadvantages of document model vs data store?  This is a
complicated discussion.  Here are some tradeoffs:

*) Document is always read/written in bulk.  Row data reads/writes are
more discrete (but generally postgres always reads/writes 8kb
minimum!)

*) for documents transaction tracking is for the entire document.
This is more efficient for storage but can have very serious
consequences if sub-portions of the document are updated under heavy
concurrency.

*) Documents are a pain if the structure changes in such a way so as
to require invalidation of all of them.

*) Documents can be a *real* pain if the data relationships change in
some fundamental way.   This is a pain with traditional tables as
well, but relational type models tend to be the most flexible vs other
approaches.

Basically there is a reason why SQL and relational type systems won
the 'data wars' of the 1970's and 1980's.   There are downsides to the
basic approach (especially performance due to joining) but the
simplicity and elegance of being able to model just about any problem
tends to compensate certain performance disadvantages.

Document style storage tends to move the database model out of the
database and into the application (which is a very poor tradeoff IMO)
and fall over when some of the initial assumptions with respect to the
document modeling discrete business units break down; you end up
storing the same information over and over in different documents
which causes all kinds of problems.  They do tend to work well in low-
or no- update applications however.

merlin




Re: [GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Dmitry Dolgov
> On 19 June 2017 at 21:29, Rob Nikander  wrote:
>
> I'm wondering about the tradeoffs, specifically: is it possible to update
one piece of a jsonb value without having to rewrite the entire field?
There are cases where that data field was getting pretty big (500kb). Would
you expect any obvious performance differences between these two options?

Unfortunately no, an entire jsonb field has to be written back even if
you've touched only one key.
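
For example, even a targeted update like the following (a sketch; the
table name is assumed) still writes the whole value back:

UPDATE sessions
   SET data = jsonb_set(data, '{a}', '42'::jsonb)
 WHERE session_id = 100;
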
From my own benchmarks it looks like you'll scarcely notice this (e.g. in
comparison with MongoDB) only if you work with small enough documents
(about 2 kB); anything above that limit hits performance more or less
seriously.
You can take a look at this presentation [1] from Oleg Bartunov, it
contains results of some benchmarks (from slide 44).

[1]: http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconf.us-2017.pdf


Re: [GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Andreas Kretschmer
On 19 June 2017 at 21:29:40 CEST, Rob Nikander wrote:

>
>I'm wondering about the tradeoffs, specifically: is it possible to
>update one piece of a jsonb value without having to rewrite the entire
>field? 


Updates in PostgreSQL are always Delete & Insert. So the answer is no. For 
session-data please consider unlogged tables.
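
Something along these lines (a sketch; names assumed):

-- UNLOGGED skips WAL for this table's writes; contents are lost after a
-- crash, which is usually acceptable for session data.
CREATE UNLOGGED TABLE session_data (
    session_id bigint PRIMARY KEY,
    data       jsonb  NOT NULL
);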


Regards, Andreas


-- 
2ndQuadrant - The PostgreSQL Support Company




[GENERAL] performance considerations of jsonb vs separate rows

2017-06-19 Thread Rob Nikander
Hi,

I’ve got a web app where I want to store user’s session data. The schema in 
this data changes a lot so it may be useful here to store the session 
properties in either a jsonb column, or in multiple rows. Something like:

  session_id | data
  100        | { a: 1, bar: 2 ...
  101        | { a: 3, baz: 123 …

or

  session_id | name | value
  100        | a    | 1
  100        | bar  | 2
  101        | baz  | 123
  101        | a    | 3
  ...

The app currently does something like option 1, but on an older pre-jsonb 
version of postgres, so the field is just text. I’m hoping to upgrade Postgres 
soon so jsonb is an option.

I'm wondering about the tradeoffs, specifically: is it possible to update one 
piece of a jsonb value without having to rewrite the entire field? There are 
cases where that data field was getting pretty big (500kb). Would you expect 
any obvious performance differences between these two options?

Yes, I’ll need to build performance tests myself, but that’s a lot of work to 
get two realistic situations with millions of rows, so I’m wondering about 
guesses or common knowledge on this.

thanks,
Rob

Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-12 Thread Eric Lemoine

> Another idea, if you haven't tried it already, is to run these test cases
> in a server built with --enable-debug and --enable-cassert.  The memory
> clobber stuff that's enabled by the latter is very good at turning coding
> errors into reproducible, debuggable crashes ;-)
> 
>   regards, tom lane


We found the cause of the issue! Both the PostGIS and the Pointcloud
extensions define the hexbytes_from_bytes function, and the PostGIS
version is faster. The fix involves prefixing the function name in
Pointcloud, and using an implementation similar to PostGIS's [*].

Thanks a lot for helping me fix that issue.

[*] 


-- 
Éric Lemoine
Oslandia




Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-09 Thread Tom Lane
Jeff Janes  writes:
> On Fri, Jun 9, 2017 at 9:13 AM, Eric Lemoine 
> wrote:
>> I now think that the performance bug is not related to the fn_extra
>> thing. I had hope but not anymore :) I don't see where the Pointcloud
>> and PostGIS extensions could conflict.

> Can you run 'perf top' on the slow query?  That might pretty quickly tell
> you which function is taking up your time.

Another idea, if you haven't tried it already, is to run these test cases
in a server built with --enable-debug and --enable-cassert.  The memory
clobber stuff that's enabled by the latter is very good at turning coding
errors into reproducible, debuggable crashes ;-)

regards, tom lane




Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-09 Thread Jeff Janes
On Fri, Jun 9, 2017 at 9:13 AM, Eric Lemoine 
wrote:

> On 06/08/2017 10:41 PM, Éric wrote:
> >
> >
> >
> >> Have you experimented with other queries that don't involve PostGIS?
> >> I'm wondering if your hook-installation code fails to work properly
> >> unless PostGIS was loaded first.  This would be easier to credit if
> >> there are hooks both extensions try to get into.
> >
> >
> > I think you're right on Tom. It looks like I cannot reproduce the issue
> if I start by calling a PostGIS function rather than a Pointcloud function.
> So it may well be a conflict between PostGIS and Pointcloud. Both use
> fn_extra, and that makes me wonder. This old thread [*] makes me wonder
> too! I still need to figure out the bug, but I can see some light now!
> thanks
> >
> > [*]  984D0F47C5FF4D0DB0D71A4F6EF670ED%40cleverelephant.ca#
> 984d0f47c5ff4d0db0d71a4f6ef67...@cleverelephant.ca>
>
>
> I now think that the performance bug is not related to the fn_extra
> thing. I had hope but not anymore :) I don't see where the Pointcloud
> and PostGIS extensions could conflict.
>

Can you run 'perf top' on the slow query?  That might pretty quickly tell
you which function is taking up your time.

Cheers,

Jeff


Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-09 Thread Adrian Klaver

On 06/09/2017 09:13 AM, Eric Lemoine wrote:

On 06/08/2017 10:41 PM, Éric wrote:





Have you experimented with other queries that don't involve PostGIS?
I'm wondering if your hook-installation code fails to work properly
unless PostGIS was loaded first.  This would be easier to credit if
there are hooks both extensions try to get into.



I think you're right on Tom. It looks like I cannot reproduce the issue if I 
start by calling a PostGIS function rather than a Pointcloud function. So it 
may well be a conflict between PostGIS and Pointcloud. Both use fn_extra, and 
that makes me wonder. This old thread [*] makes me wonder too! I still need to 
figure out the bug, but I can see some light now! thanks

[*] 




I now think that the performance bug is not related to the fn_extra
thing. I had hope but not anymore :) I don't see where the Pointcloud
and PostGIS extensions could conflict.



Crank up the logging detail?:

https://www.postgresql.org/docs/9.6/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

log_min_messages (enum)

https://www.postgresql.org/docs/9.6/static/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT

log_connections (boolean)

log_disconnections (boolean)

log_duration (boolean)
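
For example (a sketch; pick values to taste, then reload):

ALTER SYSTEM SET log_min_messages = 'debug1';
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
ALTER SYSTEM SET log_duration = on;
SELECT pg_reload_conf();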


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-09 Thread Eric Lemoine
On 06/08/2017 10:41 PM, Éric wrote:
> 
> 
> 
>> Have you experimented with other queries that don't involve PostGIS?
>> I'm wondering if your hook-installation code fails to work properly
>> unless PostGIS was loaded first.  This would be easier to credit if
>> there are hooks both extensions try to get into.
> 
> 
> I think you're right on Tom. It looks like I cannot reproduce the issue if I 
> start by calling a PostGIS function rather than a Pointcloud function. So it 
> may well be a conflict between PostGIS and Pointcloud. Both use fn_extra, and 
> that makes me wonder. This old thread [*] makes me wonder too! I still need 
> to figure out the bug, but I can see some light now! thanks
> 
> [*] 
> 


I now think that the performance bug is not related to the fn_extra
thing. I had hope but not anymore :) I don't see where the Pointcloud
and PostGIS extensions could conflict.

-- 
Éric Lemoine
Oslandia
+33 1 86 95 95 55


Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Éric



>Have you experimented with other queries that don't involve PostGIS?
>I'm wondering if your hook-installation code fails to work properly
>unless PostGIS was loaded first.  This would be easier to credit if
>there are hooks both extensions try to get into.


I think you're right on Tom. It looks like I cannot reproduce the issue if I 
start by calling a PostGIS function rather than a Pointcloud function. So it 
may well be a conflict between PostGIS and Pointcloud. Both use fn_extra, and 
that makes me wonder. This old thread [*] makes me wonder too! I still need to 
figure out the bug, but I can see some light now! thanks

[*] 


-- 
Éric




Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Tom Lane
Eric Lemoine  writes:
> The initial "select pc_typmod_pcid(1)" query completely screws the
> connection.
> "select pc_typmod_pcid(1)" is just an example of a simple query that
> triggers the problem. There are many others. But it has to be a query
> using the Pointcloud extension.

My guess is that it's got nothing to do with the specific query, but
that Pointcloud is installing some hook functions when it's loaded,
and that one or another of those hooks is killing performance for
subsequent queries, either by taking too long in itself or by
defeating some important optimization.  Hard to speculate further
with just this much data.

> I have no problem if I start with the main query (my "select points
> from" query of interest). And running the "select pc_typmod_pcid(1)"
> query in the middle does not cause any problem. It has to be run first
> on the connection to do the harm.

Have you experimented with other queries that don't involve PostGIS?
I'm wondering if your hook-installation code fails to work properly
unless PostGIS was loaded first.  This would be easier to credit if
there are hooks both extensions try to get into.

regards, tom lane




Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Eric Lemoine
On 06/08/2017 07:27 PM, Moreno Andreo wrote:
> On 08/06/2017 19:10, Eric Lemoine wrote:
>>
>> How can such a thing happen? Thanks for any insight on what could cause
>> this.
>>
>>
> I'd try raising shared_buffers to 1 GB or something near 40% of the
> available memory

I tried to make it 4G, but it does not make a difference. My machine has
16G of RAM.


> 
> If you run the query again, after getting bad results, what do you get?

Always bad results.

psql (9.6.3)
Type "help" for help.

lopocs=# \timing
Timing is on.
lopocs=# select pc_typmod_pcid(1);
 pc_typmod_pcid

  1
(1 row)

Time: 4.887 ms
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));


Time: 3522.135 ms
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 3395.672 ms
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));

Time: 3454.466 ms



The initial "select pc_typmod_pcid(1)" query completely screws the
connection.

"select pc_typmod_pcid(1)" is just an example of a simple query that
triggers the problem. There are many others. But it has to be a query
using the Pointcloud extension.

I have no problem if I start with the main query (my "select points
from" query of interest). And running the "select pc_typmod_pcid(1)"
query in the middle does not cause any problem. It has to be run first
on the connection to do the harm. See below.

psql (9.6.3)
Type "help" for help.

lopocs=# \timing
Timing is on.
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 280.117 ms
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 210.080 ms
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 233.095 ms
lopocs=# select pc_typmod_pcid(1);
 pc_typmod_pcid

  1
(1 row)

Time: 0.686 ms
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 199.150 ms




-- 
Éric Lemoine
Oslandia


Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Moreno Andreo

On 08/06/2017 19:10, Eric Lemoine wrote:


How can such a thing happen? Thanks for any insight on what could cause
this.


I'd try raising shared_buffers to 1 GB or something near 40% of the 
available memory


If you run the query again, after getting bad results, what do you get?

Cheers

Moreno.






Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Eric Lemoine

> Note that the execution time is 46 ms when the query is wrapped in an
> explain analyze (while it's 3 s when it's not!)


Actually, it seems to me that the performance issue is not on the query
itself, it is on the fetching of the data returned by the query. Which
explains why the query is fast when executed in an explain analyze. I've
observed this by using a cursor.

The query returns 2506 rows. I use a cursor to fetch the resulting rows
500 by 500. The fetching of 500 rows (fetch 500 from c) takes about 50
ms in the good/normal case, i.e. when the "select pc_typmod_pcid(1)" is
not executed first. While it takes around 600 ms in the pathological case!

Below is the full test case.


Good case:

psql (9.6.3)
Type "help" for help.

lopocs=# \timing
Timing is on.
lopocs=# begin;
BEGIN
Time: 0.373 ms
lopocs=# declare c cursor for select points from public.sthelens where
pc_intersects(points, st_geomfromtext('polygon ((-2357334.41980829
-3742654.00016992, -2356120.91980829 -3742654.00016992,
-2356120.91980829 -3741278.00016992, -2357334.41980829
-3741278.00016992, -2357334.41980829 -3742654.00016992))', 4978));
DECLARE CURSOR
Time: 75.976 ms
lopocs=# fetch 500 from c;


Time: 44.648 ms
lopocs=# fetch 500 from c;
Time: 40.693 ms
lopocs=# fetch 500 from c;
Time: 45.218 ms


Bad case:

psql (9.6.3)
Type "help" for help.

lopocs=# \timing
Timing is on.
lopocs=# select pc_typmod_pcid(1);  -- that screws up everything
 pc_typmod_pcid

  1
(1 row)

Time: 5.702 ms
lopocs=# begin;
BEGIN
Time: 0.234 ms
lopocs=# declare c cursor for select points from public.sthelens where
pc_intersects(points, st_geomfromtext('polygon ((-2357334.41980829
-3742654.00016992, -2356120.91980829 -3742654.00016992,
-2356120.91980829 -3741278.00016992,
-2357334.41980829 -3741278.00016992, -2357334.41980829
-3742654.00016992))', 4978));
DECLARE CURSOR
Time: 76.806 ms
lopocs=# fetch 500 from c;
Time: 669.834 ms
lopocs=# fetch 500 from c;
Time: 652.738 ms
lopocs=# fetch 500 from c;
Time: 604.293 ms




How can such a thing happen? Thanks for any insight on what could cause
this.


-- 
Éric Lemoine
Oslandia
+33 1 86 95 95 55


Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Eric Lemoine

> Lots of missing information here ...
> 
> Is there an index on public.sthelens.points?

Yes, there are.

lopocs=# \d sthelens;
   Table "public.sthelens"
 Column |    Type    |                       Modifiers
--------+------------+--------------------------------------------------------
 id     | integer    | not null default nextval('sthelens_id_seq'::regclass)
 points | pcpatch(2) |
 morton | bigint     |
Indexes:
"sthelens_pkey" PRIMARY KEY, btree (id)
"sthelens_pc_envelopegeometry_idx" gist (pc_envelopegeometry(points))

So two indices, one for the primary key, and a Postgis index on the
Postgis geometry returned by the Pointcloud pc_envelopegeometry function.


> How many rows are in that table?

30971



> What are your shared_buffers settings?

128 MB (Debian unstable)


> How much RAM does the server have?

16 GB

> What does EXPLAIN look like for that query? How large (in bytes) are the
> tables in question?


 QUERY PLAN



 Bitmap Heap Scan on sthelens  (cost=383.26..2496.67 rows=2065 width=32)
(actual time=3.213..46.674 rows=2506 loops=1)
   Recheck Cond:
('0103207213010005002E47BC352BFC41C164910500DF8D4CC12E47BC75CCF941C164910500DF8D4CC12E47BC75CCF941C1649105002F8B4CC12E47BC352BFC41C1649105002F8B4CC12E47BC352BFC41C164910500DF8D4CC1'::geometry
&& st_geomfromewkb(pc_envelopeasbinary(points)))
   Filter:
_st_intersects('0103207213010005002E47BC352BFC41C164910500DF8D4CC12E47BC75CCF941C164910500DF8D4CC12E47BC75CCF941C1649105002F8B4CC12E47BC352BFC41C1649105002F8B4CC12E47BC352BFC41C164910500DF8D4CC1'::geometry,
st_geomfromewkb(pc_envelopeasbinary(points)))
   Heap Blocks: exact=36
   ->  Bitmap Index Scan on sthelens_pc_envelopegeometry_idx
(cost=0.00..382.75 rows=6196 width=0) (actual time=1.626..1.626
rows=2506 loops=1)
 Index Cond:
('0103207213010005002E47BC352BFC41C164910500DF8D4CC12E47BC75CCF941C164910500DF8D4CC12E47BC75CCF941C1649105002F8B4CC12E47BC352BFC41C1649105002F8B4CC12E47BC352BFC41C164910500DF8D4CC1'::geometry
&& st_geomfromewkb(pc_envelopeasbinary(points)))
 Planning time: 0.525 ms
 Execution time: 46.999 ms
(8 rows)


Note that the execution time is 46 ms when the query is wrapped in an
explain analyze (while it's 3 s when it's not!)


> What does pc_typmod_pcid() actually do?

It is one of the simplest functions of Pointcloud.


See

and
.


> 
> There are probably lots of other questions I could ask, but those questions
> are based on the fact that this _looks_ like a classic cache blowout. I.e.,
> the query runs quickly when all the related d> ata is in RAM, but is
> significantly slower when the data has to be pulled from disk. Answering
> the questions above will likely help to determine if my guess is correct.


I don't know. The query is fast if I run it first on the database
connection. And it is *always* very slow after the "select
pc_typmod_pcid(1)" query has run.


> 
> If my guess is correct, there are any number of potential ways to improve
> things: Add RAM to the machine, enlarge shared_buffers, put a geo index on
> public.sthelens.points so it doesn't have to scan the entire table; as a
> few examples.


It sounds like a performance issue to you, while it sounds like a bug to
me :)

> 
> Understanding what pc_typmod_pcid() actually does would help, but even 
> without that you can test things in a few ways. One would be to substitute
> a different query in your testing for select pc_typmod_pcid(1) that is
> known to push the contents of public.sthelens out of memory and see if
> the behavior is similar. Any count(*) query on some other large table
> would probably suffice. A better way would probably be to install the
> pg_buffercache module and see what's actually in the cache at each step
> of the testing process.


I'll try to use pg_buffercache.



Thanks a lot for your response. That at least gives me courage in
debugging this :)


-- 
Éric Lemoine
Oslandia
+33 1 86 95 95 55


Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Adrian Klaver

On 06/08/2017 09:20 AM, Eric Lemoine wrote:



Looks like you also have postgis and pointcloud_postgis in mix. I would
say this may get an answer sooner here:

http://lists.osgeo.org/mailman/listinfo/pgpointcloud/


I am actually one of the developers of the Pointcloud extension. I
haven't been able to debug this up to now.



Oops on my part.


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Eric Lemoine

> Looks like you also have postgis and pointcloud_postgis in mix. I would
> say this may get an answer sooner here:
> 
> http://lists.osgeo.org/mailman/listinfo/pgpointcloud/

I am actually one of the developers of the Pointcloud extension. I
haven't been able to debug this up to now.


-- 
Éric Lemoine
Oslandia


Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Adrian Klaver

On 06/08/2017 09:00 AM, Eric Lemoine wrote:

Hi

We have a rather strange performance issue with the Pointcloud extension
[*]. The issue/bug may be in the extension, but we don't know for sure
at this point. I'm writing to the list to hopefully get some guidance on
how to further debug this.

[*] 

A query takes around 250 ms when executed first on a database
connection. But it takes like 3 s when executed after a first very
simple Pointcloud query.

Below is a test-case with psql.

Case #1 (works normally):

psql (9.6.3)
Type "help" for help.

lopocs=# \timing
Timing is on.
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 236.423 ms


Case #2 (works abnormally):

psql (9.6.3)
Type "help" for help.

lopocs=# \timing
Timing is on.
lopocs=# select pc_typmod_pcid(1);
  pc_typmod_pcid

   1
(1 row)

Time: 4.917 ms
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 2987.491 ms


The query takes 236 ms in case #1, and 2987 ms in case #2! Huge difference.

Anyone has any idea where this performance drop may come from? The
problem may be in the Pointcloud in the extension, but I have no idea
where the bug may be.

Any idea? Any suggestion on how to debug this? This has been driving us
crazy for some time now.


Looks like you also have postgis and pointcloud_postgis in mix. I would 
say this may get an answer sooner here:


http://lists.osgeo.org/mailman/listinfo/pgpointcloud/



Thanks.





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Bill Moran
On Thu, 8 Jun 2017 18:00:04 +0200
Eric Lemoine  wrote:

> We have a rather strange performance issue with the Pointcloud extension
> [*]. The issue/bug may be in the extension, but we don't know for sure
> at this point. I'm writing to the list to hopefully get some guidance on
> how to further debug this.
> 
> [*] 
> 
> A query takes around 250 ms when executed first on a database
> connection. But it takes like 3 s when executed after a first very
> simple Pointcloud query.
> 
> Below is a test-case with psql.
> 
> Case #1 (works normally):
> 
> psql (9.6.3)
> Type "help" for help.
> 
> lopocs=# \timing
> Timing is on.
> lopocs=# select points from public.sthelens where pc_intersects(points,
> st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
> -2356120.91980829 -3742654.00016992, -2356120.91980829
> -3741278.00016992, -2357334.41980829 -3741278.00016992,
> -2357334.41980829 -3742654.00016992))', 4978));
> Time: 236.423 ms
> 
> 
> Case #2 (works abnormally):
> 
> psql (9.6.3)
> Type "help" for help.
> 
> lopocs=# \timing
> Timing is on.
> lopocs=# select pc_typmod_pcid(1);
>  pc_typmod_pcid
> 
>   1
> (1 row)
> 
> Time: 4.917 ms
> lopocs=# select points from public.sthelens where pc_intersects(points,
> st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
> -2356120.91980829 -3742654.00016992, -2356120.91980829
> -3741278.00016992, -2357334.41980829 -3741278.00016992,
> -2357334.41980829 -3742654.00016992))', 4978));
> Time: 2987.491 ms
> 
> 
> The query takes 236 ms in case #1, and 2987 ms in case #2! Huge difference.
> 
> Anyone has any idea where this performance drop may come from? The
> problem may be in the Pointcloud in the extension, but I have no idea
> where the bug may be.
> 
> Any idea? Any suggestion on how to debug this? This has been driving us
> crazy for some time now.

Lots of missing information here ...

Is there an index on public.sthelens.points? How many rows are in that table?
What are your shared_buffers settings? How much RAM does the server have?
What does EXPLAIN look like for that query? How large (in bytes) are the
tables in question? What does pc_typmod_pcid() actually do?

There are probably lots of other questions I could ask, but those questions
are based on the fact that this _looks_ like a classic cache blowout. I.e.,
the query runs quickly when all the related data is in RAM, but is
significantly slower when the data has to be pulled from disk. Answering
the questions above will likely help to determine if my guess is correct.

If my guess is correct, there are any number of potential ways to improve
things: Add RAM to the machine, enlarge shared_buffers, put a geo index on
public.sthelens.points so it doesn't have to scan the entire table; as a
few examples.

Understanding what pc_typmod_pcid() actually does would help, but even 
without that you can test things in a few ways. One would be to substitute
a different query in your testing for select pc_typmod_pcid(1) that is
known to push the contents of public.sthelens out of memory and see if
the behavior is similar. Any count(*) query on some other large table
would probably suffice. A better way would probably be to install the
pg_buffercache module and see what's actually in the cache at each step
of the testing process.
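
For example, the standard pg_buffercache query (adjust as needed) shows
which relations currently occupy shared buffers:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                              WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;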

In any event, if your testing doesn't help any; you'll probably need to
include answers to at least the above questions before the list will be
much help.

That is, of course, unless someone familar with pointcloud has seen this
exact problem and already knows the answer ...

-- 
Bill Moran 




[GENERAL] Performance issue with Pointcloud extension

2017-06-08 Thread Eric Lemoine
Hi

We have a rather strange performance issue with the Pointcloud extension
[*]. The issue/bug may be in the extension, but we don't know for sure
at this point. I'm writing to the list to hopefully get some guidance on
how to further debug this.

[*] 

A query takes around 250 ms when executed first on a database
connection. But it takes like 3 s when executed after a first very
simple Pointcloud query.

Below is a test-case with psql.

Case #1 (works normally):

psql (9.6.3)
Type "help" for help.

lopocs=# \timing
Timing is on.
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 236.423 ms


Case #2 (works abnormally):

psql (9.6.3)
Type "help" for help.

lopocs=# \timing
Timing is on.
lopocs=# select pc_typmod_pcid(1);
 pc_typmod_pcid

  1
(1 row)

Time: 4.917 ms
lopocs=# select points from public.sthelens where pc_intersects(points,
st_geomfromtext('polygon ((-2357334.41980829 -3742654.00016992,
-2356120.91980829 -3742654.00016992, -2356120.91980829
-3741278.00016992, -2357334.41980829 -3741278.00016992,
-2357334.41980829 -3742654.00016992))', 4978));
Time: 2987.491 ms


The query takes 236 ms in case #1, and 2987 ms in case #2! Huge difference.

Anyone has any idea where this performance drop may come from? The
problem may be in the Pointcloud in the extension, but I have no idea
where the bug may be.

Any idea? Any suggestion on how to debug this? This has been driving us
crazy for some time now.

Thanks.


-- 
Éric Lemoine
Oslandia


Re: [GENERAL] Performance degradation when using auto_explain

2017-01-04 Thread Tomas Vondra

On 01/04/2017 08:54 PM, Kisung Kim wrote:



On Wed, Jan 4, 2017 at 1:21 AM, Andreas Kretschmer wrote:

Kisung Kim wrote:

> And finally I found that auto_explain is the cause of the problem.

real hardware or virtual hardware? On virtual there are sometimes
problems with exact timings, please read:

https://www.postgresql.org/docs/current/static/pgtesttiming.html



Thank you for your reply.
I use real hardware.
I am curious whether timing functions have contention when multiple threads call them.



It's not so much about contention between threads/processes, but mostly 
about the cost of actually reading data from the clock source. So even 
if you run on physical hardware, the clock source may be slow. Actually, 
there may be multiple clock sources available, differing in precision 
and overhead.


See:

/sys/devices/system/clocksource/*/available_clocksource

On my machine I see 3 different sources "tsc hpet acpi_pm" and after 
running the pg_test_timing tool, linked by Andreas, and I get this for 
'tsc' clock source


Testing timing overhead for 3 seconds.
Per loop time including overhead: 29.87 nsec
Histogram of timing durations:
  < usec   % of total      count
       1     97.09159   97499400
       2      2.90085    2913031
       4      0.00193       1936
       8      0.00089        891
      16      0.00140       1405
      32      0.00335       3366
      64      0.00000          2

suggesting that 97% of calls took less than 1 usec, which is quite good. 
For comparison, using 'hpet' gives me this:


Testing timing overhead for 3 seconds.
Per loop time including overhead: 766.92 ns
Histogram of timing durations:
   < us   % of total    count
      1     27.69558  1083389
      2     71.28437  2788485
      4      0.75748    29631
      8      0.02886     1129
     16      0.06578     2573
     32      0.16755     6554
     64      0.00033       13
    128      0.00003        1
    256      0.00003        1

Which is clearly much worse (it increased the per-loop cost from 30ns to 
767ns, which is ~25x more).


So check which clock source you have selected, and test how expensive 
that is. But even with a fast clock source, the additional timing 
overhead may make EXPLAIN ANALYZE considerably slower. There's not much 
we can do about it :-(


And to make matters worse, it may affect different plans differently 
(e.g. nested loop joins do many more gettimeofday calls than other join 
types, amplifying the timing costs).


But the question is whether you actually need the timing - the total 
duration + row counts are far more important in my experience, so by setting


auto_explain.log_analyze = on
auto_explain.log_timing = off

you may significantly reduce the impact on query performance, while 
retaining the most valuable information.
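
For a quick test in a single session, that would be something like (sketch):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;   -- log a plan for every statement
SET auto_explain.log_analyze = on;
SET auto_explain.log_timing = off;       -- keep row counts, skip per-node timing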


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Performance PLV8 vs PLPGSQL

2017-01-04 Thread Merlin Moncure
On Wed, Dec 28, 2016 at 3:15 AM, Tim Uckun  wrote:
> I have seen various links on the internet which indicate that PLV8 is
> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>
> Is this uniformly true or is it just in certain circumstances?
>
> Is there any benefit to choosing PL-PGSQL?
>
> Is there work going on to make PL-PGSQL more performant or has it gotten
> significantly faster in the last two years or so (some of the links are a
> from a while ago).

Pavel covered it pretty well but I'll chime in also.  In typical usage
of pl/pgsql the overhead of the language itself is minimal.  Most of
the heavy lifting is done by the SQL engine. pl/pgsql is glue that
sticks SQL statements together basically.  Heavy computation in
pl/pgsql can be problematic but is rarely necessary.  pl/v8 is
definitely a better choice for such cases, but not being in core can
be an issue for some people.
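
As a small illustration of that "glue" role (table names here are purely
hypothetical), a typical pl/pgsql function just stitches a couple of SQL
statements together and lets the SQL engine do the real work:

CREATE OR REPLACE FUNCTION archive_old_orders(cutoff date)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
    moved bigint;
BEGIN
    -- the heavy lifting is plain SQL
    INSERT INTO orders_archive
    SELECT * FROM orders WHERE order_date < cutoff;

    GET DIAGNOSTICS moved = ROW_COUNT;

    DELETE FROM orders WHERE order_date < cutoff;

    RETURN moved;  -- the procedural part only reports what happened
END;
$$;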

The advantage of pl/pgsql is that SQL statements, types and errors are
native to the language itself.  This makes many things much easier to
do...pl/pgsql is maybe the most efficient data processing language (in
terms of efficiency of coding) in existence.   That may smack of
overstatement but my entire career is built around being good at it so
I'm inclined to effusiveness :-D.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance degradation when using auto_explain

2017-01-04 Thread Andreas Kretschmer
Kisung Kim  wrote:

> And finally I found that auto_explain is the cause of the problem.

real hardware or virtual hardware? On virtual there are sometimes
problems with exact timings, please read:

https://www.postgresql.org/docs/current/static/pgtesttiming.html


Regards, Andreas Kretschmer
-- 
Andreas Kretschmer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Performance degradation when using auto_explain

2017-01-03 Thread Kisung Kim
Hi,

I found performance degradation when using auto_explain with log_analyze
option true.
It automatically logs query plan analyze results.

But when there are many concurrent sessions, the performance degrades in
proportion to the number of concurrent sessions.
These queries are all read-only queries.
And I found that the time consumed in a plan node like IndexScan increases
when there are many sessions.

At first, I thought that CPU scheduling contention might be causing the
degradation.
But there are more CPU cores than there are sessions.

And finally I found that auto_explain is the cause of the problem.

I am curious now why auto_explain incurs contention between the sessions.
I use Ubuntu Linux. Is it a problem with gettimeofday()?

Thank you.

Kisung Kim
-- 




Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-31 Thread Tim Uckun
Yes I use a migrations tool as well. I like ruby so I use either the Sequel
migrations or the ActiveRecord migrations depending on the project. That's
a great way to work. Having said that I think it might be an interesting
thing to do to create the PGAdmin hierarchy of objects on disk. Not the
data itself, just the table defs, views, code etc. It might be interesting
to see what ctags could do with that. Being able to grep to find something
would be useful. A proper IDE could also keep track of dependencies and
when you came to deploy it could deploy the children changes first etc.

It might be an interesting way to work, treating your database schema as if it
were a programming project.

On Sat, Dec 31, 2016 at 2:32 AM, Michael Sheaver  wrote:

> The reason that you cannot use git for that is that it was never intended
> for live data and stuff that resides on a database.
>
> That said, I agree with you that all table DDLs and stored procedures
> should be kept under version control. And in fact I do keep them under VC
> on my database. How do I do this? I have a directory in which I keep all my
> SQL scripts, and this directory is a git repo. In this directory/project I
> keep:
> 1. For every table that is created on the database, I have an SQL script
> file with the DDL that creates it
> 2. For every stored proc that I create, I put it in its own SQL script as
> well
> 3. For any (relatively) static lookup tables (i.e. state-region
> associations, etc.), they are put in a SQL script
>
> This workflow gives me several important benefits, including:
> 1. Everything needed to recreate the database is kept under version
> control, including the DDL, stored procedures and lookup tables
> 2. This script folder does not need to reside on the DB server, can be
> kept anywhere, even on my local laptop; I pull them up in DataGrip and run
> them on the server (my scripts are on my laptop)
> 3. All these scripts are VC'd under git
> 4. They are easily and quickly pushed to my remote repo on Github and
> Bitbucket for backup
> 5. When I need to create a new fresh, empty copy of the database
> somewhere, I just run these scripts and it is done quickly and easily
>
> One more little trick I have is to use Gitkraken for my git GUI. It is
> free, and is absolutely the best git GUI available. The devs made it an
> absolute joy to use, and I never need to use the command line anymore.
>
> Hope this helps!
>
> On Dec 29, 2016, at 10:43 PM, Tim Uckun  wrote:
>
> I have datagrip and it's OK but it doesn't really do everything I want.
>
> I don't understand why it doesn't fetch all objects from the database and
> then put them into the disk in a directory so I can put it all under git
> and then let me work on them syncing the files back as they change.  For
> example today I just renamed a function. It didn't refactor properly by
> identifying stored procs that reference it. If I was using another
> jetbrains IDE it would have built an index of the project files and did a
> proper refactor.
>
> This would also allow you to make wholesale disk changes and then sync
> them up properly to get around postgres dependency issues.
>
> On Fri, Dec 30, 2016 at 1:40 PM, Michael Sheaver  wrote:
>
>> If you want an IDE, Jetbrains, the makers of great IDEs like IntelliJ,
>> PyCharm. and AppCode, among others, have recently come out with what is
>> arguably the BEST IDE for DBAs, DataGrip. It runs on most major platforms,
>> and is so good that I have bitten the bullet and paid the yearly
>> subscription for it.
>>
>> Leave the Postgres core alone focus on what they do best, and that is
>> making the very BEST database environment that can be had at any price,
>> period. Is Postgres perfect? No, not at all. But no other group is so
>> focused on performance, scalability and security as these folks are. And
>> the rate of development, enhancement and continual improvement is, quite
>> honestly, astounding.
>>
>> So here is my hat tip to the Postgres team for an awesome job they are
>> doing!
>>
>> On Dec 29, 2016, at 7:19 PM, Tim Uckun  wrote:
>>
>> I am not saying the postgres core people should work on an IDE, just that
>> an IDE like thing would be nice.
>>
>> On Fri, Dec 30, 2016 at 12:51 PM, Rob Sargent 
>> wrote:
>>
>>> I would hope Postgres core folk take no more than a nanosecond to reject
>>> the idea that they work on an IDE. Focus on reading and writing faster and
>>> faster ACID all the while.
>>>
>>> On Dec 29, 2016, at 5:32 PM, Tim Uckun  wrote:
>>>
>>> Honestly I don't even like JS. Having said that I am not too crazy about
>>> PL-PGSQL either. I am willing to put up with either given that they are
>>> supported widely in default installs of postgres in AWS, Linux and MacOSX,
>>>
>>> As I said before, I think posgres gives a unique and underutilized
>>> language platform. You can code in different languages, it has a 

Re: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]

2016-12-30 Thread Amitabh Kant
On Fri, Dec 30, 2016 at 12:06 PM, ajmcello  wrote:

> Reducing worker mem shaved about 12 minutes off the query time.. Thanks
> for the suggestion. I lowered it to 10MB instead of 100MB
>
> [SNIP]
>
> >>> [postgresql.conf]
> >>> max_connections = 100000
> >>> max_files_per_process = 100
> >>> shared_buffers = 24GB
> >>> max_locks_per_transaction  = 1000
> >>> effective_cache_size = 50GB
> >>> work_mem = 100MB
> >>> maintenance_work_mem = 2GB
> >>> log_min_duration_statement = 1
> >>> checkpoint_completion_target = 0.9
> >>> wal_buffers = 32MB
> >>> default_statistics_target = 100
> >>> listen_addresses = '*'
> >>> port = 5432
> >>> ssl = off
> >>> wal_sync_method = fdatasync
> >>> synchronous_commit = on
> >>> fsync = off
> >>> wal_level = minimal
> >>> #client_min_messages = fatal
> >>> #log_min_messages = fatal
> >>> #log_min_error_statement = fatal
> >>> datestyle = 'iso, mdy'
> >>> debug_pretty_print = off
> >>> debug_print_parse = off
> >>> debug_print_plan = off
> >>> debug_print_rewritten = off
> >>> default_text_search_config = 'pg_catalog.english'
> >>> enable_bitmapscan = on
> >>> enable_hashagg = on
> >>> enable_hashjoin = on
> >>> enable_indexonlyscan = on
> >>> enable_indexscan = on
> >>> enable_material = on
> >>> enable_mergejoin = on
> >>> enable_nestloop = on
> >>> enable_seqscan = on
> >>> enable_sort = on
> >>> enable_tidscan = on
> >>> from_collapse_limit = 8
> >>> geqo = on
> >>> geqo_threshold = 12
> >>> log_checkpoints = off
> >>>
> >>> log_connections = off
> >>> log_disconnections = off
> >>> log_duration = off
> >>> log_executor_stats = off
> >>> log_hostname = off
> >>> log_parser_stats = off
> >>> log_planner_stats = off
> >>> log_replication_commands = off
> >>> log_statement_stats = off
> >>> log_timezone = 'UTC'
> >>> max_wal_size = 1GB
> >>> min_wal_size = 80MB
> >>> shared_preload_libraries = '$libdir/dbms_pipe,$libdir/edb_gen'
> >>> stats_temp_directory = 'pg_stat_tmp'
> >>> timezone = 'US/Pacific'
> >>> track_activities = on
> >>> track_counts = on
> >>> track_io_timing = off
> >>>
> >>>
> >>> Thanks in advance.
> >>>
> >>>
> >>>
> >>>
> >>
> >>
>


The number of connections that you are allowing in Postgres is way too
high. You should be using a connection pooler like pgbouncer, and reduce
the number of connections at the Postgres level.
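
As a rough before/after check (standard catalog views, shown only as an
illustration), you can compare how many backends are actually in use against
the configured ceiling:

SELECT count(*) AS backends_in_use,
       current_setting('max_connections') AS max_connections
FROM pg_stat_activity;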

Amitabh


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-30 Thread Michael Sheaver
The reason that you cannot use git for that is that it was never intended for 
live data and stuff that resides on a database.

That said, I agree with you that all table DDLs and stored procedures should be 
kept under version control. And in fact I do keep them under VC on my database. 
How do I do this? I have a directory in which I keep all my SQL scripts, and 
this directory is a git repo. In this directory/project I keep:
1. For every table that is created on the database, I have an SQL script file 
with the DDL that creates it
2. For every stored proc that I create, I put it in its own SQL script as well
3. For any (relatively) static lookup tables (i.e. state-region associations, 
etc.), they are put in a SQL script

This workflow gives me several important benefits, including:
1. Everything needed to recreate the database is kept under version control, 
including the DDL, stored procedures and lookup tables
2. This script folder does not need to reside on the DB server, can be kept 
anywhere, even on my local laptop; I pull them up in DataGrip and run them on 
the server (my scripts are on my laptop)
3. All these scripts are VC'd under git
4. They are easily and quickly pushed to my remote repo on Github and Bitbucket 
for backup
5. When I need to create a new fresh, empty copy of the database somewhere, I 
just run these scripts and it is done quickly and easily

One more little trick I have is to use Gitkraken for my git GUI. It is free,
and is absolutely the best git GUI available. The devs made it an absolute joy 
to use, and I never need to use the command line anymore.

Hope this helps!

> On Dec 29, 2016, at 10:43 PM, Tim Uckun  wrote:
> 
> I have datagrip and it's OK but it doesn't really do everything I want.
> 
> I don't understand why it doesn't fetch all objects from the database and 
> then put them into the disk in a directory so I can put it all under git and 
> then let me work on them syncing the files back as they change.  For example 
> today I just renamed a function. It didn't refactor properly by identifying 
> stored procs that reference it. If I was using another jetbrains IDE it would 
> have built an index of the project files and did a proper refactor.
> 
> This would also allow you to make wholesale disk changes and then sync them 
> up properly to get around postgres dependency issues.
> 
> On Fri, Dec 30, 2016 at 1:40 PM, Michael Sheaver  > wrote:
> If you want an IDE, Jetbrains, the makers of great IDEs like IntelliJ, 
> PyCharm. and AppCode, among others, have recently come out with what is 
> arguably the BEST IDE for DBAs, DataGrip. It runs on most major platforms, 
> and is so good that I have bitten the bullet and paid the yearly subscription 
> for it.
> 
> Leave the Postgres core alone focus on what they do best, and that is making 
> the very BEST database environment that can be had at any price, period. Is 
> Postgres perfect? No, not at all. But no other group is so focused on 
> performance, scalability and security as these folks are. And the rate of 
> development, enhancement and continual improvement is, quite honestly, 
> astounding. 
> 
> So here is my hat tip to the Postgres team for an awesome job they are doing!
> 
>> On Dec 29, 2016, at 7:19 PM, Tim Uckun > > wrote:
>> 
>> I am not saying the postgres core people should work on an IDE, just that an 
>> IDE like thing would be nice.
>> 
>> On Fri, Dec 30, 2016 at 12:51 PM, Rob Sargent > > wrote:
>> I would hope Postgres core folk take no more than a nanosecond to reject the 
>> idea that they work on an IDE. Focus on reading and writing faster and 
>> faster ACID all the while. 
>> 
>> On Dec 29, 2016, at 5:32 PM, Tim Uckun > > wrote:
>> 
>>> Honestly I don't even like JS. Having said that I am not too crazy about 
>>> PL-PGSQL either. I am willing to put up with either given that they are 
>>> supported widely in default installs of postgres in AWS, Linux and MacOSX,
>>> 
>>> As I said before, I think posgres gives a unique and underutilized language 
>>> platform. You can code in different languages, it has a good variety of 
>>> built in types, and of course you get persistance and caching built in!  
>>> Using DBLINK you might even be able to separate out your code from the bulk 
>>> of your data in another database. Postgres all the way down!
>>> 
>>> It's fun to play around with.  There is a lot of missing pieces though. A 
>>> good IDE like thing would be good, version control would be nice, deeper 
>>> namespacing (hierarchical schemas?), easier testing etc would go a long 
>>> way. 
>>> 
>>> Thanks for all the input guys! 
>>> 
>>> On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo 
>>> > wrote:
>>> On 12/29/2016 

Re: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]

2016-12-29 Thread ajmcello
Reducing worker mem shaved about 12 minutes off the query time.. Thanks for the 
suggestion. I lowered it to 10MB instead of 100MB



> On Dec 29, 2016, at 8:07 PM, Charles Clavadetscher 
> <clavadetsc...@swisspug.org> wrote:
> 
> Forwarding to list.
> 
> -Original Message-
> From: ajmcello [mailto:ajmcell...@gmail.com] 
> Sent: Freitag, 30. Dezember 2016 07:05
> To: Charles Clavadetscher <clavadetsc...@swisspug.org>
> Subject: Re: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]
> 
> There are no connections except one cli when running the query. After that 
> finishes then I get connection refused  or cannot connect
> to server due to load increasing because of server connections. But I'm more 
> interested in tuning the server for better query
> response time. Is there anything in the configuration that would help?
> 
> Sent from my iPhone
> 
>> On Dec 29, 2016, at 7:35 PM, Charles Clavadetscher 
>> <clavadetsc...@swisspug.org> wrote:
>> 
>> Hello
>> 
>>> -Original Message-
>>> From: pgsql-general-ow...@postgresql.org 
>>> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of ajmcello
>>> Sent: Freitag, 30. Dezember 2016 05:54
>>> To: POSTGRES <pgsql-general@postgresql.org>
>>> Subject: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]
>>> 
>>> 
>>> I am trying to optimize and tune my server for fastest simple queries with 
>>> highest connection to server possible.
>>> Basically, a SELECT item from table takes 30 minutes on a machine with SSD 
>>> drives. The table has 900K entries and 12
>>> columns.  Using that SELECT query, I then have the ability to make about 
>>> 500 simultaneous connections to the server
>>> before errors start to occur. So, I can live with 500, but the slow query 
>>> gives me grief.
>> 
>> From previous posts of other users, I assume that in order to get help you 
>> will need to provide some more information. Here the
> questions that come to my mind.
>> 
>> What errors do you get from the server when you reach the 500 connections?
>> 
>> How long does it take to run the query without heavy load, e.g. just one 
>> user connected?
>> 
>> \timing on
>> query
>> 
>> How does the execution plan look like? There you may see if the index is 
>> used at all.
>> 
>> EXPLAIN ANALYZE query;
>> 
>>> I have a GIST index for the table. I tried btree, but according to 
>>> performance tests, GIST was faster. So I went
>>> with GIST.
>>> 
>>> The system drive and pgsql drive are separate. I can separate them further 
>>> if need to be. Total ram is 56GB. I added
>>> 32GB of swap.
>>> 
>>> Is there anything I can do to speed up the SELECT statement?
>> 
>> Could you provide the statement and the table definition?
>> 
>>> Here is what I have:
>>> 
>>> [sysctl.conf]
>>> net.ipv4.conf.default.rp_filter=1
>>> net.ipv4.conf.all.rp_filter=1
>>> net.ipv4.tcp_syncookies=1
>>> net.ipv4.ip_forward=1
>>> net.ipv6.conf.all.forwarding=1
>>> net.ipv4.conf.all.accept_redirects=0
>>> net.ipv6.conf.all.accept_redirects=0
>>> net.ipv4.conf.all.accept_source_route=0
>>> net.ipv6.conf.all.accept_source_route=0
>>> net.ipv4.conf.all.log_martians=1
>>> kernel.sysrq=0
>>> kernel.shmmax=214748399
>>> kernel.shmall=209715999
>>> #32GBkernel.shmmax=17179869184
>>> #32GBkernel.shmall=4194304
>>> kernel.shmmni=9
>>> kernel.shmmin=1
>>> kernel.shmseg=10
>>> semmsl, semmns, semopm, semmni kernel.sem=250 32000 100 128
>>> fs.file-max=65536
>>> kern.maxfiles=5
>>> kern.maxfilesperproc=5
>>> net.ipv4.ip_local_port_range=1024 65535
>>> net.ipv4.tcp_tw_recycle=1
>>> net.ipv4.tcp_fin_timeout=10
>>> net.ipv4.tcp_tw_reuse=1
>>> net.core.rmem_max=16777216
>>> net.core.wmem_max=16777216
>>> net.ipv4.tcp_max_syn_backlog=4096
>>> net.ipv4.tcp_syncookies=1
>>> kernel.sched_migration_cost_ns=500
>>> kernel.sched_migration_cost_ns=500
>>> kernel.sched_autogroup_enabled=0
>>> vm.swappiness=10
>>> 
>>> 
>> 
>> Here are some helpful informations on the settings below:
>> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>> 
>> I am not really the big expert, but 100'000 max_connections and work_mem of 
>> 100MB seems

FW: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]

2016-12-29 Thread Charles Clavadetscher
Forwarding to list.

-Original Message-
From: ajmcello [mailto:ajmcell...@gmail.com] 
Sent: Freitag, 30. Dezember 2016 07:05
To: Charles Clavadetscher <clavadetsc...@swisspug.org>
Subject: Re: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]

There are no connections except one cli when running the query. After that 
finishes then I get connection refused  or cannot connect
to server due to load increasing because of server connections. But I'm more 
interested in tuning the server for better query
response time. Is there anything in the configuration that would help?

Sent from my iPhone

> On Dec 29, 2016, at 7:35 PM, Charles Clavadetscher 
> <clavadetsc...@swisspug.org> wrote:
> 
> Hello
> 
>> -Original Message-
>> From: pgsql-general-ow...@postgresql.org 
>> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of ajmcello
>> Sent: Freitag, 30. Dezember 2016 05:54
>> To: POSTGRES <pgsql-general@postgresql.org>
>> Subject: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]
>> 
>> 
>> I am trying to optimize and tune my server for fastest simple queries with 
>> highest connection to server possible.
>> Basically, a SELECT item from table takes 30 minutes on a machine with SSD 
>> drives. The table has 900K entries and 12
>> columns.  Using that SELECT query, I then have the ability to make about 500 
>> simultaneous connections to the server
>> before errors start to occur. So, I can live with 500, but the slow query 
>> gives me grief.
> 
> From previous posts of other users, I assume that in order to get help you 
> will need to provide some more information. Here the
questions that come to my mind.
> 
> What errors do you get from the server when you reach the 500 connections?
> 
> How long does it take to run the query without heavy load, e.g. just one user 
> connected?
> 
> \timing on
> query
> 
> How does the execution plan look like? There you may see if the index is used 
> at all.
> 
> EXPLAIN ANALYZE query;
> 
>> I have a GIST index for the table. I tried btree, but according to 
>> performance tests, GIST was faster. So I went
>> with GIST.
>> 
>> The system drive and pgsql drive are separate. I can separate them further 
>> if need to be. Total ram is 56GB. I added
>> 32GB of swap.
>> 
>> Is there anything I can do to speed up the SELECT statement?
> 
> Could you provide the statement and the table definition?
> 
>> Here is what I have:
>> 
>> [sysctl.conf]
>> net.ipv4.conf.default.rp_filter=1
>> net.ipv4.conf.all.rp_filter=1
>> net.ipv4.tcp_syncookies=1
>> net.ipv4.ip_forward=1
>> net.ipv6.conf.all.forwarding=1
>> net.ipv4.conf.all.accept_redirects=0
>> net.ipv6.conf.all.accept_redirects=0
>> net.ipv4.conf.all.accept_source_route=0
>> net.ipv6.conf.all.accept_source_route=0
>> net.ipv4.conf.all.log_martians=1
>> kernel.sysrq=0
>> kernel.shmmax=214748399
>> kernel.shmall=209715999
>> #32GBkernel.shmmax=17179869184
>> #32GBkernel.shmall=4194304
>> kernel.shmmni=9
>> kernel.shmmin=1
>> kernel.shmseg=10
>> semmsl, semmns, semopm, semmni kernel.sem=250 32000 100 128
>> fs.file-max=65536
>> kern.maxfiles=5
>> kern.maxfilesperproc=5
>> net.ipv4.ip_local_port_range=1024 65535
>> net.ipv4.tcp_tw_recycle=1
>> net.ipv4.tcp_fin_timeout=10
>> net.ipv4.tcp_tw_reuse=1
>> net.core.rmem_max=16777216
>> net.core.wmem_max=16777216
>> net.ipv4.tcp_max_syn_backlog=4096
>> net.ipv4.tcp_syncookies=1
>> kernel.sched_migration_cost_ns=500
>> kernel.sched_migration_cost_ns=500
>> kernel.sched_autogroup_enabled=0
>> vm.swappiness=10
>> 
>> 
> 
> Here are some helpful informations on the settings below:
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
> 
> I am not really the big expert, but 100'000 max_connections and work_mem of 
> 100MB seems to me to be a problem:
> 
> From the link mentioned right above:
> 
> "This size (work_mem) is applied to each and every sort done by each user, 
> and complex queries can use multiple working memory
sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are 
soon using 1.5GB of real memory."
> 
> This is:
> SELECT * FROM pg_size_pretty((50.0*(2^20)*30.0)::BIGINT);
> -[ RECORD 1 ]--+
> pg_size_pretty | 1500 MB
> 
> Applied to your settings:
> 
> SELECT * FROM pg_size_pretty((100.0*(2^20)*100000.0)::BIGINT);
> -[ RECORD 1 ]--+
> pg_size_pretty | 9766 GB
> 
> This could explain the err

Re: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]

2016-12-29 Thread Charles Clavadetscher
Hello

> -Original Message-
> From: pgsql-general-ow...@postgresql.org 
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of ajmcello
> Sent: Freitag, 30. Dezember 2016 05:54
> To: POSTGRES <pgsql-general@postgresql.org>
> Subject: [GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]
> 
> 
> I am trying to optimize and tune my server for fastest simple queries with 
> highest connection to server possible.
> Basically, a SELECT item from table takes 30 minutes on a machine with SSD 
> drives. The table has 900K entries and 12
> columns.  Using that SELECT query, I then have the ability to make about 500 
> simultaneous connections to the server
> before errors start to occur. So, I can live with 500, but the slow query 
> gives me grief.

From previous posts of other users, I assume that in order to get help you
will need to provide some more information. Here are the questions that come
to my mind.

What errors do you get from the server when you reach the 500 connections?

How long does it take to run the query without heavy load, e.g. just one user 
connected?

\timing on
query

What does the execution plan look like? There you may see if the index is used
at all.

EXPLAIN ANALYZE query;

> I have a GIST index for the table. I tried btree, but according to 
> performance tests, GIST was faster. So I went
> with GIST.
> 
> The system drive and pgsql drive are separate. I can separate them further if 
> need to be. Total ram is 56GB. I added
> 32GB of swap.
> 
> Is there anything I can do to speed up the SELECT statement?

Could you provide the statement and the table definition?

> Here is what I have:
> 
> [sysctl.conf]
> net.ipv4.conf.default.rp_filter=1
> net.ipv4.conf.all.rp_filter=1
> net.ipv4.tcp_syncookies=1
> net.ipv4.ip_forward=1
> net.ipv6.conf.all.forwarding=1
> net.ipv4.conf.all.accept_redirects=0
> net.ipv6.conf.all.accept_redirects=0
> net.ipv4.conf.all.accept_source_route=0
> net.ipv6.conf.all.accept_source_route=0
> net.ipv4.conf.all.log_martians=1
> kernel.sysrq=0
> kernel.shmmax=214748399
> kernel.shmall=209715999
> #32GBkernel.shmmax=17179869184
> #32GBkernel.shmall=4194304
> kernel.shmmni=9
> kernel.shmmin=1
> kernel.shmseg=10
> semmsl, semmns, semopm, semmni kernel.sem=250 32000 100 128
> fs.file-max=65536
> kern.maxfiles=5
> kern.maxfilesperproc=5
> net.ipv4.ip_local_port_range=1024 65535
> net.ipv4.tcp_tw_recycle=1
> net.ipv4.tcp_fin_timeout=10
> net.ipv4.tcp_tw_reuse=1
> net.core.rmem_max=16777216
> net.core.wmem_max=16777216
> net.ipv4.tcp_max_syn_backlog=4096
> net.ipv4.tcp_syncookies=1
> kernel.sched_migration_cost_ns=500
> kernel.sched_migration_cost_ns=500
> kernel.sched_autogroup_enabled=0
> vm.swappiness=10
> 
> 

Here are some helpful informations on the settings below:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

I am not really a big expert, but 100'000 max_connections and a work_mem of
100MB seem to me to be a problem:

From the link mentioned right above:

"This size (work_mem) is applied to each and every sort done by each user, and 
complex queries can use multiple working memory sort buffers. Set it to 50MB, 
and have 30 users submitting queries, and you are soon using 1.5GB of real 
memory."

This is:
SELECT * FROM pg_size_pretty((50.0*(2^20)*30.0)::BIGINT);
-[ RECORD 1 ]--+
pg_size_pretty | 1500 MB

Applied to your settings:

SELECT * FROM pg_size_pretty((100.0*(2^20)*100000.0)::BIGINT);
-[ RECORD 1 ]--+
pg_size_pretty | 9766 GB

This could explain the errors you get from the server. You may be trying to use 
much more memory than you have.
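
A quick, illustrative way to rein that in without touching postgresql.conf is
to lower work_mem per session or per role (the role name below is
hypothetical):

SET work_mem = '10MB';                             -- for the current session only
ALTER ROLE reporting_user SET work_mem = '32MB';   -- hypothetical role; applies to its new sessions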

Regards
Charles

> [postgresql.conf]
> max_connections = 100000
> max_files_per_process = 100
> shared_buffers = 24GB
> max_locks_per_transaction  = 1000
> effective_cache_size = 50GB
> work_mem = 100MB
> maintenance_work_mem = 2GB
> log_min_duration_statement = 1
> checkpoint_completion_target = 0.9
> wal_buffers = 32MB
> default_statistics_target = 100
> listen_addresses = '*'
> port = 5432
> ssl = off
> wal_sync_method = fdatasync
> synchronous_commit = on
> fsync = off
> wal_level = minimal
> #client_min_messages = fatal
> #log_min_messages = fatal
> #log_min_error_statement = fatal
> datestyle = 'iso, mdy'
> debug_pretty_print = off
> debug_print_parse = off
> debug_print_plan = off
> debug_print_rewritten = off
> default_text_search_config = 'pg_catalog.english'
> enable_bitmapscan = on
> enable_hashagg = on
> enable_hashjoin = on
> enable_indexonlyscan = on
> enable_indexscan = on
> enable_material = on
> enable_mergejoin = on
> enable_nestloop = on
> enable_seqscan = on
> enable_sort = on
>

[GENERAL] performance tuning postgresql 9.5.5.10 [enterprisedb]

2016-12-29 Thread ajmcello
I am trying to optimize and tune my server for the fastest simple queries with
the highest number of connections to the server possible. Basically, a SELECT item from table
takes 30 minutes on a machine with SSD drives. The table has 900K entries
and 12 columns.  Using that SELECT query, I then have the ability to make
about 500 simultaneous connections to the server before errors start to
occur. So, I can live with 500, but the slow query gives me grief.

I have a GIST index for the table. I tried btree, but according to
performance tests, GIST was faster. So I went with GIST.

The system drive and pgsql drive are separate. I can separate them further
if need to be. Total ram is 56GB. I added 32GB of swap.

Is there anything I can do to speed up the SELECT statement?

Here is what I have:

[sysctl.conf]
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.tcp_syncookies=1
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1
net.ipv4.conf.all.accept_redirects=0
net.ipv6.conf.all.accept_redirects=0
net.ipv4.conf.all.accept_source_route=0
net.ipv6.conf.all.accept_source_route=0
net.ipv4.conf.all.log_martians=1
kernel.sysrq=0
kernel.shmmax=214748399
kernel.shmall=209715999
#32GBkernel.shmmax=17179869184
#32GBkernel.shmall=4194304
kernel.shmmni=9
kernel.shmmin=1
kernel.shmseg=10
# semmsl, semmns, semopm, semmni
kernel.sem=250 32000 100 128
fs.file-max=65536
kern.maxfiles=5
kern.maxfilesperproc=5
net.ipv4.ip_local_port_range=1024 65535
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_fin_timeout=10
net.ipv4.tcp_tw_reuse=1
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.tcp_syncookies=1
kernel.sched_migration_cost_ns=500
kernel.sched_migration_cost_ns=500
kernel.sched_autogroup_enabled=0
vm.swappiness=10


[postgresql.conf]
max_connections = 100000
max_files_per_process = 100
shared_buffers = 24GB
max_locks_per_transaction  = 1000
effective_cache_size = 50GB
work_mem = 100MB
maintenance_work_mem = 2GB
log_min_duration_statement = 1
checkpoint_completion_target = 0.9
wal_buffers = 32MB
default_statistics_target = 100
listen_addresses = '*'
port = 5432
ssl = off
wal_sync_method = fdatasync
synchronous_commit = on
fsync = off
wal_level = minimal
#client_min_messages = fatal
#log_min_messages = fatal
#log_min_error_statement = fatal
datestyle = 'iso, mdy'
debug_pretty_print = off
debug_print_parse = off
debug_print_plan = off
debug_print_rewritten = off
default_text_search_config = 'pg_catalog.english'
enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = on
enable_indexonlyscan = on
enable_indexscan = on
enable_material = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
enable_tidscan = on
from_collapse_limit = 8
geqo = on
geqo_threshold = 12
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_executor_stats = off
log_hostname = off
log_parser_stats = off
log_planner_stats = off
log_replication_commands = off
log_statement_stats = off
log_timezone = 'UTC'
max_wal_size = 1GB
min_wal_size = 80MB
shared_preload_libraries = '$libdir/dbms_pipe,$libdir/edb_gen'
stats_temp_directory = 'pg_stat_tmp'
timezone = 'US/Pacific'
track_activities = on
track_counts = on
track_io_timing = off


Thanks in advance.


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
I have datagrip and it's OK but it doesn't really do everything I want.

I don't understand why it doesn't fetch all objects from the database and
then put them into the disk in a directory so I can put it all under git
and then let me work on them syncing the files back as they change.  For
example today I just renamed a function. It didn't refactor properly by
identifying stored procs that reference it. If I was using another
jetbrains IDE it would have built an index of the project files and done a
proper refactor.

This would also allow you to make wholesale disk changes and then sync them
up properly to get around postgres dependency issues.

On Fri, Dec 30, 2016 at 1:40 PM, Michael Sheaver  wrote:

> If you want an IDE, Jetbrains, the makers of great IDEs like IntelliJ,
> PyCharm. and AppCode, among others, have recently come out with what is
> arguably the BEST IDE for DBAs, DataGrip. It runs on most major platforms,
> and is so good that I have bitten the bullet and paid the yearly
> subscription for it.
>
> Leave the Postgres core alone focus on what they do best, and that is
> making the very BEST database environment that can be had at any price,
> period. Is Postgres perfect? No, not at all. But no other group is so
> focused on performance, scalability and security as these folks are. And
> the rate of development, enhancement and continual improvement is, quite
> honestly, astounding.
>
> So here is my hat tip to the Postgres team for an awesome job they are
> doing!
>
> On Dec 29, 2016, at 7:19 PM, Tim Uckun  wrote:
>
> I am not saying the postgres core people should work on an IDE, just that
> an IDE like thing would be nice.
>
> On Fri, Dec 30, 2016 at 12:51 PM, Rob Sargent 
> wrote:
>
>> I would hope Postgres core folk take no more than a nanosecond to reject
>> the idea that they work on an IDE. Focus on reading and writing faster and
>> faster ACID all the while.
>>
>> On Dec 29, 2016, at 5:32 PM, Tim Uckun  wrote:
>>
>> Honestly I don't even like JS. Having said that I am not too crazy about
>> PL-PGSQL either. I am willing to put up with either given that they are
>> supported widely in default installs of postgres in AWS, Linux and MacOSX,
>>
>> As I said before, I think posgres gives a unique and underutilized
>> language platform. You can code in different languages, it has a good
>> variety of built in types, and of course you get persistance and caching
>> built in!  Using DBLINK you might even be able to separate out your code
>> from the bulk of your data in another database. Postgres all the way down!
>>
>> It's fun to play around with.  There is a lot of missing pieces though. A
>> good IDE like thing would be good, version control would be nice, deeper
>> namespacing (hierarchical schemas?), easier testing etc would go a long
>> way.
>>
>> Thanks for all the input guys!
>>
>> On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo <
>> m...@webthatworks.it> wrote:
>>
>>> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>>>
>>> 2016-12-29 10:03 GMT+01:00 Tim Uckun >:

 I think it's awesome that postgres allows you to code in different
 languages like this. It really is a unique development environment
 and one that is overlooked as a development platform.  It would be
 nice if more languages were delivered in the default package
 especially lua, V8 and mruby.


 It is about dependencies and maintenance. There are not too much people
 who has good experience with C embedding Lua, V8 and others. Any people
 who can do some work are welcome.

 The living outside main package has disadvantages - only enthusiast
 knows about it, but some advantages too - you are not fixed on
 PostgreSQL development cycle, and development can be faster.

>>>
>>> I'll add my 2 cents.
>>>
>>> Postgresql and in general SQL are about integrity and coherency.
>>> Checking coherency is much easier with strict data type.
>>> PL/PGSQL gives you that, JS is far far away from that.
>>>
>>> Postgresql is a very flexible database and you can stretch it to do
>>> "MEAN like"[1] stuff but that's going to increase your "impedance mismatch".
>>>
>>> If you think there is some space for JS in your application stack that's
>>> nearer to the client rather than to the DB.
>>> Or possibly you need to do "MEAN like" stuff but you don't want to
>>> install another "database".
>>>
>>> As other said using stored procedures is a two edged sword.
>>> It can decouple DB schema from the application or it can increase the
>>> coupling.
>>> Choosing JS for performance in the stored procedure realm is going to
>>> encourage coupling and make scalability harder and it is going to become a
>>> mess when you'll need to refactor.
>>>
>>> [1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)
>>>
>>> --
>>> Ivan Sergio 

Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Michael Sheaver
If you want an IDE, Jetbrains, the makers of great IDEs like IntelliJ, PyCharm. 
and AppCode, among others, have recently come out with what is arguably the 
BEST IDE for DBAs, DataGrip. It runs on most major platforms, and is so good 
that I have bitten the bullet and paid the yearly subscription for it.

Leave the Postgres core alone focus on what they do best, and that is making 
the very BEST database environment that can be had at any price, period. Is 
Postgres perfect? No, not at all. But no other group is so focused on 
performance, scalability and security as these folks are. And the rate of 
development, enhancement and continual improvement is, quite honestly, 
astounding. 

So here is my hat tip to the Postgres team for an awesome job they are doing!

> On Dec 29, 2016, at 7:19 PM, Tim Uckun  wrote:
> 
> I am not saying the postgres core people should work on an IDE, just that an 
> IDE like thing would be nice.
> 
> On Fri, Dec 30, 2016 at 12:51 PM, Rob Sargent  > wrote:
> I would hope Postgres core folk take no more than a nanosecond to reject the 
> idea that they work on an IDE. Focus on reading and writing faster and faster 
> ACID all the while. 
> 
> On Dec 29, 2016, at 5:32 PM, Tim Uckun  > wrote:
> 
>> Honestly I don't even like JS. Having said that I am not too crazy about 
>> PL-PGSQL either. I am willing to put up with either given that they are 
>> supported widely in default installs of postgres in AWS, Linux and MacOSX,
>> 
>> As I said before, I think posgres gives a unique and underutilized language 
>> platform. You can code in different languages, it has a good variety of 
>> built in types, and of course you get persistance and caching built in!  
>> Using DBLINK you might even be able to separate out your code from the bulk 
>> of your data in another database. Postgres all the way down!
>> 
>> It's fun to play around with.  There is a lot of missing pieces though. A 
>> good IDE like thing would be good, version control would be nice, deeper 
>> namespacing (hierarchical schemas?), easier testing etc would go a long way. 
>> 
>> Thanks for all the input guys! 
>> 
>> On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo 
>> > wrote:
>> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>> 
>> 2016-12-29 10:03 GMT+01:00 Tim Uckun > 
>> >>:
>> 
>> I think it's awesome that postgres allows you to code in different
>> languages like this. It really is a unique development environment
>> and one that is overlooked as a development platform.  It would be
>> nice if more languages were delivered in the default package
>> especially lua, V8 and mruby.
>> 
>> 
>> It is about dependencies and maintenance. There are not too much people
>> who has good experience with C embedding Lua, V8 and others. Any people
>> who can do some work are welcome.
>> 
>> The living outside main package has disadvantages - only enthusiast
>> knows about it, but some advantages too - you are not fixed on
>> PostgreSQL development cycle, and development can be faster.
>> 
>> I'll add my 2 cents.
>> 
>> Postgresql and in general SQL are about integrity and coherency.
>> Checking coherency is much easier with strict data type.
>> PL/PGSQL gives you that, JS is far far away from that.
>> 
>> Postgresql is a very flexible database and you can stretch it to do "MEAN 
>> like"[1] stuff but that's going to increase your "impedance mismatch".
>> 
>> If you think there is some space for JS in your application stack that's 
>> nearer to the client rather than to the DB.
>> Or possibly you need to do "MEAN like" stuff but you don't want to install 
>> another "database".
>> 
>> As other said using stored procedures is a two edged sword.
>> It can decouple DB schema from the application or it can increase the 
>> coupling.
>> Choosing JS for performance in the stored procedure realm is going to 
>> encourage coupling and make scalability harder and it is going to become a 
>> mess when you'll need to refactor.
>> 
>> [1] https://en.wikipedia.org/wiki/MEAN_(software_bundle) 
>> 
>> 
>> -- 
>> Ivan Sergio Borgonovo
>> http://www.webthatworks.it  
>> http://www.borgonovo.net 
>> 
>> 
>> 
>> 
>> -- 
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org 
>> )
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general 
>> 
>> 
> 



Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
I am not saying the postgres core people should work on an IDE, just that
an IDE like thing would be nice.

On Fri, Dec 30, 2016 at 12:51 PM, Rob Sargent  wrote:

> I would hope Postgres core folk take no more than a nanosecond to reject
> the idea that they work on an IDE. Focus on reading and writing faster and
> faster ACID all the while.
>
> On Dec 29, 2016, at 5:32 PM, Tim Uckun  wrote:
>
> Honestly I don't even like JS. Having said that I am not too crazy about
> PL-PGSQL either. I am willing to put up with either given that they are
> supported widely in default installs of postgres in AWS, Linux and MacOSX,
>
> As I said before, I think posgres gives a unique and underutilized
> language platform. You can code in different languages, it has a good
> variety of built in types, and of course you get persistance and caching
> built in!  Using DBLINK you might even be able to separate out your code
> from the bulk of your data in another database. Postgres all the way down!
>
> It's fun to play around with.  There is a lot of missing pieces though. A
> good IDE like thing would be good, version control would be nice, deeper
> namespacing (hierarchical schemas?), easier testing etc would go a long
> way.
>
> Thanks for all the input guys!
>
> On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo <
> m...@webthatworks.it> wrote:
>
>> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>>
>> 2016-12-29 10:03 GMT+01:00 Tim Uckun >> >:
>>>
>>> I think it's awesome that postgres allows you to code in different
>>> languages like this. It really is a unique development environment
>>> and one that is overlooked as a development platform.  It would be
>>> nice if more languages were delivered in the default package
>>> especially lua, V8 and mruby.
>>>
>>>
>>> It is about dependencies and maintenance. There are not too much people
>>> who has good experience with C embedding Lua, V8 and others. Any people
>>> who can do some work are welcome.
>>>
>>> The living outside main package has disadvantages - only enthusiast
>>> knows about it, but some advantages too - you are not fixed on
>>> PostgreSQL development cycle, and development can be faster.
>>>
>>
>> I'll add my 2 cents.
>>
>> Postgresql and in general SQL are about integrity and coherency.
>> Checking coherency is much easier with strict data type.
>> PL/PGSQL gives you that, JS is far far away from that.
>>
>> Postgresql is a very flexible database and you can stretch it to do "MEAN
>> like"[1] stuff but that's going to increase your "impedance mismatch".
>>
>> If you think there is some space for JS in your application stack that's
>> nearer to the client rather than to the DB.
>> Or possibly you need to do "MEAN like" stuff but you don't want to
>> install another "database".
>>
>> As other said using stored procedures is a two edged sword.
>> It can decouple DB schema from the application or it can increase the
>> coupling.
>> Choosing JS for performance in the stored procedure realm is going to
>> encourage coupling and make scalability harder and it is going to become a
>> mess when you'll need to refactor.
>>
>> [1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)
>>
>> --
>> Ivan Sergio Borgonovo
>> http://www.webthatworks.it http://www.borgonovo.net
>>
>>
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Rob Sargent
I would hope Postgres core folk take no more than a nanosecond to reject the 
idea that they work on an IDE. Focus on reading and writing faster and faster 
ACID all the while. 

> On Dec 29, 2016, at 5:32 PM, Tim Uckun  wrote:
> 
> Honestly I don't even like JS. Having said that I am not too crazy about 
> PL-PGSQL either. I am willing to put up with either given that they are 
> supported widely in default installs of postgres in AWS, Linux and MacOSX,
> 
> As I said before, I think posgres gives a unique and underutilized language 
> platform. You can code in different languages, it has a good variety of built 
> in types, and of course you get persistance and caching built in!  Using 
> DBLINK you might even be able to separate out your code from the bulk of your 
> data in another database. Postgres all the way down!
> 
> It's fun to play around with.  There is a lot of missing pieces though. A 
> good IDE like thing would be good, version control would be nice, deeper 
> namespacing (hierarchical schemas?), easier testing etc would go a long way. 
> 
> Thanks for all the input guys! 
> 
>> On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo 
>>  wrote:
>> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>> 
>>> 2016-12-29 10:03 GMT+01:00 Tim Uckun >> >:
>>> 
>>> I think it's awesome that postgres allows you to code in different
>>> languages like this. It really is a unique development environment
>>> and one that is overlooked as a development platform.  It would be
>>> nice if more languages were delivered in the default package
>>> especially lua, V8 and mruby.
>>> 
>>> 
>>> It is about dependencies and maintenance. There are not too much people
>>> who has good experience with C embedding Lua, V8 and others. Any people
>>> who can do some work are welcome.
>>> 
>>> The living outside main package has disadvantages - only enthusiast
>>> knows about it, but some advantages too - you are not fixed on
>>> PostgreSQL development cycle, and development can be faster.
>> 
>> I'll add my 2 cents.
>> 
>> Postgresql and in general SQL are about integrity and coherency.
>> Checking coherency is much easier with strict data type.
>> PL/PGSQL gives you that, JS is far far away from that.
>> 
>> Postgresql is a very flexible database and you can stretch it to do "MEAN 
>> like"[1] stuff but that's going to increase your "impedance mismatch".
>> 
>> If you think there is some space for JS in your application stack that's 
>> nearer to the client rather than to the DB.
>> Or possibly you need to do "MEAN like" stuff but you don't want to install 
>> another "database".
>> 
>> As other said using stored procedures is a two edged sword.
>> It can decouple DB schema from the application or it can increase the 
>> coupling.
>> Choosing JS for performance in the stored procedure realm is going to 
>> encourage coupling and make scalability harder and it is going to become a 
>> mess when you'll need to refactor.
>> 
>> [1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)
>> 
>> -- 
>> Ivan Sergio Borgonovo
>> http://www.webthatworks.it http://www.borgonovo.net
>> 
>> 
>> 
>> 
>> -- 
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
> 


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
Honestly I don't even like JS. Having said that I am not too crazy about
PL-PGSQL either. I am willing to put up with either given that they are
supported widely in default installs of postgres in AWS, Linux and MacOSX.

As I said before, I think postgres gives a unique and underutilized language
platform. You can code in different languages, it has a good variety of
built in types, and of course you get persistance and caching built in!
Using DBLINK you might even be able to separate out your code from the bulk
of your data in another database. Postgres all the way down!

It's fun to play around with.  There is a lot of missing pieces though. A
good IDE like thing would be good, version control would be nice, deeper
namespacing (hierarchical schemas?), easier testing etc would go a long
way.

Thanks for all the input guys!

On Fri, Dec 30, 2016 at 12:14 AM, Ivan Sergio Borgonovo <
m...@webthatworks.it> wrote:

> On 12/29/2016 10:35 AM, Pavel Stehule wrote:
>
> 2016-12-29 10:03 GMT+01:00 Tim Uckun > >:
>>
>> I think it's awesome that postgres allows you to code in different
>> languages like this. It really is a unique development environment
>> and one that is overlooked as a development platform.  It would be
>> nice if more languages were delivered in the default package
>> especially lua, V8 and mruby.
>>
>>
>> It is about dependencies and maintenance. There are not too much people
>> who has good experience with C embedding Lua, V8 and others. Any people
>> who can do some work are welcome.
>>
>> The living outside main package has disadvantages - only enthusiast
>> knows about it, but some advantages too - you are not fixed on
>> PostgreSQL development cycle, and development can be faster.
>>
>
> I'll add my 2 cents.
>
> Postgresql and in general SQL are about integrity and coherency.
> Checking coherency is much easier with strict data type.
> PL/PGSQL gives you that, JS is far far away from that.
>
> Postgresql is a very flexible database and you can stretch it to do "MEAN
> like"[1] stuff but that's going to increase your "impedance mismatch".
>
> If you think there is some space for JS in your application stack that's
> nearer to the client rather than to the DB.
> Or possibly you need to do "MEAN like" stuff but you don't want to install
> another "database".
>
> As other said using stored procedures is a two edged sword.
> It can decouple DB schema from the application or it can increase the
> coupling.
> Choosing JS for performance in the stored procedure realm is going to
> encourage coupling and make scalability harder and it is going to become a
> mess when you'll need to refactor.
>
> [1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it http://www.borgonovo.net
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Gavin Flower

On 29/12/16 09:12, Francisco Olarte wrote:

On Wed, Dec 28, 2016 at 5:53 PM, Jan de Visser  wrote:

...but the term "impedance mismatch"
is at least 25 years old;

Much older, I was told it in class at least 32 years ago.


as far as I know it was coined

_Borrowed_ from electrical engineering / communication techs.

It is used to highlight how signals 'bounce' at the points of a
transmission path where impedances do not match. It extrapolates from the
fact that if you have a battery with an internal resistance R, the way
to extract the maximum energy in a load is for the load to match that
impedance, i.e. be R too. Higher load impedance and the fraction of energy
in the load goes up, the total down. Lower load impedance and the
fraction in the load goes down, the total up. In either case absolute
power in the load goes down. Match the impedance and the energy in the
load is the maximum ( and equal to the internal loss in the battery ).

[...]

From my distant memory of studying AC stuff at University many moons ago...

You want the two impedances to be complex conjugates of each other (this 
means the MAGNITUDES will be equal) - which means the phase change 
should be equal & opposite, and the resistance to match.



Cheers,
Gavin



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Ivan Sergio Borgonovo

On 12/29/2016 10:35 AM, Pavel Stehule wrote:


2016-12-29 10:03 GMT+01:00 Tim Uckun >:

I think it's awesome that postgres allows you to code in different
languages like this. It really is a unique development environment
and one that is overlooked as a development platform.  It would be
nice if more languages were delivered in the default package
especially lua, V8 and mruby.


It is about dependencies and maintenance. There are not too much people
who has good experience with C embedding Lua, V8 and others. Any people
who can do some work are welcome.

The living outside main package has disadvantages - only enthusiast
knows about it, but some advantages too - you are not fixed on
PostgreSQL development cycle, and development can be faster.


I'll add my 2 cents.

Postgresql and in general SQL are about integrity and coherency.
Checking coherency is much easier with strict data type.
PL/PGSQL gives you that, JS is far far away from that.

Postgresql is a very flexible database and you can stretch it to do 
"MEAN like"[1] stuff but that's going to increase your "impedance mismatch".


If you think there is some space for JS in your application stack that's 
nearer to the client rather than to the DB.
Or possibly you need to do "MEAN like" stuff but you don't want to 
install another "database".


As other said using stored procedures is a two edged sword.
It can decouple DB schema from the application or it can increase the 
coupling.
Choosing JS for performance in the stored procedure realm is going to 
encourage coupling and make scalability harder and it is going to become 
a mess when you'll need to refactor.


[1] https://en.wikipedia.org/wiki/MEAN_(software_bundle)

--
Ivan Sergio Borgonovo
http://www.webthatworks.it http://www.borgonovo.net



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Pavel Stehule
2016-12-29 10:03 GMT+01:00 Tim Uckun :

> I think it's awesome that postgres allows you to code in different
> languages like this. It really is a unique development environment and one
> that is overlooked as a development platform.  It would be nice if more
> languages were delivered in the default package especially lua, V8 and
> mruby.
>

It is about dependencies and maintenance. There are not too much people who
has good experience with C embedding Lua, V8 and others. Any people who can
do some work are welcome.

The living outside main package has disadvantages - only enthusiast knows
about it, but some advantages too - you are not fixed on PostgreSQL
development cycle, and development can be faster.

Regards

Pavel


>
>
> On Thu, Dec 29, 2016 at 9:31 PM, Chris Travers 
> wrote:
>
>> My recommendation.  See them as tools in a toolkit, not a question of
>> what is best.
>>
>> For places where you have SQL statements as primary do SQL or PLPGSQL
>> functions.
>>
>> For places where you are manipulating values (parsing strings for
>> example) use something else (I usually use pl/perl for string manipulation
>> but ymmv).
>>
>> PLPGSQL works best where you have a large query and some procedurally
>> supporting logic.  It becomes a lot less usable, performant, and
>> maintainable the further you get away from that.
>>
>> So there is no best just different tools in a toolkit.
>>
>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Pavel Stehule
2016-12-29 10:04 GMT+01:00 Tim Uckun :

> Mostly generating SQL statements to execute. Like for example deciding
> which partition to insert into.
>

Then you won't see any noticeable performance difference - the query itself is
about 10-100x slower than the expression evaluation - so plpgsql should be good.

Moreover, you can use the "format" function - implemented in C.
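
A minimal sketch of what that looks like (the partition and column names here
are hypothetical), using format()'s %I to quote the identifier safely and
EXECUTE ... USING for the values:

CREATE OR REPLACE FUNCTION insert_event(ts timestamptz, payload jsonb)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    part text := 'events_' || to_char(ts, 'YYYY_MM');  -- pick the monthly partition
BEGIN
    EXECUTE format('INSERT INTO %I (ts, payload) VALUES ($1, $2)', part)
    USING ts, payload;
END;
$$;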

Regards

Pavel




>
> On Thu, Dec 29, 2016 at 10:00 PM, Pavel Stehule 
> wrote:
>
>>
>>
>> 2016-12-29 9:23 GMT+01:00 Tim Uckun :
>>
>>> I am not doubting the efficacy of stored procs, just wondering which
>>> language is better. From the sound of it string manipulation is slow in
>>> PL-PGSQL but looking at my procs there does seem to be a lot of string
>>> manipulation going on so maybe I better do some tests.
>>>
>>
>> It is interesting, what string operations you are doing in stored
>> procedures?
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>>
>>> On Thu, Dec 29, 2016 at 3:02 AM, Mike Sofen  wrote:
>>>
 *From:* Tim Uckun
 I have seen various links on the internet which indicate that PLV8 is
 significantly faster than PL-PGSQL sometimes an order of magnitude faster.



 Is there any benefit to choosing PL-PGSQL?

 

 I can’t speak to PLV8.  However, I can speak to plpgsql, and
 specifically stored functions (procs).  I use it exclusively to create a
 database API for real-time web applications to hit.  My API calls (procs)
 are hitting large tables, sometimes doing complex logic within the sproc.
 It allows me to provide a simple, standardized interface to the web devs,
 allowing them to focus on the app code work.



 Performance is superb and continues to surprise me (I came from the SQL
 Server world).  As others have mentioned, the natural lashup of plpgsql to
 postgres (I liked Alban’s term, “impedance”), is a key aspect.  Also:



 -stored procs provide another security layer against sql
 injection attacks.

 -Caching SEEMS to be more efficient/effective with stored
 procs (that could be wishful thinking too).

 -Stored procs allow skilled sql practitioners to provide far
 more sophisticated sql solutions than the typical python developer is
 capable of…my experience is that most web devs don’t really understand
 databases (or even care about them – they are a necessary evil), so
 providing a pure encapsulated sql solution (via stored procs) removes that
 mental impedance mismatch.

 -Performance?  Simple “get” procs that return data for a
 specific indexed query against larger tables (50m+ rows) in a few
 milliseconds…I can live with that kind of performance.

 -I’m also doing some heavy lifting in the sql, calculating
 histograms and boxplots for data visualizations.  This is an unusual
 scenario, but the other option is sending a massive chunk of data to
 another server for processing – just the transit time would kill the deal.
 I am mindful that at a certain point, there won’t be enough memory and i/o
 to go around, but the web app is a low user count/high user task complexity
 app, so I’ve tailored the model to match.



 Mike Sofen  (Synthetic Genomics)

>>>
>>>
>>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
Mostly generating SQL statements to execute. Like for example deciding
which partition to insert into.

On Thu, Dec 29, 2016 at 10:00 PM, Pavel Stehule 
wrote:

>
>
> 2016-12-29 9:23 GMT+01:00 Tim Uckun :
>
>> I am not doubting the efficacy of stored procs, just wondering which
>> language is better. From the sound of it string manipulation is slow in
>> PL-PGSQL but looking at my procs there does seem to be a lot of string
>> manipulation going on so maybe I better do some tests.
>>
>
> It is interesting, what string operations you are doing in stored
> procedures?
>
> Regards
>
> Pavel
>
>
>>
>>
>> On Thu, Dec 29, 2016 at 3:02 AM, Mike Sofen  wrote:
>>
>>> *From:* Tim Uckun
>>> I have seen various links on the internet which indicate that PLV8 is
>>> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>>>
>>>
>>>
>>> Is there any benefit to choosing PL-PGSQL?
>>>
>>> 
>>>
>>> I can’t speak to PLV8.  However, I can speak to plpgsql, and
>>> specifically stored functions (procs).  I use it exclusively to create a
>>> database API for real-time web applications to hit.  My API calls (procs)
>>> are hitting large tables, sometimes doing complex logic within the sproc.
>>> It allows me to provide a simple, standardized interface to the web devs,
>>> allowing them to focus on the app code work.
>>>
>>>
>>>
>>> Performance is superb and continues to surprise me (I came from the SQL
>>> Server world).  As others have mentioned, the natural lashup of plpgsql to
>>> postgres (I liked Alban’s term, “impedance”), is a key aspect.  Also:
>>>
>>>
>>>
>>> -stored procs provide another security layer against sql
>>> injection attacks.
>>>
>>> -Caching SEEMS to be more efficient/effective with stored procs
>>> (that could be wishful thinking too).
>>>
>>> -Stored procs allow skilled sql practitioners to provide far
>>> more sophisticated sql solutions than the typical python developer is
>>> capable of…my experience is that most web devs don’t really understand
>>> databases (or even care about them – they are a necessary evil), so
>>> providing a pure encapsulated sql solution (via stored procs) removes that
>>> mental impedance mismatch.
>>>
>>> -Performance?  Simple “get” procs that return data for a
>>> specific indexed query against larger tables (50m+ rows) in a few
>>> milliseconds…I can live with that kind of performance.
>>>
>>> -I’m also doing some heavy lifting in the sql, calculating
>>> histograms and boxplots for data visualizations.  This is an unusual
>>> scenario, but the other option is sending a massive chunk of data to
>>> another server for processing – just the transit time would kill the deal.
>>> I am mindful that at a certain point, there won’t be enough memory and i/o
>>> to go around, but the web app is a low user count/high user task complexity
>>> app, so I’ve tailored the model to match.
>>>
>>>
>>>
>>> Mike Sofen  (Synthetic Genomics)
>>>
>>
>>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
I think it's awesome that postgres allows you to code in different
languages like this. It really is a unique development environment and one
that is overlooked as a development platform.  It would be nice if more
languages were delivered in the default package especially lua, V8 and
mruby.



On Thu, Dec 29, 2016 at 9:31 PM, Chris Travers 
wrote:

> My recommendation.  See them as tools in a toolkit, not a question of what
> is best.
>
> For places where you have SQL statements as primary do SQL or PLPGSQL
> functions.
>
> For places where you are manipulating values (parsing strings for example)
> use something else (I usually use pl/perl for string manipulation but ymmv).
>
> PLPGSQL works best where you have a large query and some procedurally
> supporting logic.  It becomes a lot less usable, performant, and
> maintainable the further you get away from that.
>
> So there is no best just different tools in a toolkit.
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Pavel Stehule
2016-12-29 9:23 GMT+01:00 Tim Uckun :

> I am not doubting the efficacy of stored procs, just wondering which
> language is better. From the sound of it string manipulation is slow in
> PL-PGSQL but looking at my procs there does seem to be a lot of string
> manipulation going on so maybe I better do some tests.
>

It is interesting, what string operations you are doing in stored
procedures?

Regards

Pavel


>
>
> On Thu, Dec 29, 2016 at 3:02 AM, Mike Sofen  wrote:
>
>> *From:* Tim Uckun
>> I have seen various links on the internet which indicate that PLV8 is
>> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>>
>>
>>
>> Is there any benefit to choosing PL-PGSQL?
>>
>> 
>>
>> I can’t speak to PLV8.  However, I can speak to plpgsql, and specifically
>> stored functions (procs).  I use it exclusively to create a database API
>> for real-time web applications to hit.  My API calls (procs) are hitting
>> large tables, sometimes doing complex logic within the sproc.  It allows me
>> to provide a simple, standardized interface to the web devs, allowing them
>> to focus on the app code work.
>>
>>
>>
>> Performance is superb and continues to surprise me (I came from the SQL
>> Server world).  As others have mentioned, the natural lashup of plpgsql to
>> postgres (I liked Alban’s term, “impedance”), is a key aspect.  Also:
>>
>>
>>
>> -stored procs provide another security layer against sql
>> injection attacks.
>>
>> -Caching SEEMS to be more efficient/effective with stored procs
>> (that could be wishful thinking too).
>>
>> -Stored procs allow skilled sql practitioners to provide far
>> more sophisticated sql solutions than the typical python developer is
>> capable of…my experience is that most web devs don’t really understand
>> databases (or even care about them – they are a necessary evil), so
>> providing a pure encapsulated sql solution (via stored procs) removes that
>> mental impedance mismatch.
>>
>> -Performance?  Simple “get” procs that return data for a
>> specific indexed query against larger tables (50m+ rows) in a few
>> milliseconds…I can live with that kind of performance.
>>
>> -I’m also doing some heavy lifting in the sql, calculating
>> histograms and boxplots for data visualizations.  This is an unusual
>> scenario, but the other option is sending a massive chunk of data to
>> another server for processing – just the transit time would kill the deal.
>> I am mindful that at a certain point, there won’t be enough memory and i/o
>> to go around, but the web app is a low user count/high user task complexity
>> app, so I’ve tailored the model to match.
>>
>>
>>
>> Mike Sofen  (Synthetic Genomics)
>>
>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Chris Travers
My recommendation.  See them as tools in a toolkit, not a question of what
is best.

For places where you have SQL statements as primary do SQL or PLPGSQL
functions.

For places where you are manipulating values (parsing strings for example)
use something else (I usually use pl/perl for string manipulation but ymmv).

PLPGSQL works best where you have a large query and some procedurally
supporting logic.  It becomes a lot less usable, performant, and
maintainable the further you get away from that.

So there is no best just different tools in a toolkit.


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-29 Thread Tim Uckun
I am not doubting the efficacy of stored procs, just wondering which
language is better. From the sound of it string manipulation is slow in
PL-PGSQL but looking at my procs there does seem to be a lot of string
manipulation going on so maybe I better do some tests.



On Thu, Dec 29, 2016 at 3:02 AM, Mike Sofen  wrote:

> *From:* Tim Uckun
> I have seen various links on the internet which indicate that PLV8 is
> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>
>
>
> Is there any benefit to choosing PL-PGSQL?
>
> 
>
> I can’t speak to PLV8.  However, I can speak to plpgsql, and specifically
> stored functions (procs).  I use it exclusively to create a database API
> for real-time web applications to hit.  My API calls (procs) are hitting
> large tables, sometimes doing complex logic within the sproc.  It allows me
> to provide a simple, standardized interface to the web devs, allowing them
> to focus on the app code work.
>
>
>
> Performance is superb and continues to surprise me (I came from the SQL
> Server world).  As others have mentioned, the natural lashup of plpgsql to
> postgres (I liked Alban’s term, “impedance”), is a key aspect.  Also:
>
>
>
> -stored procs provide another security layer against sql
> injection attacks.
>
> -Caching SEEMS to be more efficient/effective with stored procs
> (that could be wishful thinking too).
>
> -Stored procs allow skilled sql practitioners to provide far more
> sophisticated sql solutions than the typical python developer is capable
> of…my experience is that most web devs don’t really understand databases
> (or even care about them – they are a necessary evil), so providing a pure
> encapsulated sql solution (via stored procs) removes that mental impedance
> mismatch.
>
> -Performance?  Simple “get” procs that return data for a specific
> indexed query against larger tables (50m+ rows) in a few milliseconds…I can
> live with that kind of performance.
>
> -I’m also doing some heavy lifting in the sql, calculating
> histograms and boxplots for data visualizations.  This is an unusual
> scenario, but the other option is sending a massive chunk of data to
> another server for processing – just the transit time would kill the deal.
> I am mindful that at a certain point, there won’t be enough memory and i/o
> to go around, but the web app is a low user count/high user task complexity
> app, so I’ve tailored the model to match.
>
>
>
> Mike Sofen  (Synthetic Genomics)
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-28 Thread Francisco Olarte
On Wed, Dec 28, 2016 at 5:53 PM, Jan de Visser  wrote:
> ...but the term "impedance mismatch"
> is at least 25 year old;

Much older, I was told it in class at least 32 years ago.

> as far as I know it was coined

_Borrowed_ from electrical engineering / communication techs.

It is used to highlight how signals 'bounce' at the points of a
transmission path where impedances do not match. It extrapolates the
fact that if you have a battery with an internal resistance R the way
to extract the maximum energy on a load is for it to match the
impedance, be R too. Higher load impedance and the fraction of energy
in the load goes up, the total down. Lower load impedance and the
fraction in the load goes down, the total up. In either case absolute
power in the load goes down. Match the impedance and the energy in the
load is the maximum ( and equal to the internal loss in the battery ).

The term has been used in radio texts since the dawn of ( radio ) times.

It's used a lot because a similar problem appears when mixing two different
technologies: each time you cross the barrier you lose something, or
hit a problem.

> And despite the smart people in academia warning us about that mismatch in the
> early 90s, we bravely soldiered (I'm taking full blame myself here) on and
> 10-15 years later came up with abominations like Hibernate...
> History lesson over, carry on...

I think that goes together with "everyone can be a programmer" and
"every Java ( a language with several apparent concessions made to
people who did not even understand objects, like String.format, and
targeted to enable "everyone" to do OO ) coder can do databases".

Well, rant mode off. Today is "dia de los Inocentes", the Spanish version
of April Fools. I was tempted to write something about different
impedances in the copper tracks used for DB data traffic when entering
the CPU silicon interconnects via golden cables.


Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-28 Thread Jan de Visser
On Wednesday, December 28, 2016 6:02:51 AM EST Mike Sofen wrote:
> the natural lashup of plpgsql to postgres (I liked Alban’s term,
> “impedance”), is a key aspect.

Not to deprive Alban of any of his credit, but the term "impedance mismatch" 
is at least 25 years old; as far as I know it was coined to describe the 
problems arising from attempting to shoehorn an OO model onto a relational 
database.

And despite the smart people in academia warning us about that mismatch in the 
early 90s, we bravely soldiered (I'm taking full blame myself here) on and 
10-15 years later came up with abominations like Hibernate...

History lesson over, carry on...




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-28 Thread Mike Sofen
From: Tim Uckun
I have seen various links on the internet which indicate that PLV8 is 
significantly faster than PL-PGSQL sometimes an order of magnitude faster. 

 

Is there any benefit to choosing PL-PGSQL?



I can’t speak to PLV8.  However, I can speak to plpgsql, and specifically 
stored functions (procs).  I use it exclusively to create a database API for 
real-time web applications to hit.  My API calls (procs) are hitting large 
tables, sometimes doing complex logic within the sproc.  It allows me to 
provide a simple, standardized interface to the web devs, allowing them to 
focus on the app code work.
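
(As an aside, a purely illustrative sketch of such a "get" proc - this is not
Mike's actual code, and the table and column names are invented:)

-- A thin plpgsql wrapper around one indexed query; the web tier calls this
-- function instead of writing SQL itself.
CREATE OR REPLACE FUNCTION api_get_orders(p_customer_id bigint)
RETURNS TABLE (order_id bigint, ordered_at timestamptz, total numeric) AS $$
BEGIN
    RETURN QUERY
    SELECT o.order_id, o.ordered_at, o.total
    FROM   orders o
    WHERE  o.customer_id = p_customer_id
    ORDER  BY o.ordered_at DESC;
END;
$$ LANGUAGE plpgsql STABLE;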

 

Performance is superb and continues to surprise me (I came from the SQL Server 
world).  As others have mentioned, the natural lashup of plpgsql to postgres (I 
liked Alban’s term, “impedance”), is a key aspect.  Also:

 

-stored procs provide another security layer against sql injection 
attacks.

-Caching SEEMS to be more efficient/effective with stored procs (that 
could be wishful thinking too).

-Stored procs allow skilled sql practitioners to provide far more 
sophisticated sql solutions than the typical python developer is capable of…my 
experience is that most web devs don’t really understand databases (or even 
care about them – they are a necessary evil), so providing a pure encapsulated 
sql solution (via stored procs) removes that mental impedance mismatch.

-Performance?  Simple “get” procs that return data for a specific 
indexed query against larger tables (50m+ rows) in a few milliseconds…I can 
live with that kind of performance.

-I’m also doing some heavy lifting in the sql, calculating histograms 
and boxplots for data visualizations.  This is an unusual scenario, but the 
other option is sending a massive chunk of data to another server for 
processing – just the transit time would kill the deal.  I am mindful that at a 
certain point, there won’t be enough memory and i/o to go around, but the web 
app is a low user count/high user task complexity app, so I’ve tailored the 
model to match.

 

Mike Sofen  (Synthetic Genomics)



Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-28 Thread Pavel Stehule
2016-12-28 10:46 GMT+01:00 Pavel Stehule :

> Hi
>
> 2016-12-28 10:15 GMT+01:00 Tim Uckun :
>
>> I have seen various links on the internet which indicate that PLV8 is
>> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>>
>> Is this uniformly true or is it just in certain circumstances?
>>
>
> It depends on usage
>
>
>>
>> Is there any benefit to choosing PL-PGSQL?
>>
>
> there are several benefits
>
> 0. this language has integrated SQL - code with a lot of SQL is more
> readable.
> 1. it is available everywhere Postgres is
> 2. it uses the same data types as Postgres, so there is no
> conversion-related overhead
> 3. this engine has a quick start, faster than anything else.
>

4. It is not a slow language - it is an AST interpreter - and it is
significantly faster than a bytecode-based interpreter without JIT.
Expressions are slower, because they go through the PostgreSQL expression
interpreter. This is the bottleneck of PLpgSQL performance - on the other
hand, it is the main benefit of PLpgSQL - there is full integration with the
PostgreSQL runtime without any exceptions.

Years ago PLpgSQL had very slow operations on arrays - this was fixed in
9.5. PLpgSQL still has slow string updates - strings are immutable in
PLpgSQL, so an update means generating a new string. But usually that is not
the bottleneck in a PL environment.
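
To illustrate the string point, a quick sketch: the loop below allocates a new
string on every assignment, while pushing the same work into SQL avoids that.

-- Builds a brand-new string on every iteration (the pattern described above).
DO $$
DECLARE
    s text := '';
BEGIN
    FOR i IN 1..100000 LOOP
        s := s || i || ',';
    END LOOP;
END $$;

-- Usually better: let SQL do the aggregation in one pass.
SELECT string_agg(i::text, ',') FROM generate_series(1, 100000) AS i;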

Regards

Pavel


>
>
>>
>> Is there work going on to make PL-PGSQL more performant or has it gotten
>> significantly faster in the last two years or so (some of the links are
>> from a while ago).
>>
>
> As far as I know, no. There is no reason to do it. This language is
> designed to be glue for SQL statements. Nothing more, nothing less. I did
> tests, and it is significantly faster than the SQL engine.
>
> Some years ago I thought about compilation to C, but the target
> was better obfuscation, not performance.
>
> PLpgSQL is not designed for hard calculation - the critical path is always
> in SQL. If you need different performance, then you can use PLV8, PLPython
> or a native C extension. Postgres has a very good API for writing C extensions.
>
> Regards
>
> Pavel
>
>
>>
>> Thanks.
>>
>
>


Re: [GENERAL] Performance PLV8 vs PLPGSQL

2016-12-28 Thread Pavel Stehule
Hi

2016-12-28 10:15 GMT+01:00 Tim Uckun :

> I have seen various links on the internet which indicate that PLV8 is
> significantly faster than PL-PGSQL sometimes an order of magnitude faster.
>
> Is this uniformly true or is it just in certain circumstances?
>

It depends on usage


>
> Is there any benefit to choosing PL-PGSQL?
>

there are several benefits

0. this language has integrated SQL - code with a lot of SQL is more
readable.
1. it is available everywhere Postgres is
2. it uses the same data types as Postgres, so there is no
conversion-related overhead
3. this engine has a quick start, faster than anything else.


>
> Is there work going on to make PL-PGSQL more performant or has it gotten
> significantly faster in the last two years or so (some of the links are
> from a while ago).
>

As far as I know, no. There is no reason to do it. This language is
designed to be glue for SQL statements. Nothing more, nothing less. I did
tests, and it is significantly faster than the SQL engine.

Some years ago I thought about compilation to C, but the target was
better obfuscation, not performance.

PLpgSQL is not designed for hard calculation - the critical path is always in
SQL. If you need different performance, then you can use PLV8, PLPython
or a native C extension. Postgres has a very good API for writing C extensions.

Regards

Pavel


>
> Thanks.
>


[GENERAL] Performance PLV8 vs PLPGSQL

2016-12-28 Thread Tim Uckun
I have seen various links on the internet which indicate that PLV8 is
significantly faster than PL-PGSQL sometimes an order of magnitude faster.

Is this uniformly true or is it just in certain circumstances?

Is there any benefit to choosing PL-PGSQL?

Is there work going on to make PL-PGSQL more performant, or has it gotten
significantly faster in the last two years or so (some of the links are
from a while ago)?

Thanks.


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-11-25 Thread Chris Withers

Hey Tom,

I appreciate you're busy, but did you ever get a chance to look at this?

On 19/09/2016 08:40, Chris Withers wrote:

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers  writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the
range just started)?



It's open ended, so the period is [start_ts, )


I've not looked at the GiST range opclass, but I would not be
surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.


If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)

How would I verify your suspicions?


cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-11-24 Thread Chris Withers

Hey Tom,

I appreciate you're busy, but did you ever get a chance to look at this?

On 19/09/2016 08:40, Chris Withers wrote:

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers  writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the
range just started)?



It's open ended, so the period is [start_ts, )


I've not looked at the GiST range opclass, but I would not be
surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.


If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)

How would I verify your suspicions?


cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-19 Thread Jeff Janes
On Fri, Sep 16, 2016 at 2:01 AM, Chris Withers 
wrote:

> Hi All,
>
> I have quite a few tables that follow a pattern like this:
>
>  Table "public.my_model"
>   Column |   Type| Modifiers
> +---+---
>   period | tsrange   | not null
>   key| character varying | not null
>   value  | integer   |
> Indexes:
>  "my_model_pkey" PRIMARY KEY, btree (period, key)
>  "my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, key
> WITH =)
> Check constraints:
>  "my_model_period_check" CHECK (period <> 'empty'::tsrange)
>

Try swapping the order of the columns in the exclude constraint.  You want
the more selective criterion to appear first in the index/constraint.
Presumably "key with =" is the most selective, especially if many of your
periods are unbounded.
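
Something along these lines (using the table from the original post):

-- Rebuild the exclusion constraint with the equality column first.
ALTER TABLE my_model DROP CONSTRAINT my_model_period_key_excl;
ALTER TABLE my_model
    ADD CONSTRAINT my_model_period_key_excl
    EXCLUDE USING gist (key WITH =, period WITH &&);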

Cheers,

Jeff


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-19 Thread Chris Withers

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers  writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the range just 
started)?



It's open ended, so the period is [start_ts, )


I've not looked at the GiST range opclass, but I would not be surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.


If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending 
type (inclusive/exclusive) would just be sentinel values)


How would I verify your suspicions?

cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:


wait, what is a tsrange?   the standard textsearch data types in 
postgres are tsvector and tsquery,


never mind,  I should have known, its a timestamp range.   ...


when you do updates, are you changing any of the indexed fields, or 
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period 
column for a row to set its end ts, and then inserting a new row with a 
start ts running on from that.


Of course, the adds are just inserting new rows.

cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 12:00, John R Pierce wrote:

On 9/16/2016 3:46 AM, Chris Withers wrote:


when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.


thats expensive, as it has to reindex that row.   and range indexes are
more expensive than timestamp indexes

modifying the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.


Right, but these rows have no natural primary key. Would it help if I 
just added an auto-incrementing integer key? Would that make a positive 
difference or would it just be a wasted column?



I expect the expensive one is the constraint that ensures no periods
overlap for the given key. I'm not sure how that can be done short of
a full scan for each update/insert.


Indeed, I wonder if making the constraint deferrable might help for the 
bulk case?
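
An untested sketch of what that would look like - note that deferring only
moves the overlap check to commit time, the index itself is still maintained
row by row, so whether it helps the bulk case would need measuring:

ALTER TABLE my_model DROP CONSTRAINT my_model_period_key_excl;
ALTER TABLE my_model
    ADD CONSTRAINT my_model_period_key_excl
    EXCLUDE USING gist (period WITH &&, key WITH =)
    DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS my_model_period_key_excl DEFERRED;
-- ... bulk inserts/updates ...
COMMIT;  -- any overlap violations are reported here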



it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?


You mean:

PRIMARY KEY, btree (key1, key2, period)

as opposed to

PRIMARY KEY, btree (period, key)

Interesting, I'd assumed postgres would optimise that under the covers...

--
john r pierce, recycling bits in santa cruz




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

Hi All,

I have quite a few tables that follow a pattern like this:

Table "public.my_model"
 Column |   Type| Modifiers
+---+---
 period | tsrange   | not null
 key| character varying | not null
 value  | integer   |
Indexes:
"my_model_pkey" PRIMARY KEY, btree (period, key)
"my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, key 
WITH =)

Check constraints:
"my_model_period_check" CHECK (period <> 'empty'::tsrange)

So, a primary key of a period column and one or more other columns 
(usually int or string) and an exclude constraint to prevent overlaps, 
and a check constraint to prevent empty ranges.


However, I'm hitting performance problems on moderate bulk inserts and 
updates, with ~700k rows taking around 13 minutes. Profiling my python 
code suggests that most of the time is being taken by Postgres (9.4 in 
this case...)


What can I do to speed things up? Is there a different type of index I 
can use to achieve the same exclude constraint? Is there something I can 
do to have the index changes only done on the commit of the bulk batches?


cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Tom Lane
Chris Withers  writes:
> On 16/09/2016 14:54, Igor Neyman wrote:
>> So, what is the value for "end ts", when the record is inserted (the range 
>> just started)?

> It's open ended, so the period is [start_ts, )

I've not looked at the GiST range opclass, but I would not be surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 14:54, Igor Neyman wrote:


-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Chris Withers
Sent: Friday, September 16, 2016 6:47 AM
To: John R Pierce <pie...@hogranch.com>; pgsql-general@postgresql.org
Subject: Re: [GENERAL] performance problems with bulk inserts/updates on 
tsrange with gist-based exclude constrains

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:


wait, what is a tsrange?   the standard textsearch data types in
postgres are tsvector and tsquery,


never mind,  I should have known, its a timestamp range.   ...


when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period column 
for a row to set its end ts, and then inserting a new row with a start ts running on from 
that.

Of course, the adds are just inserting new rows.

So, what is the value for "end ts", when the record is inserted (the range just 
started)?


It's open ended, so the period is [start_ts, )

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Igor Neyman

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Chris Withers
Sent: Friday, September 16, 2016 6:47 AM
To: John R Pierce <pie...@hogranch.com>; pgsql-general@postgresql.org
Subject: Re: [GENERAL] performance problems with bulk inserts/updates on 
tsrange with gist-based exclude constrains

On 16/09/2016 10:26, John R Pierce wrote:
> On 9/16/2016 2:23 AM, John R Pierce wrote:
>>
>> wait, what is a tsrange?   the standard textsearch data types in
>> postgres are tsvector and tsquery,
>
> never mind,  I should have known, its a timestamp range.   ...
>
>
> when you do updates, are you changing any of the indexed fields, or 
> just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period column 
for a row to set its end ts, and then inserting a new row with a start ts 
running on from that.

Of course, the adds are just inserting new rows.

cheers,

Chris



So, what is the value for "end ts", when the record is inserted (the range just 
started)?

Regards,
Igor Neyman

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 12:00, John R Pierce wrote:

On 9/16/2016 3:46 AM, Chris Withers wrote:


when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.


thats expensive, as it has to reindex that row.   and range indexes are
more expensive than timestamp indexes

modifying the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.


Right, but these rows have no natural primary key. Would it help if I 
just added an auto-incrementing integer key? Would that make a positive 
difference or would it just be a wasted column?



I expect the expensive one is the constraint that ensures no periods
overlap for the given key. I'm not sure how that can be done short of
a full scan for each update/insert.


Indeed, I wonder if making the constraint deferrable might help for the 
bulk case?



it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?


You mean:

PRIMARY KEY, btree (key1, key2, period)

as opposed to

PRIMARY KEY, btree (period, key)

Interesting, I'd assumed postgres would optimise that under the covers...

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread John R Pierce

On 9/16/2016 3:46 AM, Chris Withers wrote:


when you do updates, are you changing any of the indexed fields, or
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period 
column for a row to set its end ts, and then inserting a new row with 
a start ts running on from that. 


thats expensive, as it has to reindex that row.   and range indexes are 
more expensive than timestamp indexes


modifying the primary key is kind of a violation of one of the basic 
rules of relational databases as it means the row can't be referenced by 
another table.


I expect the expensive one is the constraint that ensures no periods 
overlap for the given key. I'm not sure how that can be done short of 
a full scan for each update/insert. It might actually perform better 
if you write the index with the key first, as presumably the key is 
invariant?




--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:


wait, what is a tsrange?   the standard textsearch data types in
postgres are tsvector and tsquery,


never mind,  I should have known, its a timestamp range.   ...


when you do updates, are you changing any of the indexed fields, or
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period 
column for a row to set its end ts, and then inserting a new row with a 
start ts running on from that.


Of course, the adds are just inserting new rows.

cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread John R Pierce

On 9/16/2016 2:23 AM, John R Pierce wrote:


wait, what is a tsrange?   the standard textsearch data types in 
postgres are tsvector and tsquery,


never mind,  I should have known, its a timestamp range.   ...


when you do updates, are you changing any of the indexed fields, or just 
"value" ?





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread John R Pierce

On 9/16/2016 2:12 AM, John R Pierce wrote:

  Column |   Type| Modifiers
+---+---
  period | tsrange   | not null 


wait, what is a tsrange?   the standard textsearch data types in 
postgres are tsvector and tsquery,



--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread John R Pierce

On 9/16/2016 2:01 AM, Chris Withers wrote:

Hi All,

I have quite a few tables that follow a pattern like this:

 Table "public.my_model"
  Column |   Type| Modifiers
+---+---
  period | tsrange   | not null
  key| character varying | not null
  value  | integer   |
Indexes:
 "my_model_pkey" PRIMARY KEY, btree (period, key)
 "my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, 
key WITH =)

Check constraints:
 "my_model_period_check" CHECK (period <> 'empty'::tsrange)

So, a primary key of a period column and one or more other columns 
(usually int or string) and an exclude constraint to prevent overlaps, 
and a check constraint to prevent empty ranges.


However, I'm hitting performance problems on moderate bulk inserts and 
updates, with ~700k rows taking around 13 minutes. Profiling my python 
code suggests that most of the time is being taken by Postgres (9.4 in 
this case...)


What can I do to speed things up? Is there a different type of index I 
can use to achieve the same exclude constraint? Is there something I 
can do to have the index changes only done on the commit of the bulk 
batches? 


if (period, key) is unique, by virtue of being the primary key, then 
what's the point of the exclusion?


I'm curious, how fast do your insert/updates run if you remove the key 
exclusion and check constraint ?  tsvector operations are a lot more 
complicated than simple matches in indexing





--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

Hi All,

I have quite a few tables that follow a pattern like this:

 Table "public.my_model"
  Column |   Type| Modifiers
+---+---
  period | tsrange   | not null
  key| character varying | not null
  value  | integer   |
Indexes:
 "my_model_pkey" PRIMARY KEY, btree (period, key)
 "my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, key 
WITH =)

Check constraints:
 "my_model_period_check" CHECK (period <> 'empty'::tsrange)

So, a primary key of a period column and one or more other columns 
(usually int or string) and an exclude constraint to prevent overlaps, 
and a check constraint to prevent empty ranges.


However, I'm hitting performance problems on moderate bulk inserts and 
updates, with ~700k rows taking around 13 minutes. Profiling my python 
code suggests that most of the time is being taken by Postgres (9.4 in 
this case...)


What can I do to speed things up? Is there a different type of index I 
can use to achieve the same exclude constraint? Is there something I can 
do to have the index changes only done on the commit of the bulk batches?


cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance options for CPU bound multi-SUM query

2016-01-27 Thread David Rowley
On 28 January 2016 at 08:41, Matt  wrote:
> Moving from NUMERIC to FLOAT(8) did indeed lower query times by about 20%.
>
> I will try fixeddecimal and agg() as time permits.

That's surprisingly little gain. Please note that you won't get any
further improvement from the fixeddecimal type beyond what you have
already gotten from float8. My tests showed that it's very slightly
slower than float8, which is possibly because float8 addition has no
overflow checks.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance options for CPU bound multi-SUM query

2016-01-27 Thread Matt
Moving from NUMERIC to FLOAT(8) did indeed lower query times by about 
20%.


I will try fixeddecimal and agg() as time permits.


On 25 Jan 2016, at 4:44, David Rowley wrote:


On 25 January 2016 at 15:45, Matt  wrote:
I have a warehousing case where data is bucketed by a key of an 
hourly
timestamp and 3 other columns. In addition there are 32 numeric 
columns. The
tables are partitioned on regular date ranges, and aggregated to the 
lowest

resolution usable.

The principal use case is to select over a range of dates or hours, 
filter

by the other key columns, and SUM() all 32 of the other columns. The
execution plan shows the primary key index limits row selection 
efficiently,

but the query appears CPU bound in performing all of those 32 SUM()
aggregates.



SUM(numeric) also has to work quite a bit harder than an aggregate
like sum(float8) too since the addition in numeric is implemented in
software.
It depends on the use case, but for some cases the float4 or float8
types might be an option and it will offer much faster aggregation.
There is also https://github.com/2ndQuadrant/fixeddecimal which may be
of some use if you need fixed precision up to a predefined scale. We
found that using fixeddecimal instead of numeric for the TPC-H
benchmark improved performance of query 1 significantly.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance options for CPU bound multi-SUM query

2016-01-25 Thread Andreas Kretschmer
Matt  wrote:

> I have a warehousing case where data is bucketed by a key of an hourly
> timestamp and 3 other columns. In addition there are 32 numeric columns. The
> tables are partitioned on regular date ranges, and aggregated to the lowest
> resolution usable.
> 
> The principal use case is to select over a range of dates or hours, filter by
> the other key columns, and SUM() all 32 of the other columns. The execution
> plan shows the primary key index limits row selection efficiently, but the
> query appears CPU bound in performing all of those 32 SUM() aggregates.
> 
> I am looking at a couple of distributed PostgreSQL forks, but until those 
> reach
> feature parity with 9.5 I am hoping to stay with single node PostgreSQL.
> 
> Are there any other options I can use to improve query times?

Maybe cybertec's agg() patch - see
http://www.cybertec.at/postgresql_produkte/agg-parallele-aggregierungen-fuer-postgresql/

(and ask Hans for English docs!)

But, I see, it needs 9.5.


Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.  (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.  N 51.05082°, E 13.56889°


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance options for CPU bound multi-SUM query

2016-01-25 Thread David Rowley
On 25 January 2016 at 15:45, Matt  wrote:
> I have a warehousing case where data is bucketed by a key of an hourly
> timestamp and 3 other columns. In addition there are 32 numeric columns. The
> tables are partitioned on regular date ranges, and aggregated to the lowest
> resolution usable.
>
> The principal use case is to select over a range of dates or hours, filter
> by the other key columns, and SUM() all 32 of the other columns. The
> execution plan shows the primary key index limits row selection efficiently,
> but the query appears CPU bound in performing all of those 32 SUM()
> aggregates.
>

SUM(numeric) also has to work quite a bit harder than an aggregate
like sum(float8) too since the addition in numeric is implemented in
software.
It depends on the use case, but for some cases the float4 or float8
types might be an option and it will offer much faster aggregation.
There is also https://github.com/2ndQuadrant/fixeddecimal which may be
of some use if you need fixed precision up to a predefined scale. We
found that using fixeddecimal instead of numeric for the TPC-H
benchmark improved performance of query 1 significantly.
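
For example (hypothetical table and column names), either cast inside the
aggregate or change the stored type once, accepting float8's approximate
semantics:

-- Cast per aggregate:
SELECT key1, sum(col01::float8) AS col01, sum(col02::float8) AS col02
FROM   hourly_agg
WHERE  bucket >= '2016-01-01' AND bucket < '2016-02-01'
GROUP  BY key1;

-- Or convert the columns themselves:
ALTER TABLE hourly_agg
    ALTER COLUMN col01 TYPE float8,
    ALTER COLUMN col02 TYPE float8;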

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Performance options for CPU bound multi-SUM query

2016-01-24 Thread Matt
I have a warehousing case where data is bucketed by a key of an hourly 
timestamp and 3 other columns. In addition there are 32 numeric columns. 
The tables are partitioned on regular date ranges, and aggregated to the 
lowest resolution usable.


The principal use case is to select over a range of dates or hours, 
filter by the other key columns, and SUM() all 32 of the other columns. 
The execution plan shows the primary key index limits row selection 
efficiently, but the query appears CPU bound in performing all of those 
32 SUM() aggregates.
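
(For concreteness, a sketch of the query shape - all names are invented:)

-- Range scan on the time bucket plus key filters, then 32 SUM() aggregates.
SELECT sum(m01), sum(m02), /* ... 29 more columns ... */ sum(m32)
FROM   hourly_agg
WHERE  bucket >= '2016-01-01' AND bucket < '2016-01-08'
AND    key_a = 42 AND key_b = 'foo' AND key_c = 'bar';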


I am looking at a couple of distributed PostgreSQL forks, but until 
those reach feature parity with 9.5 I am hoping to stay with single node 
PostgreSQL.


Are there any other options I can use to improve query times?

Server is 64GB RAM, with work_mem set to 1GB. All SSD storage, with 
separate RAID-10 volumes for OS, data, and indexes. Additional setting 
beyond defaults as follows.


~~~
default_statistics_target = 500
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = 48GB
work_mem = 1GB
wal_buffers = 16MB
checkpoint_segments = 128
shared_buffers = 16GB
max_connections = 20
~~~

[GENERAL] Performance tuning assisted by a GUI application

2015-04-21 Thread Jacek Wielemborek
Hello,

I had a brief discussion on #postgresql and thought that perhaps there
might be a need for a tool that would enable a fine-tuning of PostgreSQL
performance settings by conveniently testing them with a sample SQL
query with the aid of a simple GUI application. To illustrate this, I
created this little proof of concept:

https://gist.github.com/d33tah/d01f3599e55e53d00f68

Screenshot can be found here: https://imgur.com/TguH6Xq

This is by far not even alpha quality here and would need some basic
Python knowledge to set up the connection string. This only supports
modifying cpu_tuple_cost right now, but I guess that you can extrapolate
where this would go.

What do you think? Would anybody be interested in an application like this?

Cheers,
Jacek Wielemborek





signature.asc
Description: OpenPGP digital signature


Re: [GENERAL] Performance slowing down when doing same UPDATE many times

2015-03-09 Thread Chris Mair
 Hi,
 
  
 does no one have an idea?
 
 It may be a rare case doing the same UPDATE a thousand times. But I'm really 
 interested why this is not happening when doing DIFFERENT updates. And, of 
 course,  if something could be done on the database side to prevent this 
 behavior in case some application developer does the same “mistake” again.
 
  
 Thanks
 
 Jan


Hi,

in an UPDATE operation PostgreSQL has to create a new tuple version and mark
the old one as dead.

A long time ago, what you see here was very common: for a heavy update load,
frequent vacuum was recommended. Then, in 8.3, a feature called HOT (heap-only
tuples) was introduced that did away with this problem.

I'm not 100% sure what happens in your case, but I think the problem is the 
updates
are all in the *same* transaction. That is indeed a rare situation.

Bye,
Chris.


 
  
  
 From: Jan Strube 
 Sent: Tuesday, February 10, 2015 12:03 PM
 To: 'pgsql-general@postgresql.org'
 Subject: Performance slowing down when doing same UPDATE many times
 
  
 Hi,
 
  
 we recently found a bug in one of our applications which was doing exactly 
 the same UPDATE operation a few thousand times inside a transaction. This 
 caused the UPDATEs to become slower and slower from some milliseconds to some 
 seconds. We already fixed the application but I am wondering if this might be 
 a PostgreSQL bug, too.
 
  
 Here is a simple test case that performs and benchmarks 100,000 UPDATEs 
 (benchmarking only every 10,000th to reduce output):
 
  
 BEGIN;
 
 CREATE TEMP TABLE test (id integer PRIMARY KEY, flag boolean DEFAULT false);
 
 INSERT INTO test (id) SELECT generate_series(1, 100000);
 
  
 DO $$
 
 DECLARE
 
   s timestamp;
 
   e timestamp;
 
 BEGIN
 
   FOR i IN 1..100000 LOOP
 
 SELECT clock_timestamp() INTO s;
 
 UPDATE test SET flag = true WHERE id = 12345;
 
 SELECT clock_timestamp() INTO e;
 
  
 IF i%10000 = 0 THEN
 
   RAISE NOTICE '%', e-s;
 
 END IF;
 
   END LOOP;
 
 END $$;
 
 ROLLBACK;
 
  
 The output looks like this:
 
  
 NOTICE:  00:00:00.000525
 
 NOTICE:  00:00:00.000992
 
 NOTICE:  00:00:00.001404
 
 NOTICE:  00:00:00.001936
 
 NOTICE:  00:00:00.002374
 
 NOTICE:  00:00:00.002925
 
 NOTICE:  00:00:00.003525
 
 NOTICE:  00:00:00.004015
 
 NOTICE:  00:00:00.00453
 
 NOTICE:  00:00:00.004976
 
  
 The problem only occurs inside a transaction and if the same dataset is 
 updated. I'm using PostgreSQL 9.1.15.
 
  
 Jan
 
  
 




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance slowing down when doing same UPDATE many times

2015-03-09 Thread Pavel Stehule
Hi

It is a side effect of the MVCC implementation of Postgres. Vacuum is not
possible inside an open transaction.

If you need this, then you should use a different database - Postgres
doesn't work well when one record is updated very often inside a transaction.
The usual solution for Postgres is some proxy that works like a write cache.

Regards

Pavel

2015-03-09 8:47 GMT+01:00 Jan Strube j...@deriva.de:

 Hi,



 does no one have an idea?

 It may be a rare case doing the same UPDATE a thousand times. But I'm
 really interested why this is not happening when doing DIFFERENT updates.
 And, of course,  if something could be done on the database side to prevent
 this behavior in case some application developer does the same “mistake”
 again.



 Thanks

 Jan





 *From:* Jan Strube
 *Sent:* Tuesday, February 10, 2015 12:03 PM
 *To:* 'pgsql-general@postgresql.org'
 *Subject:* Performance slowing down when doing same UPDATE many times



 Hi,



 we recently found a bug in one of our applications which was doing exactly
 the same UPDATE operation a few thousand times inside a transaction. This
 caused the UPDATEs to become slower and slower from some milliseconds to
 some seconds. We already fixed the application but I am wondering if this
 might be a PostgreSQL bug, too.



 Here is a simple test case that performs and benchmarks 100,000 UPDATEs
 (benchmarking only every 10,000th to reduce output):



 BEGIN;

 CREATE TEMP TABLE test (id integer PRIMARY KEY, flag boolean DEFAULT
 false);

 INSERT INTO test (id) SELECT generate_series(1, 100000);



 DO $$

 DECLARE

   s timestamp;

   e timestamp;

 BEGIN

   FOR i IN 1..100000 LOOP

 SELECT clock_timestamp() INTO s;

 UPDATE test SET flag = true WHERE id = 12345;

 SELECT clock_timestamp() INTO e;



  IF i%10000 = 0 THEN

   RAISE NOTICE '%', e-s;

 END IF;

   END LOOP;

 END $$;

 ROLLBACK;



 The output looks like this:



 NOTICE:  00:00:00.000525

 NOTICE:  00:00:00.000992

 NOTICE:  00:00:00.001404

 NOTICE:  00:00:00.001936

 NOTICE:  00:00:00.002374

 NOTICE:  00:00:00.002925

 NOTICE:  00:00:00.003525

 NOTICE:  00:00:00.004015

 NOTICE:  00:00:00.00453

 NOTICE:  00:00:00.004976



 The problem only occurs inside a transaction and if the same dataset is
 updated. I'm using PostgreSQL 9.1.15.



 Jan





Re: [GENERAL] Performance slowing down when doing same UPDATE many times

2015-03-09 Thread Jan Strube
Hi,

 
does no one have an idea?

It may be a rare case doing the same UPDATE a thousand times. But I'm really 
interested why this is not happening when doing DIFFERENT updates. And, of 
course,  if something could be done on the database side to prevent this 
behavior in case some application developer does the same “mistake” again.

 
Thanks

Jan

 
 
From: Jan Strube 
Sent: Tuesday, February 10, 2015 12:03 PM
To: 'pgsql-general@postgresql.org'
Subject: Performance slowing down when doing same UPDATE many times

 
Hi,

 
we recently found a bug in one of our applications which was doing exactly the 
same UPDATE operation a few thousand times inside a transaction. This caused 
the UPDATEs to become slower and slower from some milliseconds to some seconds. 
We already fixed the application but I am wondering if this might be a 
PostgreSQL bug, too.

 
Here is a simple test case that performs and benchmarks 100,000 UPDATEs 
(benchmarking only every 10,000th to reduce output):

 
BEGIN;

CREATE TEMP TABLE test (id integer PRIMARY KEY, flag boolean DEFAULT false);

INSERT INTO test (id) SELECT generate_series(1, 100000);

 
DO $$

DECLARE

  s timestamp;

  e timestamp;

BEGIN

  FOR i IN 1..100000 LOOP

    SELECT clock_timestamp() INTO s;

    UPDATE test SET flag = true WHERE id = 12345;

    SELECT clock_timestamp() INTO e;

 
    IF i%10000 = 0 THEN

  RAISE NOTICE '%', e-s;

    END IF;

  END LOOP;

END $$;

ROLLBACK;

 
The output looks like this:

 
NOTICE:  00:00:00.000525

NOTICE:  00:00:00.000992

NOTICE:  00:00:00.001404

NOTICE:  00:00:00.001936

NOTICE:  00:00:00.002374

NOTICE:  00:00:00.002925

NOTICE:  00:00:00.003525

NOTICE:  00:00:00.004015

NOTICE:  00:00:00.00453

NOTICE:  00:00:00.004976

 
The problem only occurs inside a transaction and if the same dataset is 
updated. I'm using PostgreSQL 9.1.15.

 
Jan

 


Re: [GENERAL] Performance on DISABLE TRIGGER

2015-03-03 Thread gmb
David Steele wrote
 
 ALTER TABLE requires an exclusive lock - my guess is that another
 process has a lock on the table.  It could even be a select.
 
 pg_locks is your friend in this case:
 http://www.postgresql.org/docs/9.4/static/view-pg-locks.html

Hi David 
I'm a bit confused about how to interpret the result of the pg_locks view.
After running the following (as per the linked page):
  SELECT pl.* , psa.query FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON
pl.pid = psa.pid;

I get a result of 2 locks on the ALTER TABLE ... statement:

Expanded display is on.
-[ RECORD 1 ]--+-
locktype   | virtualxid
database   | 
relation   | 
page   | 
tuple  | 
virtualxid | 5/57182
transactionid  | 
classid| 
objid  | 
objsubid   | 
virtualtransaction | 5/57182
pid| 6128
mode   | ExclusiveLock
granted| t
fastpath   | t
query  | ALTER TABLE tab DISABLE TRIGGER trig;
-[ RECORD 2 ]--+
locktype   | relation
database   | 16393
relation   | 22595
page   | 
tuple  | 
virtualxid | 
transactionid  | 
classid| 
objid  | 
objsubid   | 
virtualtransaction | 5/57182
pid| 6128
mode   | AccessExclusiveLock
granted| f
fastpath   | f
query  | ALTER TABLE tab DISABLE TRIGGER trig;
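
For reference: in this output the second record (granted = f) is the ALTER TABLE itself waiting for its AccessExclusiveLock on relation 22595; the session actually holding a conflicting lock has to be found by matching other pg_locks entries on the same relation. A hedged sketch of such a query (assumes the 9.2+ pg_stat_activity column names used above):

SELECT waiting.pid  AS waiting_pid,
       holder.pid   AS blocking_pid,
       holder.mode  AS blocking_mode,
       act.query    AS blocking_query
FROM pg_locks waiting
JOIN pg_locks holder
  ON holder.locktype = waiting.locktype
 AND holder.database IS NOT DISTINCT FROM waiting.database
 AND holder.relation IS NOT DISTINCT FROM waiting.relation
 AND holder.pid <> waiting.pid
 AND holder.granted
JOIN pg_stat_activity act ON act.pid = holder.pid
WHERE NOT waiting.granted;   -- only covers relation-level conflicts; tuple/transactionid waits need extra joins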


Something else I noticed while checking out the Server Status window in the
pgAdmin tool:
The pg_stat_activity query run there displays a "blocked by" column, which
I'm assuming is retrieved using data from pg_locks.
I assume I'm seeing the pid of the process which is causing the block.
The process, however, is a query generated by the pgAdmin tool itself:

   SELECT CASE WHEN typbasetype=0 THEN oid else typbasetype END AS basetype
   FROM pg_type WHERE oid=1700

Could this be a case of the pg_catalog data being in need of maintenance? 

Regards



--
View this message in context: 
http://postgresql.nabble.com/Performance-on-DISABLE-TRIGGER-tp5839727p5840221.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance on DISABLE TRIGGER (resend)

2015-03-03 Thread Greg Sabino Mullane


 I'm doing some maintenance - which is done quite often, never had this
 problem before - which requires me to disable triggers, run some updates and
 then re-enable the triggers. 

Disabling triggers requires a heavy lock. A better way is to use 
the session_replication_role feature. See:

http://blog.endpoint.com/2015/01/postgres-sessionreplication-role.html
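
For illustration, a minimal sketch of that approach (the table and column names are placeholders; setting session_replication_role requires superuser). With the role set to 'replica', ordinary ENABLE triggers simply do not fire, so no ALTER TABLE lock is needed:

BEGIN;
SET LOCAL session_replication_role = 'replica';
-- run the maintenance updates here; regular triggers on "tab" are skipped
UPDATE tab SET some_column = some_column;
COMMIT;  -- SET LOCAL reverts automatically at transaction end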

-- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201502271149
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance on DISABLE TRIGGER (resend)

2015-03-03 Thread gmb
Greg Sabino Mullane wrote
 Disabling triggers requires a heavy lock. A better way is to use 
 the session_replication_role feature. See:
 
 http://blog.endpoint.com/2015/01/postgres-sessionreplication-role.html

This is a very effective solution to my problem. Thanks for the tip, Greg.



--
View this message in context: 
http://postgresql.nabble.com/Performance-on-DISABLE-TRIGGER-tp5839727p5840247.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance on DISABLE TRIGGER

2015-02-28 Thread David Steele
On 2/27/15 9:07 AM, gmb wrote:
 Hi all
 
 I'm doing some maintenance - which is done quite often, never had this
 problem before - which requires me to disable triggers, run some updates and
 then re-enable the triggers. 
 Where the whole process normally takes 30 sec, it took much longer today and
 I cancelled after 5 minutes.
 
 After running the statements individually to pinpoint the issue, I
 identified that the problem is on the first line:
   ALTER TABLE tab DISABLE TRIGGER trig;
 I have not been able to run this successfully -- after 10 minutes the
 connection (to the remote DB) got severed.
 Any ideas on where I can start to look for the cause of the problem?

ALTER TABLE requires an exclusive lock - my guess is that another
process has a lock on the table.  It could even be a select.

pg_locks is your friend in this case:
http://www.postgresql.org/docs/9.4/static/view-pg-locks.html
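
As a related safeguard (a sketch, not from David's reply; requires 9.3 or later), a lock_timeout keeps the ALTER TABLE from queueing behind a long-lived lock indefinitely, so a blocked attempt fails quickly instead of hanging:

SET lock_timeout = '5s';               -- give up if the lock is not acquired within 5 seconds
ALTER TABLE tab DISABLE TRIGGER trig;  -- fails with "canceling statement due to lock timeout" if blocked
RESET lock_timeout;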

-- 
- David Steele
da...@pgmasters.net





Re: [GENERAL] Performance on DISABLE TRIGGER

2015-02-27 Thread Greg Sabino Mullane




[GENERAL] Performance on DISABLE TRIGGER

2015-02-27 Thread gmb
Hi all

I'm doing some maintenance - which is done quite often, never had this
problem before - which requires me to disable triggers, run some updates and
then re-enable the triggers. 
Where the whole process normally takes 30 sec, it took much longer today and
I cancelled after 5 minutes.

After running the statements individually to pinpoint the issue, I
identified that the problem is on the first line:
  ALTER TABLE tab DISABLE TRIGGER trig;
I have not been able to run this successfully -- after 10 minutes the
connection (to the remote DB) got severed.
Any ideas on where I can start to look for the cause of the problem?

Thanks



--
View this message in context: 
http://postgresql.nabble.com/Performance-on-DISABLE-TRIGGER-tp5839727.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Performance slowing down when doing same UPDATE many times

2015-02-10 Thread Jan Strube
Hi,

 
we recently found a bug in one of our applications which was doing exactly the 
same UPDATE operation a few thousand times inside a transaction. This caused 
the UPDATEs to become slower and slower from some milliseconds to some seconds. 
We already fixed the application but I am wondering if this might be a 
PostgreSQL bug, too.

 
Here is a simple test case that performs and benchmarks 100,000 UPDATEs 
(benchmarking only every 10,000th to reduce output):

 
BEGIN;

CREATE TEMP TABLE test (id integer PRIMARY KEY, flag boolean DEFAULT false);

INSERT INTO test (id) SELECT generate_series(1, 100000);

 
DO $$

DECLARE

  s timestamp;

  e timestamp;

BEGIN

  FOR i IN 1..100000 LOOP

    SELECT clock_timestamp() INTO s;

    UPDATE test SET flag = true WHERE id = 12345;

    SELECT clock_timestamp() INTO e;

 
    IF i%10000 = 0 THEN

  RAISE NOTICE '%', e-s;

    END IF;

  END LOOP;

END $$;

ROLLBACK;

 
The output looks like this:

 
NOTICE:  00:00:00.000525

NOTICE:  00:00:00.000992

NOTICE:  00:00:00.001404

NOTICE:  00:00:00.001936

NOTICE:  00:00:00.002374

NOTICE:  00:00:00.002925

NOTICE:  00:00:00.003525

NOTICE:  00:00:00.004015

NOTICE:  00:00:00.00453

NOTICE:  00:00:00.004976

 
The problem only occurs inside a transaction and only if the same row is 
updated. I'm using PostgreSQL 9.1.15.
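
One client-side workaround (a sketch, not from the original report): adding a guard to the WHERE clause turns repeated identical UPDATEs into no-ops, so no new row versions pile up inside the transaction:

UPDATE test SET flag = true
WHERE id = 12345
  AND flag IS DISTINCT FROM true;   -- matches nothing once the flag is already set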

 
Jan

 


Re: [GENERAL] Performance question

2014-11-22 Thread Anil Menon
Thanks Laurenz, very good point!

Luckily (phew!) the business scenario is such that race conditions cannot
occur (and the transaction table is append-only). There is a business
workflow to address duplicates, but
1) it occurs extremely rarely (it would be a deliberate sabotage if it
occurs)
2) there is no impact on business

Yours
Anil



On Fri, Nov 21, 2014 at 5:16 PM, Albe Laurenz laurenz.a...@wien.gv.at
wrote:

 Anil Menon wrote:
  I would like to ask from your experience which would be the best
 generic method for checking if row
  sets of a certain condition exists in a PLPGSQL function.
 
  I know of 4 methods so far (please feel free to add if I missed out any
 others)
 [...]

 Are you aware that all of these methods have a race condition unless
 you use isolation level REPEATABLE READ or better?

 It may be that rows are added or removed between the check and the
 corresponding action.

 Yours,
 Laurenz Albe



Re: [GENERAL] Performance question

2014-11-21 Thread Albe Laurenz
Anil Menon wrote:
 I would like to ask from your experience which would be the best generic 
 method for checking if row
 sets of a certain condition exists in a PLPGSQL function.
 
 I know of 4 methods so far (please feel free to add if I missed out any 
 others)
[...]

Are you aware that all of these methods have a race condition unless
you use isolation level REPEATABLE READ or better?

It may be that rows are added or removed between the check and the
corresponding action.
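
A hedged sketch of one way to narrow that window on the "rows removed" side (the table and column names here are hypothetical): lock the rows the check found so they cannot vanish before the action runs; concurrent inserts remain possible under READ COMMITTED.

DO $$
BEGIN
    PERFORM 1 FROM queue_items WHERE status = 'pending' FOR UPDATE;
    IF FOUND THEN
        -- do X: the rows seen by the check are locked and still present
        UPDATE queue_items SET status = 'processing' WHERE status = 'pending';
    ELSE
        -- do Y
        RAISE NOTICE 'nothing to process';
    END IF;
END;
$$;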

Yours,
Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance question

2014-11-20 Thread Anil Menon
Thanks Adrian


On Thu, Nov 20, 2014 at 3:46 AM, Adrian Klaver adrian.kla...@aklaver.com
wrote:

 On 11/19/2014 08:26 AM, Anil Menon wrote:

 Hello,

 I would like to ask from your experience which would be the best
 generic method for checking if row sets of a certain condition exists
 in a PLPGSQL function.

 I know of 4 methods so far (please feel free to add if I missed out any
 others)

 1) get a count (my previous experience with ORCL shaped this option)

 select count(*) into vcnt
 from table
 where condition
 if vcnt > 0 then
do X
 else
do y
 end if
 Cons : It seems doing a count(*) is not the best option for PG



 Well that would depend on the table size, whether it was 100 rows vs
 1,000,000 rows



The table is estimated/guesstimated to be ~900 million rows (~30M a day,
90 days history, though initially it would be ~30M), though the WHERE
part of the query would return between 0 and ~2 rows.





 2) Use a non-count option
 select primary_key_Col into vcnt
 from table
 where condition
 if found then
do X
 else
do y
 end if
 Cons: Some people seem not to prefer this as (AFAIU) it causes
 plpgsql -> sql -> plpgsql switches


 plpgsql is fairly tightly coupled to SQL, so I have not really seen any
 problems. But then I am not working on large datasets.


I think that ~900M rows would most likely constitute a large data set.




 3) using perform
 perform primary_key_Col into vcnt
 from table
 where condition
 if found then
do X
 else
do y
 end if

 Seems to remove the above (item 2) issues (if any)


 AFAIK, you cannot do the above as written. PERFORM does not return a
 result:

 http://www.postgresql.org/docs/9.3/interactive/plpgsql-
 statements.html#PLPGSQL-STATEMENTS-SQL-NORESULT

 It would have to be more like:

 perform primary_key_Col from table where condition


You are absolutely right - my bad.


 4) using exists
 if exists ( select 1 from table where condition ) then
   do x
 else
   do y
 end if


 My question is what would be the best (in terms of performance) method
 to use? My gut feeling is to use option 4 for PG. Am I right or is there
 any other method?


 All of the above is context specific. To know for sure you will need to
 test on actual data.


Absolutely right; I just want to ensure that I follow the most
efficient method before the DB goes into production, after which priorities
change on what needs to be changed.




 --
 Adrian Klaver
 adrian.kla...@aklaver.com


I guess the best answer would be that it's very context specific, but picking
the brains of experienced resources helps :-)

Thanks again
Anil​


Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4

2014-11-19 Thread Robert Haas
On Mon, Nov 17, 2014 at 4:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 One thing that occurs to me is that if the generic plan estimate comes
 out much cheaper than the custom one, maybe we should assume that the
 generic's cost estimate is bogus.  Right offhand I can't think of a reason
 for a custom plan to look worse than a generic one, unless there's a
 statistical quirk like this one.

 That's an interesting idea, but what do we do after deciding that it's
 bogus?

 Keep using custom plans.  It's possible that the estimate that's in error
 is the custom one, but that's not the way to bet IMO, since the custom
 plan estimate is based on better information.

 The generic plan really can't be cheaper than the custom plan,
 but it could be the same price, or as close as makes no difference.

 Right, and what we want to do is use the generic plan as long as it's
 close to the same cost (close enough to not justify replanning effort).
 The trick here is to not be fooled by estimation errors.  Can we assume
 that generic cost < custom cost is always an estimation error?

Maybe.  It seems like kind of a fragile bet to me.  There's going to
be some qual selectivity below which an index scan on a particular
table outperforms a sequential scan, but the selectivity estimated for
a generic plan can be either higher or lower than the selectivity we'd
estimate for some particular value.  And once one of the two plans
decides on an index scan while the other one decides on a sequential
scan, it can cascade through and change the whole plan - e.g. because
it affects whether the tuples emerge with usable pathkeys.  I don't
feel very confident about assuming that applying < to the result of
all that is going to tell us anything useful.

I think what's really going on here is that the generic plan will be
optimal for some range of possible qual selectivities.  Typically, the
qual is of the form col = val, so the plan will be optimal for all
values where the estimated frequency is between some values A and B.
What would be nice is to notice when we see a value that is outside
that range, and switch to a custom plan in that case.  I realize that
the planner isn't really well set up to deliver the information we'd
need to test that for each new supplied value, but that's really what
we need to do.  The current system wastes CPU cycles by replanning up
to 5 times even when there is no benefit to be gained by it, but can
also cause big performance problems when it settles into a generic
plan and then a value with different characteristics shows up later
on.
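
The behavior is easy to observe from psql; a small sketch (the table and value are made up):

PREPARE q(text) AS SELECT count(*) FROM orders WHERE status = $1;

-- Run this six times: the first five executions are planned for the concrete
-- value (custom plans); from the sixth on, the cached generic plan may be used
-- if its estimated cost is no higher, even when the supplied value is atypical.
EXPLAIN EXECUTE q('rare_status');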

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Performance question

2014-11-19 Thread Anil Menon
Hello,

I would like to ask from your experience which would be the best generic
method for checking if row sets of a certain condition exists in a PLPGSQL
function.

I know of 4 methods so far (please feel free to add if I missed out any
others)

1) get a count (my previous experience with ORCL shaped this option)

select count(*) into vcnt
from table
where condition
if vcnt > 0 then
  do X
else
  do y
end if
Cons : It seems doing a count(*) is not the best option for PG

2) Use a non-count option
select primary_key_Col into vcnt
from table
where condition
if found then
  do X
else
  do y
end if
Cons: Some people seem not to prefer this as (AFAIU) it causes
plpgsql -> sql -> plpgsql switches

3) using perform
perform primary_key_Col into vcnt
from table
where condition
if found then
  do X
else
  do y
end if

Seems to remove the above (item 2) issues (if any)

4) using exists
if exists ( select 1 from table where condition ) then
 do x
else
 do y
end if


My question is what would be the best (in terms of performance) method to
use? My gut feeling is to use option 4 for PG. Am I right or is there any
other method?

Thanks in advance
Anil
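
For reference, a self-contained sketch of option 4 in plpgsql (the table, column and return values are placeholders): EXISTS can stop at the first matching row, which is why it tends to beat count(*) for a pure existence check.

CREATE OR REPLACE FUNCTION row_exists_demo(p_account_id bigint) RETURNS text AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM transactions WHERE account_id = p_account_id) THEN
        RETURN 'do X';
    ELSE
        RETURN 'do Y';
    END IF;
END;
$$ LANGUAGE plpgsql;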


  1   2   3   4   5   6   7   8   9   >