Re: 8.2 Autovacuum BUG ?

2018-01-28 Thread Pavan Teja
On Jan 26, 2018 6:02 AM, "Pavan Teja"  wrote:



On Jan 26, 2018 3:00 AM, "Alvaro Herrera"  wrote:

pavan95 wrote:
> Hi Álvaro Herrera,
>
> Please find the corresponding output:

OK, these settings look pretty normal, so they don't explain your
problem.

What is checkpoint_segments set to?  And checkpoint_timeout?

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Hi,

  checkpoint_segments is set to '3' and
  checkpoint_timeout is set to '5min'.
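
For reference, both values can be checked in one query (a sketch; note that
checkpoint_segments only exists up to 9.4 and was replaced by max_wal_size in
9.5):

  -- Show the current checkpoint-related settings
  SELECT name, setting, unit
  FROM pg_settings
  WHERE name IN ('checkpoint_segments', 'checkpoint_timeout');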

 Regards,
 Pavan.

  Any clue???

 Regards,
  Pavan.


Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2018-01-28 Thread Bruce Momjian
On Thu, Oct 26, 2017 at 02:45:15PM -0500, Justin Pryzby wrote:
> On Tue, Oct 24, 2017 at 04:15:59PM +0200, johannes graën wrote:
> > Hi Pavel, *,
> > 
> > you were right with ANALYZing the DB first. However, even after doing
> > so, I frequently see Seq Scans where an index was used before. This
> > usually cooccurs with parallelization and looked different before
> > upgrading to 10. I can provide an example for 10 [1], but I cannot
> > generate a query plan for 9.6 anymore.
> > 
> > Any ideas what makes the new version more seqscanny?
> 
> Is it because max_parallel_workers_per_gather now defaults to 2 ?
> 
> BTW, I would tentatively expect a change in a default to be documented in the
> release notes but can't see that it is.
> 77cd477c4ba885cfa1ba67beaa82e06f2e182b85

Oops, you are correct.  The PG 10 release notes, which I wrote, should
have mentioned this.  :-(

https://www.postgresql.org/docs/10/static/release-10.html
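
For anyone checking whether the new parallel default is what changed a plan, a
quick session-level comparison looks roughly like this (a sketch; substitute
the regressed query):

  -- Show the PG 10 default (2); the 9.6 default was 0
  SHOW max_parallel_workers_per_gather;

  -- Turn parallelism off for this session and re-run the plan
  SET max_parallel_workers_per_gather = 0;
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*) FROM pg_class;  -- replace with the query under test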

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Query Slow After 2018

2018-01-28 Thread David G. Johnston
On Sunday, January 28, 2018, Nur Agus  wrote:

>
> Is this a bug in postgres-9.4.4?
>
> We are considering upgrading to postgres-10, but since this is a very
> critical system, it takes a lot of testing and approval :)
>
>

Upgrade to 9.4.15.  Asking if 9.4.4 might have a bug is a question most
people here aren't going to be inclined to answer given the 11 bug-fix
releases that branch has received since then.

David J.


Re: Query Slow After 2018

2018-01-28 Thread Justin Pryzby
On Mon, Jan 29, 2018 at 12:32:59AM +0700, Nur Agus wrote:
> The following query runs in just 9 ms:

>  "distrib_reports"."month" = 1 AND
>  "distrib_reports"."year" = 2017 AND
>  "distrib_reports"."state" = 'SUBMITTED' AND
>  "distrib_report_groups"."distrib_report_group_type_id" =
> '559a5fdc-418d-4494-aebf-80ecf8743d35'

> The explain analyze of the 2 queries results in really different
> query plans; here are the links to depesz:
> 2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1
> 2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y

> The question is: why does the query planner choose such a different path just
> by changing one parameter?

Looks like this badly underestimates its rowcount:

Index Scan using index_distrib_reports_on_year on distrib_reports
    (cost=0.42..40.62 rows=8 width=32) (actual time=0.034..50.452 rows=17055 loops=1)
  Index Cond: (year = 2018)
  Filter: ((month = 1) AND ((state)::text = 'SUBMITTED'::text))
  Rows Removed by Filter: 1049
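
That estimate comes from the per-column stats, which you can eyeball directly
(a sketch; table and column names are taken from the plan above):

  -- What the planner thinks it knows about year/month/state
  SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'distrib_reports'
    AND attname IN ('year', 'month', 'state');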

Maybe because, if year==2018, then month=1 does essentially nothing..
..but postgres thinks it'll filter out some 90% of the rows.

And possibly the same for SUBMITTED (?)
You should probably use a timestamp column rather than integer year+month.
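
For illustration only, with a single date column (hypothetical name
report_date) the same filter becomes a plain range, which the planner can
estimate from its ordinary histogram:

  -- Hypothetical report_date column replacing integer year + month
  SELECT count(*)
  FROM distrib_reports
  WHERE report_date >= DATE '2018-01-01'
    AND report_date <  DATE '2018-02-01'
    AND state = 'SUBMITTED';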

On PG10, you could probably work around it using "CREATE STATISTICS".
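
A minimal sketch of that workaround (the statistics object name is made up;
the columns follow the query above):

  -- PG 10 extended statistics: tell the planner that these columns are
  -- correlated / functionally dependent
  CREATE STATISTICS distrib_reports_ym_state (dependencies)
    ON year, month, state
    FROM distrib_reports;
  ANALYZE distrib_reports;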

> This behaviour is *not-reproducible* on postgres-10. On postgres-10, the
> query plans are consistent, and both have very acceptable times:
> 2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5
> 2018 --> explain result on postgres-10: https://explain.depesz.com/s/Tf5K
..I think default max_parallel_workers_per_gather=3 by chance causes the plan
to be the same.

I think there's still an underestimated rowcount with PG10 (without CREATE
STATISTICS), but it's masked by "rows=1 actual rows=0" roundoff error with a
high loop count.

Justin



Query Slow After 2018

2018-01-28 Thread Nur Agus
Hello,

I would like to report a strange behaviour on postgresql 9.4.4.

The following query runs in just 9 ms:

SELECT SUM("distrib_report_items"."qty") AS sum_id
FROM "distrib_report_items"
INNER JOIN "retailers"
  ON "retailers"."id" = "distrib_report_items"."retailer_id"
INNER JOIN "distrib_reports"
  ON "distrib_reports"."id" = "distrib_report_items"."distrib_report_id"
INNER JOIN "distrib_report_groups"
  ON "distrib_report_groups"."id" = "distrib_reports"."distrib_report_group_id"
WHERE
  "retailers"."sub_district_id" = 'f4bff929-f911-4ab8-b1b2-aaa50e0ccb39' AND
  "distrib_report_items"."product_id" = '05167ad0-d2fa-4a4a-bd13-be8f89ce34a2' AND
  "distrib_reports"."month" = 1 AND
  "distrib_reports"."year" = 2017 AND
  "distrib_reports"."state" = 'SUBMITTED' AND
  "distrib_report_groups"."distrib_report_group_type_id" = '559a5fdc-418d-4494-aebf-80ecf8743d35'

But changing just one parameter (the year) from 2017 to 2018, exactly the same
query becomes incredibly slow, taking 8 seconds. This is the full query after
changing the year:

SELECT SUM("distrib_report_items"."qty") AS sum_id
FROM "distrib_report_items"
INNER JOIN "retailers"
  ON "retailers"."id" = "distrib_report_items"."retailer_id"
INNER JOIN "distrib_reports"
  ON "distrib_reports"."id" = "distrib_report_items"."distrib_report_id"
INNER JOIN "distrib_report_groups"
  ON "distrib_report_groups"."id" = "distrib_reports"."distrib_report_group_id"
WHERE
  "retailers"."sub_district_id" = 'f4bff929-f911-4ab8-b1b2-aaa50e0ccb39' AND
  "distrib_report_items"."product_id" = '05167ad0-d2fa-4a4a-bd13-be8f89ce34a2' AND
  "distrib_reports"."month" = 1 AND
  "distrib_reports"."year" = 2018 AND
  "distrib_reports"."state" = 'SUBMITTED' AND
  "distrib_report_groups"."distrib_report_group_type_id" = '559a5fdc-418d-4494-aebf-80ecf8743d35'

The explain analyze of the 2 queries results in really different query plans;
here are the links to depesz:
2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1
2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y

The table growth itself is normal. The distrib_report_items table grew from
1.9++ million rows in December 2017 to 2.3++ million rows in January 2018,
which is not a really significant growth.
The distrib_reports table (on which the year is filtered) has even fewer rows
for 2018 (10k rows) compared to 400,000++ rows for 2017, which is to be
expected.

The question is: why does the query planner choose such a different path just
by changing one parameter?

The table structures are below:
https://pastebin.com/T6AmtQ3z

This behaviour is *not-reproducible* on postgres-10. On postgres-10, the
query plans are consistent, and both have very acceptable times:
2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5
2018 --> explain result on postgres-10: https://explain.depesz.com/s/Tf5K

Is this a bug in postgres-9.4.4?

We are considering upgrading to postgres-10, but since this is a very critical
system, it takes a lot of testing and approval :)

Thank you very much.


SV: copy csv into partitioned table with unique index

2018-01-28 Thread Lars Aksel Opsahl
Hi


Sorry, that was one zero too many; it should be 30-40 million weather
observations per second.


Lars



From: Lars Aksel Opsahl
Sent: 28 January 2018 12:57
To: Mariel Cherkassky; PostgreSQL mailing lists
Subject: SV: copy csv into partitioned table with unique index


Hi


We had the same problems with performance when testing with more than 100
billion weather observations. We now have a solution where we can push between
300 and 400 million weather observations per second into the database.

We download data from NetCDF files. The data are joined based on time and
geolocation, so data from many different NetCDF files should end up in the
same row and table.

Doing this with database joins and table updates was taking a long time, as
you have also noticed.

To get high performance we ended up with this solution:

- Use OS commands like awk, sed, join, cut, ... to prepare the CSV files for
the database.

- Use multiple threads.

- Insert data directly into child tables.

- No triggers, constraints or indexes on the working table.

- Don't update rows.

- Unlogged tables.

We first download the NetCDF files and make CSV files that fit the COPY
command perfectly, with a complete file for each child table they are created
for; this is a time-consuming operation.

So before the copy into the database we just do a truncate on the selected
table. We are then able to insert between 300 and 400 million weather
observations per second. We have 11 observations per row, so that means
around 35 million rows per second. We have one child table for each year and
month.

The database we are working on has 16 dual-core CPUs and SSD disks. When
testing I was running 11 threads in parallel.

Indexes and constraints are added later based on needs.

How can you take the chance of using something like unlogged tables?

Linux systems are quite stable, and if we keep a copy of the CSV files it does
not take long to insert the data again after a crash.


You can also change your table to logged later if you need to secure your data 
in the database.


Lars




From: Mariel Cherkassky
Sent: 28 January 2018 10:11
To: PostgreSQL mailing lists
Subject: copy csv into partitioned table with unique index

Hi,
I configured a master table that is called "year_2018":
create table year_2018(a int, b int, c date);

The master table has a unique constraint on those 3 columns so that I won't
have any duplicated rows. Moreover, I configured a before-insert trigger on
that table that creates a child table for each day in the year. The child
should include all the data related to that specific day.

Now, every day I get a csv file that I need to load into the year table. I
must load the data as fast as possible, but I have 2 problems:
1) I must load the data as a bulk - via the copy command. However, the copy
command fails because sometimes I have duplicated rows.
2) I tried to use the pgloader extension, but it fails because I have a
trigger before each insert.

- I can't load all the data into a temp table and then run insert into
year_2018 select * from temp because it takes too much time.

Any idea?




SV: copy csv into partitioned table with unique index

2018-01-28 Thread Lars Aksel Opsahl
Hi


We had the same problems with performance when testing with more than 100
billion weather observations. We now have a solution where we can push between
300 and 400 million weather observations per second into the database.

We download data from NetCDF files. The data are joined based on time and
geolocation, so data from many different NetCDF files should end up in the
same row and table.

Doing this with database joins and table updates was taking a long time, as
you have also noticed.

To get high performance we ended up with this solution:

- Use OS commands like awk, sed, join, cut, ... to prepare the CSV files for
the database.

- Use multiple threads.

- Insert data directly into child tables.

- No triggers, constraints or indexes on the working table.

- Don't update rows.

- Unlogged tables.

We first download the NetCDF files and make CSV files that fit the COPY
command perfectly, with a complete file for each child table they are created
for; this is a time-consuming operation.

So before the copy into the database we just do a truncate on the selected
table. We are then able to insert between 300 and 400 million weather
observations per second. We have 11 observations per row, so that means
around 35 million rows per second. We have one child table for each year and
month.

The database we are working on has 16 dual-core CPUs and SSD disks. When
testing I was running 11 threads in parallel.

Indexes and constraints are added later based on needs.

How can you take the chance of using something like unlogged tables?

Linux systems are quite stable, and if we keep a copy of the CSV files it does
not take long to insert the data again after a crash.


You can also change your table to logged later if you need to secure your data 
in the database.
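
To make the pattern concrete, a rough sketch with made-up table, column and
file names (one unlogged child table per month, loaded with COPY and only
indexed/logged afterwards):

  -- Unlogged child table for one month of observations
  CREATE UNLOGGED TABLE obs_2018_01 (
      obs_time timestamptz,
      geo_id   int,
      value    float8
  );

  -- Reload is idempotent: truncate, then bulk-load the prepared CSV
  TRUNCATE obs_2018_01;
  COPY obs_2018_01 FROM '/data/obs_2018_01.csv' WITH (FORMAT csv);

  -- Add indexes/constraints as needed, and switch to logged if the data must
  -- survive a crash (SET LOGGED is available from 9.5)
  ALTER TABLE obs_2018_01 SET LOGGED;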


Lars




From: Mariel Cherkassky
Sent: 28 January 2018 10:11
To: PostgreSQL mailing lists
Subject: copy csv into partitioned table with unique index

Hi,
I configured a master table that is called "year_2018":
create table year_2018(a int, b int, c date);

The master table has a unique constraint on those 3 columns so that I won't
have any duplicated rows. Moreover, I configured a before-insert trigger on
that table that creates a child table for each day in the year. The child
should include all the data related to that specific day.

Now, every day I get a csv file that I need to load into the year table. I
must load the data as fast as possible, but I have 2 problems:
1) I must load the data as a bulk - via the copy command. However, the copy
command fails because sometimes I have duplicated rows.
2) I tried to use the pgloader extension, but it fails because I have a
trigger before each insert.

- I can't load all the data into a temp table and then run insert into
year_2018 select * from temp because it takes too much time.

Any idea?




Re: copy csv into partitioned table with unique index

2018-01-28 Thread legrand legrand
Did you try to transform your temp table into a table partition using the
ALTER TABLE ... ATTACH PARTITION syntax?
https://www.postgresql.org/docs/10/static/ddl-partitioning.html
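
A minimal sketch of that approach with PG 10 declarative partitioning (table,
file and bound names are illustrative, not the original trigger-based schema;
note that uniqueness is only enforced per partition, and duplicates inside one
file still have to be dealt with before the constraint is added):

  -- Range-partitioned parent, one partition per day
  CREATE TABLE year_2018 (a int, b int, c date) PARTITION BY RANGE (c);

  -- Load each daily file into a standalone table, enforce uniqueness there,
  -- then attach it as that day's partition
  CREATE TABLE year_2018_d20180129 (a int, b int, c date);
  COPY year_2018_d20180129 FROM '/tmp/2018-01-29.csv' WITH (FORMAT csv);
  ALTER TABLE year_2018_d20180129 ADD UNIQUE (a, b, c);
  ALTER TABLE year_2018 ATTACH PARTITION year_2018_d20180129
      FOR VALUES FROM ('2018-01-29') TO ('2018-01-30');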

Regards
PAscal






copy csv into partitioned table with unique index

2018-01-28 Thread Mariel Cherkassky
Hi,
I configured a master table that is called "year_2018":
create table year_2018(a int, b int, c date);

The master table has a unique constraint on those 3 columns so that I won't
have any duplicated rows. Moreover, I configured a before-insert trigger on
that table that creates a child table for each day in the year. The child
should include all the data related to that specific day.

Now, every day I get a csv file that I need to load into the year table. I
must load the data as fast as possible, but I have 2 problems:
1) I must load the data as a bulk - via the copy command. However, the copy
command fails because sometimes I have duplicated rows.
2) I tried to use the pgloader extension, but it fails because I have a
trigger before each insert.

- I can't load all the data into a temp table and then run insert into
year_2018 select * from temp because it takes too much time.

Any idea?