Re: 8.2 Autovacuum BUG ?
On Jan 26, 2018 6:02 AM, "Pavan Teja" wrote: On Jan 26, 2018 3:00 AM, "Alvaro Herrera" wrote: pavan95 wrote: > Hi Álvaro Herrera, > > Please find the corresponding output: OK, these settings look pretty normal, so they don't explain your problem. What is checkpoint_segments set to? And checkpoint_timeout? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services Hi, checkpoint_segments are set to '3' & checkpoint_timeout was set to '5min'. Regards, Pavan. Any clue??? Regards, Pavan.
Re: [PERFORM] performance drop after upgrade (9.6 > 10)
On Thu, Oct 26, 2017 at 02:45:15PM -0500, Justin Pryzby wrote: > On Tue, Oct 24, 2017 at 04:15:59PM +0200, johannes graën wrote: > > Hi Pavel, *, > > > > you were right with ANALYZing the DB first. However, even after doing > > so, I frequently see Seq Scans where an index was used before. This > > usually cooccurs with parallelization and looked different before > > upgrading to 10. I can provide an example for 10 [1], but I cannot > > generate a query plan for 9.6 anymore. > > > > Any ideas what makes the new version more seqscanny? > > Is it because max_parallel_workers_per_gather now defaults to 2 ? > > BTW, I would tentatively expect a change in default to be documented in the > release notes but can't see that it's. > 77cd477c4ba885cfa1ba67beaa82e06f2e182b85 Oops, you are correct. The PG 10 release notes, which I wrote, should have mentioned this. :-( https://www.postgresql.org/docs/10/static/release-10.html -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Query Slow After 2018
On Sunday, January 28, 2018, Nur Agus wrote: > > Is this a bug on postgres-9.4.4 ? > > We are considering upgrade to postgres-10 but since this is a very > critical system, it takes a lot of test and approval :) > > Upgrade to 9.4.15. Asking if 9.4.4 might have a bug is a question most people here aren't going to be inclined to answer given the 11 bug fixes that version has received since. David J.
Re: Query Slow After 2018
On Mon, Jan 29, 2018 at 12:32:59AM +0700, Nur Agus wrote: > The following query run in just 9 ms: > "distrib_reports"."month" = 1 AND > "distrib_reports"."year" = 2017 AND > "distrib_reports"."state" = 'SUBMITTED' AND > "distrib_report_groups"."distrib_report_group_type_id" = > '559a5fdc-418d-4494-aebf-80ecf8743d35' > The explain analyze of the 2 queries are resulting on really different > query plan, here are the links to depesz: > 2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1 > 2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y > The question is, why the query planner choose such very different path just > by changing one parameter? Looks like this badly underestimates its rowcount: Index Scan using index_distrib_reports_on_year on distrib_reports (cost=0.42..40.62 rows=8 width=32) (actual time=0.034..50.452 rows=17,055 loops=1) Index Cond: (year = 2018) Filter: ((month = 1) AND ((state)::text = 'SUBMITTED'::text)) Rows Removed by Filter: 1049 Maybe because "if year==2018" then, month=1 does essentialy nothing .. ..but postgres thinks it'll filters out some 90% of the rows. And possibly the same for SUBMITTED (?) You should probably use timestamp column rather than integer year+month. On PG10, you could probably work around it using "CREATE STATISTICS". > This behaviour is *not-reproducable* on postgres-10. On postgres-10, the > query plan are consistent, and both have very acceptable time: > 2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5 > 2018 --> --> explain result on postgres-10: > https://explain.depesz.com/s/Tf5K ..I think default max_parallel_workers_per_gather=3 by chance causes the plan to be the same. I think there's still a underestimate rowcount with PG10 (without CREATE STATISTICS), but it's masked by "rows=1 actual rows=0" roundoff error with high loop count. Justin
Query Slow After 2018
Hello, I would like to report a strange behaviour on postgresql 9.4.4. The following query run in just 9 ms: SELECT SUM("distrib_report_items"."qty") AS sum_id FROM "distrib_report_items" INNER JOIN "retailers" ON "retailers"."id" = "distrib_report_items"."retailer_id" INNER JOIN "distrib_reports" ON "distrib_reports"."id" = "distrib_report_items"."distrib_report_id" INNER JOIN "distrib_report_groups" ON "distrib_report_groups"."id" = "distrib_reports"."distrib_report_group_id" WHERE "retailers"."sub_district_id" = 'f4bff929-f911-4ab8-b1b2-aaa50e0ccb39' AND "distrib_report_items"."product_id" = '05167ad0-d2fa-4a4a-bd13-be8f89ce34a2' AND "distrib_reports"."month" = 1 AND "distrib_reports"."year" = 2017 AND "distrib_reports"."state" = 'SUBMITTED' AND "distrib_report_groups"."distrib_report_group_type_id" = '559a5fdc-418d-4494-aebf-80ecf8743d35' But changing just one parameter (the year) from 2017 to 2018, the "exactly same query", become incredebly slow, at 8 seconds. This is the full query after changing the year: SELECT SUM("distrib_report_items"."qty") AS sum_id FROM "distrib_report_items" INNER JOIN "retailers" ON "retailers"."id" = "distrib_report_items"."retailer_id" INNER JOIN "distrib_reports" ON "distrib_reports"."id" = "distrib_report_items"."distrib_report_id" INNER JOIN "distrib_report_groups" ON "distrib_report_groups"."id" = "distrib_reports"."distrib_report_group_id" WHERE "retailers"."sub_district_id" = 'f4bff929-f911-4ab8-b1b2-aaa50e0ccb39' AND "distrib_report_items"."product_id" = '05167ad0-d2fa-4a4a-bd13-be8f89ce34a2' AND "distrib_reports"."month" = 1 AND "distrib_reports"."year" = 2018 AND "distrib_reports"."state" = 'SUBMITTED' AND "distrib_report_groups"."distrib_report_group_type_id" = '559a5fdc-418d-4494-aebf-80ecf8743d35' The explain analyze of the 2 queries are resulting on really different query plan, here are the links to depesz: 2017 --> explain result on postgres-9: https://explain.depesz.com/s/qJF1 2018 --> explain result on postgres-9: https://explain.depesz.com/s/pT0y The table growth itself are normal. distrib_report_items table are growing from 1.9++ millions row on december 2017 to 2.3++ million rows on january 2018. Not a really significant growth. The distrib_reports table (on which the year is filtered) has even less rows on 2018 (10k rows) compared to 400.000++ rows on 2017, which is very obvious. The question is, why the query planner choose such very different path just by changing one parameter? The table structures are below: https://pastebin.com/T6AmtQ3z This behaviour is *not-reproducable* on postgres-10. On postgres-10, the query plan are consistent, and both have very acceptable time: 2017 --> explain result on postgres-10: https://explain.depesz.com/s/N9r5 2018 --> --> explain result on postgres-10: https://explain.depesz.com/s/Tf5K Is this a bug on postgres-9.4.4 ? We are considering upgrade to postgres-10 but since this is a very critical system, it takes a lot of test and approval :) Thank you very much.
SV: copy csv into partitioned table with unique index
Hei Sorry it's was a zero to much, it should 30-40 million weather observations pr second. Lars Fra: Lars Aksel Opsahl Sendt: 28. januar 2018 12:57 Til: Mariel Cherkassky; PostgreSQL mailing lists Emne: SV: copy csv into partitioned table with unique index Hi We had the same problems with performance when testing with more than 100 billion weather observations. We now have a solution where we can push between 300 and 400 million weather observations pr. Second into the database. We download date from NetCDF files. The date are joined based on time and geolocation, so data from many different NetCDF should end up in the same row and table. Doing this based on database join and table update was taking a long time as you also have noticed. To get high performance we ended up this solution - Uses os commands like awk,sed,join,cut,.. to prepare CSV file for the database. - Use multithreads. - Insert data directly into child tables. - No triggers, constraints and indexes on working table. - Don't update rows. - Unlogged tables. We first download NetCDF and make CSV files that fits in perfect for the copy a command and with complete files for each child tables it's created for, this is a time consuming operation. So before the copy in into database we just do a truncate on the selected table. We are then able to insert between 300 and 400 mill. weather observations pr. Second. We have 11 observations pr row so it means around 35 mill rows pr second. We have one child table for each year and month. The database we working on have 16 dual core CPU's and SSD discs. When testing I was running 11 threads in parallel. Indexes and constraints are added later based on needs. How can you take on chance om using something like unlogged tables? Linux system are quite stable and if we keep the a copy of the CVS files it does not take long time to insert data after crash. You can also change your table to logged later if you need to secure your data in the database. Lars Fra: Mariel Cherkassky Sendt: 28. januar 2018 10:11 Til: PostgreSQL mailing lists Emne: copy csv into partitioned table with unique index Hi, I configured a master table that is called "year_2018" : create table year_2018(a int,b int c date); The master table has a unique constraint on those 3 columns so that I wont have any duplicated rows. Moreover, I configured a before insert trigger on that table that creates a child table for each day in the year. The child should include all the data related to that specific day. Now, every day I got a csv file that I need to load to the year table. I must load the data as fast as possible but I have 2 problems : 1)I must load the data as a bulk - via the copy command. However, the copy command fails because sometimes I have duplicated rows. 2)I tried to use the pgloader extension but it fails because I have a trigger before each insert. -I cant load all the data into a temp table and then run insert into year_2018 select * from temp because it takes too much time. Any idea ?
SV: copy csv into partitioned table with unique index
Hi We had the same problems with performance when testing with more than 100 billion weather observations. We now have a solution where we can push between 300 and 400 million weather observations pr. Second into the database. We download date from NetCDF files. The date are joined based on time and geolocation, so data from many different NetCDF should end up in the same row and table. Doing this based on database join and table update was taking a long time as you also have noticed. To get high performance we ended up this solution - Uses os commands like awk,sed,join,cut,.. to prepare CSV file for the database. - Use multithreads. - Insert data directly into child tables. - No triggers, constraints and indexes on working table. - Don't update rows. - Unlogged tables. We first download NetCDF and make CSV files that fits in perfect for the copy a command and with complete files for each child tables it's created for, this is a time consuming operation. So before the copy in into database we just do a truncate on the selected table. We are then able to insert between 300 and 400 mill. weather observations pr. Second. We have 11 observations pr row so it means around 35 mill rows pr second. We have one child table for each year and month. The database we working on have 16 dual core CPU's and SSD discs. When testing I was running 11 threads in parallel. Indexes and constraints are added later based on needs. How can you take on chance om using something like unlogged tables? Linux system are quite stable and if we keep the a copy of the CVS files it does not take long time to insert data after crash. You can also change your table to logged later if you need to secure your data in the database. Lars Fra: Mariel Cherkassky Sendt: 28. januar 2018 10:11 Til: PostgreSQL mailing lists Emne: copy csv into partitioned table with unique index Hi, I configured a master table that is called "year_2018" : create table year_2018(a int,b int c date); The master table has a unique constraint on those 3 columns so that I wont have any duplicated rows. Moreover, I configured a before insert trigger on that table that creates a child table for each day in the year. The child should include all the data related to that specific day. Now, every day I got a csv file that I need to load to the year table. I must load the data as fast as possible but I have 2 problems : 1)I must load the data as a bulk - via the copy command. However, the copy command fails because sometimes I have duplicated rows. 2)I tried to use the pgloader extension but it fails because I have a trigger before each insert. -I cant load all the data into a temp table and then run insert into year_2018 select * from temp because it takes too much time. Any idea ?
Re: copy csv into partitioned table with unique index
Did you try to transform your temp table into a table partition using the ALTER TABLE ATTACH syntax https://www.postgresql.org/docs/10/static/ddl-partitioning.html Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html
copy csv into partitioned table with unique index
Hi, I configured a master table that is called "year_2018" : create table year_2018(a int,b int c date); The master table has a unique constraint on those 3 columns so that I wont have any duplicated rows. Moreover, I configured a before insert trigger on that table that creates a child table for each day in the year. The child should include all the data related to that specific day. Now, every day I got a csv file that I need to load to the year table. I must load the data as fast as possible but I have 2 problems : 1)I must load the data as a bulk - via the copy command. However, the copy command fails because sometimes I have duplicated rows. 2)I tried to use the pgloader extension but it fails because I have a trigger before each insert. -I cant load all the data into a temp table and then run insert into year_2018 select * from temp because it takes too much time. Any idea ?