[GENERAL] Getting getsockopt(TCP_KEEPALIVE) failed LOG message in PG Logs on Solaris 10
Hi, I am getting a "getsockopt(TCP_KEEPALIVE) failed: Option not supported by protocol" log message in the PG logs whenever I run a query referencing pg_catalog.pg_settings on Solaris 10 (both SPARC and x86-64). You can reproduce this by running a query like: SELECT name, setting, unit FROM pg_catalog.pg_settings; This happens only the first time you execute the query, though. The issue was reproducible for me on both PostgreSQL 9.0.4 and PostgreSQL 9.2.4. It happens when I run the query via psql or pgAdminIII by connecting to the database server over a TCP socket (i.e. by providing host details with -h). The same behavior is not reproducible if I connect to the database server over a Unix socket. I am not sure if this has been reported before, but I thought I should bring this issue to your notice. regards, Dhiraj Chawla Senior Software Engineer EnterpriseDB Corporation The Enterprise PostgreSQL Company Phone: +91-20-30589522
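For reference, the values pg_settings has to report for the keepalive GUCs come from the connection's socket, which is presumably where the getsockopt() call happens (an assumption on my part; I have not traced the Solaris code path):

SELECT name, setting
FROM pg_settings
WHERE name LIKE 'tcp_keepalives%';

Over a Unix-domain socket these simply report 0, which would be consistent with the problem only appearing on TCP connections.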
[GENERAL] Incorrect index being used
Hi PostgreSQL community,

I have the following query, run immediately after executing VACUUM in the database. There is only one connection to the database. The query runs for much longer than I expect it to, and I think this is due to it using the incorrect subplan. As you can see, subplans 1 and 3 make use of an index, but these subplans are not used. Subplans 2 and 4 are seqscans, and they are used. How can I get PostgreSQL to use subplans 1 and 3?

Thanks,
Jesse

testdb=> explain analyse
SELECT *
FROM ARCHIVE_DOCUMENT AS r0
WHERE r0.NODE_ID = 29
  AND r0.ARCHIVE_DATE <= '2013-07-08 18:28:00'
  AND (EXISTS (SELECT r1.*
               FROM ARCHIVE_DOCUMENT_INDEX AS r1
               WHERE r1.ARCHIVE_ID = r0.ID
                 AND r1.NODE_ID = r0.NODE_ID
                 AND r1.VALUE = 'BSH70002152')
       OR EXISTS (SELECT r2.*
                  FROM ARCHIVE_DOCUMENT_INDEX AS r2
                  WHERE r2.ARCHIVE_ID = r0.ID
                    AND r2.NODE_ID = r0.NODE_ID
                    AND r2.VALUE = 'TC212592'))
ORDER BY r0.ARCHIVE_DATE DESC
LIMIT 10;

                                 QUERY PLAN
 Limit  (cost=0.56..151.79 rows=10 width=122) (actual time=44601.350..97649.196 rows=2 loops=1)
   ->  Index Scan Backward using idx_archive_document_x_node_id_archive_date on archive_document r0  (cost=0.56..8258406.24 rows=546105 width=122) (actual time=44601.33
         Index Cond: ((node_id = 29) AND (archive_date <= '2013-07-08 18:28:00'::timestamp without time zone))
         Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
         Rows Removed by Filter: 710851
         SubPlan 1
           ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never executed)
                 Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'BSH70002152'::text))
                 Heap Fetches: 0
         SubPlan 2
           ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958104.00 rows=1520 width=16) (actual time=44418.383..44558.293 rows=4 loops=1)
                 Filter: ((value)::text = 'BSH70002152'::text)
                 Rows Removed by Filter: 95009919
         SubPlan 3
           ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never executed)
                 Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'TC212592'::text))
                 Heap Fetches: 0
         SubPlan 4
           ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958104.00 rows=1520 width=16) (actual time=41659.464..41663.342 rows=1 loops=1)
                 Filter: ((value)::text = 'TC212592'::text)
                 Rows Removed by Filter: 95009922
 Total runtime: 97683.836 ms
(22 rows)

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Can checkpoint creation be parallel?
高健 wrote: "The background writer and ordinary backends might write data (for their own reasons) that the checkpointer would have otherwise needed to write anyway." And do the ordinary backends refer to the ones created when a client makes a connection to PG?

Yes.

Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Hi, Friends, are there any ETL tools (free or commercial) available for PostgreSQL?
On Mon, Oct 7, 2013 at 8:32 PM, sunpeng blueva...@gmail.com wrote: Hi, Friends, are there any ETL tools (free or commercial) available for PostgreSQL?

ETL tools for PostgreSQL:

Definition: an ETL process loads data into the database from a flat file in three steps: A. Extract, B. Transform, C. Load.

1. PGLoader - loads .csv files
2. Benetl 1.8 - loads .txt, .csv or .xls files
3. Talend Open Studio = http://www.talend.com/index.php
4. Pentaho Kettle = http://kettle.pentaho.com/
5. Visual Importer = data can be imported from flat files, Excel, MS Access, Oracle, MySQL, Interbase, PostgreSQL, OLE DB, ODBC and DBF files.
6. CloverETL = http://en.wikipedia.org/wiki/CloverETL

Both [3 and 4] of these ETL solutions have a lot of the same types of features:
- Both are Java based and run on Linux and Windows (Talend is Eclipse-based)
- Visual designers for creating the transformations
- Connectivity for a myriad of databases, including all the big DBs, text formats, etc.
- Support for distributing jobs across multiple servers if you are doing serious lifting
- Excellent error handling and error notification systems
- Active and helpful forums (Kettle is older and seems to have a larger community, however)
- Free and open source
- They are complex enough to handle a lot of tasks, but not so much as to kill you
- There are versions of both (GeoKettle and Spatial Data Integrator) that are tailored for GIS

Thanks & Regards, Raghu Ram
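For a simple flat-file load it is also worth remembering that no external tool is needed at all; a minimal sketch, assuming a hypothetical table and a server-readable CSV file:

-- hypothetical staging table matching the CSV layout
CREATE TABLE sales_staging (sold_at date, sku text, amount numeric);

-- server-side load; the path is as seen by the database server
COPY sales_staging FROM '/var/tmp/sales.csv' WITH (FORMAT csv, HEADER true);

From psql, \copy does the same using a client-side file.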
Re: [GENERAL] Many, many materialised views - Performance?
On Oct 9, 2013, at 4:08, Kevin Grittner kgri...@ymail.com wrote: Toby Corkindale toby.corkind...@strategicdata.com.au wrote: In this instance, we have a lot of queries that build certain aggregate results, which are very slow. The queries were initially all implemented as views, but then we started doing a type of materialising of our own, turning them into tables with CREATE TABLE AS SELECT This does make the results very fast to access now, but the side effect is a vast number of (very small) tables. If you have multiple tables with identical layout but different subsets of the data, you will probably get better performance by putting them into a single table with indexes which allow you to quickly search the smaller sets within the table.

I was thinking just that while reading Toby's message. For example, you could put the results of several related aggregations into a single materialized view, if they share the same key columns (year, month, factory or something similar). I'm not sure the new built-in materialized views can be updated like that though, unless you manage to combine those aggregations into a single monster-query, but that will probably not perform well... What we tend to do at work (no PostgreSQL, unfortunately) is to use external tools to combine those aggregated results and store that back into the database (which we often need to do anyway, as we deal with several databases on several servers).

Additionally, if you have that many tables, it sounds like you partitioned your data. With aggregated results, the need for partitioning is much less (or perhaps it isn't even needed at all). And perhaps you don't even need the data from all partitions; say if you have monthly partitions of data, do you really need aggregated results from 5 years ago? That said, users excel in finding data to request that you thought they wouldn't need.

Which brings me to another question: Do your users really need the data from all those views or do they only think they need that? Frequently, users create elaborate Excel sheets and then request tons of data to fill them, while what they're really interested in is the _result_ of that Excel sheet. If you can provide them with that, they're happy and you can rest assured that they're at least using correct results. Plus, it removes some of _this_ burden from your database. I've seen users who're busy creating sheets like that for 2 weeks, twice a year, to create data that I can prepare for them in a couple of days into a report that takes 2 minutes to load (which is long, but not compared to their 2 weeks).

Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
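As a sketch of that idea (all table and column names here are made up for illustration), several related aggregates can share one materialized view keyed on the common columns:

CREATE MATERIALIZED VIEW sales_agg_monthly AS
SELECT factory,
       date_trunc('month', sold_at) AS month,
       count(*)    AS n_sales,
       sum(amount) AS total_amount,
       avg(amount) AS avg_amount
FROM sales
GROUP BY factory, date_trunc('month', sold_at);

-- rebuilt on demand with 9.3's built-in materialized views
REFRESH MATERIALIZED VIEW sales_agg_monthly;

On earlier releases the same pattern works with CREATE TABLE AS plus a manual refresh (drop and re-create, or delete and re-insert).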
Re: [GENERAL] Incorrect index being used
Jesse Long wrote: I have the following query, run immediately after executing VACUUM in the database. There is only one connection to the database.

You should run ANALYZE, not VACUUM.

The query runs for much longer than I expect it to run for, and I think this is due to it using the incorrect subplan. As you can see, subplans 1 and 3 make use of an index, but these subplans are not used. Subplans 2 and 4 are seqscans, and they are used. How can I get PostgreSQL to use subplans 1 and 3?

They are only possible if an Index Only Scan is possible, which can only be used if the respective table entries are visible for all transactions.

testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE r0.NODE_ID = 29 AND r0.ARCHIVE_DATE <= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.* FROM ARCHIVE_DOCUMENT_INDEX AS r1 WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE = 'BSH70002152') OR EXISTS (SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID = r0.ID AND r2.NODE_ID = r0.NODE_ID AND r2.VALUE = 'TC212592')) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;
[...]
Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
Rows Removed by Filter: 710851
[...]
SubPlan 2
  ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958104.00 rows=1520 width=16) (actual time=44418.383..44558.293 rows=4 loops=1)
        Filter: ((value)::text = 'BSH70002152'::text)
        Rows Removed by Filter: 95009919
[...]
SubPlan 4
  ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958104.00 rows=1520 width=16) (actual time=41659.464..41663.342 rows=1 loops=1)
        Filter: ((value)::text = 'TC212592'::text)
        Rows Removed by Filter: 95009922

The estimates are quite off. Does ANALYZE archive_document, possibly after increasing default_statistics_target, make a difference?

Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
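A minimal sketch of that suggestion (the target of 500 is only an example value):

-- raise the target for this session, then gather fresh statistics
SET default_statistics_target = 500;
ANALYZE archive_document;
ANALYZE archive_document_index;

-- or persistently, for just the skewed column
ALTER TABLE archive_document_index ALTER COLUMN value SET STATISTICS 500;
ANALYZE archive_document_index;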
Re: [GENERAL] Incorrect index being used
On 09/10/2013 12:10, Albe Laurenz wrote: You should run ANALYZE, not VACUUM. [...] They are only possible if an Index Only Scan is possible, which can only be used if the respective table entries are visible for all transactions. [...] The estimates are quite off. Does ANALYZE archive_document, possibly after increasing default_statistics_target, make a difference? Yours, Laurenz Albe

Hi Laurenz,

Thank you for the feedback. There is no problem with row visibility; there is only one connection to the database - the connection I am using to do these selects.

Thank you for the advice regarding ANALYZE.
I ran ANALYZE on both tables concerned, but not much changed:

                                 QUERY PLAN
 Limit  (cost=0.56..151.85 rows=10 width=122) (actual time=40841.984..85668.213 rows=2 loops=1)
   ->  Index Scan Backward using idx_archive_document_x_node_id_archive_date on archive_document r0  (cost=0.56..7627640.20 rows=504186 width=122) (actual time=40841.98
         Index Cond: ((node_id = 29) AND (archive_date <= '2013-07-08 18:28:00'::timestamp without time zone))
         Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))
         Rows Removed by Filter: 710851
         SubPlan 1
           ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never executed)
                 Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'BSH70002152'::text))
                 Heap Fetches: 0
         SubPlan 2
           ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958101.80 rows=1568 width=16) (actual time=36633.365..40841.909 rows=4 loops=1)
                 Filter: ((value)::text = 'BSH70002152'::text)
                 Rows Removed by Filter: 95009919
         SubPlan 3
           ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never executed)
                 Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'TC212592'::text))
                 Heap Fetches: 0
         SubPlan 4
           ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958101.80 rows=1568 width=16) (actual time=40241.599..44462.485 rows=1 loops=1)
Re: [GENERAL] Many, many materialised views - Performance?
On Tue, 8 Oct 2013 19:08:45 -0700 (PDT) Kevin Grittner kgri...@ymail.com wrote: In this instance, we have a lot of queries that build certain aggregate results, which are very slow. The queries were initially all implemented as views, but then we started doing a type of materialising of our own, turning them into tables with CREATE TABLE AS SELECT This does make the results very fast to access now, but the side effect is a vast number of (very small) tables. I missed the start of this thread, so apologies if my suggestion is off-base. When there are lots of tables, I've seen performance improvements from distributing the tables through schemas. It seems to improve name resolution performance. -- Bill Moran wmo...@potentialtech.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
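A sketch of what that can look like, with invented names; ALTER TABLE ... SET SCHEMA moves an existing table without rewriting it:

-- group result tables into per-period schemas instead of one flat namespace
CREATE SCHEMA results_2013_10;
ALTER TABLE foo_result_12345 SET SCHEMA results_2013_10;

-- keep search_path short so lookups only scan the relevant schemas
SET search_path = results_2013_10, public;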
Re: [GENERAL] pg_similarity
Now the steps:

$ USE_PGXS=1 make
$ USE_PGXS=1 make install

are working. But I still don't have the file 'SHAREDIR/contrib/pg_similarity.sql'.

Janek Sendrowski -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Hi, Friends, are there any ETL tools (free or commercial) available for PostgreSQL?
On 09/10/2013, at 8:39 PM, raghu ram raghuchenn...@gmail.com wrote: On Mon, Oct 7, 2013 at 8:32 PM, sunpeng blueva...@gmail.com wrote: Hi, Friends, are there any ETL tools (free or commercial) available for PostgreSQL? ETL tools for PostgreSQL: [...]

Not exactly an ETL tool, but I've recently felt a sense of liberation since switching to make/python/psql/wget... for my ETL needs. Make is probably the cornerstone, as it handles dependencies (think foreign key ordering with schema introspection) and parallel loading in a very intuitive way.

Cheers, Tony
Re: [GENERAL] Hi, Friends, are there any ETL tools (free or commercial) available for PostgreSQL?
On 09/10/2013, at 11:03 PM, Tony Theodore tony.theod...@gmail.com wrote: Not exactly an ETL tool, but I've recently felt a sense of liberation since switching to make/python/psql/wget... for my ETL needs. [...]

Forgot to mention that I do something more like ELT. Postgres itself is the best Transform tool.

Cheers, Tony
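In that ELT spirit, the Transform step reduces to plain SQL once the raw data is loaded; a small sketch with hypothetical tables, assuming the flat file has already been COPYed into staging_orders as text columns:

INSERT INTO orders (order_date, customer_id, amount)
SELECT to_date(order_date_txt, 'YYYY-MM-DD'),
       nullif(customer_id_txt, '')::int,
       amount_txt::numeric
FROM staging_orders;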
[GENERAL] no syntax error on limit1
Hi, Postgres 9.1.9 gives me no syntax error on this, but all the records: with a as (values (1),(2),(3)) select * from a limit1 Cheers, WBL -- Quality comes from focus and clarity of purpose -- Mark Shuttleworth
Re: [GENERAL] no syntax error on limit1
On 9 October 2013 14:24, Willy-Bas Loos willy...@gmail.com wrote: Hi, Postgres 9.1.9 gives me no syntax error on this, but all the records: with a as (values (1),(2),(3)) select * from a limit1 Hi, that's quite OK. The limit1 is just an alias for the table a Szymon
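In other words, since the AS keyword is optional for table aliases, the query is parsed as:

with a as (values (1),(2),(3))
select * from a AS limit1;

which is why all the rows come back: there is no LIMIT clause in it at all.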
Re: [GENERAL] no syntax error on limit1
duh! thx. On Wed, Oct 9, 2013 at 2:30 PM, Szymon Guz mabew...@gmail.com wrote: On 9 October 2013 14:24, Willy-Bas Loos willy...@gmail.com wrote: Hi, Postgres 9.1.9 gives me no syntax error on this, but all the records: with a as (values (1),(2),(3)) select * from a limit1 Hi, that's quite OK. The limit1 is just an alias for the table a Szymon -- Quality comes from focus and clarity of purpose -- Mark Shuttleworth
Re: [GENERAL] no syntax error on limit1
Here the limit1 is a table alias, i.e. the same as writing AS limit1.

jov

On 2013-10-9 8:27 PM, Willy-Bas Loos willy...@gmail.com wrote: Hi, Postgres 9.1.9 gives me no syntax error on this, but all the records: with a as (values (1),(2),(3)) select * from a limit1 Cheers, WBL -- Quality comes from focus and clarity of purpose -- Mark Shuttleworth
Re: [GENERAL] Incorrect index being used
Jesse Long wrote: There is no problem with row visibility; there is only one connection to the database - the connection I am using to do these selects.

No idea why the plans cannot be used. It might be helpful to see the table and index definitions.

Thank you for the advice regarding ANALYZE. I ran ANALYZE on both tables concerned, but not much changed:

Did you try increasing default_statistics_target before ANALYZE?

Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] pg_similarity
Thanks a lot! Now it's working :) Janek Sendrowski -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] streaming replication timeout error
On 10/08/2013 07:58 PM, 高健 wrote:

Hello: My customer encountered some connection timeouts while using one-primary, one-standby streaming replication. The original log is Japanese; because there are no error codes like Oracle's ORA-xxx, I tried to translate the Japanese information into English, but that might not be correct English for PG. The most important part is:

2013-09-22 09:52:47 JST[28297][51d1fbcb.6e89-2][0][XX000]FATAL: Could not receive data from WAL stream: could not receive data from server: connection timeout
scp: /opt/PostgresPlus/9.2AS/data/arch/00AC01F1004A: No such file or directory

I was asked: in what situation will the above fatal error occur? I looked into the postgresql.conf file for the primary and standby server and ran some experiments. I found:

Scenario I: If the WAL file wanted is removed manually, both on primary and standby the log will be like this:

FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 00010011 has already been removed

But I haven't found a good explanation fitting the logs' FATAL error. Can anybody give me some info?

Would seem to me the interesting part is:

scp: /opt/PostgresPlus/9.2AS/data/arch/00AC01F1004A: No such file or directory

Are you using scp to move WAL files to an archive directory? If so, it seems scp is having issues: either a network interruption, or the file is disappearing under it.

Thanks in advance, jian gao

-- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] I need more specific instructions for switching to digest mode for this list
Trying to switch to the digest didn't work. How do I find more specific details about switching?

On Wed, Oct 9, 2013 at 12:27 AM, Bob Futrelle bob.futre...@gmail.com wrote: set pgsql-general digest
Re: [GENERAL] I need more specific instructions for switching to digest mode for this list
On 09/10/2013 16:47, Bob Futrelle wrote: Trying to switch to the digest didn't work. How do I find more specific details about switching? On Wed, Oct 9, 2013 at 12:27 AM, Bob Futrelle bob.futre...@gmail.com wrote: set pgsql-general digest

You sent your command to the list, rather than to the list server. I can't remember the correct address off the top of my head, but it might be something like pgsql-general-requ...@postgresql.org.

Ray.

-- Raymond O'Donnell :: Galway :: Ireland r...@iol.ie -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] I need more specific instructions for switching to digest mode for this list
On Wednesday, October 09, 2013 11:54 AM, Raymond O'Donnell wrote: On 09/10/2013 16:47, Bob Futrelle wrote: Trying to switch to the digest didn't work. How do I find more specific details about switching? [...]

It says it all right at the bottom:

To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Incorrect index being used
Jesse Long j...@unknown.za.net writes: The query runs for much longer than I expect it to run for, and I think this is due to it using the incorrect subplan. As you can see, subplans 1 and 3 make use of an index, but these subplans are not used. Subplans 2 and 4 are seqscans, and they are used. How can I get PostgreSQL to use subplans 1 and 3?

You can't, and you would not like the results if you did.

The construct that's being described (perhaps not very intelligibly) by this EXPLAIN output is an alternative pair of subplans. Actually there are two such alternative pairs in this example. The indexscan variants are subplans that would be fast if executed only once or twice. The seqscan variants, if used, are used to load a hashtable that is then probed for each row of the outer plan. If there are a lot of rows to be considered in the outer plan, then it's better to pay the price of loading the hashtable, because each hashtable probe will be a lot cheaper than doing a fresh indexscan with the comparison value from the current outer row.

In this example, we can see that the outer scan that the subplans are attached to eliminated 710851 rows by means of the subplan filters, meaning that the subplans were probed 710851+2 times. If each of those probes had been done with a separate indexscan, you'd likely still be waiting for the result. Using the seqscan+hashtable was definitely the right choice here.

BTW, the reason it looks like this rather than just hard-wiring the seqscan choice is a planner implementation artifact --- at the time that the subplan plans are created, we don't know how many rows are expected to pass through the outer plan level. So we plan it both ways and leave the choice to be made during executor startup.

What I'd suggest is that you see if you can't get rid of the EXISTS() OR EXISTS() construction in favor of a single EXISTS clause --- I'm too lazy to work out the details but it looks like you could do the OR in the WHERE clause of a single EXISTS sub-select. That would allow the planner to convert the EXISTS into a semi-join, which might work better than what you've got. As is, you're dealing with fairly generic sub-select logic that isn't going to be terribly well optimized.

regards, tom lane

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
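Spelling out that last suggestion as a sketch (untested; an IN list is one way to express the OR inside a single sub-select):

SELECT *
FROM archive_document AS r0
WHERE r0.node_id = 29
  AND r0.archive_date <= '2013-07-08 18:28:00'
  AND EXISTS (SELECT 1
              FROM archive_document_index AS r1
              WHERE r1.archive_id = r0.id
                AND r1.node_id = r0.node_id
                AND r1.value IN ('BSH70002152', 'TC212592'))
ORDER BY r0.archive_date DESC
LIMIT 10;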
Re: [GENERAL] String reverse function?
ginkgo36 wrote Hello everyone I have to reverse a string like EA;BX;CA to CA;BX;EA. or EA,BX,CA to CA,BX,EA Is there any function to do this? Thanks all! No. You will have to write your own. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/String-reverse-funtion-tp5773871p5773887.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] String reverse function?
On 10/9/2013 11:52 AM, David Johnston wrote: ginkgo36 wrote Hello everyone I have to reverse a string like EA;BX;CA to CA;BX;EA. or EA,BX,CA to CA,BX,EA Is there any function to do this? Thanks all! No. You will have to write your own. David J. Based upon the example, it's probably very easy to use a split/explode in your language of choice (VB.NET, perl, python, etc). -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] String reverse function?
2013/10/9 John Meyer johnme...@pueblocomputing.com: On 10/9/2013 11:52 AM, David Johnston wrote: ginkgo36 wrote: Hello everyone I have to reverse a string like EA;BX;CA to CA;BX;EA. or EA,BX,CA to CA,BX,EA Is there any function to do this? Thanks all! No. You will have to write your own. David J. Based upon the example, it's probably very easy to use a split/explode in your language of choice (VB.NET, perl, python, etc).

or SQL

select string_agg(u, ';' order by r desc)
from (select row_number() over () r, u
      from unnest(string_to_array('EA;BX;CA',';')) u) x;

 string_agg
------------
 CA;BX;EA
(1 row)

Regards Pavel

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] String reverse function?
That works too. Point being ginkgo36 has his solution.

On 10/9/2013 12:07 PM, Pavel Stehule wrote: [...] or SQL: select string_agg(u, ';' order by r desc) from (select row_number() over () r, u from unnest(string_to_array('EA;BX;CA',';')) u) x; [...]
Re: [GENERAL] [HACKERS] Urgent Help Required
On 10/8/13 5:55 AM, shailesh singh wrote: HINT: To avoid a database shutdown, execute a full-database VACUUM in "debug". ERROR: could not access status of transaction 449971277 DETAIL: could not open file pg_clog/01AD: No such file or directory

Unless I'm mistaken, that missing CLOG file is a bad sign...?

-- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] declare constraint as valid
Hi,

assuming a constraint is added to a table as NOT VALID. Now I know it IS valid. Can I simply declare it as valid by

update pg_constraint
set convalidated = 't'
where conrelid = (select c.oid
                  from pg_class c
                  join pg_namespace n on (n.oid = c.relnamespace)
                  where c.relname = 'tablename'
                    and n.nspname = 'schemaname')
  and conname = 'constraintname';

instead of

alter table tablename validate constraint ...

Or does the latter have other side effects? I am asking because I want to avoid the ACCESS EXCLUSIVE lock required by the ALTER TABLE. I am sure there are no rows violating the constraint.

Thanks, Torsten -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] streaming replication timeout error
Hello:

Thanks for replying. The recovery.conf file on the standby (DB2) is like this:

standby_mode = 'on'
primary_conninfo = 'host=DB1 port=5432 application_name=testpg user=postgres connect_timeout=10 keepalives_idle=5 keepalives_interval=1'
recovery_target_timeline = 'latest'
restore_command = 'scp -o ConnectTimeout 5 -i /opt/PostgresPlus/9.2AS/.ssh/id_edb DB1:/opt/PostgresPlus/9.2AS/data/arch/%f %p'

I am not familiar with the scp command; I think that here scp is used to copy archived WAL files from the primary to the standby... Maybe the ConnectTimeout is too small, and sometimes when the network is not very good the restore_command will fail and return a FATAL error?

In fact I am a little confused about restore_command: we are using streaming replication, but why is restore_command still needed to copy archived WAL? Isn't that the old warm standby (file shipping)?

Best Regards, jian gao

2013/10/9 Adrian Klaver adrian.kla...@gmail.com: [...]
Re: [GENERAL] ERROR: invalid value ???? for YYYY
But from a user's perspective, why would it ever make sense that by adding an additional where clause, it actually brings more data into the picture? If I have a query returning 100 rows, adding an additional where clause should only cut down the number of rows, not increase it. And the extra data that's showing up is being added to the result set because, without the additional where clause, the result set did not contain any of those rows like pg_statistic etc.

Brian

On Tue, Oct 8, 2013 at 4:10 PM, Steve Crawford scrawf...@pinpointresearch.com wrote:

On 10/08/2013 01:50 PM, Brian Wong wrote: I'm posting this question to pgsql-general (http://www.postgresql.org/list/pgsql-general/). Hopefully someone can share some insights with me. I have a bunch of tables in the database and in a separate schema. The tables' names are in this format: ???_???_???_YYYYMMDD, where the last 8 characters are a date.

When I query either information_schema.tables or pg_tables, extracting the last 8 characters out and converting them to a date works:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '' and table_catalog = '';

But as soon as I reference it in the where clause, it gives a weird error:

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = '' and table_catalog = ''
and to_date(right(table_name, 8), 'YYYYMMDD') is not null;

ERROR: invalid value "tati" for "YYYY"
DETAIL: Value must be an integer.

It seems like some strange values were passed into the to_date function, but I'm seeing that the rightmost 8 characters of all the table names are dates. So only date strings are passed to the to_date function. Absolutely nothing containing the string "tati" is passed to the to_date function. What is going on? Is that a bug?

As Rowan said, you can't necessarily rely on the order of testing in the where clause. In part, this is because one of the first things the planner does is to take the whole shebang and rewrite it into a consolidated statement that it can then refine and optimize. The tests that take place in the view can ultimately happen before, after, or intermixed with the tests in your where-clause as long as they are logically equivalent.

In your example, you are querying information_schema.tables, which is not a table - it is a view that references, among other things, a subset of the pg_catalog.pg_class table. When the planner gets through with its first steps you won't be calling information_schema.tables, you will be calling pg_catalog.pg_class and doing some where-clause tests that logically combine your where-clause with those in the view.

Why "tati"? When I query pg_class directly, the first row has a relname of pg_statistic - it's not in the schema/catalog you seek, but the executor hasn't checked for that yet. The right eight characters of that relname are "tatistic", thus the characters in the "YYYY" position are "tati", so based on the plan and testing order this just happens to be the first thing upon which the execution chokes.

Cheers, Steve
Re: [GENERAL] ERROR: invalid value ???? for YYYY
Brian Wong-2 wrote: But from a user's perspective, why would it ever make sense that by adding an additional where clause, it actually brings more data into the picture? If I have a query returning 100 rows, adding an additional where clause should only cut down the number of rows, not increase it. And the extra data that's showing up is being added to the result set because, without the additional where clause, the result set did not contain any of those rows like pg_statistic etc.

No it does not. Your general case is flawed in that adding an OR condition will indeed cause more rows to be added. In this case you are adding an AND clause, but since it is at the same level as the existing conditions, all possible records must be evaluated against this new clause even if one of the other clauses returns false. There is no short-circuiting. This may be confusing but that is part of the education process.

If that expression (the one in the where clause) did not cause an error, there would be at most the same number of records output as the original query. And the same number of rows are being processed at the WHERE clause in both cases. Since one of the conditions only makes sense on a subset of valid rows, there must be two levels of WHERE clauses in the query - or use CASE WHEN regexp-match THEN test ELSE false END, so the substring test is only performed against valid table names.

David J.

-- View this message in context: http://postgresql.1045698.n5.nabble.com/ERROR-invalid-value-for--tp5773787p5773944.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
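A sketch of that CASE guard applied to the original query (the schema name is a placeholder, and the trailing-digits regexp is just one way to identify the valid names):

select table_name, to_date(right(table_name, 8), 'YYYYMMDD') blah
from information_schema.tables
where table_schema = 'myschema'
  and case when table_name ~ '[0-9]{8}$'
           then to_date(right(table_name, 8), 'YYYYMMDD') is not null
           else false
      end;

The CASE keeps to_date() from ever seeing a name like pg_statistic, because a CASE condition is evaluated before its THEN branch.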
Re: [GENERAL] String reverse function?
Thanks a lot everyone. Actually, I have just started learning about programming languages, so each of your solutions is a good suggestion for me to learn from. Thanks so much.

Please help me with one user-defined function that solves this, when my data looks like:

EA;BX;CA
CA;EA
BX;EA

and when the UDF runs, the output is:

CA;BX;EA
EA;CA
EA;BX

Once again, thanks so much.

Pavel Stehule wrote: [...] select string_agg(u, ';' order by r desc) from (select row_number() over () r, u from unnest(string_to_array('EA;BX;CA',';')) u) x; [...]

-- View this message in context: http://postgresql.1045698.n5.nabble.com/String-reverse-funtion-tp5773871p5773950.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
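One way to wrap Pavel's query into a function - a sketch only, reusing the row_number()/unnest trick from this thread and taking the delimiter as a second argument:

CREATE OR REPLACE FUNCTION reverse_tokens(text, text)
RETURNS text
LANGUAGE sql IMMUTABLE AS $$
  -- $1 = input string, $2 = delimiter (';' or ',')
  SELECT string_agg(u, $2 ORDER BY r DESC)
  FROM (SELECT row_number() OVER () AS r, u
        FROM unnest(string_to_array($1, $2)) AS u) s
$$;

-- usage:
SELECT reverse_tokens('EA;BX;CA', ';');  -- CA;BX;EA
SELECT reverse_tokens('EA,BX,CA', ',');  -- CA,BX,EA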
Re: [HACKERS] [GENERAL] Urgent Help Required
On 10/8/2013 8:35 AM, Chris Travers wrote: First, while vacuum is usually preferred to vacuum full, in this case, I usually find that vacuum full clears up enough cruft to be worth it (not always, but especially if you are also having performance issues).

IIRC, vacuum full was pretty broken in 8.1, which the output the original poster showed indicated they were running.

-- john r pierce 37N 122W somewhere on the middle of the left coast -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Forms for entering data into postgresql
I have developed an application using MS SQL. I have used MS Access for creating forms to enter data into the database. I am thinking of changing over to PostgreSQL and would also like to use any other available open source tool for creating forms. Are there any free applications available for creating forms similar to the ones I have made in MS Access? Any alternative suggestions will be appreciated.
[GENERAL] Re: [GENERAL] Forms for entering data into postgresql
On Wed, Oct 9, 2013 at 7:05 PM, Sudhir P.B. pb_sud...@hotmail.com wrote: I have developed an application using MS SQL. I have used MS Access for creating forms to enter data into the database. I am thinking of changing over to postgresql and would also like to use any other available open source tool for creating forms. Are there any free applications available for creating forms similar to the ones I have made in MS Access?. Any alternative suggestions will be appreciated. Here is a list of tools that has been more or less maintained over the past several years: http://wiki.postgresql.org/wiki/Community_Guide_to_PostgreSQL_GUI_Tools I have no idea how many of them are still actively developed or used though.
Re: [GENERAL] streaming replication timeout error
On 10/09/2013 05:51 PM, 高健 wrote: [...] In fact I am a little confused about restore_command: we are using streaming replication, but why is restore_command still needed to copy archived WAL? Isn't that the old warm standby (file shipping)?

The best explanation is in the docs: http://www.postgresql.org/docs/9.3/static/warm-standby.html

"At startup, the standby begins by restoring all WAL available in the archive location, calling restore_command. Once it reaches the end of WAL available there and restore_command fails, it tries to restore any WAL available in the pg_xlog directory. If that fails, and streaming replication has been configured, the standby tries to connect to the primary server and start streaming WAL from the last valid record found in archive or pg_xlog. If that fails or streaming replication is not configured, or if the connection is later disconnected, the standby goes back to step 1 and tries to restore the file from the archive again. This loop of retries from the archive, pg_xlog, and via streaming replication goes on until the server is stopped or failover is triggered by a trigger file."

Basically, by having both a restore_command and a primary_conninfo you are telling the standby to do both, following the sequence described above.

FYI, ConnectTimeout is an SSH option passed to scp; man ssh_config will get you more information.

It would seem both your streaming and archiving are using the same network, is that correct? If so you have a single point of failure: the network.

-- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Re: [GENERAL] Forms for entering data into postgresql
On 10/09/2013 07:05 PM, Sudhir P.B. wrote: [...] Are there any free applications available for creating forms similar to the ones I have made in MS Access? Any alternative suggestions will be appreciated.

There are tools available, as mentioned in a previous post. There are good tools in the list and I have tried quite a few of them. Just know they will require a bit more effort than Access to get a finished form. I am not the biggest fan of Access, but it does shine in getting fairly involved forms up quickly. FYI, you can use it with Postgres, which may be the way to go until you have completed the changeover to Postgres.

Welcome to the Postgres community.

-- Adrian Klaver adrian.kla...@gmail.com -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Many, many materialised views - Performance?
On 09/10/13 21:05, Alban Hertroys wrote: On Oct 9, 2013, at 4:08, Kevin Grittner kgri...@ymail.com wrote: [...] If you have multiple tables with identical layout but different subsets of the data, you will probably get better performance by putting them into a single table with indexes which allow you to quickly search the smaller sets within the table. [...] For example, you could put the results of several related aggregations into a single materialized view, if they share the same key columns (year, month, factory or something similar). [...]

Thanks for the suggestions, all. As I noted in an earlier email - we're aware that the schema could be better designed, but making large changes is tricky in production systems.

Many of the tables are actually unique, but only in the sense that you have various (common) identifier fields, and then a few (unique) aggregate-result columns per table, e.g.:

int id_key_1, int id_key_2, .., float FooBarXResult

I suspect the correct way to handle this would actually be a table that looked like:

int id_key_1, int id_key_2, .., text result_name, float result_value

Although that would in turn make other queries more verbose. For example, currently one can do:

select *
from FooResult
join BarResult using (id_key_1, id_key_2)
where FooResultX > 0.9 and BarResultY < 0.1;

I guess that turns into something like this:

select id_key_1, id_key_2,
       a.result_value as FooResultX,
       b.result_value as BarResultY
from AllResults a
join AllResults b using (id_key_1, id_key_2)
where a.result_name = 'FooResultX' and a.result_value > 0.9
  and b.result_name = 'BarResultY' and b.result_value < 0.1;

So it's all do-able, but it does look nicer to separate things into their own tables with named columns.

Additionally, if you have that many tables, it sounds like you partitioned your data. With aggregated results, the need for partitioning is much less (or perhaps it isn't even needed at all). And perhaps you don't even need the data from all partitions; say if you have monthly partitions of data, do you really need aggregated results from 5 years ago?

You're correct, we do have partitioned tables due to the amount of data in the system, but that's for just the non-aggregated data. Those tables perform just fine! It's the hundreds of thousands of views and tables with just a few rows in them that worry me.. :)

That said, users excel in finding data to request that you thought they wouldn't need. Which brings me to another question: Do your users really need the data from all those views or do they only think they need that?
Ah, indeed, users have not individually requested each of these many thousands of tables and views. They are part of a large application, and the results from all of those are required by it. If I rewrote the application today, I'd be looking at doing things very differently, knowing how it would eventually scale. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Many thousands of partitions
Hello,

Thanks for sharing your experiences with the problem.

On 2013-10-09 00:47, Gabriel E. Sánchez Martínez wrote:

Partitioning seems to be a good idea if a single table would be too big to fit in your server's file cache, and also for management, since you can drop partitions of old data and create new ones without having to reindex and lock. Does your data partition nicely by date, for example? If most of the inserts are new data and old data is read-mostly, then partitioning may make sense because you would not have to reindex old partitions. In fact, you could very well not have an index on the hot, write-mostly partition of, say, the current month, until the write activity on that table diminishes, which would make inserts faster. If, on the other hand, your writes are scattered across many partitions, a single large table with an index may be a better solution.

Changes are scattered, so a single large table already is not a good solution. I like the idea of a hot, write-mostly partition, because I might as well use only two partitions and merge changes from the small table to the large one once the processing is done. Rows are grouped by some key, and when I start processing a group I could move all its rows from the large table in a batch (INSERT INTO ... SELECT ... WHERE group = x; DELETE FROM ... WHERE group = x). This way the read-only part of the system will continue to work without problems, and processing should be much faster. Although this will not solve the problem of never-ending vacuums on the large table, postgres could easily become the bottleneck.

I am using 8.3 for this, but I will make an upgrade at some point; however, I don't think it will change the design.

Thanks -- Regards, Grzegorz -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
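A sketch of that batch move between a hot table and the big read-mostly one (table and column names invented; 8.3 has no data-modifying CTEs, so the two statements are simply wrapped in one transaction):

BEGIN;

-- copy one group's finished rows into the large table
INSERT INTO archive_rows
SELECT * FROM hot_rows WHERE group_id = 42;

-- and remove them from the hot table in the same transaction
DELETE FROM hot_rows WHERE group_id = 42;

COMMIT;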