Re: FW: [PERFORM] Performance 8.4.0
On Sun, Aug 2, 2009 at 10:04 PM, Chris Dunn wrote:
> The database is 8gb currently. Used to be a lot bigger, but we moved all
> the large objects out and developed file server storage for them, and we
> are using the default page costs for 8.4. I did have them changed in 8.1.4.

You might want to play with lowering them. The default page costs make page accesses expensive relative to per-tuple operations, which is appropriate if you are I/O-bound, but not so much if you are CPU-bound, and especially not if the whole database is memory-resident. I'd try something like random_page_cost = seq_page_cost = 0.1 for starters, or whatever values were working for you in 8.1, but the sweet spot may be higher or lower.

...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
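A minimal sketch of how the suggested page costs could be tried (the 0.1 values are only Robert's starting point for a fully cached database, not a recommendation for I/O-bound workloads):

```sql
-- In postgresql.conf, or per-session as below. Equal, low page costs tell
-- the planner that page reads are cheap relative to per-tuple CPU work,
-- which matches a database that is entirely resident in RAM.
SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;
-- Re-run the slow queries with EXPLAIN ANALYZE, then adjust up or down
-- from here to find the sweet spot for your workload.
```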
Re: [PERFORM] Greenplum MapReduce
Suvankar Roy wrote:
> Hi all,
> Has anybody worked on Greenplum MapReduce programming ?

It's a commercial product; you need to contact Greenplum.

--
Postgresql & php tutorials
http://www.designmagick.com/
FW: [PERFORM] Performance 8.4.0
The database is 8gb currently. Used to be a lot bigger, but we moved all the large objects out and developed file server storage for them, and we are using the default page costs for 8.4. I did have them changed in 8.1.4.

-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Sunday, 2 August 2009 11:26 PM
To: Chris Dunn
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance 8.4.0

On Fri, Jul 31, 2009 at 12:22 AM, Chris Dunn wrote:
> constraint_exclusion = on

This is critical if you need it, but a waste of CPU time if you don't. Other than that your parameters look good. Are you using the default page cost settings? I see you have 12 GB RAM; how big is your database?

...Robert
Re: [PERFORM] Why is PostgreSQL so slow on Windows ( Postgres 8.3.7) version
parimala wrote:

[Don't repeat your answer. It's a PITA to receive multiple identical copies.]

> We are using Postgres 8.3.7 in our java application. We are doing
> performance tuning and load testing in our setup. We have noticed that
> some of our queries to the database take a long time to return results.
> Please find our setup details below.
>
> We observed that Postgres running on Windows is slower than on Linux.

That is true, and it will be for quite some time. The Windows port is very recent if we compare it with the long road of Unix support.

> Postgresql configuration:
>
> shared_buffers: 1 GB

I don't use Windows, but I have read Windows users saying that it isn't appropriate to set shared_buffers too high there. Take a look at the archives.

> Effective_cache_size: 2GB
> fsync: off (even when we tested with this parameter on, we observed the
> same slowness)

What about the other parameters that differ from the defaults (uncommented parameters)? Also, don't turn off fsync unless you're pretty sure about the consequences.

> We have 300k rows in the PolledData table. In each STATSDATA table we
> have almost 12 to 13 million rows. Every one-minute interval, we insert
> data into the STATSDATA table. In our application, we run insert and
> select queries against the STATSDATA table at regular intervals. Please
> let us know why the query below takes so long to return results. Is
> there anything we need to do to tune the Postgres database?

It seems very strange that your queries are not using the indexes. Do you have autovacuum turned on? Did you recently analyze your tables?

> Merge Cond: (statsdata8_21_2009.pollid = ((polleddata.id)::numeric))

Out of curiosity, why does the foreign key have a different datatype from its primary key?

--
Euler Taveira de Oliveira
http://www.timbira.com/
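One way to remove that datatype mismatch (a sketch only — it assumes the daily STATSDATA tables can be locked briefly for the rewrite; table and column names are taken from the schema posted in this thread):

```sql
-- Align the join column's type with PolledData."ID" (BIGINT) so the
-- planner no longer casts every id to numeric during the merge join.
ALTER TABLE statsdata8_21_2009 ALTER COLUMN pollid TYPE BIGINT;
-- Refresh statistics so the planner sees the new column type:
ANALYZE statsdata8_21_2009;
```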
Re: [PERFORM] Why is PostgreSQL so slow on Windows ( Postgres 8.3.7) version
> some of our queries to the database take a long time to return results.
> fsync: off (even when we tested with this parameter on, we observed the
> same slowness)

If your queries take a long time to return results, I suppose you are talking about SELECTs. fsync = off will not make SELECTs faster (only inserts, updates, deletes), and it is not worth it because you risk data loss. synchronous_commit = off gives about the same advantage (faster commits) as fsync = off, but without the risk of corrupting your database — a crash can only lose the last few committed transactions — so it is much better!

> We have 300k rows in the PolledData table. In each STATSDATA table we
> have almost 12 to 13 million rows.

OK. So you insert 13 million rows per day? That is about 150 rows per second.

> Every one-minute interval, we insert data into the STATSDATA table.

I assume you are making one INSERT INTO statsdata VALUES (... 150 values ...) and not 150 separate inserts, yes?

First Query:

SELECT COUNT(*) FROM ( SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL FROM PolledData, STATSDATA8_21_2009 WHERE ( ( PolledData.ID=STATSDATA8_21_2009.POLLID) AND ( ( TTIME >= 1250838027454) AND ( TTIME <= 1250838079654) ) ) ) t1;

* You could rewrite it as:

SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData
JOIN STATSDATA8_21_2009 ON ( PolledData.ID = STATSDATA8_21_2009.POLLID )
WHERE TTIME BETWEEN ... AND ...

- It is exactly the same query, but much easier to read.

* Some ANALYZE-ing of your tables would be useful, since the estimates from the planner look suspiciously different from reality.
- ANALYZE is fast; you can run it often if you INSERT rows all the time.

* You are joining on POLLID, which is a NUMERIC in one table and a BIGINT in the other table.
- Is there any reason for this type difference?
- Could you use BIGINT in both tables?
- BIGINT is faster than NUMERIC and uses less space.
- Type conversions use CPU cycles too.

* Should StatsData.ID have a foreign key REFERENCES PolledData.ID?
- This won't make the query faster, but if you know all rows in StatsData reference rows in PolledData (because of the FK constraint) and you want a count(*) like above, you don't need to JOIN at all.

* TTIME >= 1250838027454 AND TTIME <= 1250838079654
- TTIME should be TIMESTAMP (with or without TIME ZONE) or BIGINT, but certainly not NUMERIC.
- An index on StatsData.TTIME would be useful; it would avoid the Seq Scan, replacing it with a Bitmap Scan, which is much faster.

* work_mem
- Since you have few connections, you could increase work_mem.

Second Query: same as the first query.

Third Query:

SELECT COUNT(*) FROM (
  SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
  FROM PolledData, STATSDATA8_21_2009
  WHERE ( ( PolledData.ID=STATSDATA8_21_2009.POLLID)
    AND ( ( TTIME >= 1250838027454) AND ( TTIME <= 1250838027454) ) )
  UNION ALL
  SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
  FROM PolledData, STATSDATA8_20_2009
  WHERE ( ( PolledData.ID=STATSDATA8_20_2009.POLLID)
    AND ( ( TTIME >= 1250767134601) AND ( TTIME <= 1250767134601) ) )
) t1;

Basically this is, again, exactly the same query as above, but done twice and UNION ALL'ed.

* You could rewrite it like this:

SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL
FROM PolledData
JOIN (
  SELECT ... FROM STATSDATA8_21_2009 WHERE TTIME BETWEEN ... AND ...
  UNION ALL
  SELECT ... FROM STATSDATA8_20_2009 WHERE TTIME BETWEEN ... AND ...
) AS stats ON ( PolledData.ID = stats.POLLID )

* If TTIME is the current time, and you insert data as it comes, data in the StatsData tables is probably already ordered on TTIME.
- If that is not the case, once a table is filled and becomes read-only, consider a CLUSTER on the index you created on TTIME.
- It will make range queries on TTIME much faster.

* Query plan:

Seq Scan on statsdata8_21_2009 (cost=0.00..70574.88 rows=1 width=32) (actual time=0.047..29066.227 rows=227 loops=1)
Seq Scan on statsdata8_20_2009 (cost=0.00..382519.60 rows=1 width=32) (actual time=3136.008..93985.540 rows=1 loops=1)

Postgres thinks there is 1 row in those tables... that's probably not the case! The first one returns 227 rows, so the plan chosen is a catastrophe.

I was a bit intrigued by your query, so I made a little test...

BEGIN;
CREATE TABLE test( x INT, y INT );
INSERT INTO test (SELECT n,n FROM generate_series( 1,100 ) AS n );
CREATE INDEX test_x ON test( x );
CREATE INDEX test_y ON test( y );
COMMIT;
ANALYZE test;

test=> EXPLAIN ANALYZE SELECT * FROM test a JOIN test b ON (b.x=a.x) WHERE a.x BETWEEN 0 AND 1;
                              QUERY PLAN
------------------------------------------------------------------------
 Hash Join (cost=480.53..23759.14 rows=10406 width=16) (actual time=15.614..1085.085 rows=1 loops=1)
   Hash Cond: (b.x = a.x)
   -> Seq Scan on test b (cost=0.00..14424.76 ro
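The TTIME index and CLUSTER suggestions above could look like this (a sketch only; the index name is an assumption following the naming convention in the posted schema, and each daily table would need the same treatment):

```sql
-- Create the suggested range index on a daily table:
CREATE INDEX statsdata8_21_2009_ttime_ndx ON statsdata8_21_2009 (ttime);

-- Once the table is read-only, physically reorder it by TTIME so range
-- scans touch contiguous pages. This is the 8.3 spelling; on 8.4+ it is
-- CLUSTER statsdata8_21_2009 USING statsdata8_21_2009_ttime_ndx;
CLUSTER statsdata8_21_2009_ttime_ndx ON statsdata8_21_2009;

-- Refresh statistics afterwards:
ANALYZE statsdata8_21_2009;
```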
Re: [PERFORM] Performance Monitoring tool
Depending on what you mean by 'monitor': for up/down monitoring, use Nagios (http://www.nagios.org); for performance monitoring (and I guess the reason why you ask this on the postgresql performance list), use pgstatspack (http://pgfoundry.org/projects/pgstatspack/).

frits

On Fri, Jul 31, 2009 at 7:49 AM, mukeshp wrote:
>
> Hi
>
> Can anyone tell me what the tools are to monitor a Postgres server? I am
> running my Postgres server on a RHEL 5 machine.
Re: [PERFORM] Performance 8.4.0
* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Jul 31, 2009 at 12:22 AM, Chris Dunn wrote:
> > constraint_exclusion = on
>
> This is critical if you need it, but a waste of CPU time if you don't.
> Other than that your parameters look good. Are you using the default
> page cost settings? I see you have 12 GB RAM; how big is your database?

With 8.4, you can set 'constraint_exclusion = partition', where it'll handle inherited tables and UNION ALL queries but not other possible cases. It's set that way by default, and it is pretty inexpensive to leave in place (since it only gets tried when it's likely you want it). I'd recommend setting it to partition under 8.4 rather than disabling it entirely. Under older versions, set it to 'off' if you don't need it.

Thanks,

Stephen
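The version-specific advice above, as a postgresql.conf sketch:

```ini
# postgresql.conf — per the advice above:
# On 8.4 this is also the default: exclusion is attempted only for
# inheritance (partitioned) tables and UNION ALL queries.
constraint_exclusion = partition
# On pre-8.4 versions the parameter is just on/off: set it to 'on' only
# if you actually rely on partitioning; otherwise 'off' saves planner CPU.
```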
[PERFORM] Why is PostgreSQL so slow on Windows ( Postgres 8.3.7) version
Dear All,

We are using Postgres 8.3.7 in our Java application. We are doing performance tuning and load testing in our setup. We have noticed that some of our queries to the database take a long time to return results. Please find our setup details below.

We observed that Postgres running on Windows is slower than on Linux.

Machine & Database Details:

Windows configuration:
4 GB RAM
4 * 1.6 GHz
Windows 2008 Server Standard Edition

Postgresql configuration:
shared_buffers: 1 GB
effective_cache_size: 2 GB
fsync: off (even when we tested with this parameter on, we observed the same slowness)

Database Details:
Postgres Database: PostgreSQL 8.3.7.1
Driver Version: PostgreSQL 8.3 JDBC4 with SSL (build 604)

We are using 40 database connections.

We have a few tables which will hold larger amounts of data. While running our application, a STATSDATA table is created daily with the date in the table name, e.g. STATSDATA8_21_2009.

Schema for the STATSDATA tables:

create table STATSDATA8_21_2009(
    POLLID Numeric(19),
    INSTANCE varchar(100),
    TTIME Numeric(19),
    VAL Numeric(13)
);
CREATE INDEX POLLID%_ndx on STATSDATA%(POLLID)

Schema for PolledData:

create table PolledData(
    "NAME" varchar(50) NOT NULL,
    "ID" BIGINT NOT NULL,
    "AGENT" varchar(50) NOT NULL,
    "COMMUNITY" varchar(100) NOT NULL,
    "PERIOD" INTEGER NOT NULL,
    "ACTIVE" varchar(10),
    "OID" varchar(200) NOT NULL,
    "LOGDIRECTLY" varchar(10),
    "LOGFILE" varchar(100),
    "SSAVE" varchar(10),
    "THRESHOLD" varchar(10),
    "ISMULTIPLEPOLLEDDATA" varchar(10),
    "PREVIOUSSEVERITY" INTEGER,
    "NUMERICTYPE" INTEGER,
    "SAVEABSOLUTES" varchar(10),
    "TIMEAVG" varchar(10),
    "PORT" INTEGER,
    "WEBNMS" varchar(100),
    "GROUPNAME" varchar(100),
    "LASTCOUNTERVALUE" BIGINT,
    "LASTTIMEVALUE" BIGINT,
    "TIMEVAL" BIGINT NOT NULL,
    "POLICYNAME" varchar(100),
    "THRESHOLDLIST" varchar(200),
    "DNSNAME" varchar(100),
    "SUFFIX" varchar(20),
    "STATSDATATABLENAME" varchar(100),
    "POLLERNAME" varchar(200),
    "FAILURECOUNT" INTEGER,
    "FAILURETHRESHOLD" INTEGER,
    "PARENTOBJ" varchar(100),
    "PROTOCOL" varchar(50),
    "SAVEPOLLCOUNT" INTEGER,
    "CURRENTSAVECOUNT" INTEGER,
    "SAVEONTHRESHOLD" varchar(10),
    "SNMPVERSION" varchar(10),
    "USERNAME" varchar(30),
    "CONTEXTNAME" varchar(30),
    PRIMARY KEY ("ID","NAME","AGENT","OID"),
    index PolledData0_ndx ( "NAME"),
    index PolledData1_ndx ( "AGENT"),
    index PolledData2_ndx ( "OID"),
    index PolledData3_ndx ( "ID"),
    index PolledData4_ndx ( "PARENTOBJ"),
)

We have 300k rows in the PolledData table. In each STATSDATA table we have almost 12 to 13 million rows. Every one-minute interval, we insert data into the STATSDATA table. In our application, we run insert and select queries against the STATSDATA table at regular intervals. Please let us know why the queries below take so long to return results. Is there anything we need to do to tune the Postgres database?

Please find the explain analyze output.

First Query:

postgres=# explain analyze SELECT COUNT(*) FROM ( SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL FROM PolledData, STATSDATA8_21_2009 WHERE ( ( PolledData.ID=STATSDATA8_21_2009.POLLID) AND ( ( TTIME >= 1250838027454) AND ( TTIME <= 1250838079654) ) ) ) t1;

                              QUERY PLAN
------------------------------------------------------------------------
 Aggregate (cost=773897.12..773897.13 rows=1 width=0) (actual time=17818.410..17818.412 rows=1 loops=1)
   -> Merge Join (cost=717526.23..767505.06 rows=2556821 width=0) (actual time=17560.469..17801.790 rows=13721 loops=1)
      Merge Cond: (statsdata8_21_2009.pollid = ((polleddata.id)::numeric))
      -> Sort (cost=69708.44..69742.49 rows=13619 width=8) (actual time=2392.659..2416.093 rows=13721 loops=1)
         Sort Key: statsdata8_21_2009.pollid
         Sort Method: quicksort Memory: 792kB
         -> Seq Scan on statsdata8_21_2009 (cost=0.00..68773.27 rows=13619 width=8) (actual time=0.077..2333.132 rows=13721 loops=1)
            Filter: ((ttime >= 1250838027454::numeric) AND (ttime <= 1250838079654::numeric))
      -> Materialize (cost=647817.78..688331.92 rows=3241131 width=8) (actual time=15167.767..15282.232 rows=21582 loops=1)
         -> Sort (cost=647817.78..655920.61 rows=3241131 width=8) (actual time=15167.756..15218.645 rows=21574 loops=1)
            Sort Key: ((polleddata.id)::numeric)
            Sort Method: external merge Disk: 736kB
            -> Seq Scan on polleddata (cost=0.00..164380.31 rows=3241131 width=8) (actual time=1197.278..14985.665 rows=23474 loops=1)
 Total runtime: 17826.511 ms
(14 rows)

Second Query:

postgres=# explain analyze SELECT COUNT(*) FROM ( SELECT ID, PolledData.AGENT, NAME, INSTANCE, TTIME, VAL FROM PolledData, STATSDATA8_20_2009 WHERE ( ( PolledData.ID=STATSDATA8_20_2009.POLLID) AND ( ( TTIME >= 1250767134601) AND ( TTIME <= 1250767384601) ) ) ) t1;
[PERFORM] Performance Monitoring tool
Hi

Can anyone tell me what the tools are to monitor a Postgres server? I am running my Postgres server on a RHEL 5 machine.
--
View this message in context: http://www.nabble.com/Performance-Monitoring-tool-tp24751382p24751382.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
[PERFORM] SQL select query becomes slow when using limit (with no offset)
Hi folks,

We have problems with the performance of a simple SQL statement. If we add a LIMIT 50, the query is about 6 times slower than without a limit (the query returns 2 rows).

I have read this discussion: http://archives.postgresql.org/pgsql-performance/2008-09/msg5.php but there seems to be no solution in it.

I tried the things in http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server but changing settings doesn't have a significant effect.

The DDL statements (create tables, indices) are attached. The events_events table contains 375K rows; the events_event_types table contains 71 rows.

The query:

select events_events.id FROM events_events
left join events_event_types on events_events.eventType_id = events_event_types.id
where events_event_types.severity = 70
and events_events.cleared = 'f'
order by events_events.dateTime DESC

It takes 155 ms to run this query (returning 2 rows). After adding LIMIT 10, it takes 950 ms to run.

Query plan without limit:

Sort (cost=20169.62..20409.50 rows=95952 width=16)
  Sort Key: events_events.datetime
  -> Hash Join (cost=2.09..12229.58 rows=95952 width=16)
     Hash Cond: (events_events.eventtype_id = events_event_types.id)
     -> Seq Scan on events_events (cost=0.00..9918.65 rows=359820 width=24)
        Filter: (NOT cleared)
     -> Hash (cost=1.89..1.89 rows=16 width=8)
        -> Seq Scan on events_event_types (cost=0.00..1.89 rows=16 width=8)
           Filter: (severity = 70)

Query plan with limit:

Limit (cost=0.00..12.50 rows=10 width=16)
  -> Nested Loop (cost=0.00..119932.21 rows=95952 width=16)
     -> Index Scan Backward using events_events_datetime_ind on events_events (cost=0.00..18242.28 rows=359820 width=24)
        Filter: (NOT cleared)
     -> Index Scan using events_event_types_pkey on events_event_types (cost=0.00..0.27 rows=1 width=8)
        Index Cond: (events_event_types.id = events_events.eventtype_id)
        Filter: (events_event_types.severity = 70)

So Postgres seems to handle a query with a limit differently internally.
I tried setting default_statistics_target to 10, 100, and 200, but saw no significant differences. This problem appears on both Postgres 8.3 and 8.4.

Any suggestions?

Thanks in advance!

Best regards,

Kees van Dieren

--
Squins | IT, Honestly
Oranjestraat 23
2983 HL Ridderkerk
The Netherlands
Phone: +31 (0)180 414520
Mobile: +31 (0)6 30413841
www.squins.com
Chamber of commerce Rotterdam: 22048547
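Not an answer from this thread, but a workaround commonly suggested for this planner behaviour (an assumption worth benchmarking on your data): keep the LIMIT from being pushed into the backward index scan by fencing the subquery with OFFSET 0.

```sql
-- OFFSET 0 acts as an optimization fence in these PostgreSQL versions:
-- the inner query is planned and sorted on its own (the fast hash-join
-- plan), and only then is the LIMIT applied, avoiding the slow
-- nested-loop-over-datetime-index plan.
SELECT id FROM (
    SELECT e.id, e.dateTime
    FROM events_events e
    JOIN events_event_types t ON e.eventType_id = t.id
    WHERE t.severity = 70
      AND e.cleared = 'f'
    ORDER BY e.dateTime DESC
    OFFSET 0
) AS sub
LIMIT 10;
```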
[PERFORM] Greenplum MapReduce
Hi all,

Has anybody worked on Greenplum MapReduce programming?

I am facing a problem while trying to execute the Greenplum MapReduce program below, written in YAML. The error is thrown at the 7th line:

Error: YAML syntax error - found character that cannot start any token while scanning for the next token, at line 7

If somebody can explain this and the potential solution:

%YAML 1.1
---
VERSION: 1.0.0.1
DATABASE: test_db1
USER: gpadmin
DEFINE:
- INPUT:
    NAME: doc
    TABLE: documents
- INPUT:
    NAME: kw
    TABLE: keywords
- MAP:
    NAME: doc_map
    LANGUAGE: python
    FUNCTION:|
      i = 0
      terms = {}
      for term in data.lower().split():
        i = i + 1
        if term in terms:
          terms[term] += ','+str(i)
        else:
          terms[term] = str(i)
      for term in terms:
        yield([doc_id, term, terms[term]])
    OPTIMIZE: STRICT IMMUTABLE
    PARAMETERS:
      - doc_id integer
      - data text
    RETURNS:
      - doc_id integer
      - term text
      - positions text
- MAP:
    NAME: kw_map
    LANGUAGE: python
    FUNCTION: |
      i = 0
      terms = {}
      for term in keyword.lower().split():
        i = i + 1
        if term in terms:
          terms[term] += ','+str(i)
        else:
          terms[term] = str(i)
      yield([keyword_id, i, term, terms[term]])
    OPTIMIZE: STRICT IMMUTABLE
    PARAMETERS:
      - keyword_id integer
      - keyword text
    RETURNS:
      - keyword_id integer
      - nterms integer
      - term text
      - positions text
- TASK:
    NAME: doc_prep
    SOURCE: doc
    MAP: doc_map
- TASK:
    NAME: kw_prep
    SOURCE: kw
    MAP: kw_map
- INPUT:
    NAME: term_join
    QUERY: |
      SELECT doc.doc_id, kw.keyword_id, kw.term, kw.nterms,
             doc.positions as doc_positions,
             kw.positions as kw_positions
      FROM doc_prep doc INNER JOIN kw_prep kw ON (doc.term = kw.term)
- REDUCE:
    NAME: term_reducer
    TRANSITION: term_transition
    FINALIZE: term_finalizer
- TRANSITION:
    NAME: term_transition
    LANGUAGE: python
    PARAMETERS:
      - state text
      - term text
      - nterms integer
      - doc_positions text
      - kw_positions text
    FUNCTION: |
      if state:
        kw_split = state.split(':')
      else:
        kw_split = []
        for i in range(0,nterms):
          kw_split.append('')
      for kw_p in kw_positions.split(','):
        kw_split[int(kw_p)-1] = doc_positions
      outstate = kw_split[0]
      for s in kw_split[1:]:
        outstate = outstate + ':' + s
      return outstate
- FINALIZE:
    NAME: term_finalizer
    LANGUAGE: python
    RETURNS:
      - count integer
    MODE: MULTI
    FUNCTION: |
      if not state:
        return 0
      kw_split = state.split(':')
      previous = None
      for i in range(0,len(kw_split)):
        isplit = kw_split[i].split(',')
        if any(map(lambda(x): x == '', isplit)):
          return 0
        adjusted = set(map(lambda(x): int(x)-i, isplit))
        if (previous):
[PERFORM] select count(idn) is slow (Seq Scan) instead of Bitmap Heap.. why?
Hello,

I have a problem with an inner join + count(). My query is:

explain analyze select k.idn, k.kerdes_subject, k.kerdes_text,
    u.vezeteknev, u.keresztnev, u.idn as user_id,
    kg.kategoria_neve, count(v.idn)
FROM kategoriak as kg
INNER JOIN kerdesek as k on kg.idn = k.kategoria_id
INNER JOIN users as u ON k.user_id = u.idn
INNER JOIN valaszok as v ON k.idn = v.kerdes_id
where kg.idn = 15
group by k.idn, k.kerdes_subject, k.kerdes_text,
    u.idn, u.vezeteknev, u.keresztnev, kg.kategoria_neve

The problem is with count(v.idn). This column is related via v.kerdes_id = k.idn => k.kategoria_id = kg.idn, and the WHERE says kg.idn = 15. Why does it run through all the rows in v?

The explain says:

GroupAggregate (cost=103238.59..103602.66 rows=10402 width=1382) (actual time=8531.405..8536.633 rows=73 loops=1)
  -> Sort (cost=103238.59..103264.59 rows=10402 width=1382) (actual time=8531.339..8533.199 rows=1203 loops=1)
     Sort Key: k.idn, k.kerdes_subject, k.kerdes_text, u.idn, u.vezeteknev, u.keresztnev, kg.kategoria_neve
     -> Hash Join (cost=3827.79..89951.54 rows=10402 width=1382) (actual time=1778.590..8523.015 rows=1203 loops=1)
        Hash Cond: (v.kerdes_id = k.idn)
        -> Seq Scan on valaszok v (cost=0.00..78215.98 rows=2080998 width=8) (actual time=59.714..5009.171 rows=2080998 loops=1)
        -> Hash (cost=3823.42..3823.42 rows=350 width=1378) (actual time=12.553..12.553 rows=74 loops=1)
           -> Nested Loop (cost=14.98..3823.42 rows=350 width=1378) (actual time=0.714..12.253 rows=74 loops=1)
              -> Nested Loop (cost=14.98..1056.38 rows=350 width=830) (actual time=0.498..5.952 rows=117 loops=1)
                 -> Seq Scan on kategoriak kg (cost=0.00..1.30 rows=1 width=278) (actual time=0.066..0.076 rows=1 loops=1)
                    Filter: (idn = 15)
                 -> Bitmap Heap Scan on kerdesek k (cost=14.98..1051.58 rows=350 width=560) (actual time=0.374..5.430 rows=117 loops=1)
                    Recheck Cond: (15 = kategoria_id)
                    -> Bitmap Index Scan on kategoria_id_id_idx (cost=0.00..14.89 rows=350 width=0) (actual time=0.212..0.212 rows=117 loops=1)
                       Index Cond: (15 = kategoria_id)
              -> Index Scan using users_pkey on users u (cost=0.00..7.89 rows=1 width=552) (actual time=0.047..0.048 rows=1 loops=117)
                 Index Cond: (k.user_id = u.idn)
Total runtime: 8536.936 ms

So it runs through more than 2 million rows... but why? It should only count the rows with kategoria_id = 15. What am I doing wrong?

--
Adam PAPAI
Re: [PERFORM] Performance 8.4.0
On Fri, Jul 31, 2009 at 12:22 AM, Chris Dunn wrote:
> constraint_exclusion = on

This is critical if you need it, but a waste of CPU time if you don't. Other than that your parameters look good. Are you using the default page cost settings? I see you have 12 GB RAM; how big is your database?

...Robert