Re: [GENERAL] understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr

2014-08-06 Thread Marc Mamin
Hello,

 index_testruns_on_custom_spawnid btree ((custom_data -> 'SpawnID'::text))
 ..
 WHERE testruns.custom_data->'SpawnID' = 'SpawnID-428842195.338828'
 ...

If all your SpawnIDs have this prefix, you may consider removing it from your 
index to reduce its size:


= index_testruns_on_custom_spawnid btree (substring((custom_data -> 'SpawnID'::text), 9))
you must of course then use exactly this clause within your query:

=  WHERE substring(testruns.custom_data->'SpawnID', 9) = '428842195.338828'

And as it now looks like a number, maybe go a step further depending on which 
patterns these SpawnIDs can have.
This given value can for example be stored as two int4 or one int8: 
('428842195.338828'::numeric * 1000000)::int8
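
A minimal sketch of that variant, assuming every SpawnID really is 'SpawnID-' 
followed by seconds and exactly six fractional digits (index and query must use 
exactly the same expression):

-- hypothetical functional index storing the suffix as a scaled int8
CREATE INDEX index_testruns_on_custom_spawnid_num ON testruns
    (((substring(custom_data -> 'SpawnID', 9)::numeric * 1000000)::int8));

SELECT * FROM testruns
 WHERE (substring(custom_data -> 'SpawnID', 9)::numeric * 1000000)::int8
     = ('428842195.338828'::numeric * 1000000)::int8;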


On the other hand this will only address a limited part of your performance 
issue.

regards,

Marc Mamin


 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of john gale
 Sent: Dienstag, 5. August 2014 21:17
 To: pgsql-general@postgresql.org general
 Subject: [GENERAL] understanding why two nearly identical queries take two
 different planner routes, one 5s and one 2hr
 
 
 I would be appreciative if somebody could help explain why we have two nearly
 identical queries taking two different planner routes; one a nested index
 loop that takes about 5s to complete, and the other a hash join / heap scan
 that takes about 2hr.  This is using Postgres 9.3.3 on OS X 10.9.4.
 
 These two queries are different only in the value of an hstore entry and a
 date entry:
 
 SELECT *, testruns.id FROM testruns JOIN test_types ON
 testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03 10:49:55.338828';

 SELECT *, testruns.id FROM testruns JOIN test_types ON
 testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 18:39:55.258592';
 
 We have built an index for the hstore entry for SpawnID and most queries use
 it correctly.
 
     Column    |            Type             |                       Modifiers
 --------------+-----------------------------+--------------------------------------------------------
  id           | integer                     | not null default nextval('testruns_id_seq'::regclass)
  started_at   | timestamp without time zone | not null
 ...
 Indexes:
 ...
 index_testruns_on_custom_spawnid btree ((custom_data -> 'SpawnID'::text))
 
 However after realizing that these two queries differed in execution time so
 drastically, an explain showed that they are taking two very different paths:
 
 db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON
 testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
 'SpawnID-428842195.338828' AND testruns.started_at > '2014-08-03 10:49:55.338828';
                                       QUERY PLAN
 ----------------------------------------------------------------------------------------------
  Nested Loop  (cost=0.85..2696.12 rows=88 width=1466)
    ->  Index Scan using index_testruns_on_custom_spawnid on testruns  (cost=0.57..2378.40 rows=88 width=1399)
          Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428842195.338828'::text)
          Filter: (started_at > '2014-08-03 10:49:55.338828'::timestamp without time zone)
    ->  Index Scan using test_types_pkey on test_types  (cost=0.29..3.60 rows=1 width=67)
          Index Cond: (id = testruns.test_type_id)
 (6 rows)
 
 Time: 22.363 ms
 db=# explain SELECT *, testruns.id FROM testruns JOIN test_types ON
 testruns.test_type_id=test_types.id WHERE testruns.custom_data->'SpawnID' =
 'SpawnID-428870395.258592' AND testruns.started_at > '2014-08-03 18:39:55.258592';
                                       QUERY PLAN
 ----------------------------------------------------------------------------------------------
  Hash Join  (cost=292249.24..348608.93 rows=28273 width=1466)
    Hash Cond: (testruns.test_type_id = test_types.id)
    ->  Bitmap Heap Scan on testruns  (cost=291510.31..347269.21 rows=28273 width=1399)
          Recheck Cond: (((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text) AND (started_at > '2014-08-03 18:39:55.258592'::timestamp without time zone))
          ->  BitmapAnd  (cost=291510.31..291510.31 rows=28273 width=0)
                ->  Bitmap Index Scan on index_testruns_on_custom_spawnid  (cost=0.00..41383.84 rows=500170 width=0)
                      Index Cond: ((custom_data -> 'SpawnID'::text) = 'SpawnID-428870395.258592'::text)
                ->  Bitmap Index Scan on index_testruns_on_started_at_2  (cost=0.00..250112.08 rows=3188736 width=0)
                      Index Cond: (started_at > '2014-08-03

[GENERAL] Adding 3 hours while inserting data into table

2014-08-06 Thread M Tarkeshwar Rao
Hi Team,

We are facing some inconsistent behaviour of Postgres. We have deployed our 
database on a server whose timezone is GMT+3.
We have an application which is running on the same server.

When the application starts, it inserts the correct timestamp into the table, but 
after it has been running for a few minutes/hours we have observed that 3 hours are 
added to the timestamps in the table. The problem is resolved once the application 
is restarted.

Our application is sending correct data (checked via the insert query), yet in the DB 
we see that 3 hours have been added. The figure 3 seems significant, as the server's 
timezone is GMT+3.

Below are some observations and some command output along with table structure.

JEDEMM02:/# date;
Tue Aug  5 16:41:52 AST 2014
db_1=# show timezone;
  TimeZone
-
Asia/Riyadh
(1 row)

db_1=# select now();
  now
---
2014-08-05 16:43:06.372363+03
(1 row)


db_1=# select current_time;
   timetz

16:43:55.629946+03
(1 row)

Property in postgresql.conf
#timezone = '(defaults to server environment setting)'

Table Structure:
Column |Type | Modifiers
+-+---
msisdn | character varying(100)  |
offerid| character varying(100)  |
expdatetime| timestamp without time zone |
smslang| character varying(20)   |
renewalflag| character varying(100)  |
insuffbalflag  | character varying(100)  |
unsubscribeoninsufflag | character varying(100)  |
preexpiryduration  | character varying(10)   |


Regards,
M Tarkeshwar Rao


Re: [GENERAL] Adding 3 hours while inserting data into table

2014-08-06 Thread Adrian Klaver

On 08/06/2014 03:50 AM, M Tarkeshwar Rao wrote:

Hi Team,

We are facing some inconsistence behaviour of Postgres. We have deployed
our database on a server where timezone is GMT+3 hours.


What Postgres version?
How was Postgres installed and on what OS?



We have application which is running on the same server.


What is the application using to connect to the server, ODBC, JDBC, etc?



When application starts, it is inserting the correct timestamp in the
table but after running few minutes/hours we have observed that 3 hours
is added into the timestamp in table. The problem resolved once the
application restarted.


So once the application is restarted, does the timestamp offset never change, 
or do you have to restart periodically to reset it?




Our application is putting correct data (checked by the insert query)
and in DB it is seen that 3 hours is added. The figure 3 is slightly
important as the server is deployed GMT+3 hours.


Can you show an example of an INSERT?



Below are some observations and some command output along with table
structure.

JEDEMM02:/# date;

Tue Aug  5 16:41:52 AST 2014





*Property in Postgres.conf*

#timezone = '(defaults to server environment setting)'


Is that the actual setting from postgresql.conf?
There is not a timezone specified?




*Table Structure:*

*Column |Type | Modifiers*

*+-+---*

*msisdn | character varying(100)  |*

*offerid| character varying(100)  |*

*expdatetime| timestamp without time zone |*


If you are interested in timezone (time) correctness this should be 'timestamp 
with time zone'.
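
A small sketch of why that matters, assuming the session timezone is Asia/Riyadh 
(UTC+3):

SET timezone = 'Asia/Riyadh';
SELECT now();                                  -- 2014-08-05 16:43:06+03, shown in the session zone
SELECT now()::timestamp;                       -- same digits, but the +03 offset is discarded
SELECT '2014-08-05 13:43:06+00'::timestamptz;  -- a UTC input comes back as 2014-08-05 16:43:06+03

A 'timestamp without time zone' column stores whatever digits it is handed and never 
converts them, so a client that starts sending UTC values will silently appear 3 
hours off.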




Regards,

M Tarkeshwar Rao




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reindex taking forever, and 99% CPU

2014-08-06 Thread Alexey Klyukin
On Sun, Aug 3, 2014 at 3:20 AM, Phoenix Kiula phoenix.ki...@gmail.com
wrote:

 Hi. I've been patient. PG is 9.0.17, updated via Yum yesterday.

 One of my large tables (101 GB on disk, about 1.1 billion rows) used
 to take too long to vacuum. Not sure if it's an index corruption
 issue. But I tried VACUUM FULL ANALYZE as recommended in another
 thread yesterday, which took 5 hours on the two times I tried, without
 finishing.

 Now the REINDEX TABLE has taken over 6 hours as I decided to be
 patient and just let something finish. Not sure this is normal though!
 How do production level DBAs do this if it takes so long?

 If I open another SSH window to my server and try select * from
 pg_stats_activity it just hangs there, as the REINDEX I presume is
 taking up all the memory? I basically can't do anything else on this
 server.



From my experience a REINDEX on a 100GB table with such hardware will
definitely take hours.
It might actually be CPU bound rather than I/O bound, if you have a large functional
index on the table (like lower(text_column)),
and since PostgreSQL can only make use of a single core, you are out of
luck.

In order to speed up the process without locking your data, you may
consider creating the new index with CREATE INDEX CONCURRENTLY,
and then just dropping the old one (make sure your DROP won't wait trying to
acquire a lock when you do it, otherwise
other processes will start to queue behind it).

I'd question the usefulness of running VACUUM FULL on a production server
(there are other ways around it, e.g.
pg_repack or some of the ideas from this post:
http://www.depesz.com/2013/06/21/bloat-removal-by-tuples-moving/).
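
A rough sketch of that sequence (index, table and column names here are only 
placeholders):

CREATE INDEX CONCURRENTLY bigtable_lower_col_idx_new
    ON bigtable (lower(text_column));

SET statement_timeout = '5s';   -- so the DROP gives up instead of queueing behind
                                -- other sessions (9.3+ also offers lock_timeout)
DROP INDEX bigtable_lower_col_idx_old;
RESET statement_timeout;
ALTER INDEX bigtable_lower_col_idx_new RENAME TO bigtable_lower_col_idx_old;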

-- 
Regards,
Alexey Klyukin


Re: [GENERAL] postgresql referencing and creating types as record

2014-08-06 Thread David G Johnston
vpmm2007 wrote
 type function is record (f1 NUMERIC, f2 NUMERIC, ..); this is in Oracle.
 
 Kindly tell me what the substitute for 'is record' is in Postgres.
 
 It's urgent.
 
 Thanks and regards,
 vpmm

No idea on exactly what Oracle is creating here (a type or a set returning
function) but either:

CREATE TYPE ( ... )
 
or 

CREATE FUNCTION ( ... ) RETURNS TABLE ( ... )

The documentation will provide specifics.

http://www.postgresql.org/docs/9.3/interactive/sql-commands.html
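
For example, minimal sketches of both (the names are placeholders, not taken from 
your Oracle code):

-- composite type, roughly the equivalent of an Oracle record type
CREATE TYPE pair AS (f1 numeric, f2 numeric);

-- or a set-returning function producing rows of that shape
CREATE FUNCTION make_pairs()
RETURNS TABLE (f1 numeric, f2 numeric)
LANGUAGE sql AS $$ SELECT 1::numeric, 2::numeric $$;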

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/postgresql-referencing-and-creating-types-as-record-tp5813901p5813912.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Questions on dynamic execution and sqlca

2014-08-06 Thread Bill Epstein


I'm very new to Postgres, but have plenty of experience developing stored
procs in Oracle.

I'm going to be creating Postgres stored procedures (functions actually,
since I discovered that in postgres, everything is a function) to do a
variety of batch-type processing.  These functions may or may not be called
by the .Net application that is being developed.  To support both my
Postgres function development and run-time monitoring, I wanted to develop
generic logging functions that would be called by other Postgres functions
to be developed in order to help trace through code and collect error
information.

The attached create_log_utilities.sql holds plsql for creating two logging
functions (one for logging status messages, and one for logging errors).
In the log_msg function, the various sets of EXEC and EXECUTE statements
are from my experimenting with dynamically generating SQL.  If I could get
it working, the intent is to be able to add a LogTableName as an input
parameter, thereby allowing individual developers to utilize their own
version of the log table (w/ the same columns).  I've been able to do this
sort of thing w/ Oracle before.

I've tried a variety of ways based on the on-line docs I've seen, but I
always get a syntax error on EXEC when I use only the line EXEC  statement
(is there a setting I need to set in order to be able to include EXEC
directives?).  The closest I've come is the currently uncommented prepared
statement - it compiles, but I get the following error messages:

   INFO:  INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY, AUDIT_LEVEL,
   AUDIT_TIME, NOTE, SQL) VALUES ('Overpayment','Create
   TLI','LOG','2014-08-06 10:44:23.933','Created TLI','INSERT INTO
   TLIA...')
   CONTEXT:  SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
   p_function, p_note, p_sql)
   PL/pgSQL function utility.logging_test() line 24 at PERFORM
   ERROR:  INSERT has more expressions than target columns
   LINE 3:  VALUES ($1, $2, $3, $4, $5, $6)
^
   QUERY:  PREPARE myinsert7 (text, text, text, timestamp, text, text) AS
INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY,
   AUDIT_LEVEL, NOTE, SQL)
VALUES ($1, $2, $3, $4, $5, $6)
   CONTEXT:  PL/pgSQL function utility.log_msg
   (character,text,text,text,text) line 48 at SQL statement
   SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
   p_function, p_note, p_sql)
   PL/pgSQL function utility.logging_test() line 24 at PERFORM
   ** Error **

   ERROR: INSERT has more expressions than target columns
   SQL state: 42601
   Context: PL/pgSQL function utility.log_msg
   (character,text,text,text,text) line 48 at SQL statement
   SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
   p_function, p_note, p_sql)
   PL/pgSQL function utility.logging_test() line 24 at PERFORM


In the other function (log_error ), the problem I'm having is that I'm
trying to pull out the sqlca error code and description (as I've done in
the past w/ Oracle), in order to write that information in my log table.
The intent is that this function will only be called from within an
EXCEPTION block (as I do in my logging_test  function - I purposely run a
bad query to trigger it).

To exercise the code, I'm just executing select utility.logging_test(); in
a query window.

A few other items I could use clarification on:
- What's the difference between hitting the Execute Query and Execute
PGScript buttons?  Both seem to compile the functions.

- What are the differences among PL/SQL,  PL/PGSQL and pgScript.

- I installed Postgres 9.3.4 and  I'm using PEM v4.0.2.  When I click on
the icon to Execute arbitrary SQL queries, I notice that the icons on the
window that opens are different from the pgAdmin PostgreSQL Tools window
that opens if I double-click on one of my .sql files.  Is there a
difference in these tools?


Attached are the relevant scripts:
(See attached file: create_bpc_audit.sql) - Create the log table
(See attached file: create_log_utilities.sql)- Code to create the two
logging functions
(See attached file: test_log_utilities.sql)- Code to exercise the msg and
error logging functions


Thanks.
Bill

_
William Epstein
Consulting I/T Specialist
AIS ADM Information Management
US Federal
Office/Fax:  301-240-3887, Tie Line:  372-3887
International Business Machines (IBM) Corporation
Global Business Services (GBS)

create_bpc_audit.sql
Description: Binary data


create_log_utilities.sql
Description: Binary data


test_log_utilities.sql
Description: Binary data

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Questions on dynamic execution and sqlca

2014-08-06 Thread David G Johnston
Bill Epstein wrote
 I've tried a variety of ways based on the on-line docs I've seen, but I
 always get a syntax error on EXEC when I use only the line EXEC  statement

You likely need to use EXECUTE in PostgreSQL


INFO:  INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY, AUDIT_LEVEL,
AUDIT_TIME, NOTE, SQL) VALUES ('Overpayment','Create
TLI','LOG','2014-08-06 10:44:23.933','Created TLI','INSERT INTO
TLIA...')
CONTEXT:  SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
p_function, p_note, p_sql)
PL/pgSQL function utility.logging_test() line 24 at PERFORM
ERROR:  INSERT has more expressions than target columns
LINE 3:  VALUES ($1, $2, $3, $4, $5, $6)
 ^
QUERY:  PREPARE myinsert7 (text, text, text, timestamp, text, text) AS
 INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY,
AUDIT_LEVEL, NOTE, SQL)
 VALUES ($1, $2, $3, $4, $5, $6)
CONTEXT:  PL/pgSQL function utility.log_msg
(character,text,text,text,text) line 48 at SQL statement
SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
p_function, p_note, p_sql)
PL/pgSQL function utility.logging_test() line 24 at PERFORM
** Error **
 
ERROR: INSERT has more expressions than target columns
SQL state: 42601
Context: PL/pgSQL function utility.log_msg
(character,text,text,text,text) line 48 at SQL statement
SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
p_function, p_note, p_sql)
PL/pgSQL function utility.logging_test() line 24 at PERFORM

Since COMPONENT, ACTIVITY, AUDIT_LEVEL, NOTE, SQL is only 5 columns (the
AUDIT_TIME column is missing from the target list) and you are sending 6 values,
it is not surprising that you are getting an error.
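
As for parameterizing the log table name, the usual plpgsql approach is EXECUTE 
with format(); a minimal sketch (the column list here is an assumption, not taken 
from your scripts):

CREATE OR REPLACE FUNCTION utility.log_msg_dyn(p_table text, p_component text, p_note text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    -- %I quotes the identifier; the values travel through USING, not string concatenation
    EXECUTE format('INSERT INTO utility.%I (component, note, audit_time) VALUES ($1, $2, now())',
                   p_table)
    USING p_component, p_note;
END;
$$;

-- SELECT utility.log_msg_dyn('bpc_audit', 'Overpayment', 'Created TLI');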


 In the other function (log_error ), the problem I'm having is that I'm
 trying to pull out the sqlca error code and description (as I've done in
 the past w/ Oracle), in order to write that information in my log table.
 The intent is that this function will only be called from within an
 EXCEPTION block (as I do in my logging_test  function - I purposely run a
 bad query to trigger it).

You still have to deal with the fact that PostgreSQL functions operate in
the transaction context of the caller; they cannot set their own.  Depending
on how you write the function and the caller, if you eventually ROLLBACK you
could lose the logging.


 - What's the difference between hitting the Execute Query and Execute
 PGScript buttons?  Both seem to compile the functions.

Execute Query just sends the statement(s) to the server.
Execute PGScript wraps the statements in a transaction so that either they
all succeed or all fail.
Basically, with Execute Query, if a statement in the middle fails, everything
before it still commits (auto-commit).

For a single statement there is no difference.

 - What are the differences among PL/SQL,  PL/PGSQL and pgScript.

The first two are languages you write functions in.  pgScript is simply an
informal way to group a series of statements together and have them execute
within a transaction.


 - I installed Postgres 9.3.4 and  I'm using PEM v4.0.2.  When I click on
 the icon to Execute arbitrary SQL queries, I notice that the icons on
 the
 window that opens are different from the pgAdmin PostgreSQL Tools window
 that opens if I double-click on one of my .sql files.  Is there a
 difference in these tools?

No idea - but probably.  But there are likely many similarities too.


 Attached are the relevant scripts:
 (See attached file: create_bpc_audit.sql) - Create the log table
 (See attached file: create_log_utilities.sql)- Code to create the two
 logging functions
 (See attached file: test_log_utilities.sql)- Code to exercise the msg and
 error logging functions

Didn't even open these...


David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Questions-on-dynamic-execution-and-sqlca-tp5813929p5813934.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr

2014-08-06 Thread Jeff Janes
On Tue, Aug 5, 2014 at 2:08 PM, john gale j...@smadness.com wrote:


  ->  Bitmap Index Scan on index_testruns_on_custom_spawnid  (cost=0.00..41437.84
  rows=500170 width=0) (actual time=4872.404..4872.404 rows=2438520 loops=1)
 
  Ouch, ouch, and more ouch. Your index_testruns_on_custom_spawnid index
 on matched 2.5 million rows...


 Yes, although it's still better than the 40mil rows that we have in the
 table itself...

 Also, that doesn't make sense to me, since we don't have 2.5mil rows that
 match this one SpawnID.  Could this suggest that my partial hstore index is
 somehow misconstructed?  Or is that saying that 2.5mil rows have a SpawnID,
 not all of which will be the one I'm looking for?


Have you triple-checked that for 'SpawnID-428870395.258592'?

That seems like something a human is much more likely to get wrong than a
computer is.

Anyway, it seems like a compound index on ((custom_data ->
'SpawnID'::text), started_at) could do wonders for this query.
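
Something along these lines (a sketch only, not tested against your schema):

CREATE INDEX index_testruns_on_spawnid_started_at
    ON testruns ((custom_data -> 'SpawnID'), started_at);

Both the equality on the hstore key and the inequality on started_at could then be
satisfied by a single index scan.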

Cheers,

Jeff


Re: [GENERAL] Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr

2014-08-06 Thread Jeff Janes
On Tue, Aug 5, 2014 at 4:30 PM, David G Johnston david.g.johns...@gmail.com
 wrote:


 NOTE: I am confused by this line:
 ->  BitmapAnd  (cost=291564.31..291564.31 rows=28273 width=0) (actual
 time=23843.870..23843.870 rows=0 loops=1)

 How did actual match zero rows?  It should be something like 2.2M


The accounting for bitmap operations seems to be a bit of a mess.  In some
cases, it reports the number of rows represented in the bitmap.  Sometimes
it counts a bitmap itself as a row, and so there is just one of them no
matter how many rows it represents.  In this case, it seems to consider a
bitmap not to be a row at all.   The problem with counting the number of
rows represented by the bitmap is that that value is unknown if either of
the input bitmaps has gone lossy.



 Anyway, you should probably experiment with creating a multi-column index
 instead of allowing PostgreSQL to BitmapAnd them together.  Likely the
 timestamp will have higher cardinality and so should be listed first in the
 index.


No, the timestamp should almost certainly come second because it is used
with inequality operators.

Cheers,

Jeff


Re: [GENERAL] Questions on dynamic execution and sqlca

2014-08-06 Thread Ray Stell

On Aug 6, 2014, at 12:28 PM, Bill Epstein epste...@us.ibm.com wrote:

 I'm very new to Postgres, but have plenty of experience developing stored 
 procs in Oracle.  
 

I found this helpful:

http://www.amazon.com/PostgreSQL-Server-Programming-Hannu-Krosing-ebook/dp/B00DMYO2D2/ref=tmm_kin_swatch_0?_encoding=UTF8sr=8-2qid=1407345445
Krosing, Hannu; Mlodgenski, Jim; Roybal, Kirk (2013-06-25). PostgreSQL Server 
Programming (Kindle Locations 272-273). Packt Publishing. Kindle Edition. 



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [GENERAL] Re: understanding why two nearly identical queries take two different planner routes, one 5s and one 2hr

2014-08-06 Thread David Johnston
On Wed, Aug 6, 2014 at 10:08 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Aug 5, 2014 at 4:30 PM, David G Johnston 
 david.g.johns...@gmail.com wrote:


 Anyway, you should probably experiment with creating a multi-column index
 instead of allowing PostgreSQL to BitmapAnd them together.  Likely the
 timestamp will have higher cardinality and so should be listed first in
 the
 index.


 No, the timestamp should almost certainly come second because it is used
 with inequality operators.


Wouldn't that only matter if a typical inequality was expected to return
more rows than a given equality on the other field?  Depending on the
cardinality of the ID field I would expect a very large range of dates to
be required before digging down into ID becomes more effective.  My
instinct says there are relatively few IDs in play but that they are
continually adding new rows.

What statistics would the OP have to provide in order to actually make a
fact-based determination?

David J.


[GENERAL] Recursive CTE trees + Sorting by votes

2014-08-06 Thread Gregory Taylor
We are working on a threaded comment system, and found this post by Disqus
to be super helpful:

http://cramer.io/2010/05/30/scaling-threaded-comments-on-django-at-disqus/

The CTE works wonderfully, and we're really happy with the results. The
last obstacle is figuring out how to sort by a votes field while
preserving the tree structure.

If we ORDER BY path, votes (assuming we have the same structure as in the
article), we never need tie-breaking on path, so the votes part of this
doesn't even come into the equation.

I suspect we need to do some path manipulation, but I'm not too sure of
where to begin with this. I attempted incorporating votes into the path,
but I failed pretty badly with this. It's probably way off, but here's my
last (failed) attempt:

https://gist.github.com/gtaylor/e3926a90fe108d52a4c8

Any ideas would be greatly appreciated! If we can retain the path structure
and also sort by votes, we'll be able to paginate freely without issues.

-- 
Greg Taylor
http://gc-taylor.com


Re: [GENERAL] Recursive CTE trees + Sorting by votes

2014-08-06 Thread Martijn van Oosterhout
On Wed, Aug 06, 2014 at 05:28:09PM -0400, Gregory Taylor wrote:
 We are working on a threaded comment system, and found this post by Disqus
 to be super helpful:
 
 http://cramer.io/2010/05/30/scaling-threaded-comments-on-django-at-disqus/
 
 The CTE works wonderfully, and we're really happy with the results. The
 last obstacle is figuring out how to sort by a votes field, meanwhile
 preserving the tree structure.

What do you mean exactly? Do you mean that want everything at the same
level to be sorted by vote?

 If we ORDER BY path, votes (assuming we have the same structure as in the
 article), we never need tie-breaking on path, so the votes part of this
 doesn't even come into the equation.
 
 I suspect we need to do some path manipulation, but I'm not too sure of
 where to begin with this. I attempted incorporating votes into the path,
 but I failed pretty badly with this. It's probably way off, but here's my
 last (failed) attempt:
 
 https://gist.github.com/gtaylor/e3926a90fe108d52a4c8

I think what you need to do is do the ordering within the CTE itself.
Something like:

WITH RECURSIVE cte () AS (
   SELECT ... ORDER BY vote DESC
UNION ALL
   SELECT ... JOIN cte ... ORDER BY vote DESC
) SELECT * from cte;

Or another idea, add a column that is the path of the parent:

WITH RECURSIVE cte () AS (
   SELECT array[] as path_parent, array[id] as path, ... ORDER BY vote DESC
UNION ALL
   SELECT cte.path as path_parent, cte.path || comments.id as path, ... JOIN 
cte ... ORDER BY vote DESC
) SELECT * from cte order by path, votes desc;
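
Or a variant that bakes the votes into the sort key itself (a sketch only; the
comments table and its id/parent_id/votes columns are assumptions):

WITH RECURSIVE tree AS (
    SELECT c.id, c.parent_id, c.votes,
           ARRAY[-c.votes, c.id] AS sort_path        -- negate votes so highest sorts first
    FROM comments c
    WHERE c.parent_id IS NULL
  UNION ALL
    SELECT c.id, c.parent_id, c.votes,
           t.sort_path || ARRAY[-c.votes, c.id]
    FROM comments c
    JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree ORDER BY sort_path;

Ordering by the whole path keeps children under their parents while each level
comes out in descending vote order.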
  
Hope this helps,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


[GENERAL] Connecting with PostgreSQL 9.1 using the Ubuntu machine user and peer authentication method

2014-08-06 Thread Jorge Arevalo
Hello,

I want to connect to my local installation of PostgreSQL 9.1 using my
machine user (who is vagrant). So, after reading PostgreSQL documentation,
I thought I just needed to:

1. Add username map in pg_ident.conf:

# MAPNAME   SYSTEM-USERNAME PG-USERNAME

vp          vagrant         postgres


2. Using the map in pg_hba.conf

# TYPE  DATABASE  USER  ADDRESS  METHOD

local   all       all            peer map=vp

But I'm getting the error

psql: FATAL:  Peer authentication failed for user vagrant

If I try to connect to my server using psql.

I guess I'm misunderstanding the PostgreSQL manual. But, how could I get
what I need? (locally connect with the user vagrant like if it was the
postgres user)

Many thanks in advance (and sorry for the cross-posting. I asked this in
serverfault too, but I think this is the right place)

-- 
Jorge Arevalo

http://about.me/jorgeas80


Re: [GENERAL] Connecting with PostgreSQL 9.1 using the Ubuntu machine user and peer authentication method

2014-08-06 Thread Frank Pinto
Looks like you're doing it right, you actually have to specify the user
though:

psql -U postgres

and make sure you restarted the server so your changes take effect.

Frank


On Wed, Aug 6, 2014 at 4:43 PM, Jorge Arevalo jorgearev...@libregis.org
wrote:

 Hello,

 I want to connect to my local installation of PostgreSQL 9.1 using my
 machine user (who is vagrant). So, after reading PostgreSQL documentation,
 I thought I just needed to:

 1. Add username map in pg_ident.conf:

 # MAPNAME   SYSTEM-USERNAME PG-USERNAME

 vp          vagrant         postgres


 2. Using the map in pg_hba.conf

 # TYPE  DATABASE  USER  ADDRESS  METHOD

 local   all       all            peer map=vp

 But I'm getting the error

 psql: FATAL:  Peer authentication failed for user vagrant

 If I try to connect to my server using psql.

 I guess I'm misunderstanding the PostgreSQL manual. But, how could I get
 what I need? (locally connect with the user vagrant like if it was the
 postgres user)

 Many thanks in advance (and sorry for the cross-posting. I asked this in
 serverfault too, but I think this is the right place)

 --
 Jorge Arevalo

 http://about.me/jorgeas80



Re: [GENERAL] Connecting with PostgreSQL 9.1 using the Ubuntu machine user and peer authentication method

2014-08-06 Thread John R Pierce

On 8/6/2014 3:43 PM, Jorge Arevalo wrote:


I want to connect to my local installation of PostgreSQL 9.1 using my 
machine user (who is vagrant). So, after reading PostgreSQL 
documentation, I thought I just needed to:





wouldn't it be easier to ...

create user vagrant superuser;
create database vagrant owner vagrant;

?



--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Connecting with PostgreSQL 9.1 using the Ubuntu machine user and peer authentication method

2014-08-06 Thread Adrian Klaver

On 08/06/2014 03:43 PM, Jorge Arevalo wrote:

Hello,

I want to connect to my local installation of PostgreSQL 9.1 using my
machine user (who is vagrant). So, after reading PostgreSQL
documentation, I thought I just needed to:

1. Add username map in pg_ident.conf:

# MAPNAME   SYSTEM-USERNAME PG-USERNAME

vp          vagrant         postgres


2. Using the map in pg_hba.conf

# TYPE  DATABASE  USER  ADDRESS  METHOD

local   all       all            peer map=vp

But I'm getting the error

psql: FATAL:  Peer authentication failed for user vagrant

If I try to connect to my server using psql.

I guess I'm misunderstanding the PostgreSQL manual. But, how could I get
what I need? (locally connect with the user vagrant like if it was the
postgres user)



What OS are you on?

Per:
http://www.postgresql.org/docs/9.1/interactive/auth-methods.html#AUTH-PEER

Peer authentication is only available on operating systems providing 
the getpeereid() function, the SO_PEERCRED socket parameter, or similar 
mechanisms. Currently that includes Linux, most flavors of BSD including 
Mac OS X, and Solaris.





--
Jorge Arevalo

http://about.me/jorgeas80



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] The dangers of streaming across versions of glibc: A cautionary tale

2014-08-06 Thread Matthew Kelly
The following is a real critical problem that we ran into here at TripAdvisor, 
but have not yet figured out a clear way to mitigate.

TL;DR:
Streaming replicas—and by extension, base backups—can become dangerously broken 
when the source and target machines run slightly different versions of glibc.  
Particularly, differences in strcoll and strcoll_l leave corrupt indexes on 
the slave.  These indexes are sorted out of order with respect to the strcoll 
running on the slave.  Because postgres is unaware of the discrepancy, it uses 
these corrupt indexes to perform merge joins; merges rely heavily on the 
assumption that the indexes are sorted, and this causes all the results of the 
join past the first poison pill entry to not be returned.  Additionally, if the 
slave becomes master, the corrupt indexes will in some cases be unable to enforce 
uniqueness, but quietly allow duplicate values.

Context:
We were doing a hardware upgrade on a large internal machine a couple of months 
ago.  We followed a common procedure here: stand up the new HA pair as 
streaming replicas of the old system, then fail over to the new pair.  All 
systems involved were running 9.1.9 (though that is not relevant, as we'll see), 
and built from source.

Immediately after the failover we saw some weird cases with some small 
indexes.  We thought it was because the streaming replication failover had gone 
poorly (and because we weren't running the latest version of postgres on that 
machine), so we rebuilt them and moved on.  Until last week, when an important 
query stopped getting optimized as a hash join and turned into a merge join.  
From that query I generated a simple, single column join between two tables.  
That query returns 50 million rows with merge joins disabled and 0 rows with 
them enabled.  Rebuilding the index fixed the issue, but this was an important 
table and so we did some digging.

Using some query optimizer coercion, I was able to show that 1. the corrupt 
index had the same number of rows as the table, and 2. the index returned rows 
in a different, but nearly identical, ordering to the one that you would receive 
by explicitly sorting the column.  Taking a pair of rows that were out of 
place, I managed to narrow the issue down.  Luckily, we were able to find the old 
server sitting on the floor.  The simplest form of this issue is:

SELECT 'Ｍ' > 'ஐ';

Root cause:
Depending on your charset the first character might look like an ascii 'M'.  It 
is not.  The two characters in question are the utf8 representations of 
http://www.fileformat.info/info/unicode/char/ff2d/index.htm and 
http://www.fileformat.info/info/unicode/char/0b90/index.htm respectively.  
Across different machines, running the same version of postgres, and in 
databases with identical character encodings and collations ('en_US.UTF-8') 
that select will return different results if the version of glibc is different. 
 This holds whether one pg instance is a pg_basebackup of the other or if you 
run an initdb and then immediately start postgres.

Digging further led me to: master:src/backend/utils/adt/varlena.c:1494,1497.  
These are the lines where postgres calls strcoll_l and strcoll, in order to 
sort strings in a locale aware manner.

In the simplest case, the attached c file returns inconsistent results across 
glibc versions and environments.  It just sets the collation to 'en_US.UTF-8', 
and then compares two one character strings (the characters above).  Depending 
on the version of glibc you are running, you may see positive, negative or zero 
as the output.  I have observed:

Old Server (CentOS 5.8, kernel 2.6.18-308.24.1.el5):
0 - glibc-2.5-81.el5_8.7
1 - glibc-devel-2.5-81.el5_8.7 statically linked
0 - Source builds of glibc (2.5, 2.6, 2.10)

New Server (CentOS 6.4, kernel 2.6.32-358.el6.x86_64)
-1 - glibc-2.12-1.132.el6.x86_64

Dev server (CentOS 6.4, kernel 2.6.32-358.el6.x86_64)
-1 - glibc-2.12-1.107.el6_4.5.x86_64
-1 - Source build (2.12.2, 2.16, 2.18)
15 - Source build (HEAD)

Laptop (Ubuntu, kernel 3.11.0-12-generic)
-1 - 2.17-93ubuntu4
15 - 2.17-93ubuntu4 statically linked

Mac OS (For comparison only)
62365 - OSX 10.8, 10.9


From my digging, I have been unable to figure out why glibc is returning 
different results in different situations.  It is probably worth getting a 
discussion going on their mailing lists as well.

Regardless, the reality is that there are different versions of glibc out there 
in the wild, and they do not sort consistently across versions/environments.  
Streaming replicas rely on the assumption that the sort order within a 
collation is consistent across machines.  If they differ ever so slightly, then 
a single pair of rows with poison pill characters can render an entire index 
invalid.  We were lucky that the first discrepancy was at the beginning of the 
index.  If it was 85% of the way through then we probably would never have 
noticed that merge joins were broken on that machine.
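
One crude way to look for that kind of damage without trusting the suspect index 
at all (table and column names below are placeholders):

SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- a healthy unique index would never have allowed these
SELECT text_col, count(*)
FROM some_table
GROUP BY text_col
HAVING count(*) > 1;

After that, a REINDEX (or CREATE INDEX CONCURRENTLY plus a swap) rebuilds the 
index in whatever order the local glibc produces.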

We still are discussing 

Re: [GENERAL] The dangers of streaming across versions of glibc: A cautionary tale

2014-08-06 Thread Bruce Momjian
On Wed, Aug  6, 2014 at 09:24:17PM +, Matthew Kelly wrote:
 The following is a real critical problem that we ran into here at TripAdvisor,
 but have yet figured out a clear way to mitigate.
 
 TL;DR:
 Streaming replicas—and by extension, base backups—can become dangerously 
 broken
 when the source and target machines run slightly different versions of glibc.
  Particularly, differences in strcoll and strcoll_l leave corrupt indexes on
 the slave.  These indexes are sorted out of order with respect to the strcoll
 running on the slave.  Because postgres is unaware of the discrepancy is uses
 these corrupt indexes to perform merge joins; merges rely heavily on the
 assumption that the indexes are sorted and this causes all the results of the
 join past the first poison pill entry to not be returned.  Additionally, if 
 the
 slave becomes master, the corrupt indexes will in cases be unable to enforce
 uniqueness, but quietly allow duplicate values.

No surprise;  I have been expecting to hear about such breakage, and am
surprised we hear about it so rarely.  We really have no way of testing
for breakage either.  :-(

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] The dangers of streaming across versions of glibc: A cautionary tale

2014-08-06 Thread Peter Geoghegan
On Wed, Aug 6, 2014 at 5:11 PM, Bruce Momjian br...@momjian.us wrote:
 No surprise;  I have been expecting to hear about such breakage, and am
 surprised we hear about it so rarely.  We really have no way of testing
 for breakage either.  :-(

I guess that Trip Advisor were using some particular collation that
had a chance of changing. Sorting rules for English text (so, say,
en_US.UTF-8) are highly unlikely to change. That might be much less
true for other locales.

Unicode Technical Standard #10 states:


Collation order is not fixed.

Over time, collation order will vary: there may be fixes needed as
more information becomes available about languages; there may be new
government or industry standards for the language that require
changes; and finally, new characters added to the Unicode Standard
will interleave with the previously-defined ones. This means that
collations must be carefully versioned.


So, the reality is that we only have ourselves to blame.  :-(

LC_IDENTIFICATION serves this purpose on glibc. Here is what en_US
looks like on my machine:


escape_char /
comment_char %
% Locale for English locale in the USA
% Contributed by Ulrich Drepper drep...@redhat.com, 2000

LC_IDENTIFICATION
title  English locale for the USA
source Free Software Foundation, Inc.
address59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
contact
email  bug-glibc-loca...@gnu.org
tel
fax
language   English
territory  USA
revision   1.0
date   2000-06-24
%
category  en_US:2000;LC_IDENTIFICATION
category  en_US:2000;LC_CTYPE
category  en_US:2000;LC_COLLATE
category  en_US:2000;LC_TIME
category  en_US:2000;LC_NUMERIC
category  en_US:2000;LC_MONETARY
category  en_US:2000;LC_MESSAGES
category  en_US:2000;LC_PAPER
category  en_US:2000;LC_NAME
category  en_US:2000;LC_ADDRESS
category  en_US:2000;LC_TELEPHONE
*** SNIP ***


This is a GNU extension [1]. If the OS adds a new version of a
collation, that probably accidentally works a lot of the time, because
the collation rule added or removed was fairly esoteric anyway, such
is the nature of these things. If it was something that came up a lot,
it would surely have been settled by standardization years ago.

If OS vendors are not going to give us a standard API for versioning,
we're hosed. I thought about suggesting that we hash a strxfrm() blob
for about 2 minutes, before realizing that that's a stupid idea. Glibc
would be a good start.

[1] 
https://www.gnu.org/software/autoconf/manual/autoconf-2.63/html_node/Special-Shell-Variables.html
-- 
Regards,
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Need help in tuning

2014-08-06 Thread Phoenix Kiula
My PG server is still going down. After spending the weekend doing a
CLUSTER of my largest table (it's a RAID 1 system with SATA hard disks
and 4 GB memory, mostly devoted to PG) I still have this issue.

When I run a top command, 99% of the CPU and about 15% of the memory
is being taken by PG. When I press 'c' in the top UI, I see that
postmaster is doing some CLUSTER. However, I am not running any more
clustering myself. The only automatic activity I can think of is autovacuum.

So, question: to the degree that my system allows for performance,
what steps can I take to find out what's happening? I see some things
mentioned: I/O, vacuum settings, pg_stats, pg_activity -- is there a
simple guide somewhere which shows me step by step what to do? Google
hasn't been much help: "Postgresql performance tuning" brings up a lot
of esoteric articles.

Thanks!


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] The dangers of streaming across versions of glibc: A cautionary tale

2014-08-06 Thread Tatsuo Ishii
 Over time, collation order will vary: there may be fixes needed as
 more information becomes available about languages; there may be new
 government or industry standards for the language that require
 changes; and finally, new characters added to the Unicode Standard
 will interleave with the previously-defined ones. This means that
 collations must be carefully versioned.

Another idea could be having our own collation data to isolate any
changes from outside world. I vaguley recall this had been discussed
before.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] The dangers of streaming across versions of glibc: A cautionary tale

2014-08-06 Thread Peter Geoghegan
On Wed, Aug 6, 2014 at 6:30 PM, Tatsuo Ishii is...@postgresql.org wrote:
 Another idea could be having our own collation data to isolate any
 changes from outside world. I vaguley recall this had been discussed
 before.

That's probably the best solution. It would not be the first time that
we decided to stop relying on the operating system's facilities due to
various problems (e.g. we used to use the C standard library qsort()
until about 2006). The only problem is that it's a lot of work. One
possible solution that has been proposed is to adopt ICU [1]. That
might allow us to say this is the official way that PostgreSQL 9.6
sorts Japanese; you may use the old way if you want, but it's
incompatible with the new way. ICU would give us a standard
versioning interface [2]. They seem to take this seriously, and are
aware of our considerations around B-Tree indexes on text.

[1] https://wiki.postgresql.org/wiki/Todo:ICU
[2] http://userguide.icu-project.org/collation/architecture#TOC-Versioning
-- 
Regards,
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reindex taking forever, and 99% CPU

2014-08-06 Thread Phoenix Kiula
Thank you for the very specific idea of pg_stat_user.

This is what I see (the output is also included in email below, but
this is easier to read) --
https://gist.github.com/anonymous/53f748a8c6c454b804b3

The output here  (might become a jumbled mess)--

=# SELECT * from pg_stat_user_tables where relname='bigtb';
-[ RECORD 1 ]----+------------------------------
relid            | 105954
schemaname       | public
relname          | bigtb
seq_scan         | 0
seq_tup_read     | 0
idx_scan         | 220396
idx_tup_fetch    | 89781
n_tup_ins        | 63516
n_tup_upd        | 6
n_tup_del        | 910
n_tup_hot_upd    | 1
n_live_tup       | 634879579
n_dead_tup       | 39
last_vacuum      | 2014-08-06 20:12:47.163055-04
last_autovacuum  |
last_analyze     | 2014-08-06 20:19:40.317771-04
last_autoanalyze |
(1 row)

Time: 50.844 ms



We spent some time to do some massive cleaning of the data from this
table. Brought it down to around 630 million rows. Overall size of the
table including indexes is about 120GB anyway.

More stats that we could manage are pretty-pasted here:
https://gist.github.com/anonymous/21aaeae10584013c3820

The biggest table (bigtb -- codename for pasting on public forum)
stores some URLs. The most important index is for this table is the
alias column, which is varchar(35) as you can see.

Table definition also pasted below:



  Table public.bigtb
 Column  |Type |Modifiers
-+-+-
 alias   | character varying(35)   | not null
 url | text| not null
 user_registered | boolean |
 private_key | character varying(6)| default NULL::character varying
 modify_date | timestamp without time zone | default now()
 ip  | bigint  |
 url_md5 | text|

Indexes:
idx_bigtb_pkey PRIMARY KEY, btree (alias)
idx_bigtb_ip_url UNIQUE, btree (ip, url_md5)
idx_bigtb_modify_date btree (modify_date)
idx_bigtb_urlmd5 btree (url_md5)
Check constraints:
bigtb_alias_check CHECK (alias::text ~ '[-.~a-z0-9_]'::text)
Referenced by:
TABLE bigtb_registered CONSTRAINT fk_bigtb_registered FOREIGN
KEY (alias) REFERENCES bigtb(alias) MATCH FULL ON UPDATE CASCADE ON
DELETE CASCADE
TABLE interesting CONSTRAINT interesting_alias_fkey FOREIGN
KEY (alias) REFERENCES bigtb(alias) MATCH FULL ON UPDATE CASCADE
Rules:
__track_bigtb_deleted AS
ON DELETE TO bigtb
   WHERE NOT (EXISTS ( SELECT bigtb_deleted.alias
   FROM bigtb_deleted
  WHERE bigtb_deleted.alias::text = old.alias::text)) DO
INSERT INTO bigtb_deleted (alias, url, user_registered, modify_date)
  VALUES (old.alias, old.url, old.user_registered, old.modify_date)




What else could I do here?

As you will see in the code shared above (GIST Github link) the stats
for this table are:

bigtb -

   row count: 634,879,168
   inserted:  65613
   updated: 6
   deleted:  1013



These are recent numbers. The DB has been going down often. But
deletions would be around 20,000 per week. Updates are lowest. INSERT
and SELECT are huge, with of course SELECT being the biggest activity
(high traffic website).

We did put PgBouncer in place for some pooling benefits, and memcached for
taking some load off the PostgreSQL server. As of this writing,
memcached is caching around 200,000 URLs which would otherwise
have been queries based on the index on the alias column --
idx_bigtb_pkey.

What other info can I share?

I suppose we might have to explore partitioning, which would probably be
via the first letter of the alias? This would lead to around 26 + 9 = 35
sub-tables. Is this too many?

My CONFIG settings:


max_connections = 180   # Was 250!  -
http://www.php.net/manual/en/function.pg-pconnect.php#20309
superuser_reserved_connections  = 5
shared_buffers  = 512MB
effective_cache_size= 1200MB   # Nov 11 2011, was 1500MB
temp_buffers= 32MB # min 800kB
maintenance_work_mem= 320MB# min 1MB, was 128MB
work_mem= 64MB
wal_buffers = 20MB # min 32kB
fsync   = on   # turns forced
synchronization on or off
checkpoint_segments = 128  # was 128
checkpoint_timeout  = 1000 # was 1000
enable_indexscan= on
log_min_duration_statement  = 1000



Much appreciate any further ideas!





On Sun, Aug 3, 2014 at 9:29 PM, Adrian Klaver adrian.kla...@aklaver.com wrote:
 On 08/02/2014 07:37 PM, Phoenix Kiula wrote:

 

Re: [GENERAL] Need help in tuning

2014-08-06 Thread David G Johnston
Phoenix Kiula wrote
 My PG server is still going down. After spending the weekend doing a
 CLUSTER of my largest table (it's a RAID 1 system with SATA hard disks
 and 4 GB memory, mostly devoted to PG) I still have this issue.
 
 When I do a top command, 99% of the CPU and about 15% of the memory
 is being taken by PG. When I press a c in the top UI, I see that
 postmaster is doing some CLUSTER. However, I don't do any more
 clustering. The only automatic setting I can think of are autovacuum.
 
 So, question: to the degree that my system allows for performance,
 what steps can I take to find out what's happening? I see some things
 mentioned: I/O, vacuum settings, pg_stats, pg_activity -- is there a
 simple guide somewhere which shows me step by step what to do? Google
 hasn't been much help. Postgresql performance tunings brings a lot
 of esoteric articles.

SELECT * FROM pg_stat_activity;

This will tell you what the server thinks it is doing.
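
For example, something along these lines (9.2+ column names; older releases use
procpid and current_query instead of pid, state and query):

SELECT pid, usename, now() - query_start AS runtime, state, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;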

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Need-help-in-tuning-tp5813983p5813987.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Avoid WAL archiving when idle?

2014-08-06 Thread Laurence Rowe
I have WAL archiving setup on Postgres 9.3.2 using WAL-E on CentOS 6.4
using the postgresql.org RPM. This is working fine, except I see a lot of
spurious activity in the S3 bucket with wal files being backed up every 5
minutes even when the database is idle. This can make restoring to a dev
server really slow if it's been a long time since the last base backup. The
only non-default configuration is:

wal_level = archive
archive_mode = on
archive_command = '/usr/local/bin/envdir /etc/wal-e.d/env
/tools/python/current/bin/wal-e wal-push %p'
archive_timeout = 60

The 5 minute interval matches the default checkpoint_timeout, so I guess
I'm seeing the same problem as mentioned here:
http://www.postgresql.org/message-id/CAMkU=1wcyn7jnotxcncqpultznfv8zwh5bqrqzha+ugb1x-...@mail.gmail.com

Is there any way I can configure PostgreSQL to avoid continuously archiving
WAL files while idle, but still place a limit on the time until a database
write is archived?

Laurence


Re: [GENERAL] Questions on dynamic execution and sqlca

2014-08-06 Thread Guillaume Lelarge
On 6 August 2014 at 18:47, David G Johnston david.g.johns...@gmail.com
wrote:

 Bill Epstein wrote
  I've tried a variety of ways based on the on-line docs I've seen, but I
  always get a syntax error on EXEC when I use only the line EXEC
 statement

 You likely need to use EXECUTE in PostgreSQL


 INFO:  INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY,
AUDIT_LEVEL,
 AUDIT_TIME, NOTE, SQL) VALUES ('Overpayment','Create
 TLI','LOG','2014-08-06 10:44:23.933','Created TLI','INSERT INTO
 TLIA...')
 CONTEXT:  SQL statement SELECT utility.LOG_MSG (p_log_yn,
p_component,
 p_function, p_note, p_sql)
 PL/pgSQL function utility.logging_test() line 24 at PERFORM
 ERROR:  INSERT has more expressions than target columns
 LINE 3:  VALUES ($1, $2, $3, $4, $5, $6)
  ^
 QUERY:  PREPARE myinsert7 (text, text, text, timestamp, text, text)
AS
  INSERT INTO UTILITY.BPC_AUDIT (COMPONENT, ACTIVITY,
 AUDIT_LEVEL, NOTE, SQL)
  VALUES ($1, $2, $3, $4, $5, $6)
 CONTEXT:  PL/pgSQL function utility.log_msg
 (character,text,text,text,text) line 48 at SQL statement
 SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
 p_function, p_note, p_sql)
 PL/pgSQL function utility.logging_test() line 24 at PERFORM
 ** Error **
 
 ERROR: INSERT has more expressions than target columns
 SQL state: 42601
 Context: PL/pgSQL function utility.log_msg
 (character,text,text,text,text) line 48 at SQL statement
 SQL statement SELECT utility.LOG_MSG (p_log_yn, p_component,
 p_function, p_note, p_sql)
 PL/pgSQL function utility.logging_test() line 24 at PERFORM

 Since COMPONENT, ACTIVITY, AUDIT_LEVEL, NOTE, SQL is only 5 columns and
 you are sending 6 it is not surprising that you are getting an error.


  In the other function (log_error ), the problem I'm having is that I'm
  trying to pull out the sqlca error code and description (as I've done in
  the past w/ Oracle), in order to write that information in my log table.
  The intent is that this function will only be called from within an
  EXCEPTION block (as I do in my logging_test  function - I purposely run
a
  bad query to trigger it).

 You still have to deal with the fact that PostgreSQL functions operate in
 the transaction context of the caller; they cannot set their own.
 Depending
 on how you write the function and the caller if you eventually ROLLBACK
you
 could lose the logging.


  - What's the difference between hitting the Execute Query and Execute
  PGScript buttons?  Both seem to compile the functions.

 Execute Query just sends the statement(s) to the server
 Execute PGScript wraps the statements in a transaction so that either they
 are succeed or all fail.
 Basically with Execute Query if a statement in the middle fails everything
 before it still commits (auto-commit)

 For a single statement there is no difference.

  - What are the differences among PL/SQL,  PL/PGSQL and pgScript.

 The first two are languages you write functions in.  pgScript is simply an
 informal way to group a series of statements together and have them
execute
 within a transaction.


AFAICT, this isn't true. Pgscript is a client specific language. There is a
whole description of what it can do in pgadmin's manual. This was
interesting when PostgreSQL didn't have the DO statement. Now that we do,
it's rather pointless.

  - I installed Postgres 9.3.4 and  I'm using PEM v4.0.2.  When I click on
  the icon to Execute arbitrary SQL queries, I notice that the icons on
  the
  window that opens are different from the pgAdmin PostgreSQL Tools window
  that opens if I double-click on one of my .sql files.  Is there a
  difference in these tools?

 No idea - but probably.  But there are likely many similarities too.


  Attached are the relevant scripts:
  (See attached file: create_bpc_audit.sql) - Create the log table
  (See attached file: create_log_utilities.sql)- Code to create the two
  logging functions
  (See attached file: test_log_utilities.sql)- Code to exercise the msg
and
  error logging functions

 Didn't even open these...


 David J.




 --
 View this message in context:
http://postgresql.1045698.n5.nabble.com/Questions-on-dynamic-execution-and-sqlca-tp5813929p5813934.html
 Sent from the PostgreSQL - general mailing list archive at Nabble.com.


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Questions on dynamic execution and sqlca

2014-08-06 Thread David Johnston


   - What are the differences among PL/SQL,  PL/PGSQL and pgScript.
 
  The first two are languages you write functions in.  pgScript is simply
 an
  informal way to group a series of statements together and have them
 execute
  within a transaction.
 

 AFAICT, this isn't true. Pgscript is a client specific language. There is
 a whole description of what it can do in pgadmin's manual. This was
 interesting when PostgreSQL didn't have the DO statement. Now that we do,
 it's rather pointless.



Yeah, I probably should have either researched the answer or just left it
alone.  I am not all that familiar with pgAdmin - I figured it was just a
souped-up script runner with maybe a couple of features like variables but
otherwise allowing only SQL commands.

David J.


Re: [GENERAL] Avoid WAL archiving when idle?

2014-08-06 Thread David G Johnston
Laurence Rowe wrote
 I have WAL archiving setup on Postgres 9.3.2 using WAL-E on CentOS 6.4
 using the postgresql.org RPM. This is working fine, except I see a lot of
 spurious activity in the S3 bucket with wal files being backed up every 5
 minutes even when the database is idle. This can make restoring to a dev
 server really slow if it's been a long time since the last base backup.
 The
 only non-default configuration is:
 
 wal_level = archive
 archive_mode = on
 archive_command = '/usr/local/bin/envdir /etc/wal-e.d/env
 /tools/python/current/bin/wal-e wal-push %p'
 archive_timeout = 60
 
 The 5 minute interval matches the default checkpoint_timeout, so I guess
 I'm seeing the same problem as mentioned here:
 http://www.postgresql.org/message-id/CAMkU=1wCyN7JNOTXCnCqpULtzNfV8ZWH5BqrqZhA+uGB1x-fTA@.gmail
 
 Is there anyway I can configure PostgreSQL to avoid continuously archiving
 WAL files while idle but still place a limit on the time until a database
 write is archived?
 
 Laurence

The better solution would be for the checkpointer to simply not checkpoint
if there is nothing to write out.  It should not require manual
configuration: as soon as anything gets written to the WAL the maximum
delay would kick in, and in the absence of anything to archive no time period
would make sense.

I do not believe there is anything you can do currently - the most likely
recommendation is that you simply need to perform the base backup more
frequently - possibly during those long idle periods you mention.  Keep in
mind that your idea of idle and Postgres' may differ...

But I do agree that this functionality would make using PostgreSQL in a
small-ish scale environment more user friendly in cases where streaming
replication isn't a requirement.

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Avoid-WAL-archiving-when-idle-tp5813992p5813999.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general