Re: [HACKERS] unique index violation after pg_upgrade to PG10
On Tue, Oct 24, 2017 at 02:57:47PM -0700, Peter Geoghegan wrote:
> On Tue, Oct 24, 2017 at 1:11 PM, Justin Pryzby <pry...@telsasoft.com> wrote:
> > ..which I gather just verifies that the index is corrupt, not sure if there's
> > anything else to do with it?  Note, we've already removed the duplicate rows.
>
> Yes, the index itself is definitely corrupt -- this failed before the
> new "heapallindexed" check even started. Though it looks like that
> would have failed too, if you got that far, since the index points to
> a row that does not contain the same data. (I only know this because
> you dumped the heap tuple and the index tuple.)

I think you must have compared these:

On Tue, Oct 24, 2017 at 03:11:44PM -0500, Justin Pryzby wrote:
> ts=# SELECT * FROM bt_page_items(get_raw_page('sites_idx', 1));
>
> itemoffset | 48
> ctid       | (1,37)
> itemlen    | 32
> nulls      | f
> vars       | t
> data       | 1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 0b 31 31 31 31 00 00 00 00 00 00
...
> itemoffset | 37
> ctid       | (0,97)
> itemlen    | 24
> nulls      | f
> vars       | t
> data       | 1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 03 00 00

..but note those are both items in sites_idx (48 and 37, which I seem to have
pasted out of order)..

I included both because I'm not confident I know what the "index tid=(1,37)"
referred to, but I gather it means the item at offset=37 (and not the item
with ctid=(1,37)).

| [pryzbyj@database amcheck]$ psql --port 5678 ts -c "SELECT bt_index_check('sites_idx'::regclass::oid, heapallindexed=>True)"
| ERROR:  high key invariant violated for index "sites_idx"
| DETAIL:  Index tid=(1,37) points to heap tid=(0,97) page lsn=0/0.
ts=# SELECT * FROM page_header(get_raw_page('sites_idx', 1));
 lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----+----------+-------+-------+-------+---------+----------+---------+-----------
 0/0 |        0 |     0 |   872 |  1696 |    8176 |     8192 |       4 |         0

Here is its heap page:

ts=# SELECT * FROM heap_page_items(get_raw_page('sites', 0)) WHERE lp=97;
 lp | lp_off | lp_flags | lp_len | t_xmin |  t_xmax  | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |  t_bits  | t_oid | t_data
----+--------+----------+--------+--------+----------+----------+--------+-------------+------------+--------+----------+-------+----------------------------------------
 97 |    968 |        1 |     52 |  21269 | 33567444 |        0 | (3,27) |        8204 |       2307 |     32 | 11101001 |       | \x70001b4352434c4d542d43454d5330030303

Which I see ends with 0303 vs ..

t_infomask=2307=2048+256+3 =>
#define HEAP_HASNULL            0x0001  /* has null attribute(s) */
#define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
#define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
#define HEAP_XMAX_INVALID       0x0800  /* t_xmax invalid/aborted */

t_infomask2=8204 => 8192+12 =>
#define HEAP_KEYS_UPDATED       0x2000  /* tuple was updated and key cols modified, or tuple deleted */

Maybe this is relevant?

ts=# SELECT * FROM heap_page_items(get_raw_page('sites', 3)) WHERE lp=27;
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+--------
 27 |      0 |        0 |      0 |        |        |          |        |             |            |        |        |       |

Justin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
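[Editor's note: the infomask decompositions above can be checked mechanically. This is a small throwaway sketch (not part of any PostgreSQL tooling) using only the flag values quoted from htup_details.h above; the low 11 bits of t_infomask2 (HEAP_NATTS_MASK, 0x07FF) are the attribute count.]

```python
# Decode the heap tuple infomask bits quoted above (values from
# src/include/access/htup_details.h; only the flags relevant here).
HEAP_INFOMASK_FLAGS = {
    0x0001: "HEAP_HASNULL",
    0x0002: "HEAP_HASVARWIDTH",
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
}
HEAP_INFOMASK2_FLAGS = {
    0x2000: "HEAP_KEYS_UPDATED",
}
HEAP_NATTS_MASK = 0x07FF  # low 11 bits of t_infomask2 = number of attributes

def decode_infomask(mask, flags):
    """Return the names of the flag bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(flags.items()) if mask & bit]

# t_infomask=2307 -> HASNULL, HASVARWIDTH, XMIN_COMMITTED, XMAX_INVALID
print(decode_infomask(2307, HEAP_INFOMASK_FLAGS))
# t_infomask2=8204 -> KEYS_UPDATED, and 8204 & 0x7FF = 12 attributes
print(decode_infomask(8204, HEAP_INFOMASK2_FLAGS), 8204 & HEAP_NATTS_MASK)
```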
Re: [HACKERS] unique index violation after pg_upgrade to PG10
On Tue, Oct 24, 2017 at 02:57:47PM -0700, Peter Geoghegan wrote:
> On Tue, Oct 24, 2017 at 1:11 PM, Justin Pryzby <pry...@telsasoft.com> wrote:
> > ..which I gather just verifies that the index is corrupt, not sure if there's
> > anything else to do with it?  Note, we've already removed the duplicate rows.
>
> Maybe you could try verifying a different index on the same table with
> "heapallindexed", too. Perhaps that would fail in a more interesting
> way.

The only other index seems fine:

[pryzbyj@database ~]$ psql --port 5678 ts -txc "SELECT bt_index_parent_check('sites_pkey'::regclass::oid, heapallindexed=>True)"
bt_index_parent_check |

[pryzbyj@database ~]$ psql --port 5678 ts -txc "SELECT bt_index_check('sites_pkey'::regclass::oid, heapallindexed=>True)"
bt_index_check |

> You're using LVM snapshots -- I hope that you're aware that they're
> not guaranteed to be consistent across logical volumes. There are a
> few different ways that they could cause corruption like this if you
> weren't careful. (In general, I wouldn't recommend using LVM snapshots
> as any kind of backup solution.)

Right, I asked a coworker to create the snapshot before trying to fix the
immediate problem, as a forensic measure.  We have postgres on just one LVM LV.

Justin
Re: [HACKERS] unique index violation after pg_upgrade to PG10
On Tue, Oct 24, 2017 at 01:48:55PM -0500, Kenneth Marshall wrote:
> I just dealt with a similar problem with pg_repack and a PostgreSQL 9.5 DB,
> the exact same error.  It seemed to be caused by a tuple visibility issue that
> allowed the "working" unique index to be built, even though a duplicate row
> existed.  Then the next pg_repack would fail with the error you got.

FTR, I was able to run the repack script several times without issue, hitting
that table each time.

Justin
Re: [HACKERS] unique index violation after pg_upgrade to PG10
On Tue, Oct 24, 2017 at 12:31:49PM -0700, Peter Geoghegan wrote:
> On Tue, Oct 24, 2017 at 11:48 AM, Kenneth Marshall <k...@rice.edu> wrote:
> >> Really ?  pg_repack "found" and was victim to the duplicate keys, and rolled
> >> back its work.  The CSV logs clearly show that our application INSERTed rows
> >> which are duplicates.
> >>
> >> [pryzbyj@database ~]$ rpm -qa pg_repack10
> >> pg_repack10-1.4.2-1.rhel6.x86_64
> >>
> >> Justin
> >
> > Hi Justin,
> >
> > I just dealt with a similar problem with pg_repack and a PostgreSQL 9.5 DB,
> > the exact same error.  It seemed to be caused by a tuple visibility issue that
> > allowed the "working" unique index to be built, even though a duplicate row
> > existed.  Then the next pg_repack would fail with the error you got.  In our
> > case I needed the locality of reference to keep the DB performance acceptable
> > and it was not a critical issue if there was a duplicate.  We would remove the
> > duplicates if we had a failure.  We never had a problem with the NO pg_repack
> > scenarios.
>
> A new, enhanced version of the corruption detection tool amcheck is
> now available, and has both apt + yum packages available:
>
> https://github.com/petergeoghegan/amcheck
>
> Unlike the version in Postgres 10, this enhanced version (V1.2) has
> "heapallindexed" verification, which is really what you want here. If
> you install it, and run it against the unique index in question (with
> "heapallindexed" verification), that could help. It might provide a
> more useful diagnostic message.
>
> This is very new, so do let me know how you get on if you try it out.
I started an instance connected to a copy of the LVM snapshot I saved:

[pryzbyj@database ~]$ sudo -u postgres /usr/pgsql-10/bin/postmaster -c port=5678 -D /mnt/10/data

[pryzbyj@database amcheck]$ psql --port 5678 ts -c "SELECT bt_index_check('sites_idx'::regclass::oid, heapallindexed=>True)"
ERROR:  high key invariant violated for index "sites_idx"
DETAIL:  Index tid=(1,37) points to heap tid=(0,97) page lsn=0/0.

[pryzbyj@database amcheck]$ psql --port 5678 ts -c "SELECT bt_index_parent_check('sites_idx'::regclass::oid, heapallindexed=>True)"
ERROR:  high key invariant violated for index "sites_idx"
DETAIL:  Index tid=(1,37) points to heap tid=(0,97) page lsn=0/0.

ts=# SELECT * FROM page_header(get_raw_page('sites_idx', 1));
lsn       | 0/0
checksum  | 0
flags     | 0
lower     | 872
upper     | 1696
special   | 8176
pagesize  | 8192
version   | 4
prune_xid | 0

ts=# SELECT * FROM page_header(get_raw_page('sites', 0));
lsn       | 1FB/AC5A4908
checksum  | 0
flags     | 5
lower     | 436
upper     | 464
special   | 8192
pagesize  | 8192
version   | 4
prune_xid | 0

ts=# SELECT * FROM bt_page_items(get_raw_page('sites_idx', 1));

itemoffset | 48
ctid       | (1,37)
itemlen    | 32
nulls      | f
vars       | t
data       | 1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 0b 31 31 31 31 00 00 00 00 00 00

itemoffset | 37
ctid       | (0,97)
itemlen    | 24
nulls      | f
vars       | t
data       | 1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 03 00 00

..which I gather just verifies that the index is corrupt, not sure if there's
anything else to do with it?  Note, we've already removed the duplicate rows.

Justin
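[Editor's note: the "data" columns above are the raw index-tuple attribute bytes. As a sanity check they can be decoded by hand; this is a throwaway helper (not part of amcheck or pageinspect), assuming little-endian 1-byte "short varlena" headers, where an odd header byte encodes the total length (header >> 1) including the header itself, and trailing 00 bytes are alignment padding.]

```python
# Rough decoder for the pageinspect "data" hex dumps shown above.
def decode_short_varlenas(hexdump):
    raw = bytes.fromhex(hexdump.replace(" ", ""))
    out, i = [], 0
    while i < len(raw):
        hdr = raw[i]
        if hdr == 0x00:                  # alignment padding
            i += 1
            continue
        if hdr & 0x01 == 0:
            raise ValueError("4-byte varlena header: not handled in this sketch")
        total = hdr >> 1                 # total length, including the header byte
        if total == 0:
            raise ValueError("0x01 marks a TOAST pointer: not handled here")
        out.append(raw[i + 1:i + total].decode("ascii"))
        i += total
    return out

# index item 48: decodes to 'CRCLMT-CEMS0' and '1111'
print(decode_short_varlenas(
    "1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 0b 31 31 31 31 00 00 00 00 00 00"))
# index item 37: decodes to 'CRCLMT-CEMS0' and an empty string (0x03 header)
print(decode_short_varlenas(
    "1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 03 00 00"))
```

This would explain the differing itemlen (32 vs 24): the two tuples share the first attribute but differ in the second.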
Re: [HACKERS] unique index violation after pg_upgrade to PG10
On Tue, Oct 24, 2017 at 01:27:14PM -0500, Kenneth Marshall wrote:
> On Tue, Oct 24, 2017 at 01:14:53PM -0500, Justin Pryzby wrote:
> > Note:
> > I run a script which does various combinations of ANALYZE/VACUUM (FULL/ANALYZE)
> > following the upgrade, and a script runs nightly with REINDEX and pg_repack
> > (and a couple of CLUSTER), so you should assume that any combination of those
> > maintenance commands has been run.
> >
> > In our reindex/repack log I found the first error due to duplicates:
> > Tue Oct 24 01:27:53 MDT 2017: sites: sites_idx(repack non-partitioned)...
> > WARNING: Error creating index "public"."index_61764": ERROR:  could not create unique index "index_61764"
> > DETAIL:  Key (site_office, site_location)=(CRCLMT-DOEMS0, 1120) is duplicated.
> > WARNING: Skipping index swapping for "sites", since no new indexes built
> > WARNING: repack failed for "sites_idx"
> > reindex: warning, dropping invalid/unswapped index: index_61764
>
> Hi Justin,
>
> This sounds like a pg_repack bug and not a PostgreSQL bug.  What version are
> you running?

Really ?  pg_repack "found" and was victim to the duplicate keys, and rolled
back its work.  The CSV logs clearly show that our application INSERTed rows
which are duplicates.

[pryzbyj@database ~]$ rpm -qa pg_repack10
pg_repack10-1.4.2-1.rhel6.x86_64

Justin
[HACKERS] unique index violation after pg_upgrade to PG10
x "index_61764"
DETAIL:  Key (site_office, site_location)=(CRCLMT-DOEMS0, 1120) is duplicated.
WARNING: Skipping index swapping for "sites", since no new indexes built
WARNING: repack failed for "sites_idx"
reindex: warning, dropping invalid/unswapped index: index_61764

postgres=# SELECT * FROM postgres_log WHERE session_id='59eeeb79.4bd5' AND error_severity='ERROR' ORDER BY 1 DESC LIMIT 33;
log_time           | 2017-10-24 01:27:53.545-06
user_name          | postgres
database           | ts
pid                | 19413
connection_from    | [local]
session_id         | 59eeeb79.4bd5
session_line       | 10
command_tag        | CREATE INDEX
session_start_time | 2017-10-24 01:27:53-06
error_severity     | ERROR
message            | could not create unique index "index_61764"
detail             | Key (site_office, site_location)=(CRCLMT-DOEMS0, 1120) is duplicated.
query              | CREATE UNIQUE INDEX CONCURRENTLY index_61764 ON sites USING btree (site_office, site_location) TABLESPACE pg_default
application_name   | pg_repack

Disclosure: I see a storage error from 2 days ago (pre-upgrade).  This is a
4-drive RAID5 with spare.  I can't see that we're missing a spare, so I don't
know what else to do besides consider this message to be spurious..

Oct 22 21:28:16 midrivers-ubuntu kernel: [1429369.825577] hpsa :05:00.0: scsi 4:0:0:0 Aborting command 8f84f2ba7c00Tag:0x:01f0 CDBLen: 6 CDB: 0x1201... SN: 0x0 BEING SENT
Oct 22 21:28:16 midrivers-ubuntu kernel: [1429369.825583] hpsa :05:00.0: scsi 4:0:0:0: Aborting command RAID HP P410i controller SSDSmartPathCap- En- Exp=1
Oct 22 21:28:16 midrivers-ubuntu kernel: [1429369.898930] hpsa :05:00.0: CDB 120180006000 was aborted with status 0x0
Oct 22 21:28:17 midrivers-ubuntu kernel: [1429371.269963] hpsa :05:00.0: invalid command: LUN: CDB:f001
Oct 22 21:28:17 midrivers-ubuntu kernel: [1429371.269966] hpsa :05:00.0: probably means device no longer present
Oct 22 21:28:17 midrivers-ubuntu kernel: [1429371.269968] hpsa :05:00.0: scsi 4:0:0:0 Aborting command 8f84f2ba7c00Tag:0x:01f0 CDBLen: 6 CDB: 0x1201... SN: 0x0 SENT, FAILED
Oct 22 21:28:17 midrivers-ubuntu kernel: [1429371.269974] hpsa :05:00.0: scsi 4:0:0:0: FAILED to abort command RAID HP P410i controller SSDSmartPathCap- En- Exp=1

sites is perhaps our most central table and I would expect issues there to be
quickly apparent (however this was at least initially a silent failure).

 public | sites | table | telsasoft | 80 kB |

Justin
[HACKERS] per-session errors after interrupting CLUSTER pg_attrdef
This was briefly scary but seems to have been limited to my psql session (no
other errors logged).  Issue with catcache (?)

I realized that the backup job I'd kicked off was precluding the CLUSTER from
running, but that CLUSTER was still holding lock and stalling everything else
under the sun.

psql (10.0, server 9.6.5)
...
ts=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index ;
^CCancel request sent
ERROR:  canceling statement due to user request
ts=# \df qci_add*
ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
ts=# \dt+ pg_att
ts=# \dt+ pg_attrdef
ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
ts=# ^C
ts=# \q

postgres=# SELECT session_line ln, user_name, error_severity sev, left(message,66), left(query,66) FROM postgres_log_2017_10_19_1700 WHERE session_id='59e93953.20c9' ORDER BY 1,2 DESC;
 ln | user_name | sev   | left                                                               | left
----+-----------+-------+--------------------------------------------------------------------+--------------------------------------------------------------------
  1 | pryzbyj   | LOG   | statement: CLUSTER pg_attribute USING pg_attribute_relid_attnum_in |
  2 | pryzbyj   | ERROR | canceling statement due to user request                            | CLUSTER pg_attribute USING pg_attribute_relid_attnum_index ;
  3 | pryzbyj   | LOG   | statement: SELECT n.nspname as "Schema",                          +|
    |           |       |   p.proname as "Name",                                            +|
  4 | pryzbyj   | ERROR | could not read block 8 in file "base/16400/999225102": read only 0 | SELECT n.nspname as "Schema",                                     +
    |           |       |                                                                    |   p.proname as "Name",                                            +
    |           |       |                                                                    | pg_catalog.
  5 | pryzbyj   | LOG   | statement: SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalo |
  6 | pryzbyj   | ERROR | could not read block 1 in file "base/16400/999225102": read only 0 | SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class
  7 | pryzbyj   | LOG   | statement: SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalo |
  8 | pryzbyj   | ERROR | could not read block 1 in file "base/16400/999225102": read only 0 | SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class
  9 | pryzbyj   | LOG   | statement: SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalo |
 10 | pryzbyj   | ERROR | could not read block 1 in file "base/16400/999225102": read only 0 | SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class
 11 | pryzbyj   | LOG   | statement: SELECT n.nspname as "Schema",                          +|
    |           |       |   c.relname as "Name",                                            +|
 12 | pryzbyj   | ERROR | could not read block 8 in file "base/16400/999225102": read only 0 | SELECT n.nspname as "Schema",                                     +
    |           |       |                                                                    |   c.relname as "Name",                                            +
    |           |       |                                                                    | CASE c.relk
(12 rows)
Re: [HACKERS] [GENERAL] huge RAM use in multi-command ALTER of table heirarchy
Hi,

I just ran into this again in another context (see original discussion, quoted
below).

Some time ago, while initially introducing a non-default stats target, I set
our non-filtering columns to "STATISTICS 10", lower than default, to minimize
the size of pg_statistic, which (at least at one point) I perceived to have
become bloated and to be causing issues (partially due to having an excessive
number of "daily" granularity partitions, a problem I've since mitigated).

The large number of columns with non-default stats target was (I think)
causing pg_dump --section=pre-data to take 10+ minutes, which makes pg_upgrade
more disruptive than necessary, so now I'm going back and fixing it.

[pryzbyj@database ~]$ time sed '/SET STATISTICS 10;$/!d; s//SET STATISTICS -1;/' /srv/cdrperfbackup/ts/2017-10-17/pg_dump-section\=pre-data |psql -1q ts
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost

[pryzbyj@database ~]$ dmesg |tail -n2
Out of memory: Kill process 6725 (postmaster) score 550 or sacrifice child
Killed process 6725, UID 26, (postmaster) total-vm:13544792kB, anon-rss:8977764kB, file-rss:8kB

So I'm hoping to encourage someone to commit the change contemplated earlier.
Thanks in advance.

Justin

On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > I've seen this before while doing SET STATISTICS on a larger number of columns
> > using xargs, but just came up while doing ADD of a large number of columns.
> > Seems to be roughly linear in number of children but superlinear WRT columns.
> > I think having to do with catalog update / cache invalidation with many
> > ALTERs*children*columns?
>
> I poked into this a bit.
> The operation is necessarily roughly O(N^2) in
> the number of columns, because we rebuild the affected table's relcache
> entry after each elementary ADD COLUMN operation, and one of the principal
> components of that cost is reading all the pg_attribute entries.  However,
> that should only be a time cost not a space cost.  Eventually I traced the
> O(N^2) space consumption to RememberToFreeTupleDescAtEOX, which seems to
> have been introduced in Simon's commit e5550d5fe, and which strikes me as
> a kluge of the first magnitude.  Unless I am missing something, that
> function's design concept can fairly be characterized as "let's leak
> memory like there's no tomorrow, on the off chance that somebody somewhere
> is ignoring basic coding rules".
>
> I tried ripping that out, and the peak space consumption of your example
> (with 20 child tables and 1600 columns) decreased from ~3GB to ~200MB.
> Moreover, the system still passes make check-world, so it's not clear
> to me what excuse this code has to live.
>
> It's probably a bit late in the v10 cycle to be taking any risks in
> this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
> as soon as the v11 cycle opens, unless someone can show an example
> of non-broken coding that requires it.  (And if so, there ought to
> be a regression test incorporating that.)
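[Editor's note: since the leaked tuple descriptors are only released at end of transaction ("AtEOX"), a workaround consistent with the analysis above is to stop running every generated ALTER in one big transaction (psql -1) and instead issue them in many small transactions, bounding per-transaction memory. A sketch of chunking the generated statements; the batch size of 100 is an arbitrary illustrative choice, not a tuned value.]

```python
# Sketch: group generated ALTER ... SET STATISTICS statements into
# small batches, each intended to run in its own transaction (its own
# psql invocation) rather than a single "psql -1" transaction.
BATCH = 100  # arbitrary; any bound keeps per-transaction memory in check

def chunk(statements, size=BATCH):
    """Yield successive lists of at most `size` statements."""
    for i in range(0, len(statements), size):
        yield statements[i:i + size]

statements = [
    'ALTER TABLE t%d ALTER c SET STATISTICS -1;' % n for n in range(250)
]
for batch in chunk(statements):
    # each batch would be piped to its own "psql ts" invocation,
    # e.g. subprocess.run(["psql", "ts"], input="\n".join(batch), text=True)
    print(len(batch), "statements in this transaction")
```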
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Wed, Oct 18, 2017 at 07:22:27PM +0200, Alvaro Herrera wrote:
> Do you still have those core dumps?  If so, would you please verify the
> database that autovacuum was running in?  Just open each with gdb (using
> the original postgres binary, not the one you just installed) and do
> "print MyDatabaseId".

[pryzbyj@database ~]$ gdb ccpp-2017-10-16-23:21:22-31543/coredump -ex 'p MyDatabaseId' -ex q 2>/dev/null |tail -5
Core was generated by `postgres: autovacuum worker process   gtt '.
Program terminated with signal 11, Segmentation fault.
#0  index_close (relation=0x324647603246466, lockmode=1) at indexam.c:178
178             LockRelId       relid = relation->rd_lockInfo.lockRelId;
$1 = 16400

[pryzbyj@database ~]$ gdb ccpp-2017-10-14-18:05:35-26500/coredump -ex 'p MyDatabaseId' -ex q 2>/dev/null |tail -5
Core was generated by `postgres: autovacuum worker process   gtt '.
Program terminated with signal 11, Segmentation fault.
#0  pfree (pointer=0x298c740) at mcxt.c:954
954             (*context->methods->free_p) (context, pointer);
$1 = 16400

gtt=# SELECT oid,datname FROM pg_database;
 13456 | template0
 16400 | gtt
 13457 | postgres
     1 | template1

The gtt DB is where the (only) BRIN indices are (not sure what to conclude
from that?)

Justin
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote:
> Justin Pryzby wrote:
>
> > No crashes in ~28hr.  It occurs to me that it's a weaker test due to not
> > preserving most compilation options.
>
> And the previous code crashes in 45 minutes?  That's solid enough for
> me; I'll clean up the patch and push in the next few days.  I think what
> you have now should be sufficient for the time being for your production
> system.

No - the crash happened 4 times since adding BRIN+autosummarize 6 days ago,
and in one instance occurred twice within 3 hours (while I was trying to query
logs for the preceding crash).

[pryzbyj@database ~]$ sudo grep -hE 'in postgres|Saved core' /var/log/messages*
Oct 13 17:22:45 database kernel: postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in postgres[40+692000]
Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 (15040512 bytes)
Oct 14 18:05:35 database kernel: postmaster[26500] general protection ip:84a177 sp:7ffd9b349b88 error:0 in postgres[40+692000]
Oct 14 18:05:35 database abrt[27564]: Saved core dump of pid 26500 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-14-18:05:35-26500 (24137728 bytes)
Oct 16 23:21:22 database kernel: postmaster[31543] general protection ip:4bd467 sp:7ffe08a94890 error:0 in postgres[40+692000]
Oct 16 23:21:22 database abrt[570]: Saved core dump of pid 31543 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-16-23:21:22-31543 (25133056 bytes)
Oct 17 01:58:36 database kernel: postmaster[8646]: segfault at 8 ip 0084a177 sp 7ffe08a94a88 error 4 in postgres[40+692000]
Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 (7692288 bytes)

> > If I understand, our crash isn't explained by the avw_database test anyway (?)
>
> I don't see why you would think that -- I disagree.

No problem - apparently I read too far into Tom's thoughts regarding memory
context.  I'll continue running with the existing patch and come back if the
issue recurs.

Thanks

Justin
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Tue, Oct 17, 2017 at 09:07:40AM -0500, Justin Pryzby wrote:
> On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote:
> > Justin Pryzby <pry...@telsasoft.com> writes:
> > > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote:
> > >> Anyway, can give this patch a try?
> >
> > The trick in this sort of situation is to make sure you build binaries
> > that match your existing install in every way except having the added
> > patch, and maybe getting installed into a different directory.
>
> I'm familiar with that process; but, these are PG10 binaries from PGDG for
> centos6 x64.  I had to add symlinks for the postgis library, but otherwise it
> seems to be working fine (although I didn't preserve as many configure
> options as your message would suggest I should have).

On Tue, Oct 17, 2017 at 12:49:55PM -0400, Tom Lane wrote:
> So what I'm thinking is that you need an error during perform_work_item,
> and/or more than one work_item picked up in the calling loop, to make this
> bug manifest.  You would need to enter perform_work_item in a

..in our case probably due to interruption by LOCK TABLE, yes?

On Tue, Oct 17, 2017 at 12:49:55PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> > And I think that's because we're not
> > checking that the namespace OID is a valid value before calling
> > get_namespace_name on it.
>
> The part of your patch that adds a check on avw_database is clearly
> correct and necessary.  I'm thinking the change you propose in
> perform_work_item is just overcomplicating code that's okay as it
> stands.  We don't need to optimize for the schema-went-away case.

No crashes in ~28hr.  It occurs to me that it's a weaker test due to not
preserving most compilation options.

If I understand, our crash isn't explained by the avw_database test anyway (?)

Should I make clean and recompile with all non-prefix options and a minimal
patch (avw_database==MyDatabaseId || continue)?

Or recompile with existing options but no patch, to first verify the crash
occurs with a locally compiled binary?

Justin
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote:
> >> Anyway, can give this patch a try?
>
> > I've only compiled postgres once before and this is a production environment
> > (although nothing so important that the crashes are a serious concern either).
>
> > Is it reasonable to wget the postgres tarball, apply the patch, and run the
> > compiled postgres binary from the source tree, without running make install or
> > similar ?  Otherwise, would it be good enough to copy the postgres binary to
> > /usr/pgsql-10/bin (and reinstall the binary package later) ?
>
> The trick in this sort of situation is to make sure you build binaries
> that match your existing install in every way except having the added
> patch, and maybe getting installed into a different directory.
>
> So: where did you get the existing binaries?  If it's from some vendor
> packaging system, what you should do is fetch the package source, add
> the patch to the probably-nonempty set of patches the vendor is applying,
> and rebuild your own custom package version.  If you haven't done that
> before, it's a good skill to acquire ...

I'm familiar with that process; but, these are PG10 binaries from PGDG for
centos6 x64.  I had to add symlinks for the postgis library, but otherwise it
seems to be working fine (although I didn't preserve as many configure options
as your message would suggest I should have).

Justin
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote:
> Justin Pryzby wrote:
>
> > #1  0x006a52e9 in perform_work_item (workitem=0x7f8ad1f94824) at autovacuum.c:2676
> >         cur_datname = 0x298c740 "no 1 :vartype 1184 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 146} {CONST :consttype 1184 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull fal"...
> >         cur_nspname = 0x298c728 "s ({VAR :varno 1 :varattno 1 :vartype 1184 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 146} {CONST :consttype 1184 :consttypmod -1 :constcollid 0 :constlen 8 :constbyv"...
> >         cur_relname = 0x298cd68 "cdrs_eric_msc_sms_2017_10_14_startofcharge_idx"
> >         __func__ = "perform_work_item"
>
> cur_datname here seems corrupted -- it points halfway into cur_nspname,
> which is also a corrupt value.  And I think that's because we're not
> checking that the namespace OID is a valid value before calling
> get_namespace_name on it.  And I'm betting that these values are all not
> what we expect, because we're not checking that we're in the correct
> database before trying to execute the work item.  I don't quite
> understand how this results in an invalid string rather than just a
> NULL, as I would have expected.

I'm happy to try the patch, but in case it makes any difference, we have few
DBs/schemas:

postgres=# \dn
  List of schemas
  Name  |  Owner
--------+----------
 public | postgres
(1 row)

postgres=# \l+
                                                            List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+-------------------------------------------
 gtt       | pryzbyj  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 2215 GB | pg_default |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 76 GB   | pg_default | default administrative connection database
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7625 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            |
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | postgres=CTc/postgres+| 7485 kB | pg_default | default template for new databases
           |          |          |             |             | =c/postgres           |         |            |
(4 rows)

Note that db:postgres is of substantial size since I set log_statement=all and
log_destination=stderr,csv, and import all CSVs into a table hierarchy in
db:postgres.

There are 3 BRIN indices per (daily child) table in db:gtt (and none in
postgres, although that's probably a good idea for the timestamp column once
all our customers are upgraded to PG10).

Justin
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote:
> Anyway, can give this patch a try?

I've only compiled postgres once before and this is a production environment
(although nothing so important that the crashes are a serious concern either).

Is it reasonable to wget the postgres tarball, apply the patch, and run the
compiled postgres binary from the source tree, without running make install or
similar ?  Otherwise, would it be good enough to copy the postgres binary to
/usr/pgsql-10/bin (and reinstall the binary package later) ?

Thanks,
Justin
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Sun, Oct 15, 2017 at 02:44:58PM +0200, Tomas Vondra wrote: > Thanks, but I'm not sure that'll help, at this point. We already know > what happened (corrupted memory), we don't know "how". And core files > are mostly just "snapshots" so are not very useful in answering that :-( Is there anything I should be saving for these or hints how else to debug? I just got a segfault while looking for logs from an segfault from 2hr earlier...but it was a duplicate from the first stacktrace... < 2017-10-16 23:21:23.172 -04 >LOG: server process (PID 31543) was terminated by signal 11: Segmentation fault < 2017-10-16 23:21:23.172 -04 >DETAIL: Failed process was running: autovacuum: BRIN summarize public.gtt 162175 Oct 16 23:21:22 database kernel: postmaster[31543] general protection ip:4bd467 sp:7ffe08a94890 error:0 in postgres[40+692000] ... Oct 17 01:58:36 database kernel: postmaster[8646]: segfault at 8 ip 0084a177 sp 7ffe08a94a88 error 4 in postgres[40+692000] Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 (7692288 bytes) Oct 17 01:58:38 database abrtd: Directory 'ccpp-2017-10-17-01:58:38-8646' creation detected Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 (7692288 bytes) Oct 17 01:58:38 database abrtd: Directory 'ccpp-2017-10-17-01:58:38-8646' creation detected Oct 17 01:59:18 database abrtd: Generating core_backtrace Oct 17 01:59:18 database abrtd: Duplicate: core backtrace Oct 17 01:59:18 database abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2017-10-14-18:05:35-26500 Oct 17 01:59:18 database abrtd: Deleting problem directory ccpp-2017-10-17-01:58:38-8646 (dup of ccpp-2017-10-14-18:05:35-26500) postgres=# SELECT * FROM postgres_log_2017_10_16_2300 WHERE pid=31543; -[ RECORD 1 ]--+- log_time | 2017-10-16 23:21:22.021-04 pid| 31543 session_id | 59e57677.7b37 session_line | 2 
session_start_time     | 2017-10-16 23:18:15-04
virtual_transaction_id | 22/54376
transaction_id         | 0
error_severity         | ERROR
sql_state_code         | 57014
message                | canceling autovacuum task
context                | automatic analyze of table "gtt.public.cdrs_eric_egsnpdprecord_2017_10_15"
-[ RECORD 2 ]----------+-
log_time               | 2017-10-16 23:21:22.021-04
pid                    | 31543
session_id             | 59e57677.7b37
session_line           | 3
session_start_time     | 2017-10-16 23:18:15-04
virtual_transaction_id | 22/54377
transaction_id         | 0
error_severity         | ERROR
sql_state_code         | 57014
message                | canceling autovacuum task
context                | processing work entry for relation "gtt.public.cdrs_eric_msc_voice_2017_10_16_startofcharge_idx"
-[ RECORD 3 ]----------+-
log_time               | 2017-10-16 23:21:21.003-04
pid                    | 31543
session_id             | 59e57677.7b37
session_line           | 1
session_start_time     | 2017-10-16 23:18:15-04
virtual_transaction_id | 22/54374
transaction_id         | 0
error_severity         | ERROR
sql_state_code         | 57014
message                | canceling autovacuum task
context                | automatic analyze of table "gtt.public.cdrs_eric_ggsnpdprecord_2017_10_15"

Note, the most recent LOCK TABLE I could find before 23:21:22 was actually ~10min earlier, at 2017-10-16 23:12:16.519-04.

Core was generated by `postgres: autovacuum worker process gtt '.
Program terminated with signal 11, Segmentation fault.
#0 index_close (relation=0x324647603246466, lockmode=1) at indexam.c:178
178     LockRelId relid = relation->rd_lockInfo.lockRelId;
Missing separate debuginfos, use: debuginfo-install audit-libs-2.4.5-6.el6.x86_64 cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-23.el6.x86_64 libgcc-4.4.7-18.el6.x86_64 libicu-4.2.1-14.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 libstdc++-4.4.7-18.el6.x86_64 nspr-4.13.1-1.el6.x86_64 nss-3.28.4-1.el6_9.x86_64 nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64 nss-util-3.28.4-1.el6_9.x86_64 openldap-2.4.40-16.el6.x86_64 openssl-1.0.1e-57.el6.x86_64 pam-1.1.1-24.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) p relation
$1 = (struct RelationData *) 0x324647603246466
(gdb) p relation->rd_lockInfo
Cannot access memory at address 0x3246476032464aa
(gdb) bt
#0 index_close (relation=0x324647603246466, lockmode=1) at indexam.c:178
#1 0x004bc3c2 in systable_endscan (sysscan=0x145b3c8) at
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Sat, Oct 14, 2017 at 08:56:56PM -0500, Justin Pryzby wrote:
> On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
> > > Also notice the vacuum process was interrupted, same as yesterday (think
> > > goodness for full logs). Our INSERT script is using python
> > > multiprocessing.pool() with "maxtasksperchild=1", which I think means we load
> > > one file and then exit the subprocess, and pool() creates a new subproc, which
> > > starts a new PG session and transaction. Which explains why autovacuum starts
> > > processing the table only to be immediately interrupted.
>
> > On Sun, Oct 15, 2017 at 01:57:14AM +0200, Tomas Vondra wrote:
> > I don't follow. Why does it explain that autovacuum gets canceled? I
> > mean, merely opening a new connection/session should not cancel
> > autovacuum. That requires a command that requires table-level lock
> > conflicting with autovacuum (so e.g. explicit LOCK command, DDL, ...).
>
> I was thinking that INSERT would do it, but I gather you're right about
> autovacuum. Let me get back to you about this..

I confirmed that we're taking an explicit lock before creating new child tables (as I recall, to avoid errors in the logs shortly after midnight when multiple subprocesses see data for the new date for the first time):

2017-10-15 12:52:50.499-04 | 59e3925e.6951 | statement: LOCK TABLE cdrs_huawei_sgsnPDPRecord IN SHARE UPDATE EXCLUSIVE MODE

Probably we can improve that with LOCK TABLE ONLY.

Justin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
> > Also notice the vacuum process was interrupted, same as yesterday (think
> > goodness for full logs). Our INSERT script is using python
> > multiprocessing.pool() with "maxtasksperchild=1", which I think means we load
> > one file and then exit the subprocess, and pool() creates a new subproc, which
> > starts a new PG session and transaction. Which explains why autovacuum starts
> > processing the table only to be immediately interrupted.

On Sun, Oct 15, 2017 at 01:57:14AM +0200, Tomas Vondra wrote:
> I don't follow. Why does it explain that autovacuum gets canceled? I
> mean, merely opening a new connection/session should not cancel
> autovacuum. That requires a command that requires table-level lock
> conflicting with autovacuum (so e.g. explicit LOCK command, DDL, ...).

I was thinking that INSERT would do it, but I gather you're right about autovacuum. Let me get back to you about this..

> > Due to a .."behavioral deficiency" in the loader for those tables, the crashed
> > backend causes the loader to get stuck, so the tables should be untouched since
> > the crash, should it be desirable to inspect them.
>
> It's a bit difficult to guess what went wrong from this backtrace. For
> me gdb typically prints a bunch of lines immediately before the frames,
> explaining what went wrong - not sure why it's missing here.

Do you mean this ?

...
Loaded symbols for /lib64/libnss_files-2.12.so
Core was generated by `postgres: autovacuum worker process gtt '.
Program terminated with signal 11, Segmentation fault.
#0 pfree (pointer=0x298c740) at mcxt.c:954
954     (*context->methods->free_p) (context, pointer);

> Perhaps some of those pointers are bogus, the memory was already pfree-d
> or something like that. You'll have to poke around and try dereferencing
> the pointers to find what works and what does not.
>
> For example what do these gdb commands do in the #0 frame?
> > (gdb) p *(MemoryContext)context

(gdb) p *(MemoryContext)context
Cannot access memory at address 0x7474617261763a20

> (gdb) p *GetMemoryChunkContext(pointer)

(gdb) p *GetMemoryChunkContext(pointer)
No symbol "GetMemoryChunkContext" in current context.

I had to do this since it's apparently inlined/a macro:

(gdb) p *(MemoryContext *) (((char *) pointer) - sizeof(void *))
$8 = (MemoryContext) 0x7474617261763a20

I uploaded the corefile: http://telsasoft.com/tmp/coredump-postgres-autovacuum-brin-summarize.gz

Justin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
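One incidental observation about the gdb session above (my inference, but easy to check): the bogus context pointer 0x7474617261763a20 is itself printable ASCII. Read in x86-64 little-endian byte order it spells " :varatt" — a fragment of exactly the kind of serialized node tree (":varattno ...") that cur_datname points into in the later backtrace, which suggests the chunk header was overwritten by node-tree text:

```python
# Decode the bogus MemoryContext pointer from the gdb session above.
# On little-endian x86-64 the low-order byte is stored first, so emit
# the bytes in little-endian order to recover how they sat in memory.
addr = 0x7474617261763A20
as_memory = addr.to_bytes(8, "little")
print(repr(as_memory.decode("ascii")))
```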
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
> I don't have any reason to believe there's memory issue on the server, So I
> suppose this is just a "heads up" to early adopters until/in case it happens
> again and I can at least provide a stack trace.

I'm back; find stacktrace below.

> Today I see:
> < 2017-10-13 17:22:47.839 -04 >LOG: server process (PID 32127) was
> terminated by signal 11: Segmentation fault
> < 2017-10-13 17:22:47.839 -04 >DETAIL: Failed process was running:
> autovacuum: BRIN summarize public.gtt 747263

Is it a coincidence the server failed within 45m of yesterday's failure ?

postmaster[26500] general protection ip:84a177 sp:7ffd9b349b88 error:0 in postgres[40+692000]
< 2017-10-14 18:05:36.432 -04 >DETAIL: Failed process was running: autovacuum: BRIN summarize public.gtt 41087

> It looks like this table was being inserted into simultaneously by a python
> program using multiprocessing. It looks like each subprocess was INSERTing
> into several tables, each of which has one BRIN index on timestamp column.

I should add: These are insert-only child tables in a hierarchy (not PG10 partitions), being inserted into directly (not via trigger/rule).

Also notice the vacuum process was interrupted, same as yesterday (thank goodness for full logs). Our INSERT script is using python multiprocessing.pool() with "maxtasksperchild=1", which I think means we load one file and then exit the subprocess, and pool() creates a new subproc, which starts a new PG session and transaction. Which explains why autovacuum starts processing the table only to be immediately interrupted.
postgres=# SELECT * FROM postgres_log_2017_10_14_1800 WHERE pid=26500 ORDER BY log_time DESC LIMIT 9; log_time | 2017-10-14 18:05:34.132-04 pid| 26500 session_id | 59e289b4.6784 session_line | 2 session_start_time | 2017-10-14 18:03:32-04 error_severity | ERROR sql_state_code | 57014 message| canceling autovacuum task context| processing work entry for relation "gtt.public.cdrs_eric_ggsnpdprecord_2017_10_14_recordopeningtime_idx" ---+- log_time | 2017-10-14 18:05:32.925-04 pid| 26500 session_id | 59e289b4.6784 session_line | 1 session_start_time | 2017-10-14 18:03:32-04 error_severity | ERROR sql_state_code | 57014 message| canceling autovacuum task context| automatic analyze of table "gtt.public.cdrs_eric_egsnpdprecord_2017_10_14" gtt=# \dt+ *record_2017_10_14 public | cdrs_eric_egsnpdprecord_2017_10_14 | table | gtt | 1642 MB | public | cdrs_eric_ggsnpdprecord_2017_10_14 | table | gtt | 492 MB | gtt=# \di+ *_2017_10_14*_recordopeningtime_idx public | cdrs_eric_egsnpdprecord_2017_10_14_recordopeningtime_idx | index | gtt | cdrs_eric_egsnpdprecord_2017_10_14 | 72 kB | public | cdrs_eric_ggsnpdprecord_2017_10_14_recordopeningtime_idx | index | gtt | cdrs_eric_ggsnpdprecord_2017_10_14 | 48 kB | Due to a .."behavioral deficiency" in the loader for those tables, the crashed backend causes the loader to get stuck, so the tables should be untouched since the crash, should it be desirable to inspect them. #0 pfree (pointer=0x298c740) at mcxt.c:954 context = 0x7474617261763a20 #1 0x006a52e9 in perform_work_item (workitem=0x7f8ad1f94824) at autovacuum.c:2676 cur_datname = 0x298c740 "no 1 :vartype 1184 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 146} {CONST :consttype 1184 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull fal"... 
cur_nspname = 0x298c728 "s ({VAR :varno 1 :varattno 1 :vartype 1184 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 146} {CONST :consttype 1184 :consttypmod -1 :constcollid 0 :constlen 8 :constbyv"... cur_relname = 0x298cd68 "cdrs_eric_msc_sms_2017_10_14_startofcharge_idx" __func__ = "perform_work_item" #2 0x006a6fd9 in do_autovacuum () at autovacuum.c:2533 workitem = 0x7f8ad1f94824 classRel = 0x7f89c26d0e58 tuple = relScan = dbForm = table_oids = orphan_oids = 0x0 ctl = {num_partitions = 0, ssize = 0, dsize = 0, max_dsize = 0, ffactor = 0, keysize = 4, entrysize = 80, hash = 0, match = 0, keycopy = 0, alloc = 0, hcxt = 0x0, hctl = 0x0} table_toast_map = 0x29c8188 cell = 0x0 shared = 0x298ce18 dbentry = 0x298d0a0 bstrategy = 0x2a61c18 key = {sk_flags = 0, sk_attno = 16, sk_strategy = 3, sk_subtype = 0, sk_colla
[HACKERS] SIGSEGV in BRIN autosummarize
I upgraded one of our customers to PG10 Tuesday night, and Wednesday replaced a BTREE index with a BRIN index (WITH autosummarize). Today I see:

< 2017-10-13 17:22:47.839 -04 >LOG: server process (PID 32127) was terminated by signal 11: Segmentation fault
< 2017-10-13 17:22:47.839 -04 >DETAIL: Failed process was running: autovacuum: BRIN summarize public.gtt 747263
postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in postgres[40+692000]

[pryzbyj@database ~]$ rpm -qa postgresql10
postgresql10-10.0-1PGDG.rhel6.x86_64

Oct 13 17:22:45 database kernel: postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in postgres[40+692000]
Oct 13 17:22:47 database abrtd: Directory 'ccpp-2017-10-13-17:22:47-32127' creation detected
Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 (15040512 bytes)

..unfortunately:

Oct 13 17:22:47 database abrtd: Package 'postgresql10-server' isn't signed with proper key
Oct 13 17:22:47 database abrtd: 'post-create' on '/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127' exited with 1
Oct 13 17:22:47 database abrtd: DELETING PROBLEM DIRECTORY '/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127'

postgres=# SELECT * FROM bak_postgres_log_2017_10_13_1700 WHERE pid=32127 ORDER BY log_time DESC LIMIT 9;
-[ RECORD 1 ]------+-
log_time           | 2017-10-13 17:22:45.56-04
pid                | 32127
session_id         | 59e12e67.7d7f
session_line       | 2
command_tag        |
session_start_time | 2017-10-13 17:21:43-04
error_severity     | ERROR
sql_state_code     | 57014
message            | canceling autovacuum task
context            | processing work entry for relation "gtt.public.cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx"
-[ RECORD 2 ]------+-
log_time           | 2017-10-13 17:22:44.557-04
pid                | 32127
session_id         | 59e12e67.7d7f
session_line       | 1
session_start_time | 2017-10-13 17:21:43-04
error_severity     | ERROR
sql_state_code     | 57014
message            | canceling autovacuum task
context            | automatic analyze of table
"gtt.public.cdrs_huawei_sgsnpdprecord_2017_10_13"
Time: 375.552 ms

It looks like this table was being inserted into simultaneously by a python program using multiprocessing. It looks like each subprocess was INSERTing into several tables, each of which has one BRIN index on a timestamp column.

gtt=# \dt+ cdrs_eric_egsnpdprecord_2017_10_13
public | cdrs_eric_egsnpdprecord_2017_10_13 | table | gtt | 5841 MB |

gtt=# \di+ cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx
public | cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx | index | gtt | cdrs_eric_egsnpdprecord_2017_10_13 | 136 kB |

I don't have any reason to believe there's a memory issue on the server, so I suppose this is just a "heads up" to early adopters until/in case it happens again and I can at least provide a stack trace.

Justin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] memory fields from getrusage()
On Sat, Sep 02, 2017 at 02:00:44PM +1200, Thomas Munro wrote: > On Sat, Sep 2, 2017 at 7:46 AM, Peter Eisentraut > <peter.eisentr...@2ndquadrant.com> wrote: > > On 6/15/17 10:58, Justin Pryzby wrote: > >> On Thu, Jun 15, 2017 at 10:29:21AM -0400, Robert Haas wrote: > >>> On Wed, Jun 14, 2017 at 6:28 PM, Justin Pryzby <pry...@telsasoft.com> > >>> wrote: > >>>> On Tue, Jun 13, 2017 at 12:16:00PM -0400, Robert Haas wrote: > >>>>> It might be worth adding platform-specific code for common platforms. > >>>> > >>>> All I care (which linux happily/happens to support) is maxrss; I was > >>>> probably > >>>> originally interested in this while digging into an issue with hash agg. > >>> > >>> I don't think it needs to go in a separate file. I'd just patch > >>> ShowUsage(). > > > > I have committed a patch that shows maxrss, with /1024 adjustment for > > macOS. That should cover all platforms that I could find.(*) > > Apparently ru_maxrss is in *pages* on Solaris-derived systems: > > https://illumos.org/man/3c/getrusage ..but note that that: "The ru_maxrss, ru_ixrss, ru_idrss, and ru_isrss members of the rusage structure are set to 0 in this implementation." Same here: https://docs.oracle.com/cd/E23823_01/html/816-5168/getrusage-3c.html ..and earlier solaris docs don't seem to mention getrusage at all (?) posix docs say: http://pubs.opengroup.org/onlinepubs/009695399/functions/getrusage.html |CHANGE HISTORY |First released in Issue 4, Version 2. ..which I gather is SUSv1 c. 1995 Curiously, GNU time reported maxrss too high by a factor of 4 (and that was present in early centos6.X) https://bugzilla.redhat.com/show_bug.cgi?id=702826 Justin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible issue with expanded object infrastructure on Postgres 9.6.1
On Tue, Aug 8, 2017 at 8:17 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Peter Geoghegan <p...@bowt.ie> writes: > > On Thu, Jan 19, 2017 at 5:45 PM, Peter Geoghegan <p...@heroku.com> wrote: > >> A customer is on 9.6.1, and complains of a segfault observed at least > >> 3 times. > > ... > > For the sake of the archives: this now looks very much like the issue > > that Tom just fixed with commit > > 9bf4068cc321a4d44ac54089ab651a49d89bb567. > > Yeah, particularly seeing that $customer noted that some of the > columns involved were UUIDs: > > https://www.postgresql.org/message-id/CAOxz3fqK9Y0GntL8MDoeZ > jy2ot_6lx1yvhay6bd1vykup-i...@mail.gmail.com > > Good to have gotten to the bottom of that one. Too bad it just > missed the train for 9.6.4. > > regards, tom lane > $customer, here. I just want to thank everyone involved for getting to the bottom of this and for your support. Even if it missed the 9.6.4 release, I'm very grateful for your help. We haven't had much of an issue since disabling parallel workers so nothing is harmed by waiting a little longer. Thanks, Justin
[HACKERS] PG10b2: column-list UPDATE syntax fails with single column
In testing our application with PG10beta2, I discovered that our use of UPDATE broke, apparently by this commit |commit 906bfcad7ba7cb3863fe0e2a7810be8e3cd84fbd |Author: Tom Lane <t...@sss.pgh.pa.us> |Date: Tue Nov 22 15:19:57 2016 -0500 | |Improve handling of "UPDATE ... SET (column_list) = row_constructor". ERROR ON QUERY: UPDATE eric_enodeb_cell_metrics SET (pmActiveDrbDlSum)=(%s) WHERE sect_id=%s AND subnetwork_id=%s AND start_time=%s AND site_id=%s AND interval_seconds=%s AND nedn=%s AND mecontext=%s AND EUtranCellxDD=%s AND EUtranCell=%s AND subnetwork=%s AND device_id=%s -- Query Error: ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression This still works for multiple columns but in pg10b2 no longer works for single column update with "column_list" syntax. ts=# begin; UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct,pmlicconnecteduserstimecong)=(0,0) ; BEGIN UPDATE 3 ts=# begin; UPDATE eric_enodeb_201707 SET (pmhdelayvarbest50pct)=(0) ; BEGIN ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression It may be that our use was wrong (?) or unintuitive (I'm in the middle of changing it), but wondered if it was intentional or otherwise if the release notes should mention that old syntax is no longer accepted. Either way - thanks in advance. Justin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] tupconvert.c API change in v10 release notes
FYI, I happened across this commit comment: 3f902354b08ac788600f0ae54fcbfc1d4e3ea765 | So, let's accept the removal of the guarantee about | the output tuple's rowtype marking, recognizing that this is a API change | that could conceivably break third-party callers of tupconvert.c. (So, | let's remember to mention it in the v10 release notes.) ..but couldn't see that the commit or change is so referenced. Justin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] memory fields from getrusage()
On Thu, Jun 15, 2017 at 10:29:21AM -0400, Robert Haas wrote:
> On Wed, Jun 14, 2017 at 6:28 PM, Justin Pryzby <pry...@telsasoft.com> wrote:
> > On Tue, Jun 13, 2017 at 12:16:00PM -0400, Robert Haas wrote:
> >> It might be worth adding platform-specific code for common platforms.
> >
> > All I care (which linux happily/happens to support) is maxrss; I was probably
> > originally interested in this while digging into an issue with hash agg.
>
> I don't think it needs to go in a separate file. I'd just patch ShowUsage().

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f99dd0a..7f57a84 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4467,6 +4467,21 @@ ShowUsage(const char *title)
 			 r.ru_nvcsw - Save_r.ru_nvcsw, r.ru_nivcsw - Save_r.ru_nivcsw,
 			 r.ru_nvcsw, r.ru_nivcsw);
+
+#if defined(__linux__)
+	appendStringInfo(&str,
+			 "!\t%ld max resident (kB)\n",
+			 r.ru_maxrss);
+#elif defined(BSD)
+	appendStringInfo(&str,
+			 "!\t%ld max resident, %ld shared, %ld unshared data, %ld unshared stack (kB)\n",
+			 r.ru_maxrss, r.ru_ixrss, r.ru_idrss, r.ru_isrss);
+#elif defined(__darwin__)
+	appendStringInfo(&str,
+			 "!\t%ld max resident, %ld shared, %ld unshared data, %ld unshared stack (kB)\n",
+			 r.ru_maxrss/1024, r.ru_ixrss/1024, r.ru_idrss/1024, r.ru_isrss/1024);
+#endif   /* __linux__ */
+
 #endif   /* HAVE_GETRUSAGE */

 	/* remove trailing newline */

Comments ? Testing or suggestions on !linux would be useful.
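As a quick way to see what a given platform actually fills in, Python's resource module wraps the same getrusage() call, so the fields the patch prints can be eyeballed without rebuilding the server (assumption: units follow the platform, kilobytes for ru_maxrss on Linux but bytes on macOS, and the shared/unshared fields are typically 0 on Linux):

```python
# Inspect the struct rusage memory fields discussed above via Python's
# thin wrapper around getrusage().  On Linux ru_maxrss is in kB and
# ru_ixrss/ru_idrss/ru_isrss are unmaintained (reported as 0).
import resource

ru = resource.getrusage(resource.RUSAGE_SELF)
print("max resident:", ru.ru_maxrss)
print("shared:", ru.ru_ixrss,
      " unshared data:", ru.ru_idrss,
      " unshared stack:", ru.ru_isrss)
```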
Justin

Add memory fields (sometimes) available from getrusage()

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] memory fields from getrusage()
On Tue, Jun 13, 2017 at 12:16:00PM -0400, Robert Haas wrote:
> It might be worth adding platform-specific code for common platforms.

All I care about (which linux happily/happens to support) is maxrss; I was probably originally interested in this while digging into an issue with hash agg. I think it's fine to show zeros for unsupported fields; that's what getrusage(2) and time(1) do after all.

pryzbyj@pryzbyj:~$ sh -c 'command time -v find /dev' 2>&1 >/dev/null |grep -Fw 0
User time (seconds): 0.00
System time (seconds): 0.00
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0

> it would be a good idea to install code specific to Linux that
> displays all and only those values that are meaningful on Linux, and
> (less importantly) similarly for macOS. Linux is such a common
> platform that reporting bogus zero values and omitting other fields
> that are actually meaningful does not seem like a very good plan.

That has the issue that it varies not just by OS but also by OS version. For example PG already shows context switches and FS in/out puts, but they're nonzero only since linux 2.6 (yes, 2.4 is ancient and unsupported but still).

ru_nvcsw (since Linux 2.6)
ru_inblock (since Linux 2.6.22)

..and other fields are "currently unused", but maybe supported in the past or future(?)

ru_ixrss (unmaintained) This field is currently unused on Linux.

Are you thinking of something like this, maybe hidden away in a separate file somewhere?
#if defined(__linux__) || defined(BSD)
	appendStringInfo(&str,
			 "!\t%ld max resident, %ld shared, %ld unshared data, %ld unshared stack (kB)\n",
			 r.ru_maxrss, r.ru_ixrss, r.ru_idrss, r.ru_isrss);
#elif defined(__darwin__)
	appendStringInfo(&str,
			 "!\t%ld max resident, %ld shared, %ld unshared data, %ld unshared stack (kB)\n",
			 r.ru_maxrss/1024, r.ru_ixrss/1024, r.ru_idrss/1024, r.ru_isrss/1024);
#endif   /* __linux__ */

Justin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] memory fields from getrusage()
I'm interested in exposing the output of the remaining (memory) fields from getrusage().

postgres=# SET log_parser_stats='on';
postgres=# SELECT c.oid::regclass, usagecount FROM pg_buffercache b JOIN pg_class c USING (relfilenode) WHERE usagecount=1 ;
LOG: PARSER STATISTICS
DETAIL: ! system usage stats:
! 0.000197 elapsed 0.000119 user 0.79 system sec
! [0.011572 user 0.007714 sys total]
! 0/0 [0/264] filesystem blocks in/out
! 0/39 [0/989] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [2/2] voluntary/involuntary context switches
! 3920 MAX RESIDENT, 0 SHARED, 0 UNSHARED DATA, 0 UNSHARED STACK (Kb)

Before that can work for some platforms, it looks like rusagestub.h needs to have any desired fields added, and ./src/port/getrusage.c should memset(0) in the non-windows case before adding any reliance on the rest of the structure.

This comment from ~1996 says: https://doxygen.postgresql.org/postgres_8c_source.html

4421 * the only stats we don't show here are for memory usage -- i can't
4422 * figure out how to interpret the relevant fields in the rusage struct,
4423 * and they change names across o/s platforms, anyway. if you can figure
4424 * out what the entries mean, you can somehow extract resident set size,
4425 * shared text size, and unshared data and stack sizes.

..is that really (still) the case for supported platforms? I'm hoping that in 2017 one can just call getrusage() if autoconf says it's okay ??

Justin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Possible issue with expanded object infrastructure on Postgres 9.6.1
> > It would help to know the data types of the columns involved in this > query; but just eyeballing it, it doesn't look like it involves any > array operations, so it's pretty hard to believe that the expanded-object > code could have gotten invoked intentionally. (The mere presence of > an array column somewhere in the vicinity would not do that; you'd > need to invoke an array-ish operation, or at least pass the array into > a plpgsql function.) > If I had to bet on the basis of this much info, I would bet that the > parallel-query infrastructure is dropping the ball somewhere and > transmitting a corrupted datum that accidentally looks like it is > an expanded-object reference. > If $customer wants a quick fix, I'd suggest seeing whether disabling > parallel query makes the problem go away. That might be a good first > step anyway, just to narrow down where the problem lies. > regards, tom lane Hi Tom, My name is Justin, and I am $customer as it were. As Peter explained, we haven't seen the segfaults anymore since disabling parallel queries. This works as a quick fix and is much appreciated! If you would still like to get to the bottom of this, I am willing to help out with more information as needed. My knowledge of PG internals is extremely limited so I don't know how much help I can be, but we'd like to see this resolved beyond the quick fix, or at least understand why it happened. album_photo_assignments.id, album_photo_assignments.album_id, album_photo_assignments.photo_id and albums.id are all UUID columns. albums.deleted_at is a timestamp. Thanks so much for your time, Justin Workman
[HACKERS] smallint out of range EXECUTEing prepared statement
Is this expected behavior ? ts=# SELECT * FROM t WHERE site_id=32768 LIMIT 1; (0 rows) ts=# PREPARE x AS SELECT * FROM t WHERE site_id=$1 LIMIT 1; PREPARE ts=# EXECUTE x(32768); ERROR: smallint out of range ts=# PREPARE y AS SELECT * FROM t WHERE site_id::int=$1 LIMIT 1; PREPARE ts=# EXECUTE y(32768); (0 rows) Note, we also sometimes get small/int out of range when SELECTing from a view, and we end up as a workaround putting a ::big/int cast into the view or multiplying by 1. Thanks, Justin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
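My guess at what is happening here (hedged — the parameter-typing detail is inferred from the error, not from the source): in the PREPARE, $1 is resolved to the column's type, smallint, so EXECUTE x(32768) must first coerce 32768 to int16, which overflows before the comparison ever runs; the unprepared literal instead promotes the comparison to integer, which is also why the site_id::int workaround avoids the error. The int16 boundary itself is easy to demonstrate outside SQL:

```python
# The smallint (int16) range boundary behind the error above:
# 32767 is the largest value a signed 16-bit integer can hold, so
# packing 32768 fails, just as casting it to smallint does.
import struct

assert struct.pack(">h", 32767) == b"\x7f\xff"   # fits
try:
    struct.pack(">h", 32768)                     # out of range
    overflowed = False
except struct.error:
    overflowed = True
print("32768 overflows int16:", overflowed)
```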
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
On Tue, Jan 03, 2017 at 03:35:34PM -0500, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > On Tue, Jan 03, 2017 at 03:18:15PM -0500, Tom Lane wrote:
> >> I'm wondering if this represents some sort of out-of-sync condition
> >> between the table and its child tables. We can't actually tell from
> >> this trace which table is being processed. Could you try, from this
> >> breakpoint,
> >>
> >> f 3
> >> p oldrel->rd_rel->relname
>
> > (gdb) p oldrel->rd_rel->relname
> > $1 = {data = "eric_umts_rnc_utrancell_201701", '\000' }
>
> Right. So I bet that if you check the attnum of pmsumpacketlatency_000 in
> eric_umts_rnc_utrancell_metrics, you'll find it's different from that in
> eric_umts_rnc_utrancell_201701, and that the attribute having that attnum
> in eric_umts_rnc_utrancell_201701 has type smallint not int.

I think that's consistent with your understanding:

ts=# SELECT attrelid::regclass, attname, attnum, atttypid FROM pg_attribute
     WHERE attrelid::regclass::text~'eric_umts_rnc_utrancell_(metrics|201701)$'
     AND (attname='pmsumpacketlatency_000' OR attnum IN (367,424) )
     ORDER BY 1,2;
eric_umts_rnc_utrancell_metrics | pmsamplespshsadchrabestablish | 367 | 21
eric_umts_rnc_utrancell_metrics | pmsumpacketlatency_000        | 424 | 23
eric_umts_rnc_utrancell_201701  | pmsumpacketlatency_000        | 367 | 23
eric_umts_rnc_utrancell_201701  | pmulupswitchsuccessmedium     | 424 | 21

Justin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
On Tue, Jan 03, 2017 at 03:18:15PM -0500, Tom Lane wrote: > Justin Pryzby <pry...@telsasoft.com> writes: > > (gdb) bt > > #3 0x0059d5ce in ATRewriteTable (tab=, > > OIDNewHeap=, lockmode=) at > > tablecmds.c:4152 > > I'm wondering if this represents some sort of out-of-sync condition > between the table and its child tables. We can't actually tell from > this trace which table is being processed. Could you try, from this > breakpoint, > > f 3 > p oldrel->rd_rel->relname (gdb) p oldrel->rd_rel->relname $1 = {data = "eric_umts_rnc_utrancell_201701", '\000' } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
On Tue, Jan 03, 2017 at 02:50:21PM -0500, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > On Tue, Jan 03, 2017 at 02:32:36PM -0500, Tom Lane wrote:
> >> 2. Even better would be a stack trace for the call to errfinish,
> >> https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
>
> Thanks, but we need the whole call stack, or at least the first dozen or
> so levels.  "bt" in gdb would do.

#0  errfinish (dummy=0) at elog.c:414
#1  0x006dd39f in exec_simple_query (query_string=0x1fc5fb0 "ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;") at postgres.c:932
#2  0x006dec8c in PostgresMain (argc=, argv=, dbname=0x1f65d98 "ts", username=) at postgres.c:4070
#3  0x0067f2c5 in BackendRun (argc=, argv=) at postmaster.c:4270
#4  BackendStartup (argc=, argv=) at postmaster.c:3944
#5  ServerLoop (argc=, argv=) at postmaster.c:1701
#6  PostmasterMain (argc=, argv=) at postmaster.c:1309
#7  0x00607658 in main (argc=3, argv=0x1f3a4f0) at main.c:228

(gdb) bt
#0  errfinish (dummy=0) at elog.c:414
#1  0x005d0e30 in ExecEvalScalarVar (exprstate=, econtext=, isNull=, isDone=) at execQual.c:655
#2  0x005d0c3c in ExecMakeFunctionResultNoSets (fcache=0x21f18a0, econtext=0x2199e80, isNull=0x21e90ee "", isDone=) at execQual.c:2015
#3  0x0059d5ce in ATRewriteTable (tab=, OIDNewHeap=, lockmode=) at tablecmds.c:4152
#4  0x005a92fc in ATRewriteTables (parsetree=0x1f63b20, rel=, cmds=, recurse=, lockmode=) at tablecmds.c:3858
#5  ATController (parsetree=0x1f63b20, rel=, cmds=, recurse=, lockmode=) at tablecmds.c:3104
#6  0x006e25e6 in ProcessUtilitySlow (parsetree=0x1fc6f78, queryString=0x1fc5fb0 "ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=, completionTag=0x7fff8b9d3a90 "") at utility.c:1085
#7  0x006e2a70 in standard_ProcessUtility (parsetree=0x1fc6f78, queryString=0x1fc5fb0 "ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x1fc72b8, completionTag=0x7fff8b9d3a90 "") at utility.c:907
#8  0x006df2cc in PortalRunUtility (portal=0x1fff2e0, utilityStmt=0x1fc6f78, isTopLevel=1 '\001', setHoldSnapshot=, dest=0x1fc72b8, completionTag=0x7fff8b9d3a90 "") at pquery.c:1193
#9  0x006e01cb in PortalRunMulti (portal=0x1fff2e0, isTopLevel=1 '\001', setHoldSnapshot=0 '\000', dest=0x1fc72b8, altdest=0x1fc72b8, completionTag=0x7fff8b9d3a90 "") at pquery.c:1349
#10 0x006e0934 in PortalRun (portal=0x1fff2e0, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1fc72b8, altdest=0x1fc72b8, completionTag=0x7fff8b9d3a90 "") at pquery.c:815
#11 0x006dd5b1 in exec_simple_query (query_string=0x1fc5fb0 "ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;") at postgres.c:1094
#12 0x006dec8c in PostgresMain (argc=, argv=, dbname=0x1f65d98 "ts", username=) at postgres.c:4070
#13 0x0067f2c5 in BackendRun (argc=, argv=) at postmaster.c:4270
#14 BackendStartup (argc=, argv=) at postmaster.c:3944
#15 ServerLoop (argc=, argv=) at postmaster.c:1701
#16 PostmasterMain (argc=, argv=) at postmaster.c:1309
#17 0x00607658 in main (argc=3, argv=0x1f3a4f0) at main.c:228

> > I'll send the rest of \d if you really want but:
>
> Well, we don't know what we're looking for, so assuming that there's
> nothing of interest there is probably bad.

Attached

Justin

alter-wrong-type-dplus.gz
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
On Tue, Jan 03, 2017 at 02:32:36PM -0500, Tom Lane wrote:
> 3. It's pretty hard to see how you'd reach any of these places for an
> ALTER COLUMN TYPE on a simple table.  Has the table got rules, triggers,
> default values?  Could we see "\d+" output for it?

I really meant to do \d+..

              Table "public.eric_umts_rnc_utrancell_metrics"
       Column       |           Type           | Modifiers | Storage  | Stats target | Description
--------------------+--------------------------+-----------+----------+--------------+-------------
 sect_id            | integer                  | not null  | plain    | 400          |
 start_time         | timestamp with time zone | not null  | plain    | 400          |
 site_id            | integer                  | not null  | plain    | 400          |
 interval_seconds   | smallint                 | not null  | plain    | 200          |
 utrancell          | text                     | not null  | extended | 200          |
 nedn               | text                     | not null  | extended | 200          |
 rnc_id             | integer                  | not null  | plain    | 400          |
 device_id          | integer                  | not null  | plain    | 200          |
 pmcelldowntimeauto | smallint                 |           | plain    | 10           |
 pmcelldowntimeman  | smallint                 |           | plain    | 10           |
[...]

Justin
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
On Tue, Jan 03, 2017 at 02:32:36PM -0500, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > I can cause the error at will on the existing table,
>
> That's good news, at least.
>
> 1. Please trigger it with "\set VERBOSITY verbose" enabled, so we can see
> the exact source location --- there are a couple of instances of that
> text.

ts=# begin; drop view umts_eric_ch_switch_view, eric_umts_rnc_utrancell_view, umts_eric_cell_integrity_view; ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;
BEGIN
DROP VIEW
ERROR:  42804: attribute 424 has wrong type
DETAIL:  Table has type smallint, but query expects integer.
LOCATION:  ExecEvalScalarVar, execQual.c:660

> 2. Even better would be a stack trace for the call to errfinish,
> https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

#1  0x006dd39f in exec_simple_query (query_string=0x1fc5fb0 "begin;") at postgres.c:932
        dest = DestRemote
        oldcontext = 0x1f3b100
        parsetree_list = 0x1fc69f0
        save_log_statement_stats = 0 '\000'
        was_logged = 0 '\000'
        msec_str = "\360:\235\213\377\177\000\000`<\235\213\377\177\000\000\260_\374\001", '\000'
        __func__ = "exec_simple_query"

and then

#1  0x006dd39f in exec_simple_query (query_string=0x1fc5fb0 "ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;") at postgres.c:932
        dest = DestRemote
        oldcontext = 0x1f3b100
        parsetree_list = 0x1fc6fc8
        save_log_statement_stats = 0 '\000'
        was_logged = 0 '\000'
        msec_str = "\360:\235\213\377\177\000\000`<\235\213\377\177\000\000\260_\374\001", '\000'
        __func__ = "exec_simple_query"

then

#1  0x005d0e30 in ExecEvalScalarVar (exprstate=, econtext=, isNull=, isDone=) at execQual.c:655
        attnum = 424
        __func__ = "ExecEvalScalarVar"

> 3. It's pretty hard to see how you'd reach any of these places for an
> ALTER COLUMN TYPE on a simple table.  Has the table got rules, triggers,
> default values?  Could we see "\d+" output for it?

Triggers and defaults, yes.

 sect_id                  | integer                  | not null
 start_time               | timestamp with time zone | not null
 site_id                  | integer                  | not null
 interval_seconds         | smallint                 | not null
 utrancell                | text                     | not null
 nedn                     | text                     | not null
 rnc_id                   | integer                  | not null
 device_id                | integer                  | not null
 pmcelldowntimeauto       | smallint                 |
 pmcelldowntimeman        | smallint                 |
 pmchswitchattemptfachura | smallint                 |
 pmchswitchattempturafach | smallint                 |
...

Triggers:
    eric_umts_rnc_utrancell_insert_trigger BEFORE INSERT ON eric_umts_rnc_utrancell_metrics FOR EACH ROW EXECUTE PROCEDURE eric_umts_rnc_utrancell_insert_function()
Number of child tables: 3 (Use \d+ to list them.)

I'll send the rest of \d if you really want but:

ts=# SELECT COUNT(1) FROM pg_attribute WHERE attrelid='eric_umts_rnc_utrancell_metrics'::regclass;
count | 1116

Justin
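Given the 1116-attribute parent above and the per-child ALTER history described elsewhere in this thread, one way to narrow down which attributes disagree would be to join pg_attribute across pg_inherits. This is a hypothetical diagnostic sketch, not something actually run in the thread; only the table name is taken from the discussion:

```sql
-- Hypothetical diagnostic: list columns whose declared type differs
-- between the parent table and any still-inheriting child.
SELECT p.attnum,
       p.attname,
       p.atttypid::regtype  AS parent_type,
       i.inhrelid::regclass AS child,
       c.atttypid::regtype  AS child_type
FROM pg_attribute p
JOIN pg_inherits i ON i.inhparent = p.attrelid
JOIN pg_attribute c ON c.attrelid = i.inhrelid AND c.attname = p.attname
WHERE p.attrelid = 'eric_umts_rnc_utrancell_metrics'::regclass
  AND p.attnum > 0
  AND NOT p.attisdropped
  AND c.atttypid <> p.atttypid;
```

A nonempty result would show a parent/child type disagreement of the kind the error message suggests; an empty result would point elsewhere (e.g. stale plans or catalog corruption).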
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
On Tue, Jan 03, 2017 at 01:40:50PM -0500, Robert Haas wrote:
> On Tue, Jan 3, 2017 at 11:59 AM, Justin Pryzby <pry...@telsasoft.com> wrote:
> > On Tue, Jan 03, 2017 at 11:45:33AM -0500, Robert Haas wrote:
> >> > ts=# begin; drop view umts_eric_ch_switch_view, eric_umts_rnc_utrancell_view, umts_eric_cell_integrity_view; ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;
> >> > BEGIN
> >> > DROP VIEW
> >> > ERROR:  attribute 424 has wrong type
> >> > DETAIL:  Table has type smallint, but query expects integer.
> >> > ts=#
> >
> > I can cause the error at will on the existing table, but I wouldn't know
> > how to reproduce the problem on a new table/database.  I'm guessing it has
> > something
>
> Just for kicks, could you try running pg_catcheck on the affected system?
>
> https://github.com/EnterpriseDB/pg_catcheck

Neat, I hadn't heard of it before ;)

The version in PGDG has the "amkeytype" issue, so I compiled.  I got this:

[pryzbyj@database pg_catcheck]$ ./pg_catcheck ts
notice: pg_shdepend row has invalid classid "2613": not a system catalog OID
row identity: dbid="16402" classid="2613" objid="1086583699" objsubid="0" refclassid="1260" refobjid="16384" deptype="o"
notice: pg_shdepend row has invalid classid "2613": not a system catalog OID
row identity: dbid="16402" classid="2613" objid="1086583701" objsubid="0" refclassid="1260" refobjid="16384" deptype="o"
[...]
notice: pg_depend row has invalid objid "1124153791": no matching entry in pg_class
row identity: classid="1259" objid="1124153791" objsubid="0" refclassid="1259" refobjid="1064197368" refobjsubid="1" deptype="a"
progress: done (294 inconsistencies, 0 warnings, 0 errors)

..those are the only two problem oids:

[pryzbyj@database pg_catcheck]$ time ./pg_catcheck ts 2>&1 |grep -Evw '2613|1259'
progress: done (264 inconsistencies, 0 warnings, 0 errors)
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
On Tue, Jan 03, 2017 at 11:45:33AM -0500, Robert Haas wrote:
> > ts=# begin; drop view umts_eric_ch_switch_view, eric_umts_rnc_utrancell_view, umts_eric_cell_integrity_view; ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;
> > BEGIN
> > DROP VIEW
> > ERROR:  attribute 424 has wrong type
> > DETAIL:  Table has type smallint, but query expects integer.
> > ts=#
> >
> > ts=# begin; drop view umts_eric_ch_switch_view, eric_umts_rnc_utrancell_view, umts_eric_cell_integrity_view; ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT;
> > BEGIN
> > DROP VIEW
> > ALTER TABLE
> > ts=#
> >
> > Is it useful to send something from pg_attribute, or other clues ??
>
> So, are these errors reproducible?  Like, if you create a brand new

I can cause the error at will on the existing table, but I wouldn't know how
to reproduce the problem on a new table/database.  I'm guessing it has
something to do with dropped columns or historic alters (which I mentioned are
typically done separately on child tables vs their parent).

Since it's happened 3 times now on this table, but not others on this
database, I would guess it's a "data issue", possibly related to pg_upgrades.
IOW it may be impossible to get into this state from a fresh initdb on a
current version.

I considered that perhaps it only affected our oldest tables, and would stop
happening once they were dropped, but note this ALTER is only of a parent and
its 3 most recent children.  So only the empty parent could be described as
"old".

Justin
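For reference, the maintenance pattern described in this thread (historic partitions detached and altered individually, separately from the parent) can be sketched roughly as follows. These statements are illustrative only; partition and column names are borrowed from the examples quoted earlier:

```sql
-- Illustrative sketch of the partition-maintenance pattern described:
-- detach a historic partition, then ALTER it on its own, outside any
-- transaction, to avoid the disk usage of rewriting parent+children at once.
ALTER TABLE eric_umts_rnc_utrancell_201508
    NO INHERIT eric_umts_rnc_utrancell_metrics;

ALTER TABLE eric_umts_rnc_utrancell_201508
    ALTER COLUMN pmsumpacketlatency_000 TYPE bigint;

-- The parent and its few most recent (still-inheriting) children are then
-- altered together in one recursive pass:
ALTER TABLE eric_umts_rnc_utrancell_metrics
    ALTER COLUMN pmsumpacketlatency_000 TYPE bigint
    USING pmsumpacketlatency_000::bigint;
```

The relevant detail for the bug report is that, under this pattern, parent and children accumulate different dropped-column and alteration histories, so their attnum layouts can drift apart.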
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
On Wed, Oct 12, 2016 at 10:25:05AM -0500, Justin Pryzby wrote:
> > I don't have a clear recollection how I solved this in July; possibly by
> > restoring the (historic, partition) table from backup.
> >
> > Last week and again just now (both under 9.6), a colleague found that he
> > was able to avoid the error by ALTER TYPE without USING.
> >
> > Note that we ALTER TABLE .. NO INHERIT the partitions for all but the most
> > recent 2 months before ALTERing them (or the parent).  The "ALTER NO INHERIT"
> > and the ALTER TYPE of historic partitions are done outside of a transaction
> > in order to avoid large additional disk use otherwise incurred when ALTERing
> > a parent with many or large children (the sum of the size of the children).

Here are DETAILs for a 2nd such error which has shown up today:

(EricssonUtranXmlParser): Failed to alter table eric_umts_rnc_utrancell_metrics: ERROR:  attribute 424 has wrong type
DETAIL:  Table has type smallint, but query expects integer.
(EricssonUtranXmlParser): Failed to alter table eric_umts_rnc_utrancell_metrics: ERROR:  attribute 361 has wrong type
DETAIL:  Table has type integer, but query expects smallint.

Also, note both alters really do work without "USING":

ts=# begin; drop view umts_eric_ch_switch_view, eric_umts_rnc_utrancell_view, umts_eric_cell_integrity_view; ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT USING PMSUMPACKETLATENCY_000::BIGINT;
BEGIN
DROP VIEW
ERROR:  attribute 424 has wrong type
DETAIL:  Table has type smallint, but query expects integer.
ts=#

ts=# begin; drop view umts_eric_ch_switch_view, eric_umts_rnc_utrancell_view, umts_eric_cell_integrity_view; ALTER TABLE eric_umts_rnc_utrancell_metrics ALTER COLUMN PMSUMPACKETLATENCY_000 TYPE BIGINT;
BEGIN
DROP VIEW
ALTER TABLE
ts=#

Is it useful to send something from pg_attribute, or other clues ??
Re: [HACKERS] ALTER TABLE .. ALTER COLUMN .. ERROR: attribute .. has wrong type
I originally sent this to pgsql-general some months ago, but it appears it was
never delivered (perhaps I wasn't properly subscribed?).

Failed to alter table eric_umts_rnc_utrancell_metrics: ERROR:  attribute 361 has wrong type
DETAIL:  Table has type integer, but query expects smallint.

We've seen this at least 4 times now, on PG 9.5 and 9.6; 3 of those times are
for the above table.  Any ideas what I can do to either reproduce it or
otherwise avoid it?

On Wed, Oct 12, 2016 at 10:25:05AM -0500, Justin Pryzby wrote:
> We've seen this happen at least once on a 9.5 server, and twice on (the same)
> server since its upgrade last week to 9.6:
>
> ALTER TABLE t ALTER column TYPE says: "ERROR: attribute 81 has wrong type".
>
> Just now under 9.6
> DETAIL:  Table has type integer, but query expects smallint
> ...
> ts=# SELECT attnum, atttypid, attrelid::regclass FROM pg_attribute WHERE attname='pmnopagingattemptutranrejected' ORDER BY 1 DESC,2,3;
>  attnum | atttypid |            attrelid
> --------+----------+---------------------------------
>     193 |       21 | eric_umts_rnc_utrancell_metrics
>     193 |       21 | eric_umts_rnc_utrancell_201508
>     179 |       21 | eric_umts_rnc_utrancell_201509
>     179 |       21 | eric_umts_rnc_utrancell_201510
>     179 |       21 | eric_umts_rnc_utrancell_201511
>     179 |       21 | eric_umts_rnc_utrancell_201602
> [...]
>     179 |       21 | eric_umts_rnc_utrancell_201610
>     179 |       21 | eric_umts_rnc_utrancell_201611
> (17 rows)
>
> Last week (same server, same table, still 9.6):
> DETAIL:  Table has type real, but query expects smallint
>
> In July (different server) under 9.5
> DETAIL:  Table has type real, but query expects smallint
> ...
> SELECT atttypid, attnum, attrelid::regclass FROM pg_attribute WHERE attname='c_84150886'
>  atttypid | attnum |          attrelid
> ----------+--------+-----------------------------
>        21 |    200 | huawei_msc_trunkgrp_201605
>        21 |    200 | huawei_msc_trunkgrp_201604
>        21 |    200 | huawei_msc_trunkgrp_201603
>        21 |    200 | huawei_msc_trunkgrp_201602
>        21 |    200 | huawei_msc_trunkgrp_201512
>        21 |    200 | huawei_msc_trunkgrp_201511
>        21 |    200 | huawei_msc_trunkgrp_201510
>        21 |    200 | huawei_msc_trunkgrp_201508
>        21 |    200 | huawei_msc_trunkgrp_201507
>        21 |    200 | huawei_msc_trunkgrp_201506
>        21 |    200 | huawei_msc_trunkgrp_201505
>        21 |    200 | huawei_msc_trunkgrp_201607
>        21 |    200 | huawei_msc_trunkgrp_201606
>        21 |    200 | huawei_msc_trunkgrp_201608
>        21 |    201 | huawei_msc_trunkgrp_metrics
>        21 |    200 | huawei_msc_trunkgrp_201509
>        21 |    200 | huawei_msc_trunkgrp_201601
> (17 rows)
>
> I don't have a clear recollection how I solved this in July; possibly by
> restoring the (historic, partition) table from backup.
>
> Last week and again just now (both under 9.6), a colleague found that he was
> able to avoid the error by ALTER TYPE without USING.
>
> Note that we ALTER TABLE .. NO INHERIT the partitions for all but the most
> recent 2 months before ALTERing them (or the parent).  The "ALTER NO INHERIT"
> and the ALTER TYPE of historic partitions are done outside of a transaction in
> order to avoid large additional disk use otherwise incurred when ALTERing a
> parent with many or large children (the sum of the size of the children).
Re: [HACKERS] Odd behavior with domains
I was the one that reported that on twitter.  I have a more detailed message
on the general list that I sent before subscribing, which probably needs to be
moderated (or if it went to /dev/null let me know).

On Thu, Jun 23, 2016 at 11:01 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> "Joshua D. Drake" <j...@commandprompt.com> writes:
> > So this came across my twitter feed:
> > https://pbs.twimg.com/media/ClqIJtmXEAA5IGt.png
>
> public.text can exist in parallel with pg_catalog.text.

It just doesn't seem right to me to be able to do:

CREATE DOMAIN int AS varchar(50);

Justin
Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0
On 13 May 2016, at 21:42, Josh berkus <j...@agliodbs.com> wrote: > On 05/13/2016 01:04 PM, Joshua D. Drake wrote: >> On 05/13/2016 12:03 PM, Josh berkus wrote: >>> On 05/13/2016 11:48 AM, Robert Haas wrote: >>>> On Fri, May 13, 2016 at 12:12 PM, Joshua D. Drake >>>> <j...@commandprompt.com> wrote: >> >>> Anyway, all of this is a moot point, because nobody has the power to >>> tell the various companies what to do. We're just lucky that everyone >>> is still committed to writing stuff which adds to PostgreSQL. >> >> Lucky? No. We earned it. We earned it through years and years of hard >> work. Should we be thankful? Absolutely. Should we be grateful that we >> have such a powerful and engaged commercial contribution base? 100%. > > Lucky. Sure there was work and personal integrity involved, but like > any success story, there was luck. > > But we've also been fortunate in not spawning hostile-but-popular forks > by people who left the project, and that none of the companies who > created hostile forks were very successful with them, and that nobody > has seriously tried using lawyers to control/ruin the project. > > And, most importantly, we've been lucky that a lot of competing projects > have self-immolated instead of being successful and brain-draining our > contributors (MySQL, ANTS, MonetDB, etc.) Oracle buying MySQL (via Sun) seems to have helped things along pretty well too. :) + Justin -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0
On 12 Apr 2016, at 17:23, Merlin Moncure <mmonc...@gmail.com> wrote: > On Mon, Apr 11, 2016 at 11:39 AM, Justin Clift <jus...@postgresql.org> wrote: >> Moving over a conversation from the pgsql-advocacy mailing list. In it >> Simon (CC'd) raised the issue of potentially creating a >> backwards-compatibility >> breaking release at some point in the future, to deal with things that >> might have no other solution (my wording). >> >> Relevant part of that thread there for reference: >> >> >> http://www.postgresql.org/message-id/CANP8+jLtk1NtaJyXc=hAqX=0k+ku4zfavgvbkfs+_sor9he...@mail.gmail.com >> >> Simon included a short starter list of potentials which might be in >> that category: >> >> * SQL compliant identifiers >> * Remove RULEs >> * Change recovery.conf >> * Change block headers >> * Retire template0, template1 >> * Optimise FSM >> * Add heap metapage >> * Alter tuple headers >> et al >> >> This still is better placed on -hackers though, so lets have the >> conversation here to figure out if a "backwards compatibility breaking" >> release really is needed or not. > > A couple of points here: > *) I don't think having a version number that starts with 10 instead > of 9 magically fixes backwards compatibility problems and I think > that's a dangerous precedent to set unless we're willing to fork > development and support version 9 indefinitely including major release > versions. > > *) Compatibility issues at the SQL level have to be taken much more > seriously than other things (like internal layouts or .conf issues). > > *) We need to do an honest cost benefit analysis before breaking > things. Code refactors placed on your users puts an enormous cost > that is often underestimated. I have some fairly specific examples of > the costs related to the text cast removal for example. It's not a > pretty picture. Yeah. Moving the discussion here was more to determine which items really would need a backwards compatible break. eg no other approach can be found. 
Seems I started it off badly, as no-one's yet jumped in to discuss the
initial points. :(

+ Justin

--
"My grandfather once told me that there are two kinds of people: those who
work and those who take the credit. He told me to try to be in the first
group; there was less competition there."
- Indira Gandhi
Re: [HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0
On 12 Apr 2016, at 14:12, Yury Zhuravlev <u.zhurav...@postgrespro.ru> wrote: > Justin Clift wrote: >> Simon included a short starter list of potentials which might be in >> that category: >> >> * SQL compliant identifiers >> * Remove RULEs >> * Change recovery.conf >> * Change block headers >> * Retire template0, template1 >> * Optimise FSM >> * Add heap metapage >> * Alter tuple headers >> et al > > + CMake build I think. > > Now I can build: > * postgres > * bin/* programs > * pl/* languages > * contrib/* (with cmake PGXS analogue) > > Can run regression and isolation tests for postgres/pl* and all contrib > modules. > There is still a lot of work but I hope everything will turn out. Also it > would be good to get help. > > Thanks. > > PS https://github.com/stalkerg/postgres_cmake If/when PostgreSQL can be built and tested with CMake... why would the resulting code + database files + network protocol (etc) not be compatible with previous versions? :) + Justin -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Lets (not) break all the things. Was: [pgsql-advocacy] 9.6 -> 10.0
Moving over a conversation from the pgsql-advocacy mailing list. In it Simon (CC'd) raised the issue of potentially creating a backwards-compatibility breaking release at some point in the future, to deal with things that might have no other solution (my wording). Relevant part of that thread there for reference: http://www.postgresql.org/message-id/CANP8+jLtk1NtaJyXc=hAqX=0k+ku4zfavgvbkfs+_sor9he...@mail.gmail.com Simon included a short starter list of potentials which might be in that category: * SQL compliant identifiers * Remove RULEs * Change recovery.conf * Change block headers * Retire template0, template1 * Optimise FSM * Add heap metapage * Alter tuple headers et al This still is better placed on -hackers though, so lets have the conversation here to figure out if a "backwards compatibility breaking" release really is needed or not. Hopefully we can get it all done without giving users a reason to consider switching. ;) Regards and best wishes, Justin Clift -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg9.4 relpages of child tables
On Wed, Mar 18, 2015 at 12:11:22PM -0400, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > I believe there's been a behavior change, and not sure if it's deliberate.
> > I don't think there's a negative consequence for our production use, but
> > it confused me while summing relpages for analysis purposes, as our 9.4
> > customers behaved differently.
>
> I suspect that you're getting confused because autovacuum kicked in on the

It seems you're right.  I was additionally confused because the autovacuum had
processed the most recent tables (triggered by insertion; partitioned by
date), but had not processed some of last month's tables (which I was querying
for relpages, since it's a complete month), since this DB was upgraded to
pg9.4 last month.

Thanks,
Justin
[HACKERS] pg9.4 relpages of child tables
I believe there's been a behavior change, and I'm not sure if it's deliberate.
I don't think there's a negative consequence for our production use, but it
confused me while summing relpages for analysis purposes, as our 9.4 customers
behaved differently.

Documentation indicates that in pg9.0, ANALYZE of a parent table included
statistics of its children.

Under both pg9.3 and 9.4, this returns no stats rows after the parent table is
analyzed, and returns stats if the child is analyzed:

SELECT * FROM pg_statistic WHERE starelid='.._2014_01'::regclass

However, in pg9.4, the child's pg_class.relpages is 0 (apparently, 1 for
indices) even after the parent is analyzed (and is an approximate number of
pages if the child is analyzed).

On pg93:
pryzbyj=# create table t (i int);
pryzbyj=# create table t2 (like t) inherits(t);
pryzbyj=# insert into t2(SELECT generate_series(1,10) ORDER BY RANDOM());
pryzbyj=# select relpages from pg_class where relname='t2';
= 0
pryzbyj=# analyze t;
pryzbyj=# select relpages from pg_class where relname='t2';
= 885

On pg94:
ts=# create table t (i int);
ts=# create table t2 (like t) inherits(t);
ts=# insert into t2(SELECT generate_series(1,10) ORDER BY RANDOM());
ts=# select relpages from pg_class where relname='t2';
= 0
ts=# analyze t;
ts=# select relpages from pg_class where relname='t2'; -- this changed
= 0
ts=# analyze t2;
ts=# select relpages from pg_class where relname='t2';
= 443

Is that a deliberate change, and if so, is there any documentation of it?  I'd
prefer to avoid analyzing all our child tables, as all queries hit the
parents, which include statistics on the children.

Thanks,
Justin
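If the 9.4 behavior is intentional, a workaround sketch would be to ANALYZE each child explicitly, driven off pg_inherits. This is hypothetical (nothing in the thread runs it), and DO blocks with format() require 9.1+:

```sql
-- Sketch: ANALYZE every child of parent table "t" individually, since
-- analyzing the parent alone no longer populates the children's relpages.
DO $$
DECLARE
    child regclass;
BEGIN
    FOR child IN
        SELECT inhrelid::regclass
        FROM pg_inherits
        WHERE inhparent = 't'::regclass
    LOOP
        EXECUTE format('ANALYZE %s', child);
    END LOOP;
END
$$;
```

That is exactly the per-child work the question hopes to avoid, but it restores accurate relpages sums until the behavior change is explained or documented.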
[HACKERS] ruby connect
I'm trying to connect ruby to postgres on ubuntu, and the only link I found
that has the library is down.  Does anyone have the postgres library for ruby?
Or can you direct me to it?

Justin Darby Carrera
Database Programmer/Analyst
MED-Department of Medical Social Sciences
j-carr...@northwestern.edu
312-695-8994
Re: [HACKERS] windows doesn't notice backend death
Tom Lane wrote:
> Have M$ really been stupid enough to make an external kill look just like
> an exit() call?
>
> regards, tom lane

Kind of :-(  Everything I have read seems to indicate the Task Manager calls
TerminateProcess() in kernel32.dll and passes a 1, setting the exitcode to 1.
I have not found anything clearly stating that, yet logic points to that
being what it's doing.
http://msdn.microsoft.com/en-us/library/ms686714(VS.85).aspx

Would it not be easy to set the normal exit code to something other than 1 to
see the difference?  ExitProcess():
http://msdn.microsoft.com/en-us/library/ms682658(VS.85).aspx

This article may be helpful:
http://msdn.microsoft.com/en-us/library/ms686722(VS.85).aspx

BTW, tested on Windows 2003 SP1; this problem shows up in PostgreSQL 8.3.3.
Re: [HACKERS] windows doesn't notice backend death
Andrew Dunstan wrote:
> justin wrote:
> > Would it not be easy to set the normal exit code to something other than
> > 1 to see the difference?  ExitProcess()
>
> Not really, as Tom showed later this is an example of a more general
> problem.  I think his solution of detecting when backends have cleaned up
> nicely and when they have not is the right way to go.
>
> cheers
> andrew

Stupid thought: why can some clueless code set the exit status to a crashed
status at all?  Would it not be more prudent to remove that ability?
Re: [HACKERS] [GENERAL] string_to_array with empty input
Steve Crawford wrote:
> Did I miss the exciting conclusion or did this drift silently off radar?

It was pretty well split between the options; tabled for another time.
Re: [HACKERS] [GENERAL] string_to_array with empty input
Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
> > On Tue, Mar 31, 2009 at 10:44 AM, Greg Stark <st...@enterprisedb.com> wrote:
> >> On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason <s...@samason.me.uk> wrote:
> >>> string_to_array('',',')::INT[]  =  invalid input syntax for integer: ""
> >> Oof. That's a good point.
> > +1.  I find this argument much more compelling than anything else that's
> > been offered up so far.
>
> Yeah.  It seems to me that if you consider only the case where the array
> elements are text, there's a weak preference for considering '' to be a
> single empty string; but as soon as you think about any other datatype,
> there's a strong preference to consider it a zero-element list.  So I too
> have come around to favor the latter interpretation.  Do we have any
> remaining holdouts?
>
> regards, tom lane

I'm still a holdout.  We are taking a string and putting it into an array
based on a delimiter.  That is very simple and straightforward.  Yet many
argue that if we want to cast this into another data type, the function
should deal with it in limited cases.

string_to_array('',',')::INT[] works as proposed, but
string_to_array(',,,', ',')::INT[] fails, and
string_to_array('1,2,,4', ',')::INT[] fails.

I'm trying to understand the difference between an empty string and a string
with many blank entries between the delimiters.  Consider ',,' = '' once the
delimiter is removed, yet seven zero-length entries were passed.  How is that
going to be handled?  In one case it works, yet in other cases it fails.
This is inconsistent behavior.

Unless all zero-length strings are removed or are treated as NULLs, I can't
see how casting to another type is going to work.  And if zero-length strings
are treated as NULLs, this creates the idea that zero-length strings are
equal to NULLs.

The input is a string and the output is text[]; casting to another data type
is error prone and should be handled by the programmer.
Re: [HACKERS] [GENERAL] string_to_array with empty input
If someone can show me a real-world example where this logic simplifies the
code and has more uses, I'll bite; I just presently can't see how this works
better.
Re: [HACKERS] [GENERAL] string_to_array with empty input
Martin Gainty wrote:
> Split strings into array elements using provided delimiter:
> string_to_array('xx~^~yy~^~zz', '~^~')  output: {xx,yy,zz}
> http://www.postgresql.org/docs/8.3/interactive/functions-array.html

Sorry, that's not the question I'm asking.  We are debating whether it makes
sense to change the output in certain cases.  I'm for not returning NULLs or
a zero-element array.  I'm asking how the other behavior is better, by way of
a real-world example.  I don't see the plus side at the moment.
Re: [HACKERS] [GENERAL] string_to_array with empty input
Sam Mason wrote:
> I'd expect 3, 2, 1 and 1.  That's also a disingenuous example; what would
> you expect back from:
>
>   select count_elements(string_to_array('butter,,milk',','))
>
> I think the semantics you want is what you'd get from:
>
>   array_filter_blanks(string_to_array($1,$2))
>
> where I defined array_filter_blanks in my previous post.

I agree the function should not be changing values passed.
Stripping/dropping empty strings is changing what was passed into the
function instead of breaking it into an array.
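For readers following along: the post that defined array_filter_blanks is not quoted in this thread, so the following is only a guess at its semantics, kept deliberately minimal:

```sql
-- One possible definition of array_filter_blanks (hypothetical; the
-- earlier post defining it is not quoted here): drop empty-string
-- elements, returning an empty array rather than NULL when none remain.
CREATE FUNCTION array_filter_blanks(text[]) RETURNS text[] AS $$
    SELECT COALESCE(array_agg(x), '{}')
    FROM unnest($1) AS x
    WHERE x <> ''
$$ LANGUAGE sql IMMUTABLE;

-- e.g. array_filter_blanks(string_to_array('butter,,milk', ','))
-- would yield {butter,milk}
```

This makes the disagreement concrete: filtering blanks is a separate, explicit choice layered on top of string_to_array, rather than behavior baked into the split itself.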
Re: [HACKERS] [GENERAL] string_to_array with empty input
Sam Mason wrote: string_to_array('',',')::INT[] = invalid input syntax for integer: Which you don't get at the moment; although you do currently get it in other common cases such as: string_to_array('1,',',')::INT[] If you want backwards compatible behaviour you could always bung a NULLIF in there: string_to_array(NULLIF('',''),',')::INT[] = NULL

But consider that this also fails: select string_to_array('1, , 3', ',')::int[] = ERROR: invalid input syntax for integer: yet this works: select string_to_array('1, 2, 3',',')::int[]
Re: [HACKERS] [GENERAL] string_to_array with empty input
Greg Stark wrote: On Tue, Mar 31, 2009 at 5:48 PM, justin jus...@emproshunts.com wrote: But consider that this also fails: select string_to_array('1, , 3', ',')::int[] = ERROR: invalid input syntax for integer: " " yet this works: select string_to_array('1, 2, 3',',')::int[] Sure, and the analogous pair string_to_array(' ',',')::int[] and string_to_array('1 ',',')::int[] behave similarly. The point is that '' could represent no items or one empty string. We get to pick which one, and in any use case where the string was a list of objects it's almost certainly intended to be an empty list. And databases are almost always processing lists of things. I think the only use case where you want it to be a singleton list of an empty string is when you're doing string parsing, such as building a lexer or something like that, which isn't a typical use for SQL code.

I disagree. Casting a string to something else can be very error prone to begin with. Having string_to_array() deal with that possibility is out of its scope, IMHO. Consider this: I have intelligent part numbers that need to be split apart to simplify searching and doing math with. string_to_array(' F-2500-50 ', '-')::int[] still fails with an error, as expected. What is the difference between ' ' and 'F'? So before doing anything, a test needs to be done to verify the contents, so they can be cast to something else.
Re: [HACKERS] [GENERAL] string_to_array with empty input
This thread being cross-posted has made it a bit confusing.

Greg Stark wrote: Nobody has ever suggested filtering out empty elements or dealing specially with spaces or anything else like that. If you're talking about that then you've missed the original question. "Does anyone want to argue for keeping it the same? Or perhaps argue that a zero-element array is a more sensible result than a one-element array with one empty string? (It doesn't seem like it to me, but maybe somebody thinks so.)"

That means to me dropping empty strings or removing values that don't make sense. Then the argument begins about which values make sense to drop: just zero-length strings, or also strings with a million white spaces?

One last try. If there's a column called "shopping list" which is a comma-separated list of items to buy in the store, and I store '' in it, how many items do you think that means you should go shopping for? Do you think that's one item that should be rejected because it's an empty string? Or do you think that's an empty list with zero items listed?

It can't be rejected; it's an empty shopping list, although a worthless one.

What would it look like if it was a zero-length list? You can ask what it would look like if it was a shopping list of one item called ''. I agree both are theoretically consistent, but one is actually useful in 99% of use cases; the other is only useful in unusual cases.

I'm still confused about which one you want here.
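The two interpretations being debated can be modeled with a short Python sketch (an illustration only, not PostgreSQL's implementation; the function names are made up for this example):

```python
def split_as_one_empty_string(s, delim):
    # Model of the "one-element array holding ''" interpretation:
    # a plain split, where '' yields a single empty element.
    return s.split(delim)

def split_as_empty_list(s, delim):
    # Model of the proposed "zero-element list" interpretation:
    # '' is treated as an empty list of items.
    return [] if s == "" else s.split(delim)

# The empty input is where the two disagree; casting [''] to integers
# fails, while casting [] succeeds trivially.
print(split_as_one_empty_string("", ","))  # ['']
print(split_as_empty_list("", ","))        # []
print(split_as_empty_list("1,2,3", ","))   # ['1', '2', '3']
```

Note that neither model helps with the poster's other examples (',,,', '1,2,,4'), which contain interior empty elements either way; that is the inconsistency being argued about.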
[HACKERS] math error or rounding problem Money type
I believe I have found a math bug/rounding problem with the Money type when it's used with SUM()... PostgreSQL 8.3.1

--- Background on the Problem

We have a gl_trans table with 92,000 rows, with one column containing the positive and negative entries. In order to make this gl_trans table make more sense, and to group the accounts into correct debits and credits along with the type of account, a view was created that does the grouping and sorting. To further make things easier, the view casted the results into the Money type, just to make the select statements that call the view shorter. All looked great for several weeks, till all of a sudden the summed values for all accounts went out by 0.01. I needed to confirm this was a rounding problem and not a GL entry that was bad. (If we had a bad entry, this would scream that we have a far bigger problem, where the application allowed a GL entry to be committed that was out of balance.) To confirm that every entry made has an equal and opposite entry, the select statement below was created. The gltrans_sequence column is an integer key that groups General Ledger entries together, so all the sides of a specific entry can be found.

select * from
  (select gltrans_sequence, sum(gltrans_amount) as Neg
   from gltrans where gltrans_amount < 0 group by gltrans_sequence) as neg,
  (select gltrans_sequence, sum(gltrans_amount) as pos
   from gltrans where gltrans_amount > 0 group by gltrans_sequence) as pos
where neg.gltrans_sequence = pos.gltrans_sequence
  and neg.neg <> pos.pos*-1

This returns no records, as expected... Now, armed with the knowledge that no entry was bad, I suspected it had to be the money data type.
So I added explicit castings:

select * from
  (select gltrans_sequence, sum(gltrans_amount::text::money) as Neg
   from gltrans where gltrans_amount < 0 group by gltrans_sequence) as neg,
  (select gltrans_sequence, sum(gltrans_amount::text::money) as pos
   from gltrans where gltrans_amount > 0 group by gltrans_sequence) as pos
where neg.gltrans_sequence = pos.gltrans_sequence
  and neg.neg <> pos.pos*-1

select * from
  (select gltrans_sequence, sum(gltrans_amount::text::money) as Neg
   from gltrans where gltrans_amount < 0 group by gltrans_sequence) as neg,
  (select gltrans_sequence, sum(gltrans_amount::text::money) as pos
   from gltrans where gltrans_amount > 0 group by gltrans_sequence) as pos
where neg.gltrans_sequence = pos.gltrans_sequence
  and neg.neg::text::money <> pos.pos::text::money*-1

select * from
  (select gltrans_sequence, sum(gltrans_amount) as Neg
   from gltrans where gltrans_amount < 0 group by gltrans_sequence) as neg,
  (select gltrans_sequence, sum(gltrans_amount) as pos
   from gltrans where gltrans_amount > 0 group by gltrans_sequence) as pos
where neg.gltrans_sequence = pos.gltrans_sequence
  and neg.neg::text::money <> pos.pos::text::money*-1

--- Nothing resulted in showing an entry that was out of balance.

--- Identifying the problem ---

So I turned my attention to the view, which casted the numeric type to Money.
The view is called trailbalance. The bad select statement that creates the view:

SELECT p.period_id, p.period_start, p.period_end, a.accnt_id, a.accnt_number, a.accnt_descrip, p.period_yearperiod_id, a.accnt_type,
  SUM(CASE WHEN g.gltrans_date < p.period_start THEN g.gltrans_amount ELSE 0.0 END)::text::money AS beginbalance,
  SUM(CASE WHEN g.gltrans_date <= p.period_end AND g.gltrans_date >= p.period_start AND g.gltrans_amount <= 0::numeric THEN g.gltrans_amount ELSE 0.0 END)::text::money AS negative,
  SUM(CASE WHEN g.gltrans_date <= p.period_end AND g.gltrans_date >= p.period_start AND g.gltrans_amount >= 0::numeric THEN g.gltrans_amount ELSE 0.0 END)::text::money AS positive,
  SUM(CASE WHEN g.gltrans_date <= p.period_end AND g.gltrans_date >= p.period_start THEN g.gltrans_amount ELSE 0.0 END)::text::money AS difference,
  SUM(CASE WHEN g.gltrans_date <= p.period_end THEN g.gltrans_amount ELSE 0.0 END)::text::money AS endbalance
FROM period p
  CROSS JOIN accnt a
  LEFT JOIN gltrans g ON (g.gltrans_accnt_id = a.accnt_id AND g.gltrans_posted = true)
WHERE p.period_id = 58
GROUP BY p.period_id, p.period_start, p.period_end, a.accnt_id, a.accnt_number, a.accnt_descrip, p.period_yearperiod_id, a.accnt_type
ORDER BY p.period_id, a.accnt_number;

---End Select

The query that calls this view:

Select sum(beginBalance) as beginbalance, sum(negative) as debit, sum(positive) as credit, sum(difference) as difference, sum(endbalance) as endbalance from trailbalance
Re: [HACKERS] math error or rounding problem Money type
Tom Lane wrote: Justin [EMAIL PROTECTED] writes: I believe I have found a math bug/rounding problem with the Money type when it's used with SUM()... PostgreSQL 8.3.1 You do know that money only stores two fractional digits? regards, tom lane

Yes. The question is why the two sides are not equal anymore. Take this:

Select '2', round(sum(beginBalance),6) as beginbalance, round(sum(negative),6) as debit, round(sum(positive),6) as credit, round(sum(difference),6) as difference, round(sum(endbalance),6) as endbalance from trailbalance
union
Select '1', sum(round(beginBalance,6)) as beginbalance, sum(round(negative,6)) as debit, sum(round(positive,6)) as credit, sum(round(difference,6)) as difference, sum(round(endbalance,6)) as endbalance from trailbalance

 1 | -0.06 | -11250546.743752 | 11250546.743752 | 0.00 | -0.07
 2 |  0.00 | -11250546.743752 | 11250546.743752 | 0.00 |  0.00

At the very least this shows a clear lesson: when rounding, do it after the sum function is called, not before.
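The discrepancy above is the classic round-before-sum versus round-after-sum problem: casting each element to money truncates it to two fractional digits before summing, while summing the numeric values and rounding once keeps the fractions. A small Python sketch with hypothetical amounts (not the poster's actual ledger data) shows the drift:

```python
from decimal import Decimal, ROUND_HALF_UP

# Fourteen hypothetical half-cent ledger amounts.
amounts = [Decimal("0.005")] * 14

def round2(x):
    # Round to two fractional digits, as a money-style type would store.
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

sum_then_round = round2(sum(amounts))             # 0.070 rounded -> 0.07
round_then_sum = sum(round2(a) for a in amounts)  # 14 x 0.01     -> 0.14

print(sum_then_round, round_then_sum)
```

The two totals differ by 0.07 here; with 92,000 rows of real amounts, an accumulated penny of drift, as the poster observed, is unsurprising.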
Re: [HACKERS] [PERFORM] Memory question on win32 systems
Sabbiolina wrote: On Thu, May 29, 2008 at 2:11 AM, Douglas McNaught [EMAIL PROTECTED] wrote: On Wed, May 28, 2008 at 7:05 PM, Sabbiolina [EMAIL PROTECTED] wrote: Hello, in my particular case I need to configure Postgres to handle only a few concurrent connections, but I need it to be blazingly fast, so I need it to cache everything possible. I've changed the config file and multiplied all memory-related values by 10; still Postgres uses less than 50 MB of my RAM. How are you measuring this? Task Manager. I have 4 Gigs of RAM; how do I force Postgres to use a higher part of that memory in order to cache more indexes, queries and so on? Post the settings values you're using and people will be better able to help you. -Doug

config:

port = 5432                      # (change requires restart)
max_connections = 50             # (change requires restart)
shared_buffers = 320MB           # min 128kB or max_connections*16kB
temp_buffers = 80MB              # min 800kB
max_prepared_transactions = 10   # can be 0 or more
work_mem = 10MB                  # min 64kB
maintenance_work_mem = 160MB     # min 1MB
max_stack_depth = 20MB           # min 100kB
max_fsm_pages = 2048000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1            # min 100, ~70 bytes each
shared_preload_libraries = '$libdir/plugins/plugin_debugger.dll'   # (change requires restart)
wal_buffers = 640kB              # min 32kB
log_destination = 'stderr'       # Valid values are combinations of
logging_collector = on           # Enable capturing of stderr and csvlog
log_line_prefix = '%t '          # special values:
datestyle = 'iso, dmy'
lc_messages = 'Italian_Italy.1252'    # locale for system error message
lc_monetary = 'Italian_Italy.1252'    # locale for monetary formatting
lc_numeric = 'Italian_Italy.1252'     # locale for number formatting
lc_time = 'Italian_Italy.1252'        # locale for time formatting
default_text_search_config = 'pg_catalog.italian'

You need to use Performance Monitor and Process Explorer to track down caching and performance on Windows machines.
PostgreSQL does not work like MS SQL, where at startup it grabs a large chunk of memory that it uses to keep the last X number of queries, along with the indexes and tables, in memory, pushing things out only when new, completely different queries need the memory. To my understanding, PostgreSQL only caches queries and results in memory for that specific connection. So when that connection is closed, those cached results are cleared out. So cached indexes and queries are for that connection only. I hope my understanding is correct.
Re: [HACKERS] [PERFORM] Memory question on win32 systems
Merlin Moncure wrote: On Thu, May 29, 2008 at 10:19 AM, Justin [EMAIL PROTECTED] wrote: To my understanding Postgresql only caches queries and results in memory for that specific connection. So when that connection is closed those cached results are cleared out. So cached indexes and queries are for that connection only. I hope my understanding is correct. Completely wrong. They are called 'shared' buffers for a reason. Also you are missing the point of the O/S file cache which lies under that. If you have a computer with 4GB of addressable RAM, all its memory is used for caching at all times, period, less what the operating system needs or what is used by running programs or temporary demands (sorting, etc). Also, PostgreSQL doesn't as a rule cache 'results and queries'. The shared_buffers setting reserves memory for PostgreSQL's internal cache (plus some housekeeping things like locks), which can be faster than the O/S cache because it is more tightly integrated with the backend. However, a page fault to disk matters much more in performance terms than the performance differences between shared buffers and the O/S cache. merlin

This is not meant to be argumentative; I'm trying to clarify what is going on, because documentation from different sources seems to be in conflict, or confusing.

From PostgreSQL, Second Edition: "Shared_Buffers: This cache is shared by all clients connected to a single cluster. Disk I/O (and cache I/O) is performed in 8KB chunks. The shared_buffers parameter determines how many 8KB buffers will be created in the shared cache."

The explanation here is different: http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html It refers to queries, which to me means the SQL command and the result sets (but that is wrong). We have sort_mem aka work_mem.

From PostgreSQL, Second Edition: when PostgreSQL processes a query, it transforms the query from string form into an execution plan.
An execution plan is a sequence of operations that must be performed in order to satisfy the query. (This is specific to the client connection, and when the client is closed out this is cleared, to my thinking.)

From the PostgreSQL documentation: work_mem (integer): Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The value defaults to one megabyte (1MB). Note that for a complex query, several sort or hash operations might be running in parallel; each one will be allowed to use as much memory as this value specifies before it starts to put data into temporary files. Also, several running sessions could be doing such operations concurrently, so the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.

Quoting you: "Also, postgresql doesn't as a rule cache 'results and queries'." Then what is the purpose of shared buffers if nothing is being reused? Is it only used to keep track of locks, changes, and what is being spooled to the kernel? I'm confused trying to figure out how the caches are being used and how data moves through the PostgreSQL backend. Many have stated that a small database can fit completely in the caches; how does that really work?
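The work_mem caveat quoted above is a simple multiplication; a back-of-envelope sketch using the settings from the config earlier in the thread (sorts_per_query is an assumed figure, not something from the config) shows how the worst-case footprint can dwarf any single setting:

```python
MB = 1024 * 1024

# Values taken from the posted config.
shared_buffers = 320 * MB
work_mem = 10 * MB
max_connections = 50

# Assumption for illustration: each query runs a few sorts/hashes in parallel.
sorts_per_query = 3

# Worst case: every connection busy, every query using its full sort budget.
worst_case = shared_buffers + max_connections * sorts_per_query * work_mem
print(worst_case // MB, "MB")  # 320 + 50*3*10 = 1820 MB
```

This is why work_mem is per-operation rather than a global cap: the documentation's warning that "the total memory used could be many times the value of work_mem" is literal arithmetic.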
[HACKERS] transaction support in pl/pgsql
I have a request: is there any intention of expanding transaction support in PL/pgSQL? This would simplify GUI/client code, which needs to know whether a function failed in order to roll back the transaction. Right now I have to pass back some kind of error message that I look for, to see if the transaction failed, to decide whether to run the commit or rollback command. I know about RAISE EXCEPTION within sub-blocks, but I want a way to roll back to the starting transaction.
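The client-side bookkeeping the request is trying to eliminate can be sketched as follows, assuming a DB-API-style connection object exposing commit()/rollback() (as psycopg2's does); the helper name run_in_transaction is hypothetical, not an existing API:

```python
def run_in_transaction(conn, work):
    """Run work(conn); commit on success, roll back on any error.

    This is the decide-commit-or-rollback logic that currently has to
    live in the GUI/client rather than inside the PL/pgSQL function.
    """
    try:
        result = work(conn)
    except Exception:
        conn.rollback()
        raise
    conn.commit()
    return result
```

With transaction control inside the procedural language, the function itself could roll back its own work, and this wrapper (and the error-message sniffing the poster describes) would be unnecessary.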
[HACKERS] creating new aggregate function
Need help and direction creating new aggregate functions. We need to add more average functions for both scientific and financial purposes: RMS, for electrical measurement purposes; Mode, for both electrical and financial; Weighted average, for financial purposes; Generalized mean, Geometric mean, and Harmonic mean, for electrical measurement purposes. What would be the best way to create these new functions? ---(end of broadcast)--- TIP 6: explain analyze is your friend
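The math behind several of the requested means is straightforward; a Python sketch of the formulas is below (the PostgreSQL route would be CREATE AGGREGATE with a state transition function, which is not shown here; these standalone functions are only an illustration of the computations):

```python
import math

def rms(xs):
    # Root mean square: sqrt of the mean of the squares.
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def geometric_mean(xs):
    # n-th root of the product, computed via logs for numeric stability.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # n divided by the sum of reciprocals.
    return len(xs) / sum(1.0 / x for x in xs)

def weighted_average(xs, ws):
    # Sum of value*weight divided by the total weight.
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)
```

For example, rms([3, 4]) is sqrt(12.5), geometric_mean([2, 8]) is 4, and harmonic_mean([2, 6]) is 3. In an actual aggregate, each would be split into a per-row transition step and a final function.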
[HACKERS] Ad Hoc Indexes
Question: We have recently moved to PostgreSQL as we just migrated to OpenMfg. I really like the database, and PL/pgSQL is very similar in layout to the FoxPro language, so learning has been pretty easy. Now for my question: it does not appear that PostgreSQL has an ad hoc index ability, where the query planner will create an in-memory index based on the SELECT, UPDATE, INSERT, or DELETE commands. I got pretty lazy with MS SQL and FoxPro, using ad hoc indexes to speed up complex queries where I should have created an index but never did. Are there any plans to add the ability for PostgreSQL to create ad hoc indexes when it makes sense? Example: I was creating a new report that presently processes only around 120,000 records in the 7 tables in the query, but the query took between 18 and 19 seconds to run. I analyzed the query, added 2 indexes, and dropped the processing time to 93 milliseconds. This query will be run 3 to 5 times a day, so it makes sense to create indexes. But with reports that run maybe once a week or once a month, it does not make sense to create indexes; this is where I think ad hoc indexes are nice to have, because to my understanding maintaining indexes can get pretty expensive. What's the likelihood of ad hoc indexes being added to PostgreSQL? thanks
Re: [HACKERS] Ad Hoc Indexes
The idea of ad hoc indexes is to speed up loop scans. To prove the idea, I created a sql file in pgAdmin that creates the indexes on the fly, runs the query, then drops the indexes. Without the indexes it takes 18 to 19 seconds to run the query; creating the indexes and doing the query takes 400 milliseconds. I wish I could do an EXPLAIN on it with CREATE INDEX in the query, but I can't; it errors out. So I reran the query with the indexes already in place, which drops the query time to 191 milliseconds. Creating and deleting the indexes on the fly improves performance almost 50 times. I think creating ad hoc indexes on the fly in memory makes sense. I imagine it would be even faster if the index stayed in memory.

Tom Lane wrote: Justin [EMAIL PROTECTED] writes: Is there any plans in the future to add the ability for PostgreSQL to create Ad Hoc indexes if it makes sense. No, I'm not aware of anyone contemplating such a thing. I can hardly imagine a situation where building an index for a single query is actually a win. Maybe those DBMSes you mention were using this as a substitute for having decent join mechanisms, or something?
regards, tom lane

Sort  (cost=4091.92..4091.93 rows=1 width=83) (actual time=22003.849..22004.041 rows=458 loops=1)
  Sort Key: to_char((coitem.coitem_scheddate)::timestamp with time zone, 'Month Dy DD '::text), cohead.cohead_number, coitem.coitem_linenumber
  ->  Nested Loop Left Join  (cost=0.00..4091.91 rows=1 width=83) (actual time=14.477..21998.119 rows=458 loops=1)
        ->  Nested Loop  (cost=0.00..191.54 rows=1 width=75) (actual time=0.132..32.066 rows=458 loops=1)
              ->  Nested Loop  (cost=0.00..191.22 rows=1 width=44) (actual time=0.121..25.253 rows=458 loops=1)
                    ->  Nested Loop  (cost=0.00..182.94 rows=1 width=40) (actual time=0.111..18.476 rows=458 loops=1)
                          ->  Nested Loop  (cost=0.00..182.53 rows=1 width=30) (actual time=0.100..11.662 rows=458 loops=1)
                                ->  Index Scan using coitem_status_key on coitem  (cost=0.00..174.25 rows=1 width=30) (actual time=0.084..2.364 rows=458 loops=1)
                                      Index Cond: (coitem_status = 'O'::bpchar)
                                      Filter: ((coitem_scheddate >= '2008-01-01'::date) AND (coitem_scheddate <= '2008-12-31'::date))
                                ->  Index Scan using itemsite_pkey on itemsite  (cost=0.00..8.27 rows=1 width=8) (actual time=0.012..0.014 rows=1 loops=458)
                                      Index Cond: (coitem.coitem_itemsite_id = itemsite.itemsite_id)
                          ->  Index Scan using item_pkey on item  (cost=0.00..0.40 rows=1 width=18) (actual time=0.008..0.010 rows=1 loops=458)
                                Index Cond: (itemsite.itemsite_item_id = item.item_id)
                    ->  Index Scan using cohead_pkey on cohead  (cost=0.00..8.27 rows=1 width=12) (actual time=0.008..0.010 rows=1 loops=458)
                          Index Cond: (cohead.cohead_id = coitem.coitem_cohead_id)
              ->  Index Scan using cust_pkey on custinfo  (cost=0.00..0.31 rows=1 width=39) (actual time=0.007..0.009 rows=1 loops=458)
                    Index Cond: (cohead.cohead_cust_id = custinfo.cust_id)
        ->  Index Scan using wo_pkey on wo  (cost=0.00..8.27 rows=1 width=12) (actual time=0.007..0.011 rows=1 loops=458)
              Index Cond: (wo.wo_id = coitem.coitem_order_id)
        SubPlan
          ->  Nested Loop  (cost=958.43..1975.22 rows=1 width=36) (actual time=22.607..22.955 rows=1 loops=419)
                Join Filter: (wrkcnt.wrkcnt_id = wooper.wooper_wrkcnt_id)
                InitPlan
                  ->  Aggregate  (cost=958.42..958.43 rows=1 width=4) (actual time=13.870..13.870 rows=1 loops=419)
                        ->  Seq Scan on wooper  (cost=0.00..958.41 rows=1 width=4) (actual time=13.453..13.856 rows=1 loops=419)
                              Filter: (wooper_rncomplete AND (wooper_wo_id = $0))
                ->  Seq Scan on wooper  (cost=0.00..1013.10 rows=1 width=8) (actual time=22.547..22.821 rows=1 loops=419)
                      Filter: ((wooper_seqnumber = $1) AND (wooper_wo_id = $0))
                ->  Seq Scan on wrkcnt  (cost=0.00..2.75 rows=75 width=36) (actual time=0.010..0.052 rows=75 loops=419)
          ->  Aggregate  (cost=958.42..958.43 rows=1 width=4) (actual time=14.033..14.034 rows=1 loops=419)
                ->  Seq Scan on wooper  (cost=0.00..958.41 rows=1 width=4) (actual time=13.600..14.011 rows=1 loops=419)
                      Filter: (wooper_rncomplete AND (wooper_wo_id = $0))
          ->  Aggregate  (cost=958.42..958.43 rows=1 width=4) (actual time=14.058..14.059 rows=1 loops=458)
                ->  Seq Scan on wooper  (cost=0.00..958.41 rows=1 width=4) (actual time=13.671..14.043 rows=1 loops=458)
                      Filter: (wooper_rncomplete AND (wooper_wo_id = $0
Re: [HACKERS] Ad Hoc Indexes
Then why are the estimates so far off? If the estimates were correct, would it improve the performance that much? Vacuum is set to run automatically, so the stats stay updated. The total record count for all the tables put together is around 120,000; the query returns only 458 records, which is correct. If I am correct in my understanding, the reason the index improved the query so much is that the wooper table gets hit hard, because it appears in 3 separate nested queries. So taking only the 458 records returned from the parent query, times 3, gives 1,375 table scans going through 21,873 records, for a total of 30,075,375 records being processed on a table with no index. Looked at that way, PostgreSQL did remarkably well processing the query in 18 to 20 seconds. The idea behind ad hoc indexes is that one-shot or rarely used queries, which would require numerous indexes to run in a decent time, can be run in a fraction of the time. This also saves processing time across the entire system, where creating indexes for all the possible queries is impractical. This does not take away the need for indexes, but speeds up ad-hoc queries created from a website or other business analysis tool that someone might create.

Tom Lane wrote: Justin [EMAIL PROTECTED] writes: The idea of ad hoc indexes is speed up loop scans. To prove my idea i created a sql file in PGAdmin that creates the indexes on the fly then runs the query then drops the indexes. Without the indexes it takes 18 to 19 seconds to run the query. To create the index and do the query takes 400 milliseconds. The example you show doesn't convince me of much of anything, because the estimated rowcounts are so far off. I think you're basically dealing with an estimation failure and it's pure luck that the extra index fixes it. regards, tom lane
Re: [HACKERS] Ad Hoc Indexes
Oops, damn spell checker: "really" in my previous message should be "rarely". Sorry. Justin
Re: [HACKERS] Ad Hoc Indexes
Yes, that improved the select statement a lot: from 18 to 20 seconds down to 6 seconds, a 3-fold improvement with no indexes. Thanks. Now I tested your new and improved select statement with the indexes; its query time was 90 to 140 milliseconds, compared to the original select statement's 94 to 120 milliseconds. I tested both select statements 5 times to get an average, and they both hover around 110 to 120. I attached the results from the explain that used the indexes and one with no index, although in both cases, with and without indexes, the estimate still failed to return the correct number by a lot.

Stephen Denne wrote: Justin wrote: Then why are the estimates so far off? Perhaps a really high correlation between those records where coitem_status = 'O' and those records in your date range. The planner estimates 1 row when restricting by both restrictions, whereas there were 458. The 1 row was probably a result of a small % for status='O' multiplied by a small % for the date range. If estimates were correct, would it improve the performance that much? Possibly, but a better performance gain might be obtained by rewriting the query, changing the case expression to something along the lines of:

coalesce((
  select wrkcnt_code || ' Operation Completed :' || cast(wooper_seqnumber as text)
  from wooper
  inner join wrkcnt on wrkcnt_id = wooper_wrkcnt_id
  where wooper_rncomplete = true
    and wooper_wo_id = coitem_order_id
  order by wooper_seqnumber desc
  limit 1
), 'No Operation Completed') as LastFinshedOp

regards, Stephen Denne.

Sort  (cost=1161.97..1161.98 rows=1 width=83) (actual time=6458.955..6459.166 rows=458 loops=1)
  Sort Key: to_char((coitem.coitem_scheddate)::timestamp with time zone, 'Month Dy DD '::text), cohead.cohead_number, coitem.coitem_linenumber
  ->  Nested Loop Left Join  (cost=0.00..1161.96 rows=1 width=83) (actual time=14.638..6453.589 rows=458 loops=1)
        ->  Nested Loop  (cost=0.00..191.54 rows=1 width=75) (actual time=0.121..31.942 rows=458 loops=1)
              ->  Nested Loop  (cost=0.00..191.22 rows=1 width=44) (actual time=0.111..24.863 rows=458 loops=1)
                    ->  Nested Loop  (cost=0.00..182.94 rows=1 width=40) (actual time=0.102..18.215 rows=458 loops=1)
                          ->  Nested Loop  (cost=0.00..182.53 rows=1 width=30) (actual time=0.091..11.299 rows=458 loops=1)
                                ->  Index Scan using coitem_status_key on coitem  (cost=0.00..174.25 rows=1 width=30) (actual time=0.077..2.314 rows=458 loops=1)
                                      Index Cond: (coitem_status = 'O'::bpchar)
                                      Filter: ((coitem_scheddate >= '2008-01-01'::date) AND (coitem_scheddate <= '2008-12-31'::date))
                                ->  Index Scan using itemsite_pkey on itemsite  (cost=0.00..8.27 rows=1 width=8) (actual time=0.012..0.014 rows=1 loops=458)
                                      Index Cond: (coitem.coitem_itemsite_id = itemsite.itemsite_id)
                          ->  Index Scan using item_pkey on item  (cost=0.00..0.40 rows=1 width=18) (actual time=0.008..0.010 rows=1 loops=458)
                                Index Cond: (itemsite.itemsite_item_id = item.item_id)
                    ->  Index Scan using cohead_pkey on cohead  (cost=0.00..8.27 rows=1 width=12) (actual time=0.008..0.010 rows=1 loops=458)
                          Index Cond: (cohead.cohead_id = coitem.coitem_cohead_id)
              ->  Index Scan using cust_pkey on custinfo  (cost=0.00..0.31 rows=1 width=39) (actual time=0.007..0.010 rows=1 loops=458)
                    Index Cond: (cohead.cohead_cust_id = custinfo.cust_id)
        ->  Index Scan using wo_pkey on wo  (cost=0.00..8.27 rows=1 width=12) (actual time=0.007..0.011 rows=1 loops=458)
              Index Cond: (wo.wo_id = coitem.coitem_order_id)
        SubPlan
          ->  Limit  (cost=962.12..962.12 rows=1 width=36) (actual time=13.977..13.978 rows=1 loops=458)
                ->  Sort  (cost=962.12..962.12 rows=1 width=36) (actual time=13.970..13.970 rows=1 loops=458)
                      Sort Key
Re: [HACKERS] Ad Hoc Indexes
there are certainly problems with this idea. Figuring out the cost to either create an index or just do table scan would be pain but not impossible. The hit to index a table in memory i don't think would be that bad compare to do 100's to thousands of loops with table scans. I see more options for the query planner in config file to handle ad-hoc indexes Now to Toms points. The point of Ad Hoc index is they're only alive for the period time the query is running and only live in memory. Once the query completes they die horrible deaths These temporay indexes will not do a table lock or row lock ever it only needs to view the record to create this index. So the same problems that affects 100 table scans would affect create temp index. I think this would help reduce concurrence sense the table is not being scanned thousands of times The idea of creating one time use indexes has been around for long time and has showed to be a benefit when dealing with large data sets where queries is one time or rarely used and its to much labor and cost to figure out how to make it faster. This would also reduce IO disk activity allot if the table can't fit in memory but the index would fit because it relativity small in comparison. Tom Lane wrote: Stephen Denne [EMAIL PROTECTED] writes: The improvement wasn't to the part of the query that had the bad cost estimate, it was to the part that was being performed hundreds of times instead of the one time the planner estimated. The planner still thought it was only going to perform a sequential scan of your wooper table once. So even if there had been any Ad Hoc Index creation code that had been used to consider creating indexes as part of a plan cost estimate, it wouldn't have bothered creating any indexes on wooper. Right. And even more to the point, if it had gotten the estimate right and known that the subquery would have been repeated, that would have (ideally) prompted it to shift to a different plan structure. 
As Peter pointed out upthread, the existing hash join logic seems to be a pretty decent facsimile of an ad hoc index --- in fact, the hash table *is* a hash index for all intents and purposes. If you tilt your head at the right angle, a merge join could be seen as comparable to constructing a couple of ad-hoc btree indexes. Plus the join code is not burdened by any overhead that actual index code would be likely to have, such as niggling worries about crash-safety or concurrent access to the index. So in my mind the issue here is not why don't we have ad hoc indexes, it's why the planner didn't choose a more appropriate join method. It's clear that faulty rowcount estimation was one part of the answer. Now it may also be that there are some outright structural limitations involved due to the join arising from a sub-select --- I'm not sure that the planner *can* generate a bulk join plan from a query expressed like this one. But the TODO items that this suggests to me do not include implementing anything I'd call an ad-hoc index. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
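Tom's point that a hash join's hash table is effectively a throwaway index can be sketched outside the database. This is an illustrative Python sketch (not PostgreSQL's executor code): probing a dict built once is the moral equivalent of an ad hoc hash index, versus rescanning the "table" for every outer row.

```python
# Sketch: why a hash join behaves like an ad hoc index.
# `inner` plays the role of the table we would otherwise rescan.
inner = [(1, "red"), (2, "green"), (3, "blue")]
outer = [3, 1, 3, 2]

# Naive nested loop: one full scan of `inner` per outer row -- O(n*m).
nested_loop = [(k, v) for k in outer for (ik, v) in inner if ik == k]

# Hash join: build the hash table once (the "ad hoc index"),
# then probe it once per outer row -- O(n + m).
hash_index = {ik: v for (ik, v) in inner}
hash_join = [(k, hash_index[k]) for k in outer if k in hash_index]

assert nested_loop == hash_join
print(hash_join)
```

Both strategies produce the same join result; only the number of times the inner relation is visited differs, which is the crux of the thread's argument.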
Re: [HACKERS] [pgsql-www] pg_autovacuum is nice ... but ...
Bruce Momjian wrote: Should I add a TODO to warn if FSM values are too small? Is that doable? It sounds like it should be, and it would be a valuable pointer to people, so yep. Any idea who'd be interested in claiming it? Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Improvements to PostgreSQL
Suresh Tri wrote: snip All your suggestions are welcome. Please help us to implement these features. Our aim is to make postgresql enterprise level. Hi Suresh, From reading your post, I feel your team is approaching the goal of making PostgreSQL Enterprise Level in a non-optimal way. With the soon-to-be-released version 7.5 of PostgreSQL, the core database engine itself is already very good. This is not the area needing to be worked upon for the next level of Enterprise Functionality. Your team will likely have a lot more effect if they concentrate on what Enterprises really need that PostgreSQL is missing: + An SNMP agent to report on PostgreSQL's status and allow remote control of the PostgreSQL daemon. From an Oracle perspective, this would be the equivalent of Oracle Intelligent Agents, part of the core features of the Oracle Enterprise Manager (OEM). + Tools to allow control of PostgreSQL databases from one central place. Again, the same as OEM. + Starting and stopping the database + Managing Users + Backup and Recovery + Alerts and submitting jobs + etc Oracle does this by having a centralised information repository that a management GUI connects to, and having Oracle Intelligent Agents running on each server the database software is on. These Oracle Intelligent Agents keep the centralised repository aware of the status of the Oracle server software, perform actions on the Oracle servers as directed by the centralised repository (jobs running on there, instructions by the GUI, etc), and more. There's more to what the OEM GUI does, but that's a good start. + Something else that would be useful is a GUI tool to automatically set up PostgreSQL replication. The PostgreSQL Slony-I project would be the first one to look at, and probably equivalent to something like Oracle's Data Guard. They use a different approach, but the end result is having master and standby databases. Hope this is helpful. 
Regards and best wishes, Justin Clift Thanks, Suresh snip ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Improvements to PostgreSQL
Patrick Welche wrote: snip Is there more to remote control than setting GUC variables? Tell me more! Sure: + starting/restarting/stopping the database server software + the normal DBA-type work - creating/altering/dropping databases, users, functions, languages, permissions (etc) + Remote backup and recovery + Submitting jobs to run remotely on the server. i.e. reindexing or vacuuming scripts Remote Monitoring: + Alerts for specified events. i.e. The database server is getting near to capacity in its filesystem(s), or there have been too many invalid PG authorisation attempts, or there are connections getting rejected because the max_connections parameter isn't high enough Groups + Defining arbitrary groups of servers for the above to speed things up when working with many servers Roles + Having multiple administrators with different permissions (role based is generally good), all communicating through the centralised info repository so things don't get out of sync (possibly) + loading additional PG packages. i.e. rolling out oid2name or pgbench (or other PG utils) to servers. Could be viewed as something that should be done with the OS packaging mechanism(s) instead. Any of the PG GUIs (I generally use pgAdmin) could likely be extended to do all of this in a nice, user-friendly way. As an aside, SNMP is important in enterprise settings as it allows PG to be plugged into the monitoring capabilities of enterprise management frameworks. i.e. Concord's eHealth, and probably Tivoli, OpenView, etc Hope that's useful. Regards and best wishes, Justin Clift Cheers, Patrick ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[HACKERS] pg_dump - option for just dumping sequences?
Hi all, Came across a scenario today where I want to backup the latest values of sequences in the database, so they can be restored along with the data from a COPY command. Looked at pg_dump's output options (for 7.4/7.5) and there doesn't seem to be an easy way for just dumping sequence definitions (with the latest sequence numbers). Thought about using egrep too, but had difficulties there too. Does anyone else think it would be a useful thing adding an option to pg_dump for outputting just the sequences? Or perhaps there's an easy way of doing this I just haven't thought of? :-) Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] pg_dump - option for just dumping sequences?
Christopher Kings-Lynne wrote: Does anyone else think it would be a useful thing adding an option to pg_dump for outputting just the sequences? Or perhaps there's an easy way of doing this I just haven't thought of? Can't you just grep the output for pg_catalog.setval and SET SESSION AUTHORIZATION? Arrrgh, yep. Thanks Chris, I must be in space cadet mode today. :( Regards and best wishes, Justin Clift Chris ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [pgsql-www] Problems logging into CVS server
Marc G. Fournier wrote: Damn ... I'll have to look at it ... we had a hacker get in through the way anoncvs was set up, so I set a passwd on it in /etc/passwd (but didn't touch the anoncvs setup itself) ... will play with it tonight and see if I can figure out how to do a more secure anon-cvs ;( I have to be missing something in the config *sigh* Um, that sounds worrying. Was the activity of the hacker anything that would affect PG code, or access to anything sensitive (account passwords, etc)? Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: Release planning (was: Re: [HACKERS] Status report)
Marc G. Fournier wrote: snip As Jan points out, its the 'small features that are done' that we've been griping about having to wait for, not the big ones which we know aren't done ... snip Hmmm... so we do things slightly differently than previously... This upcoming version could be PG version 8.0, We continue with bugfixes on 7.4.x, That still leaves 7.5.x, 7.6.x (etc if needed), for releasing new versions of PG without the big features. Kind of like an in-between thing, whilst waiting for the major features in the major releases? That would mean we'd have: Version 8.0 as our next main release, Version 9.0 being the version after that with the next big features in it. Version 8.x being version 8 plus smaller features, prior to 9.0 Version 8.x.x being version 8.x plus bug fixes. Sounds like it could get hairy if we're not careful, but I reckon the PG Community is mature enough to make the right calls where needed. :) Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Status report
Bruce Momjian wrote: If you get full control of PostgreSQL, you can dictate what will happen. Until then, I will follow the community consensus, which may or may not match your opinion. Um, let's take the time to get the features in, otherwise we'll be waiting another year (roughly) to get PITR and others out to end users. :) Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Loadable Oracle Personality: WAS LinuxTag wrapup thread
Simon Riggs wrote: snip External tool is one thing, but the loadable personality seems like a very good idea and worth discussing further. Would an interesting, and maybe slightly different way of viewing a loadable personality, be as a set of rules that can be applied to parser input before the parser actually gets it... and massages input SQL into something for the parser to understand. I'm hugely generalising here of course, but you know how we have a PostgreSQL Rules system that rewrites queries before handing them to the query planner... well, would it be possible/practical to potentially have a Rules system that rewrites incoming SQL before it gets given to the normal parser. Might get complicated though... we'd need a pre-parser or something. However, having a generalised system for doing this may make it far easier to provide personalities. i.e. load a set of Oracle 8i rules, load a set of Oracle 9i rules, load a set of DB2 x, rules, etc. :) Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] LinuxTag wrapup
Mario Weilguni wrote: Because their SQL queries always seem to need a target object to select from. i.e. SELECT NEXTVAL.foo isn't valid for Oracle 8/9. snip It has been a long time since I've used Oracle, but shouldn't it be select foo.nextval from dual? Yep, that sounds better. It's been a couple of months since I was writing SQL in Oracle. Previous contract. :) Regards and best wishes, Justin Clift Regards, Mario Weilguni ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] LinuxTag wrapup
Andreas Pflug wrote: snip That's true, it's the question how much can be offered without too much effort. I'm not too deep in oracle stuff, what comes to my mind is - outer join syntax (parser thing) - sequences usage (parser too) - maybe stored procedure call, with a wrapper to convert output parameters to a composite return value. There's also their FROM DUAL workaround (in common usage) as well. i.e. SELECT NEXTVAL.foo FROM DUAL; Because their SQL queries always seem to need a target object to select from. i.e. SELECT NEXTVAL.foo isn't valid for Oracle 8/9. Regards and best wishes, Justin Clift There's certainly no point supporting any weird ddl command, so there's still porting work to be done when migrating. Regards, Andreas ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
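The kind of pre-parser rewriting discussed in the loadable-personality thread can be sketched with a couple of regexes. This is a toy illustration only, not a proposal for how a real parser hook would work; the two rules handle exactly the `foo.nextval` and `FROM DUAL` idioms mentioned above:

```python
import re

# Toy "Oracle personality": rewrite incoming SQL before the parser sees it.
# Rule 1: seq.nextval        -> nextval('seq')
# Rule 2: trailing FROM DUAL -> dropped (PostgreSQL needs no dummy table)
RULES = [
    (re.compile(r"\b(\w+)\.nextval\b", re.IGNORECASE), r"nextval('\1')"),
    (re.compile(r"\s+FROM\s+DUAL\b", re.IGNORECASE), ""),
]

def apply_personality(sql):
    for pattern, replacement in RULES:
        sql = pattern.sub(replacement, sql)
    return sql

print(apply_personality("SELECT foo.nextval FROM DUAL;"))
# -> SELECT nextval('foo');
```

A real implementation would need a proper grammar rather than regexes (string literals and comments break this immediately), which is exactly the "we'd need a pre-parser" complication raised in the thread.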
Re: [HACKERS] Adding column comment to information_schema.columns
Christopher Kings-Lynne wrote: Well, if we add them (and they would be very useful I reckon) should we ensure there's an obvious PG naming thing happening? Why are they useful? If you want PG-specific stuff then use the PG-specific catalogs!!! My take on this is that it's a LOT easier for people who don't know the internals of the PG catalogs to query the information schema, as things in the information schema are generally explicitly named. Much easier for non-experts, and most people don't want to invest the time to become experts. Regards and best wishes, Justin Clift Chris ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Adding column comment to information_schema.columns
Christopher Kings-Lynne wrote: snip Anyone who's writing queries that are examining the schema of the database is by definition not a newbie... By newbie here, I mean someone who's a PG newbie but has a reasonable understanding of databases (i.e. Oracle, etc); such a person would generally find the information_schema much easier to locate and use information in, compared to having to learn the PG internals. There's a whole lot of difference between the skill level needed to query the information_schema and find out things like table and column names, vs looking into pg_namespace, pg_class and pg_attribute plus understanding the specific info there to work out table and column names. I reckon that having information pre-prepared in views like those in information_schema is of course going to be easier for people than raw information in our internal catalogs. Do you get where I'm coming from with this? Regards and best wishes, Justin Clift Chris ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Adding column comment to information_schema.columns
Christopher Kings-Lynne wrote: There's a whole lot of difference between the skill level needed to query the information_schema and find out things like table and column names, vs looking into pg_namespace, pg_class and pg_attribute plus understanding the specific info there to work out table and column names. I reckon that having information pre-prepared in views like those in information_schema is of course going to be easier for people than raw information our internal catalogs. Do you get where I'm coming from with this? Yes, but I disagree. Your opinion is as an experienced user anyway, and you're just putting words in novice mouths... That I directly disagree with. I'm putting forth the viewpoint of the people I work with here, who aren't PG experienced. They're Oracle experienced. We've never had someone complain about querying stuff like that. For example, why do you need the comments on columns? The comment on columns addition to the constraint_column_usage view was a suggestion for our particular environment, where it's easier for some of the Perl programmers to have one view that shows them all of the needed info. I'm not super caring either if we add this stuff or not to PG, it was just a suggestion from the trying to be helpful POV. However, saying that people who aren't experienced with PG can easily (i.e. time efficiently) figure out how to query table and column names from PG by going through the pg_catalog stuff in comparison to things like information_schema.* is just not right. One other benefit of having more stuff in information_schema.* is that the stuff there is easier to look at and figure out what it is. With the view definitions that are provided to things like psql and pgAdmin when people look at an information_schema view, it provides them a way of figuring out where in the internal tables stuff is if they want to look for it. i.e. 
they can find a column in information_schema.constraint_column_usage and go gee where is that in the real PostgreSQL tables? Then look at the code that generates it and so on. :) Regards and best wishes, Justin Clift Chris ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
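The usability gap Justin describes — a pre-prepared, explicitly named view versus joining raw catalogs — can be shown in miniature with SQLite's catalog. This is an analogy only, not PostgreSQL code: `columns_view` is a hypothetical stand-in for information_schema.columns, and `pragma_table_info` plays the part of the raw pg_class/pg_attribute machinery.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sites (id integer, name text)")

# The "raw catalog" route: you must know the internal metadata interface
# and which positional field holds the column name.
raw = [row[1] for row in con.execute("PRAGMA table_info(sites)")]

# The "information_schema" route: a friendly view (hypothetical here),
# prepared once, with explicitly named columns anyone can SELECT from.
con.execute("""
    CREATE VIEW columns_view AS
    SELECT 'sites' AS table_name, name AS column_name
    FROM pragma_table_info('sites')
""")
friendly = [r[1] for r in con.execute(
    "SELECT table_name, column_name FROM columns_view")]

assert raw == friendly == ["id", "name"]
```

Both routes yield the same answer; the point of the thread is that the second one asks far less of the reader up front.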
Re: [HACKERS] Adding column comment to information_schema.columns
Tom Lane wrote: Justin Clift [EMAIL PROTECTED] writes: Not sure how worthwhile others will find this small patch (to CVS HEAD), but we found it useful. It adds the column comments to the information_schema.columns view. This question has been touched on before, but I guess it's time to face it fair and square: is it reasonable for an SQL implementation to add implementation-specific columns to an information_schema view? One could certainly argue that the entire point of information_schema is to be *standard*, not more, not less. OTOH I do not know if adding an extra column is likely to break anyone's application. Comments? Well, I suppose it reduces application portability if anyone starts relying on it. ? Regards and best wishes, Justin Clift regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Bug with view definitions?
Hi guys, Not sure if this is a known issue or not, but I think I may have found a bug with the way view definitions are shown... at least in psql. Using 7.5 development CVS (as of a few hours ago) or even 7.4.3, if I connect using its version of psql to a database (of the same version), then use psql to view the information_schema.constraint_column_usage view, it gives me this definition: ***
mydb=# \d information_schema.constraint_column_usage
    View "information_schema.constraint_column_usage"
       Column       |               Type                | Modifiers
--------------------+-----------------------------------+-----------
 table_catalog      | information_schema.sql_identifier |
 table_schema       | information_schema.sql_identifier |
 table_name         | information_schema.sql_identifier |
 column_name        | information_schema.sql_identifier |
 constraint_catalog | information_schema.sql_identifier |
 constraint_schema  | information_schema.sql_identifier |
 constraint_name    | information_schema.sql_identifier |
View definition: SELECT current_database()::information_schema.sql_identifier AS table_catalog, x.tblschema::information_schema.sql_identifier AS table_schema, x.tblname::information_schema.sql_identifier AS table_name, x.colname::information_schema.sql_identifier AS column_name, current_database()::information_schema.sql_identifier AS constraint_catalog, x.cstrschema::information_schema.sql_identifier AS constraint_schema, x.cstrname::information_schema.sql_identifier AS constraint_name FROM ( SELECT DISTINCT nr.nspname, r.relname, r.relowner, a.attname, nc.nspname, c.conname FROM pg_namespace nr, pg_class r, pg_attribute a, pg_depend d, pg_namespace nc, pg_constraint c WHERE nr.oid = r.relnamespace AND r.oid = a.attrelid AND d.refclassid = 'pg_class'::regclass::oid AND d.refobjid = r.oid AND d.refobjsubid = a.attnum AND d.classid = 'pg_constraint'::regclass::oid AND d.objid = c.oid AND c.connamespace = nc.oid AND c.contype = 'c'::char AND r.relkind = 'r'::char AND NOT a.attisdropped ORDER BY nr.nspname, r.relname, r.relowner, a.attname, nc.nspname, c.conname UNION ALL SELECT 
nr.nspname, r.relname, r.relowner, a.attname, nc.nspname, c.conname FROM pg_namespace nr, pg_class r, pg_attribute a, pg_namespace nc, pg_constraint c, information_schema._pg_keypositions() pos(n) WHERE nr.oid = r.relnamespace AND r.oid = a.attrelid AND nc.oid = c.connamespace AND CASE WHEN c.contype = 'f'::char THEN r.oid = c.confrelid AND c.confkey[pos.n] = a.attnum ELSE r.oid = c.conrelid AND c.conkey[pos.n] = a.attnum END AND NOT a.attisdropped AND (c.contype = 'p'::char OR c.contype = 'u'::char OR c.contype = 'f'::char) AND r.relkind = 'r'::char) x(tblschema, tblname, tblowner, colname, cstrschema, cstrname), pg_user u WHERE x.tblowner = u.usesysid AND u.usename = current_user(); mydb=# *** However, when I use this definition (cut-n-paste style to avoid mistakes) to create the view anew (even with a different name, etc), then it gives me an error: *** mydb=# \e ERROR: parse error at or near ALL at character 1105 ERROR: parse error at or near ALL at character 1105 LINE 6: UNION ALL ^ mydb=# *** I haven't come across this before, and am having the same problem with pgAdmin3 as well, as it supplies the exact same definition of the view. I think I'm doing everything right here, could this be a bug with PG? Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Bug with view definitions?
Dennis Bjorklund wrote: snip I've still not checked any code. I don't even know what part of pg it is that produces that bad SQL. The view itself works, so it must be the pretty printer that is broken (wherever that is hidden away in the code). Thanks Dennis. So, it's definitely a bug then. I wasn't sure if it was PG or me. :) Um, as I'm not up to the task of fixing it, is this something you or someone else would be interested in? Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Adding column comment to information_schema.columns
Christopher Kings-Lynne wrote: There is a huge difference between adhering to a standard and limiting yourself to a standard. The real question is whether PostgreSQL's goal is to support SQL standards, or whether PostgreSQL's goal is to give PostgreSQL users a useful set of tools. There are literally _hundreds_ of fields we could add to the information_schema. Either we add them all or we add none of them. Well, if we add them (and they would be very useful I reckon) should we ensure there's an obvious PG naming thing happening? i.e. pg_column_comment or similar? Maybe not pg_ but you know what I mean. :-) Regards and best wishes, Justin Clift Chris ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Bug with view definitions?
Tom Lane wrote: snip Actually, if you look at the source code (information_schema.sql) there is no ORDER BY in it, only a DISTINCT. The ORDER BY gets added by the parser to help implement the DISTINCT. Sooner or later we should look at suppressing the added ORDER BY when displaying the view. If someone fixes this can we make sure it goes into 7.4.4 as well (if it's not a drastic code change)? It's not a data corrupting bug but it's stopping view definitions from working as advertised which is bad if you're used to being able to rely on them. :-/ For now, I'll personally use the pg_dump version of the query, or maybe see if the one in backend/catalog/information_schema.sql can be run directly. :) Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] Adding column comment to information_schema.columns
Greg Sabino Mullane wrote: snip If there is that much clamor for this, why not make a new schema, such as pginformation_schema? People could then tweak the views to their heart's content, while keeping 100% compliance. Doesn't sound very neat. If we add a pginformation_schema, then it'd probably contain all of the existing information_schema... plus more. Duplication? I guess we could just leverage off the existing information_schema views: i.e. CREATE VIEW pg_information_schema.some_view AS SELECT * FROM information_schema.some_view (then add extra bits). But it still doesn't sound very neat. ? Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Adding column comment to information_schema.columns
Hi all, Not sure how worthwhile others will find this small patch (to CVS HEAD), but we found it useful. It adds the column comments to the information_schema.columns view. Hope it's useful. :-) Regards and best wishes, Justin Clift

*** information_schema.sql.orig	2004-07-01 11:59:26.0 +1000
--- information_schema.sql	2004-07-01 12:33:01.0 +1000
***************
*** 442,448 ****
           CAST(null AS cardinal_number) AS maximum_cardinality,
           CAST(a.attnum AS sql_identifier) AS dtd_identifier,
!          CAST('NO' AS character_data) AS is_self_referencing

      FROM (pg_attribute a LEFT JOIN pg_attrdef ad ON attrelid = adrelid AND attnum = adnum),
           pg_class c, pg_namespace nc, pg_user u,
--- 442,450 ----
           CAST(null AS cardinal_number) AS maximum_cardinality,
           CAST(a.attnum AS sql_identifier) AS dtd_identifier,
!          CAST('NO' AS character_data) AS is_self_referencing,
!
!          col_description(a.attrelid, a.attnum) AS column_comment

      FROM (pg_attribute a LEFT JOIN pg_attrdef ad ON attrelid = adrelid AND attnum = adnum),
           pg_class c, pg_namespace nc, pg_user u,

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[HACKERS] [Fwd: Re: Interesting thought....]
Hi guys, Wasn't sure if this is a valid idea, but Bruce's response made me think it was worth forwarding here to see if anyone had/has considered it before. As everyone is aware, I'm not up for coding it, but am mentioning it Just In Case it's deemed worthy of adding to the TODO list in case someone wants to pick up. :-) Regards and best wishes, Justin Clift Original Message Subject: Re: Interesting thought Date: Mon, 21 Jul 2003 14:13:49 -0400 (EDT) From: Bruce Momjian [EMAIL PROTECTED] To: Justin Clift [EMAIL PROTECTED] This would fix all our inheritance problems, and allow sequences to span multiple tables with an index constraint --- Justin Clift wrote: Hi Bruce, Here's an interesting thought Multi-table indexes... like multi-column, but instead of specifying columns in one table, they specify columns in several tables. That leads into thoughts about multi-table primary and secondary keys and multi-table foreign keys too. Wonder if it'd be useful in the real world? I reckon so. :-) Regards and best wishes, Justin Clift -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Two weeks to feature freeze
The Hermit Hacker wrote: On Thu, 19 Jun 2003, Andrew Dunstan wrote: Maybe a better strategy would be to get a release out soon but not wait 6 months for another release which would contain the Win32 port and the PITR stuff (assuming those aren't done in time for this release). Just a thought. And definitely in agreement here ... I'd rather see a shortened dev cycle prompted by a big feature being added, than delaying a release because "oh oh, I need another few weeks" that draws out when something unexpected happens :( Yep, this makes sense. Looks like it'll be PostgreSQL 7.4 with all the present improvements, but without PITR and Win32. Then, in a few months (hopefully less than 3) we'll have PostgreSQL 8.0, with both of those major features in it (and whatever other enhancements have been added). The only thing that makes me wince is that we have a protocol change at the PostgreSQL 7.4 release instead of 8.0. It kind of doesn't sound right, having a protocol change in the 7 series, when we have an 8 series coming up soon after. Oh well, so it's not perfect... ;-) Regards and best wishes, Justin Clift snip ---(end of broadcast)--- TIP 8: explain analyze is your friend
[HACKERS] [Fwd: [GENERAL] [CYGWIN] Problems compiling PostgreSQL 7.3.3-1 underWin98]
Hi everyone, Can anyone assist Diogo here? He's not some random user, he's our official Portuguese translator and helps us out a lot. Sounds like he really needs a hand. *please* Regards and best wishes, Justin Clift Original Message Subject: [GENERAL] [CYGWIN] Problems compiling PostgreSQL 7.3.3-1 under Win98 Date: Wed, 18 Jun 2003 16:51:55 -0300 From: Diogo de Oliveira Biazus [EMAIL PROTECTED] To: [EMAIL PROTECTED] [EMAIL PROTECTED] Hi everybody, I'm having some problems compiling PostgreSQL 7.3.3-1 under Win98: when gcc is compiling pg_sema.c in the src/backend/port directory, gcc exits with an error. There are lots of warnings about undeclared stuff (like IpcSemaphoreGetValue) and a message at the end saying: Storage size of 'sops' isn't known I'm compiling because I need the libpostgres.a file to build some extensions, is there any other way of doing that? I've already posted this message in pgsql-cygwin, sorry for the cross-post. Please, somebody help me. Thanks in advance, -- Diogo de Oliveira Biazus [EMAIL PROTECTED] Ikono Sistemas e Automação http://www.ikono.com.br ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] Translation
Hi, Bingo, count yourself in. :) Is your preference for translation of website related stuff (i.e. http://advocacy.postgresql.org), or existing manuals and documentation, or error messages in the code itself, or ...? (up to you) Translating some of the web materials into Slovak would open up a completely new community to PostgreSQL, if you wouldn't mind doing that. :-) Regards and best wishes, Justin Clift P.M wrote: Hi, Ok so if i can't help you in the code, maybe i can help you translating installation comments into several languages ? I know, french, slovak, spanish.. ;-) __ Do you Yahoo!? SBC Yahoo! DSL - Now only $29.95 per month! http://sbc.yahoo.com ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Compiling Win32
Hi Jan, Do you have an ETA for this? :-) Regards and best wishes, Justin Clift Bruce Momjian wrote: I have not tested the Win32 compile in a few weeks, so it is possible it is broken at this point. It will not run because we need exec() handling that Jan is working on, and signal stuff. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Compiling Win32
Bruce Momjian wrote: Justin Clift wrote: Hi Jan, Do you have an ETA for this? I was going to report the Win32 status this week. As you know, I had to leave Win32 to catch up on email. The two big items left are exec() handling and signals. If I can get those in by July 1, I can continue cleanup during beta and perhaps have a patch that people can test for Win32 later in the summer. Ok, how does that leave our status for the next release of PostgreSQL? :-) Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Compiling Win32
Bruce Momjian wrote: Justin Clift wrote: Bruce Momjian wrote: Justin Clift wrote: Hi Jan, Do you have an ETA for this? I was going to report the Win32 status this week. As you know, I had to leave Win32 to catch up on email. The two big items left are exec() handling and signals. If I can get those in by July 1, I can continue cleanup during beta and perhaps have a patch that people can test for Win32 later in the summer. Ok, how does that leave our status for the next release of PostgreSQL? Wish I knew --- I am realizing that I have trouble doing major development and keeping up with patches/email. Heh heh heh. That's what it was like when I started doing the Techdocs site. Had to shelve my plans for doing coding, purely because 16 hours a day still didn't leave any time for it. :( Worth looking into delegating some of your workload? Regards and best wishes, Justin Clift ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] PostgreSQL under Windows
Hi guys,

Although the Proof of Concept build works, it does have a few drawbacks:

+ It only works on English (US, at least) installations, because it's
  hard-coded to install into C:\Program Files\PostgreSQL. On non-English
  installations, C:\Program Files is named differently, which causes
  things to fail.
+ It's missing the pager program less and the gzip libraries, so using
  pg_dump and psql have problems.
+ It's based on an old version of PostgreSQL (7.3.1, from memory).

All that being said, it should be ok to use, but don't run it in a
production environment. Development and testing environments should be
fine. :-)

Regards and best wishes,

Justin Clift

Robert Treat wrote:
> On Wed, 2003-06-11 at 08:03, Bruno Wolff III wrote:
>> On Tue, Jun 10, 2003 at 22:34:04 -0700, P.M [EMAIL PROTECTED] wrote:
>>> I was thinking that PostgreSQL could help me reduce the cost of such
>>> software, but I would like to know: what is the status of the
>>> PostgreSQL version under Windows? I know that some of you are working
>>> on an installer version of PostgreSQL for Windows, and I would like
>>> to know whether a beta version already exists.
>>
>> There will be a beta native windows port available in about 3 weeks.
>> It is currently possible to run postgresql on windows using cygwin.
>
> If you don't want to wait and aren't big on cygwin, you can get a proof
> of concept build at
> http://sourceforge.net/project/showfiles.php?group_id=9764&release_id=136623
> It's not supported by anyone and I can't even say if it will work for
> you, but it has worked for some in the past and might be a good way to
> get your feet wet. Once you get up and running, be sure to come back
> and help us beta test! :-)
>
> Robert Treat
Re: [HACKERS] Proposal to Re-Order Postgresql.Conf, part II
Tom Lane wrote:
> Josh Berkus [EMAIL PROTECTED] writes:
>> wal_debug is seldom used outside of Postgresql source development or
>> unusual system failures, and should therefore go last.
>
> BTW, it occurs to me that wal_debug is one of the hacker-only variables
> that probably ought not be documented at all. I cannot imagine any use
> for it for the average DBA.
>
> regards, tom lane

Um, not documenting it is probably not a good move for us; however,
putting it at the end in a section marked "Developer Focused" or
something similar would probably send the right mix of messages, i.e.
hands off + not a performance tweak, etc. :-)

Regards and best wishes,

Justin Clift

--
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
- Indira Gandhi
Re: [HACKERS] Wrong version of jdbc in 7.3.3 rpms
Barry Lind wrote:
> Does anyone know why apparently the 7.3beta1 version of the jdbc
> drivers is what is included in the 7.3.3 rpms?

No idea. Just updated the PostgreSQL Release Process document, though,
in case anyone (Marc) ever decides they're going to use it:

http://advocacy.postgresql.org/documents/ReleaseProcess

Regards and best wishes,

Justin Clift

> --Barry
>
> -------- Original Message --------
> Subject: Re: [JDBC] Official JDBC driver release ?
> Date: Thu, 05 Jun 2003 08:14:40 +0200
> From: Thomas Kellerer [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
>
> Barry Lind schrieb:
>>> I'm a bit puzzled about the versions of the JDBC driver floating
>>> around. I initially downloaded the release for 7.3 from
>>> jdbc.postgresql.org. Now I have seen that the JDBC driver which is
>>> included e.g. in the RPMs for 7.3.3 is a bit older and has a
>>> different name (pg73b1jdbc3.jar vs. pg73jdbc3.jar).
>>
>> The pg73b1jdbc3.jar file is very old (it is the 7.3 beta 1 version).
>> What RPMs are you using? You should contact whoever produced those
>> RPMs to get them built with the current version. The 'official'
>> version is the source code that is tagged with the 7.3.3 freeze label
>> (which is the version that is currently posted on the
>> jdbc.postgresql.org web site).
>>
>> --Barry
>
> Barry, thanks for the answer. The pg73b1jdbc3.jar file is contained in
> all the 7.3.3 rpms available on the ftp mirrors... (ok, not necessarily
> all, but I checked about 3 or 4 different mirrors). I don't know who
> builds the rpms on the mirror sites available from www.postgresql.org.
>
> Cheers,
> Thomas
Re: [HACKERS] 7.3.2 make failed on AIX4.3 using native c compiler
Hi John,

Marco Pratesi [EMAIL PROTECTED] wrote a step-by-step guide for compiling
PostgreSQL on AIX a while ago:

http://techdocs.postgresql.org/guides/CompilingForAIX

Hope that helps. :-)

Regards and best wishes,

Justin Clift

John Liu wrote:
> I configured and built 7.3.2 on the box that works, then tried to
> install it on the other one, which had failed. After the install,
> trying to start postmaster gives (tail -f postmaster.log):
>
>   exec(): 0509-036 Cannot load program /emrxdbs/pgsql/bin/postmaster
>   because of the following errors:
>   Dependent module /usr/local/lib/libz.a(shr.o) could not be loaded.
>   Member shr.o is not found in archive
>
> The libz.a files on the two boxes differ:
>
> working one:
>   ls -all /usr/local/lib/libz.a
>   -rw-r--r-- 1 root system 77308 Mar 20 2000 /usr/local/lib/libz.a
>
> failed one:
>   ls -all /usr/local/lib/libz.a
>   -rwxr-xr-x 1 root system 83699 Feb 19 2001 /usr/local/lib/libz.a
>
> johnl

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of John Liu
> Sent: Monday, March 31, 2003 11:24 AM
> To: Bruce Momjian
> Cc: [EMAIL PROTECTED]
> Subject: Re: [HACKERS] 7.3.2 make failed on AIX4.3 using native c compiler
>
> Hi, Bruce,
>
> I've tried on two AIX 4.3.3 boxes; both are at the same
> oslevel=4330-09 and have the same compiler version:
>
>   lslpp -l | grep -i xlc
>   xlC.aix43.rte      4.0.2.1  COMMITTED  C Set ++ Runtime for AIX 4.3
>   xlC.cpp            4.3.0.1  COMMITTED  C for AIX Preprocessor
>   xlC.msg.en_US.cpp  4.3.0.1  COMMITTED  C for AIX Preprocessor
>   xlC.msg.en_US.rte  4.0.2.0  COMMITTED  C Set ++ Runtime
>   xlC.rte            4.0.2.0  COMMITTED  C Set ++ Runtime
>
> On one box make works; on the other it fails. I'm trying to figure out
> what makes the difference.
>
> johnl

> -----Original Message-----
> From: Bruce Momjian [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 31, 2003 11:15 AM
> To: John Liu
> Cc: [EMAIL PROTECTED]
> Subject: Re: [HACKERS] 7.3.2 make failed on AIX4.3 using native c compiler
>
> I know we have other AIX users using PostgreSQL. What compiler version
> is that?
John Liu wrote:
> make[4]: Leaving directory `/emrxdbs/postgresql-7.3.2/src/backend/parser'
> cc -O2 -qmaxmem=16384 -qsrcmsg -qlonglong -I../../../src/interfaces/libpq
>    -I../../../src/include -I/usr/local/include
>    -DBINDIR=\"/emrxdbs/pgsql/bin\" -c -o pg_dump.o pg_dump.c
>
> (the same warning repeats at pg_dump.c lines 2681, 2777, 2795, 3121,
> 3226, 3515 and 3882:)
>
>   1506-280 (W) Function argument assignment between types "const
>   unsigned char*(*)[]" and "unsigned char*(*)[]" is not allowed.
>
> cc -O2 -qmaxmem=16384 -qsrcmsg -qlonglong (same includes as above)
>    -c -o common.o common.c
> cc -O2 -qmaxmem=16384 -qsrcmsg -qlonglong (same includes as above)
>    -c -o pg_backup_archiver.o pg_backup_archiver.c
>
>   590 | ArchiveEntry(Archive *AHX, char *oid, char *tag, ...
>   1506-343 (S) Redeclaration of ArchiveEntry differs from previous
>   declaration on line 135 of pg_backup.h.
>   1506-377 (I) The type "unsigned char*(*)[]" of parameter 7 differs
>   from the previous type "const unsigned char*(*)[]".
>
> make[3]: *** [pg_backup_archiver.o] Error 1
> make[3]: Leaving directory `/emrxdbs/postgresql-7.3.2/src/bin/pg_dump'
> make[2
Re: [HACKERS] Numbering of the next release: 8.0 vs 7.4
mlw wrote:
> <snip>
> So, if the decision is to go with an 8.0, what would you guys say to
> having a roll call about stuff that is possible and practical, and
> really designing PostgreSQL 8.0 as something fundamentally newer than
> 7.x? 8.0 could get the project some hype. It has been 7.x for so many
> years.

Sounds great. I just don't want it to take _ages_ to accomplish. :)

Regards and best wishes,

Justin Clift