Re: [GENERAL] Help on Index only scan

2017-08-13 Thread Tom Lane
Ertan Küçükoğlu  writes:
>>> I want to have an index only scan for my below query:
>>> select autoinc, fileversion from updates where filename = 'Robox.exe' order
>>> by autoinc desc;

>> On 14 Aug 2017, at 01:15, Melvin Davidson  wrote:
>> As far as "Index only scan" , since the table only has 2003 rows, the 
>> optimizer has determined it is faster just to
>> load all the rows into memory and then filter.

> Sorry, my question was misleading. I do not want to use "set enable_seqscan =
> off"; I want to be sure that, when necessary (as the record count increases),
> the relevant index(es) will be used.

There's a considerable distance between "is the planner making appropriate
use of indexes" and "I insist on an index-only scan".  The reason you're
not getting an index-only scan here is that that requires an index that
includes every column referenced in the query, which you don't have.  At
minimum you'd need an index including all of autoinc, fileversion, and
filename to do this query with an IOS.  If you want it to be particularly
efficient for this query then you'd need the index's column order to be
(filename, autoinc, fileversion) --- putting filename first means the entries
satisfying the WHERE clause will be clumped together in the index, and putting
autoinc second
means that a backwards scan on that portion of the index is enough to
produce the requested sort ordering without an explicit sort step.
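Concretely, such an index would look about like this (the index name is just
illustrative; a plain ascending autoinc works because the planner can scan the
index backwards, and the table has to have been vacuumed recently enough that
its visibility map is mostly set before an index-only scan actually pays off):

CREATE INDEX updates_filename_autoinc_fileversion
    ON updates (filename, autoinc, fileversion);

EXPLAIN ANALYZE
SELECT autoinc, fileversion
  FROM updates
 WHERE filename = 'Robox.exe'
 ORDER BY autoinc DESC;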

Whether it's worth maintaining an index this specialized depends on how
much update traffic you have versus how often you want to do this
particular query.  Often it's not worth the extra disk space and update
overhead to have such an index.

In any case, I wouldn't worry about it until you have an actual
performance problem.  Trying to tell on toy data what the planner
will do with production-sized data is usually a losing game.

regards, tom lane




Re: [GENERAL] Help on Index only scan

2017-08-13 Thread Melvin Davidson
On Sun, Aug 13, 2017 at 7:37 PM, Ertan Küçükoğlu <
ertan.kucuko...@1nar.com.tr> wrote:

>
> On 14 Aug 2017, at 01:15, Melvin Davidson  wrote:
>
>
> On Sun, Aug 13, 2017 at 5:59 PM, Ertan Küçükoğlu <
> ertan.kucuko...@1nar.com.tr> wrote:
>
>> Hello,
>>
>> My table details:
>> robox=# \dS+ updates
>>                                       Table "public.updates"
>>     Column     |  Type   |                         Modifiers                          | Storage  | Stats target | Description
>> ---------------+---------+------------------------------------------------------------+----------+--------------+-------------
>>  autoinc       | integer | not null default nextval('updates_autoinc_seq'::regclass) | plain    |              |
>>  filename      | text    |                                                            | extended |              |
>>  dateofrelease | date    |                                                            | plain    |              |
>>  fileversion   | text    |                                                            | extended |              |
>>  afile         | text    |                                                            | extended |              |
>>  filehash      | text    |                                                            | extended |              |
>>  active        | boolean |                                                            | plain    |              |
>> Indexes:
>> "updates_pkey" PRIMARY KEY, btree (autoinc)
>> "update_filename" btree (filename)
>> "updates_autoinc" btree (autoinc DESC)
>> "updates_dateofrelease" btree (dateofrelease)
>> "updates_filename_dateofrelease" btree (filename, dateofrelease)
>>
>>
>> robox=# select count(autoinc) from updates;
>>  count
>> ---
>>   2003
>> (1 row)
>>
>> robox=# select autoinc, filename, fileversion from updates limit 10;
>>  autoinc | filename | fileversion
>> -+--+-
>>   18 | Robox.exe| 1.0.1.218
>>   19 | Robox.exe| 1.0.1.220
>>   20 | Robox.exe| 1.0.1.220
>>   21 | 8423bfc5a669864f9b66b6b15ce908b9 | 1.1.1.1
>>   22 | 4fdabb0c7adbc5a89fbe679ce76ccef9 | 1.1.1.1
>>   23 | f469d77bfa86c8917c7846c0f871137c | 1.1.1.1
>>   24 | bc10af4c8789718a9ca6565ea14cb17d | 1.1.1.1
>>   25 | d9f87ee46cdb41cd15c2f71ed599faf9 | 1.1.1.1
>>   26 | 6f7428a5364aae1d5914a66cba3e6f3b | 1.1.1.1
>>   27 | 66ec4cdb8d64ca1414f75c1fb9eaa518 | 1.1.1.1
>> (10 rows)
>>
>> I want to have an index only scan for my below query:
>> select autoinc, fileversion from updates where filename = 'Robox.exe'
>> order
>> by autoinc desc;
>>
>> I simply could not understand the planner and cannot provide the right
>> index for it.
>> The indexes "update_filename" and "updates_autoinc" below were added just
>> for the query that I would like to see use an index-only scan plan. I also
>> failed with the following indexes:
>> "autoinc desc, filename, fileversion"
>> "autoinc desc, filename"
>>
>> The first 3 rows in the above select results are actual data. You will find
>> that I have inserted about 2000 rows of dummy data to get a somewhat
>> meaningful plan for the query.
>>
>> Current planner result:
>> robox=# vacuum full;
>> VACUUM
>> robox=# explain analyze
>> robox-# select autoinc, fileversion
>> robox-# from updates
>> robox-# where filename = 'Robox.exe'
>> robox-# order by autoinc desc;
>>                                   QUERY PLAN
>> ---------------------------------------------------------------------------
>>  Sort  (cost=12.79..12.79 rows=3 width=12) (actual time=0.047..0.047
>> rows=3
>> loops=1)
>>Sort Key: autoinc DESC
>>Sort Method: quicksort  Memory: 25kB
>>->  Bitmap Heap Scan on updates  (cost=4.30..12.76 rows=3 width=12)
>> (actual time=0.040..0.040 rows=3 loops=1)
>>  Recheck Cond: (filename = 'Robox.exe'::text)
>>  Heap Blocks: exact=1
>>  ->  Bitmap Index Scan on update_filename  (cost=0.00..4.30 rows=3
>> width=0) (actual time=0.035..0.035 rows=3 loops=1)
>>Index Cond: (filename = 'Robox.exe'::text)
>>  Planning time: 1.873 ms
>>  Execution time: 0.076 ms
>> (10 rows)
>>
>>
>> I appreciate any help on choosing the right index(es), as I simply failed myself.
>>
>> Regards,
>> Ertan Küçükoğlu
>>
> First, you do not need index "updates_autoinc"; since autoinc is the
> Primary Key, you are just duplicating the index.
>
>
> Is that true even if that index is a descending one?
>
>
> As far as "Index only scan", since the table only has 2003 rows, the
> optimizer has determined it is faster just to load all the rows into memory
> and then filter. If you really want to force an index scan, then you would
> have to do SET enable_seqscan = off; before running the query. However, you
> are just shooting yourself in the foot by doing that, as it will make the
> query slower.
>
>
> I will try to load up more dummy rows to overflow the work_mem and observe
> results.
>
> Sorry, my question was misleading. I do not want to use "set
> enable_seqscan = off" I want to be sure that when necessary (record count
> 

Re: [GENERAL] Help on Index only scan

2017-08-13 Thread Ertan Küçükoğlu

> On 14 Aug 2017, at 01:15, Melvin Davidson  wrote:
> 
> 
>> On Sun, Aug 13, 2017 at 5:59 PM, Ertan Küçükoğlu 
>>  wrote:
>> Hello,
>> 
>> My table details:
>> robox=# \dS+ updates
>>                                       Table "public.updates"
>>     Column     |  Type   |                         Modifiers                          | Storage  | Stats target | Description
>> ---------------+---------+------------------------------------------------------------+----------+--------------+-------------
>>  autoinc       | integer | not null default nextval('updates_autoinc_seq'::regclass) | plain    |              |
>>  filename      | text    |                                                            | extended |              |
>>  dateofrelease | date    |                                                            | plain    |              |
>>  fileversion   | text    |                                                            | extended |              |
>>  afile         | text    |                                                            | extended |              |
>>  filehash      | text    |                                                            | extended |              |
>>  active        | boolean |                                                            | plain    |              |
>> Indexes:
>> "updates_pkey" PRIMARY KEY, btree (autoinc)
>> "update_filename" btree (filename)
>> "updates_autoinc" btree (autoinc DESC)
>> "updates_dateofrelease" btree (dateofrelease)
>> "updates_filename_dateofrelease" btree (filename, dateofrelease)
>> 
>> 
>> robox=# select count(autoinc) from updates;
>>  count
>> ---
>>   2003
>> (1 row)
>> 
>> robox=# select autoinc, filename, fileversion from updates limit 10;
>>  autoinc | filename | fileversion
>> -+--+-
>>   18 | Robox.exe| 1.0.1.218
>>   19 | Robox.exe| 1.0.1.220
>>   20 | Robox.exe| 1.0.1.220
>>   21 | 8423bfc5a669864f9b66b6b15ce908b9 | 1.1.1.1
>>   22 | 4fdabb0c7adbc5a89fbe679ce76ccef9 | 1.1.1.1
>>   23 | f469d77bfa86c8917c7846c0f871137c | 1.1.1.1
>>   24 | bc10af4c8789718a9ca6565ea14cb17d | 1.1.1.1
>>   25 | d9f87ee46cdb41cd15c2f71ed599faf9 | 1.1.1.1
>>   26 | 6f7428a5364aae1d5914a66cba3e6f3b | 1.1.1.1
>>   27 | 66ec4cdb8d64ca1414f75c1fb9eaa518 | 1.1.1.1
>> (10 rows)
>> 
>> I want to have an index only scan for my below query:
>> select autoinc, fileversion from updates where filename = 'Robox.exe' order
>> by autoinc desc;
>> 
>> I simply could not understand the planner and cannot provide the right index for it.
>> The indexes "update_filename" and "updates_autoinc" below were added just for
>> the query that I would like to see use an index-only scan plan. I also failed
>> with the following indexes:
>> "autoinc desc, filename, fileversion"
>> "autoinc desc, filename"
>> 
>> The first 3 rows in the above select results are actual data. You will find that I
>> have inserted about 2000 rows of dummy data to get a somewhat meaningful plan
>> for the query.
>> 
>> Current planner result:
>> robox=# vacuum full;
>> VACUUM
>> robox=# explain analyze
>> robox-# select autoinc, fileversion
>> robox-# from updates
>> robox-# where filename = 'Robox.exe'
>> robox-# order by autoinc desc;
>>                                   QUERY PLAN
>> ---------------------------------------------------------------------------
>>  Sort  (cost=12.79..12.79 rows=3 width=12) (actual time=0.047..0.047 rows=3
>> loops=1)
>>Sort Key: autoinc DESC
>>Sort Method: quicksort  Memory: 25kB
>>->  Bitmap Heap Scan on updates  (cost=4.30..12.76 rows=3 width=12)
>> (actual time=0.040..0.040 rows=3 loops=1)
>>  Recheck Cond: (filename = 'Robox.exe'::text)
>>  Heap Blocks: exact=1
>>  ->  Bitmap Index Scan on update_filename  (cost=0.00..4.30 rows=3
>> width=0) (actual time=0.035..0.035 rows=3 loops=1)
>>Index Cond: (filename = 'Robox.exe'::text)
>>  Planning time: 1.873 ms
>>  Execution time: 0.076 ms
>> (10 rows)
>> 
>> 
>> I appreciate any help on choosing the right index(es), as I simply failed myself.
>> 
>> Regards,
>> Ertan Küçükoğlu
>> 
> 
> First, you do not need index "updates_autoinc"; since autoinc is the Primary
> Key, you are just duplicating the index.

Is that true even if that index is a descending one?

> 
> As far as "Index only scan", since the table only has 2003 rows, the
> optimizer has determined it is faster just to load all the rows into memory
> and then filter. If you really want to force an index scan, then you would
> have to do SET enable_seqscan = off; before running the query. However, you
> are just shooting yourself in the foot by doing that, as it will make the
> query slower.

I will try to load up more dummy rows to overflow the work_mem and observe 
results.

Sorry, my question was misleading. I do not want to use "set enable_seqscan =
off"; I want to be sure that, when necessary (as the record count increases),
the relevant index(es) will be used.

Obviously I still can't read the query plan, as I did not understand that operation 

Re: [GENERAL] Help on Index only scan

2017-08-13 Thread Melvin Davidson
On Sun, Aug 13, 2017 at 5:59 PM, Ertan Küçükoğlu <
ertan.kucuko...@1nar.com.tr> wrote:

> Hello,
>
> My table details:
> robox=# \dS+ updates
>                                       Table "public.updates"
>     Column     |  Type   |                         Modifiers                          | Storage  | Stats target | Description
> ---------------+---------+------------------------------------------------------------+----------+--------------+-------------
>  autoinc       | integer | not null default nextval('updates_autoinc_seq'::regclass) | plain    |              |
>  filename      | text    |                                                            | extended |              |
>  dateofrelease | date    |                                                            | plain    |              |
>  fileversion   | text    |                                                            | extended |              |
>  afile         | text    |                                                            | extended |              |
>  filehash      | text    |                                                            | extended |              |
>  active        | boolean |                                                            | plain    |              |
> Indexes:
> "updates_pkey" PRIMARY KEY, btree (autoinc)
> "update_filename" btree (filename)
> "updates_autoinc" btree (autoinc DESC)
> "updates_dateofrelease" btree (dateofrelease)
> "updates_filename_dateofrelease" btree (filename, dateofrelease)
>
>
> robox=# select count(autoinc) from updates;
>  count
> ---
>   2003
> (1 row)
>
> robox=# select autoinc, filename, fileversion from updates limit 10;
>  autoinc | filename | fileversion
> -+--+-
>   18 | Robox.exe| 1.0.1.218
>   19 | Robox.exe| 1.0.1.220
>   20 | Robox.exe| 1.0.1.220
>   21 | 8423bfc5a669864f9b66b6b15ce908b9 | 1.1.1.1
>   22 | 4fdabb0c7adbc5a89fbe679ce76ccef9 | 1.1.1.1
>   23 | f469d77bfa86c8917c7846c0f871137c | 1.1.1.1
>   24 | bc10af4c8789718a9ca6565ea14cb17d | 1.1.1.1
>   25 | d9f87ee46cdb41cd15c2f71ed599faf9 | 1.1.1.1
>   26 | 6f7428a5364aae1d5914a66cba3e6f3b | 1.1.1.1
>   27 | 66ec4cdb8d64ca1414f75c1fb9eaa518 | 1.1.1.1
> (10 rows)
>
> I want to have an index only scan for my below query:
> select autoinc, fileversion from updates where filename = 'Robox.exe' order
> by autoinc desc;
>
> I simply could not understand the planner and cannot provide the right
> index for it.
> The indexes "update_filename" and "updates_autoinc" below were added just
> for the query that I would like to see use an index-only scan plan. I also
> failed with the following indexes:
> "autoinc desc, filename, fileversion"
> "autoinc desc, filename"
>
> The first 3 rows in the above select results are actual data. You will find
> that I have inserted about 2000 rows of dummy data to get a somewhat
> meaningful plan for the query.
>
> Current planner result:
> robox=# vacuum full;
> VACUUM
> robox=# explain analyze
> robox-# select autoinc, fileversion
> robox-# from updates
> robox-# where filename = 'Robox.exe'
> robox-# order by autoinc desc;
>                                   QUERY PLAN
> ---------------------------------------------------------------------------
>  Sort  (cost=12.79..12.79 rows=3 width=12) (actual time=0.047..0.047 rows=3
> loops=1)
>Sort Key: autoinc DESC
>Sort Method: quicksort  Memory: 25kB
>->  Bitmap Heap Scan on updates  (cost=4.30..12.76 rows=3 width=12)
> (actual time=0.040..0.040 rows=3 loops=1)
>  Recheck Cond: (filename = 'Robox.exe'::text)
>  Heap Blocks: exact=1
>  ->  Bitmap Index Scan on update_filename  (cost=0.00..4.30 rows=3
> width=0) (actual time=0.035..0.035 rows=3 loops=1)
>Index Cond: (filename = 'Robox.exe'::text)
>  Planning time: 1.873 ms
>  Execution time: 0.076 ms
> (10 rows)
>
>
> I appreciate any help on choosing the right index(es), as I simply failed myself.
>
> Regards,
> Ertan Küçükoğlu
>
>
>
>
>

First, you do not need index "updates_autoinc"; since autoinc is the
Primary Key, you are just duplicating the index.

As far as "Index only scan", since the table only has 2003 rows, the
optimizer has determined it is faster just to load all the rows into memory
and then filter. If you really want to force an index scan, then you would
have to do SET enable_seqscan = off; before running the query. However, you
are just shooting yourself in the foot by doing that, as it will make the
query slower.
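If the goal is only to check, in a throwaway test session, whether the planner
is able to use an index for this query at all, a sketch of that diagnostic
(do not leave the setting turned off):

SET enable_seqscan = off;   -- discourage seq scans for this session only

EXPLAIN ANALYZE
SELECT autoinc, fileversion
  FROM updates
 WHERE filename = 'Robox.exe'
 ORDER BY autoinc DESC;

RESET enable_seqscan;       -- back to normal planning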


-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-07 Thread Tom Lane
Ken Tanzer  writes:
>> FWIW, the business with making and editing a list file should work just
>> fine with a tar-format dump, not only with a custom-format dump.  The
>> metadata is all there in either case.

> The pg_dump doc page kinda suggests but doesn't quite say that you can't
> re-order tar files; between that and the error message I gave up on that
> possibility.  Are you suggesting it should work?

[ sorry for slow response ]

Ah, right: you can reorder simple object declarations, but you can't
change the relative order in which TABLE DATA objects are restored.
This is because the code doesn't support seeking in the tar file,
so it has to either read or skip each table-data subfile as it comes
to it.

It seems to me that that's just a small matter of programming to fix,
but few people use the tar format so nobody's bothered.

regards, tom lane




Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Adrian Klaver

On 06/05/2017 05:59 PM, Ken Tanzer wrote:




Not sure why; I just know that if I stay within the guidelines it
works, and if I do not, it does not work :)


That's fair enough, leaving aside the curiosity part.  Usually though 
the things you can't do just aren't allowed.  It's easier to overlook 
something that you shouldn't (but can) do!


Yes, what you ran into is just a subset of a bigger issue. That being, 
there are many ways you can dump a database and not get what you wanted 
on the restore. Another example, that is similar, is using the -n switch 
to pg_dump when you have cross schema references in the schema you did dump.





Ken








--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Ken Tanzer
>
> > So I can switch to Custom format for future backups.  But regarding the
> > existing backups I have in Tar format, is there any way to successfully
> > restore them?
>
> FWIW, the business with making and editing a list file should work just
> fine with a tar-format dump, not only with a custom-format dump.  The
> metadata is all there in either case.
>

I had tried that originally, but got an error:

bash-4.1$ pg_restore -L spc_restore_list.tmp -d spc_test_1
agency_backup.spc.2017.06.05_10.30.01.tar

pg_restore: [tar archiver] restoring data out of order is not supported in
this archive format: "10608.dat" is required, but comes before "10760.dat"
in the archive file.

The pg_dump doc page kinda suggests but doesn't quite say that you can't
re-order tar files; between that and the error message I gave up on that
possibility.  Are you suggesting it should work?

https://www.postgresql.org/docs/9.3/static/app-pgdump.html

The alternative archive file formats must be used with pg_restore to rebuild
the database. They allow pg_restore to be selective about what is restored,
or even to reorder the items prior to being restored. The archive file
formats are designed to be portable across architectures.

When used with one of the archive file formats and combined with pg_restore,
pg_dump provides a flexible archival and transfer mechanism. pg_dump can
be used to backup an entire database, then pg_restore can be used to
examine the archive and/or select which parts of the database are to be
restored. The most flexible output file formats are the "custom" format
(-Fc) and the "directory" format (-Fd). They allow for selection and
reordering of all archived items, support parallel restoration, and are
compressed by default. The "directory" format is the only format that
supports parallel dumps.
Cheers,
Ken
-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to
learn more about AGENCY or
follow the discussion.


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Tom Lane
Ken Tanzer  writes:
> ...The rest of the DB is fine, but tbl_payment has 0 rows.  I believe this is
> because tbl_payment has a constraint that calls a function has_perm() that
> relies on data in a couple of other tables, and that tbl_payment is being
> restored before those tables.  I was able to created a new dump in Custom
> format, reorder the List file, and restore that successfully.

> So I can switch to Custom format for future backups.  But regarding the
> existing backups I have in Tar format, is there any way to successfully
> restore them?

FWIW, the business with making and editing a list file should work just
fine with a tar-format dump, not only with a custom-format dump.  The
metadata is all there in either case.
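For anyone following along, the usual list-file workflow looks roughly like
this (file names here are made up; as the follow-up notes, a tar archive lets
you reorder the object declarations this way, but not the table-data entries):

pg_restore -l agency_backup.dump > restore.list   # dump the TOC to an editable list file
# edit restore.list: reorder lines, or comment entries out with a leading ';'
pg_restore -L restore.list -d spc_test_1 agency_backup.dump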

As already noted, it's hard to get pg_dump/pg_restore to cope
automatically with hidden dependencies like what you have here.
The fact that those other tables would need to be restored first
simply isn't visible to pg_dump.

regards, tom lane




Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread David G. Johnston
On Mon, Jun 5, 2017 at 6:21 PM, Ken Tanzer  wrote:

> I do get the "make \d show relevant information" argument and that is one
>> that seems easier to solve...
>>
>
> Maybe I'm missing something, but I'm not sure how you'd solve this or
> change what \d shows for a table.  Right now I get to see this in my \d:
>
> "authorized_approvers_only" CHECK (approved_by IS NULL OR 
> has_perm(approved_by, 'APPROVE_PAYMENT'::character varying, 'W'::character
> varying))
>
> But when I move that to a trigger, I'll only see the trigger name.  And
> while this procedure would be really short, others not so much, so you
> wouldn't really want to automatically display it inline.
>

​FWIW​

​I wouldn't show the trigger functions but I'd show something like:

CREATE ​trg_tbl2_exists_tbl3_missing_or_vice_versa
TRIGGER ON tbl1 CHANGES EXECUTE func_tbl1
REFERENCES tbl2 CHANGES EXECUTE func_tbl2
REFERENCES tbl3 CHANGES EXECUTE func_tbl3;

FOR tbl1
DEPENDS ON tbl2, tbl3 VIA TRIGGER
​trg_tbl2_exists_tbl3_missing_or_vice_versa

​FOR tbl2
DEPENDED ON BY tbl1 VIA TRIGGER ​​trg_tbl2_exists_tbl3_missing_or_vice_versa

FOR tbl3
DEPENDED ON BY tbl1 VIA TRIGGER ​​trg_tbl2_exists_tbl3_missing_or_vice_versa

I suspect this opens up the possibility of enforcing that trigger execution
doesn't touch tables other than those specified.

​David J.


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Ken Tanzer
>
> I do get the "make \d show relevant information" argument and that is one
> that seems easier to solve...
>

Maybe I'm missing something, but I'm not sure how you'd solve this or
change what \d shows for a table.  Right now I get to see this in my \d:

"authorized_approvers_only" CHECK (approved_by IS NULL OR
has_perm(approved_by, 'APPROVE_PAYMENT'::character varying,
'W'::character
varying))

But when I move that to a trigger, I'll only see the trigger name.  And
while this procedure would be really short, others not so much, so you
wouldn't really want to automatically display it inline.

Ken

-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to
learn more about AGENCY or
follow the discussion.


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread David G. Johnston
On Mon, Jun 5, 2017 at 5:59 PM, Ken Tanzer  wrote:

> I can't really make this an FK.  I can (and probably will) put this into a
>>> trigger.  Although it seems like an extra layer of wrapping just to call a
>>> function.  I'm curious if there's any conceptual reason why constraints
>>> couldn't (as an option) be restored after all the data is loaded, and
>>> whether there would be any negative consequences of that?  I could see if
>>> your data still didn't pass the CHECKs, it's already loaded.  But the
>>> constraint could then be marked not valid?
>>>
>>
>> Not sure why; I just know that if I stay within the guidelines it works, and
>> if I do not, it does not work :)
>>
>>
> That's fair enough, leaving aside the curiosity part.  Usually though the
> things you can't do just aren't allowed.  It's easier to overlook something
> that you shouldn't (but can) do!
>
>
​I find in life most things that are prohibited are actually doable -
you're just punished if you get caught doing them.  In all seriousness
though I agree it would be nice if that's how this worked; but decades of
historical precedent makes actual preventive enforcement ​difficult if not
impossible.

Since "test your backups" covers this potential problem, and so many
possible others, any non-trivial effort to solve the actual problem is hard
to justify spending time on.

I do get the "make \d show relevant information" argument and that is one
that seems easier to solve, since adding explicit dependencies during
trigger creation would be a purely new feature.

David J.


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Ken Tanzer
>
> I can't really make this an FK.  I can (and probably will) put this into a
>> trigger.  Although it seems like an extra layer of wrapping just to call a
>> function.  I'm curious if there's any conceptual reason why constraints
>> couldn't (as an option) be restored after all the data is loaded, and
>> whether there would be any negative consequences of that?  I could see if
>> your data still didn't pass the CHECKs, it's already loaded.  But the
>> constraint could then be marked not valid?
>>
>
> Not sure why; I just know that if I stay within the guidelines it works, and
> if I do not, it does not work :)
>
>
That's fair enough, leaving aside the curiosity part.  Usually though the
things you can't do just aren't allowed.  It's easier to overlook something
that you shouldn't (but can) do!



> See that, but in your scenario you wanted to create a 'scratch' database
> so you are back to a user with privileges.


>
Yeah, I was thinking pg_dump could just conjure it up in the ether (and
then discard it), but I can see that doesn't really work.


Basically, if you have no way to test your backup/restore procedure before
> hand you are flying blind.
>
>
In this case, we had tested the restore part.  But then we changed the DB
in a way that made it stop working.  Good reminder to retest that
periodically!

Ken




-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to
learn more about AGENCY or
follow the discussion.


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Adrian Klaver

On 06/05/2017 05:15 PM, Ken Tanzer wrote:
Thanks Adrian and David.  That all makes sense, and I gather the answer 
regarding the existing dumps is "no, they can't be restored."  So be 
it.  Here's a couple of follow-on comments::


Ideally figure out how to write an actual FK constraint - otherwise
use triggers.


I can't really make this an FK.  I can (and probably will) put this into 
a trigger.  Although it seems like an extra layer of wrapping just to 
call a function.  I'm curious if there's any conceptual reason why 
constraints couldn't (as an option) be restored after all the data is 
loaded, and whether there would be any negative consequences of that?  I 
could see if your data still didn't pass the CHECKs, it's already 
loaded.  But the constraint could then be marked not valid?


Not sure why; I just know that if I stay within the guidelines it works, and if
I do not, it does not work :)





-1; pg_dump should not be trying to restore things.​  The core
developers shouldn't really concern themselves with the various and
sundry ways people might want to setup such a process.  You have
tools for dump, and tools for restore, and you can combine them in
whatever fashion you deem useful.  Or otherwise acquire someone
else's ideas.


I get that as a general principle.  OTOH, being able to restore your 
backups isn't just a random or inconsequential feature.  I have access 
to the superuser and can create DBs, but users in more locked down 
scenarios might not be able to do so.




See that, but in your scenario you wanted to create a 'scratch' database 
so you are back to a user with privileges.  Then there is the whole 
overhead of doing a restore twice. Basically, if you have no way to test 
your backup/restore procedure before hand you are flying blind.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread John R Pierce

On 6/5/2017 5:49 PM, David G. Johnston wrote:
On Mon, Jun 5, 2017 at 5:40 PM, John R Pierce wrote:


indeed, any sort of constraint that invokes a function call which
looks at other tables could later be invalidated if those other
tables change, and postgres would be none the smarter.   the same
goes for trigger based checks.


​ Yes.  I could imagine a new kind of "multi-referential trigger" that 
would specify all relations it touches and the function to fire when 
each of them is updated.  While you'd still have to write the 
functions correctly it would at least allow one to explicitly model 
the multi-table dynamic in pg_catalog.  Lacking that CHECK is no worse 
than TRIGGER and we've decided to say "use triggers".



at $job, the policy is, AVOID ALL TRIGGERS AND FANCY CONSTRAINTS :)

they don't even like using foreign key references, and rely on code 
logic to do most joins in the performance-critical OLTP side of things.



--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread David G. Johnston
On Mon, Jun 5, 2017 at 5:40 PM, John R Pierce  wrote:

> indeed, any sort of constraint that invokes a function call which looks at
> other tables could later be invalidated if those other tables change, and
> postgres would be none the smarter.   the same goes for trigger based
> checks.
>

​Yes.  I could imagine a new kind of "multi-referential trigger" that would
specify all relations it touches and the function to fire when each of them
is updated.  While you'd still have to write the functions correctly it
would at least allow one to explicitly model the multi-table dynamic in
pg_catalog.  Lacking that CHECK is no worse than TRIGGER and we've decided
to say "use triggers".

David J.​


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Ken Tanzer
>
> Aside from being a bit more verbose there is nothing useful that writing
> this as "CHECK function()" provides that you don't also get by writing
> "CREATE TRIGGER".
>

I agree you get the same result.  It may be a minor issue, but for me it is
convenient to see the logic spelled out when using \d on the table.

Cheers,
Ken

-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to
learn more about AGENCY or
follow the discussion.


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread John R Pierce

On 6/5/2017 5:32 PM, David G. Johnston wrote:
On Mon, Jun 5, 2017 at 5:15 PM, Ken Tanzer wrote:


From the docs:
https://www.postgresql.org/docs/9.6/static/sql-createtable.html

"Currently, CHECK expressions cannot contain subqueries nor
refer to variables other than columns of the current row. The
system column tableoid may be referenced, but not any other
system column.


I wonder if that should say "should not," or be followed by
something like this:


Make it say "must not" and I'd agree to change the word "cannot" and 
leave the rest.  Adding a note regarding functions seems appropriate.


Aside from being a bit more verbose there is nothing useful that 
writing this as "CHECK function()" provides that you don't also get by 
writing "CREATE TRIGGER". In a green field we'd probably lock down 
CHECK a bit more but there is too much code that is technically wrong 
but correctly functioning that we don't want to break.  IOW, we cannot 
mandate that the supplied function be immutable even though we 
should.  And we don't even enforce immutable execution if a function 
is defined that way.



indeed, any sort of constraint that invokes a function call which looks 
at other tables could later be invalidated if those other tables change, 
and postgres would be none the smarter.   the same goes for trigger 
based checks.




--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread David G. Johnston
On Mon, Jun 5, 2017 at 5:15 PM, Ken Tanzer  wrote:

> From the docs:
>> https://www.postgresql.org/docs/9.6/static/sql-createtable.html
>> "Currently, CHECK expressions cannot contain subqueries nor refer to
>> variables other than columns of the current row. The system column tableoid
>> may be referenced, but not any other system column.
>
>
> I wonder if that should say "should not," or be followed by something like
> this:
>
>
Make it say "must not" and I'd agree to change the word "cannot" and leave
the rest.  Adding a note regarding functions seems appropriate.

Aside from being a bit more verbose there is nothing useful that writing
this as "CHECK function()" provides that you don't also get by writing
"CREATE TRIGGER". In a green field we'd probably lock down CHECK a bit more
but there is too much code that is technically wrong but correctly
functioning that we don't want to break.  IOW, we cannot mandate that the
supplied function be immutable even though we should.  And we don't even
enforce immutable execution if a function is defined that way.

​David J.​


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread John R Pierce

On 6/5/2017 5:15 PM, Ken Tanzer wrote:
I can't really make this an FK.  I can (and probably will) put this 
into a trigger.  Although it seems like an extra layer of wrapping 
just to call a function.  I'm curious if there's any conceptual reason 
why constraints couldn't (as an option) be restored after all the data 
is loaded, and whether there would be any negative consequences of 
that?  I could see if your data still didn't pass the CHECKs, it's 
already loaded.  But the constraint could then be marked not valid?



when you have constraints that rely on calling functions, how would it 
know what order to check things in ?



--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Ken Tanzer
Thanks Adrian and David.  That all makes sense, and I gather the answer
regarding the existing dumps is "no, they can't be restored."  So be it.
Here's a couple of follow-on comments:

Ideally figure out how to write an actual FK constraint - otherwise use
> triggers.


I can't really make this an FK.  I can (and probably will) put this into a
trigger.  Although it seems like an extra layer of wrapping just to call a
function.  I'm curious if there's any conceptual reason why constraints
couldn't (as an option) be restored after all the data is loaded, and
whether there would be any negative consequences of that?  I could see if
your data still didn't pass the CHECKs, it's already loaded.  But the
constraint could then be marked not valid?


-1; pg_dump should not be trying to restore things.​  The core developers
> shouldn't really concern themselves with the various and sundry ways people
> might want to setup such a process.  You have tools for dump, and tools for
> restore, and you can combine them in whatever fashion you deem useful.  Or
> otherwise acquire someone else's ideas.


I get that as a general principle.  OTOH, being able to restore your
backups isn't just a random or inconsequential feature.  I have access to
the superuser and can create DBs, but users in more locked down scenarios
might not be able to do so.


From the docs:
> https://www.postgresql.org/docs/9.6/static/sql-createtable.html
> "Currently, CHECK expressions cannot contain subqueries nor refer to
> variables other than columns of the current row. The system column tableoid
> may be referenced, but not any other system column.


I wonder if that should say "should not," or be followed by something like
this:

n.b., In CHECK expressions, Postgres will not prevent you from calling
functions that reference other rows or tables.  However, doing so may have
undesirable consequences, including the possible inability to restore from
output created by pg_dump.

(Are there other possible pitfalls too, or is that the only one?)

Cheers,
Ken


-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to
learn more about AGENCY or
follow the discussion.


Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread Adrian Klaver

On 06/05/2017 03:35 PM, Ken Tanzer wrote:

On 9.3.17, I tried to restore a tar from pg_dump.  It failed thusly:

bash-4.1$ pg_restore -d spc_test_1 agency_backup.spc.2017.06.05_10.30.01.tar

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 10608; 0 107743 TABLE 
DATA tbl_payment spc
pg_restore: [archiver (db)] COPY failed for table "tbl_payment": ERROR: 
  new row for relation "tbl_payment" violates check constraint 
"authorized_approvers_only"
DETAIL:  Failing row contains (286541, 3685, 2015-09-14, ADJUST, null, 
null, 137798, 93.00, HONEY, 4841, 2, SHONCRE, September adjustment, 
2015-10-01, null, null, null, null, null, f, f, t, f, f, f, f, null, 
null, null, null, 6, 2015-09-14 16:43:37, 25, 2016-02-08 16:34:20, f, 
null, null, null, Adjusting approved_at to changed_at for first few 
approvals

, 6, 2015-09-14 16:43:37, 2015-09-17).
CONTEXT:  COPY tbl_payment, line 179785: "2865413685   
  2015-09-14  ADJUST  \N  \N  137798  93.00   HONEY   48412

 SHONCRE September adjustment2015-10-0..."
WARNING: errors ignored on restore: 1

The rest of the DB is fine, but tbl_payment has 0 rows.  I believe this 
is because tbl_payment has a constraint that calls a function has_perm() 
that relies on data in a couple of other tables, and that tbl_payment is 
being restored before those tables.  I was able to created a new dump in 
Custom format, reorder the List file, and restore that successfully.


See this thread for more info:
https://www.postgresql.org/message-id/alpine.DEB.2.20.1703311620581.12863%40tglase.lan.tarent.de

From the docs:

https://www.postgresql.org/docs/9.6/static/sql-createtable.html

"Currently, CHECK expressions cannot contain subqueries nor refer to 
variables other than columns of the current row. The system column 
tableoid may be referenced, but not any other system column.




So I can switch to Custom format for future backups.  But regarding the 
existing backups I have in Tar format, is there any way to successfully 
restore them?  Specifically:


  * Any way to ignore or delay constraint checking?  Something like
disable-triggers?

  * Any way to tell pg_restore to skip past the failing row, and restore
the rest of what was in tbl_payment?

  * Some other way to go about this?


Change the check constraint to a trigger.
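A rough sketch of that conversion, reusing the has_perm() call from the
existing check constraint (untested and only illustrative; adjust names and
the error message to taste):

CREATE FUNCTION tbl_payment_check_approver() RETURNS trigger AS $$
BEGIN
    IF NEW.approved_by IS NOT NULL
       AND NOT has_perm(NEW.approved_by, 'APPROVE_PAYMENT', 'W') THEN
        RAISE EXCEPTION 'approved_by % is not an authorized approver',
            NEW.approved_by;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER authorized_approvers_only
    BEFORE INSERT OR UPDATE ON tbl_payment
    FOR EACH ROW EXECUTE PROCEDURE tbl_payment_check_approver();

pg_restore's --disable-triggers switch (or a manual ALTER TABLE ... DISABLE
TRIGGER) can then skip the check during a data-only restore.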



I also wonder if you folks might consider adding something like a 
--test_restore option to pg_dump that would attempt to create a new 
(scratch) DB from the output it creates, and report any errors?  I know 


Not that I know of. It would be easy enough to point pg_restore at your 
own scratch database for testing purposes.


the pieces are all there for us users to do that ourselves, but it would 
be handy for automated backups and might help us to avoid creating 
backups that won't restore successfully.  In my case, I think the 
problem started from changes we made about 9 months ago, and happily I 
discovered it during development/testing and not after a DB crash, which 
is why I'm also happily not gouging my eyeballs out right now. :)


Cheers, and thanks in advance!

Ken


--
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to
learn more about AGENCY or
follow the discussion.



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

2017-06-05 Thread David G. Johnston
On Mon, Jun 5, 2017 at 3:35 PM, Ken Tanzer  wrote:

> I believe this is because tbl_payment has a constraint that calls a
> function has_perm() that relies on data in a couple of other tables
>

​Indeed this is the cause.  That configuration is not supported.  If you
need to lookup values in other tables you either need to use an actual FK
constraint or create a trigger for the validation.


> So I can switch to Custom format for future backups.  But regarding the
> existing backups I have in Tar format, is there any way to successfully
> restore them?  Specifically:
>
> - Any way to ignore or delay constraint checking?  Something like
> disable-triggers?

Using and then disabling triggers is the "closest" solution.

> - Any way to tell pg_restore to skip past the failing row, and restore
> the rest of what was in tbl_payment?

No, COPY doesn't have that capability and that is what is being used
under the hood.

> - Some other way to go about this?

Ideally figure out how to write an actual FK constraint - otherwise use
triggers.


> I also wonder if you folks might consider adding something like a
> --test_restore option to pg_dump
>

-1; pg_dump should not be trying to restore things.​  The core developers
shouldn't really concern themselves with the various and sundry ways people
might want to setup such a process.  You have tools for dump, and tools for
restore, and you can combine them in whatever fashion you deem useful.  Or
otherwise acquire someone else's ideas.

​David J.​


Re: [GENERAL] Help with terminology to describe what my software does please?

2017-05-28 Thread Neil Anderson
>> Cluster comparison would only occur if you have two or more clusters on
>> the same server, although it's possible to compare across servers,
>
>
> Explain, because as I understand it a server = one cluster:
>

I think he was using server in the server=one machine sense, ie a
single machine/server can have multiple clusters/database servers.

> https://www.postgresql.org/docs/9.6/static/app-pg-ctl.html
>
> "The init or initdb mode creates a new PostgreSQL database cluster. A
> database cluster is a collection of databases that are managed by a single
> server instance. This mode invokes the initdb command. See initdb for
> details."
>
>> but that would involve a lot more work. AFAIK, the only differences for a
>> cluster would be:
>> 1. PostgreSQL version
>> 2. path to database
>> 3. database users (note: it is also possible to make users database
>> specific)
>> 4. list of defined databases
>
>
> And anything different below the above, I am thinking checking a dev cluster
> against a production cluster.
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com

-- 
Neil Anderson
n...@postgrescompare.com
https://www.postgrescompare.com





Re: [GENERAL] Help with terminology to describe what my software does please?

2017-05-28 Thread Neil Anderson
>
>
> Cluster comparison would only occur if you have two or more clusters on
> the same server, although it's possible to compare across servers,
> but that would involve a lot more work. AFAIK, the only differences for a
> cluster would be:
> 1. PostgreSQL version
> 2. path to database
> 3. database users (note: it is also possible to make users database
> specific)
> 4. list of defined databases
>

I was considering configuration settings to be at the cluster level too.
Stuff from pg_settings or pg_config. Also I think tablespaces are at that
level too. What do you think?
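As a concrete starting point (just a sketch of the sort of queries involved),
the non-default settings and the tablespaces could be pulled with something
like:

SELECT name, setting, source
  FROM pg_settings
 WHERE source NOT IN ('default', 'override')
 ORDER BY name;

SELECT spcname, pg_tablespace_location(oid) AS location
  FROM pg_tablespace
 ORDER BY spcname;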


> Database comparison would involve db names, owners, encodings, tablespaces
> and acl's
> You might also want to include sizes. You can use the following two
> queries to help
> with that
>
> SELECT db.datname,
>au.rolname as datdba,
>pg_encoding_to_char(db.encoding) as encoding,
>db.datallowconn,
>db.datconnlimit,
>db.datfrozenxid,
>tb.spcname as tblspc,
>db.datacl
>   FROM pg_database db
>   JOIN pg_authid au ON au.oid = db.datdba
>   JOIN pg_tablespace tb ON tb.oid = db.dattablespace
>  ORDER BY 1;
>
> SELECT datname,
>pg_size_pretty(pg_database_size(datname))as size_pretty,
>pg_database_size(datname) as size,
>(SELECT pg_size_pretty (SUM( pg_database_size(datname))::bigint)
>   FROM pg_database)  AS total,
>((pg_database_size(datname) / (SELECT SUM(
> pg_database_size(datname))
>FROM pg_database) ) *
> 100)::numeric(6,3) AS pct
>   FROM pg_database
>   ORDER BY datname;
>

That's a great idea! Thanks for the info.


>
>

>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.
>



-- 
Neil Anderson
n...@postgrescompare.com
https://www.postgrescompare.com


Re: [GENERAL] Help with terminology to describe what my software does please?

2017-05-28 Thread Adrian Klaver

On 05/28/2017 07:53 AM, Melvin Davidson wrote:











Cluster comparison would only occur if you have two or more clusters on 
the same server, although it's possible to compare across servers,


Explain, because as I understand it a server = one cluster:

https://www.postgresql.org/docs/9.6/static/app-pg-ctl.html

"The init or initdb mode creates a new PostgreSQL database cluster. A 
database cluster is a collection of databases that are managed by a 
single server instance. This mode invokes the initdb command. See initdb 
for details."


but that would involve a lot more work. AFAIK, the only differences for 
a cluster would be:

1. PostgreSQL version
2. path to database
3. database users (note: it is also possible to make users database 
specific)

4. list of defined databases


And anything different below the above, I am thinking checking a dev 
cluster against a production cluster.




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Help with terminology to describe what my software does please?

2017-05-28 Thread Tom Lane
Neil Anderson  writes:
> I guess I don't know what is the most common way to say that it
> compares everything but the data. Any suggestions from your
> experience?

FWIW, I think it's pretty common to use "schema" in an abstract way
to mean "the structure of your database", ie everything but the data.
(It's unfortunate that the SQL standard commandeered the word to
mean a database namespace; but it's not like there are no other words
with more than one meaning.)

So I don't see any big problem with calling your tool a schema comparator.
You could maybe make your docs a bit clearer if you consistently refer
to the namespace objects as "SQL schemas", reserving the generic term
for the generic meaning.

regards, tom lane




Re: [GENERAL] Help with terminology to describe what my software does please?

2017-05-28 Thread Melvin Davidson
On Sun, May 28, 2017 at 9:51 AM, Adrian Klaver 
wrote:

> On 05/28/2017 05:49 AM, Neil Anderson wrote:
>
>> Hi,
>>
>> I'm working on a tool that can compare the properties of Postgres
>> objects from different instances, finding the differences and
>> outputting the update SQL.
>>
>> It can compare objects that are defined at the cluster, database or
>> schema level. As such I'm finding it difficult to describe what the
>> tool does simply and accurately. I've tried 'compares PostgreSQL
>> schemas' but that doesn't capture the database and cluster parts,
>> 'compares PostgreSQL schema and database objects'. That sort of thing.
>> Right now I have a mix of terms on my website and I would prefer to
>> tighten it up.
>>
>> I guess I don't know what is the most common way to say that it
>> compares everything but the data. Any suggestions from your
>> experience?
>>
>
> From above the first sentence of the second paragraph seems to me the best
> description of what you are doing.
>
>
>> Thanks,
>> Neil
>>
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
>
>


Cluster comparison would only occur if you have two or more clusters on the
same server, although it's possible to compare across servers,
but that would involve a lot more work. AFAIK, the only differences for a
cluster would be:
1. PostgreSQL version
2. path to database
3. database users (note: it is also possible to make users database
specific)
4. list of defined databases

Database comparison would involve db names, owners, encodings, tablespaces
and acl's
You might also want to include sizes. You can use the following two queries
to help
with that

SELECT db.datname,
   au.rolname as datdba,
   pg_encoding_to_char(db.encoding) as encoding,
   db.datallowconn,
   db.datconnlimit,
   db.datfrozenxid,
   tb.spcname as tblspc,
   db.datacl
  FROM pg_database db
  JOIN pg_authid au ON au.oid = db.datdba
  JOIN pg_tablespace tb ON tb.oid = db.dattablespace
 ORDER BY 1;

SELECT datname,
   pg_size_pretty(pg_database_size(datname))as size_pretty,
   pg_database_size(datname) as size,
   (SELECT pg_size_pretty (SUM( pg_database_size(datname))::bigint)
  FROM pg_database)  AS total,
   ((pg_database_size(datname) / (SELECT SUM(
pg_database_size(datname))
   FROM pg_database) ) *
100)::numeric(6,3) AS pct
  FROM pg_database
  ORDER BY datname;

 schema comparison is a lot more complicated as it involves comparing (a
 starter inventory query is sketched after this list):
 collations
 domains
 functions
 trigger functions
 sequences
 tables
 types
 views
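As a very rough starting point for that kind of inventory (a sketch only; it
counts objects per schema by kind rather than diffing their definitions):

SELECT n.nspname AS schema,
       c.relkind  AS kind,    -- r=table, v=view, S=sequence, i=index, ...
       count(*)   AS objects
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
 GROUP BY 1, 2
 ORDER BY 1, 2;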

-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Help with terminology to describe what my software does please?

2017-05-28 Thread Adrian Klaver

On 05/28/2017 05:49 AM, Neil Anderson wrote:

Hi,

I'm working on a tool that can compare the properties of Postgres
objects from different instances, finding the differences and
outputting the update SQL.

It can compare objects that are defined at the cluster, database or
schema level. As such I'm finding it difficult to describe what the
tool does simply and accurately. I've tried 'compares PostgreSQL
schemas' but that doesn't capture the database and cluster parts,
'compares PostgreSQL schema and database objects'. That sort of thing.
Right now I have a mix of terms on my website and I would prefer to
tighten it up.

I guess I don't know what is the most common way to say that it
compares everything but the data. Any suggestions from your
experience?


From above the first sentence of the second paragraph seems to me the 
best description of what you are doing.




Thanks,
Neil





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-16 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, May 16, 2017 at 10:00 AM, Devrim Gündüz  wrote:
>> Not sure whether we should *fix* this or not on RPM side. This may break
>> some of the existing installations, right?

> Changing that in a minor version seems like a *really* bad idea, because
> things *will* break. The way it is now it only breaks in case of a major
> version upgrade, and there is an easy enough workaround present.

Yeah, you don't have a lot of room in a minor release to make changes
that would affect this.

What Red Hat did about this, when I worked there, was to back-port the
unix_socket_directories patch from 9.3 into earlier branches, and then
set up the default server configuration to create sockets in both
/var/run/postgresql and /tmp.  But even if you did that, it'd require
an upgrade of the 9.2 installation before it would play nice with a
9.6 libpq, so that might be surprising.  (It would also break existing
9.2 installations that were explicitly setting unix_socket_directory,
but we can hope there are very few of those.)
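For reference, the 9.3+ spelling of that both-directories setup is a single
postgresql.conf line (just a sketch; 9.2 and earlier only have the singular
unix_socket_directory, which holds one path):

unix_socket_directories = '/var/run/postgresql, /tmp'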

regards, tom lane




Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-16 Thread Adrian Klaver

On 05/16/2017 01:00 AM, Devrim Gündüz wrote:


Hi,

On Mon, 2017-05-15 at 22:35 -0700, Ken Tanzer wrote:

https://redmine.postgresql.org/issues/2409


Not sure whether we should *fix* this or not on RPM side. This may break some
of the existing installations, right?

I'm not objecting, just asking for opinions.


To me the principle of least surprise says that it should be fixed. At
this point a pre-9.4 server is putting its socket where the primary
client library (libpq) for that server cannot find it, if a 9.4+ server is
also installed. The options seem to be:


1) Use the libpq appropriate for each Postgres version.

2) Modify the postgresql.conf to point at the socket directory that the 
controlling libpq is looking for. I could see this being messy.


3) Document the change in behavior. Possibly here:

https://www.postgresql.org/download/linux/redhat/

PostgreSQL Yum Repository



Regards,




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-16 Thread Magnus Hagander
On Tue, May 16, 2017 at 10:00 AM, Devrim Gündüz  wrote:

>
> Hi,
>
> On Mon, 2017-05-15 at 22:35 -0700, Ken Tanzer wrote:
> > https://redmine.postgresql.org/issues/2409
>
> Not sure whether we should *fix* this or not on RPM side. This may break
> some
> of the existing installations, right?
>
> I'm not objecting, just asking for opinions.
>
>
Changing that in a minor version seems like a *really* bad idea, because
things *will* break. The way it is now it only breaks in case of a major
version upgrade, and there is an easy enough workaround present.

But it should perhaps be more clearly documented somewhere.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-16 Thread Devrim Gündüz

Hi,

On Mon, 2017-05-15 at 22:35 -0700, Ken Tanzer wrote:
> https://redmine.postgresql.org/issues/2409

Not sure whether we should *fix* this or not on RPM side. This may break some
of the existing installations, right?

I'm not objecting, just asking for opinions.

Regards,
-- 
Devrim Gündüz
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Ken Tanzer
On Mon, May 15, 2017 at 4:45 PM, Adrian Klaver 
wrote:

> On 05/15/2017 01:40 PM, Ken Tanzer wrote:
>
>
>
>> But let me ask, is there a big warning about this somewhere I missed?
>> Can the 9.2 updates do something to fix this, or at least create a warning
>> or an RPMNEW file?  I'm happy this is a cloud server and that I worked on a
>> copy.  However, in different circumstances I might well have reasoned
>> "well, installing the 9.6 packages really should be safe for 9.2, since
>> they're clearly meant to exist side-by-side."  And then have a setup that
>> no longer worked as it once did.  With an RHEL clone and PGDG packages
>> straight from the horse's mouth, I'd have higher expectations than that.
>> Only because of the great work y'all do! ;)
>>
>
> Might want to file an issue here:
>
> https://redmine.postgresql.org/projects/pgrpms/
>
> You will need a Postgres community account, which you can sign up for on
> the same page.
>
>
>>
Done, and thanks for pointing me to the tracker.

https://redmine.postgresql.org/issues/2409

Cheers,
Ken


Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Adrian Klaver

On 05/15/2017 01:40 PM, Ken Tanzer wrote:




But let me ask, is there a big warning about this somewhere I missed?  
Can the 9.2 updates do something to fix this, or at least create a 
warning or an RPMNEW file?  I'm happy this is a cloud server and that I 
worked on a copy.  However, in different circumstances I might well have 
reasoned "well, installing the 9.6 packages really should be safe for 
9.2, since they're clearly meant to exist side-by-side."  And then have 
a setup that no longer worked as it once did.  With an RHEL clone and 
PGDG packages straight from the horse's mouth, I'd have higher 
expectations than that.  Only because of the great work y'all do! ;)


Might want to file an issue here:

https://redmine.postgresql.org/projects/pgrpms/

You will need a Postgres community account, which you can sign up for on 
the same page.




Cheers,
Ken






--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Devrim Gündüz

Hi,

On Mon, 2017-05-15 at 16:34 -0400, Tom Lane wrote:
> > bash-4.1$ /usr/pgsql-9.2/bin/psql -p 5432
> > psql: could not connect to server: Connection refused
> >    Is the server running locally and accepting
> >    connections on Unix domain socket
> > "/var/run/postgresql/.s.PGSQL.5432"?
> 
> The default is actually compiled into libpq.so, not psql itself.
> So I'm thinking what's happening here is the 9.2 psql is picking
> up a libpq.so supplied by 9.6.

Yeah, sorry, my bad. I forgot that the RPMs also put a file under
/etc/ld.so.conf.d, so that the latest libpq is picked up.

Regards,
-- 
Devrim Gündüz
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Ken Tanzer
>
>
>> Workarounds:
>>
>> * You can connect to 9.2 using /usr/pgsql-9.2/bin/psql command. It knows
>> the
>> old socket directory.
>>
>
> That was where I was going until I saw this in the OP:
>
> bash-4.1$ /usr/pgsql-9.2/bin/psql -p 5432
> psql: could not connect to server: Connection refused
> Is the server running locally and accepting
> connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.
> 5432"?
>
>
>
>> * Pass -h /tmp to 9.6's psql, so that it connects to 9.2 instance.
>>
>> -HTH
>>
>> Regards,
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Thanks everyone for the replies.  Adrian is right--I did try this with the
9.2 binaries, with the same problem.  But to address Tom's question (and if
I'm using ldd properly), the 9.2 psql binary is using the 9.6 libpq.

[root@centos-new postgresql]# ldd /usr/bin/psql | grep libpq
libpq.so.5 => /usr/pgsql-9.6/lib/libpq.so.5 (0x7f2e6c99a000)
[root@centos-new postgresql]# ldd /usr/pgsql-9.2/bin/psql | grep libpq
libpq.so.5 => /usr/pgsql-9.6/lib/libpq.so.5 (0x7f52f9c67000)

Devrim--the -h /tmp option works great.

I still wanted this to just "work" though, for scripts and such.  I
specified the socket directory in the 9.2 postgresql.conf, and it seems to
be working "normally" now.

But let me ask, is there a big warning about this somewhere I missed?  Can
the 9.2 updates do something to fix this, or at least create a warning or
an RPMNEW file?  I'm happy this is a cloud server and that I worked on a
copy.  However, in different circumstances I might well have reasoned
"well, installing the 9.6 packages really should be safe for 9.2, since
they're clearly meant to exist side-by-side."  And then have a setup that
no longer worked as it once did.  With an RHEL clone and PGDG packages
straight from the horse's mouth, I'd have higher expectations than that.
Only because of the great work y'all do! ;)

Cheers,
Ken



-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the discussion.


Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Tom Lane
Adrian Klaver  writes:
> On 05/15/2017 01:10 PM, Devrim Gündüz wrote:
>> * You can connect to 9.2 using /usr/pgsql-9.2/bin/psql command. It knows the
>> old socket directory.

> That was where I was going until I saw this in the OP:

> bash-4.1$ /usr/pgsql-9.2/bin/psql -p 5432
> psql: could not connect to server: Connection refused
>   Is the server running locally and accepting
>   connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

The default is actually compiled into libpq.so, not psql itself.
So I'm thinking what's happening here is the 9.2 psql is picking
up a libpq.so supplied by 9.6.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Adrian Klaver

On 05/15/2017 01:10 PM, Devrim Gündüz wrote:


Hi,

On Mon, 2017-05-15 at 12:55 -0700, Ken Tanzer wrote:

Hi.  On a Centos 6.9 server (in the cloud with Rackspace), I'm wanting to
install PGDG 9.6 alongside the already-running 9.2.  After installing the
9.6 packages (and even before doing an initdb), I am no
longer able to make a local connection to the 9.2 server.  Instead I get
the message:

psql: could not connect to server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

That socket file does not exist on the server. (And in fact, the
/var/run/postgresql directory didn't exist before installing 9.6).  When I
configure 9.6 to use port 5433 and run it, it does create that socket for
5433.  I tried creating such a socket manually for 5432, but that didn't
seem to change anything.

Any help in getting this working and/or pointing out what I'm missing would
be great.  I'm also confused conceptually about what is happening here.
What is it that the installation (but not execution) of 9.6 does that's
blocking the local 9.2 access?  I'm guessing it's gotta be something in the
RPM install scripts.


PGDG RPMs use alternatives method, to replace some binaries that can be used
across multiple PostgreSQL versions, and psql is one of them. When you install
9.6, 9.6's psql has higher priority than 9.2, so that one is used -- and 9.4+
are compiled with a patch that changes the default socket directory from /tmp to
/var/run/postgresql, and 9.2 is not aware of that.


Workarounds:

* You can connect to 9.2 using /usr/pgsql-9.2/bin/psql command. It knows the
old socket directory.


That was where I was going until I saw this in the OP:

bash-4.1$ /usr/pgsql-9.2/bin/psql -p 5432
psql: could not connect to server: Connection refused
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?




* Pass -h /tmp to 9.6's psql, so that it connects to 9.2 instance.

-HTH

Regards,




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Devrim Gündüz

Hi,

On Mon, 2017-05-15 at 12:55 -0700, Ken Tanzer wrote:
> Hi.  On a Centos 6.9 server (in the cloud with Rackspace), I'm wanting to
> install PGDG 9.6 alongside the already-running 9.2.  After installing the
> 9.6 packages (and even before doing an initdb), I am no
> longer able to make a local connection to the 9.2 server.  Instead I get
> the message:
> 
> psql: could not connect to server: Connection refused
> Is the server running locally and accepting
> connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
> 
> That socket file does not exist on the server. (And in fact, the
> /var/run/postgresql directory didn't exist before installing 9.6).  When I
> configure 9.6 to use port 5433 and run it, it does create that socket for
> 5433.  I tried creating such a socket manually for 5432, but that didn't
> seem to change anything.
> 
> Any help in getting this working and/or pointing out what I'm missing would
> be great.  I'm also confused conceptually about what is happening here.
> What is it that the installation (but not execution) of 9.6 does that's
> blocking the local 9.2 access?  I'm guessing it's gotta be something in the
> RPM install scripts.

PGDG RPMs use alternatives method, to replace some binaries that can be used
across multiple PostgreSQL versions, and psql is one of them. When you install
9.6, 9.6's psql has higher priority than 9.2, so that one is used -- and 9.4+
are compiled with a patch that changes the default socket directory from /tmp to
/var/run/postgresql, and 9.2 is not aware of that.


Workarounds:

* You can connect to 9.2 using /usr/pgsql-9.2/bin/psql command. It knows the
old socket directory.

* Pass -h /tmp to 9.6's psql, so that it connects to 9.2 instance.

-HTH

Regards,

-- 
Devrim Gündüz
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Tom Lane
Ken Tanzer  writes:
> Hi.  On a Centos 6.9 server (in the cloud with Rackspace), I'm wanting to
> install PGDG 9.6 alongside the already-running 9.2.  After installing the
> 9.6 packages (and even before doing an initdb), I am no
> longer able to make a local connection to the 9.2 server.  Instead I get
> the message:

> psql: could not connect to server: Connection refused
> Is the server running locally and accepting
> connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Where is the 9.2 server making its socket ... /tmp ?

What it looks like is that you've started to use a libpq.so that is
following the Red Hat convention of putting the socket file in
/var/run/postgresql, rather than /tmp.  I do not know exactly where
the PGDG packages stand on that theological issue, or whether they
changed between 9.2 and 9.6.  But the first step would be to use
"ldd" to see which libpq your invoked psql is pulling in.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help: Installing 9.6 breaks local connections to 9.2 on Centos 6.9

2017-05-15 Thread Justin Pryzby
On Mon, May 15, 2017 at 12:55:48PM -0700, Ken Tanzer wrote:
> Hi.  On a Centos 6.9 server (in the cloud with Rackspace), I'm wanting to
> install PGDG 9.6 alongside the already-running 9.2.  After installing the
> 9.6 packages (and even before doing an initdb), I am no
> longer able to make a local connection to the 9.2 server.  Instead I get
> the message:

See eg.
https://www.postgresql.org/message-id/21044.1326496...@sss.pgh.pa.us
https://www.postgresql.org/message-id/0a21bc93-7b9c-476e-aaf4-0ff71708e...@elevated-dev.com

I'm guessing you upgraded the client libraries, which probably change the
(default) socket path.

Your options are to specify path to the socket (maybe in /tmp for running
PG92?), change to TCP connection, or specify server option
unix_socket_directories.

Justin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with Trigger

2016-12-28 Thread Clifford Snow
Thank you for your suggestion which solved the problem. Much better
solution that what I was trying to accomplish. Much smaller table to query
since it only has one entry per user.

Clifford

On Wed, Dec 28, 2016 at 8:12 PM, Adrian Klaver 
wrote:

> On 12/28/2016 07:06 PM, Clifford Snow wrote:
>
>> I'm trying to write a trigger (my first) to update another table if the
>> user_id is new. But I'm getting an index exception that the user_id
>>
>
> What is the actual error message?
>
>> already exists. I'm picking up data from another feed which
>> provides me with changes to the main database.
>>
>> what I have is
>>
>> CREATE OR REPLACE FUNCTION add_new_user()
>> RETURNS TRIGGER AS
>> $BODY$
>> DECLARE
>> commits RECORD;
>> BEGIN
>> SELECT INTO commits * FROM changes WHERE user_id = NEW.user_id;
>>
>
> In the above you are checking whether the changes table has the user_id
> and if it does not, then creating a new user in the user table below. Not sure
> how they are related, but from the description of the error it would seem
> they are not that tightly coupled. In other words, just because the user_id
> does not exist in changes does not ensure it is also absent from the table
> user. Off the top of my head I would say the below might be a better query:
>
> SELECT INTO commits * FROM user WHERE user_id = NEW.user_id;
>
> Though it would help the debugging process if you showed the complete
> schema for both the changes and user tables.
>
>
> IF NOT FOUND
>> THEN
>> INSERT INTO user (user_name, user_id, change_id,
>> created_date)
>> VALUES(NEW.user_name, NEW.user_id,
>> NEW.change_id, NEW.created_date);
>> END IF;
>> RETURN NEW;
>> END;
>> $BODY$
>> LANGUAGE plpgsql;
>>
>> CREATE TRIGGER add_new_user_trigger
>> BEFORE INSERT ON changes
>> FOR EACH ROW
>> EXECUTE PROCEDURE add_new_user();
>>
>> I'm hoping for some recommendations on how to fix this or where I'm going
>> wrong.
>>
>> Thanks,
>> Clifford
>>
>>
>> --
>> @osm_seattle
>> osm_seattle.snowandsnow.us 
>> OpenStreetMap: Maps with a human touch
>>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



-- 
@osm_seattle
osm_seattle.snowandsnow.us
OpenStreetMap: Maps with a human touch


Re: [GENERAL] Help with Trigger

2016-12-28 Thread Adrian Klaver

On 12/28/2016 07:06 PM, Clifford Snow wrote:

I'm trying to write a trigger (my first) to update another table if the
user_id is new. But I'm getting an index exception that the user_id


What is the actual error message?


already exists. I'm picking up data from another feed which
provides me with changes to the main database.

what I have is

CREATE OR REPLACE FUNCTION add_new_user()
RETURNS TRIGGER AS
$BODY$
DECLARE
commits RECORD;
BEGIN
SELECT INTO commits * FROM changes WHERE user_id = NEW.user_id;


In the above you are checking whether the changes table has the user_id 
and if it does not, then creating a new user in the user table below. Not 
sure how they are related, but from the description of the error it 
would seem they are not that tightly coupled. In other words, just 
because the user_id does not exist in changes does not ensure it is also 
absent from the table user. Off the top of my head I would say the below 
might be a better query:


SELECT INTO commits * FROM user WHERE user_id = NEW.user_id;

Though it would help the debugging process if you showed the complete 
schema for both the changes and user tables.
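
As a minimal sketch of that change, keeping the rest of your original function
(note that "user" is a reserved word, so as a table name it has to be quoted;
whether that matches your actual table layout is an assumption on my part):

CREATE OR REPLACE FUNCTION add_new_user()
RETURNS TRIGGER AS
$BODY$
DECLARE
    commits RECORD;
BEGIN
    -- look in the table we are about to insert into, not in changes
    SELECT INTO commits * FROM "user" WHERE user_id = NEW.user_id;
    IF NOT FOUND
    THEN
        INSERT INTO "user" (user_name, user_id, change_id, created_date)
        VALUES (NEW.user_name, NEW.user_id, NEW.change_id, NEW.created_date);
    END IF;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;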




IF NOT FOUND
THEN
INSERT INTO user (user_name, user_id, change_id,
created_date)
VALUES(NEW.user_name, NEW.user_id,
NEW.change_id, NEW.created_date);
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

CREATE TRIGGER add_new_user_trigger
BEFORE INSERT ON changes
FOR EACH ROW
EXECUTE PROCEDURE add_new_user();

I'm hoping for some recommendations on how to fix this or where I'm going wrong.

Thanks,
Clifford


--
@osm_seattle
osm_seattle.snowandsnow.us 
OpenStreetMap: Maps with a human touch



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] help with moving tablespace

2016-11-17 Thread rob stone

> Bonus question: I found an ER diagram of some of the pg_* tables at
> http://www.slideshare.net/oddbjorn/Get-to-know-PostgreSQL. Is there an
> ERD of all of them so a person can better understand how to use them
> when one must? I suppose the same question applies to
> information_schema since I probably should be using that over the
> pg_* tables when possible (and as the above example shows, sometimes
> you have to go look at the pg_* tables).
> 
> Thanks!
> Kevin
> 
> 

Hello,

ExecuteQuery has an ER diagram tool. You can download the jar file from
www.executequery.org and obtain the JDBC driver from the Postgres site.
You set up separate connections to all databases that you wish to
access.
It generates the ER diagram but prior to printing it you need to drag
and drop the "boxes" around to make it readable. I have not tried it
(yet) over information_schema.

HTH,
Rob


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] help with moving tablespace

2016-11-17 Thread
> On Thu, Nov 17, 2016  wrote:
> > On Thu, Nov 17, 2016 at 9:16 AM,  wrote:
> > First, the above works only *most* of the time in our testing on multiple 
> > servers. When it fails, it's because not everything was moved out of the 
> > old tablespace and I don't understand why. An "ls $PGDATA/ourdb/PG*/" shows 
> > files are still present. According to some searching, I should be able to 
> > do:
> 
>  
> Likely more than one database in the cluster is using $PGDATA/ourdb as its 
> default tablespace location so you need to alter all of them.

Sigh, it's so easy to overlook the obvious; thanks for pointing that out. 
Knowing what to look for and with some research, doing:

select datname,dattablespace,spcname from pg_database join pg_tablespace on 
dattablespace = pg_tablespace.oid;

shows there is indeed an extra database using that tablespace that I'll need to 
drop or move. Hopefully that helps someone else.


> pg_class displays relative to the current database only so you need to log 
> into the others to check them.

Right, something else I didn't consider.


> > Second, the "ALTER DATABASE ourdb SET TABLESPACE new_ts" which does the 
> > move is slow even on our smaller test DBs, almost as if it is having to 
> > dump and reload (or more likely copy) the data. This raises the concern of 
> > how long this is going to take on our bigger DBs. Is there a faster way to 
> > accomplish the same thing especially since the new and old tablespaces are 
> > on the same disk partition?
> >
> > For example, from what I can see the data is sitting in a dir and there is 
> > a symlink to it in $PGDATA/pg_tblspc.
> >
> > Could I shut down PG, move the DB dir, recreate the symlink in pg_tblspc, 
> > then restart PG and all would be well in only a few seconds?
> 
> 
> I think this would work - all the SQL commands do is invoke O/S commands on 
> your behalf and I'm reasonably certain this is what they end up doing.  Given 
> that you are indeed testing you should try this and make sure.  It's either 
> going to work, or not, I don't foresee (in my limited experience...) any 
> delayed reaction that would be likely to arise.


Thanks! That gives me confidence to give that method a try.

Kevin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] help with moving tablespace

2016-11-17 Thread David G. Johnston
On Thu, Nov 17, 2016 at 9:16 AM,  wrote:

> First, the above works only *most* of the time in our testing on multiple
> servers. When it fails, it's because not everything was moved out of the
> old tablespace and I don't understand why. An "ls $PGDATA/ourdb/PG*/" shows
> files are still present. According to some searching, I should be able to
> do:
>

​Likely more than one database in the cluster is using $PGDATA/ourdb as its
default tablespace location so you need to alter all of them.


> SELECT c.relname, t.spcname
> FROM   pg_class c JOIN pg_tablespace t ON c.reltablespace = t.oid
> WHERE  t.spcname = 'old_name';
>
> But that always returns 0 rows. So how do I track this down?
>

​pg_class displays relative to the current database only so you need to log
into the others to check them.​


> Second, the "ALTER DATABASE ourdb SET TABLESPACE new_ts" which does the
> move is slow even on our smaller test DBs, almost as if it is having to
> dump and reload (or more likely copy) the data. This raises the concern of
> how long this is going to take on our bigger DBs. Is there a faster way to
> accomplish the same thing especially since the new and old tablespaces are
> on the same disk partition?
>
> For example, from what I can see the data is sitting in a dir and there is
> a symlink to it in $PGDATA/pg_tblspc.



> Could I shut down PG, move the DB dir, recreate the symlink in pg_tblspc,
> then restart PG and all would be well in only a few seconds?
>

​I think this would work - all the SQL commands do is invoke O/S commands
on your behalf and I'm reasonably certain this is what they end up doing.
Given that you are indeed testing you should try this and make sure.  It's
either going to work, or not; I don't foresee (in my limited experience...)
any delayed reaction that would be likely to arise.
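
Roughly, the manual version would look like the sketch below (the tablespace
OID, paths and data directory are placeholders; check the symlink in
$PGDATA/pg_tblspc for the real OID before touching anything):

# stop the cluster
pg_ctl -D /path/to/data stop

# move the tablespace directory (example paths)
mv /old/location/our_ts /new/location/our_ts

# repoint the symlink; 16385 stands in for the tablespace OID
ln -sfn /new/location/our_ts /path/to/data/pg_tblspc/16385

# start the cluster again
pg_ctl -D /path/to/data start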

​David J.​


Re: [GENERAL] Help with slow query - Pgsql 9.2

2016-09-06 Thread Jeff Janes
On Mon, Sep 5, 2016 at 6:53 PM, Patrick B  wrote:

> Hi guys,
>
> I got this query:
>
>> SELECT id,jobid,description,serialised_data
>> FROM logtable
>> WHERE log_type = 45
>> AND clientid = 24011
>> ORDER BY gtime desc
>
>

What is really going to help you here is a multicolumn index on (clientid,
log_type) or (log_type, clientid).

It will not cost you much, because you can get rid of whichever
single-column index is on the column you list first in your multi-column
index.

>
>
> Explain analyze: https://explain.depesz.com/s/XKtU
>
> So it seems the very slow part is into:
>
>   ->  Bitmap Index Scan on "ix_client"  (cost=0.00..5517.96
>> rows=367593 width=0) (actual time=2668.246..2668.246 rows=356327 loops=1)
>> Index Cond: ("clientid" = 24011)
>
>
> Am I right? The query is already using an index on that table... how could
> I improve the performance in a query that is already using an index?
>

Right, that is the slow step.  Probably the index is not already in memory
and had to be read from disk, slowly.  You could turn track_io_timing on
and then run explain (analyze, buffers) to see if that is the case.  But
once you build a multi-column index, it shouldn't really matter anymore.
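
Something along these lines, for example (untested; the table and column names
are the ones from your query, the index name is made up):

CREATE INDEX logtable_clientid_log_type_idx ON logtable (clientid, log_type);
-- after which the single-column index on clientid ("ix_client") is redundant:
DROP INDEX ix_client;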

Cheers,

Jeff


Re: [GENERAL] Help on recovering my standby

2016-06-22 Thread Patrick B
I had the same issue... A slave server had missing wal_files... and it
wasn't synced.

I had to re-sync all the DB, by running the pg_basebackup command

So.. basically, what I did is:

1 - Ensure that the wal_files are being inserted into the slave
2 - Backup the recovery.conf, postgresql.conf and pg_hba.conf
3 - Delete all the current data folder, by doing: rm -rf
/var/lib/pgsql/9.2/data/*
4 - Running the pg_basebackup command to re-sync the DB from another slave
to the slave that I wanna fix
5 - Replace the .conf backup files into the new data folder
6 - Start postgres

And it worked nice
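
For reference, step 4 was roughly the following (hostname, user and data
directory are placeholders; run it as the postgres user with the broken slave
stopped):

pg_basebackup -h healthy-slave.example.com -U replication \
    -D /var/lib/pgsql/9.2/data -X stream -P -v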

Patrick


Re: [GENERAL] Help on recovering my standby

2016-06-22 Thread Melvin Davidson
On Wed, Jun 22, 2016 at 12:22 PM, Alan Hodgson 
wrote:

> On Tuesday 21 June 2016 19:34:18 Ramalingam, Sankarakumar wrote:
> > Hi I have my standby (streaming replication) down due to missing wal
> files.
> > You would see the same error in the logs stating "cannot find the wal
> file
> > ..." What is the best way to get it going so that when we switch between
> > standby and primary once in a while they are in sync?
> >
> > Currently I am working on a CERT server and hence there is no outage
> > concerns. I need to repeat the same process on prod once I get it going
> > successfully. Any help is appreciated.
> >
>
> You should keep your WAL files from the master for at least as long as the
> slave might be offline (plus startup time), somewhere the slave can copy
> them
> from when needed (shared file system, object store, scp target, whatever).
>
> See the postgresql.conf parameter archive_command and the corresponding
> recovery.conf parameter restore_command.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

It would be really helpful if you included PostgreSQL version and O/S in
your problem description, but since you have not, I will give a "generic"
fix.

It is doubtful, but you can check the pg_xlog on the master for the
"missing" WAL files and if they are there, simply rsync them to the standby.
If you are truly missing WAL files in your slave/standby, then you need to
rebuild the slave as per standard procedures.
Make sure you change the wal_keep_segments value on the master to be
sufficiently high so that the problem does not occur again.
Once you make the change, be sure to reload the config file on the master
Either
SELECT pg_reload_conf();
or
pg_ctl reload -D your_data_dir
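
As a rough illustration only (hostnames, paths, the segment name and the value
1024 are all placeholders, not recommendations):

# on the master: copy the segment(s) the standby complains about
rsync -av /var/lib/pgsql/9.x/data/pg_xlog/000000010000000A000000DE \
    standby.example.com:/var/lib/pgsql/9.x/data/pg_xlog/

# in postgresql.conf on the master
wal_keep_segments = 1024

# then reload, e.g.
pg_ctl reload -D /var/lib/pgsql/9.x/data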



-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Help on recovering my standby

2016-06-22 Thread Alan Hodgson
On Tuesday 21 June 2016 19:34:18 Ramalingam, Sankarakumar wrote:
> Hi I have my standby (streaming replication) down due to missing wal files.
> You would see the same error in the logs stating "cannot find the wal file
> ..." What is the best way to get it going so that when we switch between
> standby and primary once in a while they are in sync?
> 
> Currently I am working on a CERT server and hence there is no outage
> concerns. I need to repeat the same process on prod once I get it going
> successfully. Any help is appreciated.
> 

You should keep your WAL files from the master for at least as long as the 
slave might be offline (plus startup time), somewhere the slave can copy them 
from when needed (shared file system, object store, scp target, whatever).

See the postgresql.conf parameter archive_command and the corresponding  
recovery.conf parameter restore_command.
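
For illustration only (the archive path is a placeholder and must be reachable
from the standby):

# postgresql.conf on the master
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'

# recovery.conf on the standby
restore_command = 'cp /mnt/wal_archive/%f %p'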



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help needed structuring Postgresql correlation query

2016-06-21 Thread Tim Smith
Thanks for that, looks like something to sink my teeth into !

On 21 June 2016 at 13:29, Alban Hertroys  wrote:
>
>> On 19 Jun 2016, at 10:58, Tim Smith  wrote:
>>
>> Hi,
>>
>> My postgresql-fu is not good enough to write a query to achieve this
>> (some may well say r is a better suited tool to achieve this !).
>>
>> I need to calculate what I would call a correlation window on a time
>> series of data, my table looks like this :
>>
>> create table data(data_date date,data_measurement numeric);
>> insert into data values('2016-01-01',16.23);
>> 
>> insert into data values('2016-06-19',30.54);
>>
>> My "target sample" would be the N most recent samples in the table
>> (e.g. 20, the most recent 20 days)
>>
>> My "potential sample" would be a moving window of size N (the same
>> size N as above), starting at T0 (i.e. 2016-01-01 in this example) and
>> incrementing by one (i.e. 2016-01-01 + 20, then 2016-01-02+20 etc),
>> but the "target sample" would obviously be excluded.
>>
>> The output needs to display window date range (or at least the start
>> date of the "potential sample" window) and the result
>> corr(target,potential).
>>
>> Hope that makes sense
>
> Something like this could do the trick (untested):
>
> with recursive sample (nr, start_date) as (
> select 1 as nr, data_date as start_date, 
> SUM(data_measurement) as total
> from generate_series(0, 19) range(step)
> left join data on (data_date = start_date + range.step)
>
> union all
>
> select nr + 1, sample.start_date +1, SUM(data_measurement) as 
> total
> from sample
> join generate_series(0, 19) range(step)
> left join data on (data_date = start_date +1 + range.step)
> where start_date +1 +19 <= (select MAX(data_date) from data)
> group by 1, 2
> )
> select * from sample where start_date >= '2016-01-01';
>
> Not sure how best to go about parameterising sample size N, a stored function 
> seems like a good option.
>
>
> Another approach would be to move a (cumulative) window-function with 20 
> items over your data set and for each row subtract the first value of the 
> previous window from the total of the current window (that is, assuming 
> you're calculating a SUM of data_measurement for each window of 20 records).
>
> Visually that looks something like this for sample size 4:
> sample 1: (A + B + C + D)
> sample 2: (A + B + C + D) + E - A = (B + C + D + E)
> sample 3: (B + C + D + E) + F - B = (C + D + E + F)
> etc.
>
> To accomplish this, you calculate two cumulative totals (often misnamed as 
> running totals, but AFAIK that's something different), one from the start, 
> and one lagging N rows behind (you can use the lag() window function for 
> that) and subtract the two.
>
> Good luck!
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with namespaces in xpath (PostgreSQL 9.5.3)

2016-06-21 Thread Allan Kamau
Thank you David.

-Allan.

On Mon, Jun 20, 2016 at 11:19 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sun, Jun 19, 2016 at 5:09 PM, Allan Kamau  wrote:
>
>> I have an xml document from which I would like to extract the contents of
>> several elements.
>>
>> I would like to use xpath to extract the contents of "name" from the xml
>> document shown below.
>>
>> WITH x AS
>> (
>> SELECT
>> '
>> http://uniprot.org/uniprot; xmlns:xsi="
>> http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="
>> http://uniprot.org/uniprot
>> http://www.uniprot.org/support/docs/uniprot.xsd;>
>> > version="56">
>> A0JM59
>> UBP20_XENTR
>> 
>> 
>> '::xml AS d
>> )
>> SELECT (xpath('/uniprot/entry/name/text()',a.d))[1]::text AS uniprot_name
>> FROM
>> x AS a
>> ;
>>
>> The documentation for xpath()
>> (https://www.postgresql.org/docs/9.5/static/functions-xml.html)
>> describes "xpath(xpath, xml [, nsarray])".
>>
>> For the above xml document, what would be the two dimensional array
>> "nsarray" for the xpath() function?
>>
>
> ​Is there a specific part of the description and two examples that doesn't
> make sense to you?
>
> ​Or more specifically, do you understand what namespaces are?​
>
> ARRAY[
> ARRAY['defaultns','http://uniprot.org/uniprot'],
> ARRAY['xsi','http://www.w3.org/2001/XMLSchema-instance']
> ]​
>
> In effect when the xpath function parses the XML document it tosses away
> all of the document-local namespace aliases and instead associates the full
> namespace URI with each element (in the DOM).  Since, in the xpath
> expression, usually you'd want to refer to nodes in the DOM via their
> namespace alias you need to tell the xpath function which aliases you
> intend to use in the xpath and which full URI they correspond to.
> Furthermore, there is no concept of a default namespace in the xpath
> expression.  So while you can simply copy-paste the aliases and URIs from
> all of the non-default namespace aliases you must also choose a unique
> alias for the default namespace in the original document.
>
> In the above I've copied the alias and namespace URI for the named "xsi"
> alias and gave the name "defaultns" to the original document's default
> namespace URI.
>
> David J.
>
>


Re: [GENERAL] Help needed structuring Postgresql correlation query

2016-06-21 Thread Alban Hertroys

> On 19 Jun 2016, at 10:58, Tim Smith  wrote:
> 
> Hi,
> 
> My postgresql-fu is not good enough to write a query to achieve this
> (some may well say r is a better suited tool to achieve this !).
> 
> I need to calculate what I would call a correlation window on a time
> series of data, my table looks like this :
> 
> create table data(data_date date,data_measurement numeric);
> insert into data values('2016-01-01',16.23);
> 
> insert into data values('2016-06-19',30.54);
> 
> My "target sample" would be the N most recent samples in the table
> (e.g. 20, the most recent 20 days)
> 
> My "potential sample" would be a moving window of size N (the same
> size N as above), starting at T0 (i.e. 2016-01-01 in this example) and
> incrementing by one (i.e. 2016-01-01 + 20, then 2016-01-02+20 etc),
> but the "target sample" would obviously be excluded.
> 
> The output needs to display window date range (or at least the start
> date of the "potential sample" window) and the result
> corr(target,potential).
> 
> Hope that makes sense

Something like this could do the trick (untested):

with recursive sample (nr, start_date) as (
select 1 as nr, data_date as start_date, SUM(data_measurement) 
as total
from generate_series(0, 19) range(step)
left join data on (data_date = start_date + range.step)

union all

select nr + 1, sample.start_date +1, SUM(data_measurement) as 
total
from sample
join generate_series(0, 19) range(step)
left join data on (data_date = start_date +1 + range.step)
where start_date +1 +19 <= (select MAX(data_date) from data)
group by 1, 2
)
select * from sample where start_date >= '2016-01-01';

Not sure how best to go about parameterising sample size N, a stored function 
seems like a good option.


Another approach would be to move a (cumulative) window-function with 20 items 
over your data set and for each row subtract the first value of the previous 
window from the total of the current window (that is, assuming you're 
calculating a SUM of data_measurement for each window of 20 records).

Visually that looks something like this for sample size 4:
sample 1: (A + B + C + D)
sample 2: (A + B + C + D) + E - A = (B + C + D + E)
sample 3: (B + C + D + E) + F - B = (C + D + E + F)
etc.

To accomplish this, you calculate two cumulative totals (often misnamed as 
running totals, but AFAIK that's something different), one from the start, and 
one lagging N rows behind (you can use the lag() window function for that) and 
subtract the two.
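
A rough sketch of that second approach for N = 20 (untested, assuming one row
per data_date and a plain moving SUM):

SELECT data_date,
       running - COALESCE(lag(running, 20) OVER (ORDER BY data_date), 0)
           AS window_total
FROM (
    SELECT data_date,
           SUM(data_measurement) OVER (ORDER BY data_date) AS running
    FROM data
) AS t;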

Good luck!

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with namespaces in xpath (PostgreSQL 9.5.3)

2016-06-20 Thread David G. Johnston
On Sun, Jun 19, 2016 at 5:09 PM, Allan Kamau  wrote:

> I have an xml document from which I would like to extract the contents of
> several elements.
>
> I would like to use xpath to extract the contents of "name" from the xml
> document shown below.
>
> WITH x AS
> (
> SELECT
> '<uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/support/docs/uniprot.xsd">
> <entry version="56">
> <accession>A0JM59</accession>
> <name>UBP20_XENTR</name>
> </entry>
> </uniprot>
> '::xml AS d
> )
> SELECT (xpath('/uniprot/entry/name/text()',a.d))[1]::text AS uniprot_name
> FROM
> x AS a
> ;
>
> The documentation for xpath()
> (https://www.postgresql.org/docs/9.5/static/functions-xml.html) describes
> "xpath(xpath, xml [, nsarray])".
>
> For the above xml document, what would be the two dimensional array
> "nsarray" for the xpath() function?
>

​Is there a specific part of the description and two examples that doesn't
make sense to you?

​Or more specifically, do you understand what namespaces are?​

ARRAY[
ARRAY['defaultns','http://uniprot.org/uniprot'],
ARRAY['xsi','http://www.w3.org/2001/XMLSchema-instance']
]​

In effect when the xpath function parses the XML document it tosses away
all of the document-local namespace aliases and instead associates the full
namespace URI with each element (in the DOM).  Since, in the xpath
expression, usually you'd want to refer to nodes in the DOM via their
namespace alias you need to tell the xpath function which aliases you
intend to use in the xpath and which full URI they correspond to.
Furthermore, there is no concept of a default namespace in the xpath
expression.  So while you can simply copy-paste the aliases and URIs from
all of the non-default namespace aliases you must also choose a unique
alias for the default namespace in the original document.

In the above I've copied the alias and namespace URI for the named "xsi"
alias and gave the name "defaultns" to the original document's default
namespace URI.
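
Putting that together, the original query would become something like this
(untested):

SELECT (xpath('/defaultns:uniprot/defaultns:entry/defaultns:name/text()',
              a.d,
              ARRAY[
                ARRAY['defaultns','http://uniprot.org/uniprot'],
                ARRAY['xsi','http://www.w3.org/2001/XMLSchema-instance']
              ]))[1]::text AS uniprot_name
FROM x AS a;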

David J.


Re: [GENERAL] HELP! Uninstalled wrong version of postgres

2016-03-25 Thread Leonardo M . Ramé

El 24/03/16 a las 14:19, Howard News escribió:

Hi,

I uninstalled the wrong version of postgres on Ubuntu using apt-get
remove postgresql-9.0, convinced that this was an old unused version.
You guess the rest...

The data files still appear to be there, all 485GB of them. Can these be
restored?

Thanks.



Ok, if the data files are still there I'd do this:

1) Assuming the data is in /var/lib/postgresql/9.0, rename that 
directory to /var/lib/postgresql/9.0-old, AND COPY THAT DIRECTORY ELSEWHERE.
2) Reinstall 9.0 with "apt-get install postgresql-9.0". This should 
re-create the /var/lib/postgresql/9.0 directory with an empty "main" dir.

3) Stop 9.0 with "pg_ctlcluster 9.0 main stop".
4) Rename the new directory /var/lib/postgresql/9.0 to /var/lib/postgresql/9.0-new.
5) Rename the old dir (/var/lib/postgresql/9.0-old) to /var/lib/postgresql/9.0.
6) Restart the cluster with "pg_ctlcluster 9.0 main start".

And everything should be fine again.

P.S.: All those steps should be done as root.
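
As a rough transcript of those steps (same paths as above; double-check them on
your system before running anything):

cp -a /var/lib/postgresql/9.0 /some/safe/place/9.0-copy   # step 1, keep a copy
mv /var/lib/postgresql/9.0 /var/lib/postgresql/9.0-old
apt-get install postgresql-9.0                            # step 2
pg_ctlcluster 9.0 main stop                               # step 3
mv /var/lib/postgresql/9.0 /var/lib/postgresql/9.0-new    # step 4
mv /var/lib/postgresql/9.0-old /var/lib/postgresql/9.0    # step 5
pg_ctlcluster 9.0 main start                              # step 6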

Regards,
--
Leonardo M. Ramé
http://leonardorame.blogspot.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-14 Thread Jim Nasby

On 12/9/15 7:05 PM, Andreas Kretschmer wrote:

I'm really newbie to PostgreSQL but the boss pushed me to handle it
>and implement it in production f*&%*$%%$#%$#

Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread Alan Hodgson
On Wednesday, December 09, 2015 07:55:09 AM FattahRozzaq wrote:
> archive_mode = on
> archive_command = 'cp -i %p /home/postgres/archive/master/%f'
> 
> 
> The WAL archive folder is at /home/postgres/archive/master/, right?
> This directory consumes around 750GB of Disk-1.
> Each segment in the /home/postgres/archive/master/ is 16MB each
> There are currently 47443 files in this folder.
> 
> If I want to limit the total size used by the WAL archive to around 200-400
> GB, what value should I set for the wal_keep_segments,
> checkpoint_segments?

PostgreSQL doesn't clean up files copied by your archive_command. You need to 
have a separate task clean those out. PostgreSQL's active wal_keep_segments 
etc. are in the data/pg_xlog directory.




Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread Adrian Klaver

On 12/09/2015 11:15 AM, Alan Hodgson wrote:

On Wednesday, December 09, 2015 07:55:09 AM FattahRozzaq wrote:

archive_mode = on
archive_command = 'cp -i %p /home/postgres/archive/master/%f'


The WAL archive folder is at /home/postgres/archive/master/, right?
This directory consumes around 750GB of Disk-1.
Each segment in the /home/postgres/archive/master/ is 16MB each
There are currently 47443 files in this folder.

If I want to limit the total size used by the WAL archive to around 200-400
GB, what value should I set for the wal_keep_segments,
checkpoint_segments?


PostgreSQL doesn't clean up files copied by your archive_command. You need to
have a separate task clean those out. PostgreSQL's active wal_keep_segments
etc. are in the data/pg_xlog directory.



The OP might want to take a look at:

http://www.postgresql.org/docs/9.4/interactive/pgarchivecleanup.html

To be safe I would use:

-n

Print the names of the files that would have been removed on stdout 
(performs a dry run).



at first.
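
For example (the archive path is the one from this thread; the segment name is
a placeholder for the oldest WAL file you still need, typically the one your
latest base backup starts from):

# dry run: only prints what would be removed
pg_archivecleanup -n /home/postgres/archive/master 000000010000000A000000DE

# the same command without -n actually removes the older segments
pg_archivecleanup /home/postgres/archive/master 000000010000000A000000DE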

--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread Joshua D. Drake

On 12/09/2015 04:38 PM, FattahRozzaq wrote:

Quick information,

After I realize, the line "archive_command=/bin/true" is a bad
decision, I have revert it back.
Now I'm really confused and panic.
I don't know what to do, and I don't really understand the postgresql.conf
I'm a network engineer, I should handle the network and also
postgresql database.
Oh man, the office is so good but this part is sucks :((


If the pg_xlog directory is growing it is likely that either:

* wal_keep_segments is set high and your slave is not correctly 
receiving updates.


* You are using a replication slot and the slave is not correctly 
receiving updates.


If your archive_command does not return a success, your pg_xlog will 
also grow but you don't need the archive_command *IF* your streaming 
replication is working *UNLESS* you are also doing archiving or PITR.


Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread FattahRozzaq
Hi John,

I really don't know why I should keep the wal archives.
I implement streaming replication into 1 server (standby server).
I'm really newbie to PostgreSQL but the boss pushed me to handle it
and implement it in production 

Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread Adrian Klaver

On 12/09/2015 04:27 PM, FattahRozzaq wrote:

Hi John,

I really don't know why I should keep the wal archives.


So who set up the archiving and why?

Is archive recovery set up on the standby?:

http://www.postgresql.org/docs/9.4/interactive/archive-recovery-settings.html


I implement streaming replication into 1 server (standby server).


Is that the only standby or is there another set up previously?

Per another recent thread having a WAL archive to fall back on is handy 
if the streaming replication falls behind and wal_keep_segments is not 
high enough:


http://www.postgresql.org/docs/9.4/interactive/warm-standby.html#STREAMING-REPLICATION

"If you use streaming replication without file-based continuous 
archiving, the server might recycle old WAL segments before the standby 
has received them. If this occurs, the standby will need to be 
reinitialized from a new base backup. You can avoid this by setting 
wal_keep_segments to a value large enough to ensure that WAL segments 
are not recycled too early, or by configuring a replication slot for the 
standby. If you set up a WAL archive that's accessible from the standby, 
these solutions are not required, since the standby can always use the 
archive to catch up provided it retains enough segments."



I'm really newbie to PostgreSQL but the boss pushed me to handle it
and implement it in production f*&%*$%%$#%$#

Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread Andreas Kretschmer


> FattahRozzaq  hat am 10. Dezember 2015 um 01:27
> geschrieben:
> 
> 
> Hi John,
> 
> I really don't know why I should keep the wal archives.


That's the problem! But that's your part, not ours. If you need a backup with
PITR capability you have to create a so-called base backup and continuously archive WALs.
If you later create a new base backup, say the next day, and your backup policy
is to hold one backup, then you can delete the old backup and all WALs up to the
new base backup.

If I were you I would use something like barman (see: http://www.pgbarman.org/)
for that. And yes: you should use an extra backup server. If you have both
(database and backup) on the same machine and the machine burns, you will lose
both data and backup.


Questions?



> I implement streaming replication into 1 server (standby server).


Streaming replication can't replace a backup!


> I'm really newbie to PostgreSQL but the boss pushed me to handle it
> and implement it in production 

Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread John R Pierce

On 12/9/2015 4:27 PM, FattahRozzaq wrote:

I really don't know why I should keep the wal archives.
I implement streaming replication into 1 server (standby server).
I'm really newbie to PostgreSQL but the boss pushed me to handle it
and implement it in production f*&%*$%%$#%$#

Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread FattahRozzaq
Hi John,

Really, thank you for spending the time typing and responding to my email.
I think the archive_command returns success; I can see the archive
directory piling up by 16MB every 2 minutes.
Maybe pg_archivecleanup is the solution to clean up the contents of the
archive folder?
How do I do it properly?
What is a pg_archivecleanup example that I can use for this case?
How do I run a dry run for pg_archivecleanup?


Best Regards,
FR

On 10/12/2015, Joshua D. Drake  wrote:
> On 12/09/2015 04:38 PM, FattahRozzaq wrote:
>> Quick information,
>>
>> After I realize, the line "archive_command=/bin/true" is a bad
>> decision, I have revert it back.
>> Now I'm really confused and panic.
>> I don't know what to do, and I don't really understand the
>> postgresql.conf
>> I'm a network engineer, I should handle the network and also
>> postgresql database.
>> Oh man, the office is so good but this part is sucks :((
>
> If the pg_xlog directory is growing it is likely that either:
>
> * wal_keep_segments is set high and your slave is not correctly
> receiving updates.
>
> * You are using a replication slot and the slave is not correctly
> receiving updates.
>
> If your archive_command does not return a success, your pg_xlog will
> also grow but you don't need the archive_command *IF* your streaming
> replication is working *UNLESS* you are also doing archiving or PITR.
>
> Sincerely,
>
> JD
>
> --
> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Announcing "I'm offended" is basically telling the world you can't
> control your own emotions, so everyone else should do it for you.
>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread FattahRozzaq
Quick information,

After I realize, the line "archive_command=/bin/true" is a bad
decision, I have revert it back.
Now I'm really confused and panic.
I don't know what to do, and I don't really understand the postgresql.conf
I'm a network engineer, I should handle the network and also
postgresql database.
Oh man, the office is so good but this part is sucks :((

--
On 10/12/2015, FattahRozzaq  wrote:
> Hi John,
>
> I really don't know why I should keep the wal archives.
> I implement streaming replication into 1 server (standby server).
> I'm really newbie to PostgreSQL but the boss pushed me to handle it
> and implement it in production 

Re: [GENERAL] HELP!!! The WAL Archive is taking up all space

2015-12-09 Thread John R Pierce

On 12/8/2015 4:55 PM, FattahRozzaq wrote:

...I want to limit the total size used by the WAL archive to around 200-400 GB...?


for what purpose are you keeping a wal archive ?

if its for PITR (point in time recovery), you need ALL WAL records since 
the start of a base backup up to the point in time at which you wish to 
recover.



--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help me recovery databases.

2015-06-01 Thread Evi-M
Thank you very much. Well done. Backups are everything :))

01.06.2015, 03:05, "Melvin Davidson" melvin6...@gmail.com:
> If you have a pg_dumpall, or a pg_dump of your databases, you "might" be able
> to get your data back by doing the following. [...]

--
www.help-tec.ru
www.хелп-тек.рф
Best regards, Генералов Юрий

Re: [GENERAL] Help me recovery databases.

2015-05-31 Thread Melvin Davidson
If you have a pg_dumpall, or a pg_dump of your databases, you might be
able to get your data back by doing the following.

1. If your data directory is corrupted or still exists, rename it.
2. Make copies of your postgresql.conf  pg_hba.conf if you still have them.
3. use initdb to recreate the data directory
4. Start PostgreSQL and create the database(s) you need
5, Restore your data from pg_dumpall or pg_dump's.
6. If step 5 works, replace the new pg_hba.conf with the old copy if you
have it.

On Sun, May 31, 2015 at 7:38 PM, Tomas Vondra tomas.von...@2ndquadrant.com
wrote:

 base is where all the data files are located, so the answer is most
 likely 'no'.

 On 05/31/15 15:11, Evi-M wrote:

 Good day, Anyone.
 I lost folders with /base
 pg_xlog and pg_clog mount another hard disk.(500gb)
 This is Postgresql 9.1, Ubuntu 12.04
 Could i restore databases without /base?
 I have archive_status folder.
 --
 Best regards, Генералов Юрий


 --
 Tomas Vondra  http://www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general




-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] Help me recovery databases.

2015-05-31 Thread Tomas Vondra
base is where all the data files are located, so the answer is most 
likely 'no'.


On 05/31/15 15:11, Evi-M wrote:

Good day, Anyone.
I lost folders with /base
pg_xlog and pg_clog mount another hard disk.(500gb)
This is Postgresql 9.1, Ubuntu 12.04
Could i restore databases without /base?
I have archive_status folder.
--
Best regards, Генералов Юрий


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with slow table update

2015-04-19 Thread Tim Uckun
On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov pawel.vese...@gmail.com
wrote:

 I found some dangling prepared transactions



How do you find and remove these?


Re: [GENERAL] Help with slow table update

2015-04-19 Thread Jim Nasby

On 4/19/15 9:53 PM, Tim Uckun wrote:


On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov pawel.vese...@gmail.com
mailto:pawel.vese...@gmail.com wrote:

I found some dangling prepared transactions



How do you find and remove these?


SELECT * FROM pg_prepared_xacts;
ROLLBACK PREPARED xid;
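
Slightly expanded (note ROLLBACK PREPARED takes the gid string shown by
pg_prepared_xacts; the one-day cutoff is just an example):

-- list prepared transactions that have been hanging around
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
WHERE prepared < now() - interval '1 day';

-- then, for each offending gid from that list:
ROLLBACK PREPARED 'the_gid_from_the_list';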
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with slow table update

2015-04-17 Thread Pawel Veselov

 [skipped]


  But remember that if you update or delete a row, removing it from an
 index, the data will stay in that index until vacuum comes along.

 Also, there's no point in doing a REINDEX after a VACUUM FULL;
 vacuum full rebuilds all the indexes for you.


 I was being desperate :)

 I still think there is something very wrong with this particular table.
 First, I have production systems that employ this function on way larger
 data set, and there is no problem (so far, but still). This machine is
 part of a test deployment, there is no constant load, the only data that
 is being written now is when I do these tests. Vacuuming should prune
 all that dead stuff, and if it's absent, it's unclear where is the time
 spent navigating/updating the table with 24 rows :)


 I think you definitely have a problem with dead rows, as evidenced by the
 huge improvement VACUUM FULL made.


 But it's not clear why (and not reasonable, IMHO, that) it wouldn't
 improve past current point.


What I should've done is 'VACUUM FULL VERBOSE'. Once I did, it told me
there were 800k dead rows that can't be removed. After digging around I
found some dangling prepared transactions, going back months. Once I threw
those away, and re-vacuumed, things got back to normal.

Thanks for all your help and advice.


Re: [GENERAL] Help with slow table update

2015-04-15 Thread Igor Neyman


From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Pawel Veselov
Sent: Tuesday, April 14, 2015 8:01 PM
To: Jim Nasby
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Help with slow table update

[skipped]

This is where using sets becomes really tedious, as Postgres severely lacks an 
upsert-like statement.
I don't think there are joins allowed in UPDATE statement, so I will need to 
use WITH query, right?
Also, I'm not sure how LEFT JOIN will help me isolate and insert missed 
entries...

Would it be OK to replace upsert part with merging into a temp table, then 
deleting and inserting from temp table? Is there any penalty for insert/delete 
comparing to update?

[skipped]

Yes, you can do UPDATE with joins 
(http://www.postgresql.org/docs/9.4/static/sql-update.html) like this:

UPDATE table1 A SET col1 = B.col2
  FROM table2 B
  WHERE A.col3 = B.col4;

Regards,
Igor Neyman


Re: [GENERAL] Help with slow table update

2015-04-15 Thread Pawel Veselov

 [skipped]



 This is where using sets becomes really tedious, as Postgres severely
 lacks an upsert-like statement.

 I don't think there are joins allowed in UPDATE statement, so I will need
 to use WITH query, right?

 Also, I'm not sure how LEFT JOIN will help me isolate and insert missed
 entries...



   [skipped]



 Yes, you can do UPDATE with joins (
 http://www.postgresql.org/docs/9.4/static/sql-update.html) like this:



 UPDATE table1 A SET col1 = B.col2

   FROM table2 B

   WHERE A.col3 = B.col4;



I meant using JOIN operator in the update. But it's still possible, though
through WITH query.


Re: [GENERAL] Help with slow table update

2015-04-14 Thread Pawel Veselov
On Mon, Apr 13, 2015 at 6:03 PM, David G. Johnston 
david.g.johns...@gmail.com wrote:

 On Mon, Apr 13, 2015 at 5:01 PM, Pawel Veselov pawel.vese...@gmail.com
 wrote:


 r_agrio_hourly - good, r_agrio_total - bad.

  Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual
 time=2.248..2.248 rows=0 loops=1)
    ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly
  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1
 loops=1)
  Index Cond: ((tagid = 1002::numeric) AND (unitid =
 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND
 (device_type = 3::numeric) AND (placement = 2::numeric))
  Total runtime: 2.281 ms
  Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321)
 (actual time=106.766..106.766 rows=0 loops=1)
    ->  Index Scan using u_r_agrio_total on r_agrio_total
  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
 rows=1 loops=1)
  Index Cond: ((tagid = 1002::numeric) AND (unitid =
 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
  Total runtime: 106.793 ms


 What it is you expect to see here?

 ​What are the results (count and times) for:

 SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;


Result: 8 (the whole table is 24 rows). It returns somewhat with a stumble,
but relatively quickly.
db= explain analyze SELECT count(*) FROM r_agrio_total WHERE tagid = 1002
and unitid = 1002;
   QUERY PLAN

-
 Aggregate  (cost=4.45..4.46 rows=1 width=0) (actual time=327.194..327.195
rows=1 loops=1)
   ->  Index Scan using tag_r_agrio_total on r_agrio_total
 (cost=0.42..4.45 rows=1 width=0) (actual time=0.039..327.189 rows=8
loops=1)
 Index Cond: (tagid = 1002::numeric)
 Filter: (unitid = 1002::numeric)
 Total runtime: 327.228 ms


 SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;


Result is 2869. Returns somewhat quickly. Explain analyze is crazy though:
db= explain analyze SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002
and unitid = 1002;

 QUERY PLAN


 Aggregate  (cost=68134.68..68134.69 rows=1 width=0) (actual
time=15177.211..15177.211 rows=1 loops=1)
   ->  Index Scan using adunit_r_agrio_hourly on r_agrio_hourly
 (cost=0.42..67027.10 rows=443035 width=0) (actual time=0.096..15175.730
rows=2869 loops=1)
 Index Cond: (unitid = 1002::numeric)
 Filter: (tagid = 1002::numeric)
 Total runtime: 15177.240 ms

​More queries along this line might be needed.  The underlying question is
 how many index rows need to be skipped over on total to get the final
 result - or rather are the columns in the index in descending order of
 cardinality?


Idea is - both tables have unique multi-field indices, and each update hits
exactly one row from that index, no more, and all fields from the index are
locked with equality condition on the update. All of the updates (within a
transaction) would always work on a small subset of rows (max a few
hundred, ever; in this case, it's may be around 10). I expect it to be
possible for the server to keep the active working set in the cache at all
times. Since the index is unique, there shouldn't be a reason to re-scan
the table, if a cached row is found, no?


 Any chance you can perform a REINDEX - maybe there is some bloat
 present?  There are queries to help discern if that may be the case, I do
 not know them off the top of my head, but just doing it might be acceptable
 and is definitely quicker if so.


That's the thing - I've done both vacuum full, and re-index. The very first
time I did vacuum full things improved (60 seconds to 7 seconds). Re-index
didn't improve anything (but it was done after vacuum full).


 ​I'm still not really following your presentation but maybe my thoughts
 will spark something.​


Thank you! I hope I clarified this some :)


Re: [GENERAL] Help with slow table update

2015-04-14 Thread Pawel Veselov
On Mon, Apr 13, 2015 at 7:37 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/13/15 7:01 PM, Pawel Veselov wrote:

 Cursors tend to make things slow. Avoid them if you can.


 Is there an alternative to iterating over a number of rows, where a
 direct update query is not an option?

 I really doubt that either the actual processing logic, including use of
 types has anything to do with my problem. This is based on the fact that
 out of the tables that are being changed, only one is exhibiting the
 problem. All of the involved tables have nearly the same structure, and
 have the same logical operations performed on them. I thought maybe the
 bad table is slow because it was first in the list, and Postgres was
 caching the functions results, but I moved things around, and pattern is
 the same.


 I'm guessing that you're essentially processing a queue. Take a look at
 http://www.postgresql.org/message-id/552c750f.2010...@bluetreble.com for
 some ideas. Basically, not only do cursors have non-trivial overhead, doing
 a ton of single-row queries is going to have a non-trivial overhead itself.


Thank you for the pointers. PgQ sounds interesting, it has to be remote for
RDS (I use RDS), but I'll try implementing a solution based on it.
However, of all the time being spent during this update, the breakdown (in
seconds) is:
breakdown is:

update total table: 10.773033
update hourly table: 00.179711
update daily table: 01.082467
update some other table (actually, it has cardinality similar to total
table): 00.168287
clean the queue table: 00.021733
overhead: 00.014922

The overhead is time taken to run the whole procedure, minus all these
other times that have been counted.

(some notes about the daily table below)


  As for your specific question, I suggest you modify the plpgsql
 function so that it's doing an EXPLAIN ANALYZE on the slow table.
 EXPLAIN ANALYZE actually returns a recordset the same way a SELECT
 would, with a single column of type text. So you just need to do
 something with that output. The easiest thing would be to replace
 this in your function:

 UPDATE slow_table SET ...

 to this (untested)

 RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...

 and change the function so it returns SETOF text instead of whatever
 it returns now.


 Thank you, that made it a lot easier to see into what's really going on.
 But the outcome is somewhat the same. The bad table analysis shows a
 very high cost, and thousands of rows, where the table contains only 24
 rows. This time, however, the actual run time is shown, and one can see
 where the time is spent (I was using just a sum of clock_time()s around
 the update statements to see where the problem is).

 r_agrio_hourly - good, r_agrio_total - bad.

   Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329)
 (actual time=2.248..2.248 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly
   (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207
 rows=1 loops=1)
   Index Cond: ((tagid = 1002::numeric) AND (unitid =
 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND
 (device_type = 3::numeric) AND (placement = 2::numeric))
   Total runtime: 2.281 ms
   Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321)
 (actual time=106.766..106.766 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_total on r_agrio_total
   (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
 rows=1 loops=1)
   Index Cond: ((tagid = 1002::numeric) AND (unitid =
 1002::numeric) AND (device_type = 3::numeric) AND (placement =
 2::numeric))
   Total runtime: 106.793 ms


 Keep in mind that the estimated cost is not terribly useful; it's the
 actual times that matter.

 I suspect what's happening here is a combination of things. First, the
 hourly table is basically living in cache, but the total table is not. That
 means that when you go to find a row in the total table you're actually
 hitting the disk instead of pulling the data from memory.



 Second, you may have a lot of dead rows in the total table. I suspect this
 because of the very large amount of time the index scan is taking. Unless
 you're running on an old 10MB MFM drive you'd be pretty hard pressed for
 even 2 IO operations (one for the index leaf page and one for the heap
 page) to take 32ms. I suspect the index scan is having to read many dead
 rows in before it finds a live one, and incurring multiple IOs. Switching to
 EXPLAIN (analyze, buffers) would help confirm that.


That looks most likely to me as well. Most of the updates in a single
batch, for the total table would be on the same record, while for hourly
table it's a lot less. Logically, the tables contain identical data, except
that hourly table breaks it down per hour, and total table contains the
data for all times. The daily table contains the same data per day.

So, if I compared the tables, the total table has the 

Re: [GENERAL] Help with slow table update

2015-04-14 Thread Pawel Veselov
On Tue, Apr 14, 2015 at 3:29 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/14/15 4:44 PM, Pawel Veselov wrote:

 On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby jim.na...@bluetreble.com
 mailto:jim.na...@bluetreble.com wrote:

 On 4/14/15 1:28 PM, Pawel Veselov wrote:


 I wonder if what I need to do, considering that I update a lot
 of the
 same rows as I process this queue, is to create a temp table,
 update
 the rows there, and then update the actual tables once at the
 end...


 That's what I'd do.


 Well, in short, I changed (repeat the body of loop for how many tables
 are there)

 LOOP (item)
UPDATE table with item
IF not found INSERT item INTO table; END IF;
 END LOOP;

 to:

 CREATE TEMP TABLE xq_table (like table) on commit drop;
 LOOP (item)
LOOP
  UPDATE xq_table with item;
  exit when found;
  INSERT INTO xq_table select * from table for update;
  continue when found;
  INSERT item INTO xq_table;
  exit;
END LOOP;
 END LOOP;
 UPDATE table a set (rows) = (xq.rows)
FROM xq_table xq
WHERE (a.keys) = (xq.keys)

 That works significantly faster. The final update statement is very
 fast. The process is somewhat slow in the beginning as it sucks in
 records from total into xq_total, but once all of that is moved into
 the temp table, it rushes through the rest.


 Databases like to think in sets. It will generally be more efficient to do
 set operations instead of a bunch of row-by-row stuff.

 Since you're pulling all of this from some other table your best bet is
 probably something like:

 CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

 CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
 UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
 INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;

 -- Same thing for daily
 -- Same thing for total


In my previous post, there was a problem with that pseudo-code, as it's
missing inserts into the final table at the end of loop, for those records
that need to be inserted and not updated.

This is where using sets becomes really tedious, as Postgres severely lacks
an upsert-like statement.
I don't think there are joins allowed in UPDATE statement, so I will need
to use WITH query, right?
Also, I'm not sure how LEFT JOIN will help me isolate and insert missed
entries...

Would it be OK to replace upsert part with merging into a temp table, then
deleting and inserting from temp table? Is there any penalty for
insert/delete comparing to update?

[skipped]


  But remember that if you update or delete a row, removing it from an
 index, the data will stay in that index until vacuum comes along.

 Also, there's no point in doing a REINDEX after a VACUUM FULL;
 vacuum full rebuilds all the indexes for you.


 I was being desperate :)

 I still think there is something very wrong with this particular table.
 First, I have production systems that employ this function on way larger
 data set, and there is no problem (so far, but still). This machine is
 part of a test deployment, there is no constant load, the only data that
 is being written now is when I do these tests. Vacuuming should prune
 all that dead stuff, and if it's absent, it's unclear where is the time
 spent navigating/updating the table with 24 rows :)


 I think you definitely have a problem with dead rows, as evidenced by the
 huge improvement VACUUM FULL made.


But it's not clear why (and not reasonable, IMHO, that) it wouldn't improve
past current point.


Re: [GENERAL] Help with slow table update

2015-04-14 Thread Pawel Veselov
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/14/15 1:28 PM, Pawel Veselov wrote:


 I wonder if what I need to do, considering that I update a lot of the
 same rows as I process this queue, is to create a temp table, update
 the rows there, and then update the actual tables once at the end...


 That's what I'd do.


Well, in short, I changed (repeat the body of loop for how many tables are
there)

LOOP (item)
  UPDATE table with item
  IF not found INSERT item INTO table; END IF;
END LOOP;

to:

CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
  LOOP
UPDATE xq_table with item;
exit when found;
INSERT INTO xq_table select * from table for update;
continue when found;
INSERT item INTO xq_table;
exit;
  END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
  FROM xq_table xq
  WHERE (a.keys) = (xq.keys)

That works significantly faster. The final update statement is very fast.
The process is somewhat slow in the beginning as it sucks in records from
total into xq_total, but once all of that is moved into the temp table,
it rushes through the rest.


 The other option would be to use a constraint trigger paired with a
 per-row trigger on the hourly table to drive the daily table, and on the
 daily table to drive the total table. The way that would work is the
 per-row table would simply keep track of all the unique records that were
 changed in a statement (presumably by putting them in a temp table). Once
 the statement is done, the constraint trigger would fire; it would
 summarize all the changed data and do a much smaller number of updates to
 the table being summarized into.


I'm not sure how I would be able to avoid the same number of changes on the
total table, trigger would fire on each update, won't it? So, same problem
with a lot of changes on a table...


 BTW, you also made a comment about not having to hit the table if you look
 at something in an index. You can only do that if all the data you need is
 in the index, AND the page with the record is marked as being all-visible
 (google for Postgres Visibility Map). If that's not the case then you still
 have to pull the row in the table in, in order to determine visibility. The
 only case where you can still avoid hitting the table is something like a
 NOT EXISTS; if you can't find any entries in the index for something then
 they definitely won't be in the table.


What I was saying is that if a table has a unique index, and there is a
cached fact that a particular index value points to a particular row, there
shouldn't be a need to re-scan the index again to search for any more
matching values (which would be necessary if the index was not unique).
Again, all considering the size of the index, the amount of different index
values that are being queried, etc.


 But remember that if you update or delete a row, removing it from an
 index, the data will stay in that index until vacuum comes along.

 Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full
 rebuilds all the indexes for you.


I was being desperate :)

I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is part
of a test deployment, there is no constant load, the only data that is
being written now is when I do these tests. Vacuuming should prune all that
dead stuff, and if it's absent, it's unclear where the time is spent
navigating/updating the table with 24 rows :)


Re: [GENERAL] Help with slow table update

2015-04-14 Thread Jim Nasby

On 4/14/15 1:28 PM, Pawel Veselov wrote:


I wonder if what I need to do, considering that I update a lot of the
same rows as I process this queue, is to create a temp table, update
the rows there, and then update the actual tables once at the end...


That's what I'd do.

The other option would be to use a constraint trigger paired with a 
per-row trigger on the hourly table to drive the daily table, and on the 
daily table to drive the total table. The way that would work is the 
per-row table would simply keep track of all the unique records that 
were changed in a statement (presumably by putting them in a temp 
table). Once the statement is done, the constraint trigger would fire; 
it would summarize all the changed data and do a much smaller number of 
updates to the table being summarized into.


BTW, you also made a comment about not having to hit the table if you 
look at something in an index. You can only do that if all the data you 
need is in the index, AND the page with the record is marked as being 
all-visible (google for Postgres Visibility Map). If that's not the case 
then you still have to pull the row in the table in, in order to 
determine visibility. The only case where you can still avoid hitting 
the table is something like a NOT EXISTS; if you can't find any entries 
in the index for something then they definitely won't be in the table. 
But remember that if you update or delete a row, removing it from an 
index, the data will stay in that index until vacuum comes along.


Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum 
full rebuilds all the indexes for you.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with slow table update

2015-04-14 Thread Jim Nasby

On 4/14/15 4:44 PM, Pawel Veselov wrote:

On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby jim.na...@bluetreble.com
mailto:jim.na...@bluetreble.com wrote:

On 4/14/15 1:28 PM, Pawel Veselov wrote:


I wonder if what I need to do, considering that I update a lot
of the
same rows as I process this queue, is to create a temp table,
update
the rows there, and then update the actual tables once at the end...


That's what I'd do.


Well, in short, I changed (repeat the body of loop for how many tables
are there)

LOOP (item)
   UPDATE table with item
   IF not found INSERT item INTO table; END IF;
END LOOP;

to:

CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
   LOOP
 UPDATE xq_table with item;
 exit when found;
 INSERT INTO xq_table select * from table for update;
 continue when found;
 INSERT item INTO xq_table;
 exit;
   END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
   FROM xq_table xq
   WHERE (a.keys) = (xq.keys)

That works significantly faster. The final update statement is very
fast. The process is somewhat slow in the beginning as it sucks in
records from total into xq_total, but once all of that is moved into
the temp table, it rushes through the rest.


Databases like to think in sets. It will generally be more efficient to 
do set operations instead of a bunch of row-by-row stuff.


Since you're pulling all of this from some other table your best bet is 
probably something like:


CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;

-- Same thing for daily
-- Same thing for total
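To make the first line concrete: one way to materialize the DELETE ...
RETURNING output into a temp table is a data-modifying CTE feeding an INSERT.
A sketch, assuming the queue table from this thread (xq_agr) and no batching
condition:

CREATE TEMP TABLE raw (LIKE xq_agr) ON COMMIT DROP;

WITH moved AS (
    DELETE FROM xq_agr RETURNING *
)
INSERT INTO raw SELECT * FROM moved;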


The other option would be to use a constraint trigger paired with a
per-row trigger on the hourly table to drive the daily table, and on
the daily table to drive the total table. The way that would work is
the per-row table would simply keep track of all the unique records
that were changed in a statement (presumably by putting them in a
temp table). Once the statement is done, the constraint trigger
would fire; it would summarize all the changed data and do a much
smaller number of updates to the table being summarized into.


I'm not sure how I would be able to avoid the same number of changes on
the total table, trigger would fire on each update, won't it? So, same
problem with a lot of changes on a table...


The difference is that you'd be doing plain INSERTs into a temp table 
and then summarizing that. That's going to be a LOT more efficient than 
a slew of updates on an existing table.



BTW, you also made a comment about not having to hit the table if
you look at something in an index. You can only do that if all the
data you need is in the index, AND the page with the record is
marked as being all-visible (google for Postgres Visibility Map). If
that's not the case then you still have to pull the row in the table
in, in order to determine visibility. The only case where you can
still avoid hitting the table is something like a NOT EXISTS; if you
can't find any entries in the index for something then they
definitely won't be in the table.


What I was saying is that if a table has a unique index, and there is a
cached fact that a particular index value points to a particular row,
there shouldn't be a need to re-scan the index again to search for any
more matching values (which would be necessary if the index was not
unique). Again, all considering the size of the index, the amount of
different index values that are being queried, etc.


It still has to rescan because of visibility concerns.


But remember that if you update or delete a row, removing it from an
index, the data will stay in that index until vacuum comes along.

Also, there's no point in doing a REINDEX after a VACUUM FULL;
vacuum full rebuilds all the indexes for you.


I was being desperate :)

I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is
part of a test deployment, there is no constant load, the only data that
is being written now is when I do these tests. Vacuuming should prune
all that dead stuff, and if it's absent, it's unclear where is the time
spent navigating/updating the table with 24 rows :)


I think you definitely have a problem with dead rows, as evidenced by 
the huge improvement VACUUM FULL made.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with slow table update

2015-04-13 Thread David G. Johnston
On Mon, Apr 13, 2015 at 5:01 PM, Pawel Veselov pawel.vese...@gmail.com
wrote:


 r_agrio_hourly - good, r_agrio_total - bad.

  Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual
 time=2.248..2.248 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly
  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1
 loops=1)
  Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
 AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type =
 3::numeric) AND (placement = 2::numeric))
  Total runtime: 2.281 ms
  Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321)
 (actual time=106.766..106.766 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_total on r_agrio_total
  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
 rows=1 loops=1)
  Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
 AND (device_type = 3::numeric) AND (placement = 2::numeric))
  Total runtime: 106.793 ms


What it is you expect to see here?

​What are the results (count and times) for:

SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;
SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;

​More queries along this line might be needed.  The underlying question is
how many index rows need to be skipped over on total to get the final
result - or rather are the columns in the index in descending order of
cardinality?
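For instance, one way to check that ordering directly (the column names are
taken from the index conditions above):

SELECT count(DISTINCT tagid)       AS tagid_card,
       count(DISTINCT unitid)      AS unitid_card,
       count(DISTINCT device_type) AS device_type_card,
       count(DISTINCT placement)   AS placement_card
  FROM r_agrio_total;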

Any chance you can perform a REINDEX - maybe there is some bloat
present?  There are queries to help discern if that may be the case, I do
not know them off the top of my head, but just doing it might be acceptable
and is definitely quicker if so.
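One such check, assuming the pgstattuple contrib module is available on the
instance:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- dead_tuple_percent and free_percent give a rough picture of table bloat
SELECT * FROM pgstattuple('r_agrio_total');
SELECT * FROM pgstatindex('u_r_agrio_total');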

​I'm still not really following your presentation but maybe my thoughts
will spark something.​

​David J.
​


Re: [GENERAL] Help with slow table update

2015-04-13 Thread Pawel Veselov
On Sun, Apr 12, 2015 at 5:40 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 4/9/15 6:18 PM, Pawel Veselov wrote:

 Hi.

 I have a plpgsql procedure that updates a few similar tables.
 for some reason, updates on one of the tables take a lot longer than the
 updates on the other ones. The difference is, say, 7 seconds vs. 80
 milliseconds.

 the procedure uses cursors and record variables to do the updates. For
 example:

  update r_agrio_total set
    unserved = unserved + (agrow->>'unserved')::numeric(38),
r_brkconn = mush_brk_conn(r_brkconn, q_item.r_brkconn),
  where
tagid = _tagid and
    unitid = (akey->>'unitid')::numeric and
    placement = (akey->>'placement')::numeric and
    device_type = (akey->>'device_type')::numeric;

 There is another table (xq_agr) that is read record by record, and for
 each of those records, such update is executed.

 I was trying to select analyze the updates to see where the time could
 be spent.
 There are only 24 rows in the bad table, and 3,400 rows in the good
 table. So, for the bad table, most of the updates will be on the same
 rows. The times were measured on processing 100 original records.

 When I'm analyzing pure update statements, I don't see anything strange.

 bad table: explain analyze update r_agrio_total set unconfirmed =
 unconfirmed +0 where tagid = 1000 and unitid = 1000 and placement = 0
 and device_type = 100;

 RESULT:
   Update on r_agrio_total  (cost=0.42..4.46 rows=1 width=321) (actual
 time=0.253..0.253 rows=0 loops=1)
    ->  Index Scan using tag_r_agrio_total on r_agrio_total
   (cost=0.42..4.46 rows=1 width=321) (actual time=0.037..0.041 rows=1
 loops=1)
   Index Cond: (tagid = 1000::numeric)
   Filter: ((unitid = 1000::numeric) AND (placement = 0::numeric)
 AND (device_type = 100::numeric))
   Rows Removed by Filter: 7
   Total runtime: 0.282 ms

 good table: explain analyze update r_agrio_hourly set unconfirmed =
 unconfirmed +0 where tagid = 1000 and unitid = 1000 and placement = 0
 and device_type = 100 and rowdate = '2015-02-23T13';

 RESULT:
   Update on r_agrio_hourly  (cost=0.42..17.36 rows=6 width=329) (actual
 time=0.102..0.102 rows=0 loops=1)
    ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly
   (cost=0.42..17.36 rows=6 width=329) (actual time=0.047..0.048 rows=1
 loops=1)
   Index Cond: ((tagid = 1000::numeric) AND (unitid =
 1000::numeric) AND ((rowdate)::text = '2015-02-23T13'::text) AND
 (device_type = 100::numeric) AND (placement = 0::numeric))
   Total runtime: 0.135 ms

 When I try doing it with WITH statement (really, to apply the actual
 data that the plpgsql function uses), there is something strange in the
 bad table.

 explain analyze
 with SRC as (select * from xq_agr where id = 914830)
  update r_agrio_total set
unconfirmed = unconfirmed +
 (SRC.r_agrio->>'unconfirmed')::numeric(38)
  from SRC
  where
    tagid = (SRC.r_agrio->'key'->>'tagid')::numeric and
    unitid = (SRC.r_agrio->'key'->>'unit')::numeric and
    placement = (SRC.r_agrio->'key'->>'placement')::numeric and
    device_type = (SRC.r_agrio->'key'->>'device_type')::numeric;

 RESULT:
   Update on r_agrio_total  (cost=8.91..32777.51 rows=19331 width=409)
 (actual time=0.107..0.107 rows=0 loops=1)
 CTE src
   ->  Index Scan using xq_agr_pkey on xq_agr  (cost=0.42..8.44
 rows=1 width=379) (actual time=0.026..0.027 rows=1 loops=1)
 Index Cond: (id = 914830)
 ->  Nested Loop  (cost=0.46..32769.07 rows=19331 width=409) (actual
 time=0.107..0.107 rows=0 loops=1)
   ->  CTE Scan on src  (cost=0.00..0.02 rows=1 width=88) (actual
 time=0.032..0.033 rows=1 loops=1)
   ->  Index Scan using u_r_agrio_total on r_agrio_total
   (*cost=0.46..32285.78 rows=19331* width=321) (actual time=0.001..0.001

 rows=0 loops=1)
  Index Cond: ((tagid = (((src.r_agrio -> 'key'::text) ->>
 'tagid'::text))::numeric) AND (unitid = (((src.r_agrio -> 'key'::text)
 ->> 'unit'::text))::numeric) AND (device_type = (((src.r_agrio ->
 'key'::text) ->> 'device_type'::text))::numeric) AND (placement =
 (((src.r_agrio -> 'key'::text) ->> 'placement'::text))::numeric))
   Total runtime: 0.155 ms

 explain analyze
 with SRC as (select * from xq_agr where id = 914830)
  update r_agrio_hourly set
unconfirmed = unconfirmed +
 (SRC.r_agrio->>'unconfirmed')::numeric(38)
  from SRC
  where
    tagid = (SRC.r_agrio->'key'->>'tagid')::numeric and
    unitid = (SRC.r_agrio->'key'->>'unit')::numeric and
    placement = (SRC.r_agrio->'key'->>'placement')::numeric and
    device_type = (SRC.r_agrio->'key'->>'device_type')::numeric and
    rowdate = (SRC.r_agrio->'key'->>'rowdate');

 RESULT:
   Update on r_agrio_hourly  (cost=8.91..52.91 rows=20 width=417) (actual
 time=0.123..0.123 rows=0 loops=1)
 CTE src
   ->  Index Scan using xq_agr_pkey on xq_agr 

Re: [GENERAL] Help with slow table update

2015-04-13 Thread Jim Nasby

On 4/13/15 7:01 PM, Pawel Veselov wrote:

Cursors tend to make things slow. Avoid them if you can.


Is there an alternative to iterating over a number of rows, where a
direct update query is not an option?

I really doubt that either the actual processing logic, including use of
types has anything to do with my problem. This is based on the fact that
out of the tables that are being changed, only one is exhibiting the
problem. All of the involved tables have nearly the same structure, and
have the same logical operations performed on them. I thought maybe the
bad table is slow because it was first in the list, and Postgres was
caching the functions results, but I moved things around, and pattern is
the same.


I'm guessing that you're essentially processing a queue. Take a look at 
http://www.postgresql.org/message-id/552c750f.2010...@bluetreble.com for 
some ideas. Basically, not only do cursors have non-trivial overhead, 
doing a ton of single-row queries is going to have a non-trivial 
overhead itself.



As for your specific question, I suggest you modify the plpgsql
function so that it's doing an EXPLAIN ANALYZE on the slow table.
EXPLAIN ANALYZE actually returns a recordset the same way a SELECT
would, with a single column of type text. So you just need to do
something with that output. The easiest thing would be to replace
this in your function:

UPDATE slow_table SET ...

to this (untested)

RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...

and change the function so it returns SETOF text instead of whatever
it returns now.


Thank you, that made it a lot easier to see into what's really going on.
But the outcome is somewhat the same. The bad table analysis shows a
very high cost, and thousands of rows, where the table contains only 24
rows. This time, however, the actual run time is shown, and one can see
where the time is spent (I was using just a sum of clock_time()s around
the update statements to see where the problem is).

r_agrio_hourly - good, r_agrio_total - bad.

  Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329)
(actual time=2.248..2.248 rows=0 loops=1)
  ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly
  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207
rows=1 loops=1)
  Index Cond: ((tagid = 1002::numeric) AND (unitid =
1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND
(device_type = 3::numeric) AND (placement = 2::numeric))
  Total runtime: 2.281 ms
  Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321)
(actual time=106.766..106.766 rows=0 loops=1)
  ->  Index Scan using u_r_agrio_total on r_agrio_total
  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
rows=1 loops=1)
  Index Cond: ((tagid = 1002::numeric) AND (unitid =
1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
  Total runtime: 106.793 ms


Keep in mind that the estimated cost is not terribly useful; it's the 
actual times that matter.


I suspect what's happening here is a combination of things. First, the 
hourly table is basically living in cache, but the total table is not. 
That means that when you go to find a row in the total table you're 
actually hitting the disk instead of pulling the data from memory.


Second, you may have a lot of dead rows in the total table. I suspect 
this because of the very large amount of time the index scan is taking. 
Unless you're running on an old 10MB MFM drive you'd be pretty hard 
pressed for even 2 IO operations (one for the index leaf page and one 
for the heap page) to take 32ms. I suspect the index scan is having to 
read many dead rows in before it finds a live one, and incurring 
multiple IOs. Switching to EXPLAIN (analyze, buffers) would help confirm 
that.
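For example (the filter values are the ones from the plans above):

EXPLAIN (ANALYZE, BUFFERS)
UPDATE r_agrio_total SET unconfirmed = unconfirmed + 0
 WHERE tagid = 1002 AND unitid = 1002
   AND device_type = 3 AND placement = 2;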


Third, I think something odd is happening with the update itself. I'm 
pretty sure that the index scan itself is visiting the heap pages, so 
each page should be in shared buffers by the time each tuple hits the 
update node. That makes me wonder what on earth is taking 60ms to update 
the tuple. I suspect it's going into either finding a free buffer to put 
the new tuple on, or waiting to try and extend the relation. Selecting 
ctid from the freshly updated rows and comparing the first number to the 
total number of pages in the heap would show if the new tuples are all 
ending up at the end of the heap.
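A rough way to check that, using the table from this thread:

-- ctid is (page, tuple); look at the page numbers of the freshly updated rows
SELECT ctid FROM r_agrio_total
 WHERE tagid = 1002 AND unitid = 1002
   AND device_type = 3 AND placement = 2;

-- number of heap pages as of the last VACUUM/ANALYZE
SELECT relpages FROM pg_class WHERE relname = 'r_agrio_total';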

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help with slow table update

2015-04-12 Thread Jim Nasby

On 4/9/15 6:18 PM, Pawel Veselov wrote:

Hi.

I have a plpgsql procedure that updates a few similar tables.
for some reason, updates on one of the tables take a lot longer than the
updates on the other ones. The difference is, say, 7 seconds vs. 80
milliseconds.

the procedure uses cursors and record variables to do the updates. For
example:

 update r_agrio_total set
   unserved = unserved + (agrow->>'unserved')::numeric(38),
   r_brkconn = mush_brk_conn(r_brkconn, q_item.r_brkconn),
 where
   tagid = _tagid and
   unitid = (akey->>'unitid')::numeric and
   placement = (akey->>'placement')::numeric and
   device_type = (akey->>'device_type')::numeric;

There is another table (xq_agr) that is read record by record, and for
each of those records, such update is executed.

I was trying to select analyze the updates to see where the time could
be spent.
There are only 24 rows in the bad table, and 3,400 rows in the good
table. So, for the bad table, most of the updates will be on the same
rows. The times were measured on processing 100 original records.

When I'm analyzing pure update statements, I don't see anything strange.

bad table: explain analyze update r_agrio_total set unconfirmed =
unconfirmed +0 where tagid = 1000 and unitid = 1000 and placement = 0
and device_type = 100;

RESULT:
  Update on r_agrio_total  (cost=0.42..4.46 rows=1 width=321) (actual
time=0.253..0.253 rows=0 loops=1)
->  Index Scan using tag_r_agrio_total on r_agrio_total
  (cost=0.42..4.46 rows=1 width=321) (actual time=0.037..0.041 rows=1
loops=1)
  Index Cond: (tagid = 1000::numeric)
  Filter: ((unitid = 1000::numeric) AND (placement = 0::numeric)
AND (device_type = 100::numeric))
  Rows Removed by Filter: 7
  Total runtime: 0.282 ms

good table: explain analyze update r_agrio_hourly set unconfirmed =
unconfirmed +0 where tagid = 1000 and unitid = 1000 and placement = 0
and device_type = 100 and rowdate = '2015-02-23T13';

RESULT:
  Update on r_agrio_hourly  (cost=0.42..17.36 rows=6 width=329) (actual
time=0.102..0.102 rows=0 loops=1)
->  Index Scan using u_r_agrio_hourly on r_agrio_hourly
  (cost=0.42..17.36 rows=6 width=329) (actual time=0.047..0.048 rows=1
loops=1)
  Index Cond: ((tagid = 1000::numeric) AND (unitid =
1000::numeric) AND ((rowdate)::text = '2015-02-23T13'::text) AND
(device_type = 100::numeric) AND (placement = 0::numeric))
  Total runtime: 0.135 ms

When I try doing it with WITH statement (really, to apply the actual
data that the plpgsql function uses), there is something strange in the
bad table.

explain analyze
with SRC as (select * from xq_agr where id = 914830)
 update r_agrio_total set
   unconfirmed = unconfirmed +
(SRC.r_agrio->>'unconfirmed')::numeric(38)
 from SRC
 where
   tagid = (SRC.r_agrio->'key'->>'tagid')::numeric and
   unitid = (SRC.r_agrio->'key'->>'unit')::numeric and
   placement = (SRC.r_agrio->'key'->>'placement')::numeric and
   device_type = (SRC.r_agrio->'key'->>'device_type')::numeric;

RESULT:
  Update on r_agrio_total  (cost=8.91..32777.51 rows=19331 width=409)
(actual time=0.107..0.107 rows=0 loops=1)
CTE src
  ->  Index Scan using xq_agr_pkey on xq_agr  (cost=0.42..8.44
rows=1 width=379) (actual time=0.026..0.027 rows=1 loops=1)
Index Cond: (id = 914830)
->  Nested Loop  (cost=0.46..32769.07 rows=19331 width=409) (actual
time=0.107..0.107 rows=0 loops=1)
  ->  CTE Scan on src  (cost=0.00..0.02 rows=1 width=88) (actual
time=0.032..0.033 rows=1 loops=1)
  ->  Index Scan using u_r_agrio_total on r_agrio_total
  (*cost=0.46..32285.78 rows=19331* width=321) (actual time=0.001..0.001
rows=0 loops=1)
Index Cond: ((tagid = (((src.r_agrio -> 'key'::text) ->>
'tagid'::text))::numeric) AND (unitid = (((src.r_agrio -> 'key'::text)
->> 'unit'::text))::numeric) AND (device_type = (((src.r_agrio ->
'key'::text) ->> 'device_type'::text))::numeric) AND (placement =
(((src.r_agrio -> 'key'::text) ->> 'placement'::text))::numeric))
  Total runtime: 0.155 ms

explain analyze
with SRC as (select * from xq_agr where id = 914830)
 update r_agrio_hourly set
   unconfirmed = unconfirmed +
(SRC.r_agrio->>'unconfirmed')::numeric(38)
 from SRC
 where
   tagid = (SRC.r_agrio->'key'->>'tagid')::numeric and
   unitid = (SRC.r_agrio->'key'->>'unit')::numeric and
   placement = (SRC.r_agrio->'key'->>'placement')::numeric and
   device_type = (SRC.r_agrio->'key'->>'device_type')::numeric and
   rowdate = (SRC.r_agrio->'key'->>'rowdate');

RESULT:
  Update on r_agrio_hourly  (cost=8.91..52.91 rows=20 width=417) (actual
time=0.123..0.123 rows=0 loops=1)
CTE src
  ->  Index Scan using xq_agr_pkey on xq_agr  (cost=0.42..8.44
rows=1 width=379) (actual time=0.023..0.024 rows=1 loops=1)
Index Cond: (id = 914830)
->  Nested Loop  (cost=0.47..44.47 rows=20 width=417) 

Re: [GENERAL] Help with tokenization of age-ranges in full text search

2015-02-25 Thread Alvaro Herrera
Mason Hale wrote:
 Hello, I've got a 9.3 database hosted at Heroku.
 
 I'm using full text search to search for group names in part of my application,
 and some of my group names are the names of youth sports age groups like
 "Boys 9-10" or "Girls 11-12".
 
 I would like for a search for the terms "Boys", "Boys 9-10", "9", "10" or
 "9-10" to match "Boys 9-10".

Hm, so if there's a sport for "Boys 8-10", what will you do when it
doesn't match a query for "9"?  Does this matter?  I mean, maybe
tokenization is not the most appropriate thing to do in this case.

 So my question is -- can I get the tokenization that I want out of a
 configuration of the stock available token types?

The tokenizer stuff is not the most configurable part of the FTS stuff,
sadly.
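For what it's worth, ts_debug shows how the default parser splits such a
string, which is the quickest way to see what the stock token types can and
cannot give you:

SELECT alias, token, lexemes
  FROM ts_debug('english', 'Boys 9-10');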

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] help troubleshooting invalid page header error

2014-12-29 Thread Kevin Grittner
Cory Zue c...@dimagi.com wrote:

 I was able to get the database back to a normal functional state
 using the zero_damaged_pages flag. However, after getting
 everything working and starting to use the database again, I am
 again getting invalid page header errors on a certain table.

 Does this imply there is a hardware issue on my machine? Is there
 anything else that could be causing this to come back?

In my personal experience bad hardware is the most common cause,
followed by buggy device drivers (where an OS software upgrade
prevented further corruption), followed by using incorrect
procedures for backup, restore, replication setup, or node
promotion.  For example, not excluding files under pg_xlog from a
base backup or deleting (or moving) the backup_label file can cause
corruption.

For a more complete discussion, see this blog page:

http://rhaas.blogspot.com/2012/03/why-is-my-database-corrupted.html
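As an aside, in that error message 16384 is the database's OID and 76623 the
damaged relation's filenode, so running something like this inside the
affected database should name the relation:

SELECT relname, relkind FROM pg_class WHERE relfilenode = 76623;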

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] help troubleshooting invalid page header error

2014-12-28 Thread Cory Zue
Hi again,

I was able to get the database back to a normal functional state using
the zero_damaged_pages
flag. However, after getting everything working and starting to use the
database again, I am again getting invalid page header errors on a
certain table.

Does this imply there is a hardware issue on my machine? Is there anything
else that could be causing this to come back?

thanks,
Cory

On Fri, Dec 26, 2014 at 5:15 PM, Cory Zue c...@dimagi.com wrote:

 Hi Chiru,

 I am trying to pg_dump the database to have a snapshot of the current
 state. I've turned on 'zero_damaged_pages' but pg_dump is still failing
 with an invalid page header error - this time from what looks like a
 sequence object that is auto-setting IDs on a table. Any advice on how to
 remove this error?

 Here is the full query that's failing:

 SELECT sequence_name, start_value, last_value, increment_by, CASE WHEN
 increment_by > 0 AND max_value = 9223372036854775807 THEN NULL  WHEN
 increment_by < 0 AND max_value = -1 THEN NULL  ELSE max_value END AS
 max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL  WHEN
 increment_by < 0 AND min_value = -9223372036854775807 THEN NULL  ELSE
 min_value END AS min_value, cache_value, is_cycled, is_called from
 unfinishedsubmissionstub_id_seq

 On Fri, Dec 26, 2014 at 2:35 PM, chiru r chir...@gmail.com wrote:

 Hi Cory,

 After recovering table turn off *zero_damaged_pages  *parameter.


 On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue c...@dimagi.com wrote:

 Hi all,

 Thanks for the responses. Chiru, I'm looking into your suggestion.

 Sameer, here is the kernel version info:

 Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT
 2014 x86_64 x86_64 x86_64 GNU/Linux

 Does that seem like it could be a problematic version?

 More generally - I'm still wondering whether I should chalk this failure
 up to a transient/random issue, or whether I should be more worried about
 the hardware on the machine. According to our diagnostic tools,  disk and
 memory are fine, but it's still not clear to me how it got into this state.
 Any general bits of information regarding the potential causes of these
 types of issues would be much appreciated.

 thanks,
 Cory


 On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar sameer.ku...@ashnik.com
 wrote:

 On 23 Dec 2014 12:05, Cory Zue c...@dimagi.com wrote:
 
  Hi all,
 
  Our postgres instance on one of our production machines has recently
 been returning errors of the form DatabaseError: invalid page header in
 block 1 of relation base/16384/76623 from normal queries. I've been
 reading that these are often linked to hardware errors, but I would like to
 better understand what else it could be or how to determine that for sure.
 I've filled out the standard issue reporting template below. Any feedback
 or troubleshooting instructions would be much appreciated.
 
  ---
  A description of what you are trying to achieve and what results you
 expect.:
 
  Intermittent queries are failing with the error DatabaseError:
 invalid page header in block 1 of relation base/16384/76623
 
  PostgreSQL version number you are running:
 
  PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc
 (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
 
  How you installed PostgreSQL:
 
  from standard package installer
 
  Changes made to the settings in the postgresql.conf file:
 
 
   name |   current_setting   |
  source
 
 --+-+--
   checkpoint_completion_target | 0.9 |
 configuration file
   checkpoint_segments  | 32  |
 configuration file
   checkpoint_timeout   | 15min   |
 configuration file
   DateStyle| ISO, MDY|
 configuration file
   default_text_search_config   | pg_catalog.english  |
 configuration file
   effective_cache_size | 1GB |
 configuration file
   lc_messages  | en_US.UTF-8 |
 configuration file
   lc_monetary  | en_US.UTF-8 |
 configuration file
   lc_numeric   | en_US.UTF-8 |
 configuration file
   lc_time  | en_US.UTF-8 |
 configuration file
   log_checkpoints  | on  |
 configuration file
   log_connections  | off |
 configuration file
   log_destination  | csvlog  |
 configuration file
   log_directory| /opt/data/pgsql/data/pg_log |
 configuration file
   log_disconnections   | off |
 configuration file
   log_duration | on  |
 configuration file
   log_filename | postgres-%Y-%m-%d_%H%M%S|
 configuration file
   

Re: [GENERAL] help troubleshooting invalid page header error

2014-12-26 Thread Sameer Kumar
On 23 Dec 2014 12:05, Cory Zue c...@dimagi.com wrote:

 Hi all,

 Our postgres instance on one of our production machines has recently been
returning errors of the form DatabaseError: invalid page header in block 1
of relation base/16384/76623 from normal queries. I've been reading that
these are often linked to hardware errors, but I would like to better
understand what else it could be or how to determine that for sure. I've
filled out the standard issue reporting template below. Any feedback or
troubleshooting instructions would be much appreciated.

 ---
 A description of what you are trying to achieve and what results you
expect.:

 Intermittent queries are failing with the error DatabaseError: invalid
page header in block 1 of relation base/16384/76623

 PostgreSQL version number you are running:

 PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

 How you installed PostgreSQL:

 from standard package installer

 Changes made to the settings in the postgresql.conf file:


  name |   current_setting   |
 source

--+-+--
  checkpoint_completion_target | 0.9 |
configuration file
  checkpoint_segments  | 32  |
configuration file
  checkpoint_timeout   | 15min   |
configuration file
  DateStyle| ISO, MDY|
configuration file
  default_text_search_config   | pg_catalog.english  |
configuration file
  effective_cache_size | 1GB |
configuration file
  lc_messages  | en_US.UTF-8 |
configuration file
  lc_monetary  | en_US.UTF-8 |
configuration file
  lc_numeric   | en_US.UTF-8 |
configuration file
  lc_time  | en_US.UTF-8 |
configuration file
  log_checkpoints  | on  |
configuration file
  log_connections  | off |
configuration file
  log_destination  | csvlog  |
configuration file
  log_directory| /opt/data/pgsql/data/pg_log |
configuration file
  log_disconnections   | off |
configuration file
  log_duration | on  |
configuration file
  log_filename | postgres-%Y-%m-%d_%H%M%S|
configuration file
  log_lock_waits   | on  |
configuration file
  log_min_duration_statement   | 250ms   |
configuration file
  log_rotation_age | 1d  |
configuration file
  log_rotation_size| 1GB |
configuration file
  log_temp_files   | 0   |
configuration file
  log_timezone | Asia/Kolkata| command line
  log_truncate_on_rotation | on  |
configuration file
  logging_collector| on  |
configuration file
  maintenance_work_mem | 768MB   |
configuration file
  max_connections  | 500 |
configuration file
  max_stack_depth  | 2MB | environment
variable
  port | 5432| command line
  shared_buffers   | 4GB |
configuration file
  ssl  | on  |
configuration file
  TimeZone | Asia/Kolkata| command line
  timezone_abbreviations   | Default | command line
  wal_buffers  | 16MB|
configuration file
  work_mem | 48MB|
configuration file

 It's also probably worth noting that postgres is installed on an
encrypted volume which is mounted using ecryptfs.

 Operating system and version:

 RedHatEnterpriseServer, version 6.6

 What program you're using to connect to PostgreSQL:

 Python (django)

 Is there anything relevant or unusual in the PostgreSQL server logs?:

 I see lots of instances of this error (and similar). I'm not sure what
else I should be looking for.

 What you were doing when the error happened / how to cause the error:

 I haven't explicitly tried to reproduce it, but it seems to consistently
happen with certain queries. However, the system was rebooted shortly
before the errors started occuring. The system was rebooted because another
database (elasticsearch) was having problems on the same machine and the
reboot was to attempt to resolve things.

 The EXACT TEXT of the error message you're getting, if there is one:

 DatabaseError: invalid 

Re: [GENERAL] help troubleshooting invalid page header error

2014-12-26 Thread Cory Zue
Hi all,

Thanks for the responses. Chiru, I'm looking into your suggestion.

Sameer, here is the kernel version info:

Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT 2014
x86_64 x86_64 x86_64 GNU/Linux

Does that seem like it could be a problematic version?

More generally - I'm still wondering whether I should chalk this failure up
to a transient/random issue, or whether I should be more worried about the
hardware on the machine. According to our diagnostic tools,  disk and
memory are fine, but it's still not clear to me how it got into this state.
Any general bits of information regarding the potential causes of these
types of issues would be much appreciated.

thanks,
Cory


On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar sameer.ku...@ashnik.com
wrote:

 On 23 Dec 2014 12:05, Cory Zue c...@dimagi.com wrote:
 
  Hi all,
 
  Our postgres instance on one of our production machines has recently
 been returning errors of the form DatabaseError: invalid page header in
 block 1 of relation base/16384/76623 from normal queries. I've been
 reading that these are often linked to hardware errors, but I would like to
 better understand what else it could be or how to determine that for sure.
 I've filled out the standard issue reporting template below. Any feedback
 or troubleshooting instructions would be much appreciated.
 
  ---
  A description of what you are trying to achieve and what results you
 expect.:
 
  Intermittent queries are failing with the error DatabaseError: invalid
 page header in block 1 of relation base/16384/76623
 
  PostgreSQL version number you are running:
 
  PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
 
  How you installed PostgreSQL:
 
  from standard package installer
 
  Changes made to the settings in the postgresql.conf file:
 
 
   name |   current_setting   |
  source
 
 --+-+--
   checkpoint_completion_target | 0.9 |
 configuration file
   checkpoint_segments  | 32  |
 configuration file
   checkpoint_timeout   | 15min   |
 configuration file
   DateStyle| ISO, MDY|
 configuration file
   default_text_search_config   | pg_catalog.english  |
 configuration file
   effective_cache_size | 1GB |
 configuration file
   lc_messages  | en_US.UTF-8 |
 configuration file
   lc_monetary  | en_US.UTF-8 |
 configuration file
   lc_numeric   | en_US.UTF-8 |
 configuration file
   lc_time  | en_US.UTF-8 |
 configuration file
   log_checkpoints  | on  |
 configuration file
   log_connections  | off |
 configuration file
   log_destination  | csvlog  |
 configuration file
   log_directory| /opt/data/pgsql/data/pg_log |
 configuration file
   log_disconnections   | off |
 configuration file
   log_duration | on  |
 configuration file
   log_filename | postgres-%Y-%m-%d_%H%M%S|
 configuration file
   log_lock_waits   | on  |
 configuration file
   log_min_duration_statement   | 250ms   |
 configuration file
   log_rotation_age | 1d  |
 configuration file
   log_rotation_size| 1GB |
 configuration file
   log_temp_files   | 0   |
 configuration file
   log_timezone | Asia/Kolkata| command
 line
   log_truncate_on_rotation | on  |
 configuration file
   logging_collector| on  |
 configuration file
   maintenance_work_mem | 768MB   |
 configuration file
   max_connections  | 500 |
 configuration file
   max_stack_depth  | 2MB |
 environment variable
   port | 5432| command
 line
   shared_buffers   | 4GB |
 configuration file
   ssl  | on  |
 configuration file
   TimeZone | Asia/Kolkata| command
 line
   timezone_abbreviations   | Default | command
 line
   wal_buffers  | 16MB|
 configuration file
   work_mem | 48MB|
 configuration file
 
  It's also probably worth noting that 

Re: [GENERAL] help troubleshooting invalid page header error

2014-12-26 Thread chiru r
Hi Cory,

After recovering the table, turn off the *zero_damaged_pages* parameter.


On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue c...@dimagi.com wrote:

 Hi all,

 Thanks for the responses. Chiru, I'm looking into your suggestion.

 Sameer, here is the kernel version info:

 Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT
 2014 x86_64 x86_64 x86_64 GNU/Linux

 Does that seem like it could be a problematic version?

 More generally - I'm still wondering whether I should chalk this failure
 up to a transient/random issue, or whether I should be more worried about
 the hardware on the machine. According to our diagnostic tools,  disk and
 memory are fine, but it's still not clear to me how it got into this state.
 Any general bits of information regarding the potential causes of these
 types of issues would be much appreciated.

 thanks,
 Cory


 On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar sameer.ku...@ashnik.com
 wrote:

 On 23 Dec 2014 12:05, Cory Zue c...@dimagi.com wrote:
 
  Hi all,
 
  Our postgres instance on one of our production machines has recently
 been returning errors of the form DatabaseError: invalid page header in
 block 1 of relation base/16384/76623 from normal queries. I've been
 reading that these are often linked to hardware errors, but I would like to
 better understand what else it could be or how to determine that for sure.
 I've filled out the standard issue reporting template below. Any feedback
 or troubleshooting instructions would be much appreciated.
 
  ---
  A description of what you are trying to achieve and what results you
 expect.:
 
  Intermittent queries are failing with the error DatabaseError: invalid
 page header in block 1 of relation base/16384/76623
 
  PostgreSQL version number you are running:
 
  PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
 
  How you installed PostgreSQL:
 
  from standard package installer
 
  Changes made to the settings in the postgresql.conf file:
 
 
   name |   current_setting   |
  source
 
 --+-+--
   checkpoint_completion_target | 0.9 |
 configuration file
   checkpoint_segments  | 32  |
 configuration file
   checkpoint_timeout   | 15min   |
 configuration file
   DateStyle| ISO, MDY|
 configuration file
   default_text_search_config   | pg_catalog.english  |
 configuration file
   effective_cache_size | 1GB |
 configuration file
   lc_messages  | en_US.UTF-8 |
 configuration file
   lc_monetary  | en_US.UTF-8 |
 configuration file
   lc_numeric   | en_US.UTF-8 |
 configuration file
   lc_time  | en_US.UTF-8 |
 configuration file
   log_checkpoints  | on  |
 configuration file
   log_connections  | off |
 configuration file
   log_destination  | csvlog  |
 configuration file
   log_directory| /opt/data/pgsql/data/pg_log |
 configuration file
   log_disconnections   | off |
 configuration file
   log_duration | on  |
 configuration file
   log_filename | postgres-%Y-%m-%d_%H%M%S|
 configuration file
   log_lock_waits   | on  |
 configuration file
   log_min_duration_statement   | 250ms   |
 configuration file
   log_rotation_age | 1d  |
 configuration file
   log_rotation_size| 1GB |
 configuration file
   log_temp_files   | 0   |
 configuration file
   log_timezone | Asia/Kolkata| command
 line
   log_truncate_on_rotation | on  |
 configuration file
   logging_collector| on  |
 configuration file
   maintenance_work_mem | 768MB   |
 configuration file
   max_connections  | 500 |
 configuration file
   max_stack_depth  | 2MB |
 environment variable
   port | 5432| command
 line
   shared_buffers   | 4GB |
 configuration file
   ssl  | on  |
 configuration file
   TimeZone | Asia/Kolkata| command
 line
   timezone_abbreviations   | Default | command
 line
   wal_buffers  | 16MB   

Re: [GENERAL] help troubleshooting invalid page header error

2014-12-26 Thread Cory Zue
Hi Chiru,

I am trying to pg_dump the database to have a snapshot of the current
state. I've turned on 'zero_damaged_pages' but pg_dump is still failing
with an invalid page header error - this time from what looks like a
sequence object that is auto-setting IDs on a table. Any advice on how to
remove this error?

Here is the full query that's failing:

SELECT sequence_name, start_value, last_value, increment_by, CASE WHEN
increment_by > 0 AND max_value = 9223372036854775807 THEN NULL  WHEN
increment_by < 0 AND max_value = -1 THEN NULL  ELSE max_value END AS
max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL  WHEN
increment_by < 0 AND min_value = -9223372036854775807 THEN NULL  ELSE
min_value END AS min_value, cache_value, is_cycled, is_called from
unfinishedsubmissionstub_id_seq

On Fri, Dec 26, 2014 at 2:35 PM, chiru r chir...@gmail.com wrote:

 Hi Cory,

 After recovering the table, turn off the *zero_damaged_pages* parameter.


 On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue c...@dimagi.com wrote:

 Hi all,

 Thanks for the responses. Chiru, I'm looking into your suggestion.

 Sameer, here is the kernel version info:

 Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT
 2014 x86_64 x86_64 x86_64 GNU/Linux

 Does that seem like it could be a problematic version?

 More generally - I'm still wondering whether I should chalk this failure
 up to a transient/random issue, or whether I should be more worried about
 the hardware on the machine. According to our diagnostic tools,  disk and
 memory are fine, but it's still not clear to me how it got into this state.
 Any general bits of information regarding the potential causes of these
 types of issues would be much appreciated.

 thanks,
 Cory


 On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar sameer.ku...@ashnik.com
 wrote:

 On 23 Dec 2014 12:05, Cory Zue c...@dimagi.com wrote:
 
  Hi all,
 
  Our postgres instance on one of our production machines has recently
 been returning errors of the form DatabaseError: invalid page header in
 block 1 of relation base/16384/76623 from normal queries. I've been
 reading that these are often linked to hardware errors, but I would like to
 better understand what else it could be or how to determine that for sure.
 I've filled out the standard issue reporting template below. Any feedback
 or troubleshooting instructions would be much appreciated.
 
  ---
  A description of what you are trying to achieve and what results you
 expect.:
 
  Intermittent queries are failing with the error DatabaseError:
 invalid page header in block 1 of relation base/16384/76623
 
  PostgreSQL version number you are running:
 
  PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc
 (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
 
  How you installed PostgreSQL:
 
  from standard package installer
 
  Changes made to the settings in the postgresql.conf file:
 
 
   name |   current_setting   |
  source
 
 --+-+--
   checkpoint_completion_target | 0.9 |
 configuration file
   checkpoint_segments  | 32  |
 configuration file
   checkpoint_timeout   | 15min   |
 configuration file
   DateStyle| ISO, MDY|
 configuration file
   default_text_search_config   | pg_catalog.english  |
 configuration file
   effective_cache_size | 1GB |
 configuration file
   lc_messages  | en_US.UTF-8 |
 configuration file
   lc_monetary  | en_US.UTF-8 |
 configuration file
   lc_numeric   | en_US.UTF-8 |
 configuration file
   lc_time  | en_US.UTF-8 |
 configuration file
   log_checkpoints  | on  |
 configuration file
   log_connections  | off |
 configuration file
   log_destination  | csvlog  |
 configuration file
   log_directory| /opt/data/pgsql/data/pg_log |
 configuration file
   log_disconnections   | off |
 configuration file
   log_duration | on  |
 configuration file
   log_filename | postgres-%Y-%m-%d_%H%M%S|
 configuration file
   log_lock_waits   | on  |
 configuration file
   log_min_duration_statement   | 250ms   |
 configuration file
   log_rotation_age | 1d  |
 configuration file
   log_rotation_size| 1GB |
 configuration file
   log_temp_files   | 0   |
 configuration file
   log_timezone | Asia/Kolkata| 

Re: [GENERAL] help troubleshooting invalid page header error

2014-12-26 Thread Cory Zue
(nevermind - it looks like the zero_damaged_pages setting only took for the
duration of the session)
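
(A minimal sketch, not from the original message, assuming a superuser session:
SET only affects the backend it is issued in, so the separate connection that
pg_dump opens still starts with the server default.)

SET zero_damaged_pages = on;   -- affects only the current session
SHOW zero_damaged_pages;       -- reports "on" here
-- a new connection (for example, the one pg_dump opens) still reports "off"
-- unless the parameter is changed in postgresql.conf and the server reloaded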

On Fri, Dec 26, 2014 at 5:15 PM, Cory Zue c...@dimagi.com wrote:

 Hi Chiru,

 I am trying to pg_dump the database to have a snapshot of the current
 state. I've turned on 'zero_damaged_pages' but pg_dump is still failing
 with an invalid page header error - this time from what looks like a
 sequence object that is auto-setting IDs on a table. Any advice on how to
 remove this error?

 Here is the full query that's failing:

 SELECT sequence_name, start_value, last_value, increment_by, CASE WHEN
 increment_by > 0 AND max_value = 9223372036854775807 THEN NULL  WHEN
 increment_by < 0 AND max_value = -1 THEN NULL  ELSE max_value END AS
 max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL  WHEN
 increment_by < 0 AND min_value = -9223372036854775807 THEN NULL  ELSE
 min_value END AS min_value, cache_value, is_cycled, is_called from
 unfinishedsubmissionstub_id_seq

 On Fri, Dec 26, 2014 at 2:35 PM, chiru r chir...@gmail.com wrote:

 Hi Cory,

 After recovering the table, turn off the *zero_damaged_pages* parameter.


 On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue c...@dimagi.com wrote:

 Hi all,

 Thanks for the responses. Chiru, I'm looking into your suggestion.

 Sameer, here is the kernel version info:

 Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT
 2014 x86_64 x86_64 x86_64 GNU/Linux

 Does that seem like it could be a problematic version?

 More generally - I'm still wondering whether I should chalk this failure
 up to a transient/random issue, or whether I should be more worried about
 the hardware on the machine. According to our diagnostic tools,  disk and
 memory are fine, but it's still not clear to me how it got into this state.
 Any general bits of information regarding the potential causes of these
 types of issues would be much appreciated.

 thanks,
 Cory


 On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar sameer.ku...@ashnik.com
 wrote:

 On 23 Dec 2014 12:05, Cory Zue c...@dimagi.com wrote:
 
  Hi all,
 
  Our postgres instance on one of our production machines has recently
 been returning errors of the form DatabaseError: invalid page header in
 block 1 of relation base/16384/76623 from normal queries. I've been
 reading that these are often linked to hardware errors, but I would like to
 better understand what else it could be or how to determine that for sure.
 I've filled out the standard issue reporting template below. Any feedback
 or troubleshooting instructions would be much appreciated.
 
  ---
  A description of what you are trying to achieve and what results you
 expect.:
 
  Intermittent queries are failing with the error DatabaseError:
 invalid page header in block 1 of relation base/16384/76623
 
  PostgreSQL version number you are running:
 
  PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc
 (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
 
  How you installed PostgreSQL:
 
  from standard package installer
 
  Changes made to the settings in the postgresql.conf file:
 
 
   name |   current_setting   |
  source
 
 --+-+--
   checkpoint_completion_target | 0.9 |
 configuration file
   checkpoint_segments  | 32  |
 configuration file
   checkpoint_timeout   | 15min   |
 configuration file
   DateStyle| ISO, MDY|
 configuration file
   default_text_search_config   | pg_catalog.english  |
 configuration file
   effective_cache_size | 1GB |
 configuration file
   lc_messages  | en_US.UTF-8 |
 configuration file
   lc_monetary  | en_US.UTF-8 |
 configuration file
   lc_numeric   | en_US.UTF-8 |
 configuration file
   lc_time  | en_US.UTF-8 |
 configuration file
   log_checkpoints  | on  |
 configuration file
   log_connections  | off |
 configuration file
   log_destination  | csvlog  |
 configuration file
   log_directory| /opt/data/pgsql/data/pg_log |
 configuration file
   log_disconnections   | off |
 configuration file
   log_duration | on  |
 configuration file
   log_filename | postgres-%Y-%m-%d_%H%M%S|
 configuration file
   log_lock_waits   | on  |
 configuration file
   log_min_duration_statement   | 250ms   |
 configuration file
   log_rotation_age | 1d  |
 configuration file
   log_rotation_size| 1GB  

Re: [GENERAL] help troubleshooting invalid page header error

2014-12-25 Thread chiru r
Hi Cory,

We have a *zero_damaged_pages* parameter in the PostgreSQL configuration; by
default it is set to *off*.
To recover data from a corrupted table, we can turn this parameter *on* as a
superuser and populate a new table using the dump or copy utility.

Note: the damaged pages themselves cannot be recovered; they are zeroed out
and skipped while fetching data from the table.

Please follow the steps below if you decide to recover data from a corrupted table.

*Sample case :*

[postgres@instructor ~]$ /usr/local/pgsql/bin/psql
psql (9.4rc1)
Type "help" for help.

postgres=# select count(*) from test;
*ERROR:  invalid page in block 7 of relation base/13003/16384*
postgres=# show zero_damaged_pages;
 zero_damaged_pages
--------------------
 off
(1 row)

postgres=# *set zero_damaged_pages=on;*
SET
postgres=# show zero_damaged_pages;
 zero_damaged_pages
--------------------
 on
(1 row)

postgres=# select count(*) from test;
*WARNING:  invalid page in block 7 of relation base/13003/16384; zeroing
out page*
WARNING:  invalid page in block 8 of relation base/13003/16384; zeroing out
page
WARNING:  invalid page in block 9 of relation base/13003/16384; zeroing out
page
WARNING:  invalid page in block 10 of relation base/13003/16384; zeroing
out page
WARNING:  invalid page in block 11 of relation base/13003/16384; zeroing
out page
WARNING:  invalid page in block 12 of relation base/13003/16384; zeroing
out page
WARNING:  invalid page in block 13 of relation base/13003/16384; zeroing
out page

 count
--------
 979163
(1 row)
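
(A sketch of the "populate a new table" step mentioned earlier, reusing the
test table from this example; test_salvaged is just an illustrative name, and
rows that lived on the zeroed pages are gone for good.)

SET zero_damaged_pages = on;
CREATE TABLE test_salvaged AS SELECT * FROM test;  -- damaged pages are zeroed and skipped
SET zero_damaged_pages = off;
-- test_salvaged now holds whatever rows survived and can be dumped normally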


On Tue, Dec 23, 2014 at 8:47 AM, Cory Zue c...@dimagi.com wrote:

 Hi all,

 Our postgres instance on one of our production machines has recently been
 returning errors of the form DatabaseError: invalid page header in block
 1 of relation base/16384/76623 from normal queries. I've been reading that
 these are often linked to hardware errors, but I would like to better
 understand what else it could be or how to determine that for sure. I've
 filled out the standard issue reporting template below. Any feedback or
 troubleshooting instructions would be much appreciated.

 ---
 A description of what you are trying to achieve and what results you
 expect.:

 Intermittent queries are failing with the error DatabaseError: invalid
 page header in block 1 of relation base/16384/76623

 PostgreSQL version number you are running:

 PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

 How you installed PostgreSQL:

 from standard package installer

 Changes made to the settings in the postgresql.conf file:


  name |   current_setting   |
  source

 --+-+--
  checkpoint_completion_target | 0.9 |
 configuration file
  checkpoint_segments  | 32  |
 configuration file
  checkpoint_timeout   | 15min   |
 configuration file
  DateStyle| ISO, MDY|
 configuration file
  default_text_search_config   | pg_catalog.english  |
 configuration file
  effective_cache_size | 1GB |
 configuration file
  lc_messages  | en_US.UTF-8 |
 configuration file
  lc_monetary  | en_US.UTF-8 |
 configuration file
  lc_numeric   | en_US.UTF-8 |
 configuration file
  lc_time  | en_US.UTF-8 |
 configuration file
  log_checkpoints  | on  |
 configuration file
  log_connections  | off |
 configuration file
  log_destination  | csvlog  |
 configuration file
  log_directory| /opt/data/pgsql/data/pg_log |
 configuration file
  log_disconnections   | off |
 configuration file
  log_duration | on  |
 configuration file
  log_filename | postgres-%Y-%m-%d_%H%M%S|
 configuration file
  log_lock_waits   | on  |
 configuration file
  log_min_duration_statement   | 250ms   |
 configuration file
  log_rotation_age | 1d  |
 configuration file
  log_rotation_size| 1GB |
 configuration file
  log_temp_files   | 0   |
 configuration file
  log_timezone | Asia/Kolkata| command line
  log_truncate_on_rotation | on  |
 configuration file
  logging_collector| on  |
 configuration file
  maintenance_work_mem | 768MB   |
 configuration file
  max_connections  | 500 |
 configuration file
  

Re: [GENERAL] Help Optimizing a Summary Query

2014-12-11 Thread Robert DiFalco
Thanks Arthur. I don't think there is as big a difference between BIGINT and
INTEGER as you think there is. In fact, with an extended filesystem you
might not see any difference at all.

As I put in the first email, I am using a GIST index on user.name.

I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and
if there was a better alternative I had not considered.
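
(A sketch, not from the thread: the most direct way to compare the EXISTS and
LEFT OUTER JOIN formulations is to run each under EXPLAIN (ANALYZE, BUFFERS)
against production-sized data, e.g. for the EXISTS form:)

EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, u.name
  FROM users u
 WHERE u.id != 33
   AND u.name LIKE 'John%'
   AND EXISTS (SELECT 1
                 FROM friends f
                WHERE f.user_id = 33
                  AND f.friend_id = u.id);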

On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva arthur...@gmail.com wrote:

 On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco robert.difa...@gmail.com
 wrote:

 I'm sorry, I missed a JOIN on the second variation. It is:

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
   ELSE 'none'
END AS 'friendStatus',
(SELECT COUNT(1) AS d
   FROM friends f1
  JOIN friends f2 ON f1.fiend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id)
 FROM users u
 *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id*
 LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
 LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
 WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;


 On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco robert.difa...@gmail.com
  wrote:

 I have users, friends, and friend_requests. I need a query that
 essentially returns a summary containing:

 * user (name, imageURL, bio, ...)
 * Friend status (relative to an active user)
* Is the user a friend of the active user?
* Has the user sent a friend request to the active user?
* Has the user received a friend request from the active user?
 * # of mutualFriends
 * Exclude the active user from the result set.

 So I have mocked this up two ways but both have complicated query plans
 that will be problematic with large data sets. I'm thinking that my lack of
 deep SQL knowledge is making me miss the obvious choice.

 Here's my two query examples:

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
 f.friend_id = u.id)   THEN 'isFriend'
   WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
 AND s.from_id = u.id) THEN 'hasSentRequest'
   WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
 AND r.from_id = 33)   THEN 'hasReceivedRequest'
   ELSE 'none'
END AS friendStatus,
(SELECT COUNT(1)
   FROM friends f1
  JOIN friends f2 ON f1.friend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
 FROM users u
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
   ELSE 'none'
END AS 'friendStatus',
(SELECT COUNT(1) AS d
   FROM friends f1
  JOIN friends f2 ON f1.fiend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id)
 FROM users u
 LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
 LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

 33 is just the id of the active user I am using for testing. The WHERE
 clause could be anything. I'm just using u.name here but I'm more
 concerned about the construction of the result set than the WHERE clause.
 These have more or less similar query plans, nothing that would change
 things factorially. Is this the best I can do or am I missing the obvious?

 Here are the tables:


 CREATE TABLE users (
   idBIGINT,
   name  VARCHAR,
   imageURL  VARCHAR
   created   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
   phone_natlBIGINT,   /* National Phone Number */
   country_e164  SMALLINT, /* E164 country code */
   email VARCHAR(255),
   PRIMARY KEY (id),
   UNIQUE (email),
   UNIQUE (phone_natl, country_e164)
 );


 CREATE TABLE friends (
   user_id  BIGINT,
   friend_id   BIGINT,
   PRIMARY KEY (user_id, user_id),
   FOREIGN KEY (user_id)REFERENCES users(id) ON DELETE CASCADE,
   FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
 );
 CREATE INDEX idx_friends_friend ON friends(friend_id);

 CREATE TABLE friend_requests (
   from_id  BIGINT,
   to_idBIGINT,
   created  TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (from_id, user_id),
   FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
   FOREIGN KEY (to_id)REFERENCES users(id) ON DELETE CASCADE
 );
 CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);

 Let me know if you guys need anything else.




 Hello Robert, none of your schemas worked for me, here's a clean version

 CREATE TABLE users (
 

Re: [GENERAL] Help Optimizing a Summary Query

2014-12-11 Thread Arthur Silva
On Thu, Dec 11, 2014 at 6:52 PM, Robert DiFalco robert.difa...@gmail.com
wrote:

 Thanks Arthur. I don't think there is as big a difference between BIGINT
 and INTEGER as you think there is. In fact, with an extended filesystem you
 might not see any difference at all.

 As I put in the first email, I am using a GIST index on user.name.

 I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and
 if there was a better alternative I had not considered.

 On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva arthur...@gmail.com wrote:

 On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco robert.difa...@gmail.com
 wrote:

 I'm sorry, I missed a JOIN on the second variation. It is:

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
   ELSE 'none'
END AS 'friendStatus',
(SELECT COUNT(1) AS d
   FROM friends f1
  JOIN friends f2 ON f1.fiend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id)
 FROM users u
 *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id*
 LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
 LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
 WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;


 On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco 
 robert.difa...@gmail.com wrote:

 I have users, friends, and friend_requests. I need a query that
 essentially returns a summary containing:

 * user (name, imageURL, bio, ...)
 * Friend status (relative to an active user)
* Is the user a friend of the active user?
* Has the user sent a friend request to the active user?
* Has the user received a friend request from the active user?
 * # of mutualFriends
 * Exclude the active user from the result set.

 So I have mocked this up two ways but both have complicated query plans
 that will be problematic with large data sets. I'm thinking that my lack of
 deep SQL knowledge is making me miss the obvious choice.

 Here's my two query examples:

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
 f.friend_id = u.id)   THEN 'isFriend'
   WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
 AND s.from_id = u.id) THEN 'hasSentRequest'
   WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
 AND r.from_id = 33)   THEN 'hasReceivedRequest'
   ELSE 'none'
END AS friendStatus,
(SELECT COUNT(1)
   FROM friends f1
  JOIN friends f2 ON f1.friend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
 FROM users u
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
   ELSE 'none'
END AS 'friendStatus',
(SELECT COUNT(1) AS d
   FROM friends f1
  JOIN friends f2 ON f1.fiend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id)
 FROM users u
 LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
 LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

 33 is just the id of the active user I am using for testing. The WHERE
 clause could be anything. I'm just using u.name here but I'm more
 concerned about the construction of the result set than the WHERE clause.
 These have more or less similar query plans, nothing that would change
 things factorially. Is this the best I can do or am I missing the obvious?

 Here are the tables:


 CREATE TABLE users (
   idBIGINT,
   name  VARCHAR,
   imageURL  VARCHAR
   created   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
   phone_natlBIGINT,   /* National Phone Number */
   country_e164  SMALLINT, /* E164 country code */
   email VARCHAR(255),
   PRIMARY KEY (id),
   UNIQUE (email),
   UNIQUE (phone_natl, country_e164)
 );


 CREATE TABLE friends (
   user_id  BIGINT,
   friend_id   BIGINT,
   PRIMARY KEY (user_id, user_id),
   FOREIGN KEY (user_id)REFERENCES users(id) ON DELETE CASCADE,
   FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
 );
 CREATE INDEX idx_friends_friend ON friends(friend_id);

 CREATE TABLE friend_requests (
   from_id  BIGINT,
   to_idBIGINT,
   created  TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (from_id, user_id),
   FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
   FOREIGN KEY (to_id)REFERENCES users(id) ON DELETE CASCADE
 );
 CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);

 Let me know if you guys need anything else.




 Hello 

Re: [GENERAL] Help Optimizing a Summary Query

2014-12-11 Thread David G Johnston
Robert DiFalco wrote
 I have users, friends, and friend_requests. I need a query that
 essentially
 returns a summary containing:
 
 * user (name, imageURL, bio, ...)
 * Friend status (relative to an active user)
* Is the user a friend of the active user?
* Has the user sent a friend request to the active user?
* Has the user received a friend request from the active user?
 * # of mutualFriends
 * Exclude the active user from the result set.
 
 So I have mocked this up two ways but both have complicated query plans
 that will be problematic with large data sets. I'm thinking that my lack
 of
 deep SQL knowledge is making me miss the obvious choice.
 
 Here's my two query examples:
 
 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
 f.friend_id = u.id)   THEN 'isFriend'
   WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33   AND
 s.from_id = u.id) THEN 'hasSentRequest'
   WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id AND
 r.from_id = 33)   THEN 'hasReceivedRequest'
   ELSE 'none'
END AS friendStatus,
(SELECT COUNT(1)
   FROM friends f1
  JOIN friends f2 ON f1.friend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
 FROM users u
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
 
 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
   ELSE 'none'
END AS 'friendStatus',
(SELECT COUNT(1) AS d
   FROM friends f1
  JOIN friends f2 ON f1.fiend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id)
 FROM users u
 LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
 LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
 
 33 is just the id of the active user I am using for testing. The WHERE
 clause could be anything. I'm just using u.name here but I'm more
 concerned about the construction of the result set than the WHERE clause.
 These have more or less similar query plans, nothing that would change
 things factorially. Is this the best I can do or am I missing the obvious?

I dislike the multiple LEFT JOIN version, though I did not try to prove that
it is possible for it to give incorrect results.

The goal is to avoid looping - so you want to create temporary results that
will contain all of the data you plan to need and then join them together. 
CTE/WITH is the feature that can do this most easily.

I have no idea how this will perform relative to the CASE WHEN EXISTS
version but it seems like it should be faster.  Again, I don't believe your
original LEFT JOIN query is equivalent to either of these but I cannot be
certain without more effort than I am able to put forth.

Hybrid SQL Code (note in particular that you cannot have literals in the
WITH field alias area...)

WITH user_ref (ref_u_id) AS ( VALUES (33) )
, users_vis_a_vis_ref (u_id, ref_id) AS ( ... WHERE u_id != ref_u_id)
, user_friend (u_id, ref_u_id, 'Friend' AS status_uf) AS ( ... )
, user_sent_request (u_id, ref_u_id, 'Sent' AS status_usr) AS ( ... )
, user_recv_request (u_id, ref_u_id, 'Received' AS status_urr) AS ( ... )
, user_mutuals (u_id, ref_u_id, ## AS mutual_count) AS ( ... )

SELECT u_id, ref_u_id
, COALESCE(status_uf, status_usr, status_urr, 'None') AS FriendStatus
, COALESCE(mutual_count, 0) AS MutualFriendCount
FROM users_vis_a_vis_ref 
NATURAL LEFT JOIN user_friend
NATURAL LEFT JOIN user_sent_request 
NATURAL LEFT JOIN user_recv_request 
NATURAL LEFT JOIN user_mutuals

It is safe to use NATURAL here since you are fully controlling the source
relations since they all come from the CTE/WITH structure.
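
(One hedged way to flesh out the elided CTE bodies against the tables posted
earlier in the thread; 33 again stands in for the active user, the join
conditions are written out explicitly rather than via NATURAL, and this is a
sketch that has not been run against the original poster's data:)

WITH user_ref (ref_u_id) AS ( VALUES (33) )
, user_friend AS (
    SELECT f.friend_id AS u_id, 'Friend'::text AS status_uf
      FROM friends f, user_ref r
     WHERE f.user_id = r.ref_u_id )
, user_sent_request AS (
    SELECT s.from_id AS u_id, 'Sent'::text AS status_usr
      FROM friend_requests s, user_ref r
     WHERE s.to_id = r.ref_u_id )
, user_recv_request AS (
    SELECT q.to_id AS u_id, 'Received'::text AS status_urr
      FROM friend_requests q, user_ref r
     WHERE q.from_id = r.ref_u_id )
, user_mutuals AS (
    SELECT f2.user_id AS u_id, count(*) AS mutual_count
      FROM friends f1
      JOIN friends f2 ON f1.friend_id = f2.friend_id, user_ref r
     WHERE f1.user_id = r.ref_u_id
     GROUP BY f2.user_id )
SELECT u.id, u.name,
       COALESCE(uf.status_uf, usr.status_usr, urr.status_urr, 'none') AS friend_status,
       COALESCE(um.mutual_count, 0) AS mutual_friends
  FROM users u
 CROSS JOIN user_ref r
  LEFT JOIN user_friend       uf  ON uf.u_id  = u.id
  LEFT JOIN user_sent_request usr ON usr.u_id = u.id
  LEFT JOIN user_recv_request urr ON urr.u_id = u.id
  LEFT JOIN user_mutuals      um  ON um.u_id  = u.id
 WHERE u.id != r.ref_u_id
   AND u.name LIKE 'John%'
 ORDER BY u.name;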

David J.



--
View this message in context: 
http://postgresql.nabble.com/Help-Optimizing-a-Summary-Query-tp5829941p5830198.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help Optimizing a Summary Query

2014-12-09 Thread Robert DiFalco
I'm sorry, I missed a JOIN on the second variation. It is:

SELECT u.id, u.name, u.imageURL, u.bio,
   CASE
  WHEN f.friend_id IS NOT NULL THEN 'isFriend'
  WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
  WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
  ELSE 'none'
   END AS 'friendStatus',
   (SELECT COUNT(1) AS d
  FROM friends f1
  JOIN friends f2 ON f1.friend_id = f2.friend_id
  WHERE f1.user_id = 33 AND f2.user_id = u.id)
FROM users u
*LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id*
LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;


On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco robert.difa...@gmail.com
wrote:

 I have users, friends, and friend_requests. I need a query that
 essentially returns a summary containing:

 * user (name, imageURL, bio, ...)
 * Friend status (relative to an active user)
* Is the user a friend of the active user?
* Has the user sent a friend request to the active user?
* Has the user received a friend request from the active user?
 * # of mutualFriends
 * Exclude the active user from the result set.

 So I have mocked this up two ways but both have complicated query plans
 that will be problematic with large data sets. I'm thinking that my lack of
 deep SQL knowledge is making me miss the obvious choice.

 Here's my two query examples:

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
 f.friend_id = u.id)   THEN 'isFriend'
   WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33   AND
 s.from_id = u.id) THEN 'hasSentRequest'
   WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
 AND r.from_id = 33)   THEN 'hasReceivedRequest'
   ELSE 'none'
END AS friendStatus,
(SELECT COUNT(1)
   FROM friends f1
  JOIN friends f2 ON f1.friend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
 FROM users u
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
   ELSE 'none'
END AS 'friendStatus',
(SELECT COUNT(1) AS d
   FROM friends f1
  JOIN friends f2 ON f1.fiend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id)
 FROM users u
 LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
 LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

 33 is just the id of the active user I am using for testing. The WHERE
 clause could be anything. I'm just using u.name here but I'm more
 concerned about the construction of the result set than the WHERE clause.
 These have more or less similar query plans, nothing that would change
 things factorially. Is this the best I can do or am I missing the obvious?

 Here are the tables:


 CREATE TABLE users (
   idBIGINT,
   name  VARCHAR,
   imageURL  VARCHAR
   created   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
   phone_natlBIGINT,   /* National Phone Number */
   country_e164  SMALLINT, /* E164 country code */
   email VARCHAR(255),
   PRIMARY KEY (id),
   UNIQUE (email),
   UNIQUE (phone_natl, country_e164)
 );


 CREATE TABLE friends (
   user_id  BIGINT,
   friend_id   BIGINT,
   PRIMARY KEY (user_id, user_id),
   FOREIGN KEY (user_id)REFERENCES users(id) ON DELETE CASCADE,
   FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
 );
 CREATE INDEX idx_friends_friend ON friends(friend_id);

 CREATE TABLE friend_requests (
   from_id  BIGINT,
   to_idBIGINT,
   created  TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (from_id, user_id),
   FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
   FOREIGN KEY (to_id)REFERENCES users(id) ON DELETE CASCADE
 );
 CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);

 Let me know if you guys need anything else.




Re: [GENERAL] Help Optimizing a Summary Query

2014-12-09 Thread Arthur Silva
On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco robert.difa...@gmail.com
wrote:

 I'm sorry, I missed a JOIN on the second variation. It is:

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
   ELSE 'none'
END AS 'friendStatus',
(SELECT COUNT(1) AS d
   FROM friends f1
  JOIN friends f2 ON f1.fiend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id)
 FROM users u
 *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id*
 LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
 LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
 WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;


 On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco robert.difa...@gmail.com
 wrote:

 I have users, friends, and friend_requests. I need a query that
 essentially returns a summary containing:

 * user (name, imageURL, bio, ...)
 * Friend status (relative to an active user)
* Is the user a friend of the active user?
* Has the user sent a friend request to the active user?
* Has the user received a friend request from the active user?
 * # of mutualFriends
 * Exclude the active user from the result set.

 So I have mocked this up two ways but both have complicated query plans
 that will be problematic with large data sets. I'm thinking that my lack of
 deep SQL knowledge is making me miss the obvious choice.

 Here's my two query examples:

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
 f.friend_id = u.id)   THEN 'isFriend'
   WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
 AND s.from_id = u.id) THEN 'hasSentRequest'
   WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
 AND r.from_id = 33)   THEN 'hasReceivedRequest'
   ELSE 'none'
END AS friendStatus,
(SELECT COUNT(1)
   FROM friends f1
  JOIN friends f2 ON f1.friend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
 FROM users u
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

 SELECT u.id, u.name, u.imageURL, u.bio,
CASE
   WHEN f.friend_id IS NOT NULL THEN 'isFriend'
   WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
   WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
   ELSE 'none'
END AS 'friendStatus',
(SELECT COUNT(1) AS d
   FROM friends f1
  JOIN friends f2 ON f1.fiend_id = f2.friend_id
   WHERE f1.user_id = 33 AND f2.user_id = u.id)
 FROM users u
 LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
 LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
 WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

 33 is just the id of the active user I am using for testing. The WHERE
 clause could be anything. I'm just using u.name here but I'm more
 concerned about the construction of the result set than the WHERE clause.
 These have more or less similar query plans, nothing that would change
 things factorially. Is this the best I can do or am I missing the obvious?

 Here are the tables:


 CREATE TABLE users (
   idBIGINT,
   name  VARCHAR,
   imageURL  VARCHAR
   created   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
   phone_natlBIGINT,   /* National Phone Number */
   country_e164  SMALLINT, /* E164 country code */
   email VARCHAR(255),
   PRIMARY KEY (id),
   UNIQUE (email),
   UNIQUE (phone_natl, country_e164)
 );


 CREATE TABLE friends (
   user_id  BIGINT,
   friend_id   BIGINT,
   PRIMARY KEY (user_id, user_id),
   FOREIGN KEY (user_id)REFERENCES users(id) ON DELETE CASCADE,
   FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
 );
 CREATE INDEX idx_friends_friend ON friends(friend_id);

 CREATE TABLE friend_requests (
   from_id  BIGINT,
   to_idBIGINT,
   created  TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (from_id, user_id),
   FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
   FOREIGN KEY (to_id)REFERENCES users(id) ON DELETE CASCADE
 );
 CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);

 Let me know if you guys need anything else.




Hello Robert, none of your schemas worked for me, here's a clean version

CREATE TABLE users (
  idBIGINT,
  name  VARCHAR,
  imageURL  VARCHAR,
  created   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  phone_natlBIGINT,
  country_e164  SMALLINT,
  email VARCHAR(255),
  PRIMARY KEY (id),
  UNIQUE (email),
  UNIQUE (phone_natl, country_e164)
);


CREATE TABLE friends (
  user_id  BIGINT,
  friend_id   BIGINT,
  PRIMARY KEY (user_id, friend_id),
  FOREIGN KEY (user_id)REFERENCES users(id) ON DELETE CASCADE,
  FOREIGN 

Re: [GENERAL] Help with PostgreSQL 9.4 to expand jsonb int array into table with row numbers

2014-11-04 Thread hari . fuchs
David G Johnston david.g.johns...@gmail.com writes:

 Neil Tiffin-3 wrote
 Trying to wrap my head around postgresql 9.4 jsonb and would like some
 help figuring out how to do the following.
 
 Given the following example jsonb:
 
 ‘{“name1” : value1, “name2”  : value2, “name3” : [int1, int2, int3]
 }’::jsonb AS table1.column1
  
 Wanted: Return the “name3” array only, as a table with a return signature
 of 
 
 TABLE( var_name varchar, var_value int, var_row_num int)
 
 So the resulting data would look like this:
  
 (‘name3’, int1, 1)
 (‘name3’, int2, 2)
 (‘name3’, int3, 3)
 
 Assume the array could be any length except zero and ‘name3’ is guaranteed
 to exist.
 
 Also posted on stackoverflow:
 
 http://stackoverflow.com/questions/26691725/postgresql-9-4-expand-jsonb-int-array-into-table-with-row-numbers

 Not syntax checked but...

 SELECT 'name3', int_text::integer AS int, int_ord
 FROM ( VALUES (...) ) src (column1)
 LATERAL ROWS FROM(
 json_array_elements(column1->'name3')
 ) WITH ORDINALITY jae (int_text, int_ord)

 Both WITH ORDINALITY and jsonb were introduced in 9.4; it is possible to
 make this work in all supported versions of PostgreSQL through the liberal
 use of CTE (WITH) as well as, possibly, the generate_series() function.

I think this can just be written as

SELECT 'name3' AS var_name,
   json_array_elements(column1->'name3') AS var_value,
   row_number() OVER () AS var_row_num
FROM table1
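
(For completeness - a sketch, not from the thread, of the 9.4 WITH ORDINALITY
form mentioned above, assuming column1 is of type jsonb as in the original
question:)

SELECT 'name3' AS var_name,
       e.var_value::int AS var_value,
       e.var_row_num::int AS var_row_num
  FROM table1,
       LATERAL jsonb_array_elements_text(column1->'name3')
               WITH ORDINALITY AS e (var_value, var_row_num);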



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help related to Postgresql for RHEL 6.5

2014-08-29 Thread David G Johnston
Yogesh. Sharma wrote
 Dear David,
 
 Are you currently using PostgreSQL?
 Currently we are using PostgreSQL 8.1.18 version on RHEL 5.8.
 Now we plan to update this to PostgreSQL 9.0 with RHEL 6.5. As in
 version 9.0 I found the fewest compatibility issues.
 
 So, please guide me.
 
 Regards,

Guidance is why we write documentation. If you have specific questions or
concerns after reading the documentation, you can ask here.

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Help-related-to-Postgresql-for-RHEL-6-5-tp5816742p5816876.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


  1   2   3   4   5   6   7   8   9   >