Re: simple division

2018-12-05 Thread Gavin Flower

On 05/12/2018 20:07, Rob Sargent wrote:



On Dec 4, 2018, at 9:33 PM, Gavin Flower  wrote:


On 05/12/2018 10:51, Rob Sargent wrote:


On 12/4/18 2:36 PM, Martin Mueller wrote:
It worked, and I must have done something wrong. I'm probably not the only 
person who would find something like the following helpful:


division (integer division truncates the result)    10/3    3

The math types might take offense here, with the use of "truncates".  Integer division 
really asks how many times one can subtract the denominator from the numerator without going 
negative (or how many times the denominator "goes into" the numerator).

It may seem a nuisance, but int division is a useful construct and must be 
supported (and be the default). (If you have 10 people to transport in cars 
which hold four (all can drive) 10/4 = 3 ;) )





Hmm...

10 / 4 = 2


And two are left stranded!
  The point is that integer math has its place. You can't have 2.5 cars. So 10/4 
in this context is 3.
More correctly, the calculation is
10/4 + (10 % 4 > 0 ? 1 : 0) = 3

(Maybe psql does have % so mod(10,4))
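A minimal psql sketch of that pure-integer ceiling division (psql does support % on
integers, so mod() isn't strictly needed):

-- 10 people, 4 seats per car; CASE plays the role of the ternary above
SELECT 10/4 + CASE WHEN 10 % 4 > 0 THEN 1 ELSE 0 END AS cars_needed;  -- 3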



SELECT ceil(10/4.0);

Is what you want for that example.





Re: Geographic coordinate values format conversion to DD (Decimal Degrees) format

2018-12-05 Thread Allan Kamau
Thank you very much Paul. Your suggestions and input have spared me many
hours of trying to identify applications and functions to perform this
transformation.
I am now installing PostGIS.

Allan.


On Wed, Dec 5, 2018 at 1:25 AM Paul Ramsey 
wrote:

>
> On Dec 4, 2018, at 12:36 PM, Allan Kamau  wrote:
>
> Does PostgreSQL (more specifically PostGIS) have functions for these types
> of conversions?
>
> Below are examples of the geographic coordinates values I have coupled
> with the resulting decimal degrees values.
> 39.529053 N 107.772406 W=39.5290530°, -107.7724060°
> 27.485973 S 153.190699 E=-27.4859730°, 153.1906990°
> 30°32’39” N, 91°07’36” E=30.5441667°, 091.127°
> 27.485973 S 153.190699 E=-27.4859730°, 153.1906990°
> 1¡20'1N 103¡45'15E=01.3336111°, 103.7541667°
>
> SELECT st_astext(txt2geometry('S 20 10.8035165 W 176 36.074496'));
>
> =-20.1800586°, -176.6012416°
>
> The "°" sign in the results is optional.
>
>
> Nope, you’ve got a big ugly pattern matching problem there, unfortunately,
> and probably are going to have to regex your way out of the bag. PostGIS
> will help you output forms like that, but it doesn’t have any general
> handling of arbitrary DMS strings.
>
> http://postgis.net/docs/manual-2.5/ST_AsLatLonText.html
>
> Here’s a PLPGSQL example that does half of your cases.
>
> CREATE OR REPLACE FUNCTION txt2geometry(textcoord text)
> RETURNS geometry AS
> $$
> DECLARE
> textarr text[];
> sep text;
> lon float8;
> lat float8;
> BEGIN
> textarr := regexp_matches(textcoord, '(\d+)(\D?)(\d{2})\D?([\d\.]+)\D? ([NS]),? (\d+)\D?(\d{2})\D?(\d+)\D? ([EW])');
> sep := textarr[2];
> RAISE NOTICE '%', textarr;
> -- DD.DD
> IF sep = '.' THEN
> lat := int4(textarr[1]) + int4(textarr[3]) / 100.0 + float8(textarr[4]) /
> pow(10, length(textarr[4])) / 100;
> lon := int4(textarr[6]) + int4(textarr[7]) / 100.0 + float8(textarr[8]) /
> pow(10, length(textarr[8])) / 100;
> -- DD.MM'SS"
> ELSE
> lat := int4(textarr[1]) + int4(textarr[3]) / 60.0 + float8(textarr[4]) /
> pow(10, length(textarr[4])) / 36;
> lon := int4(textarr[6]) + int4(textarr[7]) / 60.0 + float8(textarr[8]) /
> pow(10, length(textarr[8])) / 36;
> END IF;
> IF textarr[5] = 'S' THEN
> lat := -1 * lat;
> END IF;
> IF textarr[9] = 'W' THEN
> lon := -1 * lon;
> END IF;
> RETURN ST_SetSRID(ST_MakePoint(lon, lat), 4326);
> END;
> $$
> LANGUAGE 'plpgsql' IMMUTABLE
> COST 100;
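For illustration, here is a call exercising the DMS branch with one of the formats
listed above (a sketch only; not verified, but the result should be approximately
POINT(91.1267 30.5442)):

SELECT ST_AsText(txt2geometry('30°32''39" N, 91°07''36" E'));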
>
>
>
>
>


Re: simple division

2018-12-05 Thread Geoff Winkless
On Wed, 5 Dec 2018 at 09:13, Gavin Flower  wrote:
> SELECT ceil(10/4.0);
>
> Is what you want for that example.

Except that implies that "number of people who can fit in a car" is a
real number, not a whole number.

IMO it's fundamentally broken that SQL doesn't cast the result of a
divide into a numeric value - the potential for unexpected errors
creeping into calculations is huge; however that's the standard and
no-one's going to change it now.

Having said that it's worth noting that those in the Other Place think
that it's broken enough to go against the standard (they have a DIV b
for integer divide and a/b for float).

Geoff



Re: surprising query optimisation

2018-12-05 Thread Chris Withers

On 30/11/2018 15:33, Stephen Frost wrote:

Greetings,

* Chris Withers (ch...@withers.org) wrote:

On 28/11/2018 22:49, Stephen Frost wrote:

* Chris Withers (ch...@withers.org) wrote:

We have an app that deals with a lot of queries, and we've been slowly
seeing performance issues emerge. We take a lot of free form queries from
users and stumbled upon a very surprising optimisation.

So, we have a 'state' column which is a 3 character string column with an
index on it. Despite being a string, this column is only used to store one
of three values: 'NEW', 'ACK', or 'RSV'.


Sounds like a horrible field to have an index on.


That's counter-intuitive for me. What leads you to say this and what would
you do/recommend instead?


For this, specifically, it's because you end up with exactly what you
have: a large index with tons of duplicate values.  Indexes are
particularly good when you have high-cardinality fields.  Now, if you
have a skewed index, where there's one popular value and a few much less
popular ones, then that's where you really want a partial index (as I
suggested earlier) so that queries against the non-popular value(s) are
able to use the index and the index is much smaller.
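A sketch of what such a partial index could look like here (the table name is an
assumption for illustration, and 'RSV' is assumed to be the dominant value; adjust
to whichever value actually dominates):

-- Index only the rarer states so lookups on them stay cheap and the index stays small
CREATE INDEX alerts_state_not_rsv_idx ON alerts_alert (state) WHERE state <> 'RSV';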


Interesting! In my head, for some reason, I'd always assumed a btree 
index would break down a char field based on the characters within it. 
Does that never happen?


If I changed this to be an enum field, would != still perform poorly or 
can the query optimiser spot that it's an enum and just look for the 
other options?


cheers,

Chris



Re: surprising query optimisation

2018-12-05 Thread Chris Withers

On 30/11/2018 22:10, Gavin Flower wrote:


I once optimised a very complex set of queries that made extensive use of 
indexes.  However, with the knowledge I have today, I would most likely 
have had fewer and smaller indexes.  As I now realize, some of my 
indexes were probably counterproductive, especially as I'd given no 
thought to how much RAM would be required to host the data plus 
indexes!  Fortunately, while the objective was to run all those queries 
within 24 hours, they actually only took a couple of hours.


So, interestingly, this box has 250GB memory in it, and even though I've 
set effective_cache_size to 200GB, I only see 9G of memory being used. 
How can I persuade postgres to keep more in memory?


cheers,

Chris



Re: surprising query optimisation

2018-12-05 Thread Thomas Kellerer
Stephen Frost schrieb am 30.11.2018 um 14:05:
> PG doesn’t know, with complete certainty, that there’s only 3
> values.
 
Would the optimizer consult a check constraint ensuring that?




Re: surprising query optimisation

2018-12-05 Thread Thomas Kellerer
Chris Withers schrieb am 05.12.2018 um 12:42:
> So, interestingly, this box has 250GB memory in it, and even though
> I've set effective_cache_size to 200GB, I only see 9G of memory being
> used. How can I persuade postgres to keep more in memory?
effective_cache_size is a hint to the optimizer about how much data can be 
expected to be cached (by the filesystem).

Only shared_buffers (and work_mem) will actually allocate memory from the 
operating system.
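To make the distinction concrete, a sketch (the values are illustrative assumptions,
not recommendations from this thread):

ALTER SYSTEM SET effective_cache_size = '200GB';  -- planner hint only, allocates nothing
ALTER SYSTEM SET shared_buffers = '32GB';         -- memory PostgreSQL itself allocates
SELECT pg_reload_conf();  -- picks up effective_cache_size; shared_buffers needs a restart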

Thomas



Re: simple division

2018-12-05 Thread Gavin Flower

On 06/12/2018 00:05, Geoff Winkless wrote:

On Wed, 5 Dec 2018 at 09:13, Gavin Flower  wrote:

SELECT ceil(10/4.0);

Is what you want for that example.

Except that implies that "number of people who can fit in a car" is a
real number, not a whole.

IMO it's fundamentally broken that SQL doesn't cast the result of a
divide into a numeric value - the potential for unexpected errors
creeping into calculations is huge; however that's the standard and
no-one's going to change it now.

Having said that it's worth noting that those in the Other Place think
that it's broken enough to go against the standard (they have a DIV b
for integer divide and a/b for float).

Geoff

If you divide one integer by another, then it is logical to get an 
integer as the answer.






Re: simple division

2018-12-05 Thread Geoff Winkless
On Wed, 5 Dec 2018 at 12:45, Gavin Flower  wrote:
> If you divide one integer by another, then it is logical to get an
> integer as as the answer.

Hmm. It might fit with what a computer scientist might expect (or
rather, not be surprised about), but I don't think you can say that
it's "logical".

Where's the logical progression in step 3 here:

1 You asked the computer a question

2 The values you passed to it don't have decimal points

...

4 Ergo, you wanted an answer that was incorrect.

Geoff



Re: simple division

2018-12-05 Thread Adrian Klaver

On 12/5/18 4:45 AM, Gavin Flower wrote:

On 06/12/2018 00:05, Geoff Winkless wrote:
On Wed, 5 Dec 2018 at 09:13, Gavin Flower 
 wrote:

SELECT ceil(10/4.0);



Geoff

If you divide one integer by another, then it is logical to get an 
integer as the answer.


Hmm, grab any of my calculators and divide 10/4 and get 2.5. Seems not 
everybody agrees with that logic:)



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: simple division

2018-12-05 Thread Raymond Brinzer
On Wed, Dec 5, 2018 at 7:55 AM Geoff Winkless  wrote:

> Where's the logical progression in step 3 here:
>
> 1 You asked the computer a question
>
> 2 The values you passed to it don't have decimal points
>
> ...
>
> 4 Ergo, you wanted an answer that was incorrect.
>

Well put.  However the nature of the question you asked does not
necessarily allow for a correct finite answer.  If I ask for 10/3, for
instance, or 1/0.  To play devil's advocate, then:  you should have known
that some of the answers would need to be truncated.  This just truncates a
little more aggressively.  ;-)

-- 
Ray Brinzer


Re: querying both text and non-text properties

2018-12-05 Thread Rob Nikander



> On Dec 4, 2018, at 4:59 PM, Laurenz Albe  wrote:
> 
> You have two options:
> 
> A combined index:
> 
>  CREATE EXTENSION btree_gin;
>  CREATE INDEX ON fulltext USING gin (to_tsvector('english', doc), color);
> 
> That is the perfect match for a query with
> 
>  WHERE color = 'red' AND to_tsvector('german', doc) @@ to_tsquery('english', 
> 'word');
> 
> But you can also create two indexes:
> 
>  CREATE INDEX ON fulltext USING gin (to_tsvector('english', doc));
>  CREATE INDEX ON fulltext (color);
> 
> Then you don't need the extension, and PostgreSQL can still use them for the 
> search,
> either only one of them if the condition is selective enough, or a 
> "BitmapAnd" of both.

Thanks! I will try both these methods and compare the performance.

Rob


Database corruption in cascaded replica, "pg_xact/003A" doesn't exist, reading as zeroes"

2018-12-05 Thread Dejan Petrovic
I believe this is a result of my "broken" procedure for setting up a 
cascaded replica. I would love to know where the issue is.


This is a report of a database corruption which was detected after 
promoting a replica server to a master server. We suspect the actual 
corruption occurred during replication of a two-level cascaded replica, 
however it was not detected until after it was promoted to master server 
when pg_dump and pg_restore failed. After re-checking postgresql logs on 
the corrupted node, we found this log line, which was the only indicator 
of a possible issue, but was overlooked: "2018-07-25 13:14:42 UTCLOG: 
file "pg_xact/003A" doesn't exist, reading as zeroes".


After extensive investigation, we detected three different cases of data 
corruption:
1.) Issue with pg_toast - happens during pg_dump or when a row with 
corrupted data is selected

"pg_dump: Dumping the contents of table failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: missing chunk number 2 for 
toast value 86123528 in pg_toast_41291242"


2.) Issue with uncommitted transactions, occurs when running VACUUM or 
pg_dump/pg_restore:
"ERROR: uncommitted xmin 60817551 from before xid cutoff 61487222 needs 
to be frozen

SQL state: XX001"

3.) Two duplicated rows in different tables violating the PK unique 
constraint; occurs during pg_restore


This is how we fixed the three issues:
1.) Using bisection and SELECT, we found the row/column with corrupted 
data. We nulled the corrupt column and deleted the row (it was not 
critical, had no FKs)
2.) We ran VACUUM on all tables to check where this issue occurred and 
updated all rows between the reported xmin and xid cutoff with the same data - 
this generated a new xmin on the broken rows, which fixed the issue.
3.) We removed the duplicated rows in a transaction, disabled all 
triggers before and enabled them right after
4.) Our final step is to do a full dump and restore on master DB so that 
the DB is rebuilt


This is the procedure we used to replicate the cascaded replica (master 
-> replica -> replica)


Notes:
Machines are running on Centos 7, Postgresql 10.2
DB-1 = master
DB-2 = replica of DB-1
DB-3 = replica of DB-2

--> DB-3 (new cascaded replica)
systemctl disable postgresql-10
systemctl stop postgresql-10
--> DB-1 (master)
psql -U postgres
select pg_start_backup('clone',true);
<- DB-3 (cascaded replica)
rsync -azv -e --delete DB-2:/isl/pgsql10/data /isl/pgsql10/ --exclude 
pg_wal --exclude postgresql.pid

assert rsync finished
vi recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=DB-2'
    recovery_target_timeline = 'latest'
-> DB-1
select pg_stop_backup();
<- DB-3
rsync -azv -e --delete DB-2:/isl/pgsql10/data/pg_wal /isl/pgsql10/data/
systemctl start postgresql-10
systemctl enable postgresql-10
tail log file

Any comments regarding the cascaded replication procedure or database 
corruption detection or resolution are welcome.


Best regards,

Dejan Petrovic
ISL Online




Re: surprising query optimisation

2018-12-05 Thread Stephen Frost
Greetings,

* Thomas Kellerer (spam_ea...@gmx.net) wrote:
> Stephen Frost schrieb am 30.11.2018 um 14:05:
> > PG doesn’t know, with complete certainty, that there’s only 3
> > values.
>  
> Would the optimizer consult a check constraint ensuring that?

Not today, I don't believe (haven't looked at the code though, to be
fair), and seems like it'd be an awful lot of work for a rare
use-case that would be better with just a partial index..

What we will do today is if you ask for a specific value and there's a
CHECK constraint which lists out specific values and that value isn't
among the set, then we'll just skip over the table (One-time filter:
false), thanks to constraint exclusion.
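A sketch of that behaviour (table and constraint names are assumptions; note that for a
plain, non-partitioned table this needs constraint_exclusion = on rather than the
default 'partition'):

ALTER TABLE alerts_alert
  ADD CONSTRAINT alerts_alert_state_chk CHECK (state IN ('NEW', 'ACK', 'RSV'));
SET constraint_exclusion = on;
EXPLAIN SELECT * FROM alerts_alert WHERE state = 'XYZ';
-- the plan should show:  Result ... One-Time Filter: false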

Thanks!

Stephen




debugging intermittent slow updates under higher load

2018-12-05 Thread Chris Withers

Hi All,

This is on postgres 9.4.16, same table as the last question I asked, 
here's an abbreviated desc:


# \d alerts_alert
  Table "public.alerts_alert"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 tags   | jsonb                  | not null
 id     | character varying(86)  | not null
...
Indexes:
    "alerts_alert_pkey" PRIMARY KEY, btree (id)

The table has around 1.5M rows which have been updated/inserted around 
121M times; the distribution of updates per row in alerts_alert is 
quite uneven, from 1 insert up to 1 insert and 0.5M updates.


Under high load (200-300 inserts/updates per second) we see occasional 
(~10 per hour) updates taking excessively long times (2-10s). These 
updates are always of the form:


UPDATE "alerts_alert" SET ...bunch of fields... WHERE 
"alerts_alert"."id" = '...sha1 hash...';


Here's a sample explain:

https://explain.depesz.com/s/Fjq8

What could be causing this? What could we do to debug? What config 
changes could we make to alleviate this?


cheers,

Chris



Re: surprising query optimisation

2018-12-05 Thread Stephen Frost
Greetings,

* Chris Withers (ch...@withers.org) wrote:
> On 30/11/2018 15:33, Stephen Frost wrote:
> >* Chris Withers (ch...@withers.org) wrote:
> >>On 28/11/2018 22:49, Stephen Frost wrote:
> >For this, specifically, it's because you end up with exactly what you
> >have: a large index with tons of duplicate values.  Indexes are
> >particularly good when you have high-cardinality fields.  Now, if you
> >have a skewed index, where there's one popular value and a few much less
> >popular ones, then that's where you really want a partial index (as I
> >suggest earlier) so that queries against the non-popular value(s) is
> >able to use the index and the index is much smaller.
> 
> Interesting! In my head, for some reason, I'd always assumed a btree index
> would break down a char field based on the characters within it. Does that
> never happen?

Not sure what you mean by 'break down a char field'.

> If I changed this to be an enum field, would != still perform poorly or can
> the query optimiser spot that it's an enum and just look for the other
> options?

I don't believe we've got any kind of optimization like that today for
enums.

Thanks!

Stephen




Re: surprising query optimisation

2018-12-05 Thread Chris Withers

On 05/12/2018 14:38, Stephen Frost wrote:

Greetings,

* Chris Withers (ch...@withers.org) wrote:

On 30/11/2018 15:33, Stephen Frost wrote:

* Chris Withers (ch...@withers.org) wrote:

On 28/11/2018 22:49, Stephen Frost wrote:

For this, specifically, it's because you end up with exactly what you
have: a large index with tons of duplicate values.  Indexes are
particularly good when you have high-cardinality fields.  Now, if you
have a skewed index, where there's one popular value and a few much less
popular ones, then that's where you really want a partial index (as I
suggest earlier) so that queries against the non-popular value(s) is
able to use the index and the index is much smaller.


Interesting! In my head, for some reason, I'd always assumed a btree index
would break down a char field based on the characters within it. Does that
never happen?


Not sure what you mean by 'break down a char field'.


Rather than breaking into three buckets ('NEW', 'ACK', 'RSV'), a more 
complicated hierarchy ('N', 'NE', 'A', 'AC', etc).



If I changed this to be an enum field, would != still perform poorly or can
the query optimiser spot that it's an enum and just look for the other
options?


I don't believe we've got any kind of optimization like that today for
enums.


Good to know, I see query optimisers as magic, and postgres often seems 
to achieve magic results ;-)


Chris



Re: surprising query optimisation

2018-12-05 Thread Ron

On 12/05/2018 08:42 AM, Chris Withers wrote:

On 05/12/2018 14:38, Stephen Frost wrote:

Greetings,

* Chris Withers (ch...@withers.org) wrote:

On 30/11/2018 15:33, Stephen Frost wrote:

* Chris Withers (ch...@withers.org) wrote:

On 28/11/2018 22:49, Stephen Frost wrote:

For this, specifically, it's because you end up with exactly what you
have: a large index with tons of duplicate values.  Indexes are
particularly good when you have high-cardinality fields. Now, if you
have a skewed index, where there's one popular value and a few much less
popular ones, then that's where you really want a partial index (as I
suggest earlier) so that queries against the non-popular value(s) is
able to use the index and the index is much smaller.


Interesting! In my head, for some reason, I'd always assumed a btree index
would break down a char field based on the characters within it. Does that
never happen?


Not sure what you mean by 'break down a char field'.


Rather than breaking into three buckets ('NEW', 'ACK', 'RSV'), a more 
complicated hierarchy ('N', 'NE', 'A', 'AC', etc).


The b-tree indexes on the legacy RDBMS which I still occasionally fiddle with 
perform key prefix compression in a manner similar to what you refer to, 
but otherwise that's not how b-trees work.


--
Angular momentum makes the world go 'round.



Re: Database corruption in cascaded replica, "pg_xact/003A" doesn't exist, reading as zeroes"

2018-12-05 Thread Stephen Frost
Greetings,

* Dejan Petrovic (dejan.petro...@islonline.com) wrote:
> I believe this is a result of my "broken" procedure for setting up a
> cascaded replica. I would love to know where the issue is.

[...]

> Notes:
> Machines are running on Centos 7, Postgresql 10.2
> DB-1 = master
> DB-2 = replica of DB-1
> DB-3 = replica of DB-2
> 
> --> DB-3 (new cascaded replica)
> systemctl disable postgresql-10
> systemctl stop postgresql-10
> --> DB-1 (master)
> psql -U postgres
> select pg_start_backup('clone',true);
> <- DB-3 (cascaded replica)
> rsync -azv -e --delete DB-2:/isl/pgsql10/data /isl/pgsql10/ --exclude pg_wal
> --exclude postgresql.pid
> assert rsync finished
> vi recovery.conf
>     standby_mode = 'on'
>     primary_conninfo = 'host=DB-2'
>     recovery_target_timeline = 'latest'
> -> DB-1
> select pg_stop_backup();
> <- DB-3
> rsync -azv -e --delete DB-2:/isl/pgsql10/data/pg_wal /isl/pgsql10/data/
> systemctl start postgresql-10
> systemctl enable postgresql-10

If I'm reading this correctly, you're doing a pg_start_backup() on the
primary, but then making a copy of the DB-2 replica (while the primary
and the DB-2 replica are still running..).

In short, no, that doesn't work.  Never has.  That someone thought they
could create a replica that way is just another good reason to rip out
the broken exclusive-backup mode.  I hope this isn't how your regular
backups are done.

I'd strongly suggest that you use existing tools to do this- in
partciular, you should be able to use pg_basebackup against the DB-2
replica to build out your cascaded replica and that'll do all the right
things.  Alternatively, using a backup solution like pgbackrest would
allow you to quickly build a replica from your backups instead of from a
running replica.
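A sketch of that approach for DB-3 (host and user are assumptions; on PostgreSQL 10 the
-R switch writes a matching recovery.conf into the data directory):

# run on DB-3, against an empty /isl/pgsql10/data
pg_basebackup -h DB-2 -U replication -D /isl/pgsql10/data -X stream -R -P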

Thanks!

Stephen




Re: simple division

2018-12-05 Thread Tom Lane
Geoff Winkless  writes:
> IMO it's fundamentally broken that SQL doesn't cast the result of a
> divide into a numeric value - the potential for unexpected errors
> creeping into calculations is huge; however that's the standard and
> no-one's going to change it now.
> Having said that it's worth noting that those in the Other Place think
> that it's broken enough to go against the standard (they have a DIV b
> for integer divide and a/b for float).

Well, this isn't really blame-able on the SQL standard; it's a
Postgres-ism.  What the spec says (in SQL99, 6.26 <numeric value expression>) is

 1) If the declared type of both operands of a dyadic arithmetic
operator is exact numeric, then the declared type of the
result is exact numeric, with precision and scale determined
as follows:

a) Let S1 and S2 be the scale of the first and second operands
  respectively.

b) The precision of the result of addition and subtraction is
  implementation-defined, and the scale is the maximum of S1
  and S2.

c) The precision of the result of multiplication is
  implementation-defined, and the scale is S1 + S2.

d) The precision and scale of the result of division is
  implementation-defined.

 2) If the declared type of either operand of a dyadic arithmetic
operator is approximate numeric, then the declared type of the
result is approximate numeric. The precision of the result is
implementation-defined.

Postgres' integer types map onto the standard as exact numerics with
scale 0.  (The precision aspect is a bit squishy, since their maximum
values aren't powers of 10, but let's disregard that.)  Postgres'
integer division operator meets the spec with the stipulation that
the "implementation-defined" scale of the result is 0.  Other SQL
implementations can and do define that differently --- if they even
have an "integer" data type, which some do not.

Anyway, the bottom line here is that we're balancing surprise factor
for novices against twenty-plus years of backwards compatibility,
and the latter is going to win.

regards, tom lane



Re: Database corruption in cascaded replica, "pg_xact/003A" doesn't exist, reading as zeroes"

2018-12-05 Thread Achilleas Mantzios

On 5/12/18 4:10 μ.μ., Dejan Petrovic wrote:

I believe this is a result of my "broken" procedure for setting up a cascaded 
replica. I would love to know where the issue is.

This is a report of a database corruption which was detected after promoting a replica server to a master server. We suspect the actual corruption occurred during replication of a two-level cascaded 
replica, however it was not detected until after it was promoted to master server when pg_dump and pg_restore failed. After re-checking postgresql logs on the corrupted node, we found this log line, 
which was the only indicator of a possible issue, but was overlooked: "2018-07-25 13:14:42 UTCLOG: file "pg_xact/003A" doesn't exist, reading as zeroes".


After extensive investigation, we detected three different cases of data 
corruption:
1.)Issue with pg_toast - happens during pg_dump or when row with corrupted data 
is selected
"pg_dump: Dumping the contents of table failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: missing chunk number 2 for toast value 
86123528 in pg_toast_41291242"

2.) Issue with uncommitted transactions, occurs when running VACUUM or 
pg_dump/pg_restore:
"ERROR: uncommitted xmin 60817551 from before xid cutoff 61487222 needs to be 
frozen
SQL state: XX001"

3.) 2 duplicated rows in different tables, violated PK unique constraint, 
occurs during pg_restore

This is how we fixed the three issues:
1.) Using bisection and SELECT, we found the row/column with corrupted data. We 
nulled the corrupt column and deleted the row (it was not critical, had no FKs)
2.) We ran VACUUM on all tables to check where this issue occured and updated all rows between reported xmin and xid cutoff with same data - this generated a new xmin on the broken rows, which fixed 
the issue.

3.) We removed the duplicated rows in a transaction, disabled all triggers 
before and enabled them right after
4.) Our final step is to do a full dump and restore on master DB so that the DB 
is rebuilt

This is the procedure we used to replicate the cascaded replica (master -> replica 
-> replica)

Notes:
Machines are running on Centos 7, Postgresql 10.2
DB-1 = master
DB-2 = replica of DB-1
DB-3 = replica of DB-2

--> DB-3 (new cascaded replica)
systemctl disable postgresql-10
systemctl stop postgresql-10
--> DB-1 (master)
psql -U postgres
select pg_start_backup('clone',true);
<- DB-3 (cascaded replica)
rsync -azv -e --delete DB-2:/isl/pgsql10/data /isl/pgsql10/ --exclude pg_wal 
--exclude postgresql.pid
assert rsync finished
vi recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=DB-2'
    recovery_target_timeline = 'latest'


In addition to what Stephen wrote to you: from 
https://www.postgresql.org/docs/10/continuous-archiving.html
the low level backup method relies on:
"1. Ensure that WAL archiving is enabled and working."
Have you set this up correctly?

Also, there should at least be a restore_command inside your recovery.conf. Otherwise how do you know that DB-2 has the WALs you need? With your WALs archived, it is guaranteed that the backup 
is consistent.
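For illustration, a recovery.conf along those lines might look like this (the archive
path and user are assumptions):

standby_mode = 'on'
primary_conninfo = 'host=DB-2 user=replication'
restore_command = 'cp /path/to/wal_archive/%f "%p"'
recovery_target_timeline = 'latest'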


But anyway, it's better to either use pg_basebackup or invest in some higher 
level tool like Stephen said.


-> DB-1
select pg_stop_backup();
<- DB-3
rsync -azv -e --delete DB-2:/isl/pgsql10/data/pg_wal /isl/pgsql10/data/
systemctl start postgresql-10
systemctl enable postgresql-10
tail log file

Any comments regarding the cascaded replication procedure or database 
corruption detection or resolution are welcome.

Best regards,

Dejan Petrovic
ISL Online





--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt




Re: surprising query optimisation

2018-12-05 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> On 12/05/2018 08:42 AM, Chris Withers wrote:
> >On 05/12/2018 14:38, Stephen Frost wrote:
> * Chris Withers (ch...@withers.org) wrote:
> >>>Interesting! In my head, for some reason, I'd always assumed a btree index
> >>>would break down a char field based on the characters within it. Does that
> >>>never happen?
> >>
> >>Not sure what you mean by 'break down a char field'.
> >
> >Rather than breaking into three buckets ('NEW', 'ACK', 'RSV'), a more
> >complicated hierarchy ('N', 'NE', 'A', 'AC', etc).
> 
> The b-tree indexes on legacy RDBMS which I still occasionally fiddle with
> performs key prefix compression in a manner similar to what you refer to,
> but otherwise that's not how b-trees work.

There's been some discussion of prefix compression in PostgreSQL.  Even
with that, though, it hardly seems sensible to have an index which has
tons of duplicates comprising most of the index, and a != would still
have to search the index to make sure there aren't any entries which
need to be returned..

Now, maybe once we get skipping scans where we would be able to skip
over a large chunk of the index because it's just tons of duplicates
without having to visit everything along the way, then maybe having this
inefficient index would "just" take up disk space, but why waste that
space?

Thanks!

Stephen




Re: debugging intermittent slow updates under higher load

2018-12-05 Thread Alexey Bashtanov





The table has around 1.5M rows which have been updated/inserted around 
121M times, the distribution of updates to row in alerts_alert will be 
quite uneven, from 1 insert up to 1 insert and 0.5M updates.


Under high load (200-300 inserts/updates per second) we see occasional 
(~10 per hour) updates taking excessively long times (2-10s). These 
updates are always of the form:


UPDATE "alerts_alert" SET ...bunch of fields... WHERE 
"alerts_alert"."id" = '...sha1 hash...';


Here's a sample explain:

https://explain.depesz.com/s/Fjq8

What could be causing this? What could we do to debug? What config 
changes could we make to alleviate this?




Hello Chris,

One of the reasons could be the row already being locked by another backend, 
doing the same kind of update or something different.

Are these updates performed in longer transactions?
Can they hit the same row from two clients at the same time?
Is there any other write or select-for-update/share load on the table?

Have you tried periodic logging of the non-granted locks?
Try querying pg_stat_activity and pg_locks (possibly joined and maybe 
repeatedly self-joined, google for it)
to get the backends that wait on one another while competing to lock 
the same row or object.
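A sketch of such a self-join (standard catalog columns, works on 9.4), listing which
backend is waiting on which:

SELECT blocked.pid        AS blocked_pid,
       blocked_act.query  AS blocked_query,
       blocking.pid       AS blocking_pid,
       blocking_act.query AS blocking_query
FROM pg_locks blocked
JOIN pg_stat_activity blocked_act ON blocked_act.pid = blocked.pid
JOIN pg_locks blocking
  ON  blocking.locktype = blocked.locktype
  AND blocking.database      IS NOT DISTINCT FROM blocked.database
  AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
  AND blocking.page          IS NOT DISTINCT FROM blocked.page
  AND blocking.tuple         IS NOT DISTINCT FROM blocked.tuple
  AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
  AND blocking.pid <> blocked.pid
JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
WHERE NOT blocked.granted
  AND blocking.granted;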


Best,
 Alex



Re: debugging intermittent slow updates under higher load

2018-12-05 Thread Rene Romero Benavides
Also read about HOT updates and the storage parameter named "fillfactor":
with it, space in existing data blocks can be reused instead of new blocks being
allocated, as long as the updated fields aren't part of any index.
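For example (a sketch; the table name comes from the thread and the value is an
illustrative assumption, not a recommendation):

-- leave ~30% of each heap page free so HOT updates can reuse it
ALTER TABLE alerts_alert SET (fillfactor = 70);
-- affects newly written pages only; existing pages are repacked by VACUUM FULL/CLUSTER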

On Wed, 5 Dec 2018 at 09:39, Alexey Bashtanov wrote:

>
> >
> > The table has around 1.5M rows which have been updated/inserted around
> > 121M times, the distribution of updates to row in alerts_alert will be
> > quite uneven, from 1 insert up to 1 insert and 0.5M updates.
> >
> > Under high load (200-300 inserts/updates per second) we see occasional
> > (~10 per hour) updates taking excessively long times (2-10s). These
> > updates are always of the form:
> >
> > UPDATE "alerts_alert" SET ...bunch of fields... WHERE
> > "alerts_alert"."id" = '...sha1 hash...';
> >
> > Here's a sample explain:
> >
> > https://explain.depesz.com/s/Fjq8
> >
> > What could be causing this? What could we do to debug? What config
> > changes could we make to alleviate this?
> >
>
> Hello Chris,
>
> One of the reasons could be the row already locked by another backend,
> doing the same kind of an update or something different.
> Are these updates performed in a longer transactions?
> Can they hit the same row from two clients at the same time?
> Is there any other write or select-for-update/share load on the table?
>
> Have you tried periodical logging of the non-granted locks?
> Try querying pg_stat_activity and pg_locks (possibly joined and maybe
> repeatedly self-joined, google for it)
> to get the backends that wait one for another while competing for to
> lock the same row or object.
>
> Best,
>   Alex
>
>

-- 
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


Re: debugging intermittent slow updates under higher load

2018-12-05 Thread Rene Romero Benavides
This parameter can be updated on a "per table" basis.

On Wed, 5 Dec 2018 at 09:47, Rene Romero Benavides <rene.romer...@gmail.com> wrote:

> Also read about hot updates and the storage parameter named "fill_factor",
> so, data blocks can be recycled instead of creating new ones if the updated
> fields don't update also indexes.
>
> Am Mi., 5. Dez. 2018 um 09:39 Uhr schrieb Alexey Bashtanov
> :
>
>>
>> >
>> > The table has around 1.5M rows which have been updated/inserted around
>> > 121M times, the distribution of updates to row in alerts_alert will be
>> > quite uneven, from 1 insert up to 1 insert and 0.5M updates.
>> >
>> > Under high load (200-300 inserts/updates per second) we see occasional
>> > (~10 per hour) updates taking excessively long times (2-10s). These
>> > updates are always of the form:
>> >
>> > UPDATE "alerts_alert" SET ...bunch of fields... WHERE
>> > "alerts_alert"."id" = '...sha1 hash...';
>> >
>> > Here's a sample explain:
>> >
>> > https://explain.depesz.com/s/Fjq8
>> >
>> > What could be causing this? What could we do to debug? What config
>> > changes could we make to alleviate this?
>> >
>>
>> Hello Chris,
>>
>> One of the reasons could be the row already locked by another backend,
>> doing the same kind of an update or something different.
>> Are these updates performed in a longer transactions?
>> Can they hit the same row from two clients at the same time?
>> Is there any other write or select-for-update/share load on the table?
>>
>> Have you tried periodical logging of the non-granted locks?
>> Try querying pg_stat_activity and pg_locks (possibly joined and maybe
>> repeatedly self-joined, google for it)
>> to get the backends that wait one for another while competing for to
>> lock the same row or object.
>>
>> Best,
>>   Alex
>>
>>
>
> --
> El genio es 1% inspiración y 99% transpiración.
> Thomas Alva Edison
> http://pglearn.blogspot.mx/
>
>

-- 
El genio es 1% inspiración y 99% transpiración.
Thomas Alva Edison
http://pglearn.blogspot.mx/


Re: simple division

2018-12-05 Thread Martin Mueller
I take the point that two decades of backward compatibility should and will 
win. That said, it's easy enough to right the balance for novices by putting, in 
a really obvious place in the documentation, what you should do if you want to 
divide two integers and get the result with the number of decimals of your 
choice. I made one suggestion for how this could be done. A better way might 
be a short paragraph like:

A note on division:  if you divide two constants or variables defined as 
integers, the result will be an integer. If you want the result with decimals, 
cast one operand with "::numeric".  If you want to limit the decimals, use the 
round() function:
SELECT 10/3;                       3
SELECT 10/3::numeric;              3.3333...
SELECT round(10/3::numeric, 3);    3.333
For more detail see the sections on ...





On 12/5/18, 9:23 AM, "Tom Lane"  wrote:

Geoff Winkless  writes:
> IMO it's fundamentally broken that SQL doesn't cast the result of a
> divide into a numeric value - the potential for unexpected errors
> creeping into calculations is huge; however that's the standard and
> no-one's going to change it now.
> Having said that it's worth noting that those in the Other Place think
> that it's broken enough to go against the standard (they have a DIV b
> for integer divide and a/b for float).

Well, this isn't really blame-able on the SQL standard; it's a
Postgres-ism.  What the spec says (in SQL99, 6.26 ) is

 1) If the declared type of both operands of a dyadic arithmetic
operator is exact numeric, then the declared type of the
result is exact numeric, with precision and scale determined
as follows:

a) Let S1 and S2 be the scale of the first and second operands
  respectively.

b) The precision of the result of addition and subtraction is
  implementation-defined, and the scale is the maximum of S1
  and S2.

c) The precision of the result of multiplication is
  implementation-defined, and the scale is S1 + S2.

d) The precision and scale of the result of division is
  implementation-defined.

 2) If the declared type of either operand of a dyadic arithmetic
operator is approximate numeric, then the declared type of the
result is approximate numeric. The precision of the result is
implementation-defined.

Postgres' integer types map onto the standard as exact numerics with
scale 0.  (The precision aspect is a bit squishy, since their maximum
values aren't powers of 10, but let's disregard that.)  Postgres'
integer division operator meets the spec with the stipulation that
the "implementation-defined" scale of the result is 0.  Other SQL
implementations can and do define that differently --- if they even
have an "integer" data type, which some do not.

Anyway, the bottom line here is that we're balancing surprise factor
for novices against twenty-plus years of backwards compatibility,
and the latter is going to win.

regards, tom lane





Re: simple division

2018-12-05 Thread Gavin Flower

On 06/12/2018 02:32, Adrian Klaver wrote:

On 12/5/18 4:45 AM, Gavin Flower wrote:

On 06/12/2018 00:05, Geoff Winkless wrote:
On Wed, 5 Dec 2018 at 09:13, Gavin Flower 
 wrote:

SELECT ceil(10/4.0);



Geoff

If you divide one integer by another, then it is logical to get an 
integer as the answer.


Hmm, grab any of my calculators and divide 10/4 and get 2.5. Seems not 
everybody agrees with that logic:)



Calculators normally work in floating point (in fact, as far as I am 
aware, they never work in integer mode by default), unless you are 
either doing symbolic maths or working with integer-based representations 
like hexadecimal.


So your example does not contradict what I said.





Re: simple division

2018-12-05 Thread Alvaro Herrera
On 2018-Dec-06, Gavin Flower wrote:

> Calculators normally work in floating point (in fact, as far as I am aware,
> they never work in integer mode by default),

The reason they don't work in "integer mode" is because it doesn't make
sense.

We only have this thing called "integer division" because it's a cheap
thing to implement in terms of machine instructions.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Limitting full join to one match

2018-12-05 Thread Phil Endecott
Dear Experts,

I have a couple of tables that I want to reconcile, finding rows 
that match and places where rows are missing from one table or the 
other:

db=> select * from a;
+------------+--------+
|    date    | amount |
+------------+--------+
| 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |  <-- missing from b
| 2018-04-01 |   5.00 |
+------------+--------+

db=> select * from b;
+------------+--------+
|    date    | amount |
+------------+--------+
| 2018-01-01 |  10.00 |
| 2018-03-01 |   8.00 |  <-- missing from a
| 2018-04-01 |   5.00 |
+------------+--------+

db=> select a.date, a.amount, b.date, b.amount from a full join b using 
(date,amount);
+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
+------------+--------+------------+--------+

This works fine until I have multiple items with the same date 
and amount:

db=> select * from a;
+------------+--------+
|    date    | amount |
+------------+--------+
| 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |
| 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 |  <--
| 2018-05-01 |  20.00 |  <--
+------------+--------+


db=> select * from b;
+------------+--------+
|    date    | amount |
+------------+--------+
| 2018-01-01 |  10.00 |
| 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 |  <--
| 2018-05-01 |  20.00 |  <--
+------------+--------+

db=> select a.date, a.amount, b.date, b.amount from a full join b using 
(date,amount);
+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  3
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  4
+------------+--------+------------+--------+

It has, of course, put four rows in the output for the new items.

So my question is: how can I modify my query to output only two rows, 
like this:?

+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
+------------+--------+------------+--------+


Any suggestions anyone?


The best I have found so far is something involving EXCEPT ALL:

db=> select * from a except all select * from b;
db=> select * from b except all select * from a;

That's not ideal, though, as what I ultimately want is something 
that lists everything with its status:

+------------+--------+--------+
|    date    | amount | status |
+------------+--------+--------+
| 2018-01-01 |  10.00 |   OK   |
| 2018-02-01 |   5.00 | a_only |
| 2018-03-01 |   8.00 | b_only |
| 2018-04-01 |   5.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
+------------+--------+--------+

That would be easy enough to achieve from the JOIN.


Thanks, Phil.





Re: Limitting full join to one match

2018-12-05 Thread John W Higgins
On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott <
spam_from_pgsql_li...@chezphil.org> wrote:

> Dear Experts,
>
> I have a couple of tables that I want to reconcile, finding rows
> that match and places where rows are missing from one table or the
> other:
>
> ...


> So my question is: how can I modify my query to output only two rows,
> like this:?
>
> +++++
> |date| amount |date| amount |
> +++++
> | 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
> | 2018-02-01 |   5.00 |||
> ||| 2018-03-01 |   8.00 |
> | 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
> | 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
> +++++
>
>
Evening Phil,

Window functions are your friend here. I prefer views for this stuff - but
subqueries would work just fine.

create view a_rows as (select *,
   row_number() OVER (PARTITION BY date, amount) AS pos
from a);
create view b_rows as (select *,
   row_number() OVER (PARTITION BY date, amount) AS pos
from b);

select
  a_rows.date,
  a_rows.amount,
  a_rows.pos,
  b_rows.date,
  b_rows.amount,
  b_rows.pos
from
  a_rows full join b_rows using (date,amount,pos);

Example here - http://sqlfiddle.com/#!17/305d6/3
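A possible extension of that join (a sketch, not from the thread) to produce the
one-row-per-item status listing Phil asks for:

select date, amount,
       case when a_rows.date is null then 'b_only'
            when b_rows.date is null then 'a_only'
            else 'OK'
       end as status
from
  a_rows full join b_rows using (date, amount, pos);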

John

>
> Any suggestions anyone?
>
>
> The best I have found so far is something involving EXCEPT ALL:
>
> db=> select * from a except all select * from b;
> db=> select * from b except all select * from a;
>
> That's not ideal, though, as what I ultimately want is something
> that lists everything with its status:
>
> ++++
> |date| amount | status |
> ++++
> | 2018-01-01 |  10.00 |   OK   |
> | 2018-02-01 |   5.00 | a_only |
> | 2018-03-01 |   8.00 | b_only |
> | 2018-04-01 |   5.00 |   OK   |
> | 2018-05-01 |  20.00 |   OK   |
> | 2018-05-01 |  20.00 |   OK   |
> ++++
>
> That would be easy enough to achieve from the JOIN.
>
>
> Thanks, Phil.
>
>
>
>


Re: Limitting full join to one match

2018-12-05 Thread Ron

On 12/05/2018 06:34 PM, Phil Endecott wrote:

Dear Experts,

I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:

db=> select * from a;
+++
|date| amount |
+++
| 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |  <-- missing from b
| 2018-04-01 |   5.00 |
+++

db=> select * from b;
+++
|date| amount |
+++
| 2018-01-01 |  10.00 |
| 2018-03-01 |   8.00 |  <-- missing from a
| 2018-04-01 |   5.00 |
+++

db=> select a.date, a.amount, b.date, b.amount from a full join b using 
(date,amount);
+++++
|date| amount |date| amount |
+++++
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |||
||| 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
+++++

This works fine until I have multiple items with the same date
and amount:

db=> select * from a;
+++
|date| amount |
+++
| 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |
| 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 |  <--
| 2018-05-01 |  20.00 |  <--
+++


db=> select * from b;
+++
|date| amount |
+++
| 2018-01-01 |  10.00 |
| 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 |  <--
| 2018-05-01 |  20.00 |  <--
+++


What's your PK on "a" and "b"?

(Also, gmail seems to think that all -- or at least most -- of your email is 
spam.)



db=> select a.date, a.amount, b.date, b.amount from a full join b using 
(date,amount);
+++++
|date| amount |date| amount |
+++++
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |||
||| 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  3
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  4
+++++

It has, of course, put four rows in the output for the new items.

So my question is: how can I modify my query to output only two rows,
like this:?

+++++
|date| amount |date| amount |
+++++
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |||
||| 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
+++++


Any suggestions anyone?


The best I have found so far is something involving EXCEPT ALL:

db=> select * from a except all select * from b;
db=> select * from b except all select * from a;

That's not ideal, though, as what I ultimately want is something
that lists everything with its status:

++++
|date| amount | status |
++++
| 2018-01-01 |  10.00 |   OK   |
| 2018-02-01 |   5.00 | a_only |
| 2018-03-01 |   8.00 | b_only |
| 2018-04-01 |   5.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
++++

That would be easy enough to achieve from the JOIN.


Thanks, Phil.





--
Angular momentum makes the world go 'round.



order of reading the conf files

2018-12-05 Thread bhargav kamineni
Hi,

May I know the order in which Postgres reads the configuration files
(postgresql.conf, postgresql.auto.conf, pg_hba.conf)?
And how does the postmaster fork the backend postgres processes? Can we see that
forking in the log file?



Thanks,
Banu.