Re: [HACKERS] Minmax indexes
On Wed, September 25, 2013 00:14, Alvaro Herrera wrote: [minmax-4-incr.patch]

After a --data-checksums initdb (successful), the following error came up after the statement:

  create index t_minmax_idx on t using minmax (r);

  WARNING:  page verification failed, calculated checksum 25951 but expected 0
  ERROR:  invalid page in block 1 of relation base/21324/26267_vm

It happens reliably, every time I run the program. Below is the whole program that I used.

Thanks,

Erik Rijkers

#!/bin/sh
t=t
if [[ 1 -eq 1 ]]; then
  echo "
    drop table if exists $t;
    create table $t as
      select i, cast(random() * 10^9 as integer) as r
      from generate_series(1, 100) as f(i);
    analyze $t;
    table $t limit 5;
    select count(*) from $t;
    explain analyze select min(r), max(r) from $t;
    select min(r), max(r) from $t;
    create index ${t}_minmax_idx on $t using minmax (r);
    analyze $t;
    explain analyze select min(r), max(r) from $t;
    select min(r), max(r) from $t;
  " | psql
fi
Re: [HACKERS] pg_dump/restore encoding woes
On 27 August 2013 20:06, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 26.08.2013 18:59, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes: The pg_dump -E option just sets client_encoding, but I think it would be better for -E to only set the encoding used in the dump, and for the PGCLIENTENCODING env variable (if set) to be used to determine the encoding of the command-line arguments. Opinions?

I think this is going to be a lot easier said than done, but feel free to see if you can make it work. (As you point out, we don't have any client-side encoding conversion infrastructure, but I don't see how you're going to make this work without it.)

First set client_encoding to PGCLIENTENCODING (ie. let libpq do its default thing), and run the queries that fetch the OIDs of any matching tables. Only after doing that, set client_encoding to that specified by -E. That's actually pretty easy to do, as pg_dump already issues all the queries that include user-given strings first. There's one small wrinkle in that: if the dump fails because the server sends an error, the error would come from the server in the -E encoding, because that's used as the client_encoding after the initial queries. Logically, the errors should be printed in PGCLIENTENCODING, but I think we can live with that.

The pg_restore part is more difficult, as pg_restore needs to work without a database connection at all. There the conversion has to be done client-side.

A second issue is whether we should divorce -E and PGCLIENTENCODING like that, when they have always meant the same thing. You mentioned the alternative of looking at pg_dump's locale environment to determine the command-line encoding --- would that be better?

Hmm. I wasn't thinking of looking at the locale environment as an alternative to the divorce, just as a way to determine the default for the client encoding if PGCLIENTENCODING is not set. It would be good for pg_dump to be consistent with other tools (reindexdb etc.). If we say that LC_CTYPE determines the encoding of command-line options, regardless of PGCLIENTENCODING, we should do the same in the other tools too, and that would be weird for them. So no, I don't think we should do that.

My plan is to assume that the command-line options are in the client encoding, and the client encoding is determined by the following things, in this order:

1. client_encoding setting given explicitly in the command line, as part of the connection string.
2. PGCLIENTENCODING
3. LC_CTYPE
4. same as server_encoding

The encoding to be used in the pg_dump output is a different concept, and there are reasons to *not* want the dump to be in the client encoding. Hence using server_encoding for that would be a better default than the current client encoding. This isn't so visible today, as client_encoding defaults to server_encoding anyway, but it will be if we set client_encoding='auto' (ie. look at LC_CTYPE) by default.

There are certainly cases where you'd want to use the client encoding as the dump output encoding. For example, if you pipe the pg_dump output to some custom script that mangles it, or if you just want to look at the output, ie. if the output goes to a terminal. But for the usual case of taking a backup, using the server encoding makes more sense. One option is to check if the output is a tty (like psql does), and output the dump in client or server encoding depending on that (if -E is not given).

So yeah, I think we should divorce -E and PGCLIENTENCODING.
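For illustration, the four-step precedence above amounts to a simple first-match-wins resolution, sketched here in standalone C (a hypothetical helper, not pg_dump's actual code; each argument is NULL when that source is absent):

#include <stddef.h>

static const char *
resolve_client_encoding(const char *connstr_encoding,  /* 1. connection string */
                        const char *pgclientencoding,  /* 2. PGCLIENTENCODING */
                        const char *lc_ctype_encoding, /* 3. derived from LC_CTYPE */
                        const char *server_encoding)   /* 4. fallback */
{
    if (connstr_encoding != NULL)
        return connstr_encoding;
    if (pgclientencoding != NULL)
        return pgclientencoding;
    if (lc_ctype_encoding != NULL)
        return lc_ctype_encoding;
    return server_encoding;
}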
It feels that using PGCLIENTENCODING to specify the output encoding was more accidental than on purpose. Looking at the history, the implementation has been that way forever, but the docs were adjusted by commit b524cb36 in 2005 to document that fact.

Here is a set of three patches that I've been working on:

0001-Divorce-pg_dump-E-option-from-PGCLIENTENCODING.patch

Separates pg_dump -E from PGCLIENTENCODING. Since AH->encoding is now used only to represent the dump encoding, we should rename it to AH->dump_encoding.

- The standard_conforming_strings parameter is currently set in setup_connection(). You have moved it into setup_dump_encoding(). Is it in any way related to encoding? If not, I think we should keep it in setup_connection().

- If a user supplies pg_dump --role=ROLENAME, we do the SET ROLE in setup_connection(). The role name is in the client encoding. In the case of parallel pg_dump, the setup_connection() in the parent process does it right, but in the worker processes the client_encoding is already set to the dump_encoding when -E is given (the parent has already updated its client_encoding to the dump_encoding, and I think the child inherits client_encoding from the parent), and then when SET ROLE is issued through setup_connection() in the child, it does not work, because the role name is in the client encoding, not the dump encoding.

$ pg_dump -t pöö -E
[HACKERS] invalid regexp crashes the server on windows or 9.3
Hi,

This regexp call freezes with almost 100% CPU usage, so I guess it creates an infinite loop. With Postgres 9.1 on Linux, I can kill the backend cleanly and get the following message:

cic_db=# select regexp_matches ('aa bb aa ba baa x','(^((?!aa)))+','gx');
ERROR:  invalid regular expression: nfa has too many states

On Windows and Postgres 9.3, I used the terminate icon from pgAdmin, which resulted in a FATAL error, killed all other processes and finally stopped the server. I guess this is a Windows issue, but we do not have a Postgres 9.3 installation on Linux yet, so I can't test it.

Here are the log entries from Windows 9.3 (translated from German):

2013-09-25 09:02:40 CEST LOG:  server process (PID 5952) was terminated by exception 0xC0FD
2013-09-25 09:02:40 CEST DETAIL:  Failed process was running: select regexp_matches ('aa bb aa ba baa x','(^((?!aa)))+','gx')
2013-09-25 09:02:40 CEST HINT:  See C include file ntstatus.h for a description of the hexadecimal value.
2013-09-25 09:02:40 CEST LOG:  terminating any other active server processes
2013-09-25 09:02:40 CEST WARNING:  terminating connection because of crash of another server process
2013-09-25 09:02:40 CEST DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-09-25 09:02:40 CEST HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-09-25 09:02:40 CEST WARNING:  terminating connection because of crash of another server process
2013-09-25 09:02:40 CEST DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2013-09-25 09:02:40 CEST HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2013-09-25 09:02:40 CEST LOG:  all server processes terminated; reinitializing
2013-09-25 09:02:50 CEST FATAL:  pre-existing shared memory block is still in use
2013-09-25 09:02:50 CEST HINT:  Check if there are any old server processes still running, and terminate them.

regards,

Marc Mamin
Re: [HACKERS] Minmax indexes
On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

Hi,

Here's a reviewable version of what I've dubbed Minmax indexes. Some people said they would like to use some other name for this feature, but I have yet to hear usable ideas, so for now I will keep calling them this way. I'm open to proposals, but if you pick something that cannot be abbreviated "mm" I might have you prepare a rebased version which renames the files and structs.

The implementation here has been simplified from what I originally proposed at 20130614222805.gz5...@eldon.alvh.no-ip.org -- in particular, I noticed that there's no need to involve aggregate functions at all; we can just use inequality operators. So the pg_amproc entries are gone; only the pg_amop entries are necessary.

I've somewhat punted on the question of doing resummarization separately from vacuuming. Right now, resummarization (as well as other necessary index cleanup) takes place in amvacuumcleanup. This is not optimal; I have stated elsewhere that I'd like to create separate maintenance actions that can be carried out by autovacuum. That would be useful both for Minmax indexes and GIN indexes (pending insertion list); maybe others. That's not part of this patch, however.

The design of this stuff is in the file minmax-proposal at the top of the tree. That file is up to date, though it still contains some open questions that were present in the original proposal. (I have not fixed some bogosities pointed out by Noah, for instance. I will do that shortly.) In a final version, that file would be applied as src/backend/access/minmax/README, most likely.

One area in which I needed to modify core code is IndexBuildHeapScan. I needed a version that was able to scan only a certain range of pages, not the entire table, so I introduced a new IndexBuildHeapRangeScan, and added a quick heap_scansetlimits function. I haven't tested that this works outside of the HeapRangeScan thingy, so it's probably completely bogus; I'm open to suggestions if people think this should be implemented differently. In any case, keeping that implementation together with vanilla IndexBuildHeapScan makes a lot of sense.

One thing still to tackle is when to mark ranges as unsummarized. Right now, any new tuple on a page range would cause a new index entry to be created and a new revmap update. This would cause huge index bloat if, say, a page is emptied and vacuumed and filled with new tuples with increasing values outside the original range; each new tuple would create a new index tuple. I have two ideas about this (1. mark range as unsummarized if it's the 3rd time we touch the same page range;

Why only at the 3rd time? Doesn't it need to be precise? Like, if someone inserts a row having a value greater than the max value of the corresponding index tuple, then that index tuple's max value needs to be updated, and I think it's updated with the help of the validity map. For example, considering we need to store the below info for each index tuple:

In each index tuple (corresponding to one page range), we store:
- first block this tuple applies to
- last block this tuple applies to
- for each indexed column:
  * min() value across all tuples in the range
  * max() value across all tuples in the range

Assume the first and last block for an index tuple are the same (assume block no. 'x') and the min value is 5 and the max is 10. Now a user inserts/updates a value in block 'x' such that the max value of the indexed column is 11; if we don't update the corresponding index tuple or at least invalidate it, won't it lead to wrong results?

2. vacuum the affected index page if it's full, so we can maintain the index always up to date without causing undue bloat), but I haven't implemented anything yet.

The amcostestimate routine is completely bogus; right now it returns constant 0, meaning the index is always chosen if it exists.

I think for the first version you might want to keep things simple, but there should be some way for the optimizer to select this index. So rather than choosing it whenever it is present, we can make the optimizer choose it when someone sets enable_minmax index to true.

How about keeping this up-to-date during foreground operations? Vacuum/maintainer tasks maintaining things usually have problems of bloat, and then we need to optimize or work around issues. Lots of people have raised this or a similar point previously, and from what I read you are of the opinion that it would be too slow. I really don't think it can be so slow that adding so much handling to get it up-to-date by some maintainer task is useful. Currently there are systems like Oracle where index clean-up is mainly done during foreground operation, so this alone cannot be a reason for slowness. Comparing the logic with IOS is also not completely right, as for IOS we need to know each tuple's visibility, which is not the case here. Now it can so happen
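As a reading aid, the per-range index tuple described in the quoted proposal boils down to something like the following (illustrative standalone C for a single integer column, not the patch's actual on-disk MMTuple format):

#include <stdint.h>

/* Conceptual summary tuple for one heap page range. */
typedef struct MinmaxSummary
{
    uint32_t first_block;   /* first heap block this tuple applies to */
    uint32_t last_block;    /* last heap block this tuple applies to */
    int64_t  min_value;     /* min() across all tuples in the range */
    int64_t  max_value;     /* max() across all tuples in the range */
} MinmaxSummary;

/*
 * A scan can skip the whole range only when the quals exclude
 * [min_value, max_value]; e.g. for "WHERE r = K", skip when
 * K < min_value || K > max_value.
 */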
Re: [HACKERS] invalid regexp crashes the server on windows or 9.3
On Wed, September 25, 2013 09:33, Marc Mamin wrote:

Hi, This regexp call freezes with almost 100% CPU usage, so I guess it creates an infinite loop. With Postgres 9.1 on Linux, I can kill the backend cleanly and get the following message:

cic_db=# select regexp_matches ('aa bb aa ba baa x','(^((?!aa)))+','gx');
ERROR:  invalid regular expression: nfa has too many states

On Windows and Postgres 9.3, I used the terminate icon from pgAdmin, which resulted in a FATAL error, killed all other processes and finally stopped the server. I guess this is a Windows issue, but we do not have a Postgres 9.3 installation on Linux yet, so I can't test it.

On 9.3 and 9.4devel, on Linux (CentOS 6.4), that statement / regex returns after ~2 minutes (on a modest desktop) with that same error, but without crashing.

Erik Rijkers
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Wed, Sep 25, 2013 at 1:20 AM, Robert Haas robertmh...@gmail.com wrote:

On Tue, Sep 24, 2013 at 5:04 AM, David Rowley dgrowle...@gmail.com wrote:

So... I guess the question that I'd ask is, if you write a PL/pgsql function that does RAISE NOTICE in a loop a large number of times, can you measure any difference in how fast that function executes on the patched and unpatched code? If so, how much?

I do see a 15-18% slowdown with the patched version, so perhaps I'll need to look to see if I can speed it up a bit, although I do feel this benchmark is not quite a normal workload.

Ouch! That's pretty painful. I agree that's not a normal workload, but I don't think it's an entirely unfair benchmark, either. There certainly are people who suffer because of the cost of logging as things are; for example, log_min_duration_statement is commonly used and can produce massive amounts of output on a busy system. I wouldn't mind too much if the slowdown you are seeing only occurred when the feature is actually used, but taking a 15-18% hit on logging even when the new formatting features aren't being used seems too expensive to me.

Ok, I think I've managed to narrow the performance gap to just about nothing but noise, though to do this the code is now a bit bigger. I've added a series of tests to check whether the padding is 0, doing things the old way when no padding is requested. I've also added some code which does a fast test to see if it is worthwhile calling the padding processing function. This is just a simple if (*p <= '9'); I'm not completely happy with that, as it does look a bit weird, but to compensate I've added a good comment to explain what it is doing.

Please find attached the new patch, version v0.5, and also updated benchmark results.

Regards

David

-- Robert Haas  EnterpriseDB: http://www.enterprisedb.com  The Enterprise PostgreSQL Company

Attachments:
  log_line_prefix_benchmark_stresslog_v0.5.xls (MS-Excel spreadsheet)
  log_line_formatting_v0.5.patch (binary data)
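To make the trick concrete, here is a standalone-C sketch of how such a cheap first-byte filter can work (an assumed shape for illustration, not the code from the patch): in ASCII, '-' (0x2D) and '0'..'9' (0x30..0x39) all compare <= '9', while every plain format letter compares > '9', so a single byte comparison skips the parsing work in the common unpadded case.

#include <stdbool.h>

/* Parse an optional printf-style padding spec such as "-5". */
static const char *
parse_padding(const char *p, int *padding)
{
    bool negative = false;
    int  val = 0;

    *padding = 0;
    if (*p > '9')               /* fast path: cannot start a padding spec */
        return p;
    if (*p == '-')
    {
        negative = true;
        p++;
    }
    while (*p >= '0' && *p <= '9')
        val = val * 10 + (*p++ - '0');
    *padding = negative ? -val : val;
    return p;
}

Note that the filter is conservative: a few non-padding bytes (e.g. '%') also compare <= '9', but they fall through the parser harmlessly with padding 0.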
Re: [HACKERS] [PATCH] Revive line type
Hi,

I had a look over this patch and here are my review points:

1. Patch applies cleanly.
2. make, make install and make check are good.
3. I did a lot of random testing and didn't find any issue.
4. Test coverage is very good. It has all scenarios, and all operators are tested with line. That's really great.

So no issues from my side. However, do we still need this in close_pl()?

#ifdef NOT_USED
    if (FPeq(line->A, -1.0) && FPzero(line->B))
    {
        /* vertical */
    }
#endif

Also close_sl, close_lb and dist_lb are NOT yet implemented. It would be good if we had those, but I don't think we should wait for those functions to be implemented. We can go ahead with this.

Please confirm the above concerns so that I can mark it as Ready for Committer.

Thanks

On Tue, Sep 17, 2013 at 7:34 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

David Fetter escribió: Should the things you tried and others be in the regression tests? If so, should we start with whatever had been in the regression tests when the line type was dropped?

Actually, the patch does include a regression test for the revived type (and it passes). I don't think more than that is needed.

-- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services

-- Jeevan B Chalke  Principal Software Engineer, Product Development  EnterpriseDB Corporation
Re: [HACKERS] Freezing without write I/O
On 18.09.2013 22:55, Jeff Janes wrote:

On Mon, Sep 16, 2013 at 6:59 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Here's a rebased version of the patch, including the above-mentioned fixes. Nothing else new.

I've applied this to 0892ecbc015930d, the last commit to which it applies cleanly. When I test this by repeatedly incrementing a counter in a randomly chosen row, then querying the whole table and comparing the results to what my driver knows they should be, I get discrepancies.

Ok, I found the bug. The problem was that when a HOT chain begins with a dead tuple, when the page was frozen the dead tuple was not removed, but the xmin of the live tuple in the chain was replaced with FrozenXid. That breaks the HOT-chain following code, which checks that the xmin of the next tuple in the chain matches the xmax of the previous tuple.

I fixed that by simply not freezing a page which contains any dead tuples. That's OK, because the page will be visited by vacuum before it becomes old enough to be mature. However, it means that the invariant that a page can only contain XIDs within one XID-LSN range, determined by the LSN, is no longer true. AFAICS everything still works, but I would feel more comfortable if we could uphold that invariant, for debugging reasons if nothing else. Will have to give that some more thought..

Thanks for the testing! I just posted an updated version of the patch elsewhere in this thread.

- Heikki
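The chain-following rule Heikki describes can be shown with a toy model in standalone C (simplified for illustration only; real HOT chains are followed via line pointers on the heap page):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FROZEN_XID 2            /* PostgreSQL's FrozenTransactionId */

/* Toy tuple: just the fields needed to model chain following. */
typedef struct
{
    uint32_t xmin;              /* inserting XID */
    uint32_t xmax;              /* updating XID, 0 if none */
    int      next;              /* index of next tuple in chain, -1 at end */
} ToyTuple;

/* Each link is valid only if the next tuple's xmin equals this xmax. */
static bool
chain_intact(const ToyTuple *t, int root)
{
    for (int i = root; t[i].next != -1; i = t[i].next)
        if (t[t[i].next].xmin != t[i].xmax)
            return false;
    return true;
}

int
main(void)
{
    /* dead root updated by XID 100; the successor's xmin was then frozen */
    ToyTuple chain[] = { {90, 100, 1}, {FROZEN_XID, 0, -1} };

    printf("chain intact: %s\n", chain_intact(chain, 0) ? "yes" : "no");
    return 0;
}

Freezing the live tuple's xmin to FrozenXid while its dead predecessor still carries the old xmax makes the walk stop early, which is exactly the bug fixed by not freezing pages that contain dead tuples.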
[HACKERS] Cube extension split algorithm fix
Hello, hackers.

I've fixed the split algorithm that was implemented in the cube extension. I've changed it to follow the original Guttman paper (the old version was a simpler algorithm), and also ported Alexander Korotkov's algorithm from box datatype indexing, which works faster and better on low dimensions. On my test dataset (1M records, 7 dimensions, a real-world database of goods) the numbers were as follows:

Building index over table (on expression):
  old: 67.296058 seconds
  new: 48.842391 seconds

Cube point search, mean over 100 queries:
  old: 0.001025 seconds
  new: 0.000427 seconds

Index on field, size:
  old: 562 MB
  new: 283 MB

Stas.

Attachment: split.diff (binary data)
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Peter Eisentraut pete...@gmx.net

On Tue, 2013-09-24 at 21:04 +0900, MauMau wrote:

4. I guess some users really want to continue to use ShiftJIS or EUC_JP for the database encoding, and use NCHAR for a limited set of columns to store international text in Unicode:
- to avoid code conversion between the server and the client, for performance
- because ShiftJIS and EUC_JP require a smaller amount of storage (2 bytes for most Kanji) than UTF-8 (3 bytes)
This use case is described in chapter 6 of the Oracle Database Globalization Support Guide.

But your proposal wouldn't address the first point, because data would have to go client -> server -> NCHAR. The second point is valid, but it's going to be an awful amount of work for that limited result.

I (or, Oracle's use case) meant the following, for example:

initdb -E EUC_JP

CREATE DATABASE mydb ENCODING EUC_JP NATIONAL ENCODING UTF-8;

CREATE TABLE mytable (
    col1 char(10),   -- EUC_JP text
    col2 Nchar(10)   -- UTF-8 text
);

client encoding = EUC_JP

That is:

1. Currently, the user is only handling Japanese text. To avoid unnecessary conversion, he uses EUC_JP for both client and server.
2. He needs to store some limited amount of international (non-Japanese) text in a few columns for a new feature of the system. But the international text is limited, so for those few columns he is willing to accept the cost of code conversion and the extra bytes per character.

Regards

MauMau
Re: [HACKERS] Freezing without write I/O
On 9/25/13 5:31 AM, Heikki Linnakangas wrote:

Attached is a new version, which adds that field to HeapTupleData. Most of the issues you listed above have been fixed, plus a bunch of other bugs I found myself. The bug that Jeff ran into with his count.pl script has also been fixed.

This patch doesn't apply at all:

Reversed (or previously applied) patch detected!  Assuming -R.
1 out of 39 hunks FAILED -- saving rejects to file src/backend/access/heap/heapam.c.rej
5 out of 9 hunks FAILED -- saving rejects to file src/backend/access/heap/rewriteheap.c.rej
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
10 out of 12 hunks FAILED -- saving rejects to file src/backend/access/transam/xlog.c.rej
13 out of 18 hunks FAILED -- saving rejects to file src/backend/commands/cluster.c.rej
Reversed (or previously applied) patch detected!  Assuming -R.
15 out of 17 hunks FAILED -- saving rejects to file src/backend/commands/vacuum.c.rej
5 out of 24 hunks FAILED -- saving rejects to file src/backend/commands/vacuumlazy.c.rej
3 out of 9 hunks FAILED -- saving rejects to file src/backend/postmaster/autovacuum.c.rej
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
1 out of 4 hunks FAILED -- saving rejects to file src/backend/utils/misc/guc.c.rej
Reversed (or previously applied) patch detected!  Assuming -R.
Reversed (or previously applied) patch detected!  Assuming -R.
1 out of 2 hunks FAILED -- saving rejects to file src/include/commands/cluster.h.rej
1 out of 1 hunk FAILED -- saving rejects to file src/include/commands/vacuum.h.rej
Reversed (or previously applied) patch detected!  Assuming -R.
Re: [HACKERS] GIN improvements part 1: additional information
On 9/23/13 5:36 PM, Alexander Korotkov wrote:

In the attached version of patch double finding of ItemPointer during insert is avoided. Overhead becomes lower as expected.

Fails cpluspluscheck:

./src/include/access/gin_private.h: In function ‘char* ginDataPageLeafReadItemPointer(char*, ItemPointer)’:
./src/include/access/gin_private.h:797:2: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Re: [HACKERS] Cube extension kNN support
On 9/22/13 7:38 PM, Stas Kelvich wrote:

Here is the patch that introduces kNN search for cubes with euclidean, taxicab and chebyshev distances.

cube and earthdistance regression tests fail.
Re: [HACKERS] invalid regexp crashes the server on windows or 9.3
Erik Rijkers e...@xs4all.nl writes:

On Wed, September 25, 2013 09:33, Marc Mamin wrote: This regexp call freezes with almost 100% CPU usage, so I guess it creates an infinite loop.

cic_db=# select regexp_matches ('aa bb aa ba baa x','(^((?!aa)))+','gx');

I poked into this a bit and decided that it's a variant of a problem we've seen before. I filed a report with the Tcl crew to see if they have any opinion about it: https://core.tcl.tk/tcl/tktview?name=8f245009b0 but prior experience suggests they won't be a lot of help.

On 9.3 and 9.4devel, on linux (centos 6.4), that statement / regex returns after ~2 minutes (on a modest desktop) with that same error, but not crashing.

Yeah, it should in principle fail eventually with a too-many-states error, since the loop is creating more NFA states than it deletes. But on a Mac laptop with an assert-enabled PG build, I got tired of waiting for that after about an hour ...

regards, tom lane
Re: [HACKERS] Freezing without write I/O
On 25.09.2013 15:48, Peter Eisentraut wrote:

On 9/25/13 5:31 AM, Heikki Linnakangas wrote: Attached is a new version, which adds that field to HeapTupleData. Most of the issues you listed above have been fixed, plus a bunch of other bugs I found myself. The bug that Jeff ran into with his count.pl script has also been fixed.

This patch doesn't apply at all:

Huh, that's strange. Merged with the latest changes from master; here's a new version that applies.

- Heikki

Attachment: xid-lsn-ranges-5.patch.gz (GNU Zip compressed data)
Re: [HACKERS] Some interesting news about Linux 3.12 OOM
On Wed, Sep 25, 2013 at 12:15 AM, Daniel Farina dan...@heroku.com wrote:

Enable the memcg OOM killer only for user faults, where it's really the only option available.

Is this really a big deal? I would expect most faults to be user faults. It's certainly a big deal that we need to ensure we can handle ENOMEM from syscalls and library functions we weren't expecting to return it. But I don't expect it to actually reduce the OOM killing sprees by much.

-- greg
Re: [HACKERS] logical changeset generation v6
On 09/17/2013 10:31 AM, Andres Freund wrote:

This patch set now fails to apply because of the commit "Rename various freeze multixact variables". And I am even partially guilty for that patch... Rebased patches attached.

While testing the logical replication changes against my WIP logical slony, I am sometimes getting error messages from the WAL sender of the form:

unexpected duplicate for tablespace X relfilenode X

The call stack is:

HeapTupleSatisfiesMVCCDuringDecoding
heap_hot_search_buffer
index_fetch_heap
index_getnext
systable_getnext
RelidByRelfilenode
ReorderBufferCommit
DecodeCommit
...

I am working off something based on your version e0acfeace6d695c229efd5d78041a1b734583431

Any ideas?

Greetings, Andres Freund
[HACKERS] Re: UTF8 national character data type support WIP patch and list of open issues.
On Tue, Sep 24, 2013 at 1:04 PM, MauMau maumau...@gmail.com wrote:

Yes, so Tatsuo san suggested to restrict the server encoding / NCHAR encoding combination to those with lossless conversion.

If it's not lossy then what's the point? From the client's point of view it'll be functionally equivalent to text then.

-- greg
Re: [HACKERS] logical changeset generation v6
On 2013-09-25 11:01:44 -0400, Steve Singer wrote:

On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit "Rename various freeze multixact variables". And I am even partially guilty for that patch... Rebased patches attached.

While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X

Any chance you could provide a setup to reproduce the error?

Any ideas?

I'll look into it. Could you provide any context to what you're doing that's being decoded?

Greetings,

Andres Freund

-- Andres Freund  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] SSI freezing bug
On 22.09.2013 00:12, Hannu Krosing wrote:

On 09/21/2013 10:46 PM, Andres Freund wrote:

Heikki Linnakangas hlinnakan...@vmware.com wrote:

Kevin Grittner kgri...@ymail.com wrote:

Andres Freund and...@2ndquadrant.com wrote:

On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote: When a tuple is predicate-locked, the key of the lock is ctid+xmin. However, when a tuple is frozen, its xmin is changed to FrozenXid. That effectively invalidates any predicate lock on the tuple, as checking for a lock on the same tuple later won't find it, as the xmin is different. Attached is an isolationtester spec to demonstrate this.

Do you have any idea to fix that besides keeping the xmin horizon below the lowest of the xids that are predicate locked? Which seems nasty to compute and is probably not trivial to fit into the procarray.c machinery? A better solution probably is to promote tuple-level locks, if they exist, to a relation-level one upon freezing, I guess?

That would work. A couple of other ideas would be to use the oldest serializable xmin, which is being calculated in predicate.c, to either prevent freezing of newer transactions, or to summarize predicate locks (using the existing SLRU-based summarization functions) which use xmin values for xids which are being frozen. The transactions must already be committed, and so are eligible for summarization.

What's the point of using xmin as part of the lock key in the first place; wouldn't ctid alone suffice? If the tuple was visible to the reading transaction, it cannot be vacuumed away until the transaction commits. No other tuple can appear with the same ctid.

SSI locks can live longer than one TX/snapshot.

But the only way that ctid can change without xmin changing is when you update the tuple in the same TX that created it.

I don't think that's relevant. We're not talking about the ctid field of a tuple, ie. HeapTupleHeader.t_ctid. We're talking about its physical location, ie. HeapTuple->t_self.

- Heikki
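For readers following along, the tuple-level lock key being debated looks conceptually like this (an illustrative standalone-C sketch; field names are simplified and this is not the actual PREDICATELOCKTARGETTAG from predicate.c):

#include <stdint.h>

/*
 * The physical location comes from HeapTuple->t_self; xmin is
 * included so a recycled line pointer holding an unrelated tuple
 * doesn't match an old lock.  Freezing rewrites xmin to FrozenXid,
 * which is exactly what invalidates the lookup described above.
 */
typedef struct TuplePredicateLockKey
{
    uint32_t dbOid;         /* database */
    uint32_t relOid;        /* relation */
    uint32_t blockNumber;   /* heap page */
    uint16_t offsetNumber;  /* line pointer within the page */
    uint32_t xmin;          /* inserting XID; FrozenXid after freezing */
} TuplePredicateLockKey;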
Re: [HACKERS] record identical operator
Kevin Grittner kgri...@ymail.com wrote:

Here is v2 of the patch, which changes from the universally disliked operator names v1 used. It also fixes bugs in the row comparisons for pass-by-reference types, fixes a couple of nearby comments, and adds regression tests for a matview containing a box column.

I accidentally omitted the regression tests for a matview with a box column from v2. Rather than redoing the whole thing, here is a v2a patch to add just that omitted part.

-- Kevin Grittner  EDB: http://www.enterprisedb.com  The Enterprise PostgreSQL Company

*** a/src/test/regress/expected/matview.out
--- b/src/test/regress/expected/matview.out
***************
*** 412,418 ****
  ERROR:  new data for mv contains duplicate rows without any NULL columns
  DETAIL:  Row: (1,10)
  DROP TABLE foo CASCADE;
  NOTICE:  drop cascades to materialized view mv
! -- make sure that all indexes covered by unique indexes works
  CREATE TABLE foo(a, b, c) AS VALUES(1, 2, 3);
  CREATE MATERIALIZED VIEW mv AS SELECT * FROM foo;
  CREATE UNIQUE INDEX ON mv (a);
--- 412,418 ----
  DETAIL:  Row: (1,10)
  DROP TABLE foo CASCADE;
  NOTICE:  drop cascades to materialized view mv
! -- make sure that all columns covered by unique indexes works
  CREATE TABLE foo(a, b, c) AS VALUES(1, 2, 3);
  CREATE MATERIALIZED VIEW mv AS SELECT * FROM foo;
  CREATE UNIQUE INDEX ON mv (a);
***************
*** 424,426 ****
  REFRESH MATERIALIZED VIEW mv;
--- 424,446 ----
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
  DROP TABLE foo CASCADE;
  NOTICE:  drop cascades to materialized view mv
+ -- make sure that types with unusual equality tests work
+ CREATE TABLE boxes (id serial primary key, b box);
+ INSERT INTO boxes (b) VALUES
+   ('(32,32),(31,31)'),
+   ('(2.004,2.004),(1,1)'),
+   ('(1.996,1.996),(1,1)');
+ CREATE MATERIALIZED VIEW boxmv AS SELECT * FROM boxes;
+ CREATE UNIQUE INDEX boxmv_id ON boxmv (id);
+ UPDATE boxes SET b = '(2,2),(1,1)' WHERE id = 2;
+ REFRESH MATERIALIZED VIEW CONCURRENTLY boxmv;
+ SELECT * FROM boxmv ORDER BY id;
+  id |          b
+ ----+---------------------
+   1 | (32,32),(31,31)
+   2 | (2,2),(1,1)
+   3 | (1.996,1.996),(1,1)
+ (3 rows)
+
+ DROP TABLE boxes CASCADE;
+ NOTICE:  drop cascades to materialized view boxmv

*** a/src/test/regress/sql/matview.sql
--- b/src/test/regress/sql/matview.sql
***************
*** 143,149 ****
  REFRESH MATERIALIZED VIEW mv;
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
  DROP TABLE foo CASCADE;
! -- make sure that all indexes covered by unique indexes works
  CREATE TABLE foo(a, b, c) AS VALUES(1, 2, 3);
  CREATE MATERIALIZED VIEW mv AS SELECT * FROM foo;
  CREATE UNIQUE INDEX ON mv (a);
--- 143,149 ----
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
  DROP TABLE foo CASCADE;
! -- make sure that all columns covered by unique indexes works
  CREATE TABLE foo(a, b, c) AS VALUES(1, 2, 3);
  CREATE MATERIALIZED VIEW mv AS SELECT * FROM foo;
  CREATE UNIQUE INDEX ON mv (a);
***************
*** 154,156 ****
  INSERT INTO foo VALUES(3, 4, 5);
--- 154,169 ----
  REFRESH MATERIALIZED VIEW mv;
  REFRESH MATERIALIZED VIEW CONCURRENTLY mv;
  DROP TABLE foo CASCADE;
+
+ -- make sure that types with unusual equality tests work
+ CREATE TABLE boxes (id serial primary key, b box);
+ INSERT INTO boxes (b) VALUES
+   ('(32,32),(31,31)'),
+   ('(2.004,2.004),(1,1)'),
+   ('(1.996,1.996),(1,1)');
+ CREATE MATERIALIZED VIEW boxmv AS SELECT * FROM boxes;
+ CREATE UNIQUE INDEX boxmv_id ON boxmv (id);
+ UPDATE boxes SET b = '(2,2),(1,1)' WHERE id = 2;
+ REFRESH MATERIALIZED VIEW CONCURRENTLY boxmv;
+ SELECT * FROM boxmv ORDER BY id;
+ DROP TABLE boxes CASCADE;
Re: [HACKERS] SSI freezing bug
On 21.09.2013 23:46, Andres Freund wrote:

Heikki Linnakangas hlinnakan...@vmware.com wrote:

Kevin Grittner kgri...@ymail.com wrote:

Andres Freund and...@2ndquadrant.com wrote:

On 2013-09-20 13:55:36 +0300, Heikki Linnakangas wrote: When a tuple is predicate-locked, the key of the lock is ctid+xmin. However, when a tuple is frozen, its xmin is changed to FrozenXid. That effectively invalidates any predicate lock on the tuple, as checking for a lock on the same tuple later won't find it, as the xmin is different. Attached is an isolationtester spec to demonstrate this.

Do you have any idea to fix that besides keeping the xmin horizon below the lowest of the xids that are predicate locked? Which seems nasty to compute and is probably not trivial to fit into the procarray.c machinery? A better solution probably is to promote tuple-level locks, if they exist, to a relation-level one upon freezing, I guess?

That would work. A couple of other ideas would be to use the oldest serializable xmin, which is being calculated in predicate.c, to either prevent freezing of newer transactions, or to summarize predicate locks (using the existing SLRU-based summarization functions) which use xmin values for xids which are being frozen. The transactions must already be committed, and so are eligible for summarization.

What's the point of using xmin as part of the lock key in the first place; wouldn't ctid alone suffice? If the tuple was visible to the reading transaction, it cannot be vacuumed away until the transaction commits. No other tuple can appear with the same ctid.

SSI locks can live longer than one TX/snapshot.

Hmm, true. Another idea: we could vacuum away predicate locks when the corresponding tuples are vacuumed from the heap. Before the second phase of vacuum, before removing the dead tuples, we could scan the predicate lock hash and remove any locks belonging to the tuples we're about to remove. And not include xmin in the lock key.

- Heikki
Re: [HACKERS] Cube extension types support
On 9/24/13 11:44 AM, Stas Kelvich wrote:

In this patch I've implemented support for different storage types for cubes.

Doesn't build:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o cubeparse.o cubeparse.c -MMD -MP -MF .deps/cubeparse.Po
cubeparse.y:35:13: warning: ‘check_coord’ used but never defined [enabled by default]
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -shared -o cube.so cube.o cubeparse.o -L../../src/port -L../../src/common -L/usr/lib -Wl,--as-needed -lm
/usr/bin/ld: cubeparse.o: relocation R_X86_64_PC32 against symbol `check_coord' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [cube.so] Error 1
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On 9/25/13 4:46 AM, David Rowley wrote:

Please find attached the new patch, version v0.5, and also updated benchmark results.

Please fix compiler warnings:

elog.c: In function ‘log_line_prefix.isra.3’:
elog.c:2436:22: warning: ‘padding’ may be used uninitialized in this function [-Wmaybe-uninitialized]
elog.c:2172:6: note: ‘padding’ was declared here
Re: [HACKERS] Cube extension split algorithm fix
On 9/25/13 7:14 AM, Stas Kelvich wrote:

I've fixed the split algorithm that was implemented in the cube extension.

This patch creates a bunch of new compiler warnings. Please fix those.
Re: [HACKERS] logical changeset generation v6
On 09/25/2013 11:08 AM, Andres Freund wrote:

On 2013-09-25 11:01:44 -0400, Steve Singer wrote:

On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit "Rename various freeze multixact variables". And I am even partially guilty for that patch... Rebased patches attached.

While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X

Any chance you could provide a setup to reproduce the error?

The steps to build a setup that should reproduce this error are:

1. I had to apply the attached patch on top of your logical replication branch so my pg_decode_init would know if it was being called as part of an INIT_REPLICATION or START_REPLICATION. Unless I have misunderstood something, you probably will want to merge this fix in.

2. Get my WIP for adding logical support to slony from g...@github.com:ssinger/slony1-engine.git, branch logical_repl (4af1917f8418a). (My code changes to slony are more prototype-level code quality than production code quality.)

3. Build and install:

   cd slony1-engine
   ./configure --with-pgconfigdir=/usr/local/pg94wal/bin   (or whatever)
   make
   make install

4. Grab the clustertest framework JAR from https://github.com/clustertest/clustertest-framework and build up a clustertest jar file.

5. Create a file slony1-engine/clustertest/conf/java.conf that contains the path to the above JAR file as a shell variable assignment, ie:

   CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar

6. cp clustertest/conf/disorder.properties.sample clustertest/conf/disorder.properties and edit disorder.properties to have the proper values for your environment. All 6 databases can point at the same postgres instance; this test will only actually use 2 of them (so far).

7. Run the test:

   cd clustertest
   ./run_all_disorder_tests.sh

This involves having the slon connect to the walsender on the database test1 and replicate the data into test2 (which is a different database on the same postmaster).

If this setup seems like too much effort I can request one of the commitfest VMs from Josh and get everything set up there for you.

Steve

Any ideas?

I'll look into it. Could you provide any context to what you're doing that's being decoded?

Greetings, Andres Freund

From 9efae980c357d3b75c6d98204ed75b8d29ed6385 Mon Sep 17 00:00:00 2001
From: Steve Singer ssin...@ca.afilias.info
Date: Mon, 16 Sep 2013 10:21:39 -0400
Subject: [PATCH] set the init parameter to true when performing an INIT REPLICATION

---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2187d96..1b8f289 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -743,7 +743,7 @@ InitLogicalReplication(InitLogicalReplicationCmd *cmd)
 	sendTimeLine = ThisTimeLineID;
 	initStringInfo(&output_message);
-	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true, InvalidXLogRecPtr,
 					   NIL, replay_read_page,
 					   WalSndPrepareWrite, WalSndWriteData);
-- 
1.7.10.4
Re: [HACKERS] FW: REVIEW: Allow formatting in log_line_prefix
On Thu, Sep 26, 2013 at 4:57 AM, Peter Eisentraut pete...@gmx.net wrote:

On 9/25/13 4:46 AM, David Rowley wrote: Please find attached the new patch, version v0.5, and also updated benchmark results.

Please fix compiler warnings:

elog.c: In function ‘log_line_prefix.isra.3’:
elog.c:2436:22: warning: ‘padding’ may be used uninitialized in this function [-Wmaybe-uninitialized]
elog.c:2172:6: note: ‘padding’ was declared here

Fixed in attached version.

Regards

David Rowley

Attachment: log_line_formatting_v0.6.patch (binary data)
Re: [HACKERS] Minmax indexes
Amit Kapila escribió:

On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

One thing still to tackle is when to mark ranges as unsummarized. Right now, any new tuple on a page range would cause a new index entry to be created and a new revmap update. This would cause huge index bloat if, say, a page is emptied and vacuumed and filled with new tuples with increasing values outside the original range; each new tuple would create a new index tuple. I have two ideas about this (1. mark range as unsummarized if it's the 3rd time we touch the same page range;

Why only at the 3rd time? Doesn't it need to be precise? Like, if someone inserts a row having a value greater than the max value of the corresponding index tuple, then that index tuple's max value needs to be updated, and I think it's updated with the help of the validity map.

Of course. Note I no longer have the concept of a validity map; I have switched things to use a reverse range map, or revmap for short. The revmap is responsible for mapping each page number to an individual index TID. If the TID stored in the revmap is InvalidTid, that means the range is not summarized. Unsummarized ranges are always considered to match query quals, and thus all tuples in them are returned in the bitmap for heap recheck.

The way it works currently is that any tuple insert (that's outside the bounds of the current index tuple) causes a new index tuple to be created, and the revmap is updated to point to the new index tuple. The old index tuple is orphaned and will be deleted at the next vacuum. This works fine. However, the problem is excess orphaned tuples; I don't want a long series of updates to create many orphaned dead tuples. Instead I would like the system to, at some point, stop creating new index tuples and instead set the revmap to InvalidTid. That would stop the index bloat.

For example, considering we need to store the below info for each index tuple:

In each index tuple (corresponding to one page range), we store:
- first block this tuple applies to
- last block this tuple applies to
- for each indexed column:
  * min() value across all tuples in the range
  * max() value across all tuples in the range

Assume the first and last block for an index tuple are the same (assume block no. 'x') and the min value is 5 and the max is 10. Now a user inserts/updates a value in block 'x' such that the max value of the indexed column is 11; if we don't update the corresponding index tuple or at least invalidate it, won't it lead to wrong results?

Sure, that would result in wrong results. Fortunately that's not how I am suggesting to do it. I note you're reading an old version of the design. I realize now that this is my mistake, because instead of posting the new design in the cover letter for the patch, I only put it in the minmax-proposal file. Please give that file a read to see how the design differs from the design I originally posted in the old thread.

The amcostestimate routine is completely bogus; right now it returns constant 0, meaning the index is always chosen if it exists.

I think for the first version you might want to keep things simple, but there should be some way for the optimizer to select this index. So rather than choosing it whenever it is present, we can make the optimizer choose it when someone sets enable_minmax index to true.

Well, enable_bitmapscan already disables minmax indexes, just like it disables other indexes.

How about keeping this up-to-date during foreground operations? Vacuum/maintainer tasks maintaining things usually have problems of bloat, and then we need to optimize or work around issues. Lots of people have raised this or a similar point previously, and from what I read you are of the opinion that it would be too slow.

Well, the current code does keep the index up to date -- I did choose to implement what people suggested :-)

Now it can so happen that the min and max values are sometimes not right because the operation is later rolled back, but I think such cases will be rare and we can find some way to handle them, maybe in a maintainer task only, but the handling will be quite simple.

Agreed.

On Windows, the patch gives the below compilation errors:

src\backend\access\minmax\mmtuple.c(96): error C2057: expected constant expression

I have fixed all these compile errors (fix attached). Thanks for reporting them. I'll post a new version shortly.

-- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/access/minmax/mmtuple.c
--- b/src/backend/access/minmax/mmtuple.c
***************
*** 93,117 ****
  MMTuple *
  minmax_form_tuple(TupleDesc idxDsc, TupleDesc diskDsc, DeformedMMTuple *tuple,
  				  Size *size)
  {
! 	Datum		values[diskDsc->natts];
! 	bool		nulls[diskDsc->natts];
  	bool		anynulls = false;
  	MMTuple	   *rettuple;
  	int			keyno;
  	uint16		phony_infomask;
! 	bits8
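To visualize the revmap Alvaro describes, here is a conceptual standalone-C model (illustrative only, not the patch's on-disk format, which lives in a separate relation fork):

#include <stdbool.h>
#include <stdint.h>

/* Toy TID: block number plus offset; offset 0 is never valid. */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} ToyTid;

/*
 * One index-tuple TID slot per heap page range.  An invalid TID
 * means the range is unsummarized, so the scan must return the
 * whole range for recheck instead of consulting min/max values.
 */
static bool
revmap_lookup(const ToyTid *revmap, uint32_t range_no, ToyTid *index_tid)
{
    *index_tid = revmap[range_no];
    return index_tid->offset != 0;      /* false => range not summarized */
}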
Re: [HACKERS] Minmax indexes
Erik Rijkers wrote:

After a --data-checksums initdb (successful), the following error came up after the statement:

create index t_minmax_idx on t using minmax (r);

WARNING:  page verification failed, calculated checksum 25951 but expected 0
ERROR:  invalid page in block 1 of relation base/21324/26267_vm

It happens reliably, every time I run the program.

Thanks for the report. That's fixed with the attached.

Below is the whole program that I used.

Hmm, this test program shows that you're trying to use the index to optimize min() and max() queries, but that's not what these indexes do. You will need to use operators such as <=, >= and = (or BETWEEN, which is the same thing) to see your index in action.

-- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services

commit 762ebb8f6ecfb36b9976bb9aa87d0a7f8fb601b4
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date:   Wed Sep 25 17:03:28 2013 -0300

    Set checksum to new revmap pages per Erik Rijkers

diff --git a/src/backend/access/minmax/mmrevmap.c b/src/backend/access/minmax/mmrevmap.c
index 43bff95..5ff5ca2 100644
--- a/src/backend/access/minmax/mmrevmap.c
+++ b/src/backend/access/minmax/mmrevmap.c
@@ -323,6 +323,7 @@ mmRevmapExtend(mmRevmapAccess *rmAccess, BlockNumber blkno)
 	while (blkno >= rmAccess->physPagesInRevmap)
 	{
 		extended = true;
+		PageSetChecksumInplace(page, blkno);
 		smgrextend(rmAccess->idxrel->rd_smgr, MM_REVMAP_FORKNUM,
 				   rmAccess->physPagesInRevmap, page, false);
 		rmAccess->physPagesInRevmap++;
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
From: Greg Stark st...@mit.edu

If it's not lossy then what's the point? From the client's point of view it'll be functionally equivalent to text then.

Sorry, what Tatsuo san suggested meant "same or compatible", not lossy. I quote the relevant part below. This is enough for the use case I mentioned in my previous mail several hours ago (actually, that is what the Oracle manual describes...).

http://www.postgresql.org/message-id/20130920.085853.1628917054830864151.t-is...@sraoss.co.jp

[Excerpt]

What about limiting the use of NCHAR to a database which has the same encoding or a compatible encoding (one for which an encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding on the server side, and thus we can treat NCHAR exactly the same as CHAR afterward. I suppose what encoding is used for NCHAR should be defined at initdb time or at creation of the database (if we allow this, we need to add a new column to record what encoding is used for NCHAR).

For example, CREATE TABLE t1(t NCHAR(10)) will succeed if NCHAR is UTF-8 and the database encoding is UTF-8. It will even succeed if NCHAR is SHIFT-JIS and the database encoding is UTF-8, because there is a conversion between UTF-8 and SHIFT-JIS. However, it will not succeed if NCHAR is SHIFT-JIS and the database encoding is ISO-8859-1, because there's no conversion between them.

Regards

MauMau
Re: [HACKERS] [PATCH] bitmap indexes
Hi,

Here are some quick items while skimming this patch. I am looking at commit 6448de29d from your github repo, branch bmi.

What's with the pg_bitmapindex stuff in pg_namespace.h? It doesn't seem to be used anywhere. This led me to research how these indexes are stored. I note that what we're doing here is to create another regular table and a btree index on top of it, and those are the ones that actually store the index data. This seems grotty and completely unlike the way we do things elsewhere (compare GIN indexes, which have rbtrees inside them). I think this should be stored in separate forks, or separate kinds of pages intermixed in the index's main fork.

I don't like that the files are named bitmap.c/.h. We already have bitmap scans, so having the generic concept (bitmap) show up as a file name is confusing. To cite an example, see the name of the bmgetbitmap function (which returns a TID bitmap from a bitmap index. How is that not confusing?). I think I would be more comfortable with the files being called bmi or maybe bitmapidx, or something like that. (For sure do not change the AM's name. I mean, CREATE INDEX .. USING bmi would suck.)

Not happy with (the contents of) src/include/access/bitmap.h. There's way too much stuff in a single file. I think a three-file split would work nicely: one for the AM routine declarations (bitmap.h), one for xlog stuff (bitmap_xlog.h), one for internal routines and struct declarations (bitmap_priv.h, bitmap_internals.h, or something like that). Also, I think macros and structs should be defined in as narrow a scope as possible; for example, macros such as HEADER_SET_FILL_BIT_ON should be defined in bitmaputil.c, not bitmap.h (that macro is missing parens BTW). For macros that are defined in headers, it would be better to have prefixes that scope them to bitmaps; for example, IS_FILL_WORD should maybe have a BM_ prefix or something similar.

I don't think it's necessary to renumber relopt_kind. Just stash the new one at the end of the enum.

bmoptions's DESCR entry in pg_proc.h says "btree".

contrib/bmfuncs.c defines CHECK_PAGE_OFFSET_RANGE but doesn't seem to use it anywhere. The same file defines CHECK_RELATION_BLOCK_RANGE using a bare { } block. Our style is to have these multi-statement macros use do {} while (false); see the sketch after this review.

#include lines in source files are normally alphabetically sorted. The new code fails to meet this expectation in many places.

The first four lines of _bitmap_init seem gratuitous ..

All those #ifdef DEBUG_BMI lines sprinkled all over the place look pretty bad; they interrupt the indentation flow. I would suggest defining a macro or some macros to emit debugging messages, which are enabled or disabled in a single place depending on DEBUG_BMI. Something simple such as DO_DB() in fd.c would suffice.

I don't like this comment one bit: /* misteriously, MemSet segfaults... :( */ I think that's probably a bug that should be investigated rather than papered over.

I don't understand the cur_bmbuild thingy .. I think that stuff should be passed down as arguments to whatever does the index build, instead of being a magical global var that also signals failure to find hash functions for some datatypes. Above the definition of BMBuildHashData there's a comment referring us to execGrouping.c, but I cannot understand what it refers to.

"block unfound"? In general I think it's poor style to spell out the function name in error messages. I mean, ereport() already reports the function name. Also, please don't split long error messages across multiple lines; better to let the line run off the screen.

I'm unsure about distinguishing errors in the recovery routines that raise ERROR from those that PANIC. I mean, they would both cause the server to PANIC.

The bitmap_xlog_cleanup routine talks about an uninitialized reln. That's a large point about xlog recovery -- you don't have relations. You only have relfilenodes. I think you need to shuffle some routines so that they can work with only a relfilenode.

Generally, the xlog stuff needs a lot of comments.

A pgindent run would be beneficial.

-- Álvaro Herrera  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training Services
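The do {} while (false) point above, illustrated with a made-up macro (a hypothetical example, not one from the patch): wrapping the body turns the macro into a single statement, so its trailing semicolon composes correctly with if/else, which a bare { } block does not.

#include <stdio.h>

/* Expands to one statement, so "if (x) LOG_TWICE(m); else ..." parses. */
#define LOG_TWICE(msg) \
    do { \
        fprintf(stderr, "%s\n", (msg)); \
        fprintf(stderr, "%s\n", (msg)); \
    } while (0)

int
main(void)
{
    int broken = 0;

    if (broken)
        LOG_TWICE("unexpected");    /* legal only because of do-while */
    else
        LOG_TWICE("as expected");
    return 0;
}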
Re: [HACKERS] Minmax indexes
On Wed, September 25, 2013 22:34, Alvaro Herrera wrote: [minmax-5.patch]

I have the impression it's not quite working correctly. The attached program returns different results for different values of enable_bitmapscan (consistently).

(Btw, I had to raise max_locks_per_transaction even for not-so-large tables -- is that expected? For a 100M row table, max_locks_per_transaction=1024 was not enough; I set it to 2048. Might be worth some documentation, eventually.)

From eyeballing the results it looks like the minmax result (i.e. the result set with enable_bitmapscan = 1) yields only the last part: only the 'last' rows seem to be present (see the values in column i in table tmm in the attached program).

Thanks, Erikjan Rijkers

[attachment: test.sh, application/shellscript]
Re: [HACKERS] pgbench progress report improvements - split 2
On Tue, Sep 24, 2013 at 08:42:15PM +0200, Fabien COELHO wrote:

meet all those goals simultaneously with simpler code, can we not?

    int64 wait = (int64) (throttle_delay *
        Min(7.0, -log(1 - pg_erand48(thread->random_state))));

If you roughly truncate the multiplier, as is done here with Min, you necessarily create a bias; my random guess would be a 0.5% underestimation, but maybe it is more... Thus you would have to compute and apply the correcting factor as well. Its computation is a little bit less easy than with the quantized formula, where I just used a simple SUM; you have to really do the maths here. So I would keep the simple solution, but it is fine with me if you do the maths!

Ah, true; I guess each approach has different advantages. I've committed your latest version.

-- Noah Misch EnterpriseDB http://www.enterprisedb.com
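As it happens, the bias from the Min(7.0, ...) truncation can be worked out exactly: for a unit-mean exponential variable truncated at c, the expected value is 1 - exp(-c), so c = 7 underestimates the mean by exp(-7), roughly 0.09% -- smaller than the 0.5% guessed above. A minimal standalone demonstration (my own scaffolding, not the pgbench source):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        const double target_mean = 1000.0;  /* throttle_delay stand-in (usec) */
        const double cap = 7.0;             /* truncation multiplier */
        double      sum = 0.0;
        long        n = 10000000;

        srand48(42);
        for (long i = 0; i < n; i++)
        {
            /* inverse transform sampling, truncated at cap * mean */
            double  u = drand48();
            double  wait = target_mean * fmin(cap, -log(1.0 - u));

            sum += wait;
        }
        printf("observed mean:     %.3f\n", sum / n);
        printf("exact mean:        %.3f\n", target_mean * (1.0 - exp(-cap)));
        printf("correcting factor: %.8f\n", 1.0 / (1.0 - exp(-cap)));
        return 0;
    }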
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Sat, Sep 21, 2013 at 05:07:11PM -0700, Peter Geoghegan wrote:

In the average/uncontended case, the subxact example bloats less than all alternatives to my design proposed to date (including the unborn heap tuple idea Robert mentioned in passing to me in person the other day, which I think is somewhat similar to a suggestion of Heikki's [1]). The average case is very important, because in general contention usually doesn't happen.

This thread has had a lot of discussion about bloating. I wonder, does the code check to see if there is a matching row _before_ adding any data? Our test-and-set code first checks to see if the lock is free; then, if it is, it locks the bus and does a test-and-set. Couldn't we easily check the indexes for matches before doing any locking? It seems that would avoid bloat in most cases, and allow for a simpler implementation.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
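The pattern Bruce describes is test-then-test-and-set. A minimal sketch using C11 atomics (a generic illustration, not PostgreSQL's actual s_lock code):

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct
    {
        atomic_bool locked;
    } spinlock;

    static spinlock lock;       /* zero-initialized: unlocked */

    static void
    spin_acquire(spinlock *l)
    {
        for (;;)
        {
            /* cheap "test": plain read, no bus locking while lock is held */
            while (atomic_load_explicit(&l->locked, memory_order_relaxed))
                ;               /* spin */

            /* looks free: now the expensive atomic "test-and-set" */
            if (!atomic_exchange_explicit(&l->locked, true,
                                          memory_order_acquire))
                return;         /* got it */
        }
    }

    static void
    spin_release(spinlock *l)
    {
        atomic_store_explicit(&l->locked, false, memory_order_release);
    }

    int
    main(void)
    {
        spin_acquire(&lock);
        /* ... critical section ... */
        spin_release(&lock);
        return 0;
    }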
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Wed, Sep 25, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:

This thread has had a lot of discussion about bloating. I wonder, does the code check to see if there is a matching row _before_ adding any data?

That's pretty much what the patch does.

Our test-and-set code first checks to see if the lock is free; then, if it is, it locks the bus and does a test-and-set. Couldn't we easily check the indexes for matches before doing any locking? It seems that would avoid bloat in most cases, and allow for a simpler implementation.

The value locks are only really necessary for getting consensus across unique indexes on whether or not to go forward, and to ensure that insertion can *finish* unhindered once we're sure that's appropriate. Once we've committed to insertion, we hold them across heap tuple insertion and release each value lock as part of something close to conventional btree index tuple insertion (with an index tuple with an ordinary heap pointer inserted). I believe that all schemes proposed to date have some variant of what could be described as value locking, such as ordinary index tuples inserted speculatively.

Value locks are *not* held during row locking, and an attempt at row locking is essentially opportunistic for various reasons (it boils down to the fact that re-verifying uniqueness outside of the btree code is very unappealing, and in any case would naturally sometimes be insufficient - what happens if key values change across row versions?). This might sound a bit odd, but is in a sense no different to the current state of affairs, where the first waiter on a blocking xact that inserted a would-be duplicate is not guaranteed to be the first to get a second chance at inserting. I don't believe that there are any significant additional lock starvation hazards.

In the simple case where there is a conflicting tuple that's already committed, value locks above and beyond what the btree code does today are unnecessary (provided the attempt to acquire a row lock is eventually successful, which mostly means that no one else has updated/deleted - otherwise we try again).

-- Peter Geoghegan
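Read as pseudocode, the control flow described here might look roughly like the sketch below. Every helper name is hypothetical (stubbed so it compiles); this is one reading of the scheme, not the patch's actual functions or any real PostgreSQL API:

    #include <stdbool.h>
    #include <stdio.h>

    /* Stubs so the sketch compiles; the real operations live deep in the
     * btree and heap code. */
    static void acquire_value_locks(void) {}
    static void release_value_locks(void) {}
    static bool duplicate_exists(void) { return false; }
    static void insert_heap_tuple(void) {}
    static void insert_index_tuples_releasing_value_locks(void) {}
    static bool try_lock_conflicting_row(void) { return true; }

    typedef enum { INSERTED, FOUND_DUPLICATE } upsert_outcome;

    static upsert_outcome
    speculative_upsert(void)
    {
        for (;;)
        {
            /* consensus across unique indexes on whether to go forward */
            acquire_value_locks();
            if (!duplicate_exists())
            {
                /* committed to insertion: locks held across heap insert,
                 * then released during near-conventional index insertion */
                insert_heap_tuple();
                insert_index_tuples_releasing_value_locks();
                return INSERTED;
            }

            /* value locks are NOT held while row locking */
            release_value_locks();
            if (try_lock_conflicting_row())     /* opportunistic */
                return FOUND_DUPLICATE;
            /* conflicting row was updated/deleted meanwhile; retry */
        }
    }

    int
    main(void)
    {
        printf("%s\n", speculative_upsert() == INSERTED ?
               "inserted" : "found duplicate");
        return 0;
    }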
Re: [HACKERS] pgbench progress report improvements - split 3 v2
On Tue, Sep 24, 2013 at 10:41:17PM +0200, Fabien COELHO wrote:

These changes are coupled because measures are changed, and their reporting as well. Submitting separate patches for these different features would result in conflicting or dependent patches, so I wish to avoid that if possible.

My feelings on the patch split haven't changed; your three bullet points call for four separate patches. Conflicting patches are bad, but dependent patches are okay; just disclose the dependency order.

How about this: as a next step, please extract just this feature that I listed last Saturday:

Patch (4): Redefine latency as reported by pgbench and report lag more.

Once that's committed, we can move on to others.

nm

-- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] pgbench - exclude pthread_create() from connection start timing
Concerning one of the eventual would-be split patches...

On Tue, Sep 24, 2013 at 10:41:17PM +0200, Fabien COELHO wrote:

- Take thread start time at the beginning of the thread (!) Otherwise it includes pretty slow thread/fork system start times in the measurements. May help with bug #8326. This also helps with throttling, as the start time is used to adjust the rate: if it is not the actual start time, there is a bit of a rush at the beginning in order to catch up. Taking the start time when the thread actually starts is the sane thing to do to avoid surprises at possibly strange measures on short runs.

On Sun, Sep 22, 2013 at 10:08:54AM +0200, Fabien COELHO wrote:

I really spent *hours* debugging stupid measures on the previous round of pgbench changes, when adding the throttling stuff. Having the start time taken when the thread really starts is just sanity, and I needed that just to rule out that it was the source of the strange measures.

I don't get it; why is taking the time just after pthread_create() more sane than taking it just before pthread_create()? We report separate TPS values including and excluding connection establishment because establishing the database connection is expensive, typically far more expensive than pthread_create(). The patch for threaded pgbench made the decision to account for pthread_create() as though it were part of establishing the connection. You're proposing to not account for it at all. Both of those designs are reasonable to me, but I do not comprehend the benefit you anticipate from switching from one to the other.

-j 800 vs -j 100: ITM that if you create more threads, the time delay incurred is cumulative, so the strangeness of the result should worsen.

Not in general; we do one INSTR_TIME_SET_CURRENT() per thread, just before calling pthread_create(). However, thread 0 is a special case; we set its start time first and actually start it last. Your observation of cumulative delay fits those facts. Initializing the thread-0 start time later, just before calling its threadRun(), should clear this anomaly without changing other aspects of the measurement.

While pondering this area of the code, it occurs to me -- shouldn't we initialize the throttle rate trigger later, after establishing connections and sending startup queries? As it stands, we build up a schedule deficit during those tasks. Was that deliberate?

Thanks, nm

-- Noah Misch EnterpriseDB http://www.enterprisedb.com
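To make the thread-0 anomaly concrete, here is a minimal standalone sketch using plain pthreads and clock_gettime (my own scaffolding, not the pgbench source): each worker's start time is stamped immediately before it actually starts, including the main-process worker 0, which runs last.

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define NTHREADS 4

    typedef struct
    {
        struct timespec start_time;
        pthread_t   thread;
    } WorkerState;

    static void *
    worker_run(void *arg)
    {
        (void) arg;             /* per-thread benchmark loop would go here */
        return NULL;
    }

    int
    main(void)
    {
        WorkerState workers[NTHREADS];

        /* workers 1..n: stamp the start time just before pthread_create() */
        for (int i = 1; i < NTHREADS; i++)
        {
            clock_gettime(CLOCK_MONOTONIC, &workers[i].start_time);
            pthread_create(&workers[i].thread, NULL, worker_run, &workers[i]);
        }

        /* worker 0 runs in the main process and starts last, so stamp its
         * start time here -- not before the loop above, which would bake
         * all the pthread_create() overhead into its measurement */
        clock_gettime(CLOCK_MONOTONIC, &workers[0].start_time);
        worker_run(&workers[0]);

        for (int i = 1; i < NTHREADS; i++)
            pthread_join(workers[i].thread, NULL);
        return 0;
    }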