Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Andres Freund
On 2014-10-14 17:53:10 -0400, Robert Haas wrote:
 On Tue, Oct 14, 2014 at 4:42 PM, Andres Freund and...@2ndquadrant.com wrote:
  The code in CHashSearch shows the problem there: you need to STORE the
  hazard pointer before you begin to do the LOAD operations to scan the
  bucket, and you must finish all of those LOADs before you STORE the
  NULL hazard pointer.  A read or write barrier won't provide store/load
  or load/store ordering.
 
  I'm not sure I understand the problem here - but a read + write barrier
  should suffice? To avoid falling back to two full barriers, we'd need a
  separate define pg_read_write_barrier(), but that's not a problem. IIUC
  that should allow us to avoid emitting any actual code on x86.
 
 Well, thinking about x86 specifically, it definitely needs at least
 one mfence, after setting the hazard pointer and before beginning to
 read the bucket chain.

Yes, I can see that now. I do wonder if that's not going to prove quite
expensive... I think I can see some ways around needing that. But I
think that needs some benchmarking first - no need to build an even more
complex solution if not needed.
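
To make the ordering requirement concrete, here is a minimal, self-contained
C11 sketch of the store/fence/load sequence being discussed. All names are
invented (this is not the actual CHash code), and a real hazard-pointer
implementation would also have to re-validate the bucket head after
publishing it:

#include <stdatomic.h>
#include <stddef.h>

typedef struct HashNode
{
    _Atomic(struct HashNode *) next;
    unsigned    key;
} HashNode;

/* One hazard-pointer slot per backend (hypothetical layout). */
static _Atomic(HashNode *) my_hazard_slot;

static HashNode *
hazard_protected_lookup(_Atomic(HashNode *) *bucket, unsigned key)
{
    HashNode   *node = atomic_load(bucket);

    /* STORE: publish the node we are about to dereference. */
    atomic_store(&my_hazard_slot, node);

    /*
     * Full fence: a read or write barrier is not enough, because we need
     * store/load ordering -- the hazard-pointer store must be visible to
     * other backends before any of the LOADs that walk the chain.  On x86
     * this is the one place that costs an mfence (or a locked instruction).
     */
    atomic_thread_fence(memory_order_seq_cst);

    /* LOADs: walk the bucket chain. */
    while (node != NULL && node->key != key)
        node = atomic_load(&node->next);

    /* Clear the hazard pointer only after all LOADs are done. */
    atomic_store(&my_hazard_slot, NULL);
    return node;
}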

The solution I'm thinking of is to essentially move away from hazard
pointers and store something like a generation counter per
backend, which is updated less often, and in combination with other
operations. When a backend needs to clean up and sees that there's a
straggler with an old generation, it sends that backend a signal to ensure
it sets the latest generation.
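
A very rough sketch of that direction, with every name invented here (this
is not from any patch): readers publish a generation as a side effect of
work they already do, and a backend that wants to reclaim something only
waits for -- or signals -- backends still advertising an older generation:

#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKENDS 128

typedef struct
{
    atomic_ulong generation;    /* last global generation this backend saw */
} BackendGenSlot;

static BackendGenSlot backend_gens[MAX_BACKENDS];
static atomic_ulong global_generation;

/* Called from operations a backend performs anyway, not per lookup. */
static inline void
note_generation(int my_slot)
{
    atomic_store(&backend_gens[my_slot].generation,
                 atomic_load(&global_generation));
}

/* A reclaimer bumps global_generation, then checks for stragglers. */
static bool
can_reclaim(unsigned long reclaim_gen)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
    {
        if (atomic_load(&backend_gens[i].generation) < reclaim_gen)
            return false;       /* straggler: this is where the signal goes */
    }
    return true;
}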

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Better support of exported snapshots with pg_dump

2014-10-15 Thread Michael Paquier
On Wed, Oct 15, 2014 at 2:46 PM, Andres Freund and...@2ndquadrant.com
wrote:

  This seems more user-friendly. But well, I agree that we could do
  a larger set of things that could be used even for other purposes:
  - Ability to define the snapshot name with pg_dump
  - Take a system or database-wide lock
  - Extra client application running the whole
  Now, is this set of features worth doing, knowing that export snapshot has
  been designed for multi-threaded closed applications? I am not so sure.
  Regards,

 What do you mean with "designed for multi-threaded closed applications"?

External snapshot creation and control should be localized within
dedicated client applications only. At least that's what I understand from
it, as that's how it is used now.
-- 
Michael


Re: [HACKERS] Scaling shared buffer eviction

2014-10-15 Thread Amit Kapila
On Tue, Oct 14, 2014 at 3:32 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-10-14 15:24:57 +0530, Amit Kapila wrote:
  After that I observed that contention for LW_SHARED has reduced
  for this load, but it didn't help much in terms of performance, so I again
  rechecked the profile and this time most of the contention has moved
  to the spinlock used in dynahash for the buf mapping tables; please refer to
  the profile (for 128 client count; read-only load) below:
 
  bgreclaimer patch + wait free lw_shared acquisition patches
  ------------------------------------------------------------

 This profile is without -O2 again. I really don't think it makes much
 sense to draw much inference from an unoptimized build.

Profile data with -O2 is below.  This shows that the top
contributors are calls to BufTableLookup and the spin lock taken
by BufTableInsert and BufTableDelete.  To resolve the spin lock
contention, a patch like the one above might prove to be useful (although
I still have to evaluate that).  I would like to collect
LWLOCK_STATS data as well before proceeding further.
Do you have any other ideas?

    11.17%  swapper   [unknown]           [H] 0x011e0328
+    4.62%  postgres  postgres            [.] hash_search_with_hash_value
+    4.35%  pgbench   [kernel.kallsyms]   [k] .find_busiest_group
+    3.71%  postgres  postgres            [.] s_lock
     2.56%  postgres  [unknown]           [H] 0x01500120
+    2.23%  pgbench   [kernel.kallsyms]   [k] .idle_cpu
+    1.97%  postgres  postgres            [.] LWLockAttemptLock
+    1.73%  postgres  postgres            [.] LWLockRelease
+    1.47%  postgres  [kernel.kallsyms]   [k] .__copy_tofrom_user_power7
+    1.44%  postgres  postgres            [.] GetSnapshotData
+    1.28%  postgres  postgres            [.] _bt_compare
+    1.04%  swapper   [kernel.kallsyms]   [k] .int_sqrt
+    1.04%  postgres  postgres            [.] AllocSetAlloc
+    0.97%  pgbench   [kernel.kallsyms]   [k] .default_wake_function

Detailed Data

-   4.62%  postgres  postgres            [.] hash_search_with_hash_value
   - hash_search_with_hash_value
      - 2.19% BufTableLookup
         - 2.15% BufTableLookup
              ReadBuffer_common
            - ReadBufferExtended
               - 1.32% _bt_relandgetbuf
      - 0.73% BufTableDelete
         - 0.71% BufTableDelete
              ReadBuffer_common
              ReadBufferExtended
      - 0.69% BufTableInsert
         - 0.68% BufTableInsert
              ReadBuffer_common
              ReadBufferExtended
        0.66% hash_search_with_hash_value
-   4.35%  pgbench   [kernel.kallsyms]   [k] .find_busiest_group
   - .find_busiest_group
      - 4.28% .find_busiest_group
         - 4.26% .load_balance
            - 4.26% .idle_balance
               - .__schedule
                  - 4.26% .schedule_hrtimeout_range_clock
                       .do_select
                       .core_sys_select
-   3.71%  postgres  postgres            [.] s_lock
   - s_lock
      - 3.19% hash_search_with_hash_value
         - 3.18% hash_search_with_hash_value
            - 1.60% BufTableInsert
                 ReadBuffer_common
               - ReadBufferExtended
            - 1.57% BufTableDelete
                 ReadBuffer_common
               - ReadBufferExtended
                  - 0.93% index_fetch_heap



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Simon Riggs
On 14 October 2014 17:08, Lucas Lersch lucasler...@gmail.com wrote:

 Unfortunately, in the generated trace with over 2 million buffer requests,
 only ~14k different pages are being accessed, out of the 800k of the whole
 database. Am I missing something here?

We can't tell what you're doing just by knowing the number of unique
items in your list.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
Hi All,

Please find attached a patch which implements support for UPDATE table1
SET(*)=...
The patch supports both UPDATE table SET(*)=(a,b,c) and UPDATE table1
SET(*)=(SELECT a,b,c FROM...).
It solves the problem of doing an UPDATE from a record variable of the same
type as the table, e.g. update foo set (*) = (select foorec.*) where ...;

The design is simple. It basically expands the * at the transformation stage,
does the necessary type checking and adds the expanded columns to the parse
tree. This allows normal execution for the rest of the stages.


Thoughts/Comments?

Regards,

Atri


updatestar_ver1.patch
Description: application/download



Re: [HACKERS] Column Redaction

2014-10-15 Thread Simon Riggs
On 14 October 2014 17:43, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Oct 11, 2014 at 3:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
 As soon as you issue the above query, you have clearly indicated your
 intention to steal. Receiving information is no longer accidental, it
 is an explicit act that is logged in the auditing system against your
 name. This is sufficient to bury you in court and it is now a real
 deterrent. Redaction has worked.

 To me, this feels thin.  It's true that this might be good enough for
 some users, but I wouldn't bet on it being good enough for very many
 users, and I really hope there's a better option.  We have an existing
 method of doing data redaction via security barrier views.

I agree with "thin". There is a leak in the design, so let me coin the
phrase "imprecise security". Of course, the leaks reduce the value of
such a feature; they just don't reduce it all the way to zero.

Security barrier views or views of any kind don't do the required job.

We are not able to easily classify people as Trusted or Untrusted.

We're seeking to differentiate between the right to use a column for
queries and the right to see the value itself. Or put another way, you
can read the book, you just can't photocopy it and take the copy home.
Or, you can try on the new clothes to see if they fit, but you can't
take them home for free. Both of those examples have imprecise
security measures in place to control and reduce negative behaviours
and in every other industry this is known as security.

In IT terms, we're looking at controlling and reducing improper access
to data by an otherwise Trusted person. The only problem is that some
actions on data items are allowed, others are not.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 05:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 For parallelism, I think we need a concept of group locking.  That is,
 suppose we have a user backend and N worker backends collaborating to
 execute some query.  For the sake of argument, let's say it's a
 parallel CLUSTER, requiring a full table lock.  We need all of the
 backends to be able to lock the table even though at least one of them
 holds AccessExclusiveLock.  This suggests that the backends should all
 be members of a locking group, and that locks within the same locking
 group should be regarded as mutually non-conflicting.

 In the background worker case, I imagined that the foreground process
 would hold a lock and the background processes would just assume they
 could access the table without holding locks of their own.  Aren't
 you building a mountain where a molehill would do?

Yeh. Locks should be made in the name of the main transaction and
released by it.

When my family goes to a restaurant, any member of the party may ask
for a table and the request is granted for the whole family. But the
lock is released only when I pay the bill. Once we have the table, any
stragglers know we have locked the table and they just come sit at the
table without needing to make their own lock request to the Maitre D',
though they clearly cache the knowledge that we have the table locked.

So all lock requests held until EOX should be made in the name of the
top level process. Any child process wanting a lock should request it,
but on discovering it is already held at parent level should just
update the local lock table. Transient locks, like catalog locks can
be made and released locally; I think there is more detail there but
it shouldn't affect the generalisation.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Marti Raudsepp
Hi

On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com wrote:
 Please find attached a patch which implements support for UPDATE table1
 SET(*)=...

I presume you haven't read Tom Lane's proposal and discussion about
multiple column assignment in UPDATE:
http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
(Assigning all columns was also discussed there)

And there's a WIP patch:
http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us

Regards,
Marti




Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
On Wednesday, October 15, 2014, Marti Raudsepp ma...@juffo.org wrote:

 Hi

 On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com wrote:
  Please find attached a patch which implements support for UPDATE table1
  SET(*)=...

 I presume you haven't read Tom Lane's proposal and discussion about
 multiple column assignment in UPDATE:
 http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
 (Assigning all columns was also discussed there)

 And there's a WIP patch:
 http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us



Thanks for the links, but this patch only targets the SET(*) case, which, if I
understand correctly, the patch you mentioned doesn't directly handle (the
targets of the two patches are different).

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
On Wed, Oct 15, 2014 at 2:18 PM, Atri Sharma atri.j...@gmail.com wrote:





 On Wednesday, October 15, 2014, Marti Raudsepp ma...@juffo.org wrote:

 Hi

 On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com
 wrote:
  Please find attached a patch which implements support for UPDATE table1
  SET(*)=...

 I presume you haven't read Tom Lane's proposal and discussion about
 multiple column assignment in UPDATE:
 http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
 (Assigning all columns was also discussed there)

 And there's a WIP patch:
 http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us



 Thanks for the links, but this patch only targets the SET(*) case, which, if I
 understand correctly, the patch you mentioned doesn't directly handle (the
 targets of the two patches are different).


Digging more, I figured out that the patch I posted builds on top of Tom's
patch, since Tom's patch did not add the whole-row case.

Regards,

Atri


Re: [HACKERS] proposal: plpgsql - Assert statement

2014-10-15 Thread Ali Akbar
2014-09-30 10:04 GMT+07:00 Jim Nasby j...@nasby.net:

 On 9/17/14, 7:40 PM, Jan Wieck wrote:

 Exactly. Doing something like

  ASSERT (select count(*) from foo
  where fk not in (select pk from bar)) = 0;

 is a perfectly fine, arbitrary boolean expression. It will probably work
 well in a development environment too. And I am very sure that it will not
 scale well once that code gets deployed. And I know how DBAs react to the
 guaranteed following performance problem. They will disable ALL assert ...
 or was there some sort of assert class system proposed that I missed?


Actually, as in for example Java or C, in production systems
usually all asserts are disabled for performance (in Java they are removed by
the JIT, in C we define NDEBUG).
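
For reference, this is the standard C behaviour being referred to: the same
assert() evaluates its argument in a debug build and disappears entirely,
argument included, once NDEBUG is defined (e.g. cc -DNDEBUG):

#include <assert.h>     /* with -DNDEBUG, assert() expands to ((void) 0) */
#include <stdio.h>

static int
orphan_count(void)
{
    return 0;           /* stand-in for an expensive consistency query */
}

int
main(void)
{
    /*
     * Debug build: orphan_count() runs and a failure aborts the program.
     * Built with -DNDEBUG: the whole expression, including the expensive
     * call, is compiled out.
     */
    assert(orphan_count() == 0);

    printf("ok\n");
    return 0;
}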


  We're also putting too much weight on the term assert here. C-style
 asserts are generally not nearly as useful in a database as general
 sanity-checking or error handling, especially if you're trying to use the
 database to enforce data sanity.


+1.
Without any query capability, assert will become much less useful. If we
cannot query in an assert, we will code:

-- perform some query
ASSERT WHEN some_check_on_query_result;

.. and disabling the query in a production system will become another problem.

My wish-list for asserts is:

 - Works at a SQL level
 - Unique/clear way to identify asserts (so you're not guessing where the
 assert came from)

+1


 - Allows for over-riding individual asserts (so if you need to do
 something you're not supposed to do you still have the protection of all
 other asserts)
 - Less verbose than IF THEN RAISE END IF

+1

-- 
Ali Akbar


Re: [HACKERS] Patch to support SEMI and ANTI join removal

2014-10-15 Thread David Rowley
On Thu, Oct 9, 2014 at 12:40 AM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-10-09 00:21:44 +1300, David Rowley wrote:
  Ok, so I've been hacking away at this for a couple of evenings and I
 think
  I have a working prototype finally!

 Cool!


Patch attached.


  So it seems it's not quite as efficient as join removal at planning time,
  but still a big win when it's possible to perform the join skipping.

 Have you checked where the overhead is? Is it really just the additional
 node that the tuples are passed through?


I've not checked this yet, but I'd assume that it has to be from the extra
node. I'll run some profiles soon.


 Have you measured the overhead of the plan/execution time checks over
 master?


I did a bit of benchmarking last night, but this was mostly for testing
that I've not added too much overhead to the nested loop code.
For the merge and hash join code I've managed to keep the special skipping
code in the EXEC_MJ_INITIALIZE_OUTER and HJ_BUILD_HASHTABLE parts of the
main switch statement, so when skips are not possible the extra checks should
only be performed on the first call of the node. For nested loop I can't
see any other way but to pay the small price of setting the skipflags and
checking whether any skip flags are set on every call to the node.
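
To illustrate the difference (invented names and states, not the patch's
actual executor code): for hash/merge join the test can hide in a state that
only runs on the first call, while the nested loop has to evaluate it on
every call:

#include <stdbool.h>

typedef enum { SK_FIRST_CALL, SK_STEADY_STATE } SkipSketchState;

typedef struct
{
    SkipSketchState state;
    unsigned    skipflags;      /* non-zero when the inner side can be skipped */
} JoinSkipSketch;

/* Hash/merge join style: skip test sits in the first-call-only state. */
static bool
exec_hashjoin_sketch(JoinSkipSketch *node)
{
    if (node->state == SK_FIRST_CALL)
    {
        if (node->skipflags != 0)
            return false;       /* skip building the hash table / sorting */
        node->state = SK_STEADY_STATE;
    }
    return true;                /* normal per-call work */
}

/* Nested loop style: no such state, so the flags are tested on every call. */
static bool
exec_nestloop_sketch(const JoinSkipSketch *node)
{
    if (node->skipflags != 0)
        return false;           /* the per-call check discussed above */
    return true;
}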

I tested the overhead of this on my laptop by creating 2 tables with 1
million rows each, joining them on an INT column, where each value of the
int column was unique. I seem to have added about a 2% overhead to this. :(
Which I was quite surprised at, given it's just 101 million extra
settings of the skipflags and checks that the skip flags are not empty,
but perhaps the extra local variable is causing something else to not make
it into a register.  At the moment I can't quite see another way to do it,
but I guess it may not be the end of the world, as the chance of having to
perform a nested loop join on 2 tables of that size is probably not that
high.

Test case:
create table t3 (id int primary key);
create table t2 (id int primary key, t3_id int not null references t3);
create table t1 (id int primary key, t2_id int not null references t2);
insert into t3 select x.x from generate_series(1,100) x(x);
insert into t2 select x.x,x.x from generate_series(1,100) x(x);
insert into t1 select x.x,x.x from generate_series(1,100) x(x);
vacuum;
set enable_hashjoin = off;
set enable_mergejoin = off;

select count(*) from t1 inner join t2 on t1.id=t2.id;

Unpatched:
duration: 120 s
number of transactions actually processed: 45
latency average: 2666.667 ms
tps = 0.371901 (including connections establishing)
tps = 0.371965 (excluding connections establishing)

Patched:
Master
duration: 120 s
number of transactions actually processed: 44
latency average: 2727.273 ms
tps = 0.363933 (including connections establishing)
tps = 0.363987 (excluding connections establishing)

102.19%

Of course if we do the join on the column that has the foreign key, then
this is much faster.

test=# select count(*) from t1 inner join t2 on t1.t2_id=t2.id;
  count
-
 100
(1 row)


Time: 105.206 ms

The explain analyze from the above query looks like:
test=# explain (analyze, costs off, timing off) select count(*) from t1
inner join t2 on t1.t2_id=t2.id;
QUERY PLAN
--
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=100 loops=1)
         ->  Seq Scan on t1 (actual rows=100 loops=1)
         ->  Index Only Scan using t2_pkey on t2 (never executed)
               Index Cond: (id = t1.t2_id)
               Heap Fetches: 0
 Execution time: 124.990 ms
(7 rows)

As you can see the scan on t2 never occurred.

I've so far only managed to come up with 1 useful regression test for this
new code. It's not possible to tell if the removal has taken place at plan
time, as the plan looks the same as if it didn't get removed. The only way
to tell is from the explain analyze, but the problem is that the execution
time is shown and can't be removed. Perhaps I should modify the explain to
tag join nodes that the planner has marked to say skipping may be possible?
But this is not really ideal as it's only the join nodes that know about
skipping and they just don't bother executing the child nodes, so it's
really up to the executor to decide which child nodes don't get called, so
adding something to explain might require making it smarter than it
needs to be.

Right now I'm not quite sure if I should modify any costs. Quite possibly
hash joins could have the costing reduced a little to try to encourage
hashjoins over merge joins, as with merge joins we can't skip sort
operations, but with hash joins we can skip the hash table build.

Regards

David Rowley


inner_join_removals_2014-10-15_0f3f1ea.patch
Description: Binary data


[HACKERS] Locking for Rename To new_name works differently for different objects

2014-10-15 Thread Amit Kapila
I have observed that for renaming some of the objects an
AccessExclusiveLock is taken on the object, whereas for
other kinds of objects no lock is taken on the object before
renaming it.

The objects that are renamed via AlterObjectRename_internal()
take the lock (during the get_object_address() call), whereas for
other objects there is no lock.  I think there is an exception to this,
i.e. Rename Table, which also takes a lock.  Refer to the function below:


ExecRenameStmt()
{
    ..
    ..
    case OBJECT_AGGREGATE:
    case OBJECT_COLLATION:
    case OBJECT_CONVERSION:
    case OBJECT_EVENT_TRIGGER:
    case OBJECT_FDW:
    ...

        address = get_object_address(stmt->renameType,
                                     stmt->object, stmt->objarg,
                                     &relation,
                                     AccessExclusiveLock, false);

        Assert(relation == NULL);

        catalog = heap_open(address.classId, RowExclusiveLock);

        AlterObjectRename_internal(catalog,
                                   address.objectId,
                                   stmt->newname);
    ..

}

Is there a reason for the different locking strategies?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Pavel Stehule
2014-10-13 17:22 GMT+02:00 Andrew Dunstan and...@dunslane.net:


 On 10/13/2014 09:37 AM, Andrew Dunstan wrote:


 On 09/26/2014 04:54 PM, Andrew Dunstan wrote:


  Here is a patch for the generator and aggregate functions for jsonb that
  we didn't manage to get done in time for 9.4. They are all equivalents of
  the similarly named json functions. Included are

 to_jsonb
 jsonb_build_object
 jsonb_build_array
 jsonb_object
 jsonb_agg
 jsonb_object_agg


 Still to come: documentation.

 Adding to the next commitfest.



 Revised patch to fix compiler warnings.


 And again, initializing an incompletely initialized variable, as found by
 Pavel Stehule.


I checked the code, and I have only two small objections: the name
jsonb_object_two_arg is not good - maybe json_object_keys_values?

Next: there are no tests for the to_jsonb function.

Regards

Pavel



 cheers

 andrew






Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Lucas Lersch
Sorry for taking so long to answer. I am sending attached the patch with
the changes I made to the pgsql code. I followed the steps for compiling and
installing pgsql from:
http://www.postgresql.org/docs/current/static/install-short.html

In summary, the page_id of the page being released in ReleaseBuffer() and
ReleaseAndReadBuffer() is written to the file /usr/loca/pgsql/data/trace.
This file is created manually.

I have also created a PrivateDirtyFlag for each backend, in analogy to the
PrivateRefCount. I use this to keep track of whether the current backend
performed an update operation on a page in the buffer pool or simply a read
operation (it is not relevant now). The trace file consists of one line for
each ReleaseBuffer() or ReleaseAndReadBuffer() call. The line has the format:

operation,tblSpace,dbNode,relNode,blockNumber
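
For what it's worth, the trace line itself is produced by something of this
shape (a standalone sketch, not the attached patch; the struct, file path and
operation flag here are only stand-ins for the BufferTag contents described
above):

#include <stdio.h>

typedef struct
{
    unsigned    spcNode;        /* tablespace */
    unsigned    dbNode;         /* database */
    unsigned    relNode;        /* relation */
    unsigned    blockNum;       /* block within the relation */
} TraceTag;

/* Append one "operation,tblSpace,dbNode,relNode,blockNumber" line. */
static void
trace_release(FILE *trace, char operation, const TraceTag *tag)
{
    fprintf(trace, "%c,%u,%u,%u,%u\n",
            operation, tag->spcNode, tag->dbNode, tag->relNode, tag->blockNum);
}

int
main(void)
{
    /* The real trace file lives under the data directory and is created
     * manually; "trace" here is just a placeholder path. */
    FILE       *trace = fopen("trace", "a");

    if (trace != NULL)
    {
        TraceTag    tag = {1663, 16384, 24576, 42};     /* made-up values */

        trace_release(trace, 'r', &tag);    /* 'r' = read-only release */
        fclose(trace);
    }
    return 0;
}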

Once the trace file is complete after the execution of the tpcc benchmark,
I use the following bash script to get only unique pages:

cut -d',' -f2-5 trace | sort -n -t',' -k1 -k2 -k3 -k4 | uniq

Today I realized that I was making a mistake in executing the
oltpbenchmark application. Of the 64 warehouses created for tpcc, only 1
was being accessed (the 14k distinct pages that I mentioned). I increased
the terminal option of the tpcc benchmark from 1 to 64, resulting in one
terminal for each warehouse.

This gave me a higher number of distinct pages being
accessed. Unfortunately, of the 800k pages in the database (64
warehouses), executing tpcc for 10min resulted in 400k distinct pages being
accessed. This number is much better than the previous results, but I think
it is still not realistic.

I would like to thank you guys for all the attention given to my problem :)


On Wed, Oct 15, 2014 at 9:49 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On 14 October 2014 17:08, Lucas Lersch lucasler...@gmail.com wrote:

  Unfortunately, in the generated trace with over 2 million buffer
 requests,
  only ~14k different pages are being accessed, out of the 800k of the
 whole
  database. Am I missing something here?

 We can't tell what you're doing just by knowing the number of unique
 items in your list.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




-- 
Lucas Lersch
diff -rupN ./postgresql-9.3.5_original/config.log ./postgresql-9.3.5_trace/config.log
--- ./postgresql-9.3.5_original/config.log	2014-10-15 10:30:11.552923099 +0200
+++ ./postgresql-9.3.5_trace/config.log	2014-10-15 10:35:45.786580479 +0200
@@ -349,7 +349,7 @@ configure:7734: $? = 0
 configure:7755: result: yes
 configure:7766: checking for library containing setproctitle
 configure:7807: gcc -o conftest -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard  -D_GNU_SOURCEconftest.c -lm  5
-/tmp/ccf4os1l.o: In function `main':
+/tmp/ccMWb4Lv.o: In function `main':
 conftest.c:(.text.startup+0x7): undefined reference to `setproctitle'
 collect2: ld returned 1 exit status
 configure:7814: $? = 1
@@ -389,7 +389,7 @@ configure: failed program was:
 |   return 0;
 | }
 configure:7807: gcc -o conftest -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard  -D_GNU_SOURCEconftest.c -lutil  -lm  5
-/tmp/ccSriijn.o: In function `main':
+/tmp/ccxsb80w.o: In function `main':
 conftest.c:(.text.startup+0x7): undefined reference to `setproctitle'
 collect2: ld returned 1 exit status
 configure:7814: $? = 1
@@ -431,7 +431,7 @@ configure: failed program was:
 configure:7845: result: no
 configure:7853: checking for library containing dlopen
 configure:7894: gcc -o conftest -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard  -D_GNU_SOURCEconftest.c -lm  5
-/tmp/ccZgGSzt.o: In function `main':
+/tmp/ccBvyipD.o: In function `main':
 conftest.c:(.text.startup+0x7): undefined reference to `dlopen'
 collect2: ld returned 1 exit status
 configure:7901: $? = 1
@@ -479,7 +479,7 @@ configure:7988: $? = 0
 configure:8019: result: none required
 configure:8027: checking for library containing shl_load
 configure:8068: gcc -o conftest -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard  -D_GNU_SOURCEconftest.c -ldl -lm  5
-/tmp/cc2JE0qC.o: In function `main':
+/tmp/ccSVMb9F.o: In function `main':
 conftest.c:(.text.startup+0x7): undefined reference to `shl_load'
 collect2: ld returned 1 exit status
 configure:8075: $? = 1
@@ -564,7 +564,7 @@ configure:8254: $? = 0
 configure:8285: result: none required
 

Re: [HACKERS] [9.4 bug] The database server hangs with write-heavy workload on Windows

2014-10-15 Thread MauMau

From: MauMau maumau...@gmail.com
Thank you very much.  I didn't anticipate such a difficult, complicated
cause.  The user agreed to try the patch tonight.  I'll report back the
result as soon as I get it from him.


The test ran successfully without hanging for 24 hours.  It was run with your
patch + the following:


BTW, in LWLockWaitForVar(), the first line of the following code fragment
is not necessary, because lwWaitLink is set to lock->head immediately
afterwards.  I think it would be good to eliminate as much unnecessary code
as possible from the spinlock section.


  proc->lwWaitLink = NULL;

  /* waiters are added to the front of the queue */
  proc->lwWaitLink = lock->head;



Regards
MauMau





[HACKERS] WIP: Access method extendability

2014-10-15 Thread Alexander Korotkov
Hackers,

Postgres was initially designed to support access method extendability.
This extendability lives on to the present day. However, it is mostly internal
in-core extendability. One can quite easily add a new access method to the
PostgreSQL core. But if one tries to implement an access method as an external
module, he will be faced with the following difficulties:

   1. The need to insert directly into pg_am, because there is no CREATE ACCESS
   METHOD command. And there is no support for dependencies between an am and
   its opclasses etc.
   2. A module can't define xlog records. So, a new am would not be WAL-logged.

The first problem is purely mechanical. Nothing prevents us from implementing
CREATE ACCESS METHOD and DROP ACCESS METHOD commands and supporting all
required dependencies.

The problem of WAL is a bit more complex. According to previous discussions, we
don't want to let extensions declare their own xlog records. If we let them,
then the recovery process will depend on extensions, which badly undermines
reliability. The solution is to implement some generic xlog record which is
able to represent the difference between blocks in some general manner.

3 patches are attached:

   1. CREATE/DROP ACCESS METHOD commands, with support for dependencies.
   2. New generic xlog record type.
   3. Bloom contrib module as an example of usage of the previous two features.
   This module was posted a few years ago by Teodor Sigaev. Now it's wrapped as
   an extension and WAL-logged.

The patches are in a WIP state: no documentation and very few comments.
However, I believe that it's enough to do some general concept review.

Some notes about the generic xlog format. A generic xlog record represents the
difference between pages as a set of operations of two kinds:

   1. Move memory inside the page. An optional flag requests zeroing the gap at
   the previous memory location.
   2. Copy a fragment of memory from the xlog record to the page. As an option,
   bitwise logical operations are supported as well as plain copy.

Generic xlog can open a page in two modes:

   1. Create mode: the page is zeroed regardless of its LSN.
   2. Update mode: the page is updated only if its LSN is lower than the
   record's LSN.

Usually, the xlog record is filled in critical sections where memory allocation
is prohibited. Thus, the user has to initialize it beforehand with knowledge of
the page count, total operation count and total length of data.
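
To give an idea of the intended usage, here is a sketch of how a module might
drive such a record. Every function name below is invented for illustration
only; the WIP patch may expose a different API entirely:

#include <stdbool.h>

typedef struct GenericXLogState GenericXLogState;   /* hypothetical */

extern GenericXLogState *generic_xlog_init(int npages, int nops, int datalen);
extern void generic_xlog_open_page(GenericXLogState *s, int slot,
                                   void *page, bool create_mode);
extern void generic_xlog_move(GenericXLogState *s, int slot, int dst_off,
                              int src_off, int len, bool zero_gap);
extern void generic_xlog_copy(GenericXLogState *s, int slot, int dst_off,
                              const void *data, int len);
extern void generic_xlog_finish(GenericXLogState *s);

void
example_page_update(void *page)
{
    char        payload[16] = {0};

    /*
     * Sized up front (1 page, 2 operations, 16 bytes of data) because the
     * record is filled inside a critical section where allocation is
     * prohibited.
     */
    GenericXLogState *s = generic_xlog_init(1, 2, sizeof(payload));

    /* Update mode: replayed only if the page LSN is older than the record. */
    generic_xlog_open_page(s, 0, page, false);

    /* Op 1: move 64 bytes 16 bytes to the right, zeroing the gap left behind. */
    generic_xlog_move(s, 0, 128 + 16, 128, 64, true);

    /* Op 2: copy 16 bytes of new data from the record onto the page. */
    generic_xlog_copy(s, 0, 128, payload, sizeof(payload));

    generic_xlog_finish(s);
}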

--
With best regards,
Alexander Korotkov.


create-am.1.patch.gz
Description: GNU Zip compressed data


generic-xlog.1.patch.gz
Description: GNU Zip compressed data


bloom-contrib.1.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Simon Riggs
On 15 October 2014 12:49, Lucas Lersch lucasler...@gmail.com wrote:
 Sorry for taking so long to answer. I am sending attached the patch with the
 changes I did to pgsql code. I followed the steps for compiling and
 installing pgsql from:
 http://www.postgresql.org/docs/current/static/install-short.html

Are you recording the bufferid or the blockid?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-15 Thread Michael Meskes
On 15.10.2014 00:26, Tom Lane wrote:
 * I've not touched ecpg except for cosmetic changes to keep the struct
 definitions in sync, and to fix the previously-mentioned bogus free()
 attempt.  I doubt that it would be worth teaching ecpg how to access the
 zic timezone database --- the problem of configuring where to find those
 files seems like more trouble than it's worth given the lack of
 complaints.  I'm not sure what we should do about the obsolete timezone
 abbreviations in its table.

Maybe we should just remove them for the new release. Yes, that might
break some applications, but then the server doesn't know these either,
so the applications might break anyway.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Lucas Lersch
I am recording the BufferDesc.tag.blockNum for the buffer along with the
spcNode, dbNode, relNode, also present in the tag.

On Wed, Oct 15, 2014 at 2:27 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 15 October 2014 12:49, Lucas Lersch lucasler...@gmail.com wrote:
  Sorry for taking so long to answer. I am sending attached the patch with
 the
  changes I did to pgsql code. I followed the steps for compiling and
  installing pgsql from:
  http://www.postgresql.org/docs/current/static/install-short.html

 Are you recording the bufferid or the blockid?

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




-- 
Lucas Lersch


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Simon Riggs
On 15 October 2014 13:44, Lucas Lersch lucasler...@gmail.com wrote:

 I am recording the BufferDesc.tag.blockNum for the buffer along with the
 spcNode, dbNode, relNode, also present in the tag.

The TPC-C I/O is random, so if you run it for longer you should see a wider set.

Caching isn't possible as a way to improve txn rates.

Check that you're touching all tables.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Lucas Lersch
So is it possibly normal behavior that running tpcc for 10min only accesses
50% of the database? Furthermore, is there a guideline for tpcc parameters
(# of warehouses, execution time, operation weights)?

On Wed, Oct 15, 2014 at 3:09 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 15 October 2014 13:44, Lucas Lersch lucasler...@gmail.com wrote:

  I am recording the BufferDesc.tag.blockNum for the buffer along with the
  spcNode, dbNode, relNode, also present in the tag.

 The TPC-C I/O is random, so if you run it for longer you should see a
 wider set.

 Cacheing isn't possible as a way to improve txn rates.

 Check that you're touching all tables.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




-- 
Lucas Lersch


Re: [HACKERS] narwhal and PGDLLIMPORT

2014-10-15 Thread Andrew Dunstan


On 10/15/2014 01:53 AM, Craig Ringer wrote:

On 10/15/2014 12:53 PM, Noah Misch wrote:

Windows Server 2003 isn't even EOL yet.  I'd welcome a buildfarm member with
that OS and a modern toolchain.

It's possible to run multiple buildfarm animals on a single Windows
instance, each with a different toolchain.



Indeed, and even using the same buildroot. That's been built into the 
buildfarm for a long time. For example, nightjar and friarbird do this.




There's the chance of interactions that can't occur if each SDK is used
in isolation on its own machine, but it should be pretty minimal.



Make sure each has a distinct base_port.


cheers

andrew




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 4:18 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 05:13, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 For parallelism, I think we need a concept of group locking.  That is,
 suppose we have a user backend and N worker backends collaborating to
 execute some query.  For the sake of argument, let's say it's a
 parallel CLUSTER, requiring a full table lock.  We need all of the
 backends to be able to lock the table even though at least one of them
 holds AccessExclusiveLock.  This suggests that the backends should all
 be members of a locking group, and that locks within the same locking
 group should be regarded as mutually non-conflicting.

 In the background worker case, I imagined that the foreground process
 would hold a lock and the background processes would just assume they
 could access the table without holding locks of their own.  Aren't
 you building a mountain where a molehill would do?

 Yeh. Locks should be made in the name of the main transaction and
 released by it.

 When my family goes to a restaurant, any member of the party may ask
 for a table and the request is granted for the whole family. But the
 lock is released only when I pay the bill. Once we have the table, any
 stragglers know we have locked the table and they just come sit at the
 table without needing to make their own lock request to the Maitre D',
 though they clearly cache the knowledge that we have the table locked.

 So all lock requests held until EOX should be made in the name of the
 top level process. Any child process wanting a lock should request it,
 but on discovering it is already held at parent level should just
 update the local lock table. Transient locks, like catalog locks can
 be made and released locally; I think there is more detail there but
 it shouldn't affect the generalisation.

Hmm, interesting idea.  Suppose, though, that the child process
requests a lock that can't immediately be granted, because the catalog
it's trying to access is locked in AccessExclusiveLock mode by an
unrelated transaction.  The unrelated transaction, in turn, is blocked
trying to acquire some resource, which the top level parallelism
process.  Assuming the top level parallelism process is waiting for
the child (or will eventually wait), this is a deadlock, but without
some modification to the deadlock detector, it can't see one of the
edges.

Figuring out what to do about that is really the heart of this
project, I think, and there are a variety of designs possible.  One of
the early ideas that I had was to have the parallel workers directly
twaddle the main processes' PGPROC and lock table state.  In other
words, instead of taking locks using their own PGPROCs, everybody uses
a single PGPROC.  I made several attempts at getting designs along
these lines off the ground, but it got complicated and invasive: (1)
The processes need to coordinate to make sure that you don't have two
people twaddling the lock state at the same time; (2) The existing
data structures won't support more than one process waiting at a time,
but there's no reason why one parallel worker couldn't be trying to
lock one catalog while another one is trying to lock a different
catalog; (3) On a related note, when a lock wait ends, you can't just
wake up the process that owns the PGPROC, but rather the one that's
actually waiting; (4) the LWLockReleaseAll algorithm just falls apart
in this environment, as far as I can see.

The alternative design which I've been experimenting with is to have
each process use its own PGPROC and PROCLOCK structures, but to tag
each PROCLOCK with not only the owning PGPROC but also the group
leader's PGPROC.  This has not been entirely smooth sailing, but it
seems to break much less code than trying to have everybody use one
PGPROC.  Most of the changes that seem to be needed to make things
work are pretty well-isolated; rather than totally rearranging the
lock manager, you're just adding extra code that runs only in the
parallel case.

I'm definitely open to the idea that there's a better, simpler design
out there, but I haven't been able to think of one that doesn't break
deadlock detection.
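
Roughly, the PROCLOCK-tagging idea looks like this (simplified stand-ins,
not the real PGPROC/PROCLOCK structs or the actual patch):

#include <stdbool.h>
#include <stddef.h>

typedef struct ProcSketch
{
    struct ProcSketch *lockGroupLeader;     /* NULL if not in a locking group */
} ProcSketch;

typedef struct
{
    ProcSketch *owner;          /* backend holding or awaiting the lock */
    ProcSketch *groupLeader;    /* owner's group leader (or owner itself) */
    int         lockmode;
} ProcLockSketch;

/*
 * Two lock holders conflict only if their lock modes conflict AND they do
 * not belong to the same locking group.
 */
static bool
proclocks_conflict(const ProcLockSketch *a, const ProcLockSketch *b,
                   bool modes_conflict)
{
    if (a->groupLeader != NULL && a->groupLeader == b->groupLeader)
        return false;           /* same group: treat as non-conflicting */
    return modes_conflict;
}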

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Stephen Frost
* Lucas Lersch (lucasler...@gmail.com) wrote:
 So is it a possible normal behavior that running tpcc for 10min only access
 50% of the database? Furthermore, is there a guideline of parameters for
 tpcc (# of warehouses, execution time, operations weight)?

Depends - you may be aware that we support index-only scans in certain
situations.  This means that only the index pages for a given relation
(and the visibility map) are accessed, and the heap is not.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-15 Thread Tom Lane
Michael Meskes mes...@postgresql.org writes:
 On 15.10.2014 00:26, Tom Lane wrote:
 * I've not touched ecpg except for cosmetic changes to keep the struct
 definitions in sync, and to fix the previously-mentioned bogus free()
 attempt.  I doubt that it would be worth teaching ecpg how to access the
 zic timezone database --- the problem of configuring where to find those
 files seems like more trouble than it's worth given the lack of
 complaints.  I'm not sure what we should do about the obsolete timezone
 abbreviations in its table.

 Maybe we should just remove thme for the new release. Yes, that might
 break some applications, but then the server doesn't know these either,
 so the applications might break anyway.

The same thought had occurred to me.  Probably the main use of the
datetime parsing code in ecpg is for interpreting outputs from the
server, and (at least by default) the server doesn't use timezone
abbreviations when printing timestamps.  So maybe that's largely
dead code anyhow.  I would not propose back-patching such a change,
but we could try it in 9.5 and see if anyone complains.

A less drastic remedy would be to remove just those abbreviations
whose meaning has actually changed over time.  Eventually that
might be all of them ... but in the meantime, we could at least
argue that we weren't breaking any case that worked well before.

regards, tom lane




Re: [HACKERS] How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale

2014-10-15 Thread Kouhei Kaigai
  On 10/03/2014 07:08 AM, Kouhei Kaigai wrote:
   Hello,
  
   I recently got a trouble on development of my extension that
   utilizes the shared buffer when it released each buffer page.
  
   This extension transfers contents of the shared buffers to GPU
   device using DMA feature, then kicks a device kernel code.
 
  Wow, that sounds crazy.
 
   Once backend/extension calls ReadBuffer(), resowner.c tracks which
   buffer was referenced by the current resource owner, to ensure these
   buffers being released at end of the transaction.
   However, it seems to me implementation of resowner.c didn't assume
   many buffers are referenced by a particular resource owner
 simultaneously.
   It manages the buffer index using an expandable array, then looks up
   the target buffer by sequential walk but from the tail because
   recently pinned buffer tends to be released first.
   It made a trouble in my case. My extension pinned multiple thousands
   buffers, so owner-buffers[] were enlarged and takes expensive cost
   to walk on.
   In my measurement, ResourceOwnerForgetBuffer() takes 36 seconds in
   total during hash-joining 2M rows; even though hash-joining itself
   takes less than 16 seconds.
  
   What is the best way to solve the problem?
 
  How about creating a separate ResourceOwner for these buffer pins, and
  doing a wholesale ResourceOwnerRelease() on it when you're done?
 
 Let me clarify your idea.
 
 1. Create a separate ResourceOwner under the CurrentResourceOwner.
  2. Switch CurrentResourceOwner to the self-constructed resource owner
     during ReadBuffer() on thousands of buffers.
  3. Switch back to the original CurrentResourceOwner.
  4. Kick DMA and run GPU device kernel (actual job running...)
  5. Switch CurrentResourceOwner to the self-constructed resource owner
     again, during ReleaseBuffer() in reverse order.
  6. Switch back to the original CurrentResourceOwner, then call
     ResourceOwnerDelete().
 
  Hmm... at this moment, I cannot find anything that would be hard to implement.
 I'd like to try this idea, then report it later.
 
Let me share the result.

The above approach eliminated my problem. An individual resource owner
for each set of shared buffers, plus ReleaseBuffer() in reverse order, reduced
the time to release the pinned buffers from 36sec to 67.3msec when I ran the
hash-join on 20M records; that involves at most 59168 buffers pinned
concurrently (32 asynchronous requests, each request packing 1849 buffers).

Thanks for the suggestion!
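
For anyone following along, a minimal sketch of that sequence against the
stock resowner.c API (this is not the actual PG-Strom code; error handling
and the PG_TRY cleanup a real extension needs are omitted):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/resowner.h"

static void
bulk_pin_and_release(Relation rel, BlockNumber *blocks, int nblocks,
                     Buffer *buffers)
{
    ResourceOwner oldowner = CurrentResourceOwner;
    ResourceOwner bulkowner = ResourceOwnerCreate(oldowner, "bulk buffer pins");
    int         i;

    /* Steps 1-3: pin all buffers under the private resource owner. */
    CurrentResourceOwner = bulkowner;
    for (i = 0; i < nblocks; i++)
        buffers[i] = ReadBuffer(rel, blocks[i]);
    CurrentResourceOwner = oldowner;

    /* Step 4: kick off and wait for the actual work on the pinned pages. */

    /* Step 5: unpin in reverse order, again under the private owner. */
    CurrentResourceOwner = bulkowner;
    for (i = nblocks - 1; i >= 0; i--)
        ReleaseBuffer(buffers[i]);
    CurrentResourceOwner = oldowner;

    /* Step 6: the private owner is now empty and can be deleted. */
    ResourceOwnerDelete(bulkowner);
}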


postgres=# explain analyze select * from t0 natural join t1 natural join t2;
INFO:  time to release buffers: 67.289ms

QUERY PLAN

---
 Custom (GpuHashJoin)  (cost=3468.00..471640.11 rows=19740924 width=139) (actual time=193.086..5913.459 rows=2000 loops=1)
   pseudo scan tlist: 1:(t0.bid), 2:(t0.aid), 3:t0.id, 4:t0.cat, 5:t0.cid, 6:t0.did, 7:t0.x, 8:t0.y, 9:t0.z, 11:t2.btext, 12:[t2.bid], 10:t1.atext, 13:[t1.aid]
   hash clause 1: (t0.bid = t2.bid)
   hash clause 2: (t0.aid = t1.aid)
   ->  Custom (GpuScan) on t0  (cost=500.00..467167.24 rows=2024 width=73) (actual time=7.757..1056.426 rows=2000 loops=1)
   ->  Custom (MultiHash)  (cost=734.00..734.00 rows=4 width=37) (actual time=23.382..23.382 rows=4 loops=1)
         hash keys: bid
         ->  Seq Scan on t2  (cost=0.00..734.00 rows=4 width=37) (actual time=0.007..5.124 rows=4 loops=1)
         ->  Custom (MultiHash)  (cost=734.00..734.00 rows=4 width=37) (actual time=11.919..11.919 rows=4 loops=1)
               hash keys: aid
               ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=37) (actual time=0.010..5.299 rows=4 loops=1)
 Planning time: 0.904 ms
 Execution time: 6667.986 ms
(13 rows)

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com





Re: [HACKERS] Locking for Rename To new_name works differently for different objects

2014-10-15 Thread Tom Lane
Amit Kapila amit.kapil...@gmail.com writes:
 I have observed that for renaming some of the objects
 AccessExclusiveLock is taken on object whereas for
 other kind of objects no lock is taken on object before
 renaming the object.

The usual theory for DDL updates of all types (not just rename)
is that an explicit lock is only needed for objects whose catalog
representation comprises more than one row.  Otherwise, the implicit
locking involved in updating that row is sufficient to serialize
different updates.

regards, tom lane




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 14:46, Robert Haas robertmh...@gmail.com wrote:

 When my family goes to a restaurant, any member of the party may ask
 for a table and the request is granted for the whole family. But the
 lock is released only when I pay the bill. Once we have the table, any
 stragglers know we have locked the table and they just come sit at the
 table without needing to make their own lock request to the Maitre D',
 though they clearly cache the knowledge that we have the table locked.

 Hmm, interesting idea.  Suppose, though, that the child process
 requests a lock that can't immediately be granted, because the catalog
 it's trying to access is locked in AccessExclusiveLock mode by an
 unrelated transaction.  The unrelated transaction, in turn, is blocked
 trying to acquire some resource, which the top level parallelism
 process.  Assuming the top level parallelism process is waiting for
 the child (or will eventually wait), this is a deadlock, but without
 some modification to the deadlock detector, it can't see one of the
 edges.

Family disputes are fairly easily resolved ;-)

The first and basic point is that in most cases the parent should
already hold the required locks. This can only happen for briefly held
locks and/or more complex stuff. In the first case, getting
parallelism to work without that complex stuff would be useful. I'd be
happy if the first version simply throws an error if a child can't
acquire a lock immediately. Don't overthink the first version. Knowing
you'll disagree, let's take a further step...

Second point, the relationship between parent and children is clear.
If we do a deadlock detection, we should be able to search for that as
a special case, since we will know that we are a child and that such a
situation might occur. So just add in an edge so the rest of the
deadlock code works fine.

If that doesn't work, use a heuristic. If the parent is waiting when the child
does the deadlock test, assume it's a deadlock and abort the child
speculatively just in case. You can work out how to do that better in
the future, since it won't happen that often.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal : REINDEX SCHEMA

2014-10-15 Thread Sawada Masahiko
On Mon, Oct 13, 2014 at 11:16 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Oct 12, 2014 at 1:27 PM, Stephen Frost sfr...@snowman.net wrote:
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Sawada Masahiko wrote:
   Attached WIP patch adds new syntax REINDEX SCHEMA which does reindexing
   of all tables of a specified schema.
   There is syntax to reindex a specified index, per table and per
   database,
   but we cannot do reindexing per schema for now.

 It seems doubtful that there really is much use for this feature, but if
 there is, I think a better syntax precedent is the new ALTER TABLE ALL
 IN TABLESPACE thingy, rather than your proposed REINDEX SCHEMA.
 Something like REINDEX TABLE ALL IN SCHEMA perhaps.

 Yeah, I tend to agree that we should be looking at the 'ALL IN
 TABLESPACE' and 'ALL IN SCHEMA' type of commands to keep things
 consistent.  This might be an alternative for the vacuum / analyze /
 reindex database commands also..

 Urgh.  I don't have a problem with that syntax in general, but it
 clashes pretty awfully with what we're already doing for REINDEX
 otherwise.


Attached patches are the latest version.
I changed the syntax to REINDEX ALL IN SCHEMA, but I felt a little
uncomfortable with it, as Robert mentioned.

Anyway, you can apply these patches in numerical order, and
use the REINDEX ALL IN SCHEMA feature and the -S/--schema option in reindexdb.

000_reindex_all_in_schema_v2.patch : It contains REINDEX ALL IN SCHEMA feature
001_Add_schema_option_to_reindexdb_v1.patch : It contains reindexdb
-S/--schema support

Please review and comments.

Regards,

---
Sawada Masahiko


000_reindex_all_in_schema_v2.patch
Description: Binary data


001_Add_schema_option_to_reindexdb_v1.patch
Description: Binary data



Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-10-15 Thread Alvaro Herrera
Stephen Frost wrote:

   Yes, I will update the patch.
   
   Still planning to do this..?
   
   Marking this back to waiting-for-author.
  
  Yes, but probably not for this commitfest unfortunately.
 
 Fair enough, I'll mark it 'returned with feedback'.

We lost this patch for the October commitfest, didn't we?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-10-15 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 We lost this patch for the October commitfest, didn't we?

I'm guessing you missed that a new version just got submitted..?

I'd be fine with today's being added to the October commitfest..

Of course, there's a whole independent discussion to be had about how
there wasn't any break between last commitfest and this one, but that
probably deserves its own thread.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Lucas Lersch
I got the following numbers from my tpcc database:

Data size: ~6059MB
Index size: ~1390MB
Total size: ~7400MB

Even considering index-only scans, the ratio of around 50% of the database
pages being accessed seems unrealistic to me.



On Wed, Oct 15, 2014 at 3:50 PM, Stephen Frost sfr...@snowman.net wrote:

 * Lucas Lersch (lucasler...@gmail.com) wrote:
  So is it a possible normal behavior that running tpcc for 10min only
 access
  50% of the database? Furthermore, is there a guideline of parameters for
  tpcc (# of warehouses, execution time, operations weight)?

 Depends- you may be aware that we support index-only scans in certain
 situations.  This means that only the index page for a given relation
 (and the visibility map) are accessed, and the heap is not.

 Thanks,

 Stephen




-- 
Lucas Lersch


Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-10-15 Thread Alvaro Herrera
Stephen Frost wrote:
 * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
  We lost this patch for the October commitfest, didn't we?
 
 I'm guessing you missed that a new version just got submitted..?

Which one, reindex schema?  Isn't that a completely different patch?

 I'd be fine with today's being added to the october commitfest..
 
 Of course, there's a whole independent discussion to be had about how
 there wasn't any break between last commitfest and this one, but that
 probably deserves its own thread.

It's not the first time that has happened, and honestly I don't see all that
much cause for concern.  Heikki did move pending patches to the current
one, and closed a lot of inactive ones as 'returned with feedback'.
Attentive patch authors should have submitted new versions ... if they
don't, then someone else with an interest in the patch should do so.
If no one updates the patches, what do we want them for?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 10:12 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 14:46, Robert Haas robertmh...@gmail.com wrote:
 When my family goes to a restaurant, any member of the party may ask
 for a table and the request is granted for the whole family. But the
 lock is released only when I pay the bill. Once we have the table, any
 stragglers know we have locked the table and they just come sit at the
 table without needing to make their own lock request to the Maitre D',
 though they clearly cache the knowledge that we have the table locked.

 Hmm, interesting idea.  Suppose, though, that the child process
 requests a lock that can't immediately be granted, because the catalog
 it's trying to access is locked in AccessExclusiveLock mode by an
 unrelated transaction.  The unrelated transaction, in turn, is blocked
 trying to acquire some resource, which the top level parallelism
 process.  Assuming the top level parallelism process is waiting for
 the child (or will eventually wait), this is a deadlock, but without
 some modification to the deadlock detector, it can't see one of the
 edges.

 Family disputes are fairly easily resolved ;-)

 The first and basic point is that in most cases the parent should
 already hold the required locks. This can only happen for briefly held
 locks and/or more complex stuff. In the first case, getting
 parallelism to work without that complex stuff would be useful. I'd be
 happy if the first version simply throws an error if a child can't
 acquire a lock immediately. Don't overthink the first version. Knowing
 you'll disagree, lets take a further step...

Well, I'm fervently in agreement with you on one point: the first
version of all this needs to be as simple as possible, or the time to
get to the first version will be longer than we can afford to wait.  I
think what we're discussing here is which things are important enough
that it makes sense to have them in the first version, and which
things can wait.  I also think we are in agreement that at least SOME
thought about lock management is needed; the question we're trying to
hash out is whether what I'm proposing to try to do here is
significantly more ambitious than what's really necessary for V1.
Which is a good question.

 Second point, the relationship between parent and children is clear.
 If we do a deadlock detection, we should be able to search for that as
 a special case, since we will know that we are a child and that such a
 situation might occur. So just add in an edge so the rest of the
 deadlock code works fine.

 If that doesn't work, use a heuristic. If the parent is waiting when the
 child does the deadlock test, assume it's a deadlock and abort the child
 speculatively just in case. You can work out how to do that better in
 the future, since it won't happen that often.

Well, the deadlock checker needs to work not only when one of the
parallel processes invokes it, but also when somebody else invokes it.
For example, suppose we have parallel processes A and B.  A holds lock
 1 and awaits lock 2, while B awaits lock 3.  No problem!  Now
process X, which is holding lock 2 tries to grab lock 1.  X must be
able to detect the deadlock.  Now, that's not necessarily a problem
with what you said: it just means that the parent-child relationship
has to be clear from the contents of shared memory.

Which is pretty much the direction I'm aiming with the incomplete
patch I posted.  For the sake of simplicity, I want to assume that
every process in a locking group waits for every other process in the
same locking group, even though that might not be technically true in
every case.   If the parent is waiting for a lock and the guy who has
that lock is waiting for a parallel worker in the same group, my plan
(which I think matches your suggestion) is to call that a deadlock.
There are a few sticky points here: sometimes, the deadlock checker
doesn't actually abort transactions, but just rearranges the wait
queue, so something at least somewhat sensible needs to happen in that
case.

The thing that I'm aiming to do in the patch which I think you are
suggesting might not be necessary is to make it possible for the child
to go ahead and request AccessShareLock on the scan relation even though
the parent might already hold some other lock (perhaps even
AccessExclusiveLock).   I want to make the lock manager smart enough
to recognize that those locks are mutually non-conflicting because of
the fact that the two processes are in close cooperation.  Clearly, we
could get by without that, but it adds complexity in other places: the
parent has to never release its lock (even if killed) until the child
is done with the relation; and the scan code itself needs to be
conditional, passing NoLock from children and some other mode in the
parent.  That's all manageable, but it looks to me like doing the
necessary surgery on the lock manager isn't actually going to be that
hard; most of the necessary logic is present in 

Re: [HACKERS] Locking for Rename To new_name works differently for different objects

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 10:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Kapila amit.kapil...@gmail.com writes:
 I have observed that for renaming some of the objects
 AccessExclusiveLock is taken on object whereas for
 other kind of objects no lock is taken on object before
 renaming the object.

 The usual theory for DDL updates of all types (not just rename)
 is that an explicit lock is only needed for objects whose catalog
 representation comprises more than one row.  Otherwise, the implicit
 locking involved in updating that row is sufficient to serialize
 different updates.

That's an interesting point that I hadn't considered, but I'm willing
to believe that at least some of the differences might also be
haphazard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Fujii Masao
On Fri, Aug 8, 2014 at 4:08 PM, Guillaume Lelarge
guilla...@lelarge.info wrote:
 Hi,

 As part of our monitoring work for our customers, we stumbled upon an issue
 with our customers' servers who have a wal_keep_segments setting higher than
 0.

 We have a monitoring script that checks the number of WAL files in the
 pg_xlog directory, according to the setting of three parameters
 (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
 We usually add a percentage to the usual formula:

 greatest(
   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
   checkpoint_segments + wal_keep_segments + 1
 )

 And we have lots of alerts from the script for customers who set their
 wal_keep_segments setting higher than 0.

 So we started to question this sentence of the documentation:

 There will always be at least one WAL segment file, and will normally not be
 more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or
 checkpoint_segments + wal_keep_segments + 1 files.

 (http://www.postgresql.org/docs/9.3/static/wal-configuration.html)

 While doing some tests, it appears it would be more something like:

 wal_keep_segments + (2 + checkpoint_completion_target) * checkpoint_segments
 + 1

 But after reading the source code (src/backend/access/transam/xlog.c), the
 right formula seems to be:

 wal_keep_segments + 2 * checkpoint_segments + 1

 Here is how we went to this formula...

 CreateCheckPoint(..) is responsible, among other things, for deleting and
 recycling old WAL files. From src/backend/access/transam/xlog.c, master
 branch, line 8363:

 /*
  * Delete old log files (those no longer needed even for previous
  * checkpoint or the standbys in XLOG streaming).
  */
 if (_logSegNo)
 {
     KeepLogSeg(recptr, &_logSegNo);
     _logSegNo--;
     RemoveOldXlogFiles(_logSegNo, recptr);
 }

 KeepLogSeg(...) function takes care of wal_keep_segments. From
 src/backend/access/transam/xlog.c, master branch, line 8792:

 /* compute limit for wal_keep_segments first */
 if (wal_keep_segments > 0)
 {
     /* avoid underflow, don't go below 1 */
     if (segno <= wal_keep_segments)
         segno = 1;
     else
         segno = segno - wal_keep_segments;
 }

 IOW, the segment number (segno) is decremented according to the setting of
 wal_keep_segments. segno is then sent back to CreateCheckPoint(...) via
 _logSegNo. The RemoveOldXlogFiles() gets this segment number so that it can
 remove or recycle all files before this segment number. This function gets
 the number of WAL files to recycle with the XLOGfileslop constant, which is
 defined as:

 /*
  * XLOGfileslop is the maximum number of preallocated future XLOG segments.
  * When we are done with an old XLOG segment file, we will recycle it as a
  * future XLOG segment as long as there aren't already XLOGfileslop future
  * segments; else we'll delete it.  This could be made a separate GUC
  * variable, but at present I think it's sufficient to hardwire it as
  * 2*CheckPointSegments+1.  Under normal conditions, a checkpoint will free
  * no more than 2*CheckPointSegments log segments, and we want to recycle all
  * of them; the +1 allows boundary cases to happen without wasting a
  * delete/create-segment cycle.
  */
 #define XLOGfileslop	(2*CheckPointSegments + 1)

 (in src/backend/access/transam/xlog.c, master branch, line 100)

 IOW, PostgreSQL will keep wal_keep_segments WAL files before the current WAL
 file, and then there may be 2*CheckPointSegments + 1 recycled ones. Hence
 the formula:

 wal_keep_segments + 2 * checkpoint_segments + 1

 And this is what we usually find in our customers' servers. We may find more
 WAL files, depending on the write activity of the cluster, but on average,
 we get this number of WAL files.
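
To make the arithmetic concrete, the bound can be written as a trivial
helper (illustration only -- the name is invented here, this is not code
from xlog.c):

    /*
     * Steady-state ceiling on WAL files in pg_xlog: wal_keep_segments
     * held back for standbys, plus up to XLOGfileslop
     * (2 * checkpoint_segments + 1) recycled future segments.
     */
    static int
    expected_max_xlog_files(int checkpoint_segments, int wal_keep_segments)
    {
        return wal_keep_segments + 2 * checkpoint_segments + 1;
    }

    /* e.g. checkpoint_segments = 3, wal_keep_segments = 32  =>  39 files */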

 AFAICT, the documentation is wrong about the usual number of WAL files in
 the pg_xlog directory. But I may be wrong, in which case, the documentation
 isn't clear enough for me, and should be fixed so that others can't
 misinterpret it like I may have done.

 Any comments? did I miss something, or should we fix the documentation?

I think you're right. The correct formula of the number of WAL files in
pg_xlog seems to be

(3 + checkpoint_completion_target) * checkpoint_segments + 1

or

wal_keep_segments + 2 * checkpoint_segments + 1


Why? At the end of checkpoint, the WAL files which were generated since the
start of previous checkpoint cannot be removed and must remain in pg_xlog.
The number of them is

(1 + checkpoint_completion_target) * checkpoint_segments

or

wal_keep_segments

Also, at the end of checkpoint, as you pointed out, if there are
*many* enough old WAL files, 2 * checkpoint_segments + 1 WAL files will be
recycled. Then checkpoint_segments WAL files will be consumed till the end of
next checkpoint. But since there are already 2 * checkpoint_segments + 1
recycled WAL files, no more files are increased. So, WAL files that we cannot
remove and can recycle at the 

Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Ants Aasma
On Tue, Oct 14, 2014 at 6:19 PM, Robert Haas robertmh...@gmail.com wrote:
 With regard for using a hash table for the buffer mapping lock I'm
 doubtful that any form of separate chaining is the right one. We
 currently have a quite noticeable problem with the number of cache
 misses in the buffer mapping hash (and also the heavyweight lock mgr) -
 if we stay with hashes that's only going to be fundamentally lower than
 dynahash if we change the type of hashing. I've had good, *preliminary*,
 results using an open addressing + linear probing approach.

 I'm very skeptical of open addressing + linear probing.  Such hash
 tables tend to degrade over time, because you have to leave tombstones
 behind to ensure that future scans advance far enough.  There's really
 no way to recover without rebuilding the whole hash table, and
 eventually it degrades to linear search.  If we're spending too much
 time walking hash chains, I think the solution is to increase the
 number of buckets so that the chains get shorter.

What about cuckoo hashing? There was a recent paper on how to do fine
grained locking with cuckoo hashes. [1]

I'm imagining a bucketized cuckoo hash with 5 item buckets (5-way
associativity). This allows us to fit the bucket onto 2 regular sized
cache lines and have 8 bytes left over. Buckets would be protected by
seqlocks stored in the extra space. On the read side we would only
need 2 read barriers (basically free on x86), and we are guaranteed to
have an answer by fetching two pairs of cache lines. We can trade
memory bandwidth for latency by issuing prefetches (once we add
primitives for this). Alternatively we can trade insert speed for
lookup speed by using asymmetrically sized tables.

[1] https://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf
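
For anyone trying to picture that layout, here is a rough sketch of such a
bucket and a seqlock-protected lookup (C11 atomics; the BufferTag-sized key,
the hash mixing and all names are invented for the example -- this is not
proposed PostgreSQL code):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define BUCKET_SLOTS 5              /* 5-way associativity, as above */

    typedef struct HashKey              /* stand-in for a 20-byte BufferTag */
    {
        uint32_t    words[5];
    } HashKey;

    typedef struct Bucket               /* 8 + 5*20 + 5*4 = 128 bytes */
    {
        _Atomic uint64_t seq;           /* per-bucket seqlock, odd = writer */
        HashKey     keys[BUCKET_SLOTS]; /* all-zero key marks an empty slot */
        uint32_t    values[BUCKET_SLOTS];   /* e.g. buffer ids */
    } Bucket;

    static size_t
    hash_key(const HashKey *key, uint64_t seed)
    {
        uint64_t    h = seed;

        for (int i = 0; i < 5; i++)
            h = (h ^ key->words[i]) * 1099511628211ULL; /* FNV-1a-style mix */
        return (size_t) h;
    }

    /*
     * Read one bucket under its seqlock.  (A real implementation would read
     * the slots with atomic/volatile loads; plain loads keep the sketch
     * short.)
     */
    static bool
    bucket_lookup(Bucket *b, const HashKey *key, uint32_t *value)
    {
        for (;;)
        {
            uint64_t    start = atomic_load_explicit(&b->seq,
                                                     memory_order_acquire);
            bool        found = false;
            uint32_t    v = 0;

            if (start & 1)
                continue;               /* writer active, retry */

            for (int i = 0; i < BUCKET_SLOTS; i++)
            {
                if (memcmp(&b->keys[i], key, sizeof(HashKey)) == 0)
                {
                    v = b->values[i];
                    found = true;
                    break;
                }
            }

            /* read barrier: slot reads must finish before re-checking seq */
            atomic_thread_fence(memory_order_acquire);
            if (atomic_load_explicit(&b->seq, memory_order_relaxed) == start)
            {
                *value = v;
                return found;
            }
            /* a writer changed the bucket underneath us; retry */
        }
    }

    /* A lookup probes at most the two buckets picked by the two hashes. */
    static bool
    cuckoo_lookup(Bucket *table, size_t nbuckets,
                  const HashKey *key, uint32_t *value)
    {
        Bucket     *b1 = &table[hash_key(key, 0x9e3779b97f4a7c15ULL) % nbuckets];
        Bucket     *b2 = &table[hash_key(key, 0xc2b2ae3d27d4eb4fULL) % nbuckets];

        return bucket_lookup(b1, key, value) || bucket_lookup(b2, key, value);
    }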

Regards,
Ants Aasma
-- 
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Ryan Johnson

On 15/10/2014 10:32 AM, Ants Aasma wrote:

On Tue, Oct 14, 2014 at 6:19 PM, Robert Haas robertmh...@gmail.com wrote:

With regard for using a hash table for the buffer mapping lock I'm
doubtful that any form of separate chaining is the right one. We
currently have a quite noticeable problem with the number of cache
misses in the buffer mapping hash (and also the heavyweight lock mgr) -
if we stay with hashes that's only going to be fundamentally lower than
dynahash if we change the type of hashing. I've had good, *preliminary*,
results using an open addressing + linear probing approach.

I'm very skeptical of open addressing + linear probing.  Such hash
tables tend to degrade over time, because you have to leave tombstones
behind to ensure that future scans advance far enough.  There's really
no way to recover without rebuilding the whole hash table, and
eventually it degrades to linear search.  If we're spending too much
time walking hash chains, I think the solution is to increase the
number of buckets so that the chains get shorter.

What about cuckoo hashing? There was a recent paper on how to do fine
grained locking with cuckoo hashes. [1]

I'm imagining a bucketized cuckoo hash with 5 item buckets (5-way
associativity). This allows us to fit the bucket onto 2 regular sized
cache lines and have 8 bytes left over. Buckets would be protected by
seqlocks stored in the extra space. On the read side we would only
need 2 read barriers (basically free on x86), and we are guaranteed to
have an answer by fetching two pairs of cache lines. We can trade
memory bandwidth for latency by issuing prefetches (once we add
primitives for this). Alternatively we can trade insert speed for
lookup speed by using asymmetrically sized tables.

[1] https://www.cs.princeton.edu/~mfreed/docs/cuckoo-eurosys14.pdf
Actually, I'd go with second-chance hashing [2], same number of hash 
functions but it's more stable (no infinite loops, for example). Most 
probably the techniques from [1] would apply equally well.


[2] 
http://www.eecs.harvard.edu/~michaelm/postscripts/infocom_hardware_submit.pdf


Ryan



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 2:03 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-10-14 17:53:10 -0400, Robert Haas wrote:
 On Tue, Oct 14, 2014 at 4:42 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  The code in CHashSearch shows the problem there: you need to STORE the
  hazard pointer before you begin to do the LOAD operations to scan the
  bucket, and you must finish all of those LOADs before you STORE the
  NULL hazard pointer.  A read or write barrier won't provide store/load
  or load/store ordering.
 
  I'm not sure I understand the problem here - but a read + write barrier
  should suffice? To avoid falling back to two full barriers, we'd need a
  separate define pg_read_write_barrier(), but that's not a problem. IIUC
  that should allow us to avoid emitting any actual code on x86.

 Well, thinking about x86 specifically, it definitely needs at least
 one mfence, after setting the hazard pointer and before beginning to
 read the bucket chain.

 Yes, I can see that now. I do wonder if that's not going to prove quite
 expensive... I think I can see some ways around needing that. But I
 think that needs some benchmarking first - no need to build a even more
 complex solution if not needed.

 The solution I'm thinking of is to essentially move away from hazard
 pointers and store something like a generation counter per
 backend. Which is updated less often, and in combination with other
 operations. When a backend need to clean up and sees that there's a
 straggler with a old generation it sends that backend a signal to ensure
 it sets the latest generation.

It's possible that might work ... but on the timescale we're talking
about here, that's asking the garbage collecting process to wait for
practically geologic time.

Back when I first wrote this code, I spent a fair amount of time
looking at existing work in the area of lock-free hash tables.
Essentially all of the work I was able to find on this topic assumes a
threaded model - or more precisely, it assumes that shared memory
allocation is cheap and easy and you'll have no problem getting as
much of it as you need whenever you need it.  This assumption often
isn't even spelled out explicitly: it's just assumed that that's the
programming environment you're working in.  Finding highly parallel
algorithms that don't rely on memory allocation as a primitive is
hard.  Hazard pointers are one of the few techniques I found that
seems like it can work in our architecture.  I'm quite reluctant to
leave aside techniques that have been proven to work well in favor of
inventing something entirely novel to PostgreSQL.

That having been said, there is some literature on generation numbers,
and I think something like that could be made to work.  It does have
some significant disadvantages, though.  One is that a single process
which fails to update its generation number prevents all reclamation,
system-wide.  In my algorithm, a process whose execution is
suspended while a hazard pointer is set prevents recycling of only one
of many garbage lists.  A process searching for a reusable element can
most likely find some other garbage list to reclaim instead.  Also,
a generation number implies that we update the value periodically,
rather than before and after each critical section.  Anything that
forces garbage collection to be postponed longer than absolutely
necessary seems to me likely to be a loser.  It might be a little
faster as long as we have free elements to allocate, but as soon as
we're out and have to wait longer than we otherwise would for garbage
collection, and all system activity halts as a result, even for a few
milliseconds, it's going to be a whole lot slower.  Or at least, I
think.

That having been said, I don't know what to do about the fact that the
fence is too expensive.  I don't know that we've really established
that that's the true root of the problem rather than some other
pedestrian optimization failure.  But the existing code is using an
atomic operation to acquire a spinlock, then releasing it, walking the
bucket chain, and then using another atomic operation to acquire a
spinlock and then releasing it again.  Surely a pure fence shouldn't
cost more than a spinlock cycle?  Even with arguably one extra cache
line touch, that seems like it ought to be a win.  But my intuitions
in this area are shaky.
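
For reference, the read-side ordering being discussed boils down to
something like the following (a schematic with invented names, not the
actual CHashSearch code; the reclaimer would do the mirror image -- unlink,
fence, then scan everyone's hazard pointers):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Node
    {
        _Atomic(struct Node *) next;
        int         key;
        int         value;
    } Node;

    typedef struct Bucket
    {
        _Atomic(Node *) head;
    } Bucket;

    /*
     * Schematic hazard-pointer read side; hp_slot is this backend's
     * published hazard pointer.
     */
    static bool
    hazard_search(Bucket *bucket, _Atomic(Bucket *) *hp_slot,
                  int key, int *value)
    {
        bool        found = false;

        /* 1. advertise which bucket we are about to scan */
        atomic_store_explicit(hp_slot, bucket, memory_order_relaxed);

        /*
         * 2. full fence: the store above must be visible before any load
         * below is performed.  Store/load ordering is exactly what plain
         * read or write barriers don't give you; on x86 this is an mfence.
         */
        atomic_thread_fence(memory_order_seq_cst);

        /* 3. scan the chain; a reclaimer that sees our hazard pointer must
         * not recycle nodes it has unlinked from this bucket yet */
        for (Node *n = atomic_load_explicit(&bucket->head, memory_order_acquire);
             n != NULL;
             n = atomic_load_explicit(&n->next, memory_order_acquire))
        {
            if (n->key == key)
            {
                *value = n->value;
                found = true;
                break;
            }
        }

        /* 4. all loads are done; only now un-publish the hazard pointer */
        atomic_store_explicit(hp_slot, NULL, memory_order_release);

        return found;
    }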

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffer Requests Trace

2014-10-15 Thread Jeff Janes
On Wed, Oct 15, 2014 at 6:22 AM, Lucas Lersch lucasler...@gmail.com wrote:

 So is it a possible normal behavior that running tpcc for 10min only
 access 50% of the database? Furthermore, is there a guideline of parameters
 for tpcc (# of warehouses, execution time, operations weight)?


I'm not familiar with your benchmarking tool.  With the one I am most
familiar with, pgbench, if you run it against a database which is too big
to fit in memory, it can take a very long time to touch each page once,
because the constant random disk reads makes it run very slowly.  Maybe
that is something to consider here--how many transactions were actually
executed during your 10 min run?

Also, the tool might build tables that are only used under certain run
options.  Perhaps you just aren't choosing the options which invoke usage
of those tables.  Since you have the trace data, it should be pretty easy
to count how many distinct blocks are accessed from each relation, and
compare that to the size of the relations to see which relations are unused
or lightly used.

Cheers,

Jeff


Re: [HACKERS] Column Redaction

2014-10-15 Thread Robert Haas
On Wed, Oct 15, 2014 at 4:04 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 14 October 2014 17:43, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Oct 11, 2014 at 3:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
 As soon as you issue the above query, you have clearly indicated your
 intention to steal. Receiving information is no longer accidental, it
 is an explicit act that is logged in the auditing system against your
 name. This is sufficient to bury you in court and it is now a real
 deterrent. Redaction has worked.

 To me, this feels thin.  It's true that this might be good enough for
 some users, but I wouldn't bet on it being good enough for very many
 users, and I really hope there's a better option.  We have an existing
 method of doing data redaction via security barrier views.

 I agree with thin. There is a leak in the design, so let me coin the
 phrase imprecise security. Of course, the leaks reduce the value of
 such a feature; they just don't reduce it all the way to zero.

 Security barrier views or views of any kind don't do the required job.

 We are not able to easily classify people as Trusted or Untrusted.

 We're seeking to differentiate between the right to use a column for
 queries and the right to see the value itself. Or put another way, you
 can read the book, you just can't photocopy it and take the copy home.
 Or, you can try on the new clothes to see if they fit, but you can't
 take them home for free. Both of those examples have imprecise
 security measures in place to control and reduce negative behaviours
 and in every other industry this is known as security.

 In IT terms, we're looking at controlling and reducing improper access
 to data by an otherwise Trusted person. The only problem is that some
 actions on data items are allowed, others are not.

Sure, I don't disagree with any of that as a general principle.  I
just think we should look for some ways of shoring up your proposal
against some of the more obvious attacks, so as to have more good and
less bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-15 Thread Simon Riggs
On 15 October 2014 17:03, Robert Haas robertmh...@gmail.com wrote:

 Well, I'm fervently in agreement with you on one point: the first
 version of all this needs to be as simple as possible, or the time to
 get to the first version will be longer than we can afford to wait.  I
 think what we're discussing here is which things are important enough
 that it makes sense to have them in the first version, and which
 things can wait.

I'm guessing we might differ slightly on what constitutes "as simple as possible".

Something usable, with severe restrictions, is actually better than we
have now. I understand the journey this work represents, so don't be
embarrassed by submitting things with heuristics and good-enoughs in
it. Our mentor, Mr. Lane, achieved much by spreading work over many
releases, leaving others to join in the task.

Might I gently enquire what the "something usable" is that we are going to see
in this release? I'm not up on current plans.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-15 Thread Simon Riggs
On 15 October 2014 19:46, Robert Haas robertmh...@gmail.com wrote:

 In IT terms, we're looking at controlling and reducing improper access
 to data by an otherwise Trusted person. The only problem is that some
 actions on data items are allowed, others are not.

 Sure, I don't disagree with any of that as a general principle.  I
 just think we should look for some ways of shoring up your proposal
 against some of the more obvious attacks, so as to have more good and
 less bad.

Suggestions welcome. I'm not in a rush to implement this, so we have
time to mull it over.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-15 Thread Claudio Freire
On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 October 2014 16:45, Rod Taylor rod.tay...@gmail.com wrote:
 Redaction prevents accidental information loss only, forcing any loss
 that occurs to be explicit. It ensures that loss of information can be
 tied clearly back to an individual, like an ink packet that stains the
 fingers of a thief.

That is not true.

It can only be tied to a session. That's very far from an individual
in court terms, if you ask a lawyer.

You need a helluva lot more to tie that to an individual.

 Redaction clearly relies completely on auditing before it can have any
 additional effect. And the effectiveness of redaction needs to be
 understood next to Rod's example.

It forces you to audit all of the queries issued by the otherwise trusted user.

That is, I believe, a far from optimal design. When you have to audit
everything, you end up auditing nothing: a haystack of false positives
can easily hide the needle that is the true positive.

What you want, is something that allows selective auditing of
leak-prone queries.

But we've seen that joining is already a leak-prone query, so clearly
you cannot allow simple joining if you want the above.

What I propose needs a schema change and some preparedness from the
DBA. But how can you assume that to be asking too much and not say
the same of thorough auditing?

So, what I propose, is to require explicit separation of concepts at
the schema level.

On Sat, Oct 11, 2014 at 10:43 AM, Bruce Momjian br...@momjian.us wrote:
 For example, for a credit card type, you would output the last four
 digits, but is there any value to storing the non-visible digits?  You
 can check the checksum of the digits, but that can be done on input and
 doesn't require the storage of the digits.  Is there some function we
 could provide that would make that data type useful?  Could we provide
 comparison functions with delays or increasing delays?

Basically, as said above, the point is to provide a data type that is
nigh-useless.

Imagine a redacted card number as a tuple (full_value_id, suffix).
Suffix is in cleartext, and prefix_id is just an id pointing to a
lookup table for the type.

Regular users can read any redacted_number column, but will only get
the id (useless unless they already know what that prefix is), and
suffix. Format for that type would be  suffix and would serve
the purpose on the OP: it can be joined (equal value = equal id).
Moreover, the type can be designed in one of two ways: equal values
contain equal id, or salted-values, where even equal values generated
from different computations (ie: not copied) have different ids. This
second mode would be the most secure, albeit a tad hard to use
perhaps.

But it would allow joining and everything. Only users that have access
to the lookup table would be allowed to resolve the full value, with a
non-security-defining function like:

extract_full_value(redacted_number)

Then you can audit all queries against the lookup table, and you have
rather strong security IMHO.

This can all be done without any new features to postgres. Maybe you
can add syntactic sugar, but you don't really need anything on the
core to accomplish the above.

The syntactic sugar can take the form of a new data type family (like
enum?) where you specify the redaction function, redacted data type,
output format, and from there everything else works automagically, with
a

extract_full(any) - any

function that somehow knows what to do.


On Wed, Oct 15, 2014 at 3:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 19:46, Robert Haas robertmh...@gmail.com wrote:

 In IT terms, we're looking at controlling and reducing improper access
 to data by an otherwise Trusted person. The only problem is that some
 actions on data items are allowed, others are not.

 Sure, I don't disagree with any of that as a general principle.  I
 just think we should look for some ways of shoring up your proposal
 against some of the more obvious attacks, so as to have more good and
 less bad.

 Suggestions welcome. I'm not in a rush to implement this, so we have
 time to mull it over.

Does the above work for your intended purposes?

Hard to know from what you've posted until now, but I believe it does.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for better support of time-varying timezone abbreviations

2014-10-15 Thread Tom Lane
... and here is a draft patch for the timezone abbreviation data files.

I changed all the abbreviations for which the parent zone had used more
than one GMT offset since 1970.  That seemed like a good cutoff to avoid
wasting cycles on ancient history, especially since the IANA people
themselves don't make any large promises about the quality of their data
before 1970.

Although zones in the Russian Federation are the majority of zones that
had abbreviation changes, a quick look at this patch shows that they're
hardly the only ones.  We've been sticking our heads in the sand about
this problem for quite a while :-(

regards, tom lane

diff --git a/src/timezone/tznames/America.txt b/src/timezone/tznames/America.txt
index 54b51fe..9e62732 100644
*** a/src/timezone/tznames/America.txt
--- b/src/timezone/tznames/America.txt
*** AMT-14400# Amazon Time
*** 47,53 
   # (America/Cuiaba)
   # (America/Manaus)
   # (America/Porto_Velho)
! ART    -10800    # Argentina Time
   # (America/Argentina/Buenos_Aires)
   # (America/Argentina/Cordoba)
   # (America/Argentina/Tucuman)
--- 47,53 
   # (America/Cuiaba)
   # (America/Manaus)
   # (America/Porto_Velho)
! ART    America/Argentina/Buenos_Aires  # Argentina Time
   # (America/Argentina/Buenos_Aires)
   # (America/Argentina/Cordoba)
   # (America/Argentina/Tucuman)
*** ART-10800# Argentina Time
*** 58,64 
   # (America/Argentina/Mendoza)
   # (America/Argentina/Rio_Gallegos)
   # (America/Argentina/Ushuaia)
! ARST   -7200 D   # Argentina Summer Time
  # CONFLICT! AST is not unique
  # Other timezones:
  #  - AST: Arabic Standard Time (Asia)
--- 58,64 
   # (America/Argentina/Mendoza)
   # (America/Argentina/Rio_Gallegos)
   # (America/Argentina/Ushuaia)
! ARST   America/Argentina/Buenos_Aires  # Argentina Summer Time
  # CONFLICT! AST is not unique
  # Other timezones:
  #  - AST: Arabic Standard Time (Asia)
*** GMT 0# Greenwich Mean Time
*** 228,234 
   # (Etc/GMT)
   # (Europe/Dublin)
   # (Europe/London)
! GYT    -14400    # Guyana Time
   # (America/Guyana)
  HADT   -32400 D  # Hawaii-Aleutian Daylight Time
   # (America/Adak)
--- 228,234 
   # (Etc/GMT)
   # (Europe/Dublin)
   # (Europe/London)
! GYT    America/Guyana  # Guyana Time
   # (America/Guyana)
  HADT   -32400 D  # Hawaii-Aleutian Daylight Time
   # (America/Adak)
*** PST-28800# Pacific Standard Time
*** 285,299 
   # (Pacific/Pitcairn)
  PYST   -10800 D  # Paraguay Summer Time
   # (America/Asuncion)
! PYT    -14400    # Paraguay Time
   # (America/Asuncion)
! SRT    -10800    # Suriname Time
   # (America/Paramaribo)
  UYST-7200 D  # Uruguay Summer Time
   # (America/Montevideo)
  UYT-10800# Uruguay Time
   # (America/Montevideo)
! VET    -16200    # Venezuela Time (caution: this used to mean -14400)
   # (America/Caracas)
  WGST-7200 D  # Western Greenland Summer Time
   # (America/Godthab)
--- 285,299 
   # (Pacific/Pitcairn)
  PYST   -10800 D  # Paraguay Summer Time
   # (America/Asuncion)
! PYT    America/Asuncion  # Paraguay Time
   # (America/Asuncion)
! SRT    America/Paramaribo  # Suriname Time
   # (America/Paramaribo)
  UYST-7200 D  # Uruguay Summer Time
   # (America/Montevideo)
  UYT-10800# Uruguay Time
   # (America/Montevideo)
! VET    America/Caracas  # Venezuela Time
   # (America/Caracas)
  WGST-7200 D  # Western Greenland Summer Time
   # (America/Godthab)
diff --git a/src/timezone/tznames/Antarctica.txt b/src/timezone/tznames/Antarctica.txt
index 5a03250..2359020 100644
*** a/src/timezone/tznames/Antarctica.txt
--- b/src/timezone/tznames/Antarctica.txt
*** CLST   -10800 D  # Chile Summer Time
*** 16,26 
  CLT-14400# Chile Time
   # (America/Santiago)
   # (Antarctica/Palmer)
! DAVT   25200     # Davis Time (Antarctica)
   # (Antarctica/Davis)
  DDUT36000# Dumont-d`Urville Time (Antarctica)
   # (Antarctica/DumontDUrville)
! MAWT18000   

Re: [HACKERS] replicating DROP commands across servers

2014-10-15 Thread Andres Freund
On 2014-10-04 21:12:24 -0400, Robert Haas wrote:
 On Fri, Oct 3, 2014 at 4:58 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
 wrote:
  Robert Haas wrote:
  I'm not really very convinced that it's a good idea to expose this
  instead of just figuring out a way to parse the object identity.
 
  That's the first thing I tried.  But it's not pretty: you have to
  extract schema names by splitting at a period (and what if a schema name
  contains a period?),
 
 Please tell me that the literals are escaped if necessary.  If so,
 this is pretty easy.  quote_literal() is not a hard transformation to
 reverse, and splitting on a unquoted period is not hard...

  split out on ON for certain object types,
 
 ...nor is splitting on any other fixed text string, such as " ON ".

I'm not following here. Maybe just because I'm misunderstanding your
position.

The patch imo consists out of the following parts:
1) Addition of dependency information to the dropped object list
2) Actual get_object_address() handling for default values - the current
   behaviour looks pretty borked to me.
3) The reverse of getObjectTypeDescription()
4) getObjectIdentityParts() - a slightly more detailed version of
   getObjectIdentity() that requires less string parsing
5) drop even trigger support for a few types.

Are you saying you want to add a function to do 4) via parsing inside
postgres or are you suggesting to do that in every user of this
facility?

If the former, why would it be preferable to do string parsing if we
have access to the source data? That part of the patch looks trivial to
me?

If the latter, I don't see the advantage either - this is complicated
enough, why should different users repeat the work?


Am I misunderstanding you here?


Having reread the patch just now I basically see two things to
criticize:
a) why isn't this accessible at SQL level? That seems easy to address.
b) Arguably some of this could well be done in separate commits.

  It's just not sane to try to parse such text strings.
 
 But this is a pretty ridiculous argument.  We have an existing parser
 that does it just fine

Which happens to be the part of postgres that's copied most often. So
it's certainly not something appearing to be trivial.

, and a special-purpose parser that does just
 that (and not all of the other stuff that the main parser does) would
 be a great deal simpler.

That parser also happens to be far from trivial (if we're talking about
parseNameAndArgTypes(), which just solves one of the cases).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-15 Thread Alvaro Herrera
Andres Freund wrote:

 Having reread the patch just now I basically see two things to
 criticize:
 a) why isn't this accessible at SQL level? That seems easy to address.
 b) Arguably some of this could well be done in separate commits.

Fair comments.  I will split it up.

FWIW, I spent some time today fighting with this stuff but the
OBJECT_POLICY stuff has been causing me some trouble.  I'm not sure that
what's there currently in get_object_address is completely sane -- for
one thing I'm unsure that tables are the only object kind in the system
that are subject to policies.  In that case, we have a shortage of
abstraction here, it seems, which will need some additional fixes.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Andrew Dunstan


On 10/15/2014 07:38 AM, Pavel Stehule wrote:



2014-10-13 17:22 GMT+02:00 Andrew Dunstan and...@dunslane.net 
mailto:and...@dunslane.net:



On 10/13/2014 09:37 AM, Andrew Dunstan wrote:


On 09/26/2014 04:54 PM, Andrew Dunstan wrote:


Here is a patch for the generator and aggregate functions
for jsonb that we didn't manage to get done in time for
9.4. They are all equivalents of the similarly names json
functions. Included are

to_jsonb
jsonb_build_object
jsonb_build_array
jsonb_object
jsonb_agg
jsonb_object_agg


Still to come: documentation.

Adding to the next commitfest.



Revised patch to fix compiler warnings.


And again, initializing an incompletely initialized variable, as
found by Pavel Stehule.


I checked the code, and I have only two small objections - the name 
jsonb_object_two_arg is not good - maybe json_object_keys_values?


It's consistent with the existing json_object_two_arg. In all cases I 
think I kept the names the same except for changing json to jsonb. 
Note that these _two_arg functions are not visible at the SQL level - 
they are only visible in the C code.


I'm happy to be guided by others in changing or keeping these names.



Next: there are no tests for to_jsonb function.




Oh, my bad. I'll add some.

Thank for the review.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-15 Thread Simon Riggs
On 15 October 2014 20:41, Claudio Freire klaussfre...@gmail.com wrote:
 On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 October 2014 16:45, Rod Taylor rod.tay...@gmail.com wrote:
 Redaction prevents accidental information loss only, forcing any loss
 that occurs to be explicit. It ensures that loss of information can be
 tied clearly back to an individual, like an ink packet that stains the
 fingers of a thief.

 That is not true.

 It can only be tied to a session. That's very far from an individual
 in court terms, if you ask a lawyer.

 You need a helluva lot more to tie that to an individual.

So you're familiar then with this process? So you know that an auditor
would trigger an investigation, resulting in deeper surveillance and
gathering of evidence that ends with various remedial actions, such as
court. How would that process start then, if not this way?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Column Redaction

2014-10-15 Thread Claudio Freire
On Wed, Oct 15, 2014 at 4:59 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 20:41, Claudio Freire klaussfre...@gmail.com wrote:
 On Sat, Oct 11, 2014 at 4:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 October 2014 16:45, Rod Taylor rod.tay...@gmail.com wrote:
 Redaction prevents accidental information loss only, forcing any loss
 that occurs to be explicit. It ensures that loss of information can be
 tied clearly back to an individual, like an ink packet that stains the
 fingers of a thief.

 That is not true.

 It can only be tied to a session. That's very far from an individual
 in court terms, if you ask a lawyer.

 You need a helluva lot more to tie that to an individual.

 So you're familiar then with this process? So you know that an auditor
 would trigger an investigation, resulting in deeper surveillance and
 gathering of evidence that ends with various remedial actions, such as
 court. How would that process start then, if not this way?


I've seen lots of such investigations fail because the evidence wasn't
strong enough to link to a particular person, but rather a computer
terminal or something like that.

Unless you also physically restrict access to such terminal to a
single person through other means (which is quite uncommon practice
except perhaps in banks), that evidence is barely circumstantial.

But you'd have to ask a lawyer in your country to be sure. I can only
speak for my own experiences in my own country which is probably not
yours nor has the same laws. Law is a complex beast.

So, you really want actual information security in addition to that
deterrent you speak of. I don't say the deterrent is bad, I only say
it's not good enough on its own.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Jeff Janes
On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge guilla...@lelarge.info
wrote:

 Hi,

 As part of our monitoring work for our customers, we stumbled upon an
 issue with our customers' servers who have a wal_keep_segments setting
 higher than 0.

 We have a monitoring script that checks the number of WAL files in the
 pg_xlog directory, according to the setting of three parameters
 (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
 We usually add a percentage to the usual formula:

 greatest(
   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
   checkpoint_segments + wal_keep_segments + 1
 )


I think the first bug is even having this formula in the documentation to
start with, and in trying to use it.

and will normally not be more than...

This may be normal for a toy system.  I think that the normal state for
any system worth monitoring is that it has had load spikes at some point in
the past.

So it is the next part of the doc, which describes how many segments it
climbs back down to upon recovering from a spike, which is the important
one.  And that doesn't mention wal_keep_segments at all, which surely
cannot be correct.

I will try to independently derive the correct formula from the code, as
you did, without looking too much at your derivation  first, and see if we
get the same answer.

Cheers,

Jeff


Re: [HACKERS] WIP: dynahash replacement for buffer table

2014-10-15 Thread Andres Freund
On 2014-10-15 13:41:25 -0400, Robert Haas wrote:
 On Wed, Oct 15, 2014 at 2:03 AM, Andres Freund and...@2ndquadrant.com wrote:
  The solution I'm thinking of is to essentially move away from hazard
  pointers and store something like a generation counter per
  backend. Which is updated less often, and in combination with other
  operations. When a backend need to clean up and sees that there's a
  straggler with a old generation it sends that backend a signal to ensure
  it sets the latest generation.
 
 It's possible that might work ... but on the timescale we're talking
 about here, that's asking the garbage collecting process to wait for
 practically geologic time.

I think it depends on what we're tying the generation increase to. We
very well could add an implicit generation increment to, say, lwlock
acquisition - which is a full barrier anyway. And I don't think we'll
normally have that high a rate of garbage collection anyway - there
should be plenty of headroom.
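
A minimal sketch of that generation-counter idea (invented names; no claim
that this is how it would actually be wired into lwlock.c):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_BACKENDS 64         /* illustration only */

    static _Atomic uint64_t global_gen;                 /* bumped on retire */
    static _Atomic uint64_t backend_gen[MAX_BACKENDS];  /* last gen each saw */

    /* Called from places that are full barriers anyway (e.g. lock acquire). */
    static inline void
    note_generation(int my_backend_id)
    {
        uint64_t    g = atomic_load_explicit(&global_gen, memory_order_relaxed);

        atomic_store_explicit(&backend_gen[my_backend_id], g,
                              memory_order_release);
    }

    /* Garbage retired now gets tagged with a fresh generation. */
    static uint64_t
    new_retirement_generation(void)
    {
        return atomic_fetch_add_explicit(&global_gen, 1,
                                         memory_order_acq_rel) + 1;
    }

    /*
     * Reclaimer: garbage tagged with retired_gen may be freed once every
     * backend has published a generation >= retired_gen; a straggler would
     * be signalled to call note_generation(), as described above.
     */
    static bool
    safe_to_free(uint64_t retired_gen)
    {
        for (int i = 0; i < MAX_BACKENDS; i++)
        {
            if (atomic_load_explicit(&backend_gen[i],
                                     memory_order_acquire) < retired_gen)
                return false;
        }
        return true;
    }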

 Back when I first wrote this code, I spent a fair amount of time
 looking at existing work in the area of lock-free hash tables.
 Essentially all of the work I was able to find on this topic assumes a
 threaded model - or more precisely, it assumes that shared memory
 allocation is cheap and easy and you'll have no problem getting as
 much of it as you need whenever you need it.

FWIW, I think many of the solutions that are actually used in practice
don't rely on that heavily. I know at least some that require memory to
be reserved beforehand, in special contexts.

 This assumption often
 isn't even spelled out explicitly: it's just assumed that that's the
 programming environment you're working in.  Finding highly parallel
 algorithms that don't rely on memory allocation as a primitive is
 hard.  Hazard pointers are one of the few techniques I found that
 seems like it can work in our architecture.  I'm quite reluctant to
 leave aside techniques that have been proven to work well in favor of
 inventing something entirely novel to PostgreSQL.

I don't think something like generation numbers is really that new and
unproven - it's essentially a more trivial version of RCU, which is used
quite heavily, and under different names.

That said, I'm far from convinced that they are the solution - they just
were the first thing that came to my mind thinking about the problem.

 That having been said, there is some literature on generation numbers,
 and I think something like that could be made to work.  It does have
 some significant disadvantages, though.  One is that a single process
 which fails to update its generation number prevents all reclamation,
 system-wide.In my algorithm, a process whose execution is
 suspended while a hazard pointer is set prevents recycling of only one
 of many garbage lists.  A process searching for a reusable element can
 mostly likely find some other garbage list to reclaim instead.

Hm. Couldn't a similar scheme with several lists be used with generation
numbers?

 Also, a generation number implies that we update the value
 periodically, rather than before and after each critical section.

Hm. Don't think it necessarily has to do that.

 Anything that forces garbage collection to be postponed longer than
 absolutely necessary seems to me likely to be a loser.  It might be a
 little faster as long as we have free elements to allocate, but as
 soon as we're out and have to wait longer than we otherwise would for
 garbage collection, and all system activity halts as a result, even
 for a few milliseconds, it's going to be a whole lot slower.  Or at
 least, I think.

I think it really depends on the user of the facility. If the generation
were e.g. also tied to lwlocks I couldn't see bufmgr.c outrunning that
realistically.

 That having been said, I don't know what to do about the fact that the
 fence is too expensive.

I'm far from sure that it's the problem at hand here - the reason I'm
wondering about the barriers is primarily that the buffer mapping hash
table is one of the top profile entries in in all concurrent
workloads. The top one often even, after removing some locking
bottleneck. Nearly all of the time is spent there due to cache
misses. While I think we can get the table smaller and more efficient
for the same NBuffers value, realistically we need to cope with much
bigger NBuffers values.

Since cache misses are a problem that we're going to have to deal with,
restricting the processor's tool for efficiently dealing with that (out
of order execution, SMT), doesn't seem like a wise choice.

 I don't know that we've really established
 that that's the true root of the problem rather than some other
 pedestrian optimization failure.

Absolutely.

 But the existing code is using an
 atomic operation to acquire a spinlock, then releasing it, walking the
 bucket chain, and then using another atomic operation to acquire a
 spinlock and then releasing it again.

Well, we don't do that for lookups right now. 

Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Guillaume Lelarge
2014-10-15 22:11 GMT+02:00 Jeff Janes jeff.ja...@gmail.com:

 On Fri, Aug 8, 2014 at 12:08 AM, Guillaume Lelarge guilla...@lelarge.info
  wrote:

 Hi,

 As part of our monitoring work for our customers, we stumbled upon an
 issue with our customers' servers who have a wal_keep_segments setting
 higher than 0.

 We have a monitoring script that checks the number of WAL files in the
 pg_xlog directory, according to the setting of three parameters
 (checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
 We usually add a percentage to the usual formula:

 greatest(
   (2 + checkpoint_completion_target) * checkpoint_segments + 1,
   checkpoint_segments + wal_keep_segments + 1
 )


 I think the first bug is even having this formula in the documentation to
 start with, and in trying to use it.


I agree. But we have customers asking how to compute the right size for
their WAL file system partitions. Right size is usually a euphemism for
smallest size, and they usually tend to get it wrong, leading to huge
issues. And I'm not even speaking of monitoring, and alerting.

A way to avoid this issue is probably to erase the formula from the
documentation, and find a new way to explain them how to size their
partitions for WALs.

Monitoring is another matter, and I don't really think a monitoring
solution should count the WAL files. What actually really matters is the
database availability, and that is covered with having enough disk space in
the WALs partition.

and will normally not be more than...

 This may be normal for a toy system.  I think that the normal state for
 any system worth monitoring is that it has had load spikes at some point in
 the past.


Agreed.


 So it is the next part of the doc, which describes how many segments it
 climbs back down to upon recovering from a spike, which is the important
 one.  And that doesn't mention wal_keep_segments at all, which surely
 cannot be correct.


Agreed too.


 I will try to independently derive the correct formula from the code, as
 you did, without looking too much at your derivation  first, and see if we
 get the same answer.


Thanks. I look forward to reading what you find.

What seems clear to me right now is that no one has a sane explanation of
the formula. Though yours definitely made sense, it didn't seem to be what
the code does.


-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Andrew Dunstan


On 10/15/2014 03:54 PM, I wrote:


On 10/15/2014 07:38 AM, Pavel Stehule wrote:




I checked the code, and I have only two small objections - the name 
jsonb_object_two_arg is not good - maybe json_object_keys_values?


It's consistent with the existing json_object_two_arg. In all cases I 
think I kept the names the same except for changing json to jsonb. 
Note that these _two_arg functions are not visible at the SQL level - 
they are only visible in the C code.


I'm happy to be guided by others in changing or keeping these names.



If we really want to change the name of json_object_two_arg, it would 
probably be best to change it NOW in 9.4 before it gets out into a 
production release at all.


Thoughts?

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Maximum number of WAL files in the pg_xlog directory

2014-10-15 Thread Josh Berkus
On 10/15/2014 01:25 PM, Guillaume Lelarge wrote:
 Monitoring is another matter, and I don't really think a monitoring
 solution should count the WAL files. What actually really matters is the
 database availability, and that is covered with having enough disk space in
 the WALs partition.

If we don't count the WAL files, though, that eliminates the best way of
detecting when archiving is failing.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Alvaro Herrera
Andrew Dunstan wrote:

 If we really want to change the name of json_object_two_arg, it
 would probably be best to change it NOW in 9.4 before it gets out
 into a production release at all.

Doesn't it require initdb?  If so, I think it's too late now.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb generator functions

2014-10-15 Thread Andrew Dunstan


On 10/15/2014 05:47 PM, Alvaro Herrera wrote:

Andrew Dunstan wrote:


If we really want to change the name of json_object_two_arg, it
would probably be best to change it NOW in 9.4 before it gets out
into a production release at all.

Doesn't it require initdb?  If so, I think it's too late now.



Yeah, you're right, it would.

OK, forget that.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replicating DROP commands across servers

2014-10-15 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Andres Freund wrote:
 
  Having reread the patch just now I basically see two things to
  criticize:
  a) why isn't this accessible at SQL level? That seems easy to address.
  b) Arguably some of this could well be done in separate commits.
 
 Fair comments.  I will split it up.

Here's a split version.  The last part is still missing some polish --
in particular handling for OBJECT_POLICY, and the SQL interface which
would also allow us to get something in the regression tests.


Note: in this patch series you can find the ObjectTypeMap thing that you
thought was obsolete in the DDL deparse patch ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services
From 92816868c1c717519a76a83e65cd0b48e3726fbf Mon Sep 17 00:00:00 2001
From: Alvaro Herrera alvhe...@alvh.no-ip.org
Date: Fri, 4 Apr 2014 16:05:48 -0300
Subject: [PATCH 1/3] add normal/original flags to
 pg_event_trigger_dropped_objects

---
 doc/src/sgml/func.sgml  | 13 ++
 src/backend/catalog/dependency.c| 21 ++-
 src/backend/commands/event_trigger.c| 16 +---
 src/include/catalog/pg_proc.h   |  2 +-
 src/include/commands/event_trigger.h|  3 ++-
 src/test/regress/expected/event_trigger.out | 40 +
 src/test/regress/sql/event_trigger.sql  | 30 ++
 7 files changed, 114 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7e5bcd9..45f3efa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17583,6 +17583,19 @@ FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();
         <entry>Object sub-id (e.g. attribute number for columns)</entry>
        </row>
        <row>
+        <entry><literal>original</literal></entry>
+        <entry><type>bool</type></entry>
+        <entry>Flag used to identify the root object of the deletion</entry>
+       </row>
+       <row>
+        <entry><literal>normal</literal></entry>
+        <entry><type>bool</type></entry>
+        <entry>
+         Flag indicating that there's a normal dependency relationship
+         in the dependency graph leading to this object
+        </entry>
+       </row>
+       <row>
         <entry><literal>object_type</literal></entry>
         <entry><type>text</type></entry>
         <entry>Type of the object</entry>
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 256486c..6485e3d 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -205,16 +205,25 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
 	/*
 	 * Keep track of objects for event triggers, if necessary.
 	 */
-	if (trackDroppedObjectsNeeded())
+	if (trackDroppedObjectsNeeded()  !(flags  PERFORM_DELETION_INTERNAL))
 	{
 		for (i = 0; i  targetObjects-numrefs; i++)
 		{
-			ObjectAddress *thisobj = targetObjects-refs + i;
-
-			if ((!(flags  PERFORM_DELETION_INTERNAL)) 
-EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
+			const ObjectAddress *thisobj = targetObjects-refs[i];
+			const ObjectAddressExtra *extra = targetObjects-extras[i];
+			bool	original = false;
+			bool	normal = false;
+
+			if (extra-flags  DEPFLAG_ORIGINAL)
+original = true;
+			if (extra-flags  DEPFLAG_NORMAL)
+normal = true;
+			if (extra-flags  DEPFLAG_REVERSE)
+normal = true;
+
+			if (EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
 			{
-EventTriggerSQLDropAddObject(thisobj);
+EventTriggerSQLDropAddObject(thisobj, original, normal);
 			}
 		}
 	}
diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 1b8c94b..0bab971 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -112,6 +112,8 @@ typedef struct SQLDropObject
 	const char *objname;
 	const char *objidentity;
 	const char *objecttype;
+	bool		original;
+	bool		normal;
 	slist_node	next;
 } SQLDropObject;
 
@@ -1105,7 +1107,7 @@ trackDroppedObjectsNeeded(void)
  * Register one object as being dropped by the current command.
  */
 void
-EventTriggerSQLDropAddObject(ObjectAddress *object)
+EventTriggerSQLDropAddObject(const ObjectAddress *object, bool original, bool normal)
 {
 	SQLDropObject *obj;
 	MemoryContext oldcxt;
@@ -1124,6 +1126,8 @@ EventTriggerSQLDropAddObject(ObjectAddress *object)
 
 	obj = palloc0(sizeof(SQLDropObject));
 	obj-address = *object;
+	obj-original = original;
+	obj-normal = normal;
 
 	/*
 	 * Obtain schema names from the object's catalog tuple, if one exists;
@@ -1251,8 +1255,8 @@ pg_event_trigger_dropped_objects(PG_FUNCTION_ARGS)
 	{
 		SQLDropObject *obj;
 		int			i = 0;
-		Datum		values[7];
-		bool		nulls[7];
+		Datum		values[9];
+		bool		nulls[9];
 
 		obj = slist_container(SQLDropObject, next, iter.cur);
 
@@ -1268,6 +1272,12 @@ pg_event_trigger_dropped_objects(PG_FUNCTION_ARGS)
 		/* objsubid */
 		values[i++] = 

Re: [HACKERS] Column Redaction

2014-10-15 Thread Simon Riggs
On 15 October 2014 21:03, Claudio Freire klaussfre...@gmail.com wrote:

 So you're familiar then with this process? So you know that an auditor
 would trigger an investigation, resulting in deeper surveillance and
 gathering of evidence that ends with various remedial actions, such as
 court. How would that process start then, if not this way?

 I've seen lots of such investigations fail because the evidence wasn't
 strong enough to link to a particular person, but rather a computer
 terminal or something like that.

So your solution to the evidence problem is to do nothing? Or do you have
a better suggestion?

Nothing is certain, apart from doing nothing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #10823: Better REINDEX syntax.

2014-10-15 Thread Stephen Frost
* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
 Stephen Frost wrote:
  * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
   We lost this patch for the October commitfest, didn't we?
  
  I'm guessing you missed that a new version just got submitted..?
 
 Which one, reindex schema?  Isn't that a completely different patch?

Err, sorry, wasn't looking close enough, evidently. :/

  I'd be fine with today's being added to the october commitfest..
  
  Of course, there's a whole independent discussion to be had about how
  there wasn't any break between last commitfest and this one, but that
  probably deserves its own thread.
 
 It's not the first that that happens, and honestly I don't see all that
 much cause for concern.  Heikki did move pending patches to the current
 one, and closed a lot of inactive ones as 'returned with feedback'.

Inactive due to lack of review is the concern, but there is also a
concern that it's intended as a way to ensure committers have time to
work on their own patches instead of just working on patches submitted
through the commitfest process.  Now, I think we all end up trying to
balance making progress on our own patches while also providing help to
the commitfest, but that's the situation we were in constantly before
the commitfest process was put in place because it didn't scale very
well.

If we're always in 'commitfest' mode then we might as well eliminate the
notion of timing them.

 Attentive patch authors should have submitted new versions ... if they
 don't, then someone else with an interest in the patch should do so.
 If no one update the patches, what do we want them for?

As for this, sure, if there's a review and no response then it's fair to
mark the patch as returned with feedback.  The issue is both when no
patch gets a review and when the commitfest never ends.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Column Redaction

2014-10-15 Thread Claudio Freire
On Wed, Oct 15, 2014 at 8:59 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 October 2014 21:03, Claudio Freire klaussfre...@gmail.com wrote:

 So you're familiar then with this process? So you know that an auditor
 would trigger an investigation, resulting in deeper surveillance and
 gathering of evidence that ends with various remedial actions, such as
 court. How would that process start then, if not this way?

 I've seen lots of such investigations fail because the evidence wasn't
 strong enough to link to a particular person, but rather a computer
 terminal or something like that.

 So your solution to the evidence problem is to do nothing? Or you have
 a better suggestion?

 Nothing is certain, apart from doing nothing.

Is solving the evidence problem in scope for the PostgreSQL project?

The solution is to not require evidence in order to be protected from
data theft.

Having evidence is nice, you can punish effective attacks, which is a
deterrent to any attacker as you pointed out, and may even include
financial compensation. It requires physical security as well as
software security, and I'm not qualified to solve that problem without
help from a lawyer (but I do know you need help from a lawyer to make
sure the evidence you gather is usable).

Not having usable evidence, however, could fail to deter knowledgeable
attackers (remember, in this setting, it would be an inside job, so it
would be a very knowledgeable attacker).

But in any case, if the deterrence isn't enough, and you get attacked,
anything involving redaction as fleshed out in the OP is good for
nothing. The damage has been done already. The feature doesn't
meaningfully slow down extraction of data, so anything you do can only
punish the attacker, not prevent further data theft or damaged
reputation/business.

Something that requires superuser privilege (or specially granted
privilege) in order to gain access to the unredacted value, on the
other hand, would considerably slow down the attacker. From my
proposal, only the second form (unnormalized redacted tuples) would
provide any meaningful data security in this sense, but even in the
other, less limiting form, it would still prevent unauthorized users
from extracting the value: you can no longer do a binary search for the
unredacted data; only a full brute-force search would work. That's
because the full value id (that I called "prefix id", sorry, a leftover
from an earlier draft) doesn't relate to the unredacted value, so
sorting comparisons (<, <=, >, >=) don't provide usable information about
the value space.
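
To make the attack concrete, with a comparable-but-redacted column the
probing looks roughly like this (table and column names are made up):

-- Hypothetical example: card_number displays redacted, but ordinary
-- comparisons still work against the real value.
SELECT count(*) FROM payments
 WHERE customer_id = 42 AND card_number >= 5000000000000000;
-- 1 row means the real value is in the upper half of the range; halve the
-- range and repeat.  Roughly 53 such probes pin down a 16-digit number
-- exactly, which is what the alternative form is meant to prevent.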

So, if there is a chance to implement redaction in a way that truly
protects redacted data, even if it costs a bit of performance
sometimes, is avoiding the performance hit worth the risk?

I guess the potential users of such a feature are the only ones
qualified to answer, and the answer has great weight on how the
feature could be implemented.

Well, and of course, the quality of the implementation. If my proposal
has weaknesses I did not realize yet, it may be worthless. But that's
true of all proposals that aim for any meaningful level of security:
it's worth a lengthy look.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2014-10-15 Thread Jeff Davis
On Fri, 2014-08-29 at 10:12 -0400, Tom Lane wrote:
 What about removing the callback per se and just keeping the argument,
 as it were.  That is, a MemoryContext has a field of type size_t *
 that is normally NULL, but if set to non-NULL, then we increment the
 pointed-to value for pallocs and decrement for pfrees.

I like that idea; patch attached. Two differences:
  * At block level, not chunk level.
  * I use a uint64, because a size_t is only guaranteed to hold one
allocation, and we need to sum up many allocations.

When unused, it does still appear to have a little overhead in Robert's
test on his POWER machine. It seems to be between a 0.5% and 1.0% regression.
There's not much extra code on that path, just a branch, pointer
dereference, and an addition; so I don't think it will get much cheaper
than it is.
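
For reference, the extra work on the block allocation path amounts to
roughly this (a simplified sketch, not the literal patch code):

#include <stddef.h>
#include <stdint.h>

/* Simplified sketch of the tracking hook on the block-allocation path.
 * track_mem is inherited from the parent context; NULL means "no tracking". */
static inline void
track_block(uint64_t *track_mem, size_t blksize)
{
    if (track_mem != NULL)        /* the branch */
        *track_mem += blksize;    /* pointer dereference + addition */
}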

This regression seems harder to reproduce on my workstation, or perhaps
it's just noisier.

 One thought in either case is that we don't have to touch the API for
 MemoryContextCreate: rather, the feature can be enabled in a separate
 call after making the context.

That seems fairly awkward to me because the pointer needs to be
inherited from the parent context when not specified. I left the extra
API call in.

The inheritance is awkward anyway, though. If you create a tracked
context as a child of an already-tracked context, allocations in the
newer one won't count against the original. I don't see a way around
that without introducing even more performance problems.

If this patch is unacceptable, my only remaining idea is to track the
memory only for the current context, with no inheritance (thereby
also eliminating the branch). That will require HashAgg to traverse all
the child contexts to check whether the memory limit has been exceeded.
As Tomas pointed out, that could be a lot of work in the case of
array_agg with many groups.
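
(For comparison, that traversal would be a recursive walk over
firstchild/nextchild, roughly as sketched below; cheap for a handful of
contexts, but not for thousands of per-group contexts.  The self_allocated
field here is hypothetical.)

/* Rough sketch only: sum a context's memory together with all of its
 * children, assuming a hypothetical per-context "self_allocated" counter
 * that is maintained without any inheritance. */
static uint64
MemoryContextTreeSize(MemoryContext context)
{
    uint64        total = context->self_allocated;    /* hypothetical field */
    MemoryContext child;

    for (child = context->firstchild; child != NULL; child = child->nextchild)
        total += MemoryContextTreeSize(child);

    return total;
}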

Regards,
Jeff Davis

*** a/src/backend/utils/mmgr/aset.c
--- b/src/backend/utils/mmgr/aset.c
***
*** 438,451  AllocSetContextCreate(MemoryContext parent,
  	  Size initBlockSize,
  	  Size maxBlockSize)
  {
! 	AllocSet	context;
  
  	/* Do the type-independent part of context creation */
! 	context = (AllocSet) MemoryContextCreate(T_AllocSetContext,
! 			 sizeof(AllocSetContext),
! 			 AllocSetMethods,
! 			 parent,
! 			 name);
  
  	/*
  	 * Make sure alloc parameters are reasonable, and save them.
--- 438,482 
  	  Size initBlockSize,
  	  Size maxBlockSize)
  {
! 	/* inherit 'track_mem' from parent context, if available */
! 	uint64 *track_mem = (parent == NULL) ? NULL : parent-track_mem;
! 
! 	return AllocSetContextCreateTracked(parent, name, minContextSize,
! 		initBlockSize, maxBlockSize,
! 		track_mem);
! }
! 
! /*
!  * AllocSetContextCreateTracked
!  *		Create a new AllocSet context, tracking total memory usage.
!  *
!  * parent: parent context, or NULL if top-level context
!  * name: name of context (for debugging --- string will be copied)
!  * minContextSize: minimum context size
!  * initBlockSize: initial allocation block size
!  * maxBlockSize: maximum allocation block size
!  * track_mem: caller-supplied variable to use for memory tracking
!  */
! MemoryContext
! AllocSetContextCreateTracked(MemoryContext parent,
! 			 const char *name,
! 			 Size minContextSize,
! 			 Size initBlockSize,
! 			 Size maxBlockSize,
! 			 uint64 *track_mem)
! {
! 	AllocSet			set;
! 	MemoryContext		context;
  
  	/* Do the type-independent part of context creation */
! 	context = MemoryContextCreate(T_AllocSetContext,
!   sizeof(AllocSetContext),
!   AllocSetMethods,
!   parent,
!   name,
!   track_mem);
! 
! 	set = (AllocSet) context;
  
  	/*
  	 * Make sure alloc parameters are reasonable, and save them.
***
*** 459,467  AllocSetContextCreate(MemoryContext parent,
  	if (maxBlockSize  initBlockSize)
  		maxBlockSize = initBlockSize;
  	Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */
! 	context-initBlockSize = initBlockSize;
! 	context-maxBlockSize = maxBlockSize;
! 	context-nextBlockSize = initBlockSize;
  
  	/*
  	 * Compute the allocation chunk size limit for this context.  It can't be
--- 490,498 
  	if (maxBlockSize  initBlockSize)
  		maxBlockSize = initBlockSize;
  	Assert(AllocHugeSizeIsValid(maxBlockSize)); /* must be safe to double */
! 	set-initBlockSize = initBlockSize;
! 	set-maxBlockSize = maxBlockSize;
! 	set-nextBlockSize = initBlockSize;
  
  	/*
  	 * Compute the allocation chunk size limit for this context.  It can't be
***
*** 477,486  AllocSetContextCreate(MemoryContext parent,
  	 * and actually-allocated sizes of any chunk must be on the same side of
  	 * the limit, else we get confused about whether the chunk is big.
  	 */
! 	context-allocChunkLimit = ALLOC_CHUNK_LIMIT;
! 	while ((Size) (context-allocChunkLimit + ALLOC_CHUNKHDRSZ) 
  		 

Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-10-15 Thread Simon Riggs
On 27 September 2014 03:55, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Sep 26, 2014 at 11:47 AM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:

 Gavin Flower wrote:

  Curious: would it be both feasible and useful to have multiple
  workers process a 'large' table, without complicating things too
  much?  The could each start at a different position in the file.

 Feasible: no.  Useful: maybe, we don't really know.  (You could just as
 well have a worker at double the speed, i.e. double vacuum_cost_limit).


 Vacuum_cost_delay is already 0 by default.  So unless you changed that,
 vacuum_cost_limit does not take effect under vacuumdb.

 It is pretty easy for vacuum to be CPU limited, and even easier for analyze
 to be CPU limited (It does a lot of sorting).  I think analyzing is the main
 use case for this patch, to shorten the pg_upgrade window.  At least, that
 is how I anticipate using it.

I've been trying to review this thread with the thought "what does
this give me?". I am keen to encourage contributions and also keen to
extend our feature set, but I do not wish to complicate our code base.
Dilip's developments do seem to be good quality; what I question is
whether we want this feature.

This patch seems to allow me to run multiple VACUUMs at once. But I
can already do this, with autovacuum.

Is there anything this patch can do that cannot be already done with autovacuum?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improve automatic analyze messages for inheritance trees

2014-10-15 Thread Simon Riggs
On 6 October 2014 11:07, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
 I noticed that analyze messages shown by autovacuum don't discriminate
 between non-inherited cases and inherited cases, as shown in the below
 example:

 LOG:  automatic analyze of table postgres.public.pt system usage: CPU
 0.00s/0.01u sec elapsed 0.06 sec
 LOG:  automatic analyze of table postgres.public.pt system usage: CPU
 0.00s/0.02u sec elapsed 0.11 sec

 (The first one is for table postgres.public.pt and the second one is
 for table inheritance tree postgres.public.pt.)

 So, I'd like to propose improving the messages for inherited cases, in
 order to easily distinguish such cases from non-inherited cases.  Please
 find attached a patch.  I'll add this to the upcoming CF.

Thanks for the suggestion. It seems like a useful addition.

Existing log analysis tools may wish to see the automatic analyze of table
prefix on each row.
So it would be good to keep
   automatic analyze of table \%s.%s.%s\

Can we add some words after this to indicate inheritance? (I have no
suggestions at present)
e.g.
  automatic analyze of table \%s.%s.%s\ (new words go here)

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-15 Thread Brightwell, Adam
All,

The attached patch for review implements a directory permission system that
allows non-superusers to be given read/write access to specific directories
for COPY TO/FROM and the generic file access functions.  This is
not a complete solution, as it does not currently contain documentation or
regression tests.  It is my hope to get some feedback, as I am sure
there are some aspects of it that need to be reworked, so I thought I'd
put it out there for comments/review.

The approach taken is to create directory aliases that have a unique name
and path, as well as an associated ACL list.  A superuser can create a new
alias to any directory on the system and then provide READ or WRITE
permissions to any non-superuser.  When a non-superuser then attempts to
execute a COPY TO/FROM or any one of the generic file access functions, a
permission check is performed against the aliases for the user and target
directory.  Superusers are allowed to bypass all of these checks.  All
alias paths must be absolute in order to avoid potential risks.
However, for the generic file access functions, superusers are still allowed
to execute them with a relative path, whereas non-superusers are
required to provide an absolute path.

- Implementation Details -

System Catalog:
pg_diralias
 - dirname - the name of the directory alias
 - dirpath - the directory path - must be absolute
 - diracl - the ACL for the directory

Syntax:
CREATE DIRALIAS name AS 'path'
ALTER DIRALIAS name AS 'path'
ALTER DIRALIAS name RENAME TO new_name
DROP DIRALIAS name

This is probably the area where I would most appreciate your thoughts and
recommendations. To GRANT permissions on a directory alias, I had to create
a special variant of GRANT, since READ and WRITE are not reserved keywords
and cause grammar issues.  Therefore, I chose to start with the following
syntax:

GRANT ON DIRALIAS name permissions TO roles

where permissions is either READ, WRITE or ALL.
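
For example, intended usage of that syntax would look roughly like this
(the names are only illustrative):

CREATE DIRALIAS import_area AS '/srv/pgsql/import';
GRANT ON DIRALIAS import_area READ TO etl_user;

-- etl_user, despite not being a superuser, could then run:
COPY staging_orders FROM '/srv/pgsql/import/orders.csv' WITH (FORMAT csv);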

Any comments, suggestions or feedback would be greatly appreciated.

Thanks,
Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index b257b02..8cdc5cb 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -41,7 +41,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
 	pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \
 	pg_foreign_table.h pg_rowsecurity.h \
 	pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \
-	toasting.h indexing.h \
+	pg_diralias.h toasting.h indexing.h \
 )
 
 # location of Catalog.pm
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index d30612c..3717bf5 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -30,6 +30,7 @@
 #include catalog/pg_collation.h
 #include catalog/pg_conversion.h
 #include catalog/pg_database.h
+#include catalog/pg_diralias.h
 #include catalog/pg_default_acl.h
 #include catalog/pg_event_trigger.h
 #include catalog/pg_extension.h
@@ -48,6 +49,7 @@
 #include catalog/pg_ts_config.h
 #include catalog/pg_ts_dict.h
 #include commands/dbcommands.h
+#include commands/diralias.h
 #include commands/proclang.h
 #include commands/tablespace.h
 #include foreign/foreign.h
@@ -3183,6 +3185,190 @@ ExecGrant_Type(InternalGrant *istmt)
 	heap_close(relation, RowExclusiveLock);
 }
 
+/*
+ * ExecuteGrantDirAliasStmt
+ *   handles the execution of the GRANT/REVOKE ON DIRALIAS command.
+ *
+ * stmt - the GrantDirAliasStmt that describes the directory aliases and
+ *permissions to be granted/revoked.
+ */
+void
+ExecuteGrantDirAliasStmt(GrantDirAliasStmt *stmt)
+{
+	Relation		pg_diralias_rel;
+	Oidgrantor;
+	List		   *grantee_ids = NIL;
+	AclMode			permissions;
+	ListCell	   *item;
+
+	/* Must be superuser to grant directory alias permissions */
+	if (!superuser())
+		ereport(ERROR,
+(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg(must be superuser to grant directory alias permissions)));
+
+	/*
+	 * Grantor is optional.  If it is not provided then set it to the current
+	 * user.
+	 */
+	if (stmt-grantor)
+		grantor = get_role_oid(stmt-grantor, false);
+	else
+		grantor = GetUserId();
+
+	/* Convert grantee names to oids */
+	foreach(item, stmt-grantees)
+	{
+		PrivGrantee *grantee = (PrivGrantee *) lfirst(item);
+
+		if (grantee-rolname == NULL)
+			grantee_ids = lappend_oid(grantee_ids, ACL_ID_PUBLIC);
+		else
+		{
+			Oid roleid = get_role_oid(grantee-rolname, false);
+			grantee_ids = lappend_oid(grantee_ids, roleid);
+		}
+	}
+
+	permissions = ACL_NO_RIGHTS;
+
+	/* If ALL was provided then set permissions to ACL_ALL_RIGHTS_DIRALIAS */
+	if (stmt-permissions == NIL)
+		permissions = ACL_ALL_RIGHTS_DIRALIAS;
+	else
+	{
+		/* Condense all permissions */
+		foreach(item, stmt-permissions)
+		{
+			AccessPriv *priv = (AccessPriv *) 

Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-15 Thread Tom Lane
Brightwell, Adam adam.brightw...@crunchydatasolutions.com writes:
 The attached patch for review implements a directory permission system that
 allows for providing a directory read/write capability to directories for
 COPY TO/FROM and Generic File Access Functions to non-superusers.

TBH, this sounds like it's adding a lot of mechanism and *significant*
risk of unforeseen security issues in order to solve a problem that we
do not need to solve.  The field demand for such a feature is just about
indistinguishable from zero.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CREATE POLICY and RETURNING

2014-10-15 Thread Fujii Masao
Hi,

While I was checking the behavior of RLS, I found that the policy for SELECT
doesn't seem to be applied to RETURNING. Is this intentional? Please see
the following example.

CREATE ROLE foo LOGIN NOSUPERUSER;
CREATE TABLE hoge AS SELECT col FROM generate_series(1,10) col;
ALTER TABLE hoge ENABLE ROW LEVEL SECURITY;
GRANT SELECT, DELETE ON hoge TO foo;
CREATE POLICY hoge_select_policy ON hoge FOR SELECT TO foo USING (col < 4);
CREATE POLICY hoge_delete_policy ON hoge FOR DELETE TO foo USING (col < 8);
\c - foo
DELETE FROM hoge WHERE col = 6 RETURNING *;

The policy hoge_select_policy should prevent the user foo from seeing the row
with col = 6. But the last DELETE RETURNING returns that row.
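
Roughly, what the last statement does for foo looks like this:

DELETE FROM hoge WHERE col = 6 RETURNING *;
 col
-----
   6
(1 row)

DELETE 1

even though a plain SELECT of that row returns nothing for foo.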

One minor suggestion: what about changing the message as follows?
There are two more similar messages in policy.c, and they use the word
"row-policy" instead of "policy". For consistency, I think that
the following should also use the word "row-policy".

diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 3627539..df45385 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -551,7 +551,7 @@ CreatePolicy(CreatePolicyStmt *stmt)
if (HeapTupleIsValid(rsec_tuple))
ereport(ERROR,
(errcode(ERRCODE_DUPLICATE_OBJECT),
-errmsg(policy \%s\ for relation
\%s\ already exists,
+errmsg(row-policy \%s\ for
relation \%s\ already exists,

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Locking for Rename To new_name works differently for different objects

2014-10-15 Thread Amit Kapila
On Wed, Oct 15, 2014 at 9:34 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Oct 15, 2014 at 10:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Amit Kapila amit.kapil...@gmail.com writes:
  I have observed that for renaming some of the objects
  AccessExclusiveLock is taken on object whereas for
  other kind of objects no lock is taken on object before
  renaming the object.
 
  The usual theory for DDL updates of all types (not just rename)
  is that an explicit lock is only needed for objects whose catalog
  representation comprises more than one row.  Otherwise, the implicit
  locking involved in updating that row is sufficient to serialize
  different updates.

I am not sure this rule is followed for all DDL; as an example, I
tried to compare FUNCTION and SCHEMA.

A function has entries in pg_proc and pg_depend (schema, language,
return type, etc.), so by the above theory any update should take an
explicit lock, which I think holds good.  A schema, on the other hand,
has entries in pg_namespace and pg_depend (owner), but it doesn't take
an explicit lock during RENAME.

Yet another anomaly is COMMENT ON object: there is only
one row in pg_description, yet it still takes an explicit lock to
avoid concurrent activity, which looks right to me.

 That's an interesting point that I hadn't considered, but I'm willing
 to believe that at least some of the differences might also be
 haphazard.

Yeah, I also think so.  Is it important enough that we should spend
energy to find all such differences and fix them?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-10-15 Thread Amit Kapila
On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs si...@2ndquadrant.com wrote:


 I've been trying to review this thread with the thought what does
 this give me?. I am keen to encourage contributions and also keen to
 extend our feature set, but I do not wish to complicate our code base.
 Dilip's developments do seem to be good quality; what I question is
 whether we want this feature.

 This patch seems to allow me to run multiple VACUUMs at once. But I
 can already do this, with autovacuum.

 Is there anything this patch can do that cannot be already done with
autovacuum?

The difference lies in the fact that vacuumdb (or VACUUM) gives
the user the option to control vacuum activity in cases where
autovacuum doesn't suffice; one example is performing
vacuum via vacuumdb after pg_upgrade or some other maintenance
activity (as mentioned by Jeff upthread).  So I think in all such cases
having a parallel option can give a benefit in terms of performance, which
Dilip has already shown upthread by running some tests (with and
without the patch).
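
For example, assuming the parallel-jobs switch proposed by the patch
(spelled -j here), the post-pg_upgrade step could become something like:

vacuumdb --all --analyze-only -j 8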

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] [Segmentation fault] pg_dump binary-upgrade fail for type without element

2014-10-15 Thread Rushabh Lathia
Hi All,

pg_dump --binary-upgrade fails with a segmentation fault for a type without
elements.

Consider the following testcase:

rushabh@postgresql$ ./db/bin/psql postgres
psql (9.5devel)
Type help for help.

postgres=# drop type typ;
DROP TYPE
postgres=# create type typ as ();
CREATE TYPE
postgres=# \q
rushabh@postgresql$ ./db/bin/pg_dump postgres --binary-upgrade
pg_dump: row number 0 is out of range 0..-1
Segmentation fault (core dumped)

Stack trace:

(gdb) bt
#0  0x003a2cc375f2 in strtoull_l_internal () from /lib64/libc.so.6
#1  0x00417a08 in dumpCompositeType (fout=0x1365200,
tyinfo=0x13b1340) at pg_dump.c:9356
#2  0x004156a2 in dumpType (fout=0x1365200, tyinfo=0x13b1340) at
pg_dump.c:8449
#3  0x00414b08 in dumpDumpableObject (fout=0x1365200,
dobj=0x13b1340) at pg_dump.c:8135
#4  0x004041f8 in main (argc=3, argv=0x7fff838ff6e8) at
pg_dump.c:812

In dumpCompositeType(), the query fetches the elements of the composite type,
but when the type has no elements it returns zero rows. In
the following code block:

if (binary_upgrade)
{
Oidtyprelid = atooid(PQgetvalue(res, 0, i_typrelid));

binary_upgrade_set_type_oids_by_type_oid(fout, q,
 tyinfo-dobj.catId.oid);
binary_upgrade_set_pg_class_oids(fout, q, typrelid, false);
}

it fetches the typrelid required for binary_upgrade. But when the query
returns zero rows (for a composite type without elements), this fails.

Looking further into the code, I found that rather than fetching typrelid, we can
use the typrelid already stored in the TypeInfo structure.

Following commit added code related to binary_upgrade:

commit 28f6cab61ab8958b1a7dfb019724687d92722538
Author: Bruce Momjian br...@momjian.us
Date:   Wed Jan 6 05:18:18 2010 +

binary upgrade:

Preserve relfilenodes for views and composite types --- even though we
don't store data in, them, they do consume relfilenodes.

Bump catalog version.

PFA patch to fix the issue. I think this needs to be fixed in the back branches as
well
because it affects pg_upgrade. The fix should be backported down to 9.1, since 9.0
did not allow creating a type without elements.

postgres=# select version();

version
---
 PostgreSQL 9.0.18 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
(1 row)
postgres=# create type typ as ();
ERROR:  syntax error at or near )
LINE 1: create type typ as ();
^

Regards,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2915329..64d3856 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -9290,7 +9290,6 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 	int			i_attalign;
 	int			i_attisdropped;
 	int			i_attcollation;
-	int			i_typrelid;
 	int			i;
 	int			actual_atts;
 
@@ -9311,8 +9310,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 			pg_catalog.format_type(a.atttypid, a.atttypmod) AS atttypdefn, 
 		  a.attlen, a.attalign, a.attisdropped, 
 		  CASE WHEN a.attcollation  at.typcollation 
-		  THEN a.attcollation ELSE 0 END AS attcollation, 
-		  ct.typrelid 
+		  THEN a.attcollation ELSE 0 END AS attcollation 
 		  FROM pg_catalog.pg_type ct 
 JOIN pg_catalog.pg_attribute a ON a.attrelid = ct.typrelid 
 	LEFT JOIN pg_catalog.pg_type at ON at.oid = a.atttypid 
@@ -9330,8 +9328,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 		appendPQExpBuffer(query, SELECT a.attname, 
 			pg_catalog.format_type(a.atttypid, a.atttypmod) AS atttypdefn, 
 		  a.attlen, a.attalign, a.attisdropped, 
-		  0 AS attcollation, 
-		  ct.typrelid 
+		  0 AS attcollation 
 	 FROM pg_catalog.pg_type ct, pg_catalog.pg_attribute a 
 		  WHERE ct.oid = '%u'::pg_catalog.oid 
 		  AND a.attrelid = ct.typrelid 
@@ -9349,15 +9346,12 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
 	i_attalign = PQfnumber(res, attalign);
 	i_attisdropped = PQfnumber(res, attisdropped);
 	i_attcollation = PQfnumber(res, attcollation);
-	i_typrelid = PQfnumber(res, typrelid);
 
 	if (binary_upgrade)
 	{
-		Oid			typrelid = atooid(PQgetvalue(res, 0, i_typrelid));
-
 		binary_upgrade_set_type_oids_by_type_oid(fout, q,
  tyinfo-dobj.catId.oid);
-		binary_upgrade_set_pg_class_oids(fout, q, typrelid, false);
+		binary_upgrade_set_pg_class_oids(fout, q, tyinfo-typrelid, false);
 	}
 
 	qtypname = pg_strdup(fmtId(tyinfo-dobj.name));

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-15 Thread Craig Ringer
On 10/16/2014 12:25 PM, Fujii Masao wrote:
 Hi,
 
 While I was checking the behavior of RLS, I found that the policy for SELECT
 doesn't seem to be applied to RETURNING. Is this intentional? 

This is why I was opposed to having a SELECT policy at all. It should
be VISIBLE, INSERT, UPDATE, DELETE.

I say VISIBLE instead of READ because I don't think the rows
affected by an UPDATE or DELETE should be affected by whether or not
they have a RETURNING clause. That's IMO nonsensical and violates the
usual expectations about which clauses can have filtering effects.

So the read-filtering policy should apply to all statements. Not just
SELECT.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Segmentation fault] pg_dump binary-upgrade fail for type without element

2014-10-15 Thread Rushabh Lathia
PFA patch for the master branch.

On Thu, Oct 16, 2014 at 11:09 AM, Rushabh Lathia rushabh.lat...@gmail.com
wrote:

 Hi All,

 pg_dump --binary-upgrade fails with a segmentation fault for a type without
 elements.

 Consider the following testcase:

 rushabh@postgresql$ ./db/bin/psql postgres
 psql (9.5devel)
 Type help for help.

 postgres=# drop type typ;
 DROP TYPE
 postgres=# create type typ as ();
 CREATE TYPE
 postgres=# \q
 rushabh@postgresql$ ./db/bin/pg_dump postgres --binary-upgrade
 pg_dump: row number 0 is out of range 0..-1
 Segmentation fault (core dumped)

 Stack trace:

 (gdb) bt
 #0  0x003a2cc375f2 in strtoull_l_internal () from /lib64/libc.so.6
 #1  0x00417a08 in dumpCompositeType (fout=0x1365200,
 tyinfo=0x13b1340) at pg_dump.c:9356
 #2  0x004156a2 in dumpType (fout=0x1365200, tyinfo=0x13b1340) at
 pg_dump.c:8449
 #3  0x00414b08 in dumpDumpableObject (fout=0x1365200,
 dobj=0x13b1340) at pg_dump.c:8135
 #4  0x004041f8 in main (argc=3, argv=0x7fff838ff6e8) at
 pg_dump.c:812

 In dumpCompositeType(), the query fetches the elements of the composite type,
 but when the type has no elements it returns zero rows. In
 the following code block:

 if (binary_upgrade)
 {
 Oidtyprelid = atooid(PQgetvalue(res, 0, i_typrelid));

 binary_upgrade_set_type_oids_by_type_oid(fout, q,
  tyinfo-dobj.catId.oid);
 binary_upgrade_set_pg_class_oids(fout, q, typrelid, false);
 }

 it fetches the typrelid required for binary_upgrade. But when the
 query
 returns zero rows (for a composite type without elements), this fails.

 Looking further into the code, I found that rather than fetching typrelid, we
 can
 use the typrelid already stored in the TypeInfo structure.

 Following commit added code related to binary_upgrade:

 commit 28f6cab61ab8958b1a7dfb019724687d92722538
 Author: Bruce Momjian br...@momjian.us
 Date:   Wed Jan 6 05:18:18 2010 +

 binary upgrade:

 Preserve relfilenodes for views and composite types --- even though we
 don't store data in, them, they do consume relfilenodes.

 Bump catalog version.

 PFA patch to fix the issue. I think this needs to be fixed in the back branches as
 well
 because it affects pg_upgrade. The fix should be backported down to 9.1, since 9.0
 did not allow creating a type without elements.

 postgres=# select version();

 version

 ---
  PostgreSQL 9.0.18 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
 (1 row)
 postgres=# create type typ as ();
 ERROR:  syntax error at or near )
 LINE 1: create type typ as ();
 ^

 Regards,
 Rushabh Lathia
 www.EnterpriseDB.com




-- 
Rushabh Lathia
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c56a4cb..c528577 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -9269,7 +9269,6 @@ dumpCompositeType(Archive *fout, DumpOptions *dopt, TypeInfo *tyinfo)
 	int			i_attalign;
 	int			i_attisdropped;
 	int			i_attcollation;
-	int			i_typrelid;
 	int			i;
 	int			actual_atts;
 
@@ -9290,8 +9289,7 @@ dumpCompositeType(Archive *fout, DumpOptions *dopt, TypeInfo *tyinfo)
 			pg_catalog.format_type(a.atttypid, a.atttypmod) AS atttypdefn, 
 		  a.attlen, a.attalign, a.attisdropped, 
 		  CASE WHEN a.attcollation  at.typcollation 
-		  THEN a.attcollation ELSE 0 END AS attcollation, 
-		  ct.typrelid 
+		  THEN a.attcollation ELSE 0 END AS attcollation 
 		  FROM pg_catalog.pg_type ct 
 JOIN pg_catalog.pg_attribute a ON a.attrelid = ct.typrelid 
 	LEFT JOIN pg_catalog.pg_type at ON at.oid = a.atttypid 
@@ -9309,8 +9307,7 @@ dumpCompositeType(Archive *fout, DumpOptions *dopt, TypeInfo *tyinfo)
 		appendPQExpBuffer(query, SELECT a.attname, 
 			pg_catalog.format_type(a.atttypid, a.atttypmod) AS atttypdefn, 
 		  a.attlen, a.attalign, a.attisdropped, 
-		  0 AS attcollation, 
-		  ct.typrelid 
+		  0 AS attcollation 
 	 FROM pg_catalog.pg_type ct, pg_catalog.pg_attribute a 
 		  WHERE ct.oid = '%u'::pg_catalog.oid 
 		  AND a.attrelid = ct.typrelid 
@@ -9328,15 +9325,12 @@ dumpCompositeType(Archive *fout, DumpOptions *dopt, TypeInfo *tyinfo)
 	i_attalign = PQfnumber(res, attalign);
 	i_attisdropped = PQfnumber(res, attisdropped);
 	i_attcollation = PQfnumber(res, attcollation);
-	i_typrelid = PQfnumber(res, typrelid);
 
 	if (dopt-binary_upgrade)
 	{
-		Oid			typrelid = atooid(PQgetvalue(res, 0, i_typrelid));
-
 		binary_upgrade_set_type_oids_by_type_oid(fout, q,
  tyinfo-dobj.catId.oid);
-		binary_upgrade_set_pg_class_oids(fout, q, typrelid, false);
+		binary_upgrade_set_pg_class_oids(fout, q, tyinfo-typrelid, false);
 	}
 
 	qtypname = pg_strdup(fmtId(tyinfo-dobj.name));

-- 
Sent via 

Re: [HACKERS] Improve automatic analyze messages for inheritance trees

2014-10-15 Thread Etsuro Fujita

(2014/10/16 11:45), Simon Riggs wrote:

On 6 October 2014 11:07, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:

I noticed that analyze messages shown by autovacuum don't discriminate
between non-inherited cases and inherited cases, as shown in the below
example:

LOG:  automatic analyze of table postgres.public.pt system usage: CPU
0.00s/0.01u sec elapsed 0.06 sec
LOG:  automatic analyze of table postgres.public.pt system usage: CPU
0.00s/0.02u sec elapsed 0.11 sec

(The first one is for table postgres.public.pt and the second one is
for table inheritance tree postgres.public.pt.)

So, I'd like to propose improving the messages for inherited cases, in
order to easily distinguish such cases from non-inherited cases.  Please
find attached a patch.  I'll add this to the upcoming CF.


Thanks for the suggestion. It seems like a useful addition.

Existing log analysis may wish to see the automatic analyze of table
on each row.
So it would be good to keep
automatic analyze of table \%s.%s.%s\


Agreed.


Can we add some words after this to indicate inheritance? (I have no
suggestions at present)
e.g.
   automatic analyze of table \%s.%s.%s\ (new words go here)


How about this?

automatic analyze of table \%s.%s.%s\ as inheritance tree

Thank you for the comment.

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE POLICY and RETURNING

2014-10-15 Thread Craig Ringer
On 10/16/2014 01:44 PM, Craig Ringer wrote:
 So the read-filtering policy should apply to all statements. Not just
 SELECT.

Oh, IIRC one wrinkle in the prior discussion about this was that doing
this will prevent the implementation of policies that permit users to
update/delete rows they cannot otherwise see.

That's an argument in favour of only applying a read-filtering policy
where a RETURNING clause is present, but that introduces the "surprise!
the effects of your DELETE changed based on an unrelated clause!" issue.

Keep in mind, when considering RETURNING, that users don't always add
this clause directly. PgJDBC will tack a RETURNING clause on the end of
a statement if the user requests generated keys, for example. They will
be very surprised if the behaviour of their DML changes based on whether
or not they asked to get generated keys.
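
For instance, roughly the following (illustrative) JDBC code ends up sending
INSERT ... RETURNING under the hood, even though the application never wrote
a RETURNING clause:

// Illustrative only: java.sql.* imports and an open Connection conn assumed.
// PgJDBC appends a RETURNING clause when generated keys are requested, so
// this plain-looking INSERT would be affected as well.
PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO orders (customer_id) VALUES (?)",
        Statement.RETURN_GENERATED_KEYS);
ps.setInt(1, 42);
ps.executeUpdate();
ResultSet keys = ps.getGeneratedKeys();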

To my mind having behaviour change based on RETURNING is actively wrong,
whereas policies that permit rows to be updated/deleted but not selected
are a nice-to-have at most.

I'd really like to see some more coverage of the details of how these
policies apply to inheritance, both the read- and write- sides of DML
with RETURNING clauses, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers