Re: [HACKERS] 10.0

2016-06-21 Thread José Luis Tallón

On 06/20/2016 10:14 PM, Robert Haas wrote:

On Mon, Jun 20, 2016 at 4:00 PM, David G. Johnston
 wrote:

10.x is the desired output.

10.x is the output that some people desire.


(I have skipped ahead in the thread to add this -- please forgive my jumping in)

Since we are still (as a community) debating this, I felt the need to 
add yet another possibility ...

(/me dons some asbestos underwear)

next version:
100100 [server_version] will be  (gasp!)  10g R1 (for marketing people) 
or simply 10.1.0 for the rest of the world.

Next update would be 100101, a.k.a. 10g R1 u1

i.e. we would skip 10.0 altogether, preserving everybody's sanity 
in the process.
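
For illustration only: a client could split such a server_version_num-style
value back into its components (assuming the proposed MMmmuu layout, where
100100 maps to 10.1.0):

    -- Decompose the proposed 6-digit scheme (purely illustrative)
    SELECT 100100 / 10000       AS major,   -- 10
           (100100 / 100) % 100 AS minor,   -- 1
           100100 % 100         AS update;  -- 0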



IMV,  10g R1 uX should plainly be a different *major* release than 10g 
R2 uY

and hence require a proper "migration" (i.e. pg_upgrade)
...or at least that's what several multi-billion software companies 
have taught most everybody to believe :$




A significant number of
people, including me, would prefer to stick with the current
three-part versioning scheme, possibly with some change to the
algorithm for bumping the first digit (e.g. every 5 years like
clockwork).
That's another story...  Either 5 years (for historical reasons) or 10 years 
(sounds good)
 ... or whenever it makes sense! (such as when a multi-modal, hybrid 
row+column, vectored, MPP, super-duper-distributed, automagically 
self-scaling or the like release is ready)



Thanks,

/ J.L.





Re: [HACKERS] Protocol buffer support for Postgres

2016-04-26 Thread José Luis Tallón

On 04/26/2016 08:06 AM, 陈天舟 wrote:
I am interested in adding Protocol Buffer support for Postgres. 
Protocol Buffer occupies less space than JSON. More importantly, it 
has schema and is forward/backward compatible. All these make it a 
very good format for persistency.


Have you investigated JSONB vs. ProtoBuf space usage?
(the key being the "B" -- Postgres' own binary JSON implementation)
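
For what it's worth, a quick way to eyeball the stored footprint of the same
document as json vs. jsonb (pg_column_size is a stock function; the sample
document is made up):

    SELECT pg_column_size('{"id": 42, "name": "test"}'::json)  AS json_bytes,
           pg_column_size('{"id": 42, "name": "test"}'::jsonb) AS jsonb_bytes;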


The "per-column schema" thing sounds difficult to do without major 
changes to the core unless/until we have generalized user-defined 
metadata for objects 



Just my .02€

/ J.L.





Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-13 Thread José Luis Tallón

On 04/13/2016 04:43 PM, Craig Ringer wrote:
On 13 April 2016 at 22:11, José Luis Tallón 
<jltal...@adv-solutions.net> wrote:

[snip]

I can certainly prepare a small patch for the first commitfest of
9.7 if this sounds viable.


I'd be surprised if it was popular.


I am familiar with some cases where it would have been a lifesaver

It's hard to imagine a way to do it robustly when dealing with 
pre-lexer input, unless you're doing simple pattern matching to 
identify and replace whole statements.


INDEED. No intention whatsoever to do much more than that  O:-)


/ J.L.



Re: [HACKERS] Parser extensions (maybe for 10?)

2016-04-13 Thread José Luis Tallón

On 04/12/2016 06:45 AM, Craig Ringer wrote:
On 12 April 2016 at 12:36, Arcadiy Ivanov wrote:



Is there any interest and/or tips to allow a pluggable parser or
at least allow some syntactical pluggability by extensions?
I think this may allow some projects to move towards becoming an
extension as opposed to forking the project entirely.



FWIW, I have previously sketched a "syntax rewriter" of sorts: a simple 
filter which is applied to input before the lexer even sees it.
Quite some "syntax magic" can be achieved by allowing an extension to 
*hook* into this functionality in order to do some rewriting; if turning 
one statement into several (multi-statement output) is also allowed, 
quite serious emulation could be achieved.


I can certainly prepare a small patch for the first commitfest of 9.7 if 
this sounds viable.




Thanks,

/ J.L.



Re: [HACKERS] Default Roles

2016-04-07 Thread José Luis Tallón

On 04/07/2016 09:50 PM, Stephen Frost wrote:

Robert, José,

I've rebased this on top of master and added a few additional checks and
regression tests.


Applies and compiles cleanly, of course. Passes all 164 tests, too.
- make installcheck-world ok
- interdiff checked, nothing very surprising

*Tests:
 using "pg_abcdef"  (very unlikely to ever exist) is indeed better than 
using "pg_backup" to test system 'reservedness'


*Documentation: changes seem to make it less repetitive regarding 
"pg_signal_backend". Should reduce diff size when future system roles 
get added ;)


*Code:

Spotted the added   if (strncmp(*newval, "pg_", 3) == 0)   check
at src/backend/commands/variable.c
(plus the pre-existing one in src/bin/pg_dump/pg_dumpall.c)

I hadn't realized it could be needed there... I'm not familiar enough 
with the code just yet.
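
For reference, the visible effect of that check should be that "pg_"-prefixed
names are rejected from SQL; a hypothetical session (exact error wording may
differ):

    CREATE ROLE pg_my_role;   -- expected to fail: "pg_" names are reserved
    CREATE ROLE my_role;      -- fine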


I reckon there's no need to add a separate helper to check this at the 
moment; might be needed later, when the superuser review patches get 
merged :)



I'm planning to continue going over the patch tomorrow morning with
plans to push this before the feature freeze deadline.


Good. Thank you for the effort.

/ J.L.





[HACKERS] [CommitFest App] Feature request -- review e-mail additions

2016-03-30 Thread José Luis Tallón

Hello,

Just wanted to suggest two minor mods to the review e-mails 
auto-generated by the app:


* Prepend a [review] tag to the e-mail subject
... so that e-mails sent to -hackers will read  " [HACKERS] 
[review] "


* Auto-CC the patch author on this e-mail
I guess this should speed up reactions / make communication a bit 
more fluid.



Dunno whether it'd be appreciated if I posted a tentative patch ¿?


Thanks,

/ J.L.





Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-03-30 Thread José Luis Tallón

On 03/30/2016 06:14 PM, Robert Haas wrote:
So basically the use of the ENCRYPTED keyword means "if it does 
already seem to be the sort of MD5 blob we're expecting, turn it into 
that". 


If it does NOT already seem to be... I guess?

And we just rely on the format to distinguish between an MD5 verifier 
and an unencrypted password. Personally, I think a good start here, 
and I think you may have something like this in the patch already, 
would be to split rolpassword into two columns, say rolencryption and 
rolpassword. 


This inches closer to Michael's suggestion to have multiple verifiers 
per pg_authid user ...


rolencryption says how the password verifier is encrypted and 
rolpassword contains the verifier itself. Initially, rolencryption 
will be 'plain' or 'md5', but later we can add 'scram' as another 
choice, or maybe it'll be more specific like 'scram-hmac-doodad'.


May I suggest using a "{" scheme ["." encoding] "}" prefix, just like Dovecot does?

e.g. "{md5.hex}e748797a605a1c95f3d6b5f140b2d528"

where no "{ ... }" prefix means just falling back to the old method of 
trying to guess what the blob contains?
This would invalidate PLAIN passwords beginning with "{", though, 
so some measures would be needed.
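
To make that concrete, a couple of illustrative verifier strings plus how the
scheme prefix could be picked apart (values are made up; this is not an
existing catalog format):

    -- '{md5.hex}e748797a605a1c95f3d6b5f140b2d528'   explicit scheme + encoding
    -- 'md5e748797a605a1c95f3d6b5f140b2d528'         legacy blob, format guessed
    SELECT substring('{md5.hex}e748797a605a1c95f3d6b5f140b2d528'
                     from '^\{([^}]+)\}') AS scheme;    -- yields 'md5.hex'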


And then maybe introduce syntax like this: alter user rhaas set 
password 'raw-unencrypted-passwordt' using 'verifier-method'; alter 
user rhaas set password verifier 'verifier-goes-here' using 
'verifier-method'; That might require making verifier a key word, 
which would be good to avoid. Perhaps we could use "password 
validator" instead? 


I'd like USING best ... though by prepending the scheme for ENCRYPTED, 
the required information is already conveyed within the verifier, so there 
is no need to specify it again :)



Just my .02€


/ J.L.





Re: [HACKERS] Default Roles

2016-03-30 Thread José Luis Tallón
If this gets into 9.6, we give users another full release cycle to 
ensure there are no reserved role names in use.
Then, I reckon that the additional roles / system-role-based fine-grained 
authorization could go in for 9.7 without much trouble -- this is badly 
needed, IMHO


Thank you, Stephen and all others who provided feedback.


On 03/30/2016 01:14 PM, Jose Luis Tallon wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

* Applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
* ./configure && make -j4 ok
[snip]

Looking forward to seeing the other proposed default roles in!


The new status of this patch is: Ready for Committer







Re: [HACKERS] How can we expand PostgreSQL ecosystem?

2016-03-07 Thread José Luis Tallón

On 03/07/2016 07:30 AM, Tsunakawa, Takayuki wrote:

From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Mark Kirkwood
For cloud - in particular Openstack (which I am working with ATM), the
biggest thing would be:

- multi-master replication

or failing that:

- self managing single master failover (voting/quorum etc)


Hmm consul (optionally in a dockerized setup) ?
https://www.consul.io/


so that operators can essentially 'set and forget'. We currently use
Mysql+ Galera (multi master) and Mongodb (self managing single master)
and the convenience and simplicity is just so important (Openstack is a
huge complex collection of services - hand holding of any one service is
pretty much a non starter).

Yes, I was also asked whether PostgreSQL has any optional functionality like 
Galera Cluster for MySQL.  He was planning a scalable PaaS service which 
performs heavy reads and writes.  Demand exists.


AFAIK, Galera has its own set of drawbacks


HTH,

/ JL





Re: [HACKERS] PostgreSQL Auditing

2016-02-02 Thread José Luis Tallón

On 02/02/2016 02:05 AM, Curtis Ruck wrote:

[snip]

P.S., do you know what sucks, having a highly performant PostGIS 
database that works great, and being told to move to Oracle or SQL 
Server (because they have auditing).  Even though they charge extra 
for Geospatial support (seriously?) or when they don't even have 
geospatial support (10 years ago).  My customer would prefer to 
re-engineer software designed around PostgreSQL and pay the overpriced 
licenses, than not have auditing.  I agree that their cost analysis is 
probably way off, even 10 years later, my only solution would be to 
move to Oracle, SQL Server, a NoSQL solution, or pay EnterpriseDB for 
their 2 year old version that doesn't have all the cool/modern jsonb 
support.


Huh?  PPAS 9.5.0.5 has been out there since at least last week; before 
that, PPAS 9.4.5.y or so was there ...

(Not affiliated with EDB, but precision is important)

I agree that auditing is a big selling point and frequently used... But 
it's got to be done "the Postgres way", and that takes time (and usually 
provides superior results).



Just my .02€


/ J.L.






Re: [HACKERS] pg_controldata/pg_resetxlog "Latest checkpoint's NextXID" format

2015-12-30 Thread José Luis Tallón

On 12/30/2015 06:46 AM, Simon Riggs wrote:
On 30 December 2015 at 00:17, Joe Conway wrote:


On 12/29/2015 07:15 AM, Tom Lane wrote:
> Yeah.  Use of the same x/y notation with two different bases
seems like
> a recipe for confusion.  It's probably too late to do anything about
> this for 9.5, but I'd be +1 for adopting Jose's suggestion or some
> other formatting tweak in HEAD.

I made the "%u/%u" -> "%u:%u" change in the controldata patch I just
posted, but I suppose I should commit that separately. Any complaints
about that?


There is already long precedent about how to represent an XID with an 
epoch... and it is neither of those two formats.


http://www.postgresql.org/docs/devel/static/functions-info.html
"Table 9-63. Transaction IDs and Snapshots"
"The internal transaction ID type (xid) is 32 bits wide and wraps 
around every 4 billion transactions. However, these functions export a 
64-bit format that is extended with an "epoch" counter so it will not 
wrap around during the life of an installation."


So? If I am guessing correctly, you propose to use %llu (more precisely, 
UINT64_FORMAT) ?

(please correct me if I got it wrong)

IMHO, we have been telling users that XIDs are 32 bits forever, so 
showing a 64-bit int where an XID is expected can easily induce confusion.
Moreover, the "epoch : whatever" format is widely used (Debian & 
derivatives, just to name some) and understood...

... but I might be wrong.

Any format clearly distinct from %X/%X seems better than what we have 
now, in any case
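
For reference, the 64-bit value exported by those functions can be split back
into the familiar epoch and 32-bit XID, which is essentially what an
"epoch:xid" output would show (illustrative query using the stock
txid_current() function):

    SELECT txid_current() >> 32          AS epoch,
           txid_current() & 4294967295   AS xid;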



Thanks,

/ J.L.



Re: [HACKERS] pg_controldata/pg_resetxlog "Latest checkpoint's NextXID" format

2015-12-29 Thread José Luis Tallón

On 12/29/2015 01:18 PM, Heikki Linnakangas wrote:

On 29/12/15 07:14, Joe Conway wrote:

I wonder why "Latest checkpoint's NextXID" is formated like this:

8<-
printf(_("Latest checkpoint's NextXID:  %u/%u\n"),
   ControlFile.checkPointCopy.nextXidEpoch,
   ControlFile.checkPointCopy.nextXid);
8<-

Shouldn't it use "%X/%X", same as e.g. "Prior checkpoint location" and
all the other XIDs?


No. The "locations" in the output are WAL locations. Those are 
customarily printed with %X/%X. But NextXID is a transaction ID, those 
are printed in decimal, with %u.


But Joe has a point here... Others could also be confused, if even he 
had doubts about this.


"Latest checkpoint's NextXID:  %u:%u\n", maybe ??
(slash becomes colon)

Worst part: fuzzies the translations unnecessarily


/ J.L.





Re: [HACKERS] New email address

2015-11-26 Thread José Luis Tallón

On 11/26/2015 09:12 PM, Greg Stark wrote:

On Wed, Nov 25, 2015 at 6:55 PM, Tom Lane  wrote:

But my point was that while the RFC says what to put there there's
absolutely no reference anywhere for when the information should cause
any MUA or MTA to behave differently.

Agreed.  To my mind that's a reason why Sender should not be DKIM-signed.
Unfortunately, RFC 6376 explicitly suggests doing so ... and it looks like
some people are taking that advice.

Hm, I see it as a reason why signing Sender is reasonable. If it were
a functional header then there might be a reason it would have to be
changed. But if it's purely informational and the receiving MUA is
going to display to the user (which is a bad idea imho but Gmail and
Exchange both do it) then it makes sense to expect some authentication
for it. I think the thinking is basically "sign everything we're going
to present to the user phishers can't claim to be someone they're
not". In which case it's fairly important that things like Sender be
signed. Or that everyone agree it's just a useless header and stop
sending or displaying it.


From DMARC.org's Wiki:
<<< 2. Add an "Original Authentication Results" header to indicate you have
performed the authentication and you are validating it
3. Take ownership of the email, by removing the DKIM signature and putting 
your own, as well as changing the From header in the email to contain an 
email address within your mailing list domain. >>>


Read elsewhere: "To allow for forwarding scenarios, DMARC also allows 
the *Display From* to be cryptographically signed by DKIM, and if any 
unauthorized spammer or phisher were to attempt to assume that identity, 
the encryption would fail."



I don't think we should base any action on guesses of what Gmail does.
Google may do something we don't expect that's more complex to work
around the problem. For one thing you can have email addresses at
Google from a number of domains so they may well be able to have more
than one policy for different users.

Yep

I would suggest we stop doing things that are obviously incompatible
with DKIM -- header and body munging for example. And I suspect we can
stop touching Sender without any ill effects too.

One idea might be to add a script to check a user's domain for
p=reject and send them a warning when subscribing (or sending mail to
the list?) warning them of the problem.

Definitely worth the effort, unless an almost perfect solution is found :S


/ J.L.



Re: [HACKERS] New email address

2015-11-24 Thread José Luis Tallón

On 11/24/2015 07:55 PM, Tom Lane wrote:

[snip]
The clearly critical thing, though, is that when forwarding a message from
a person at a DMARC-using domain, we would have to replace the From: line
with something @postgresql.org.  This is what gets it out from under the
original domain's DMARC policy.


One possibility that comes to mind:

- Remove the sender's DMARC headers+signature **after thoroughly 
checking it** (to minimize the amount of UBE/UCE/junk going in)
- Replace the sender's address (i.e. the 'From:' header) with 
list-sender+munched-em...@postgresql.org (VERP-ified address)


- Add the required headers, footers, change the subject line, etc

- DKIM-sign the resulting message with postgresql.org's keys before 
sending it

[snip]

If Rudy's right that Gmail is likely to start using p=reject DMARC policy,
we are going to have to do something about this before that; we have too
many people on gmail.  I'm not exactly in love with replacing From:
headers but there may be little alternative.  We could do something like
From: Persons Real Name 
Reply-To: ...
so that at least the person's name would still be readable in MUA
displays.

Yup


We'd have to figure out whether we want the Reply-To: to be the original
author or the list; as I recall, neither of those are fully satisfactory.
Or just strip it, though that would trump the sender's explicit preference 
(expressed by setting the header)



I might be able to help a bit with implementation if needed.


/ J.L.





Re: [HACKERS] Note about comparation PL/SQL packages and our schema/extensions

2015-11-05 Thread José Luis Tallón

On 11/05/2015 01:31 PM, Craig Ringer wrote:

On 5 November 2015 at 14:36, Pavel Stehule  wrote:
[snip]


2. The schema variables - a server side session (can be emulated now) and
server side local schema session variables (doesn't exist) is pretty useful
for storing some temp data or high frequent change data - and can
significantly increase speed of some use cases. Now we emulate it via PLPerl
shared array, but the encapsulation is missing.

This is the feature I feel we could really use.

I see *lots* of people emulating session variables by (ab)using custom
GUCs. The missing-ok variant of current_setting helps with this to the
point where it's fairly OK now.


AFAICS, (ab)using  custom GUCs is the "blessed" (by Tom, no less) way to 
do it...

See http://www.postgresql.org/message-id/16931.1172871...@sss.pgh.pa.us
 and really made possible in 9.4  :)

Though the "usual" @@ syntax would certainly help some users migrate 
over ...
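
For completeness, a minimal sketch of the custom-GUC emulation being discussed
(the "myapp" prefix is made up; the two-argument, missing-ok form of
current_setting is the variant Craig mentions):

    SET myapp.current_customer = '42';
    SELECT current_setting('myapp.current_customer', true) AS current_customer;
    -- the second argument makes it return NULL instead of erroring out
    -- when the setting has never been defined in this session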



The main advantage package variables have - IMO - are package
permissions. You can define a variable that is writeable only by
functions within a package. That's really handy for things like row
security since it lets you have variables you can only set via a
function that can do things like refuse to run again with different
args, validate input, etc. So you can do expensive work once, then
cheap row security checks against the preset variable. Or use it for
things like "current customer" settings when using pooled connections.


Some sort of "packages" ---in this sense--- could be implemented as 
extensions, but I guess a more integrated approach would be welcome.



It might make sense to extend custom GUCs for this rather than invent
a new mechanism, since GUCs have lots of useful properties like
global, db, user, session and transaction scoping, etc. I'm not really
sure... I just agree that it's a good idea to be able to have
something with similar capabilities to package variables. Especially
security properties.


3. The initialization routines - the routines called when any object from
schema is used first time.

... which is somewhat similar to having an "on session start" trigger.
Also an oft-wanted feature.


Frequently requested, only because one other database requires it for 
what we do with role-level configuration via GUCs.
The other use case I see would definitely be accommodated by having 
packages with the properties you describe above.


These properties might even be emulated via some clever extension ...


Just my two cents.


Thanks,

/ J.L.





Re: [HACKERS] Patent warning about the Greenplum source code

2015-11-02 Thread José Luis Tallón

On 11/02/2015 02:41 PM, Simon Riggs wrote:
On 1 November 2015 at 07:47, Bruce Momjian wrote:


On Sun, Nov  1, 2015 at 01:27:13AM -0500, Bruce Momjian wrote:
> On Fri, Oct 30, 2015 at 04:47:35AM -0400, Bruce Momjian wrote:
> > Therefore, I caution people from viewing the Greenplum source
code as
> > you might see patented ideas that could be later implemented in
> > Postgres, opening Postgres up to increased patent violation
problems.  I
> > am also concerned about existing community members who work for
> > Pivotal/Greenplum and therefore are required to view the
patented source
> > code.  The license issue might eventually be improved by
> > Pivotal/Greenplum, but, for now, I think caution is necessary.
> >
> > Of course, never mention known-patented ideas in any community
forum,
> > including this email list.
>
> I just found out that Citus Data has patent applications pending, so
> viewing Citus Data source code has the same problems as Greenplum.

Actually, it might only be their closed source software that contains
patents, i.e. not pg_shard.  I will check and report back when I can
unless someone else reports here first.


While you are doing that, please also check EnterpriseDB. My 
information is that there are patents filed there, so we must check 
that just as much as any other company or person. If you didn't know 
before, you do now.


I am disappointed that your approach to this appears unbalanced and 
partisan. Worse, Greenplum have been quite vocal about their 
intentions, so any feedback you have could easily have been given many 
months ago, not on the day of their announcement. I think you should 
have declared this situation in a very different way to the way you 
have approached this. 5 minutes thought on whether other companies 
might also have been affected would have been sensible, plus the whole 
thing could have been discussed completely offlist. If you do discuss 
things on-list then you should at least state for the record that you 
are an EnterpriseDB employee when discussing your concerns, since that 
is likely to have a material affect on how this situation is viewed by 
anyone worried by your post.


FWIW, Bruce has --for as long as I can remember-- always sent e-mail to 
the list including a signature similar to the following:

-
Bruce Momjian  http://momjian.us
EnterpriseDB http://enterprisedb.com
-

So I guess his affiliation with EnterpriseDB is pretty clear even 
to outsiders just perusing the archives.
Others' interests are IMHO not nearly as clear from their e-mails' 
contents, though.


(not that I have any particular voice/opinion on this matter anyway. I 
am precluded from taking a look at any such release for the time being 
for other reasons...)



I do thank you for all the time you devote to Postgres. All 
community members' contributions are very much appreciated.



/ J.L.



Re: [HACKERS] pg_basebackup and replication slots

2015-10-26 Thread José Luis Tallón

On 10/26/2015 12:58 PM, Joshua D. Drake wrote:

Hello,

The fact that pg_basebackup doesn't use replicaiton slots, is that a 
technical limitation or just a, "we need a patch"?




Given the rest of the thread, is there any possibility whatsoever that 
it'd be backpatched into 9.5 before release?

Guess it'd be a very welcome addition


Thanks,

/ J.L.





Re: [HACKERS] questions about PG update performance

2015-10-26 Thread José Luis Tallón

On 10/26/2015 05:49 AM, Amit Kapila wrote:
On Mon, Oct 26, 2015 at 9:03 AM, Любен Каравелов wrote:

>
> - Quoted from Kisung Kim (ks...@bitnine.co.kr), on 26.10.2015 at 04:36 -

>
> > However, what I want to know is about the update performance 
difference

> > between PG and Oracle if there any.
> > The case I described is for exaggerating the difference between PG and
> > Oracle.
> >
> > I want to explain for our clients that PG's update performance is
> > comparable to Oracle's.
> >
>
> Oracle is also using MVCC but copies the old row in the rollback 
segment and

> rewrites the values in-place.

I think Oracle just copies the changed part of old row to rollback 
segment.

Also in Redo logs, it just writes the changed column value (both old and
new).  So for the case we are discussing in this thread (one changed
column out of 200 columns), Oracle will just write the old value of that
column in Redo and then in rollback segment, and write the new value
in Redo and then do the in-place update in heap row.


IMV, where Oracle is heavily optimized for "most DML transactions will 
commit successfully" and "no long-running transactions shall ever 
exist" / "not many transactions will have to read previous 
snapshots" (based on PIs), Postgres does not actually make any such 
assumptions.


Hence, for long-running transactions / massive concurrency with many 
clients reading and writing older snapshots, Postgres will be faster 
(less work to do compared to re-constructing rows based on PIs)


Plus, for updates where the size of the NEW row is bigger than the 
previous one (think adding text) the overhead is actually greater for 
Oracle (plus, they don't compress variable-length values by default / no 
TOAST)... so here Postgres would be faster.
For text-intensive workloads, Postgres is measurably faster than 
Oracle mostly due to this fact (plus much more efficient in its use of 
storage/RAM...)




In PostgreSQL, the whole new row is written in heap and diff tuple 
(difference

of old and new tuple; this optimization is done in 9.4, commit a3115f0d)
in WAL.  I think we can try to optimize and use the same technique for
heap as used for WAL to make PostgreSQL more efficient for such 
scenario's,

however as of today, my guess is that PostgreSQL's update would be lagging
in this area.


Yup. But see above for a potential reason where it might not be that 
bad, especially after the optimization you mention.



> It is still 2 writes as in Postgres.

The difference is in the amount of data written per write.


Yes, but compressed (for varlena-based datum/data), batched 
(group-commit) so mostly sequential, and non-duplicated (WAL vs REDO+UNDO).


So I guess the difference is quite small nowadays, and differences will 
be heavily influenced by actual workload.



Just my 2 (euro-) cents.


/ J.L.



Re: [HACKERS] Duplicated assignment of slot_name in walsender.c

2015-10-22 Thread José Luis Tallón

On 10/22/2015 12:36 AM, Alvaro Herrera wrote:

Andres Freund wrote:


That seems fairly insignificant. For one this is a rather infrequent and
expensive operation, for another every decent compiler can optimize
those away. Note that those duplicate strlen() calls are there in a lot
of places in walsender.c

diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index c6043cd..5487cc0 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -762,10 +762,10 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr 
targetPagePtr, int req
  static void
  CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
  {
-   const char *slot_name;
const char *snapshot_name = NULL;
charxpos[MAXFNAMELEN];
StringInfoData buf;
+   int len;


Surely"size_t len" ?
Or am I missing some platform where size_t is not defined ?

Minor nitpicking, of course. But once we are cleaning the code up, 
might as well change this too



Thanks!

/ J.L.





Re: [HACKERS][PROPOSAL] Covering + unique indexes.

2015-09-16 Thread José Luis Tallón

On 09/15/2015 06:57 PM, Anastasia Lubennikova wrote:

Proposal Clarification.
I see that discussion become too complicated. So, I'd like to clarify 
what we are talking about.


[snip]
What are we doing now:
CREATE UNIQUE INDEX on tbl(f1,f2);
CREATE INDEX on tbl(f1, f2, f3, f4);

[snip]
Suggestions.
CREATE INDEX idx ON tbl (f1, f2, f3) [UNIQUE ON (f1, f2)] [INCLUDE (f4)];


Summarizing some suggestions upthread, it seems like the "best" syntax 
would be something similar to:


-- Non-unique index + "leaf" information (f4)
CREATE INDEX idx ON tbl (f1, f2, f3) [INCLUDING (f4)]

-- Unique index on f1,f2, + leaf information (f3)
CREATE UNIQUE INDEX idx ON tbl (f1, f2) [INCLUDING (f3)]

And, even:
ALTER INDEX idx INCLUDING (f4)
 ... which would trigger a REINDEX CONCURRENTLY internally ?


FWIW this would include also the functionality provided by the suggested
CREATE INDEX idx ON tbl (f1, f2, f3) UNIQUE ON (f1, f2);

while being less verbose, IMHO.
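
As a usage sketch of the proposed (not yet existing) syntax, the included
column could then be served straight from an index-only scan while uniqueness
is still enforced on (f1, f2) only:

    CREATE UNIQUE INDEX idx ON tbl (f1, f2) INCLUDING (f3);
    SELECT f1, f2, f3 FROM tbl WHERE f1 = 1 AND f2 = 2;   -- index-only scan candidate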


The obvious inconvenience being that the indexes will grow a bit, so 
fewer entries will fit in memory.



Also, we don't meddle with WITH clauses (for smgr parameters or the 
like) nor USING  clauses.
I reckon that implementation might be a bit intrusive (requiring changes 
to most index AMs even?)




Thanks!

/ J.L.



[HACKERS] Proposal: backend niceness / session_priority

2015-07-30 Thread José Luis Tallón

Hackers,

I have found myself needing to run some maintenance routines 
(VACUUM, REINDEX, REFRESH MATERIALIZED VIEW mostly) at a lower priority 
so as not to disturb concurrent *highly transactional* connections. This 
issue is also noted in the TODO list [0] in the Wiki.


* There was some discussion in 2007 [1] regarding "Priorities for users 
or queries?": 
http://archives.postgresql.org/pgsql-general/2007-02/msg00493.php


Since PostgreSQL lacks the resource management capabilities of the 
Big Ones ( Resource Groups - Red, WorkLoad Manager - Blue ) or the 
Resource Governor in MS SQL Server, we can try and approximate the 
requested behaviour by reducing the CPU priority (nice) of the backend 
in question. Please note that we would be using scheduler priority to 
try and modulate I/O, though I'm aware of the limitations of this mechanism.


Using renice(1) from outside is not only cumbersome and error-prone 
but very much unusable for the use cases I am contemplating.



* Moreover, as seen in the Priorities wiki page [2], there exists an 
extension providing a set_backend_priority() function, to be called as 
set_backend_priority(pg_backend_pid(), 20).
This approach is, sadly, not portable to non-POSIX operating systems 
(e.g. Windows), and IMO too convoluted to use and tied to actual 
implementation details.




* I have been playing with some code which uses a GUC for this purpose, 
though IMO only defining/supporting three different priorities would make 
sense for the final implementation: NORMAL, LOW_PRIORITY, IDLE
Checked platform compatibility too: this behaviour can be 
implemented on Windows as well. For everything else, there's nice(2)




However, there is a relatively minor catch here which is the reason 
behind this e-mail: user interface


- Inventing a new command seems overkill to me. Plus, I don't know 
what we could model it on --- given that the real solution for this 
problem would be a fully featured priority manager ---


- I have been playing with a GUC that ignores being reset --- so as to 
comply with nice's specification when not running as a privileged user 
--- but I reckon that this behaviour might be surprising at best:

SET session_priority TO 'low';    -- Ok, low priority
VACUUM FREEZE my_test_table;
RESET session_priority;           -- Nope, still low prio. Emit notice?

The way to reset the priority would be to RECONNECT. And this is my 
main pain point, though it does fulfill the need.



However, this approach does fulfill my needs and ---it seems--- 
the OP's needs: be able to run a maintenance task at a low priority 
(i.e. disturbing other concurrent queries as little as possible). 
Expected use case:  cronjob running  psql -c 'SET session_priority TO 
low; REINDEX blabla CONCURRENTLY; VACUUM foobar;'



All suggestions welcome.

I'll be wrapping up a more-or-less-done patch on Monday if somebody wants 
to take a look and criticize actual code (I won't be working on this 
tomorrow), unless somebody points me at a better solution



Thanks,

/ J.L.



[0] https://wiki.postgresql.org/wiki/Todo - Miscellaneous performance
[1] http://archives.postgresql.org/pgsql-general/2007-02/msg00493.php
[2] https://wiki.postgresql.org/wiki/Priorities

[3] http://docs.oracle.com/cd/E11882_01/server.112/e25494/dbrm.htm
[4] 
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.1.0/com.ibm.db2.luw.doc/com.ibm.db2.luw.doc-gentopic6.html

[5] https://msdn.microsoft.com/en-us/library/bb933866.aspx


Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-19 Thread José Luis Tallón

On 05/19/2015 09:00 PM, Simon Riggs wrote:

[snip]


I think the idea of having SET SESSION AUTH pass a cookie, and
only let
RESET SESSION AUTH work when the same cookie is supplied, is pretty
reasonable.


As long as the cookie is randomly generated for each use, then I don't 
see a practical problem with that approach.


Protocol level solution means we have to wait 1.5 years before anybody 
can begin using that. I'm also dubious that a small hole in the 
protocol arrangements could slam that door shut because we couldn't 
easily backpatch.


Having an in-core pooler would be just wonderful because then we could 
more easily trust it and we wouldn't need to worry.


Ufff Please don't do that.
Postgres is just a database. And a very good one at that. Let us keep 
it that way and not try to re-implement everything within it --- We're 
not the big red company after all :)


There are places where a pooler is badly needed and others where it 
is just overkill and counterproductive.
Plus, scalability models / usage patterns are not nearly the same (nor 
even compatible sometimes!) between databases and poolers.



There exist perfectly good solutions already (and they can certainly be 
improved), such as PgBouncer or even PgPool-II, or others can be adopted.



Just my .02€


/ J.L.




Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-17 Thread José Luis Tallón

On 05/13/2015 06:03 AM, Alvaro Herrera wrote:

Craig Ringer wrote:

For some time I've wanted a way to SET SESSION AUTHORISATION or SET
ROLE in a way that cannot simply be RESET, so that a connection may be
handed to a less-trusted service or application to do some work with.

Some years back, I checked the SQL standard for insight on how they
handle this stuff (courtesy of Jim Nasby IIRC).  It took me a while to
figure out that the way they do it is not to have a RESET command in the
first place!  In their model, you enter a secure execution context (for
example, an SQL function) by calling SET SESSION AUTHORIZATION; and once
there, the only way to revert to the original session authorization is
to exit the execution context -- and once that happens, the attacker
no longer has control.  Since they have reduced privileges, they can't
call SET SESSION AUTHORIZATION themselves to elevate their access.  In
this model, you're automatically protected.


I did address this same concern some four months ago, by suggesting to 
implement an IMPERSONATE command, as part of the role attributes rework.
This thought was *precisely* oriented towards the same goal as Craig's 
suggestion.
suggestion.


Please keep in mind that SET ROLE and/or IMPERSONATE and/or SET SESSION 
AUTHORIZATION [WITH COOKIE] should be doable SQL-only, so no protocol 
changes whatsoever would be needed.


On the other hand, ISTM that what we all intend to achieve is some 
Postgres equivalent of the SUID bit... so why not just do something 
equivalent?

---
LOGIN    -- as user with the appropriate role membership / privilege?
...
SET ROLE / SET SESSION AUTHORIZATION WITH COOKIE / IMPERSONATE

... do whatever ...    -- unprivileged user can NOT do the 
impersonate thing


DISCARD ALL    -- implicitly restore previous authz
---

I mentioned this in some developer meeting; got blank stares back, IIRC.


Let's hope something goes through this time. It seems to be a more 
pressing need now than it was then  :)




Thanks,

/ J.L.





Re: [HACKERS] RFC: Non-user-resettable SET SESSION AUTHORISATION

2015-05-17 Thread José Luis Tallón

On 05/17/2015 07:39 PM, Tom Lane wrote:

José Luis Tallón <jltal...@adv-solutions.net> writes:

On the other hand, ISTM that what we all intend to achieve is some
Postgres equivalent of the SUID bit... so why not just do something
equivalent?
---
  LOGIN-- as user with the appropriate role membership / privilege?
  ...
  SET ROLE / SET SESSION AUTHORIZATION WITH COOKIE / IMPERSONATE
  ... do whatever ...-- unprivileged user can NOT do the
impersonate thing
  DISCARD ALL-- implicitly restore previous authz
---

Oh?  What stops the unprivileged user from doing DISCARD ALL?


Indeed. The pooler would need to block this.
Or we would need to invent another (this time, privileged) verb in order 
to restore authz.



I think if we have something like this, it has to be non-resettable
period: you can't get back the old session ID except by reconnecting
and re-authorizing.  Otherwise there's just too much risk of security
holes.


Yes.
Thank you for your feedback, Tom.


/ J.L.





Re: [HACKERS] multixacts woes

2015-05-10 Thread José Luis Tallón

On 05/08/2015 09:57 PM, Josh Berkus wrote:

[snip]

It's certainly possible to have workloads triggering that, but I think
it's relatively uncommon.  I in most cases I've checked the multixact
consumption rate is much lower than the xid consumption. There are some
exceptions, but often that's pretty bad code.

I have a couple workloads in my pool which do consume mxids faster than
xids, due to (I think) execeptional numbers of FK conflicts.  It's
definitely unusual, though, and I'm sure they'd rather have corruption
protection and endure some more vacuums.


Seen corruption happen recently with OpenBravo on PostgreSQL 9.3.6 
(Debian; binaries upgraded from 9.3.2) in a cluster pg_upgraded from 9.2.4

(albeit with quite insufficient autovacuum / poorly configured Postgres)

I fear that this might be more widespread than we thought, depending on 
the exact workload/activity pattern.
If it would help, I can try to get hold of a copy of the cluster in 
question (if the customer keeps any copy at all)



If we do this, though, it
might be worthwhile to backport the multixact age function, so that
affected users can check and schedule mxact wraparound vacuums
themselves, something you currently can't do on 9.3.


Thanks,

J.L.





Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-04-28 Thread José Luis Tallón

On 04/27/2015 08:49 AM, Jim Nasby wrote:

On 4/25/15 1:19 PM, Bruce Momjian wrote:

Note if you are storing a table with rows that exceed 2KB in size
(aggregate size of each row) then the Maximum number of rows in a
table may be limited to 4 Billion, see TOAST.


That's not accurate though; you could be limited to far less than 4B 
rows. If each row has 10 fields that toast, you'd be limited to just 
400M rows.


ISTM the solution is almost here, and could be done without too 
much (additional) work:
* We have already discussed having a page-per-sequence with the new 
SeqAMs being introduced and how that would improve scalability.

* We have commented on having a sequence per TOAST table
(hence, 4B toasted values per table each up to 4B chunks in size... 
vs just 4B toasted values per cluster)


I'm not sure that I can do it all by myself just yet, but I sure 
can try if there is interest.
(just after I'm done with another patch that is independent from 
this, though)


This would be material for 9.6, of course :)

Thanks,

J.L.





Re: [HACKERS] reducing our reliance on MD5

2015-02-11 Thread José Luis Tallón

On 02/11/2015 03:55 PM, Claudio Freire wrote:

On 02/11/2015 03:39 PM, Claudio Freire wrote:

[snip]
Seems the risk of someone either lifting pg_authid from disk or by hacking
the system and being postgres, thereby accessing passwords stored
somewhere
else, is actually the bigger problem. But also one that should be
reasonably
easy (TM) to fix in a backwards compatible way? (just rewrite with a new
hash whenever the password is changed, but keep reading md5 until they are
all replaced.
Problem with all challenge-response authentication protocols, is that
the hash you have stored has to match the hash you use on the wire
protocol.

It's not like you can store a SHA and provide MD5 authentication.

Yes, except that you can do fallback to plaintext if the client requests
(S)CRAM-SHA and you have (S)CRAM-MD5 instead, allowing for some
interoperability and backwards compatibility in the process: pre-change
libpq/JDBC could authenticate using password to a server with just
SCRAM-SHA512 credentials.

In any case, just storing the password BLOB(text or base64 encoded) along
with a mechanism identifier would go a long way towards making this part
pluggable... just like we do with LDAP/RADIUS/Kerberos/PAM today.


Wait... you mean falling back to storing plaintext or falling back to
transmitting plaintext over the wire?


Fallback to authentication using plaintext ... if the client is old. And 
might as well require SSL before allowing that --- the sane solution, 
definitely.
But let us not forget that there exist environments where even 
non-SSL plaintext is acceptable when the network is secure/trusted. We 
should not prevent that usage if users need it (performance or 
management reasons) if at all possible, I think.


This would imply adding a fallback or plaintext negotiation when 
possible to the new mechanisms.

We could even restrict that behaviour to hostssl entries in pg_hba.conf


Both seem a step backwards IMO.
Hmmm... as opposed to breaking applications unnecessarily when simply 
enabling SSL/TLS would not make it insecure? Or when users don't really 
need it?


There are many organizations out there happily using 3rd-party 
applications that they cannot modify easily, and this change would break 
them gratuitously.

- New policy. Introduce new hashes, update credentials (when possible!)
- New applications can benefit from the newer/better security 
immediately
- Client app that uses older libpq/JDBC. Doesn't understand the new 
mech

- ... and now the user is left with a broken app and no easy way back

Alternative: the old app has to authenticate itself just as if 
pg_hba.conf said "password", possibly restricted to doing so over an 
SSL-secured connection.



Moreover, requiring everybody to change all passwords and clients *at 
once* seems like a very poor decision towards allowing for graceful 
upgrades and making it possible to roll changes back, right?


Additionally, there are cases where passwords are not stored in 
plaintext anywhere (required to be able to generate new credentials) and 
updating all clients at once simply isn't possible.
Please note that, AFAICS, adding a second entry to pg_hba.conf with the 
fallback mechanism won't work.


Now I come to think of it, the behaviour I just proposed would help 
users and maintainers of connection poolers to achieve seamless upgrades 
while increasing security (the pooler would negotiate using the newer 
mechanism with postgres using the client-provided password)




Thanks,

J.L.





Re: [HACKERS] reducing our reliance on MD5

2015-02-11 Thread José Luis Tallón

On 02/11/2015 03:39 PM, Claudio Freire wrote:

[snip]
Seems the risk of someone either lifting pg_authid from disk or by hacking
the system and being postgres, thereby accessing passwords stored somewhere
else, is actually the bigger problem. But also one that should be reasonably
easy (TM) to fix in a backwards compatible way? (just rewrite with a new
hash whenever the password is changed, but keep reading md5 until they are
all replaced.
Problem with all challenge-response authentication protocols, is that
the hash you have stored has to match the hash you use on the wire
protocol.

It's not like you can store a SHA and provide MD5 authentication.


Yes, except that you can fall back to plaintext if the client 
requests (S)CRAM-SHA and you have (S)CRAM-MD5 instead, allowing for some 
interoperability and backwards compatibility in the process: pre-change 
libpq/JDBC could authenticate using "password" to a server with just 
SCRAM-SHA512 credentials.


In any case, just storing the password BLOB (text or base64 encoded) 
along with a mechanism identifier would go a long way towards making 
this part pluggable... just like we do with LDAP/RADIUS/Kerberos/PAM today.



/ J.L.







Re: [HACKERS] reducing our reliance on MD5

2015-02-11 Thread José Luis Tallón

On 02/11/2015 03:14 PM, Magnus Hagander wrote:


[snip]
The hash value in pg_authid already contains md5 as a prefix. No 
need for another column.


Yes, but for variable-length mechanism names (i.e. not just 3 chars) it 
would become increasingly difficult to differentiate between the algo 
name and the stored credentials, especially if we delegated the list 
of available mechanisms to an external library and/or in the case of 
upgrades.
(variable-length matching based on a table of available mechs and 
using strncmp isn't complicated, admittedly  but why bother?)


... plus we have already added many new columns to store the new 
capabilities in, as opposed to a bitmask.


I might well be overlooking something else, of course.


Regards,

/ J.L.





Re: [HACKERS] reducing our reliance on MD5

2015-02-11 Thread José Luis Tallón

On 02/11/2015 04:40 PM, Tom Lane wrote:

José Luis Tallón <jltal...@adv-solutions.net> writes:

In any case, just storing the password BLOB(text or base64 encoded)
along with a mechanism identifier would go a long way towards making
this part pluggable... just like we do with LDAP/RADIUS/Kerberos/PAM today.

That's exactly the direction we must NOT go.

Upgrading the security of stored passwords in pg_authid is at least as
important as upgrading the wire protocol security; very possibly more so.
Any solution that requires cleartext passwords to be kept by the server
is simply not going to be accepted.


I definitely haven't explained myself properly.
I *never* suggested storing plaintext in pg_authid, but using plaintext 
authentication (which can always be matched against an on-disk hash, 
whatever the type) as a fallback to allow for seamless upgrades of security.
(once you are authenticated by using the old credentials, the 
server can transparently store the new hash)


When I referred to a text- or base64-encoded blob, I never implied on-disk 
plaintext (unless the user specifically requires that, which they might).



To avoid ambiguities, my proposal closely mimics Dovecot's 
implementation of password schemes and credential upgrades:

http://wiki2.dovecot.org/Authentication/PasswordSchemes




Thanks,

J.L.






Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-02-03 Thread José Luis Tallón

On 02/03/2015 03:44 AM, Jim Nasby wrote:

[snip]

The alternative could be some long LOB (HugeOBject?) using the
equivalent to serial8 whereas regular LOBs would use serial4.


Well, it depends on how we did this. We could (for example) add a 
field to pg_class that determines what type to use for toast pointers; 
OID, int, or bigint. That could then be taken into account in the 
*toast* functions.


But as others have pointed out, we haven't even had any real 
complaints about toast using OIDs as being an issue until now, so I 
think it's premature to start messing with this. At most it's just 
something to keep in mind so we don't preclude doing this in the future.


A patch creating those HOBs (Huge Objects) might well make sense *after* 
the sequence refactoring gets merged.
Removing the bottleneck due to the OID allocator for this use case would 
be definitely welcome
(I don't dare to code that just yet, but here's hoping someone will 
step in O:-)


BTW, regarding the size of what gets toasted; I've often thought it 
would be useful to allow a custom size limit on columns so that you 
could easily force data to be toasted if you knew you were very 
unlikely to access it. Basically, a cheap form of vertical partitioning.


Hmmm alter column set storage external / set storage extended ?

From http://www.postgresql.org/docs/9.4/static/sql-altertable.html :
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | 
EXTENDED | MAIN }


This would do what you described, right?
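
For example (table and column names are made up):

    -- EXTERNAL: allow out-of-line (TOAST) storage but disable compression
    ALTER TABLE documents ALTER COLUMN payload SET STORAGE EXTERNAL;
    -- EXTENDED (default for toastable types): compress first, then move
    -- out of line if the row is still too large
    ALTER TABLE documents ALTER COLUMN payload SET STORAGE EXTENDED;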


HTH,

/ J.L.





Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-02-02 Thread José Luis Tallón

On 02/02/2015 09:36 PM, Roger Pack wrote:

On 2/2/15, José Luis Tallón jltal...@adv-solutions.net wrote:

On 01/31/2015 12:25 AM, Jim Nasby wrote:

[snip]
It's a bit more complex than that. First, toast isn't limited to
bytea; it holds for ALL varlena fields in a table that are allowed to
store externally. Second, the limit is actually per-table: every table
gets it's own toast table, and each toast table is limited to 4B
unique OIDs. Third, the OID counter is actually global, but the code
should handle conflicts by trying to get another OID. See
toast_save_datum(), which calls GetNewOidWithIndex().

Now, the reality is that GetNewOidWithIndex() is going to keep
incrementing the global OID counter until it finds an OID that isn't
in the toast table. That means that if you actually get anywhere close
to using 4B OIDs you're going to become extremely unhappy with the
performance of toasting new data.

Indeed ..


I don't think it would be horrifically hard to change the way toast
OIDs are assigned (I'm thinking we'd basically switch to creating a
sequence for every toast table), but I don't think anyone's ever tried
to push toast hard enough to hit this kind of limit.

We did. The Billion Table Project, part2 (a.k.a. when does Postgres'
OID allocator become a bottleneck) The allocator becomes
essentially unusable at about 2.1B OIDs, where it performed very well at
quite empty( 100M objects) levels.

So yes, using one sequence per TOAST table should help.
Combined with the new SequenceAMs / sequence implementation being
proposed (specifically: one file for all sequences in a certain
tablespace) this should scale much better.

But it wouldn't be perfect, right? I mean if you had multiple
deletion/insertions and pass 4B then the one sequence per TOAST
table would still wrap [albeit more slowly], and performance start
degrading the same way.  And there would still be the hard 4B limit.
Perhaps the foreign key to the TOAST table could be changed from oid
(32 bits) to something else (64 bits) [as well the sequence] so that
it never wraps?


Hmm 2^32 times approx. 2kB (as per usual heuristics, ~4 rows per heap 
page) is 8796093022208 (~8.8e12) bytes

 ... which results in 8192 1GB segments :O
Looks like partitioning might be needed much sooner than that (if only 
for index efficiency reasons)... unless access is purely sequential.


The problem with changing the id from 32 to 64 bits is that the storage 
*for everybody else* doubles, making the implementation slower for 
most... though this might actually not be that important.
The alternative could be some long LOB (HugeOBject?) using the 
equivalent to serial8 whereas regular LOBs would use serial4.



Anybody actually reaching this limit out there?



Regards,

/ J .L.





Re: [HACKERS] Fwd: [GENERAL] 4B row limit for CLOB tables

2015-02-02 Thread José Luis Tallón

On 01/31/2015 12:25 AM, Jim Nasby wrote:

[snip]
It's a bit more complex than that. First, toast isn't limited to 
bytea; it holds for ALL varlena fields in a table that are allowed to 
store externally. Second, the limit is actually per-table: every table 
gets it's own toast table, and each toast table is limited to 4B 
unique OIDs. Third, the OID counter is actually global, but the code 
should handle conflicts by trying to get another OID. See 
toast_save_datum(), which calls GetNewOidWithIndex().


Now, the reality is that GetNewOidWithIndex() is going to keep 
incrementing the global OID counter until it finds an OID that isn't 
in the toast table. That means that if you actually get anywhere close 
to using 4B OIDs you're going to become extremely unhappy with the 
performance of toasting new data.


Indeed ..

I don't think it would be horrifically hard to change the way toast 
OIDs are assigned (I'm thinking we'd basically switch to creating a 
sequence for every toast table), but I don't think anyone's ever tried 
to push toast hard enough to hit this kind of limit.


We did: The Billion Table Project, part 2 (a.k.a. "when does Postgres' 
OID allocator become a bottleneck"). The allocator becomes 
essentially unusable at about 2.1B OIDs, whereas it performed very well at 
quite-empty (<100M objects) levels.


So yes, using one sequence per TOAST table should help.
Combined with the new SequenceAMs / sequence implementation being 
proposed (specifically: one file for all sequences in a certain 
tablespace) this should scale much better.



My 2c.


Regards,

/ J.L.






Re: [HACKERS] Additional role attributes superuser review

2014-12-31 Thread José Luis Tallón

On 12/30/2014 04:16 PM, Stephen Frost wrote:

[snip]
The approach I was thinking was to document and implement this as
impliciting granting exactly USAGE and SELECT rights, no more (not
BYPASSRLS) and no less (yes, the role could execute functions).  I agree
that doing so would be strictly more than what pg_dump actually requires
but it's also what we actually have support for in our privilege system.


Hmmm... coupled with your comments below, I'd say some tweaking of the 
existing privileges is actually needed if we want to add these new 
capabilities.


BTW, and since this is getting a bit bigger than originally considered: 
would it be interesting to support some extension-defined capabilities, too?
(for things that can't be easily controlled by the existing USAGE / 
SELECT / ... rights, I mean)



it would only give you COPY access. (And also
COPY != SELECT in the way that the rule system applies, I think? And this
one could be for COPY only)

COPY certainly does equal SELECT rights..  We haven't got an independent
COPY privilege and I don't think it makes sense to invent one.


FWIW, a COPY (DUMP?) privilege different from SELECT would make sense.
Considering your comments below, it would be better that it not imply 
BYPASSRLS, though.



It
sounds like you're suggesting that we add a hack directly into COPY for
this privilege, but my thinking is that the right place to change for
this is in the privilege system and not hack it into individual
commands..  I'm also a bit nervous that we'll end up painting ourselves
into a corner if we hack this to mean exactly what pg_dump needs today.

Agreed.


Lastly, I've been considering other use-cases for this privilege beyond
the pg_dump one (thinking about auditing, for example).


ISTR there was something upthread on an AUDIT role, right?
This might be the moment to add it



[snip]

Similar concerns would exist for the existing REPLICATION role for example
- that one clearly lets you bypass RLS as well, just not with a SQL
statement.

I'm not sure that I see the point you're making here.  Yes, REPLICATION
allows you to do a filesystem copy of the entire database and that
clearly bypasses RLS and *all* of our privilege system.  I'm suggesting
that this role attribute work as an implicit grant of privileges we
already have.  That strikes me as easy to document and very clear for
users.


+1

[snip]
So you're saying a privilege that would allow you to do
pg_start_backup()/pg_stop_backup() but *not* actually use pg_basebackup?
Yes.


That would be EXCLUSIVEBACKUP or something like that, to be consistent
with existing terminology though.

Ok.  I agree that working out the specific naming is difficult and would
like to focus on that, but we probably need to agree on the specific
capabilities first. :)


Yes :)
For the record, LOGBACKUP vs PHYSBACKUP was suggested a couple of days ago. 
I'd favor DUMP (logical) and BACKUP (physical) --- for lack of a better name.
The above reasoning would have pg_basebackup requiring REPLICATION, 
which is a logical consequence of the implementation but strikes me as a 
bit surprising in the light of these other privileges.




[snip]

Personalyl I think using the DUMP name makes that a lot more clear. Maybe
we need to avoid using BACKUP alone as well, to make sure it doesn't go the
other way - using BASEBACKUP and EXCLUSIVEBACKUP for those two different
ones perhaps?

DUMP - implicitly grants existing rights, to facilitate pg_dump and
perhaps some other use-cases
BASEBACKUP - allows pg_basebackup, which means bypassing all in-DB
  privilege systems
EXCLUSIVEBACKUP - just allows pg_start/stop_backup and friends

I'm still not entirely happy with the naming, but I feel like we're
getting there.  One thought I had was using ARCHIVE somewhere, but I
kind of like BASEBACKUP meaning what's needed to run pg_basebackup, and,
well, we say 'backup' in the pg_start/stop_backup function names, so
it's kind of hard to avoid that.  EXCLUSIVEBACKUP seems really long tho.


ARCHIVE, though completely appropriate for the EXCLUSIVEBACKUP case 
above (so this would become DUMP/BASEBACKUP/ARCHIVE + REPLICATION), might 
end up causing quite some confusion... ("what? WAL archiving is NOT 
granted by the ARCHIVE privilege, but requires a superuser to turn it 
on (via ALTER SYSTEM)?")


POLA again

I had defined them when I started the thread:

pg_start_backup
pg_stop_backup
pg_switch_xlog
pg_create_restore_point


... for BACKUP / EXCLUSIVEBACKUP (or, actually, FSBACKUP/PHYSBACKUP ...)

Later I added:

pg_xlog_replay_pause
pg_xlog_replay_resume

Though I'd be find if the xlog_replay ones were their own privilege (eg:
REPLAY or something).

+1

I think just calling them xlog related functions is doing us a disservice
there. Definitely once we have an actual documentation to write for it, but
also in this discussion.

[snip]

If it's for replicas, then why are we not using the REPLICATION privilege
which is extremely similar to this?

I 

Re: [HACKERS] [COMMITTERS] pgsql: Use a bitmask to represent role attributes

2014-12-23 Thread José Luis Tallón

On 12/23/2014 04:46 PM, Andres Freund wrote:

[snip]
I find Tom's concern about needing more
than 64 attributes to be ill-founded; I can't really see that
happening on any timescale that matters.


Hmm... most probably not (or so I hope)... unless we begin to add many 
different capabilities, as was recently suggested.
I, for one, have at least two of them to propose, but I guess not that 
many more should be needed.



I personally would prefer a 'custom' type to represent the
permissions. Internally that could very well be current bitmask, but the
external representation could be more complex (i.e. some textual
representation). That'd make it easy to make the representation wider/more
complex if needed.


Indeed, though this would imply adding a new "bitstring" type to core 
Postgres.
Do you have any further input on what this type would look like? Any 
operators that might be useful? ISTM that this would actually be the 
greatest strength of a type proper (vs. hardcoded bit-wise operations 
in core).
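Just to give a feel for it at the SQL level: the existing bit string type 
already provides the kind of operators such a type would need, so testing a 
single capability bit could look roughly like this (the bit positions below 
are purely illustrative, not any actual attribute assignment):

    -- illustrative only: does this 16-bit attribute mask have "bit 2" set?
    SELECT (B'0000000000000101' & B'0000000000000100')
           <> B'0000000000000000' AS has_that_attribute;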


In any case, having the type's input/output perform the conversion 
from/to text is quite equivalent to the current implementation. 
Considering that this custom type would need to be in core, the 
differences should be minimal.

Or am I missing something obvious?

Thanks,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal: two new role attributes and/or capabilities?

2014-12-23 Thread José Luis Tallón

Hello,

I've found myself needing two role "capabilities" lately, when 
thinking about restricting some roles to the bare minimum 
permissions needed to perform their duties ... as opposed to having a 
superuser role devoted to these tasks.


The capabilities would be:
* MAINTENANCE --- Ability to run
VACUUM [ANALYZE | FREEZE] (but not VACUUM FULL),
ANALYZE (including SET LOCAL statistics_target TO 1),
REINDEX CONCURRENTLY  (but not the blocking, regular, one)
REFRESH MATERIALIZED VIEW CONCURRENTLY (but not the blocking one)
COPY ???

Rationale: delegate the routine maintenance tasks to a low 
privilege role, which can't do harm (apart from some performance 
degradation) --- hence the no exclusive locking operations requirement.


* IMPERSONATE --- Ability to do SET AUTHORIZATION TO some_role; and 
RESET AUTHORIZATION
This might be further refined to provide a way to say "This role is 
authorized to impersonate role1 but no other"


Rationale: for use by connection poolers (esp. pgBouncer), where 
the role used for the connection would only have the LOGIN and IMPERSONATE 
privileges. The remaining operations would be authorized against the 
impersonated role (i.e. ability to create tables/indexes or views, perform 
DML and/or DDL, etc.)

AFAIK, a superuser role is needed for this purpose currently.
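To make the intent concrete, the roles could then be created along these 
lines (hypothetical syntax, obviously, since neither capability exists today):

    -- hypothetical sketch of the two proposed capabilities
    CREATE ROLE maint_bot  LOGIN MAINTENANCE CONNECTION LIMIT 1;  -- scheduled VACUUM/ANALYZE/REINDEX
    CREATE ROLE pool_login LOGIN IMPERSONATE;                     -- used by pgBouncer, nothing else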


The relevant code is quite simple and looks like it could be very 
useful. Any suggestions / input on this?
I can certainly prepare a patch for this (bear with me, it'll be my 
first one here), and I'm willing to include more features if deemed useful.




Regards,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: two new role attributes and/or capabilities?

2014-12-23 Thread José Luis Tallón

On 12/23/2014 05:29 PM, Stephen Frost wrote:

* José Luis Tallón (jltal...@adv-solutions.net) wrote:

 I've found myself needing two role capabilities? as of lately,
when thinking about restricting some roles to the barely minimum
allowed permissions needed to perform their duties ... as opposed to
having a superuser role devoted to these tasks.

Excellent.  We've been looking at the same considerations.


 The capabilities would be:
* MAINTENANCE --- Ability to run
 VACUUM [ANALYZE | FREEZE] (but not VACUUM FULL),
 ANALYZE (including SET LOCAL statistics_target TO 1),

There's likely to be discussion about these from the perspective that
you really shouldn't need to run them all that much.  Why isn't
autovacuum able to handle this?


For some (arguably, ill-devised) use cases of INSERT -> SELECT aggregate 
-> DELETE (third-party, closed-source app, massive insert rate), at the 
very least, autovacuum can't possibly cope with the change rate in some 
tables, given that there are quite a few other interactive queries running.


Manually performing VACUUM / VACUUM ANALYZE on the (few) affected tables 
every 12h or so fixes the performance problem for the particular queries 
without impacting the other users too much --- the tables and indexes in 
question have been moved to a separate tablespace/disk volume of their own.



In short, this addresses situations where some tables have a much higher 
update rate than the rest of the database, so that performance degrades 
with time --- the application became unusable after about 6 days' worth 
of updates until the manual vacuums were set up



 REINDEX CONCURRENTLY  (but not the blocking, regular, one)
 REFRESH MATERIALIZED VIEW CONCURRENTLY (but not the blocking one)

These are interesting, but would these make sense at the role level?
Both of these commands explicitly take specific relations to operate
against, after all.


Yup. Let's imagine a cron job invoking psql in order to perform a 
routine maintenance task.
The particular command(s) can be generated on-the-fly by querying the 
catalog and then sent in one go to be run sequentially by the one 
backend, as a crude form of rate limiting/quality-of-service of sorts 
(renice -p or even ionice -p seems quite inadequate).
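Something along these lines is what I have in mind (a sketch only; the schema 
name, file name and psql invocation are just an example):

    -- generate the maintenance commands from the catalog...
    SELECT format('VACUUM (ANALYZE) %I.%I;', n.nspname, c.relname)
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r' AND n.nspname = 'public';
    -- ...and pipe them back in:  psql -AtXq -f gen_vacuum.sql mydb | psql -Xq mydb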


This automation becomes impossible to do if the object owners differ 
(only the owner or a superuser can perform these operations AFAICS -- 
there is no mention of it in the current documentation) unless the DBA 
makes the maintenance role a member of every other role ... which 
quickly becomes a problem.



 COPY ???

The question around this one goes back to the CREATE DIRECTORY
discussion that happened this fall.  I'm still hopeful that we can do
*something* in this area, but I'm not sure what that's going to end up
looking like.  The problem with COPY is that it's either trivial to use
it to become a superuser, or insanely difficult to secure sufficiently.


Yes. That's the reason for the question marks  :-\
Some "dump to CSV, then load somewhere else" kinds of jobs might benefit 
from this feature, but I'm not sure the convenience is worth the risk.



 Rationale: delegate the routine maintenance tasks to a low
privilege role, which can't do harm (apart from some performance
degradation) --- hence the no exclusive locking operations
requirement.

This makes sense for the reindex/refresh cases, though no harm might
be over-stating it.


Well, it's performance degradation vs. DoS due to massive (exclusive) 
locking  :S
At least restricting it to one backend (connection_limit=1) allows for 
some rate limiting.



* IMPERSONATE --- Ability to do SET AUTHORIZATION TO some_role;
and RESET AUTHORIZATION
 This might be further refined to provide a way to say This role
is authorized to impersonate role1 but no other
 Rationale: for use by connection poolers (esp. pgBouncer), where
the role used for connection would only have the LOGIN and
IMPERSONATE privileges. The remaining operations would be authorized
against the supplanted role (i.e. ability to create tables/indexes
or views, perform DML and/or DDL, etc)
 AFAIK, a superuser role is needed for this purpose currently.

No..  You can have 'no-inherit' roles which you can use for exactly this
purpose.  The initial login role can have no rights on the database,
except to SET ROLE to other roles which have been granted to it.


Hmm... the current documentation states that "The specified role_name 
must be a role that the current session user is a member of."
I can see use cases where making the login role a member of every other 
used role quickly becomes a burden, and that's the main driver for this 
feature (I'm thinking about multiple app servers running several 
applications each, minimum two roles per application)
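For reference, this is the membership-based way of doing it today, and 
precisely the per-role GRANT burden I mean:

    -- what is already possible with NOINHERIT plus explicit memberships
    CREATE ROLE app1 NOLOGIN;
    CREATE ROLE bouncer LOGIN NOINHERIT;
    GRANT app1 TO bouncer;
    -- after connecting as "bouncer":
    SET ROLE app1;     -- allowed only because of the membership
    RESET ROLE;
    -- ...and the GRANT has to be repeated for every role to be impersonated.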



You should never have your pgBouncer or other pooling connection logging
in as a superuser.


At least the default pgBouncer config explicitly says (albeit for 8.2)
doc

Re: [HACKERS] [COMMITTERS] pgsql: Use a bitmask to represent role attributes

2014-12-23 Thread José Luis Tallón

On 12/23/2014 06:06 PM, Bruce Momjian wrote:

On Tue, Dec 23, 2014 at 11:34:09AM -0500, Tom Lane wrote:

Stephen Frost sfr...@snowman.net writes:

If that's the only consideration for this, well, that's certainly quite
straight-forward to change in the other direction too.  The new function
suggested by Andres actually makes it really easy to get a textual list
of all the role attributes which a role has from the bitmask too.

We could have that regardless of the representation, if the function is
defined along the lines of given a user OID, give me a text string
representing the user's attributes.  However, that only helps for
pg_dumpall and any other clients whose requirement is exactly satisfied
by a string that fits into CREATE/ALTER USER.  The current formatting
of psql's \du, for example, absolutely requires adding more client-side
code every time we add a property; whether the catalog representation is
bools or a bitmask really isn't going to change the pain level much there.

I am with Tom on this --- there is more wasted space in the 'name'
column pg_authid.rolname than by shoving 40 boolean values into a
bitmap.  Adding the complexity of a bitmap doesn't make sense here.
Well, the code simplification alone might be worth the effort... and it 
does make adding additional attributes easier.
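FWIW, the "given a role OID, give me a textual list of its attributes" helper 
could be approximated today over the existing booleans; a rough sketch (the 
function name and formatting are purely illustrative):

    CREATE FUNCTION role_attributes(roleid oid) RETURNS text
    LANGUAGE sql STABLE AS $$
        SELECT concat_ws(', ',
               CASE WHEN rolsuper       THEN 'SUPERUSER'   END,
               CASE WHEN rolcreaterole  THEN 'CREATEROLE'  END,
               CASE WHEN rolcreatedb    THEN 'CREATEDB'    END,
               CASE WHEN rolreplication THEN 'REPLICATION' END,
               CASE WHEN rolcanlogin    THEN 'LOGIN'       END)
        FROM pg_authid
        WHERE oid = roleid;
    $$;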



I also apologize for the late feedback.

Offtopic, what I would really _love_ to see improved is our display of
object permissions:

                                  Access privileges
 Schema |  Name  | Type  |     Access privileges      | Column privileges | Policies
--------+--------+-------+----------------------------+-------------------+----------
 public | crypto | table | postgres=arwdDxt/postgres +|                   |
        |        |       | =r/postgres                |                   |

That is nasty user display ---  it looks like line noise.


Hmm...  http://www.postgresql.org/docs/9.4/static/sql-grant.html does 
describe the mapping from letters to permissions, but I agree that it 
could be easier for beginners.
Any idea on how this display can be made more human friendly? (just 
for the sake of discussion --- I don't think I have time to do much 
about that, unfortunately)
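For instance, aclexplode() already gives a tabular, if verbose, view of the 
same information; perhaps \dp could grow something along these lines (just an 
illustration of the idea, reusing the "crypto" table from above):

    -- one row per grant, instead of the aclitem shorthand
    SELECT CASE WHEN grantee = 0 THEN 'PUBLIC'
                ELSE pg_get_userbyid(grantee) END AS grantee,
           privilege_type, is_grantable
    FROM pg_class, aclexplode(relacl)
    WHERE relname = 'crypto';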



Cheers,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: two new role attributes and/or capabilities?

2014-12-23 Thread José Luis Tallón

On 12/23/2014 07:01 PM, David G Johnston wrote:
Hmm the current documentation states that: The specified 
role_name must be a role that the current session user is a member 
of. I can see use cases where making the login role a member of every 
other used role quickly becomes a burden, and that's the main driver 
for this feature (I'm thinking about multiple app servers running 
several applications each, minimum two roles per application)

So you want to say:

GRANT IMPERSONATE TO bouncer; --covers the ALL requirement


Yes, and exclusively for this purpose.


instead of

GRANT victim1 TO bouncer;
GRANT victim2 TO bouncer;
etc...

-- these would still be used to cover the limited users requirement
?


Yup.


Seems contrary to the principle of least privilege goal...


We still wouldn't grant any CREATE DATABASE, CREATE TABLESPACE, 
CREATE/LOAD EXTENSION, CREATE LANGUAGE, etc
 (and the ability to create/use/manipulate data within the database 
will still be constrained by the impersonated login)



I'd rather there be better, more user friendly, SQL-based APIs to the
permissions system that would facilitate performing and reviewing grants.

+1. All suggestions welcome.

If something like IMPERSONATE was added I would strongly suggest a
corresponding [NO]IMPERSONATE for CREATE USER so that the admin can make
specific roles unimpersonable

Indeed, I had thought about this too.


- and also make SUPERUSER roles unimpersonable by rule.


Yes, of course. Otherwise, the distinction would not make any sense.


Thanks,

J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: two new role attributes and/or capabilities?

2014-12-23 Thread José Luis Tallón

On 12/23/2014 07:01 PM, David G Johnston wrote:

[snip]
So you want to say:

GRANT IMPERSONATE TO bouncer; --covers the ALL requirement

instead of

GRANT victim1 TO bouncer;
GRANT victim2 TO bouncer;
etc...

-- these would still be used to cover the limited users requirement
?


GRANT IMPERSONATE ON actual_role TO login_role

would actually get us closer to how some other databases do it, now 
that I think of it. This could be just some syntactic sugar.

Might definitely ease migrations, if nothing else.


I appreciate the feedback. Thanks!


/ J.L.



Re: [HACKERS] Proposal: two new role attributes and/or capabilities?

2014-12-23 Thread José Luis Tallón

On 12/23/2014 07:52 PM, Stephen Frost wrote:

[snip]
Manually performing VACUUM / VACUUM ANALYZE on the (few) affected
tables every 12h or so fixes the performance problem for the
particular queries without impacting the other users too much ---
the tables and indexes in question have been moved to a separate
tablespace/disk volume of their own.
Autovacuum can certainly run vacuum/analyze on a few tables every 12
hours, so I'm not really following where you see autovacuum being unable
to cope.  I agree that there *are* such cases, but getting more
information about those cases and exactly what solution *does* work
would really help us improve autovacuum to address those use-cases.


I'll try to. I don't have direct access, and the use case is quite an edge 
case, to be fair.

Plus, the configuration and hardware leaves quite a bit to be desired...

...but it's a real use case and the solution (even if only treating the 
symptoms) is quite straight-forward and easy.

In short, this addresses situations where some tables have a much
higher update rate than the rest of the database so that performance
degrades with time --- the application became unusable after about 6
days' worth of updates until the manual vacuums were setup

This really looks like a configuration issue with autovacuum..  Perhaps
you need to make it more aggressive than the default and have it run
more threads?


Yes to both. Up to a point where it actually affected performance a bit.
But basically only a few tables exhibited this behaviour among several 
hundred in this particular situation.



Have you turned the autovacuum logging up all the way?
Is autovacuum giving up due to locking?
Not one of my systems, and I don't have access to it anymore, but I 
don't think this was the reason.


However, having some hundred million deleted rows piling up every few hours 
increases the load quite a bit. For the record, the (closed-source) 
application did issue the DELETEs on the table, so partitioning + 
TRUNCATE child_part was not applicable.



In any case, I was aiming at making this kind of operation possible and 
easier --- regardless of whether it is solving the right problem or 
not, or whether there exists an optimal solution --- since I have seen 
some real-life setups that could benefit from it.
I agree that routine index maintenance is a better match for this 
feature, though :)

 REINDEX CONCURRENTLY  (but not the blocking, regular, one)
 REFRESH MATERIALIZED VIEW CONCURRENTLY (but not the blocking one)

These are interesting, but would these make sense at the role level?
Both of these commands explicitly take specific relations to operate
against, after all.

Yup. Let's imagine a cron job invoking psql in order to perform
maintenance routine.

If they make sense at a relation level then they should be
relation-level GRANT'd permissions, not role-level attributes.


Same as before.
Let's imagine this coupled with REINDEX SCHEMA CONCURRENTLY ... or 
simply when constructing the list of tables dynamically and there is no 
other use for such a grant.
Arguably, this isn't that much of a problem if there exists a way to 
easily revoke all such permissions from all objects in one go (just like 
recently discussed in another thread)



The particular command(s) can be generated on-the-fly by querying
the catalog and then send them in one go to be run sequentially by
the one backend as a crude form of rate
limiting/quality-of-service of sorts (renice -p or even ionice
-p seems quite inadequate).

This sounds like it's something that we might want an autovacuum-like
background process to handle..  Some kind of auto-reindex-concurrently.
There are already plans to deal with updating of materialized views, as
I understand it.


While I can definitely see it for materialized views (they *are* 
views, after all), this pattern potentially gets us adding everything 
but the kitchen sink inside the database.
FWIW, it's only a matter of providing a mechanism for maintenance 
routines to use very unprivileged users to perform their duties on the 
whole cluster without having to explicitly grant permissions and/or 
include these into another, regular, role.
Please keep in mind that these roles [having only LOGIN and 
MAINTENANCE] would NOT be able to perform any DML or DDL whatsoever, nor 
any queries (unless explicitly granted permission for SELECTs).



[snip]

Yes. That's the reason for the question marks  :-\
Some dump to csv then load somewhere else kind of jobs might
benefit from this feature, but I'm not sure the convenience is worth
the risk.

I've run into quite a few processes which would really benefit from
this, and would even be safe to use (the processes running the COPY
commands don't have any rights on the directories except through PG),
but it's not clear if that use-case is sufficiently broad for the
feature to be worthwhile..  At least, some feel it isn't.  Can you
describe your use-case more and perhaps the needle will move on 

Re: [HACKERS] Proposal VACUUM SCHEMA

2014-12-22 Thread José Luis Tallón

On 12/21/2014 10:30 PM, Fabrízio de Royes Mello wrote:

[snip]


I do agree that VACUUM SCHEMA might very well be useful (I'll probably 
use it myself from time to time, too).
ANALYZE SCHEMA (especially coupled with some transaction-wide SET 
statistics_target) could be beneficial.




 And why that, but not
 say schema-wide ANALYZE, CLUSTER, TRUNCATE, ...


+1. I can write patches for each of this maintenance statement too.


Hmm... I think Tom might have been a bit rhetorical (or even sarcastic) 
with that, but I can definitely be wrong.


Do we really want to have some such operation potentially (and 
inadvertently) locking for *hours* at a time?


CLUSTER SCHEMA somename;

... where schema somename contains myHugeTable

Given that the CLUSTER command exclusively locks and rewrites the 
table, it might block queries and overwhelm the I/O subsystem for quite a 
long time.



TRUNCATE SCHEMA whatever sounds quite dangerous, too.



Just my .02€

/ J.L.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-15 Thread José Luis Tallón

On 12/15/2014 07:42 AM, Claudio Freire wrote:

[snip]


If you do that, you start with empty partitions, and each insert 
updates the BRIN tuple. Avoiding concurrency loss in this case would 
be tricky, but in theory this could allow very general partition 
exclusion. In fact it could even work with constraint exclusion right 
now: you'd have a single-tuple BRIN index for each partition and 
benefit from it. But you don't need to pay the price of updating BRIN 
indexes, as min-max tuples for each partition can be produced while 
creating the partitions if the syntax already provides the 
information. Then, it's just a matter of querying this meta-data which 
just happens to have the form of a BRIN tuple for each partition.


Yup. Indeed this is the way I outlined in my previous e-mail.

The only point being: Why bother with BRIN when we already have the 
range machinery, and it's trivial to add pointers to partitions from 
each range?


I suggested that BRIN would solve a situation where the number of 
partitions is huge (say, thousands) and we might need to be able to 
efficiently locate the appropriate partition. In this situation, a 
linear search might become prohibitive, or the data structure (a simple 
B-Tree, maybe) become too big to be worth keeping in memory. This is 
where being able to store the partition index on disk would be 
interesting.


Moreover, I guess that ---by using this approach 
(B-Tree[range] -> partition_id and/or BRIN)--- we could efficiently answer 
the question "do we have any tuple with this key in some partition?", 
which AFAICS is pretty close to us having global indexes.




Regards,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-13 Thread José Luis Tallón

On 12/12/2014 05:43 AM, Amit Langote wrote:

[snip]
In case of what we would have called a 'LIST' partition, this could look like

... FOR VALUES (val1, val2, val3, ...)

Assuming we only support partition key to contain only one column in such a 
case.


Hmmm….

[...] PARTITION BY LIST(col1 [, col2, ...])

just like we do for indexes would do.


and CREATE PARTITION child_name OF parent_name
FOR [VALUES] (val1a,val2a), (val1b,val2b), (val1c,val2c)
[IN tblspc_name]

just like we do for multi-valued inserts.


In case of what we would have called a 'RANGE' partition, this could look like

... FOR VALUES (val1min, val2min, ...) TO (val1max, val2max, ...)

How about BETWEEN ... AND ... ?


Unless I'm missing something obvious, we already have range types for 
this, don't we?


...   PARTITION BY RANGE (col)

CREATE PARTITION child_name OF parent_name
FOR [VALUES] '[val1min,val1max)', '[val2min,val2max)', 
'[val3min,val3max)'

[IN tblspc_name]

and I guess this should simplify a fully flexible implementation (if you 
can construct a RangeType for it, you can use that for partitioning).
This would replace the ugly (IMHO) "VALUES LESS THAN" syntax with a 
more flexible one
(even though it might end up being converted into "less than" 
boundaries internally for implementation/optimization purposes)


In both cases we would need to allow for overflows / default partition 
different from the parent table.



Plus some ALTER PARTITION part_name TABLESPACE=tblspc_name


The main problem being that we are assuming named partitions here, which 
might not be that practical at all.



[snip]
I would include the noise keyword VALUES just for readability if 
anything. 


+1


FWIW, deviating from already-standard syntax (Oracle-like --as 
implemented by PPAS, for example-- or DB2-like) is quite 
counter-productive unless we have very good reasons for it... which 
doesn't mean that we have to do it exactly like they do (especially if we 
would like to go the incremental implementation route).


Amit: mind if I add the DB2 syntax for partitioning to the wiki, too?

This might as well help with deciding the final form of 
partitioning (and define the first implementation boundaries, too)



Thanks,

/ J.L.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-13 Thread José Luis Tallón

On 12/13/2014 03:09 AM, Alvaro Herrera wrote:

[snip]
Arbitrary SQL expressions (including functions) are not the thing to use
for partitioning -- at least that's how I understand this whole
discussion.  I don't think you want to do proofs as such -- they are
expensive.


Yup. Plus, it looks like (from reading Oracle's documentation) they end 
up converting the LESS THAN clauses into range lists internally.

Anyone that can attest to this? (or just disprove it, if I'm wrong)

I just suggested using the existing RangeType infrastructure for this 
(the @>, <@ and && operators, specifically, might do the trick) before 
reading your mail citing BRIN.
... which might as well allow some interesting runtime 
optimizations when range partitioning is used and *a huge* number of 
partitions get defined --- I'm specifically thinking about massive OLTP 
with very deep (say, 5 years' worth) archival partitioning where it 
would be inconvenient to have the tuple routing information always in 
memory.
I'm specifically suggesting some (range_value -> partitionOID) mapping 
using a BRIN index for this --- it could be auto-created just like we do 
for primary keys.
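To illustrate the kind of lookup I mean (the mapping table below is purely 
hypothetical, standing in for whatever BRIN-like structure would actually 
hold it):

    -- hypothetical (range_value -> partition OID) mapping plus the routing lookup
    CREATE TABLE part_map (bounds daterange NOT NULL, partrelid oid NOT NULL);
    INSERT INTO part_map VALUES ('[2014-01-01,2015-01-01)', 'orders_2014'::regclass);

    SELECT partrelid::regclass
    FROM part_map
    WHERE bounds @> DATE '2014-12-13';   -- range containment does the routing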


Just my 2c


Thanks,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On partitioning

2014-12-13 Thread José Luis Tallón

On 12/13/2014 05:57 PM, José Luis Tallón wrote:

On 12/13/2014 03:09 AM, Alvaro Herrera wrote:

[snip]
Arbitrary SQL expressions (including functions) are not the thing to use
for partitioning -- at least that's how I understand this whole
discussion.  I don't think you want to do proofs as such -- they are
expensive.


Yup. Plus, it looks like (from reading Oracle's documentation) they 
end up converting the LESS THAN clauses into range lists internally.

Anyone that can attest to this? (or just disprove it, if I'm wrong)

I just suggested using the existing RangeType infrastructure for this 
(the @>, <@ and && operators, specifically, might do the trick) before 
reading your mail citing BRIN.
... which might as well allow some interesting runtime 
optimizations when range partitioning is used and *a huge* number of 
partitions get defined --- I'm specifically thinking about massive 
OLTP with very deep (say, 5 years' worth) archival partitioning where 
it would be inconvenient to have the tuple routing information always 
in memory.
I'm specifically suggesting some (range_value -> partitionOID) 
mapping using a BRIN index for this --- it could be auto-created just 
like we do for primary keys.


Reviewing the existing documentation on this topic, I have stumbled upon an 
e-mail by Simon Riggs from almost seven years ago

http://www.postgresql.org/message-id/1199296574.7260.149.ca...@ebony.site

 where he suggested a way of physically partitioning tables by using 
segments, in a way that sounds quite close to what we are proposing 
here.


ISTM that the partitioning meta-data might very well be augmented a bit 
in the direction Simon pointed to, adding support for effectively 
read-only and/or explicitly marked read-only PARTITIONS (not segments 
in this case) for an additional optimization. We would need some syntax 
additions (ALTER PARTITION name SET READONLY) in this case.

This feature can be added later on, of course.


I'd like to explicitly remark on the potentially performance-enhancing 
effect of fillfactor=100 (cfr. 
http://www.postgresql.org/docs/9.3/static/sql-createtable.html) and 
partitions marked effectively read-only (cfr. Simon's proposal) when 
coupled with a full-scan ANALYZE vs. the regular sample-based analyze 
that autovacuum performs.
When a partition consists of multiple *segments*, a generalization of 
the proposed BRIN index (to cover segments in addition to partitions) 
will further speed up scans.





Just for the record, allowing some partitions to be moved to foreign 
tables (i.e. foreign servers, via postgres_fdw) will multiply the 
usefulness of this partitioned-table-wide BRIN index... now 
becoming a real global index.



Just my 2c


Thanks,

/ J.L.







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread José Luis Tallón

On 12/02/2014 08:21 PM, Andres Freund wrote:

[snip]

2. Instead of the single amdata field, make it possible for the
implementation to define any number of fields with any datatype in the
tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence.


Hmm... having done my fair bit of testing, I can say that this isn't 
actually that bad (though it depends heavily on the underlying filesystem 
and workload, of course).
With this approach, I fear extreme I/O contention with an update-heavy 
workload... unless all sequence activity is finally WAL-logged and hence 
writes to the actual files become mostly sequential and asynchronous.


May I possibly suggest a file-per-schema model instead? This approach 
would certainly solve the excessive i-node consumption problem that --I 
guess-- Andres is trying to address here.
Moreover, the one file per schema for sequences solution would fit a 
quite common model of grouping tables (in schemas) for physical 
[tablespace] location purposes

Currently, with
our without seqam, it'd not be all that hard to force it into a catalog,
taking care to to force each tuple onto a separate page...


IMHO, this is just as wasteful as the current approach (one-page file per 
sequence) in terms of disk usage, and complicates the code a bit... but 
I really don't see how we can have more than one sequence state per page 
without severe (page) locking problems.
However, someone with deeper knowledge of page pinning and buffer 
manager internals could certainly devise a better solution...


Just my 2c

Thanks,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2014-12-03 Thread José Luis Tallón

On 12/03/2014 11:24 AM, Andres Freund wrote:

On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:

[snip]

I don't think the WAL logging would need to change much in comparison to
the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks,
because the pg_sequence one could be a single fastpath lock. Currently
we have to take the sequence's relation lock (heavyweight) and then the
the page level lock (lwlock) for every single sequence used.


Got it, thank you for the explanation.


May I possibly suggest a file-per-schema model instead? This approach would
certainly solve the excessive i-node consumption problem that --I guess--
Andres is trying to address here.

I don't think that really has any advantages.


Just spreading the I/O load, nothing more, it seems:

Just to elaborate a bit on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB, we'd have 
(1048576 kB / 8 kB per page) = 128k sequences per relation segment.
Arguably, not many real use cases will have that many sequences, save 
for *massively* multi-tenant databases.


The downside being that all that random I/O --- in general, it can't 
really be sequential unless there are very very few sequences--- can't 
be spread to other spindles. Create a sequence_default_tablespace GUC 
+ ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe?

 (I could take a shot at the patch, if deemed worthwhile)
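The knob I have in mind would look roughly like this (hypothetical syntax 
only, none of it exists today):

    SET sequence_default_tablespace = 'seq_ssd';
    ALTER SEQUENCE orders_id_seq SET TABLESPACE seq_ssd;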


[snip]

The overhead of a file is much more than wasting the remainder of a
page. Alone the requirement of separate fsyncs and everything is pretty
bothersome. The generated IO patterns are also much worse...


Yes, you are right. I stand corrected.


[snip]
I think there's pretty much no chance of accepting more than one page
per sequence


Definitively.


Thanks,

J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PITR failing to stop before DROP DATABASE

2014-11-25 Thread José Luis Tallón

On 11/25/2014 11:01 PM, Tomas Vondra wrote:

[snip]

So you could see the same after crash recovery, but it's a lot easier to
reproduce with PITR.

So we remove the files, and if there happens to be a crash at the right
moment, it results in a database with a record in pg_database, but no
directory/files?

Is it possible to cancel the command half-way through, leaving the
database in an essentially broken state (half the files exists, half is
already deleted?



This could be fixed by doing DROP DATABASE the same way we do DROP
TABLE. At the DROP DATABASE command, just memorize the OID of the
dropped database, but don't delete anything yet. Perform the actual
deletion after flushing the commit record to disk. But then you would
have the opposite problem - you might be left with a database that's
dropped according to pg_database, but the files are still present on disk.


A rename from oid/ to oid.del/ is indeed atomic ...

... and so DROP DATABASE could become:
 - lock database (sic)
 - rename dir
 - delete from pg_database
 - flush commit record to disk
 - background? deletion of all files
 (the command would not return until this last phase finished)

If we cancel before flushing, the undo (and recovery after a crash) 
would just rename the directory back in place.
If we flushed, recovery after re-starting postmaster would simply delete 
the remaining files within oid.del/ and then rmdir the directory itself.


  ... since essentially any other possibility will just leave an 
inconsistent db around and waste space unnecessarily.


Just my two cents, of course.


I'm in favor of modifying DROP DATABASE so that it behaves like DROP
TABLE. I find it more consistent (two DROP commands should not really do
such wildly different things).


+1

ISTM this would allow us to remove the ForceSyncCommit(); and allow DROP
DATABASE to be executed within a transaction (say, within a management
script etc.).



Regards,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes superuser review

2014-11-21 Thread José Luis Tallón

On 11/06/2014 03:31 AM, Robert Haas wrote:

[snip]

We haven't reached consensus on this one yet and I didn't want it to fall
too far off the radar.

Here is what I summarize as the current state of the discussion:

1. Syntax:

ALTER ROLE role { ADD | DROP } CAPABILITY capability


Though a bit late to this thread, I would like to request comments on 
potentially beneficial new roles and/or capabilities which I have 
recently found myself needing.

The suggested syntax looks intuitive and potentially very flexible.
I'll try to summarize what I recall from the thread plus my own 
itches, to try and get others to comment and expand on the matter.


We currently have:
* SUPERUSER/ CREATEUSER
* CREATEDB
* CREATEROLE
* LOGIN
* REPLICATION

(plus INHERITS and ADMIN options, of course)

It has also been suggested to include a
* BACKUP role (capability?), i.e. the ability to take a snapshot and 
read all relations, views, triggers and functions (even bypassing RLS) 
and the catalog, in order to produce a full, consistent dump of the whole 
cluster.


and I seem to recall something along the lines of
* AUDIT, potentially limited to just engage

I am hereby suggesting the addition of a
* MAINTENANCE role, which would be able to perform VACUUM, ANALYZE, 
REINDEX *CONCURRENTLY* and REFRESH MATERIALIZED VIEW *CONCURRENTLY* ... 
and potentially even ALTER TABLE VALIDATE CONSTRAINT (if we are able to 
produce a non-blocking/fully concurrent version)


... which might become very useful for DBAs wishing to use some 
password-less roles for scheduled maintenance routines while at the same 
time reducing the exposure.



While at it, the replication role might as well gain the ability to 
promote/demote a cluster (standby <-> active), or should it be some kind of 
FAILOVER role/capability?




Thanks in advance.

/ J.L.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PageRepairFragmentation performance

2014-11-18 Thread José Luis Tallón

On 11/18/2014 07:03 PM, Heikki Linnakangas wrote:
When profiling replay the WAL generated by pgbench, I noticed the 
PageRepairFragmentation consumes a large fraction of the CPU time:

[snip]

1. Replace the qsort with something cheaper. The itemid arrays being 
sorted are small, a few hundred item at most, usually even smaller. In 
this pgbench test case I used, the typical size is about 60. With a 
small array a plain insertion sort is cheaper than the generic 
qsort(), because it can avoid the function overhead etc. involved with 
generic qsort. Or we could use something smarter, like a radix sort, 
knowing that we're sorting small integers. Or we could implement an 
inlineable version of qsort and use that.
IIRC, the theoretical complexity of quicksort and radix sort 
should be approximately the same for 256-1024 items... O(n*log(n)) vs 
O(d*n), where d is ~log2(n), or just 16 in this case. However, 
lexicographical (bitstring-wise) ordering might not be what we are 
aiming for here.


AFAIK, an inlined quicksort should be about the best performing sort 
available (most of the enhancement coming from staying within the I-cache)


2. Instead of sorting the array and using memmove in-place, we could 
copy all the tuples to a temporary buffer in arbitrary order, and 
finally copy the temporary copy back to the buffer. That requires two 
memory copies per tuple, instead of one memmove, but memcpy() is 
pretty darn fast. It would be a loss when there are only a few large 
tuples on the page, so that avoiding the sort doesn't help, or when 
the tuples are mostly already in the correct places, so that most of 
the memmove()s are no-ops. But with a lot of small tuples, it would be 
a win, and it would be simple.


memmove() *should* be no slower than memcpy() if both are actually 
translated by the compiler to use intrinsics, as opposed to calling the 
functions --- as seems to be done here (cfr. __memmove_ssse3_back).
A simple "if" in order to eliminate the no-op memmoves might as well do 
it, too.


Just my two (euro) cents, though

The second option would change behaviour slightly, as the tuples would 
be placed on the page in different physical order than before. It 
wouldn't be visible to to users, though.


I spent some time hacking approach 1, and replaced the qsort() call 
with a bucket sort. I'm not sure if a bucket sort is optimal, or 
better than a specialized quicksort implementation, but it seemed simple.


With the testcase I've been using - replaying about 2GB of WAL 
generated by pgbench - this reduces the replay time from about 55 s to 
45 s.


Not bad at all... though I suspect most of it might come from staying 
within the I-cache as opposed to regular qsort.

The smaller itemIdSortData structure surely helps a bit, too :)

Thoughts? Attached is the patch I put together. It's actually two 
patches: the first is just refactoring, putting the common code 
between PageRepairFragmentation, PageIndexMultiDelete, and 
PageIndexDeleteNoCompact to function. The second replaces the qsort(). 


Definitely worthwhile, even if just for the refactoring. The speed-up 
sounds very good, too.




Thanks,

/ J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-03 Thread José Luis Tallón

On 10/03/2014 11:02 AM, Dimitri Fontaine wrote:

Jim Nasby jim.na...@bluetreble.com writes:

EXPLAIN
ALTER TABLE 

I'm thinking it would be better to have something you could set at a session
level, so you don't have to stick EXPLAIN in front of all your DDL.


We were considering the potential needs of "accidental DBAs" here at 
first, if memory serves me well.
I definitely see the value of EXPLAIN [DDL STATEMENT]... even if 
implemented as a regular "this is what should happen", without even 
attempting to run a thing (not even dry-run transactions), but 
including the full catalog search / attribute resolution etc.
Providing insight into the locking that would happen sounds like a real 
life-saver for many real-life situations (i.e. "would this ALTER 
completely lock my application due to the constant logging-to-table?")

This is, obviously IMHO.

Yeah I'm coming into that camp too, and I think the Event Trigger idea
gets us halfway there. Here's a detailed sketched of how it would work:

  1. preparatory steps: install the Event Trigger
  
 create extension norewrite;


  2. test run:

 psql -1 -f ddl.sql
 ERROR: Table Rewrite has been cancelled.

  3. Well actually we need to run that thing in production

 BEGIN;
   ALTER EVENT TRIGGER norewrite DISABLE;
   \i ddl.sql
   ALTER EVENT TRIGGER norewrite ENABLE;
 COMMIT;

Then it's also possible to have another Event Trigger that would
automatically issue a LOCK table NOWAIT; command before any DDL
against a table is run, in another extension:

   create extension ddl_lock_nowait;

The same applies, if your production rollout is blocked repeatedly and
you want to force it through at some point, it's possible to disable the
event trigger within the DDL script/transaction.


This serves a different purpose which is, at least, as worthwhile as the 
former: provide a real dry run mechanism for advanced users.
Stephen's delta fork sounds like a promising approach ... even if a 
bit too Oracle-ish (sounds an awful lot like UNDO logs!) for my liking.



As for the dry-run idea, I don't think that's really necessary. I've never
seen anyone serious that doesn't have a development environment, which is
where you would simply deploy the real DDL using verbose mode and see what
the underlying commands actually do.

The major drawback of the Event Trigger idea is that the transaction is
cancelled as soon as a Rewrite Event is fired when you have installed
the protective trigger. It means that you won't see the next problem
after the first one, so it's not a dry-run.

But considering what you're saying here, it might well be enough.


It is a very convenient first step (minimally invasive, and good use of 
existing infrastructure)... since it allows an easy testing phase in 
order to iron out potential shortcomings and gather input on some other 
applications.
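For the record, assuming the table_rewrite event tag (and its 
pg_event_trigger_table_rewrite_oid() accessor) that this work is meant to 
provide, the norewrite extension sketched above could be little more than 
this (function name and message wording are illustrative only):

    CREATE FUNCTION norewrite() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        RAISE EXCEPTION 'Table Rewrite has been cancelled: %',
                        pg_event_trigger_table_rewrite_oid()::regclass;
    END;
    $$;

    CREATE EVENT TRIGGER norewrite ON table_rewrite
        EXECUTE PROCEDURE norewrite();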



Thanks,

/ Jose



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DDL Damage Assessment

2014-10-02 Thread José Luis Tallón
On 10/02/2014 06:30 PM, Dimitri Fontaine wrote:
 Hi fellow hackers,
 [snip]
 Questions:

  1. Do you agree that a systematic way to report what a DDL command (or
 script, or transaction) is going to do on your production database
 is a feature we should provide to our growing user base?

Yes, please

  2. What do you think such a feature should look like?
EXPLAIN [(verbose, format)] [DDL_COMMAND]

as in:
EXPLAIN (verbose on, format text, impact on)
ALTER TABLE emp
ADD COLUMN foo2 jsonb NOT NULL DEFAULT '{}';

where the output would include something like:

...
EXCLUSIVE LOCK ON TABLE emp;   // due to IMPACT ON
REWRITE TABLE emp due to adding column foo2 (default='{}'::jsonb)   
// due to VERBOSE on
...


 3. Does it make sense to support the whole set of DDL commands from the
 get go (or ever) when most of them are only taking locks in their
 own pg_catalog entry anyway?

For completeness' sake, yes.
But, unless the "impact" and "verbose" modifiers are specified, most
would be quite self-explanatory:

EXPLAIN (verbose on, impact on) TRUNCATE TABLE emp;
Execution plan:
- EXCLUSIVE LOCK ON TABLE emp;

- truncate index: II (file=N)               // N = relfilenode
- truncate main fork: N (tablespace: T)     // N = relfilenode
- truncate visibility map

- RELEASE LOCK ON TABLE emp;

Summary: Z pages ( MMM MB ) would be freed

versus a simple:
EXPLAIN TRUNCATE TABLE emp;
Execution plan:
- truncate index: emp_pkey
- truncate index: emp_foo2_idx
- truncate relation emp


 Provided that we are able to converge towards a common enough answer to
 those questions, I propose to hack my way around and send patches to
 have it (the common answer) available in the next PostgreSQL release.


Sounds very good, indeed.
Count on me as a tester :)


--
José Luis Tallón




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changing the concept of a DATABASE

2012-05-22 Thread José Luis Tallón

On 22/05/12 11:46, Simon Riggs wrote:

On 21 May 2012 20:40, Stephen Frostsfr...@snowman.net  wrote:

This is important. I like the idea of breaking down the barriers
between databases to allow it to be an option for one backend to
access tables in multiple databases. The current mechanism doesn't
actually prevent looking at data from other databases using internal
APIs, so full security doesn't exist. It's a very common user
requirement to wish to join tables stored in different databases,
which ought to be possible more cleanly with correct privileges.

That's really a whole different ball of wax and I don't believe what
Robert was proposing would actually allow that to happen due to the
other database-level things which are needed to keep everything
consistent...  That's my understanding, anyway.  I'd be happy as anyone
if we could actually make it work, but isn't like the SysCache stuff per
database?  Also, cross-database queries would actually make it more
difficult to have per-database roles, which is one thing that I was
hoping we might be able to work into this, though perhaps we could have
a shared roles table and a per-database roles table and only 'global'
roles would be able to issue cross-database queries..


IMVHO:  s/database/schema/g does resolve many of the problems that you 
were referring to... and 'dblink' should solve the rest, right?
Please, feel free to point out what I am (most probably) not considering 
-- not experienced enough yet :)


On the other hand, the separation of databases allows what otherwise 
would only be possible by using multiple instances of the database 
server (à la Oracle, AFAIK) -- save for resource management, but that 
is another question altogether.



So collecting a few requirements from various places:

* Ability to have a Role that can only access one Database


Yes, please


* Allow user info to be dumped with a database, to make a db
completely self-consistent

+1


* Allow databases to be transportable
+1. Ideally, the binary format could be made platform-independent, so 
that a snapshot/rsync of the cluster can span architectures easily.
AFAIK, an endianness change is relatively cheap on current processors [1 
ASM instruction?] and it's not like we are memory-mapping tuples anyway 
(TOASTed values can certainly not be mapped), so it shouldn't be 
noticeable performance-wise.



* Allow users to access tables in1 database easily, with appropriate rights.


See above, but I am probably wrong ...



I don't see any reasons why these things would be against each other.


They look quite orthogonal to me.


The main objectives are to make a Database a more easily used
administrative grouping. At present, people who use multiple Databases
face many problems - they aren't as separate as you'd like, but
neither can they be ignored when required.

The idea of one main database per session is fine, but wiring it so
closely into the backend has a few disadvantages, many of them weird
internal things.

Are there arguments against those requirements before we spend time on
design/thinking?


OTOH, the postmaster/cluster <-> session/database coupling looks clean and 
simple to me... and seems to make the code simpler. This can only be 
good (but again, I don't know enough yet to be sure).


Regards,

Jose Luis Tallon




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changing the concept of a DATABASE

2012-05-22 Thread José Luis Tallón

On 22/05/12 13:24, Simon Riggs wrote:

On 22 May 2012 12:05, José Luis Tallónjltal...@nosys.es  wrote:


IMVHO:  s/database/schema/g does resolve many of the problems that you were
referring to... and 'dblink' should solve the rest, right?
Please, feel free to point out what I am (most probably) not considering --
not experienced enough yet :)

The choice of schema/database is an important one. If you get it
wrong, you are in major difficulty. In many cases schemas would be a
better choice, but not in all cases. So I'm interested in solving the
problems for people who have multiple databases on same server.


Ok. Understood.
Thank you for the clarification


dblink is the only solution, but its very poor way to do this when we
have 2 databases on same server.

My thinking is that reaching out to multiple databases is actually
mostly easy, except in a few places where dbid is hardwired into the
backend.


The only drawback I see is that it might weaken the separation.

Even though arguably a kludge, dblink could have a shortcut added, 
whereby connections to another database within the same cluster would be 
serviced directly within the backend, as opposed to opening a new db 
connection. This is effectively a fast path within dblink, which 
optimizes a relatively common case while at the same time not losing 
generality.
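i.e. today's spelling of such a cross-database query, which the fast path 
would keep accepting unchanged while skipping the extra connection for 
cluster-local targets:

    CREATE EXTENSION dblink;
    SELECT *
    FROM dblink('dbname=other_db', 'SELECT id, name FROM t')
         AS remote(id int, name text);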



On the other hand, the separation of databases allows what otherwise would
only be possible by using multiple instances of the database server (à la
Oracle, AFAIK ) -- save for resource management, but that is another
question whatsoever.

Separation of databases is fine. I have no intention to change that,
as long as the user wishes that.


Perfect.

Thanks,

Jose Luis Tallon


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changing the concept of a DATABASE

2012-05-22 Thread José Luis Tallón

On 22/05/12 13:47, Simon Riggs wrote:

On 22 May 2012 12:35, Florian Pflugf...@phlo.org  wrote:

* Allow users to access tables in1 database easily, with appropriate rights.

That one I'm very sceptical about. In the long run, I think we want better
separation of databases, not less, and this requirement carries a huge risk
of standing in the way of that. Also, I think that once we integrate the 
postgres
FDW into core (that's the plan in the long run, right?), we're going to get
a good approximation of that essentially for free.

It's a poor approximation of it, free or not.

If it actually worked well, I'd be happy. It doesn't. No proper
transaction support, no session pool, poor planning etc etc. At best
its a band-aid, not something sufficiently good to recommend for
general production use.

- Transaction support: should be fixed, I guess.

- Session pool: is this really needed? I would do it externally -- for 
example, by specifying a connection string to a pgPool/pgBouncer as 
opposed to directly to the db server.


- Planning: add a tunable specifying a higher cost (with an exception for 
cluster-local dblinks, which would have a lower cost), and the rest 
should be straightforward. Of course, planning wouldn't be as accurate 
---we can't access the other db's statistics in order to potentially 
rewrite conditions---, but I don't think that would be a proper approach 
(separation of concerns, again)
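For illustration only, the kind of per-server cost knob I mean, expressed 
with the per-server options postgres_fdw provides (shown purely as a sketch; 
a cluster-local fast path could simply default these much lower):

    CREATE SERVER local_peer FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (dbname 'other_db',
                 fdw_startup_cost '1',      -- vs. the usual default of 100
                 fdw_tuple_cost   '0.001'); -- vs. the usual default of 0.01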



FDWs are very good but aren't good enough for intra-database usage.


The optimization I just proposed (plus the required fixes to FDW) might 
very well solve this, while providing useful enhancements for all users, 
whether they are accessing cluster-local databases or not (or even truly 
foreign datasources)



Regards,

Jose Luis Tallon


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers