Re: [GENERAL] regclass and format('%I')

2015-03-19 Thread Jason Dusek
On 15 March 2015 at 08:44, Tom Lane  wrote:

> "David G. Johnston"  writes:
> > ​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see
> how
> > it is possible ​for format to lay in a value at %I that is any more
> > insecure than the current behavior.  If the input string already matches
> > that pattern then it could be output as-is without any additional risk
> and
> > with the positive benefit of making this case work as expected.  The
> broken
> > case then exists when someone actually intends to name their identifier
> > <"something"> which then correctly becomes <"""something"""> on output.
>
> But that's exactly the problem: you just broke a case that used to work.
> format('%I') is not supposed to guess at what the user intends; it is
> supposed to produce a string that, after being passed through identifier
> parsing (dequoting or downcasing), will match the input.  It is not
> format's business to break that contract just because the input has
> already got some double quotes in it.
>
> An example of where this might be important is if you're trying to
> construct a query with arbitrary column headers in the output.  You
> can do
> format('... AS %I ...', ..., column_label, ...)
> and be confident that the label will be exactly what you've got in
> column_label.  This proposed change would break that for labels that
> happen to already have double-quotes --- but who are we to say that
> that can't have been what you wanted?
>

I agree with Tom that we shouldn't key off of contents in the string to
determine whether or not to quote. Introducing the behave I describe in an
intuitive way would require some kind of type-specific handling in
format(). I'm not sure what the cost of this is to the project, but David
makes the very reasonable point that imposing the burden of choosing
between `%s` and `%I` opens up the possibility of confusing vulnerabilities.

Kind Regards,
  Jason Dusek


Re: [GENERAL] regclass and format('%I')

2015-03-15 Thread David G. Johnston
On Sunday, March 15, 2015, Tom Lane  wrote:

> "David G. Johnston" > writes:
> > ​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see
> how
> > it is possible ​for format to lay in a value at %I that is any more
> > insecure than the current behavior.  If the input string already matches
> > that pattern then it could be output as-is without any additional risk
> and
> > with the positive benefit of making this case work as expected.  The
> broken
> > case then exists when someone actually intends to name their identifier
> > <"something"> which then correctly becomes <"""something"""> on output.
>
> But that's exactly the problem: you just broke a case that used to work.
> format('%I') is not supposed to guess at what the user intends; it is
> supposed to produce a string that, after being passed through identifier
> parsing (dequoting or downcasing), will match the input.  It is not
> format's business to break that contract just because the input has
> already got some double quotes in it.
>
> An example of where this might be important is if you're trying to
> construct a query with arbitrary column headers in the output.  You
> can do
> format('... AS %I ...', ..., column_label, ...)
> and be confident that the label will be exactly what you've got in
> column_label.  This proposed change would break that for labels that
> happen to already have double-quotes --- but who are we to say that
> that can't have been what you wanted?
>
> regards, tom lane
>

Ok, but that doesn't impact security.

Contracts can be amended to be more practical and what is being suggested
is not as radical as you make it out to be.  I'm with Jason and dislike the
"use %s" option and having a behavior of "quote if necessary" is not
unreasonable.  Maybe it needs to be a different code for compatibility
reasons.  In hindsight I'd make %Q the current unconditional quoting
behavior and use %I for conditional quoting but using %Q for that now would
at least make the behavior available.

David J.


Re: [GENERAL] regclass and format('%I')

2015-03-15 Thread Tom Lane
"David G. Johnston"  writes:
> ​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
> it is possible ​for format to lay in a value at %I that is any more
> insecure than the current behavior.  If the input string already matches
> that pattern then it could be output as-is without any additional risk and
> with the positive benefit of making this case work as expected.  The broken
> case then exists when someone actually intends to name their identifier
> <"something"> which then correctly becomes <"""something"""> on output.

But that's exactly the problem: you just broke a case that used to work.
format('%I') is not supposed to guess at what the user intends; it is
supposed to produce a string that, after being passed through identifier
parsing (dequoting or downcasing), will match the input.  It is not
format's business to break that contract just because the input has
already got some double quotes in it.

An example of where this might be important is if you're trying to
construct a query with arbitrary column headers in the output.  You
can do
format('... AS %I ...', ..., column_label, ...)
and be confident that the label will be exactly what you've got in
column_label.  This proposed change would break that for labels that
happen to already have double-quotes --- but who are we to say that
that can't have been what you wanted?

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread David G. Johnston
On Sat, Mar 14, 2015 at 8:29 AM, Tom Lane  wrote:

> Jason Dusek  writes:
> > It honestly seems far more reasonable to me that %s and %I should do
> > the exact same thing with regclass.
>
> You're mistaken.  The operation of format() is first to convert the
> non-format arguments to text strings, using the output functions for their
> data types, and then to further process those text strings according to
> the format specifiers:
>
> %s -- no additional processing, just insert the string as-is.
> %I -- apply double-quoting transformation to create a valid SQL identifier.
> %L -- apply single-quoting transformation to create a valid SQL literal.
>
> In the case of regclass, the output string is already double-quoted
> as necessary, so applying %I to it produces a doubly double-quoted
> string which is almost certainly not what you want.  But it's not
> format()'s job to be smarter than the user.  If it tried to avoid
> an extra pass of double quoting, it would get some cases wrong,
> potentially creating security holes.
>
>
>
TBH ​I'm not all that convinced by this argument​.

First, it is not being smarter than the user but allowing the user to
generalize their problem so that they do not need to take the nature of the
input data into account and can write a semantically meaningful pattern
string instead.  The risk of them incorrectly choosing between %s or %I and
opening a security hole seems higher - if not as widespread - than any
string logic we could apply.

Second, presupposing the the transformation of the input must be a single
"thing", and that we are doing the %I conversion based upon our own
internal (or SQL's at the matter may be) definition of what it means to
"quote an identifier", we should be capable of noticing that the provided
input is already a single "thing" which has been escaped according to said
rules.

​IOW, as long as the output string matches: ^"(?:"{2})*"$ I do not see how
it is possible ​for format to lay in a value at %I that is any more
insecure than the current behavior.  If the input string already matches
that pattern then it could be output as-is without any additional risk and
with the positive benefit of making this case work as expected.  The broken
case then exists when someone actually intends to name their identifier
<"something"> which then correctly becomes <"""something"""> on output.

Since there is a behavior change involved there needs to be a convincing
use-case for the new behavior in order to justify the effort to change it.

David J.


Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread Pavel Stehule
2015-03-15 3:09 GMT+01:00 Jason Dusek :

> On 14 March 2015 at 09:17, David G. Johnston 
> wrote:
> > On Saturday, March 14, 2015, Jason Dusek  wrote:
> >> It honestly seems far more reasonable to me that %s and %I should do
> >> the exact same thing with regclass. My reasoning is as follows:
> >>
> >> ‘%I’ formats a something such that it is a valid identifier,
> >>
> >> regclass is already a valid identifier,
> >>
> >> therefore, do nothing.
> >>
> >> Another line of reasoning:
> >>
> >> If you format with ‘%s’ you are saying: I don’t care whether it’s a
> >> valid identifier or literal or whatever, just put the string there,
> >>
> >> but when we sub a regclass into a string, we want it to be a valid
> >> identifier,
> >>
> >> therefore we should write ‘%I’ when subbing it, so as not to confuse
> >> our readers,
> >>
> >> therefore ‘%I’ should do nothing.
> >>
> >
> > I agree with the theory but adding type specific logic to format is
> going to
> > be difficult.  The first thing the system does is convert all of the
> inputs
> > to text.  Inside format() everything is text and so it has no way to know
> > that the type was regclass and should not be quoted again.
>
> Could it work to add type-specific logic for the cast from `regclass`
> to `name`? It would be nice to have something formulaic: always format
> identifiers with `%I`, always cast to `name` before formatting.
>

I don't think, so it can help - first, it is workaround and it doesn't help
for somebody who doesn't read a documentation. Same effect you can get if
you write "doesn't use %I for regclass, regtype types", although this
sentence is strange. I agree with you so it is bug or minimally not user
friendly design.

A some good solution should be safe function quote_identif, that protect us
against double quoting. This logic should be elsewhere than inside "format"
function. I am thinking so we can do it. It breaks nothing. Implementation
should not be too much complex, because "new" function quote_identif can do
nothing for safe types, so it can take string as input parameter and typid
as second parameter.

Pavel


>
> Kind Regards,
>   Jason Dusek
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread Jason Dusek
On 14 March 2015 at 09:17, David G. Johnston  wrote:
> On Saturday, March 14, 2015, Jason Dusek  wrote:
>> It honestly seems far more reasonable to me that %s and %I should do
>> the exact same thing with regclass. My reasoning is as follows:
>>
>> ‘%I’ formats a something such that it is a valid identifier,
>>
>> regclass is already a valid identifier,
>>
>> therefore, do nothing.
>>
>> Another line of reasoning:
>>
>> If you format with ‘%s’ you are saying: I don’t care whether it’s a
>> valid identifier or literal or whatever, just put the string there,
>>
>> but when we sub a regclass into a string, we want it to be a valid
>> identifier,
>>
>> therefore we should write ‘%I’ when subbing it, so as not to confuse
>> our readers,
>>
>> therefore ‘%I’ should do nothing.
>>
>
> I agree with the theory but adding type specific logic to format is going to
> be difficult.  The first thing the system does is convert all of the inputs
> to text.  Inside format() everything is text and so it has no way to know
> that the type was regclass and should not be quoted again.

Could it work to add type-specific logic for the cast from `regclass`
to `name`? It would be nice to have something formulaic: always format
identifiers with `%I`, always cast to `name` before formatting.

Kind Regards,
  Jason Dusek


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread David G. Johnston
On Saturday, March 14, 2015, Jason Dusek  wrote:

> It honestly seems far more reasonable to me that %s and %I should do
> the exact same thing with regclass. My reasoning is as follows:
>
> ‘%I’ formats a something such that it is a valid identifier,
>
> regclass is already a valid identifier,
>
> therefore, do nothing.
>
> Another line of reasoning:
>
> If you format with ‘%s’ you are saying: I don’t care whether it’s a
> valid identifier or literal or whatever, just put the string there,
>
> but when we sub a regclass into a string, we want it to be a valid
> identifier,
>
> therefore we should write ‘%I’ when subbing it, so as not to confuse
> our readers,
>
> therefore ‘%I’ should do nothing.
>
>
I agree with the theory but adding type specific logic to format is going
to be difficult.  The first thing the system does is convert all of the
inputs to text.  Inside format() everything is text and so it has no way to
know that the type was regclass and should not be quoted again.

David J.


Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread Tom Lane
Jason Dusek  writes:
> It honestly seems far more reasonable to me that %s and %I should do
> the exact same thing with regclass.

You're mistaken.  The operation of format() is first to convert the
non-format arguments to text strings, using the output functions for their
data types, and then to further process those text strings according to
the format specifiers:

%s -- no additional processing, just insert the string as-is.
%I -- apply double-quoting transformation to create a valid SQL identifier.
%L -- apply single-quoting transformation to create a valid SQL literal.

In the case of regclass, the output string is already double-quoted
as necessary, so applying %I to it produces a doubly double-quoted
string which is almost certainly not what you want.  But it's not
format()'s job to be smarter than the user.  If it tried to avoid
an extra pass of double quoting, it would get some cases wrong,
potentially creating security holes.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread Pavel Stehule
2015-03-14 10:09 GMT+01:00 Jason Dusek :

> It honestly seems far more reasonable to me that %s and %I should do
> the exact same thing with regclass. My reasoning is as follows:
>
> ‘%I’ formats a something such that it is a valid identifier,
>
> regclass is already a valid identifier,
>
> therefore, do nothing.
>
> Another line of reasoning:
>
> If you format with ‘%s’ you are saying: I don’t care whether it’s a
> valid identifier or literal or whatever, just put the string there,
>
> but when we sub a regclass into a string, we want it to be a valid
> identifier,
>
> therefore we should write ‘%I’ when subbing it, so as not to confuse
> our readers,
>
> therefore ‘%I’ should do nothing.
>

yes, it is true, when you use a safe type: regclass, regtype, you should
not to use %I due double quoting.

postgres=# select 16398::regclass ;
-[ RECORD 1 ]---
regclass | "omega a"

postgres=# select format('>>%I<<<', 16398::regclass );
-[ RECORD 1 ]--
format | >>"""omega a"""<<<

Should be fixed

Regards

Pavel

Regards

Pavel



> On 13 March 2015 at 12:42, David G. Johnston 
> wrote:
> > On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek 
> wrote:
> >>
> >> Hi All,
> >>
> >> The difference in how format handles `regclass` and `name` seems like an
> >> inconsistency:
> >>
> >> WITH conversions(casts, format, result) AS (
> >> VALUES (ARRAY['name']::regtype[], '%I', format('%I',
> >> name('select'))),
> >>(ARRAY['name']::regtype[], '%L', format('%L',
> >> name('select'))),
> >>(ARRAY['name']::regtype[], '%s', format('%s',
> >> name('select'))),
> >>(ARRAY['regclass']::regtype[], '%I', format('%I',
> >> regclass('select'))),
> >>(ARRAY['regclass']::regtype[], '%L', format('%L',
> >> regclass('select'))),
> >>(ARRAY['regclass']::regtype[], '%s', format('%s',
> >> regclass('select'))),
> >>(ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
> >> name(regclass('select',
> >>(ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
> >> name(regclass('select',
> >>(ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
> >> name(regclass('select'
> >> ) SELECT * FROM conversions;
> >>   casts  | format |result
> >> -++--
> >>  {name}  | %I | "select"
> >>  {name}  | %L | 'select'
> >>  {name}  | %s | select
> >>  {regclass}  | %I | """select"""
> >>  {regclass}  | %L | '"select"'
> >>  {regclass}  | %s | "select"
> >>  {regclass,name} | %I | """select"""
> >>  {regclass,name} | %L | '"select"'
> >>  {regclass,name} | %s | "select"
> >>
> >> My assumption is that they both represent valid SQL identifiers, so it
> >> stands
> >> to reason that `%I` should result in a valid identifier for both of them
> >> (or
> >> neither one).
> >
> >
> > All three of the %I results are valid identifiers.
> >
> > regclass performs the same conversion that %I performs.  But since the
> > output of the regclass conversion is a valid identifier, with
> double-quotes,
> > the %I adds another pair of double-quotes and doubles-up the existing
> pair
> > thus leaving you with 6.
> >
> >  is a reserved word and thus can only be used as an identifier
> if it
> > is surrounded in double-quotes.  name() doesn't care (not that it is
> > user-documented that I can find) about making its value usable as an
> > identifier so when its output goes through %I you get the expected value.
> >
> > If you are going to use regclass you want to use %s to insert the result
> > into your string; not %I.
> >
> > David J.
> >
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] regclass and format('%I')

2015-03-14 Thread Jason Dusek
It honestly seems far more reasonable to me that %s and %I should do
the exact same thing with regclass. My reasoning is as follows:

‘%I’ formats a something such that it is a valid identifier,

regclass is already a valid identifier,

therefore, do nothing.

Another line of reasoning:

If you format with ‘%s’ you are saying: I don’t care whether it’s a
valid identifier or literal or whatever, just put the string there,

but when we sub a regclass into a string, we want it to be a valid identifier,

therefore we should write ‘%I’ when subbing it, so as not to confuse
our readers,

therefore ‘%I’ should do nothing.

On 13 March 2015 at 12:42, David G. Johnston  wrote:
> On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek  wrote:
>>
>> Hi All,
>>
>> The difference in how format handles `regclass` and `name` seems like an
>> inconsistency:
>>
>> WITH conversions(casts, format, result) AS (
>> VALUES (ARRAY['name']::regtype[], '%I', format('%I',
>> name('select'))),
>>(ARRAY['name']::regtype[], '%L', format('%L',
>> name('select'))),
>>(ARRAY['name']::regtype[], '%s', format('%s',
>> name('select'))),
>>(ARRAY['regclass']::regtype[], '%I', format('%I',
>> regclass('select'))),
>>(ARRAY['regclass']::regtype[], '%L', format('%L',
>> regclass('select'))),
>>(ARRAY['regclass']::regtype[], '%s', format('%s',
>> regclass('select'))),
>>(ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
>> name(regclass('select',
>>(ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
>> name(regclass('select',
>>(ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
>> name(regclass('select'
>> ) SELECT * FROM conversions;
>>   casts  | format |result
>> -++--
>>  {name}  | %I | "select"
>>  {name}  | %L | 'select'
>>  {name}  | %s | select
>>  {regclass}  | %I | """select"""
>>  {regclass}  | %L | '"select"'
>>  {regclass}  | %s | "select"
>>  {regclass,name} | %I | """select"""
>>  {regclass,name} | %L | '"select"'
>>  {regclass,name} | %s | "select"
>>
>> My assumption is that they both represent valid SQL identifiers, so it
>> stands
>> to reason that `%I` should result in a valid identifier for both of them
>> (or
>> neither one).
>
>
> All three of the %I results are valid identifiers.
>
> regclass performs the same conversion that %I performs.  But since the
> output of the regclass conversion is a valid identifier, with double-quotes,
> the %I adds another pair of double-quotes and doubles-up the existing pair
> thus leaving you with 6.
>
>  is a reserved word and thus can only be used as an identifier if it
> is surrounded in double-quotes.  name() doesn't care (not that it is
> user-documented that I can find) about making its value usable as an
> identifier so when its output goes through %I you get the expected value.
>
> If you are going to use regclass you want to use %s to insert the result
> into your string; not %I.
>
> David J.
>


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] regclass and format('%I')

2015-03-13 Thread David G. Johnston
On Fri, Mar 13, 2015 at 12:18 PM, Jason Dusek  wrote:

> Hi All,
>
> The difference in how format handles `regclass` and `name` seems like an
> inconsistency:
>
> WITH conversions(casts, format, result) AS (
> VALUES (ARRAY['name']::regtype[], '%I', format('%I',
> name('select'))),
>(ARRAY['name']::regtype[], '%L', format('%L',
> name('select'))),
>(ARRAY['name']::regtype[], '%s', format('%s',
> name('select'))),
>(ARRAY['regclass']::regtype[], '%I', format('%I',
> regclass('select'))),
>(ARRAY['regclass']::regtype[], '%L', format('%L',
> regclass('select'))),
>(ARRAY['regclass']::regtype[], '%s', format('%s',
> regclass('select'))),
>(ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
> name(regclass('select',
>(ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
> name(regclass('select',
>(ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
> name(regclass('select'
> ) SELECT * FROM conversions;
>   casts  | format |result
> -++--
>  {name}  | %I | "select"
>  {name}  | %L | 'select'
>  {name}  | %s | select
>  {regclass}  | %I | """select"""
>  {regclass}  | %L | '"select"'
>  {regclass}  | %s | "select"
>  {regclass,name} | %I | """select"""
>  {regclass,name} | %L | '"select"'
>  {regclass,name} | %s | "select"
>
> My assumption is that they both represent valid SQL identifiers, so it
> stands
> to reason that `%I` should result in a valid identifier for both of them
> (or
> neither one).
>

​All three of the %I results are valid identifiers.

regclass performs the same conversion that %I performs.  But since the
output of the regclass conversion is a valid identifier, with
double-quotes, the %I adds another pair of double-quotes and doubles-up the
existing pair thus leaving you with 6.

 is a reserved word and thus can only be used as an identifier if
it is surrounded in double-quotes.  name() doesn't care (not that it is
user-documented that I can find) about making its value usable as an
identifier so when its output goes through %I you get the expected value.

​If you are going to use regclass you want to use %s to insert the result
into your string; not %I​.

David J.


[GENERAL] regclass and format('%I')

2015-03-13 Thread Jason Dusek
Hi All,

The difference in how format handles `regclass` and `name` seems like an
inconsistency:

WITH conversions(casts, format, result) AS (
VALUES (ARRAY['name']::regtype[], '%I', format('%I',
name('select'))),
   (ARRAY['name']::regtype[], '%L', format('%L',
name('select'))),
   (ARRAY['name']::regtype[], '%s', format('%s',
name('select'))),
   (ARRAY['regclass']::regtype[], '%I', format('%I',
regclass('select'))),
   (ARRAY['regclass']::regtype[], '%L', format('%L',
regclass('select'))),
   (ARRAY['regclass']::regtype[], '%s', format('%s',
regclass('select'))),
   (ARRAY['regclass', 'name']::regtype[], '%I', format('%I',
name(regclass('select',
   (ARRAY['regclass', 'name']::regtype[], '%L', format('%L',
name(regclass('select',
   (ARRAY['regclass', 'name']::regtype[], '%s', format('%s',
name(regclass('select'
) SELECT * FROM conversions;
  casts  | format |result
-++--
 {name}  | %I | "select"
 {name}  | %L | 'select'
 {name}  | %s | select
 {regclass}  | %I | """select"""
 {regclass}  | %L | '"select"'
 {regclass}  | %s | "select"
 {regclass,name} | %I | """select"""
 {regclass,name} | %L | '"select"'
 {regclass,name} | %s | "select"

My assumption is that they both represent valid SQL identifiers, so it stands
to reason that `%I` should result in a valid identifier for both of them (or
neither one).

Kind Regards,
  Jason Dusek


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general