Escaping placeholders, take 2

2014-12-28 Thread Tim Bunce
Wherein I attempt to summarize the requirements, the background, the options,
the risks, and the counter-proposal ...

** Background

Drivers that support placeholders parse the SQL statement to find them.
The parsing is really just simple tokenization that only knows how to skip
over quoted strings, quoted identifiers, and comments for the SQL dialect
used by the database. The remaining text is checked for placeholders.

A question mark is the international SQL standard for placeholders.
Per ISO/IEC 9075-2:2003 section 4.24 Dynamic SQL concepts:

In SQL-statements that are executed dynamically, the parameters are
called dynamic parameters (dynamic parameter specifications) and
are represented in SQL language by a question mark (?).

Some drivers also support placeholders expressed as a colon ':' followed
by a number and/or a name. Some drivers support other forms like a dollar
'$' followed by a number. Different types of placeholders cannot be
mixed within a statement.

The number of parameters passed to execute (etc.) has to match the number
of placeholders found in the statement or an error is reported.

An individual SQL statement typically comes primarily from one of these sources:
1. Written literally in the source code, perhaps with some $interpolation
2. Constructed dynamically, in whole or part, using a module like SQL::Abstract
3. From an external source, like an SQL script file or 'catalog'.


** Historical Issues and Work-arounds

Sometimes the driver will identify placeholders in the SQL statement
that weren't intended by the developer.  So some drivers have added
ad-hoc ways to limit the parsing of placeholders.  For example:

  pg_direct - query passed directly without parsing for placeholders at all
  pg_placeholder_dollaronly - question marks are not treated as placeholders
  pg_placeholder_nocolons - colons are not treated as placeholders
  ora_placeholders - disable processing of all placeholders
  odbc_ignore_named_placeholders - disable processing of :foo placeholders

In most cases these have been added as a workaround for statements that
contain colons followed by words that the driver was treating as placeholders.
(For example :old and :new when defining Oracle triggers.)

The exception is pg_placeholder_dollaronly which was added in 2008 to
allow use of postgres geometric operators like ?|| (are parallel)
See http://www.postgresql.org/docs/9.4/static/functions-geometry.html

So supporting multiple styles of placeholders has both caused problems
and allowed drivers to dodge the bullet when one style has become a
problem in a particular situation. But that's only a limited solution.
It doesn't promote the development and use of generic higher-level reusable
modules. (You could frame the progress of software development in terms
of enabling developers to work at ever-higher levels of abstraction.)


** Limitation of Existing Workarounds

A key problem with these attribute workarounds is that they affect the
entire statement.  That's a good approach when loading statements from
an external source, and often ok when writing individual DBI calls directly
where an alternative placeholder style or quote()'ing can be used.

In modern application development, however, the DBI is just another
foundation layer, low down in the stack. Above it are modules like
SQL::Abstract, and ORMs like DBIx::Class. Companies have large and
growing investments in these stacks, plus the modules they've built over
them to provide abstraction and encapsulation of business logic.

For these large applications the attribute workarounds are rarely helpful.

Working 'high up' in an application stack that's dynamically constructing
SQL with placeholders it should be possible to express any desired behavior.


** Requirements

Some mechanism is needed to allow an individual character sequence that
normally represents a placeholder, to appear in the SQL without it being
treated as a placeholder by the driver. To escape it's usual meaning.

If escaping question mark placeholders was already defined in 2008 then
pg_placeholder_dollaronly might not have been added to DBD::Pg.

The issue of escaping placeholders is on the table now because
PostgreSQL supports a JSON type with a rich set of features and
operators, some of which use a question mark.  For example, the
expression json_type_field ? 'bar' is true if the string 'bar' exists as
a key/element exist within the JSON value of json_type_field. For more
information see http://www.postgresql.org/docs/9.4/static/functions-json.html
Use of JSON and these operators is likely to be much more common than
use of the geometric operators mentioned earlier.

So, while the needs of Postgres users is what prompted this topic now,
the requirement is more general. It would be good to agree on an approach
that could be applied to most drivers if the need arises.


** Options

There seem to be several options worth considering

1. Backslash

A backslash could be placed 

Re: Escaping placeholders, take 2

2014-12-28 Thread Darren Duncan
Thank you for this post Tim, it seemed to lay out the issues well and make a lot 
of things clear.


I'm now inclined to support your backslash-escape proposal at the DBI driver 
level in principle.


(I also agree that the doubling method is nasty.)

Or alternately I suggest variation on that proposal that brings in shades of the 
vendor escape clauses.


I suggest a variant where instead of a single backslash character indicating an 
escape, we have some multi-character thing to indicate it, ideally involving 
delimiters so it is more clear on how far the effect of the new escape feature 
is supposed to go.


For example, using \{?} rather than \? or \{:foo} rather than \:foo.

One benefit of that is if you have some SQL that contains ? or : numerous times, 
you only need to surround the whole fragment with \{} and not individually 
escape each occurrence, making for neater code.


As to dealing with say literal { or } in the SQL as I could see reasonably 
happening, the quoting mechanism could be made more generic like Perl's q/etc, 
so for example it would take the form \X...X where whatever character appears 
after the \ is what is matched, and it could be a multiplicity if necessary 
within reason, eg \{{{...}}} would just work.  For that matter, heredocs or the 
quoted printable feature you can see in email messages or such, eg you have 
\foo...foo or some such.


I'm speaking in principle here, I'm not proposing a specific feature set, but 
both q as well as Perl 6's related quoting mechanism is useful for guidance, 
and I think something involving delimiters is best.


But if some of that sounds unduly complicated, I have a better idea.

I propose that the DBI include an API for users to tell the driver what possible 
escape delimiters they are using.  For example, doable at least at a statement 
level (and optionally on a connection/etc level for defaulting if that makes sense).


The API could involve an 'attr' given when preparing a SQL statement or other 
appropriate places.


  placeholder_escape_delimiters = [ '\{','}' ]
  placeholder_escape_delimiters = [ '\{','}','\[',']' ]
  placeholder_escape_delimiters = [ '\{{{','}}}' ]
  placeholder_escape_delimiters = [ '{{{','}}}' ]

In this way, the backslash is no longer special, or necessary, though I 
anticipate it would often still be used for the mnemonics.


Rather, when the driver is parsing the SQL for placeholders, if it encounters 
any left delimiter strings, if will leave the following SQL unaltered until it 
encounters the corresponding right delimiter string, and then it looks for 
placeholders again.


(As to numbered placeholders, which are effectively a special case of named 
placeholders, not being directly composable in SQL::Abstract, I see that as 
being a problem itself.  It would be a great help to developers in principle if 
the native way for working with parameters was named rather than positionally. 
However, that is really a separate matter to deal with and I think it is a good 
idea for Tim's proposal in some form to happen regardless of dealing with this 
separate matter.)


-- Darren Duncan



Re: Escaping placeholders, take 2

2014-12-28 Thread Ron Savage

Hi Darren

On 29/12/14 13:59, Darren Duncan wrote:


The API could involve an 'attr' given when preparing a SQL statement or
other appropriate places.

   placeholder_escape_delimiters = [ '\{','}' ]
   placeholder_escape_delimiters = [ '\{','}','\[',']' ]
   placeholder_escape_delimiters = [ '\{{{','}}}' ]
   placeholder_escape_delimiters = [ '{{{','}}}' ]

In this way, the backslash is no longer special, or necessary, though I
anticipate it would often still be used for the mnemonics.

Rather, when the driver is parsing the SQL for placeholders, if it
encounters any left delimiter strings, if will leave the following SQL
unaltered until it encounters the corresponding right delimiter string,
and then it looks for placeholders again.


Perhaps a job for https://metacpan.org/release/Text-Balanced-Marpa

--
Ron Savage - savage.net.au