Escaping placeholders, take 2
Wherein I attempt to summarize the requirements, the background, the options, the risks, and the counter-proposal ... ** Background Drivers that support placeholders parse the SQL statement to find them. The parsing is really just simple tokenization that only knows how to skip over quoted strings, quoted identifiers, and comments for the SQL dialect used by the database. The remaining text is checked for placeholders. A question mark is the international SQL standard for placeholders. Per ISO/IEC 9075-2:2003 section 4.24 Dynamic SQL concepts: In SQL-statements that are executed dynamically, the parameters are called dynamic parameters (dynamic parameter specifications) and are represented in SQL language by a question mark (?). Some drivers also support placeholders expressed as a colon ':' followed by a number and/or a name. Some drivers support other forms like a dollar '$' followed by a number. Different types of placeholders cannot be mixed within a statement. The number of parameters passed to execute (etc.) has to match the number of placeholders found in the statement or an error is reported. An individual SQL statement typically comes primarily from one of these sources: 1. Written literally in the source code, perhaps with some $interpolation 2. Constructed dynamically, in whole or part, using a module like SQL::Abstract 3. From an external source, like an SQL script file or 'catalog'. ** Historical Issues and Work-arounds Sometimes the driver will identify placeholders in the SQL statement that weren't intended by the developer. So some drivers have added ad-hoc ways to limit the parsing of placeholders. For example: pg_direct - query passed directly without parsing for placeholders at all pg_placeholder_dollaronly - question marks are not treated as placeholders pg_placeholder_nocolons - colons are not treated as placeholders ora_placeholders - disable processing of all placeholders odbc_ignore_named_placeholders - disable processing of :foo placeholders In most cases these have been added as a workaround for statements that contain colons followed by words that the driver was treating as placeholders. (For example :old and :new when defining Oracle triggers.) The exception is pg_placeholder_dollaronly which was added in 2008 to allow use of postgres geometric operators like ?|| (are parallel) See http://www.postgresql.org/docs/9.4/static/functions-geometry.html So supporting multiple styles of placeholders has both caused problems and allowed drivers to dodge the bullet when one style has become a problem in a particular situation. But that's only a limited solution. It doesn't promote the development and use of generic higher-level reusable modules. (You could frame the progress of software development in terms of enabling developers to work at ever-higher levels of abstraction.) ** Limitation of Existing Workarounds A key problem with these attribute workarounds is that they affect the entire statement. That's a good approach when loading statements from an external source, and often ok when writing individual DBI calls directly where an alternative placeholder style or quote()'ing can be used. In modern application development, however, the DBI is just another foundation layer, low down in the stack. Above it are modules like SQL::Abstract, and ORMs like DBIx::Class. Companies have large and growing investments in these stacks, plus the modules they've built over them to provide abstraction and encapsulation of business logic. For these large applications the attribute workarounds are rarely helpful. Working 'high up' in an application stack that's dynamically constructing SQL with placeholders it should be possible to express any desired behavior. ** Requirements Some mechanism is needed to allow an individual character sequence that normally represents a placeholder, to appear in the SQL without it being treated as a placeholder by the driver. To escape it's usual meaning. If escaping question mark placeholders was already defined in 2008 then pg_placeholder_dollaronly might not have been added to DBD::Pg. The issue of escaping placeholders is on the table now because PostgreSQL supports a JSON type with a rich set of features and operators, some of which use a question mark. For example, the expression json_type_field ? 'bar' is true if the string 'bar' exists as a key/element exist within the JSON value of json_type_field. For more information see http://www.postgresql.org/docs/9.4/static/functions-json.html Use of JSON and these operators is likely to be much more common than use of the geometric operators mentioned earlier. So, while the needs of Postgres users is what prompted this topic now, the requirement is more general. It would be good to agree on an approach that could be applied to most drivers if the need arises. ** Options There seem to be several options worth considering 1. Backslash A backslash could be placed
Re: Escaping placeholders, take 2
Thank you for this post Tim, it seemed to lay out the issues well and make a lot of things clear. I'm now inclined to support your backslash-escape proposal at the DBI driver level in principle. (I also agree that the doubling method is nasty.) Or alternately I suggest variation on that proposal that brings in shades of the vendor escape clauses. I suggest a variant where instead of a single backslash character indicating an escape, we have some multi-character thing to indicate it, ideally involving delimiters so it is more clear on how far the effect of the new escape feature is supposed to go. For example, using \{?} rather than \? or \{:foo} rather than \:foo. One benefit of that is if you have some SQL that contains ? or : numerous times, you only need to surround the whole fragment with \{} and not individually escape each occurrence, making for neater code. As to dealing with say literal { or } in the SQL as I could see reasonably happening, the quoting mechanism could be made more generic like Perl's q/etc, so for example it would take the form \X...X where whatever character appears after the \ is what is matched, and it could be a multiplicity if necessary within reason, eg \{{{...}}} would just work. For that matter, heredocs or the quoted printable feature you can see in email messages or such, eg you have \foo...foo or some such. I'm speaking in principle here, I'm not proposing a specific feature set, but both q as well as Perl 6's related quoting mechanism is useful for guidance, and I think something involving delimiters is best. But if some of that sounds unduly complicated, I have a better idea. I propose that the DBI include an API for users to tell the driver what possible escape delimiters they are using. For example, doable at least at a statement level (and optionally on a connection/etc level for defaulting if that makes sense). The API could involve an 'attr' given when preparing a SQL statement or other appropriate places. placeholder_escape_delimiters = [ '\{','}' ] placeholder_escape_delimiters = [ '\{','}','\[',']' ] placeholder_escape_delimiters = [ '\{{{','}}}' ] placeholder_escape_delimiters = [ '{{{','}}}' ] In this way, the backslash is no longer special, or necessary, though I anticipate it would often still be used for the mnemonics. Rather, when the driver is parsing the SQL for placeholders, if it encounters any left delimiter strings, if will leave the following SQL unaltered until it encounters the corresponding right delimiter string, and then it looks for placeholders again. (As to numbered placeholders, which are effectively a special case of named placeholders, not being directly composable in SQL::Abstract, I see that as being a problem itself. It would be a great help to developers in principle if the native way for working with parameters was named rather than positionally. However, that is really a separate matter to deal with and I think it is a good idea for Tim's proposal in some form to happen regardless of dealing with this separate matter.) -- Darren Duncan
Re: Escaping placeholders, take 2
Hi Darren On 29/12/14 13:59, Darren Duncan wrote: The API could involve an 'attr' given when preparing a SQL statement or other appropriate places. placeholder_escape_delimiters = [ '\{','}' ] placeholder_escape_delimiters = [ '\{','}','\[',']' ] placeholder_escape_delimiters = [ '\{{{','}}}' ] placeholder_escape_delimiters = [ '{{{','}}}' ] In this way, the backslash is no longer special, or necessary, though I anticipate it would often still be used for the mnemonics. Rather, when the driver is parsing the SQL for placeholders, if it encounters any left delimiter strings, if will leave the following SQL unaltered until it encounters the corresponding right delimiter string, and then it looks for placeholders again. Perhaps a job for https://metacpan.org/release/Text-Balanced-Marpa -- Ron Savage - savage.net.au