Wherein I attempt to summarize the requirements, the background, the options, the risks, and the counter-proposal ...
** Background Drivers that support placeholders parse the SQL statement to find them. The parsing is really just simple tokenization that only knows how to skip over quoted strings, quoted identifiers, and comments for the SQL dialect used by the database. The remaining text is checked for placeholders. A question mark is the international SQL standard for placeholders. Per ISO/IEC 9075-2:2003 section 4.24 Dynamic SQL concepts: In SQL-statements that are executed dynamically, the parameters are called dynamic parameters (<dynamic parameter specification>s) and are represented in SQL language by a <question mark> (?). Some drivers also support placeholders expressed as a colon ':' followed by a number and/or a name. Some drivers support other forms like a dollar '$' followed by a number. Different types of placeholders cannot be mixed within a statement. The number of parameters passed to execute (etc.) has to match the number of placeholders found in the statement or an error is reported. An individual SQL statement typically comes primarily from one of these sources: 1. Written literally in the source code, perhaps with some $interpolation 2. Constructed dynamically, in whole or part, using a module like SQL::Abstract 3. From an external source, like an SQL script file or 'catalog'. ** Historical Issues and Work-arounds Sometimes the driver will identify placeholders in the SQL statement that weren't intended by the developer. So some drivers have added ad-hoc ways to limit the parsing of placeholders. For example: pg_direct - query passed directly without parsing for placeholders at all pg_placeholder_dollaronly - question marks are not treated as placeholders pg_placeholder_nocolons - colons are not treated as placeholders ora_placeholders - disable processing of all placeholders odbc_ignore_named_placeholders - disable processing of :foo placeholders In most cases these have been added as a workaround for statements that contain colons followed by words that the driver was treating as placeholders. (For example :old and :new when defining Oracle triggers.) The exception is pg_placeholder_dollaronly which was added in 2008 to allow use of postgres geometric operators like "?||" ("are parallel") See http://www.postgresql.org/docs/9.4/static/functions-geometry.html So supporting multiple styles of placeholders has both caused problems and allowed drivers to "dodge the bullet" when one style has become a problem in a particular situation. But that's only a limited solution. It doesn't promote the development and use of generic higher-level reusable modules. (You could frame the progress of software development in terms of enabling developers to work at ever-higher levels of abstraction.) ** Limitation of Existing Workarounds A key problem with these attribute workarounds is that they affect the entire statement. That's a good approach when loading statements from an external source, and often ok when writing individual DBI calls directly where an alternative placeholder style or quote()'ing can be used. In modern application development, however, the DBI is just another foundation layer, low down in the stack. Above it are modules like SQL::Abstract, and ORMs like DBIx::Class. Companies have large and growing investments in these stacks, plus the modules they've built over them to provide abstraction and encapsulation of business logic. For these large applications the attribute workarounds are rarely helpful. Working 'high up' in an application stack that's dynamically constructing SQL with placeholders it should be possible to express any desired behavior. ** Requirements Some mechanism is needed to allow an individual character sequence that normally represents a placeholder, to appear in the SQL without it being treated as a placeholder by the driver. To "escape" it's usual meaning. If escaping question mark placeholders was already defined in 2008 then pg_placeholder_dollaronly might not have been added to DBD::Pg. The issue of escaping placeholders is on the table now because PostgreSQL supports a JSON type with a rich set of features and operators, some of which use a question mark. For example, the expression json_type_field ? 'bar' is true if the string 'bar' exists as a key/element exist within the JSON value of json_type_field. For more information see http://www.postgresql.org/docs/9.4/static/functions-json.html Use of JSON and these operators is likely to be much more common than use of the geometric operators mentioned earlier. So, while the needs of Postgres users is what prompted this topic now, the requirement is more general. It would be good to agree on an approach that could be applied to most drivers if the need arises. ** Options There seem to be several options worth considering.... 1. Backslash A backslash could be placed immediately before a question mark to 'escape' the usual interpretation. For example: SELECT * FROM table WHERE json_field \? ? $sql_abstract->where({ json_field => { '\?' => $bar } }) I've not been able to find any SQL dialect where the backslash is even valid (outside of quoted strings and quoted identifiers) let alone valid before a placeholder. The only significant downside is the need to consider single vs double quoting when expressing the backslash: $sql_abstract->where({ json_field => { "\\?" => $bar } }) but that seems very minor as the use of backslashes in single and double quotes is natural and familar for Perl developers. 2. Doubling A few people suggested that a placeholder could be escaped by repeating it. This is reminicent of the way that quotes can be embedded within an SQL string by doubling them, like 'don''t'. For example: SELECT * FROM table WHERE json_field ?? ? $sql_abstract->where({ json_field => { '??' => $bar } }) I presume there are no database dialects where two adjacent placeholders would be valid, so I think it's safe. Doubling of a delimiter as a way to embed the delimiter within the delimited text has established precedent. However I can't think of a precedent for doubling of something that isn't a delimiter in order to escape its meaning. So, to my mind, this doesn't work well because... - The relationship to doubling of delimiters is tenuous at best. - The construct has little mnemonic value, e.g. relative to backslash. - Doesn't apply easily to colon placeholders, especially for SQL dialects that already have a meaning for double colons. 3. Vendor escape clauses The X/Open SQL CAE specification defined an escape clause as: "a syntactic mechanism for vendor-specific SQL extensions to be implemented in the framework of standardized SQL". The original verbose form has been deprecated in favor of a short form using braces that was popularized by ODBC and JDBC. For example "{ts 'value'}". See http://tinyurl.com/m9r5rrh Effectively we'd have to agree on some kind of syntax within braces, perhaps something like: SELECT * FROM table WHERE json_field {verbatim ?} 'bar' $sql_abstract->where({ json_field => { '{verbatim ?}' => $bar } }) There's no easy way to be sure we'd avoid clashing with future SQL standards. If we go down this road we might open the door to deeper support of escape clause syntax as a mechanism for portability, as it was originally intended. But that's a potentially complex can of worms that no one has mentioned recently (except me, just now). ** Option Summary All three options appear safe enough to enable by default, i.e. I can't think of any cases where the behaviour of existing working code would change because the proposed construct wouldn't be valid to start with. No code will break! Overall I'm not very keen on vendor escape clauses, and even less so about doubling, so my preference is still to use a backslash. It's a simple and mnemonic use of the familar concept of using a backslash to 'escape' from the default meaning of a character. ** DBI Specification Change Here's a proposed DBI specification change: @@ -2422,2 +2422,7 @@ =head2 Placeholders and Bind Values +Some drivers allow you to prevent the recognition of a placeholder by placing a +single backslash character (C<\>) immediately before it. The driver will remove +the backslash character and ignore the placeholder, passing it unchanged to the +backend. If the driver supports this then L</get_info>(9000) will return true. @@ -4971,2 +4976,9 @@ =head3 C<get_info> +Values from 9000 to 9999 for get_info are officially reserved for use by Perl DBI. +Values in that range which have been assigned a meaning are defined here: + +C<9000>: true if a backslash character (C<\>) before placeholder-like text +(e.g. C<?>, C<:foo>) will prevent it being treated as a placeholder by the driver. +The backslash will be removed before the text is passed to the backend. (I'm undecided about making the get_info(9000) return value a bit-map e.g., 0x01= can escape question mark placeholders, 0x02= can escape colon placeholders, 0x04= can escape driver-specific placeholder forms. That's probably too much detail for a feature that's unlikely to be used in practice. See below.) ** Implementation The DBI has a preparse() method that's intended for drivers to use for identifying placeholders. Very few (if any?) drivers actually use that though. So implementing support for escaping placeholders would need to be done by each driver. For most driver's there no need or urgency in implementing it at all! Being able to escape placeholders is only of use if the placeholder syntax has some meaning in the SQL dialect of the database. For question mark placeholders the only database we're aware of that assigns a meaning to a question mark is PostgreSQL. For colon placeholders there are many databases that assign a meaning to colons as a prefix to name and/or numbers. Individual driver authors would, as always, be free to implement this placeholder escaping feature if and when they wish. ** Counter-proposal On Sun, Dec 21, 2014 at 03:17:39PM -0000, Greg Sabino Mullane (the tenth man) wrote: > > I'm going to play devil's advocate a bit here. There are some problems with > this approach. First, it will require that the user know if the underlying > DBD supports backslashes. Of course. This is no different from the developer requiring to know that the underlying DBD supports any other feature that the developer relies on. For most cases the familiar dependency management mechanisms work fine: use DBD::Foo '3.456'; There's also the get_info call, but I doubt that'll be needed in practice. > Which likely means that SQL::Abstract and/or DBIx::Class will need to > know as well. (Unless they expose the DBD directly to the user, which > ruins the point a bit). Not true. > So we will have both escaped and unescaped versions of SQL > floating around, subject to the whims of whether or not your particular > DBD supports it (and in which version). There's no such thing as "escaped and unescaped versions of SQL". That's a false premise. This is a really important point: the syntax being proposed *is invalid SQL*. Exactly the same approach has been used to safely extend other langauges. For example the '(?' sequence was invalid in early Perl 5 regexps which allowed Larry to adopt it to enable new features, like (?:foo) > Which leads to my counter-proposal: have SQL::Abstract accept > dollar-number placeholders. s/accept/generate/. I thought this might be workable and had looked at the SQL::Abstract source code a few times but didn't think it an attractive option. (Partly because of the complexity and performance cost, and partly because it just seems like a poor approach to the issue. Relying on the fact that some drivers support non-standard placeholder styles seems to be an inelegant fudge to me. It would be pushing complexity and cost into code that simply shouldn't need to care.) Then ribasushi, a principle maintainer of SQL::Abstract and DBIx::Class, pointed out the fatal flaw: numbered placeholders aren't directly composable. It's easy to pass around a fragment of SQL containing question mark placeholders along with a list of values to bind to those placeholders. These fragments can be directly composed into a larger SQL statement with the corresponding bind values simply pushed onto a list. The as_query() method in DBIx::Class is a good example use-case. It returns \[ $sql_with_placeholders, @binds ] and that form of 'SQL fragment' is "pervasive in the public API". E.g. see https://metacpan.org/pod/distribution/DBIx-Class/lib/DBIx/Class/Manual/Cookbook.pod#Subqueries For example: $resultset1->search_rs({ foo => { -in => $resultset2->as_query }, bar => { -not_in => $resultset3->as_query }, ... })->all; The $resultset1/2/3 variables could represent different complex multi-table joins with many conditions and many bind variables. This is a very powerful feature of DBIx::Class, and it works well. Trying to do the same with numbered placeholders (colon or dollar) would require parsing and rewriting of the SQL. Clearly non-trivial and slow. For this reason he wouldn't accept such a patch to SQL::Abstract, and I'd agree with that position. Question marks are the international standard for placeholders. The DBI should have specified a way to escape them from the start. I'm attempting to fix that now. To demonstrate the relative complexity, here's a proof-of-concept patch for DBD::Pg to implement support for escaping question mark placeholders: https://github.com/timbunce/dbdpg/commit/54358c7a7efeeaf2666c5e28c301e47624fb9615 You can see that it's just 8 lines of simple code. (Naturally I'll add tests and docs etc. to this before sending a pull request.) Tim p.s. When replying, please edit your replies to only quote the relevant parts of this (very long) email.