GitHub user rick-ibm opened a pull request:

    https://github.com/apache/spark/pull/9202

    [SPARK-10857] [SQL] Block SQL injection vulnerabilities under 
DataFrameWriter.jdbc().

    @marmbrus
    @rxin
    @joshrosen
    @srowen
    
    This patch attempts to address both 
https://issues.apache.org/jira/browse/SPARK-10857 and 
https://issues.apache.org/jira/browse/SPARK-10977. This patch is my original 
work which I license to the ASF. My ICLA is already on file as a Derby 
committer: http://people.apache.org/committer-index.html
    
    I have abandoned the approach discussed on those issues, viz., attempting 
to replace the user-supplied table name with a properly escaped and quoted 
chain of dot-separated delimited identifiers. I believe that JDBC does not 
supply enough information to attempt that substitution, particularly in the 
case of a Microsoft SQL Server running on a case-insensitive file system. An 
overview of the issues involved can be found here:  
https://github.com/ontop/ontop/wiki/Case-sensitivity-for-SQL-identifiers
    
    Instead, this patch simply verifies that the table name is a valid SQL 
identifier before inserting it into a CREATE/DROP/INSERT statement or into a 
statement which tests whether the table exists. That is, a SQL injection 
attempt will fail because the injected text does not parse as a valid SQL 
identifier.
    
    The validating code may not catch some bad names. This can happen for 
databases which don't allow embedded, escaped quotes inside quoted identifiers. 
Microsoft SQL Server and Oracle don't allow embedded quotes. For those 
databases, the bad table name will appear as an ungrammatical sequence of 
quoted identifiers. E.g.:
    
      "foo""bar"
    
    will appear as
    
      "foo" "bar"
    
    Instead of a "bad identifier" error, those databases will raise some other 
exception. I leave further polishing of this behavior to people who are experts 
on the affected databases.
    
    This patch makes the following changes:
    
    1) Introduces a new Java class, SqlIdentifierUtil. The code was cribbed 
from Derby, where it has undergone almost 2 decades of commodity testing in 
production applications. I thought that it was safer to re-use this code rather 
than translate it into Scala and run the risk of introducing translation 
errors. However, I can try my hand at translating this class into Scala if 
reviewers think that that is a better approach.
    
    The SqlIdentifierUtil class has 2 entry points:
    
    a) quoteString() - This method escapes embedded quotes when converting a 
string to a quoted identifier. JdbcDialect.quoteIdentifier() did not account 
for this subtlety. But the patch makes JdbcDialect.quoteIdentifier() call 
SqlIdentifier.quoteString() now.
    
    b) parseMultiPartSQLIdentifier() - This method takes a dot separated 
sequence of SQL identifiers like databaseName.schemaName.tableName and returns 
an array of normalized identifiers like [ DATABASENAME, SCHEMANAME, TABLENAME ].
    
    
    2) Introduces some new JdbcDialect methods:
    
    a) vetSqlIdentifier() - This method takes a table name and raises an 
exception if the name isn't a legal SQL identifier as determined by 
SqlIdentifierUtil.parseMultiPartSQLIdentifier().
    
    b) quoteChar() - This method returns the dialect-specific character which 
is used to frame quoted identifiers. This information is needed by 
SqlIdentifierUtil.parseMultiPartSQLIdentifier().
    
    
    3) Calls JdbcDialect.vetSqlIdentifier() everywhere that we plug a table 
name into a SQL statement. I have placed the calls to 
JdbcDialect.vetSqlIdentifier() as close to the statement text as possible in 
order to minimize the chance that a new code path will be introduced in the 
future which circumvents the SQL injection check.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rick-ibm/spark b_10857_sqlInjection

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9202.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9202
    
----
commit 4e167a4e9ec8385063c4f46b6b75ae369d2e6204
Author: Rick Hillegas <rhil...@us.ibm.com>
Date:   2015-10-19T20:00:32Z

    SPARK-10857: Vet table names which are inserted into SQL strings to verify 
that they are legal SQL identifiers in the underlying database.

commit 4d9d8859294f83bff172e7e1146a2024ac3b61ec
Author: Rick Hillegas <rhil...@us.ibm.com>
Date:   2015-10-21T18:46:26Z

    SPARK-10857: Add and remove some comments.

commit a5c4c698705c5ed7417b405e6a0518bd7b997336
Author: Rick Hillegas <rhil...@us.ibm.com>
Date:   2015-10-21T18:46:56Z

    SPARK-10857: Add and remove some comments.

commit b3b845de9960e003637d26fecaf8dccaa0206f59
Author: Rick Hillegas <rhil...@us.ibm.com>
Date:   2015-10-21T18:50:56Z

    DERBY-10857: Fix a scalastyle problem.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to