[ 
https://issues.apache.org/jira/browse/SPARK-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527738#comment-14527738
 ] 

Frederick Reiss commented on SPARK-6649:
----------------------------------------

I was able to reproduce this problem on Spark 1.3.1. The Spark SQL lexical 
analyzer treats anything enclosed in double quotes as a string. The Spark SQL 
parser only allows strings to be treated as literals, not identifiers.

On the spark-sql command line:
{noformat}
spark-sql> select "hello" as hello, 'world' as world;
hello   world
Time taken: 0.125 seconds, Fetched 1 row(s)

spark-sql> select "hello" as "hello", 'world' as "world";
15/05/04 18:03:05 ERROR SparkSQLDriver: Failed in [select "hello" as "hello", 
'world' as "world"]
org.apache.spark.sql.AnalysisException: cannot recognize input near 'as' 
'"hello"' ',' in selection target; line 1 pos 18
        at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:254)
        [many lines of stack trace]
{noformat}

The same thing happens from the Spark Scala shell:
{noformat}
scala> val df = sqlContext.sql("select \"hello\" as hello, 'world' as world")
df: org.apache.spark.sql.DataFrame = [hello: string, world: string]

scala> val df2 = sqlContext.sql("select \"hello\" as \"hello\", 'world' as 
\"world\"")
org.apache.spark.sql.AnalysisException: cannot recognize input near 'as' 
'"hello"' ',' in selection target; line 1 pos 18
        at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:254)
        at 
org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
        [many lines of stack trace]
{noformat}

This behavior is not consistent with the SQL standard, though it does roughly 
match MySQL's default behavior (MySQL treats double-quoted tokens as string 
literals unless the ANSI_QUOTES SQL mode is enabled).

According to the grammar in the SQL-92 document 
([http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt]), strings must be 
delimited by single quotes:
{noformat}
<character string literal> ::=
              [ <introducer><character set specification> ]
              <quote> [ <character representation>... ] <quote>
                [ { <separator>... <quote> [ <character representation>... ] 
<quote> }... ]
...
<national character string literal> ::=
              N <quote> [ <character representation>... ] <quote>
                [ { <separator>... <quote> [ <character representation>... ] 
<quote> }... ]

<bit string literal> ::=
              B <quote> [ <bit>... ] <quote>
                [ { <separator>... <quote> [ <bit>... ] <quote> }... ]

<hex string literal> ::=
              X <quote> [ <hexit>... ] <quote>
                [ { <separator>... <quote> [ <hexit>... ] <quote> }... ]
...
<quote> ::= '
{noformat}
and identifiers *may* be delimited with double quotes:
{noformat}
<delimited identifier> ::=
              <double quote> <delimited identifier body> <double quote>
...
<double quote> ::= "
{noformat}
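To make the distinction concrete, here is a minimal sketch (plain Java, not Spark code) of the two SQL-92 quoting rules above: single quotes delimit string literals (embedded single quotes doubled), and double quotes delimit identifiers (embedded double quotes doubled). The helper names are my own, not part of any Spark or JDBC API.

```java
// Sketch of SQL-92 quoting rules; helper names are illustrative only.
public class Sql92Quoting {

    // <character string literal>: single quotes, embedded ' doubled.
    static String quoteLiteral(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // <delimited identifier>: double quotes, embedded " doubled.
    static String quoteIdentifier(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteLiteral("it's"));        // 'it''s'
        System.out.println(quoteIdentifier("COMMENT"));  // "COMMENT"
    }
}
```

Under these rules, {{select 'hello' as "hello"}} is a literal aliased to a delimited identifier, which is exactly the form Spark SQL currently rejects.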

Thoughts? Are there any pull requests in flight that fix this problem already?

> DataFrame created through SQLContext.jdbc() failed if table columns must be 
> quoted
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-6649
>                 URL: https://issues.apache.org/jira/browse/SPARK-6649
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Frédéric Blanc
>            Priority: Minor
>
> If I want to import the contents of a table from Oracle that contains a column 
> named COMMENT (a reserved keyword), I cannot use a DataFrame that maps all 
> the columns of this table.
> {code:title=ddl.sql|borderStyle=solid}
> CREATE TABLE TEST_TABLE (
>     "COMMENT" VARCHAR2(10)
> );
> {code}
> {code:title=test.java|borderStyle=solid}
> SQLContext sqlContext = ...
> DataFrame df = sqlContext.jdbc(databaseURL, "TEST_TABLE");
> df.rdd();   // => failed if the table contains a column with a reserved 
> keyword
> {code}
> The same problem can be encountered if a reserved keyword is used as a table 
> name.
> The JDBCRDD Scala class could be improved by having the columnList 
> initializer append double quotes around each column name. (line: 225)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
