GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/19124

    [SPARK-21912][SQL] Creating ORC datasource table should check invalid column names

    ## What changes were proposed in this pull request?
    
    Currently, creating an ORC data source table with invalid column names aborts the job at write time. This PR prevents that by raising an **AnalysisException** that advises using an alias instead, as Parquet data source tables already do.
    
    **BEFORE**
    ```scala
    scala> sql("CREATE TABLE orc1 USING ORC AS SELECT 1 `a b`")
    17/09/04 13:28:21 ERROR Utils: Aborting task
    java.lang.IllegalArgumentException: Error: : expected at the position 8 of 'struct<a b:int>' but ' ' is found.
    17/09/04 13:28:21 ERROR FileFormatWriter: Job job_20170904132821_0001 aborted.
    17/09/04 13:28:21 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
    org.apache.spark.SparkException: Task failed while writing rows.
    ```
    
    **AFTER**
    ```scala
    scala> sql("CREATE TABLE orc1 USING ORC AS SELECT 1 `a b`")
    17/09/04 13:27:40 ERROR CreateDataSourceTableAsSelectCommand: Failed to write to table orc1
    org.apache.spark.sql.AnalysisException: Attribute name "a b" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
    ```
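    
    The check can be illustrated with a minimal standalone sketch. It is not the code in this patch: `ColumnNameCheck` and `validate` are hypothetical names, and only the character set is taken from the error message above. A real implementation would raise `AnalysisException` rather than return an `Either`.
    
    ```scala
    // Hypothetical sketch of the column-name check; not the code from this PR.
    object ColumnNameCheck {
      // Characters listed in the AnalysisException message above.
      private val invalidChars: Set[Char] = " ,;{}()\n\t=".toSet

      /** Returns Left(message) if `name` contains an invalid character, else Right(name). */
      def validate(name: String): Either[String, String] =
        if (name.exists(invalidChars.contains)) {
          Left(s"Attribute name '$name' contains invalid character(s). Please use alias to rename it.")
        } else {
          Right(name)
        }
    }
    ```
    
    Under this sketch, `ColumnNameCheck.validate("a b")` is rejected because of the space, while an aliased name such as `a_b` passes.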
    
    ## How was this patch tested?
    
    Passes Jenkins with a new test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-21912

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19124
    
----
commit 808dfe0fcd9de2f43b33f0d1d084172b5624f2a8
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2017-09-04T20:46:15Z

    [SPARK-21912][SQL] Creating ORC datasource table should check invalid column names

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org