[ https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550289#comment-16550289 ]

Dilip Biswal edited comment on SPARK-24864 at 7/20/18 6:23 AM:
---------------------------------------------------------------

[~abhimadav] I don't see a problem here. The generated column name is different 
between Spark and Hive. Perhaps in Spark 1.6 the generated column names were the 
same between Spark and Hive, i.e. they started with `_c[number]`. In this repro, 
Spark by default generates the column name as "upper(name)".

{code}
scala> spark.sql("SELECT id, upper(name) FROM src1").printSchema
root
 |-- id: integer (nullable = true)
 |-- upper(name): string (nullable = true)
{code}
 

So the following would work in Spark:
 
{code:java}
scala> spark.sql("CREATE VIEW vsrc1new AS SELECT id, `upper(name)` AS uname FROM (SELECT id, upper(name) FROM src1) vsrc1new");
res13: org.apache.spark.sql.DataFrame = []

scala> spark.sql("select * from vsrc1new").show()
+---+-----+
| id|uname|
+---+-----+
|  1| TEST|
+---+-----+
{code}

In my opinion, it's good practice to give explicit aliases instead of relying 
on system-generated ones, especially if we are looking for portability across 
different database systems.
 
{code:java}
spark.sql("CREATE VIEW vsrc1new AS SELECT id, upper_name AS uname FROM (SELECT id, upper(name) AS upper_name FROM src1)");
{code}
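The same idea can also be expressed through the DataFrame API, which avoids generated column names entirely by aliasing the expression at the point where it is defined (a sketch assuming the same `src1` table; the temp-view name `vsrc1new2` is just an example, not from the original report):

{code:java}
scala> import org.apache.spark.sql.functions.{col, upper}

// Alias upper(name) explicitly so no auto-generated column name is ever involved
scala> spark.table("src1")
         .select(col("id"), upper(col("name")).as("uname"))
         .createOrReplaceTempView("vsrc1new2")
{code}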

cc [~smilegator] We changed the generated column names on purpose to make them 
more readable, right?



> Cannot resolve auto-generated column ordinals in a hive view
> ------------------------------------------------------------
>
>                 Key: SPARK-24864
>                 URL: https://issues.apache.org/jira/browse/SPARK-24864
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Abhishek Madav
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Spark job reading from a hive-view fails with analysis exception when 
> resolving column ordinals which are autogenerated.
> *Exception*:
> {code:java}
> scala> spark.sql("Select * from vsrc1new").show
> org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given 
> input columns: [id, upper(name)]; line 1 pos 24;
> 'Project [*]
> +- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
>    +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
>       +- SubqueryAlias vsrc1new
>          +- Project [id#634, upper(name#635) AS upper(name)#636]
>             +- MetastoreRelation default, src1
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
> {code}
> *Steps to reproduce:*
> 1: Create a simple table, say src
> {code:java}
> CREATE TABLE `src1`(`id` int,  `name` string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ','
> {code}
> 2: Create a view, say with name vsrc1new
> {code:java}
> CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, 
> upper(name) FROM src1) vsrc1new;
> {code}
> 3. Selecting data from this view in hive-cli/beeline doesn't cause any error.
> 4. Creating a dataframe using:
> {code:java}
> spark.sql("Select * from vsrc1new").show //throws error
> {code}
> The auto-generated column names for the view are not resolved. Am I possibly 
> missing some spark-sql configuration here? I tried the repro-case against 
> spark 1.6 and that worked fine. Any inputs are appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
