[jira] [Commented] (SPARK-24864) Cannot resolve auto-generated column ordinals in a hive view
[ https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551859#comment-16551859 ]

Dilip Biswal commented on SPARK-24864:
--------------------------------------

I agree with [~srowen]

> Cannot resolve auto-generated column ordinals in a hive view
> ------------------------------------------------------------
>
>                 Key: SPARK-24864
>                 URL: https://issues.apache.org/jira/browse/SPARK-24864
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Abhishek Madav
>            Priority: Major
>
> A Spark job reading from a Hive view fails with an AnalysisException when
> resolving column ordinals which are auto-generated.
> *Exception*:
> {code:java}
> scala> spark.sql("Select * from vsrc1new").show
> org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given input columns: [id, upper(name)]; line 1 pos 24;
> 'Project [*]
> +- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
>    +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
>       +- SubqueryAlias vsrc1new
>          +- Project [id#634, upper(name#635) AS upper(name)#636]
>             +- MetastoreRelation default, src1
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
> {code}
> *Steps to reproduce:*
> 1. Create a simple table, say src1:
> {code:java}
> CREATE TABLE `src1`(`id` int, `name` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> {code}
> 2. Create a view, say with name vsrc1new:
> {code:java}
> CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, upper(name) FROM src1) vsrc1new;
> {code}
> 3. Selecting data from this view in hive-cli/beeline doesn't cause any error.
> 4. Creating a dataframe using:
> {code:java}
> spark.sql("Select * from vsrc1new").show // throws error
> {code}
> The auto-generated column names for the view are not resolved. Am I possibly
> missing some spark-sql configuration here? I tried the repro case against
> Spark 1.6 and that worked fine. Any inputs are appreciated.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24864) Cannot resolve auto-generated column ordinals in a hive view
[ https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551798#comment-16551798 ]

Sean Owen commented on SPARK-24864:
-----------------------------------

No compatibility is promised between 1.x and 2.x, and I don't think this behavior was guaranteed to begin with. You should always specify aliases explicitly if you depend on their values.
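To illustrate that advice concretely (a sketch only, reusing the table and view names from the repro; not tied to any particular Hive or Spark version), the view can be rewritten so that no auto-generated column name is ever referenced:

{code:sql}
-- Alias the expression inside the subquery, so neither Hive's `_c1`
-- nor Spark's `upper(name)` auto-generated name is ever relied upon.
CREATE VIEW vsrc1new AS
SELECT id, uname
FROM (SELECT id, upper(name) AS uname FROM src1) vsrc1new;
{code}

With the alias assigned at the point where the expression is introduced, `SELECT * FROM vsrc1new` should resolve the same way in hive-cli/beeline and in spark.sql, regardless of how each engine names anonymous expressions.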
[jira] [Commented] (SPARK-24864) Cannot resolve auto-generated column ordinals in a hive view
[ https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551296#comment-16551296 ]

Abhishek Madav commented on SPARK-24864:
----------------------------------------

Thanks for the reply. The views are currently created by the customer, and the Spark job hasn't been able to keep up with the upgrade from 1.6 -> 2.0+, hence they feel it is a regression. Is there anything that can be done to go back to the 1.6 way of column referencing?
[jira] [Commented] (SPARK-24864) Cannot resolve auto-generated column ordinals in a hive view
[ https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551123#comment-16551123 ]

Xiao Li commented on SPARK-24864:
---------------------------------

Yeah, our generated alias names are different from the ones generated by Hive. Please explicitly specify the alias names in your query.
[jira] [Commented] (SPARK-24864) Cannot resolve auto-generated column ordinals in a hive view
[ https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550289#comment-16550289 ]

Dilip Biswal commented on SPARK-24864:
--------------------------------------

[~abhimadav] I don't see a problem here. The generated column names differ between Spark and Hive. Perhaps in Spark 1.6 the generated names were the same as Hive's, i.e. they started with `_c[number]`. In this repro, Spark by default generates the column name "upper(name)":
{code}
scala> spark.sql("SELECT id, upper(name) FROM src1").printSchema
root
 |-- id: integer (nullable = true)
 |-- upper(name): string (nullable = true)
{code}
So the following works in Spark:
{code:java}
scala> spark.sql("CREATE VIEW vsrc1new AS SELECT id, `upper(name)` AS uname FROM (SELECT id, upper(name) FROM src1) vsrc1new");
res13: org.apache.spark.sql.DataFrame = []

scala> spark.sql("select * from vsrc1new").show()
+---+-----+
| id|uname|
+---+-----+
|  1| TEST|
+---+-----+
{code}
cc [~smilegator] We changed the generated column names on purpose to make them more readable, right?

> Cannot resolve auto-generated column ordinals in a hive view
> ------------------------------------------------------------
>
>                 Key: SPARK-24864
>                 URL: https://issues.apache.org/jira/browse/SPARK-24864
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Abhishek Madav
>            Priority: Major
>             Fix For: 2.4.0