[ 
https://issues.apache.org/jira/browse/SPARK-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-6450:
------------------------------
    Summary: MetastoreRelation.equals doesn't compare output attributes  (was: f)

> MetastoreRelation.equals doesn't compare output attributes
> ----------------------------------------------------------
>
>                 Key: SPARK-6450
>                 URL: https://issues.apache.org/jira/browse/SPARK-6450
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Anand Mohan Tumuluri
>            Assignee: Michael Armbrust
>            Priority: Blocker
>
> The query below worked until 1.3 commit 
> 9a151ce58b3e756f205c9f3ebbbf3ab0ba5b33fd. (It definitely works at this 
> commit, although the commit itself is completely unrelated.)
> It broke in the 1.3.0 release with an AnalysisException: resolved attributes 
> ... missing from .... (even though that list contains the fields it 
> reports as missing)
> {code}
>       at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:189)
>       at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
>       at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
>       at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
>       at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>       at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
>       at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
>       at com.sun.proxy.$Proxy17.executeStatementAsync(Unknown Source)
>       at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
>       at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
>       at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
>       at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
>       at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>       at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>       at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
>       at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> {code}
> select Orders.Country, Orders.ProductCategory,count(1) from Orders join 
> (select Orders.Country, count(1) CountryOrderCount from Orders where 
> to_date(Orders.PlacedDate) > '2015-01-01' group by Orders.Country order by 
> CountryOrderCount DESC LIMIT 5) Top5Countries on Top5Countries.Country = 
> Orders.Country where to_date(Orders.PlacedDate) > '2015-01-01' group by 
> Orders.Country,Orders.ProductCategory;
> {code}
> A temporary workaround is to add explicit aliases for the table Orders:
> {code}
> select o.Country, o.ProductCategory,count(1) from Orders o join (select 
> r.Country, count(1) CountryOrderCount from Orders r where 
> to_date(r.PlacedDate) > '2015-01-01' group by r.Country order by 
> CountryOrderCount DESC LIMIT 5) Top5Countries on Top5Countries.Country = 
> o.Country where to_date(o.PlacedDate) > '2015-01-01' group by 
> o.Country,o.ProductCategory;
> {code}
> However, this change affects not only self joins; it also seems to affect 
> union queries. The query below, which also worked before (at commit 
> 9a151ce), is now broken:
> {code}
> select Orders.Country,null,count(1) OrderCount from Orders group by 
> Orders.Country,null
> union all
> select null,Orders.ProductCategory,count(1) OrderCount from Orders group by 
> null, Orders.ProductCategory
> {code}
> This query also fails with an AnalysisException.
> The workaround is to give the tables different aliases.
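> A sketch of that workaround applied to the union query above. The aliases 
> o1 and o2 are illustrative and do not come from the original report, and 
> this rewritten query has not been verified against 1.3.0:
> {code}
> select o1.Country,null,count(1) OrderCount from Orders o1 group by 
> o1.Country,null
> union all
> select null,o2.ProductCategory,count(1) OrderCount from Orders o2 group by 
> null, o2.ProductCategory
> {code}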



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
