Anand Mohan Tumuluri created SPARK-6450:
-------------------------------------------

             Summary: Self joining query failure
                 Key: SPARK-6450
                 URL: https://issues.apache.org/jira/browse/SPARK-6450
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.0
            Reporter: Anand Mohan Tumuluri
            Priority: Critical


The below query was working fine till 1.3 commit 
9a151ce58b3e756f205c9f3ebbbf3ab0ba5b33fd.(Yes it definitely works at this 
commit although this commit is completely unrelated)

It got broken in 1.3.0 release with an AnalysisException: resolved attributes 
... missing from .... (although this list contains the fields which it reports 
missing)
{code}
at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:189)
        at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
        at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
        at 
org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:493)
        at 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
        at com.sun.proxy.$Proxy17.executeStatementAsync(Unknown Source)
        at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
        at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
        at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
        at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at 
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
        at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

{code}
select Orders.Country, Orders.ProductCategory,count(1) from Orders join (select 
Orders.Country, count(1) CountryOrderCount from Orders where 
to_date(Orders.PlacedDate) > '2015-01-01' group by Orders.Country order by 
CountryOrderCount DESC LIMIT 5) Top5Countries on Top5Countries.Country = 
Orders.Country where to_date(Orders.PlacedDate) > '2015-01-01' group by 
Orders.Country,Orders.ProductCategory;
{code}

The temporary workaround is to add explicit alias for the table Orders

{code}
select o.Country, o.ProductCategory,count(1) from Orders o join (select 
r.Country, count(1) CountryOrderCount from Orders r where to_date(r.PlacedDate) 
> '2015-01-01' group by r.Country order by CountryOrderCount DESC LIMIT 5) 
Top5Countries on Top5Countries.Country = o.Country where to_date(o.PlacedDate) 
> '2015-01-01' group by o.Country,o.ProductCategory;
{code}

However this change not only affects self joins, it also seems to affect union 
queries as well, like the below query which was again working before(commit 
9a151ce) got broken

{code}
select Orders.Country,null,count(1) OrderCount from Orders group by 
Orders.Country,null
union all
select null,Orders.ProductCategory,count(1) OrderCount from Orders group by 
null, Orders.ProductCategory
{code}
also fails with a Analysis exception.
The workaround is to add different aliases for the tables.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to