You can try the following syntax (see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries):

SELECT * FROM A WHERE A.a IN (SELECT foo FROM B);
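If the parser rejects the IN form as well, the scalar subquery can usually be eliminated by rewriting it as a plain join. A sketch against the NATION and REGION tables from the query below, assuming r_name = 'ASIA' matches exactly one REGION row (so the join cannot duplicate NATION rows):

-- Sketch: the scalar subquery on REGION replaced by a join.
-- Assumes r_name = 'ASIA' identifies a single region.
select n.n_name
from NATION n
join REGION r on n.n_regionkey = r.r_regionkey
where r.r_name = 'ASIA';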
Regards
Deenar
Think Reactive Ltd
deenar.toras...@thinkreactive.co.uk
07714140812

On 28 October 2015 at 14:37, Richard Hillegas <rhil...@us.ibm.com> wrote:

> Hi Kishor,
>
> Spark doesn't currently support subqueries in the WHERE clause. However,
> it looks as though someone is working on this right now:
> https://issues.apache.org/jira/browse/SPARK-4226
>
> Hope this helps,
> Rick Hillegas
>
> Kishor Bachhav <kbach...@pivotal.io> wrote on 10/28/2015 05:52:50 AM:
>
> > From: Kishor Bachhav <kbach...@pivotal.io>
> > To: user@spark.apache.org
> > Date: 10/28/2015 05:53 AM
> > Subject: nested select is not working in spark sql
> >
> > Hi,
> >
> > I am trying to execute the query below in Spark SQL, but it throws an
> > exception:
> >
> > select n_name from NATION where n_regionkey = (select r_regionkey
> > from REGION where r_name='ASIA')
> >
> > Exception:
> > Exception in thread "main" java.lang.RuntimeException: [1.55]
> > failure: ``)'' expected but identifier r_regionkey found
> >
> > select n_name from NATION where n_regionkey = (select r_regionkey from REGION where r_name='ASIA')
> >                                                       ^
> >   at scala.sys.package$.error(package.scala:27)
> >   at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:36)
> >   at org.apache.spark.sql.SnappyParserDialect.parse(snappyParsers.scala:65)
> >   at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:169)
> >   at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:169)
> >   at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
> >   at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
> >   at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
> >   at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
> >   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> >   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
> >   at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> >   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
> >   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
> >   at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
> >   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
> >   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
> >   at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
> >   at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
> >   at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
> >   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> >   at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
> >   at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
> >   at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
> >
> > The same query works in MySQL as well as MemSQL.
> > Expected Result is:
> >
> > memsql> select n_name from NATION where n_regionkey = (select r_regionkey from REGION where r_name='ASIA');
> > +-----------+
> > | n_name    |
> > +-----------+
> > | INDIA     |
> > | INDONESIA |
> > | JAPAN     |
> > | CHINA     |
> > | VIETNAM   |
> > +-----------+
> > 5 rows in set (0.71 sec)
> >
> > How can I make this work in Spark SQL?
> >
> > Actually, the query above is a simplified version of the Minimum Cost
> > Supplier query (Q2) of TPC-H, which has this nested-select nature. I am
> > working on the TPC-H queries; if anybody has a modified set of TPC-H
> > queries for Spark SQL, kindly let me know. It would be very useful for me.
> >
> > select
> >     s_acctbal,
> >     s_name,
> >     n_name,
> >     p_partkey,
> >     p_mfgr,
> >     s_address,
> >     s_phone,
> >     s_comment
> > from
> >     part,
> >     supplier,
> >     partsupp,
> >     nation,
> >     region
> > where
> >     p_partkey = ps_partkey
> >     and s_suppkey = ps_suppkey
> >     and p_size = [SIZE]
> >     and p_type like '%[TYPE]'
> >     and s_nationkey = n_nationkey
> >     and n_regionkey = r_regionkey
> >     and r_name = '[REGION]'
> >     and ps_supplycost = (
> >         select
> >             min(ps_supplycost)
> >         from
> >             partsupp, supplier,
> >             nation, region
> >         where
> >             p_partkey = ps_partkey
> >             and s_suppkey = ps_suppkey
> >             and s_nationkey = n_nationkey
> >             and n_regionkey = r_regionkey
> >             and r_name = '[REGION]'
> >     )
> > order by
> >     s_acctbal desc,
> >     n_name,
> >     s_name,
> >     p_partkey;
> >
> > Regards,
> > Kishor
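As for Q2 itself, one possible rewrite for engines without WHERE-clause subqueries is to pre-aggregate the minimum supply cost per part in a FROM-clause derived table (which Spark SQL's parser does accept) and join it back on the part key. This is only a sketch: it keeps the [SIZE], [TYPE] and [REGION] placeholders from the original, and the derived-table alias mincost and the mc_* column names are made up for illustration.

-- Sketch of TPC-H Q2 without the correlated WHERE subquery: compute the
-- minimum supply cost per part within [REGION] as a derived table, then
-- join it back. [SIZE], [TYPE], [REGION] are the original placeholders.
select
    s_acctbal, s_name, n_name, p_partkey, p_mfgr,
    s_address, s_phone, s_comment
from
    part
    join partsupp on p_partkey = ps_partkey
    join supplier on s_suppkey = ps_suppkey
    join nation on s_nationkey = n_nationkey
    join region on n_regionkey = r_regionkey
    join (
        -- minimum [REGION] supply cost for every part
        select ps_partkey as mc_partkey, min(ps_supplycost) as mc_cost
        from partsupp
            join supplier on s_suppkey = ps_suppkey
            join nation on s_nationkey = n_nationkey
            join region on n_regionkey = r_regionkey
        where r_name = '[REGION]'
        group by ps_partkey
    ) mincost on p_partkey = mc_partkey and ps_supplycost = mc_cost
where
    p_size = [SIZE]
    and p_type like '%[TYPE]'
    and r_name = '[REGION]'
order by
    s_acctbal desc, n_name, s_name, p_partkey;

The group by collapses each part to its minimum supply cost among [REGION] suppliers, so the equality join reproduces the correlated condition ps_supplycost = (select min(ps_supplycost) ...) from the original query.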