Re: How To Implement More Than One Subquery in Scala/Spark

2014-10-13 Thread arthur.hk.c...@gmail.com
Hi, Thank you so much! By the way, what is the DATEADD function in Scala/Spark? Or how can I implement DATEADD(MONTH, 3, '2013-07-01') and DATEADD(YEAR, 1, '2014-01-01') in Spark or Hive? Regards, Arthur On 12 Oct, 2014, at 12:03 pm, Ilya Ganelin ilgan...@gmail.com wrote: Because of how
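Neither Hive 0.12 nor Spark 1.1 ships a DATEADD built-in (an add_months function only appeared in later Hive/Spark releases), so on these versions the date arithmetic is usually computed in Scala and spliced into the query, or wrapped in a UDF. A minimal sketch of the equivalent math using java.time (assumes Java 8+; variable names are illustrative):

```scala
import java.time.LocalDate

// Equivalent of DATEADD(MONTH, 3, '2013-07-01')
val plusThreeMonths = LocalDate.parse("2013-07-01").plusMonths(3)
// plusThreeMonths.toString == "2013-10-01"

// Equivalent of DATEADD(YEAR, 1, '2014-01-01')
val plusOneYear = LocalDate.parse("2014-01-01").plusYears(1)
// plusOneYear.toString == "2015-01-01"
```

The resulting date strings can then be substituted into the HiveQL text before it is passed to Spark SQL.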

Re: How To Implement More Than One Subquery in Scala/Spark

2014-10-13 Thread Yin Huai
Question 1: Please check http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#hive-tables. Question 2: One workaround is to rewrite it: use LEFT SEMI JOIN to implement the subquery with EXISTS, and LEFT OUTER JOIN + IS NULL to implement the subquery with NOT EXISTS. SELECT
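The rewrite Yin describes can be sketched as follows, assuming a Spark 1.1 HiveContext named hiveContext and two hypothetical tables, orders and lineitem (the original tables in the thread are not shown):

```scala
// EXISTS (SELECT 1 FROM lineitem l WHERE l.l_orderkey = o.o_orderkey)
// rewritten as a LEFT SEMI JOIN: keeps only orders with a matching line item.
val withMatch = hiveContext.sql("""
  SELECT o.o_orderkey
  FROM orders o
  LEFT SEMI JOIN lineitem l ON (o.o_orderkey = l.l_orderkey)
""")

// NOT EXISTS rewritten as LEFT OUTER JOIN + IS NULL:
// orders with no match come back with NULLs on the lineitem side.
val withoutMatch = hiveContext.sql("""
  SELECT o.o_orderkey
  FROM orders o
  LEFT OUTER JOIN lineitem l ON (o.o_orderkey = l.l_orderkey)
  WHERE l.l_orderkey IS NULL
""")
```

A LEFT SEMI JOIN also deduplicates: each left-side row appears at most once, matching the semantics of EXISTS.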

How To Implement More Than One Subquery in Scala/Spark

2014-10-11 Thread arthur.hk.c...@gmail.com
Hi, My Spark version is 1.1.0 and my Hive version is 0.12.0. I need to use more than one subquery in my Spark SQL; below are my sample table structures and a SQL statement that contains more than one subquery. Question 1: How to load a Hive table into Scala/Spark? Question 2: How to implement a
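For Question 1, the usual route in Spark 1.1 is a HiveContext, which reads tables registered in the Hive metastore. A minimal sketch (assumes a Spark build compiled with Hive support, an existing SparkContext sc, and a hypothetical table name my_table):

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)  // sc: the existing SparkContext

// In Spark 1.1 a query returns a SchemaRDD, which can be used like any RDD.
val rows = hiveContext.sql("SELECT * FROM my_table")
rows.collect().foreach(println)
```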

Re: How To Implement More Than One Subquery in Scala/Spark

2014-10-11 Thread Ilya Ganelin
Because of how closures work in Scala, there is no support for nested map/RDD-based operations. Specifically, if you have context a { context b { } }, operations within context b, when distributed across nodes, will no longer have visibility of variables specific to context a because
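Ilya's point in concrete form: an RDD reference captured inside another RDD's closure cannot be used on the worker nodes, so the nesting must be rewritten, typically as a join. A sketch with hypothetical RDDs (shown entirely in comments, since the nested form fails at runtime and the rest needs a live SparkContext):

```scala
// Given two RDDs of integers:
// val rdd1 = sc.parallelize(Seq(1, 2, 3))
// val rdd2 = sc.parallelize(Seq(2, 3, 4))

// NOT supported -- rdd2 is not usable inside a task executing rdd1's map:
// val bad = rdd1.map(x => rdd2.filter(_ == x).count())

// Workaround: express the nesting as a join on a common key.
// val good = rdd1.map(x => (x, ()))
//   .join(rdd2.map(y => (y, ())))
//   .keys                         // elements present in both RDDs
```

When one of the two datasets is small, another common workaround is to collect() it to the driver and reference the resulting local collection (or a broadcast variable) inside the closure.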