I forgot to mention, my setup was:

- Spark 1.4.1 running in standalone mode
- Datastax spark cassandra connector 1.4.0-M1
- Cassandra DB
- Scala version 2.10.4


From: Benjamin Ross
Sent: Tuesday, August 11, 2015 10:16 AM
To: Jerry; Michael Armbrust
Cc: user
Subject: RE: Is there any external dependencies for lag() and lead() when using data frames?

Jerry,
I was able to use window functions without the Hive thrift server. Using HiveContext does not imply that you need the Hive thrift server running.

Here's what I used to test this out (imports included for completeness):
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.lag
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    val df = sqlContext
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "kv", "keyspace" -> "test"))
      .load()
    val w = Window.orderBy("value").rowsBetween(-2, 0)
    // e.g. apply lag over the window
    df.select(lag("value", 1).over(w)).show()


I then submitted this using spark-submit.



From: Jerry [mailto:jerry.c...@gmail.com]
Sent: Monday, August 10, 2015 10:55 PM
To: Michael Armbrust
Cc: user
Subject: Re: Is there any external dependencies for lag() and lead() when using data frames?

By the way, if Hive is present in the Spark install, does it show up in the text when you start the spark shell? Are there any commands I can run to check if it exists? I didn't set up the Spark machine that I use, so I don't know what's present or absent.
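
One quick check (a sketch, assuming a Spark 1.x spark-shell where `sqlContext` is pre-created at startup): if Spark was built with Hive support, the shell's `sqlContext` is actually a HiveContext, which you can test directly:

    // In the spark-shell; sqlContext is created for you at startup.
    // If Spark was compiled with Hive support, it is a HiveContext.
    sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
    // true  -> Hive support is present
    // false -> plain SQLContext, no Hive support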
Thanks,
        Jerry

On Mon, Aug 10, 2015 at 2:38 PM, Jerry 
<jerry.c...@gmail.com<mailto:jerry.c...@gmail.com>> wrote:
Thanks... it looks like I've now hit that bug with HiveMetaStoreClient, as I now get a message about being unable to instantiate it. On a side note, does anyone know where hive-site.xml is typically located?
Thanks,
        Jerry

On Mon, Aug 10, 2015 at 2:03 PM, Michael Armbrust 
<mich...@databricks.com<mailto:mich...@databricks.com>> wrote:
You will need to use a HiveContext for window functions to work.
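
A minimal sketch of the switch (assuming Spark 1.4 with a Hive-enabled build; the file name "data.json" is just a placeholder):

    import org.apache.spark.sql.hive.HiveContext

    // Replace the plain SQLContext with a HiveContext; in Spark 1.4,
    // window functions (lag, lead, etc.) are only resolved through
    // Hive's function registry.
    val sqlContext = new HiveContext(sc)
    val df = sqlContext.read.json("data.json")
    df.registerTempTable("t")
    sqlContext.sql("SELECT value, lag(value, 1) OVER (ORDER BY value) FROM t")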

On Mon, Aug 10, 2015 at 1:26 PM, Jerry 
<jerry.c...@gmail.com<mailto:jerry.c...@gmail.com>> wrote:
Hello,
Using Apache Spark 1.4.1, I'm unable to use lag or lead when querying a data frame, and I'm trying to figure out whether I just have a bad setup or whether this is a bug. As for the exceptions I get: when using selectExpr() with a string as an argument, I get "NoSuchElementException: key not found: lag", and when using the select method with ...spark.sql.functions.lag I get an AnalysisException. If I replace lag with abs in the first case, Spark runs without exception, so the rest of the syntax is not the problem.
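For concreteness, the two failing call styles described above look roughly like this (a sketch in the shell's Scala, assuming a data frame `df` with a `value` column):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.lag

    // Style 1: SQL expression string -- fails with
    // "NoSuchElementException: key not found: lag" on a plain SQLContext
    df.selectExpr("lag(value, 1) OVER (ORDER BY value)")

    // Style 2: DataFrame API -- fails with an AnalysisException
    // on a plain SQLContext
    val w = Window.orderBy("value")
    df.select(lag("value", 1).over(w))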
As for how I'm running it: the code is written in Java as a static method that takes the SparkContext as an argument, which is used to create a JavaSparkContext, which in turn creates an SQLContext that loads a JSON file from the local disk and runs those queries on that data frame object. FYI: the Java code is compiled, jarred, and then pointed to with -cp when starting the spark shell, so all I do is run "Test.run(sc)" in the shell.
Let me know what to look for to debug this problem; I'm not sure where to start.
Thanks,
        Jerry


