RE: Is there any external dependencies for lag() and lead() when using data frames?

2015-08-11 Thread Benjamin Ross
Jerry,
I was able to use window functions without the Hive Thrift Server. Using a
HiveContext does not imply that you need the Hive Thrift Server running.

Here’s what I used to test this out:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.expressions.Window

val conf = new SparkConf(true).set("spark.cassandra.connection.host",
  "127.0.0.1")

val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
val df = sqlContext
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "kv", "keyspace" -> "test"))
  .load()
val w = Window.orderBy("value").rowsBetween(-2, 0)


I then submitted this using spark-submit.
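For readers unfamiliar with window frames, what a frame of rowsBetween(-2, 0) covers can be sketched without Spark at all. This is a plain-Java illustration only (the class and method names here are invented for the sketch, not part of any Spark API): for each row, the frame holds the current row plus up to two preceding rows in sort order, and an aggregate such as an average is computed over that frame.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowFrameSketch {
    // Plain-Java sketch (no Spark) of a window frame of rowsBetween(-2, 0):
    // for each row i, aggregate over rows max(0, i - 2) .. i in sort order.
    static List<Double> movingAvg(double[] values) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            int start = Math.max(0, i - 2);  // frame start: 2 rows before current
            double sum = 0;
            for (int j = start; j <= i; j++) sum += values[j];
            out.add(sum / (i - start + 1));  // average over the frame
        }
        return out;
    }

    public static void main(String[] args) {
        // frames: [1], [1,2], [1,2,3], [2,3,4]
        System.out.println(movingAvg(new double[]{1, 2, 3, 4}));  // [1.0, 1.5, 2.0, 3.0]
    }
}
```

In Spark itself the same effect would come from applying an aggregate function over the window spec `w` defined above; the sketch only shows which rows the frame selects.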




RE: Is there any external dependencies for lag() and lead() when using data frames?

2015-08-11 Thread Benjamin Ross
I forgot to mention, my setup was:

-  Spark 1.4.1 running in standalone mode
-  DataStax Spark Cassandra Connector 1.4.0-M1
-  Cassandra DB
-  Scala 2.10.4



Re: Is there any external dependencies for lag() and lead() when using data frames?

2015-08-10 Thread Michael Armbrust
You will need to use a HiveContext for window functions to work.


Is there any external dependencies for lag() and lead() when using data frames?

2015-08-10 Thread Jerry
Hello,

Using Apache Spark 1.4.1, I'm unable to use lag or lead in queries against a
data frame, and I'm trying to figure out whether I just have a bad setup or
whether this is a bug. As for the exceptions I get: when using selectExpr()
with a string as an argument, I get NoSuchElementException: key not found: lag,
and when using the select method with ...spark.sql.functions.lag I get an
AnalysisException. If I replace lag with abs in the first case, Spark runs
without exception, so the rest of the syntax is correct.
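For context, the semantics that lag and lead are expected to have can be sketched without Spark. This is a plain-Java illustration only (the class and method names are invented for the sketch, not Spark API): lag(col, 1) pairs each row with the previous row's value (null for the first row), and lead(col, 1) pairs it with the next row's value (null for the last row).

```java
import java.util.Arrays;

public class LagLeadSketch {
    // Plain-Java sketch (no Spark) of lag/lead over an ordered sequence.
    // lag: shift values forward by one position; index 0 stays null.
    static Integer[] lag(int[] xs) {
        Integer[] out = new Integer[xs.length];
        for (int i = 1; i < xs.length; i++) out[i] = xs[i - 1];
        return out;
    }

    // lead: shift values backward by one position; last index stays null.
    static Integer[] lead(int[] xs) {
        Integer[] out = new Integer[xs.length];
        for (int i = 0; i < xs.length - 1; i++) out[i] = xs[i + 1];
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lag(new int[]{10, 20, 30})));   // [null, 10, 20]
        System.out.println(Arrays.toString(lead(new int[]{10, 20, 30})));  // [20, 30, null]
    }
}
```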

As for how I'm running it: the code is written in Java, with a static method
that takes the SparkContext as an argument, uses it to create a
JavaSparkContext, and then creates an SQLContext, which loads a JSON file from
the local disk and runs the queries on that data frame object. FYI: the Java
code is compiled, jarred, and pointed to with -cp when starting the spark
shell, so all I do is run Test.run(sc) in the shell.

Let me know what I should look at to debug this; I'm not sure where to start.

Thanks,
Jerry


Re: Is there any external dependencies for lag() and lead() when using data frames?

2015-08-10 Thread Jerry
By the way, if Hive is present in the Spark install, does it show up in the
text when you start the spark shell? Are there any commands I can run to check
whether it exists? I didn't set up the Spark machine that I use, so I don't
know what's present or absent.

Thanks,
Jerry

On Mon, Aug 10, 2015 at 2:38 PM, Jerry jerry.c...@gmail.com wrote:

 Thanks... it looks like I've now hit that bug with HiveMetaStoreClient, since
 I now get the message about being unable to instantiate it. On a side note,
 does anyone know where hive-site.xml is typically located?

 Thanks,
 Jerry
