RE: Is there any external dependencies for lag() and lead() when using data frames?
Jerry,

I was able to use window functions without the Hive Thrift server. Using a HiveContext does not mean that you need the Hive Thrift server running. Here's what I used to test this out:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    val df = sqlContext
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "kv", "keyspace" -> "test"))
      .load()
    val w = Window.orderBy("value").rowsBetween(-2, 0)

I then submitted this using spark-submit.

From: Jerry [mailto:jerry.c...@gmail.com]
Sent: Monday, August 10, 2015 10:55 PM
To: Michael Armbrust
Cc: user
Subject: Re: Is there any external dependencies for lag() and lead() when using data frames?

By the way, if Hive is present in the Spark install, does it show up in the text printed when you start the Spark shell? Are there any commands I can run to check whether it exists? I didn't set up the Spark machine I use, so I don't know what's present or absent.

Thanks,
Jerry

On Mon, Aug 10, 2015 at 2:38 PM, Jerry jerry.c...@gmail.com wrote:

Thanks... it looks like I've now hit that bug with HiveMetaStoreClient, as I now get the message about being unable to instantiate it. On a side note, does anyone know where hive-site.xml is typically located?

Thanks,
Jerry

On Mon, Aug 10, 2015 at 2:03 PM, Michael Armbrust mich...@databricks.com wrote:

You will need to use a HiveContext for window functions to work.

On Mon, Aug 10, 2015 at 1:26 PM, Jerry jerry.c...@gmail.com wrote:

Hello,

Using Apache Spark 1.4.1, I'm unable to use lag or lead when querying a DataFrame, and I'm trying to figure out whether I just have a bad setup or whether this is a bug. When I use selectExpr() with a string argument, I get NoSuchElementException: key not found: lag; when I use the select method with ...spark.sql.functions.lag, I get an AnalysisException.
If I replace lag with abs in the first case, Spark runs without exception, so the rest of the syntax seems to be correct.

As for how I'm running it: the code is written in Java. A static method takes the SparkContext as an argument and uses it to create a JavaSparkContext, which is then used to create an SQLContext; the SQLContext loads a JSON file from the local disk, and the queries run on the resulting DataFrame. FYI: the Java code is compiled, packaged into a JAR, and added with -cp when starting the Spark shell, so all I do is run Test.run(sc) in the shell.

Let me know what to look for to debug this problem; I'm not sure where to look.

Thanks,
Jerry
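For readers unfamiliar with row frames: the window in Benjamin's test, ordered by value with rowsBetween(-2, 0), covers the current row plus up to two preceding rows in sort order. The plain-Scala sketch below (no Spark needed) models only that frame arithmetic, using a moving sum as the aggregate. It is not how Spark implements windows, the object and method names are hypothetical, and it assumes a single partition, which is what a window with orderBy but no partitionBy amounts to.

```scala
// Hypothetical plain-Scala model of a rowsBetween(start, end) row frame.
// Assumes one partition whose rows are already in window sort order.
object RowFrameSketch {
  /** For each row i, the rows at positions [i + start, i + end], clipped to the partition. */
  def frames[A](rows: Seq[A], start: Int, end: Int): Seq[Seq[A]] =
    rows.indices.map { i =>
      val from = math.max(0, i + start)
      val until = math.min(rows.length, i + end + 1)
      rows.slice(from, until)
    }

  /** Moving sum: sort by value (like orderBy), then sum each row's frame. */
  def movingSum(values: Seq[Int], start: Int, end: Int): Seq[Int] =
    frames(values.sorted, start, end).map(_.sum)

  def main(args: Array[String]): Unit =
    // rowsBetween(-2, 0): current row and up to two preceding rows
    println(movingSum(Seq(5, 1, 4, 2, 3), -2, 0))
}
```

With start = -1 and end = 1 the same helper would model a centered three-row frame (one preceding row, the current row, one following row).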
RE: Is there any external dependencies for lag() and lead() when using data frames?
I forgot to mention, my setup was:
- Spark 1.4.1 running in standalone mode
- DataStax Spark Cassandra Connector 1.4.0-M1
- Cassandra DB
- Scala version 2.10.4

From: Benjamin Ross
Sent: Tuesday, August 11, 2015 10:16 AM
To: Jerry; Michael Armbrust
Cc: user
Subject: RE: Is there any external dependencies for lag() and lead() when using data frames?
Re: Is there any external dependencies for lag() and lead() when using data frames?
You will need to use a HiveContext for window functions to work.
Is there any external dependencies for lag() and lead() when using data frames?
Hello,

Using Apache Spark 1.4.1, I'm unable to use lag or lead when querying a DataFrame, and I'm trying to figure out whether I just have a bad setup or whether this is a bug. When I use selectExpr() with a string argument, I get NoSuchElementException: key not found: lag; when I use the select method with ...spark.sql.functions.lag, I get an AnalysisException. If I replace lag with abs in the first case, Spark runs without exception, so the rest of the syntax seems to be correct.

As for how I'm running it: the code is written in Java. A static method takes the SparkContext as an argument and uses it to create a JavaSparkContext, which is then used to create an SQLContext; the SQLContext loads a JSON file from the local disk, and the queries run on the resulting DataFrame. FYI: the Java code is compiled, packaged into a JAR, and added with -cp when starting the Spark shell, so all I do is run Test.run(sc) in the shell.

Let me know what to look for to debug this problem; I'm not sure where to look.

Thanks,
Jerry
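As a mental model of what lag and lead return once they do work (per Michael's reply in this thread, on Spark 1.4.x that means going through a HiveContext): each row gets the value from offset rows earlier (lag) or later (lead) in the window's sort order, and null when no such row exists. The plain-Scala sketch below models that with Option standing in for null. The names are hypothetical and it says nothing about Spark's own implementation; it assumes the rows are already sorted and form one partition.

```scala
// Hypothetical plain-Scala model of lag/lead over rows already in window order.
// None plays the role of the null Spark returns at the partition edges.
object LagLeadSketch {
  /** Value from `offset` rows earlier, or None if there is no such row. */
  def lag[A](rows: Seq[A], offset: Int = 1): Seq[Option[A]] =
    rows.indices.map(i => if (i - offset >= 0) Some(rows(i - offset)) else None)

  /** Value from `offset` rows later, or None if there is no such row. */
  def lead[A](rows: Seq[A], offset: Int = 1): Seq[Option[A]] =
    rows.indices.map(i => if (i + offset < rows.length) Some(rows(i + offset)) else None)

  def main(args: Array[String]): Unit = {
    val sorted = Seq(10, 20, 30)
    println(lag(sorted))
    println(lead(sorted))
  }
}
```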
Re: Is there any external dependencies for lag() and lead() when using data frames?
By the way, if Hive is present in the Spark install, does it show up in the text printed when you start the Spark shell? Are there any commands I can run to check whether it exists? I didn't set up the Spark machine I use, so I don't know what's present or absent.

Thanks,
Jerry
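On the question of checking for Hive: one low-tech option, sketched below under the assumption that you paste it into spark-shell (for example with :paste), is to ask the JVM whether Spark's Hive entry point class, org.apache.spark.sql.hive.HiveContext, can be loaded. This only tells you whether the Hive module is on the classpath; it does not check for a metastore or a hive-site.xml (Spark's SQL programming guide describes configuring Hive by placing hive-site.xml in Spark's conf/ directory). The object and method names are hypothetical.

```scala
// Hypothetical helper: can this class be loaded from the current classpath?
// Passing Spark's HiveContext class name indicates whether the Hive module
// ships with this Spark install; it does not test metastore connectivity.
object HiveClasspathCheck {
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className)
      true
    } catch {
      case _: ClassNotFoundException => false
    }

  def main(args: Array[String]): Unit =
    println(isOnClasspath("org.apache.spark.sql.hive.HiveContext"))
}
```

Class.forName throws ClassNotFoundException when the class is absent, which the helper maps to false, so the check fails quietly instead of aborting the shell session.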