Repository: spark
Updated Branches:
  refs/heads/branch-2.0 fd7e83119 -> da5d2300e


[SPARK-15396][SQL][DOC] It can't connect hive metastore database

#### What changes were proposed in this pull request?
The `hive.metastore.warehouse.dir` property in `hive-site.xml` has been deprecated 
since Spark 2.0.0. Users might not be able to connect to an existing metastore 
if they do not use the new configuration parameter `spark.sql.warehouse.dir`.

This PR updates the documentation and examples to explain the latest changes 
to the configuration of the default database location.
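
For illustration, here is a minimal sketch (not part of this commit) of pointing 
a Spark 2.0 application at an existing warehouse location; the application name 
and path are hypothetical example values:

    import org.apache.spark.sql.SparkSession

    // Set spark.sql.warehouse.dir explicitly so Spark 2.0 resolves the same
    // warehouse location that hive.metastore.warehouse.dir used to provide.
    val spark = SparkSession.builder
      .appName("HiveWarehouseExample")                            // hypothetical name
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse")  // example path
      .enableHiveSupport()
      .getOrCreate()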

Below are screenshots of the latest generated docs:

<img width="681" alt="screenshot 2016-05-20 08 38 10" 
src="https://cloud.githubusercontent.com/assets/11567269/15433296/a05c4ace-1e66-11e6-8d2b-73682b32e9c2.png";>

<img width="789" alt="screenshot 2016-05-20 08 53 26" 
src="https://cloud.githubusercontent.com/assets/11567269/15433734/645dc42e-1e68-11e6-9476-effc9f8721bb.png";>

<img width="789" alt="screenshot 2016-05-20 08 53 37" 
src="https://cloud.githubusercontent.com/assets/11567269/15433738/68569f92-1e68-11e6-83d3-ef5bb221a8d8.png";>

No change was made to the R example.

<img width="860" alt="screenshot 2016-05-20 08 54 38" 
src="https://cloud.githubusercontent.com/assets/11567269/15433779/965b8312-1e68-11e6-8bc4-53c88ceacde2.png";>

#### How was this patch tested?
N/A

Author: gatorsmile <gatorsm...@gmail.com>

Closes #13225 from gatorsmile/document.

(cherry picked from commit 6cb8f836da197eec17d33e4a547340c15e59d091)
Signed-off-by: Reynold Xin <r...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/da5d2300
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/da5d2300
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/da5d2300

Branch: refs/heads/branch-2.0
Commit: da5d2300edec800377e6f0fc3a6f066d67638d05
Parents: fd7e831
Author: gatorsmile <gatorsm...@gmail.com>
Authored: Sat May 21 23:12:27 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Sat May 21 23:12:32 2016 -0700

----------------------------------------------------------------------
 docs/sql-programming-guide.md                   | 72 ++++++++++++--------
 .../spark/examples/sql/hive/HiveFromSpark.scala | 11 +--
 2 files changed, 50 insertions(+), 33 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/da5d2300/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index a9e1f9d..940c1d7 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -1663,43 +1663,50 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` (
 
 <div data-lang="scala"  markdown="1">
 
-When working with Hive one must construct a `HiveContext`, which inherits from `SQLContext`, and
-adds support for finding tables in the MetaStore and writing queries using HiveQL. Users who do
-not have an existing Hive deployment can still create a `HiveContext`. When not configured by the
-hive-site.xml, the context automatically creates `metastore_db` in the current directory and
-creates `warehouse` directory indicated by HiveConf, which defaults to `/user/hive/warehouse`.
-Note that you may need to grant write privilege on `/user/hive/warehouse` to the user who starts
-the spark application.
+When working with Hive, one must instantiate `SparkSession` with Hive support, including
+connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.
+Users who do not have an existing Hive deployment can still enable Hive support. When not configured
+by `hive-site.xml`, the context automatically creates `metastore_db` in the current directory and
+creates a directory configured by `spark.sql.warehouse.dir`, which defaults to the directory
+`spark-warehouse` in the current directory where the Spark application is started. Note that
+the `hive.metastore.warehouse.dir` property in `hive-site.xml` is deprecated since Spark 2.0.0.
+Instead, use `spark.sql.warehouse.dir` to specify the default location of databases in the warehouse.
+You may need to grant write privileges to the user who starts the Spark application.
 
 {% highlight scala %}
-// sc is an existing SparkContext.
-val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
+// warehouse_location points to the default location for managed databases and tables
+val conf = new SparkConf().setAppName("HiveFromSpark").set("spark.sql.warehouse.dir", warehouse_location)
+val spark = SparkSession.builder.config(conf).enableHiveSupport().getOrCreate()
 
-sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
-sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' 
INTO TABLE src")
+spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
+spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO 
TABLE src")
 
 // Queries are expressed in HiveQL
-sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)
+spark.sql("FROM src SELECT key, value").collect().foreach(println)
 {% endhighlight %}
 
 </div>
 
 <div data-lang="java"  markdown="1">
 
-When working with Hive one must construct a `HiveContext`, which inherits from `SQLContext`, and
-adds support for finding tables in the MetaStore and writing queries using HiveQL. In addition to
-the `sql` method a `HiveContext` also provides an `hql` method, which allows queries to be
-expressed in HiveQL.
+When working with Hive, one must instantiate `SparkSession` with Hive support, including
+connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.
+Users who do not have an existing Hive deployment can still enable Hive support. When not configured
+by `hive-site.xml`, the context automatically creates `metastore_db` in the current directory and
+creates a directory configured by `spark.sql.warehouse.dir`, which defaults to the directory
+`spark-warehouse` in the current directory where the Spark application is started. Note that
+the `hive.metastore.warehouse.dir` property in `hive-site.xml` is deprecated since Spark 2.0.0.
+Instead, use `spark.sql.warehouse.dir` to specify the default location of databases in the warehouse.
+You may need to grant write privileges to the user who starts the Spark application.
 
 {% highlight java %}
-// sc is an existing JavaSparkContext.
-HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc);
+SparkSession spark = SparkSession.builder().appName("JavaSparkSQL").getOrCreate();
 
-sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
-sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' 
INTO TABLE src");
+spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
+spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO 
TABLE src");
 
 // Queries are expressed in HiveQL.
-Row[] results = sqlContext.sql("FROM src SELECT key, value").collect();
+Row[] results = spark.sql("FROM src SELECT key, value").collect();
 
 {% endhighlight %}
 
@@ -1707,18 +1714,25 @@ Row[] results = sqlContext.sql("FROM src SELECT key, value").collect();
 
 <div data-lang="python"  markdown="1">
 
-When working with Hive one must construct a `HiveContext`, which inherits from `SQLContext`, and
-adds support for finding tables in the MetaStore and writing queries using HiveQL.
+When working with Hive, one must instantiate `SparkSession` with Hive support, including
+connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions.
+Users who do not have an existing Hive deployment can still enable Hive support. When not configured
+by `hive-site.xml`, the context automatically creates `metastore_db` in the current directory and
+creates a directory configured by `spark.sql.warehouse.dir`, which defaults to the directory
+`spark-warehouse` in the current directory where the Spark application is started. Note that
+the `hive.metastore.warehouse.dir` property in `hive-site.xml` is deprecated since Spark 2.0.0.
+Instead, use `spark.sql.warehouse.dir` to specify the default location of databases in the warehouse.
+You may need to grant write privileges to the user who starts the Spark application.
+
 {% highlight python %}
-# sc is an existing SparkContext.
-from pyspark.sql import HiveContext
-sqlContext = HiveContext(sc)
+from pyspark.sql import SparkSession
+spark = SparkSession.builder.enableHiveSupport().getOrCreate()
 
-sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
-sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' 
INTO TABLE src")
+spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
+spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO 
TABLE src")
 
 # Queries can be expressed in HiveQL.
-results = sqlContext.sql("FROM src SELECT key, value").collect()
+results = spark.sql("FROM src SELECT key, value").collect()
 
 {% endhighlight %}
 

http://git-wip-us.apache.org/repos/asf/spark/blob/da5d2300/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala
----------------------------------------------------------------------
diff --git a/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala b/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala
index 59bdfa0..d3bb7e4 100644
--- a/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala
@@ -37,10 +37,13 @@ object HiveFromSpark {
   def main(args: Array[String]) {
     val sparkConf = new SparkConf().setAppName("HiveFromSpark")
 
-    // A hive context adds support for finding tables in the MetaStore and writing queries
-    // using HiveQL. Users who do not have an existing Hive deployment can still create a
-    // HiveContext. When not configured by the hive-site.xml, the context automatically
-    // creates metastore_db and warehouse in the current directory.
+    // When working with Hive, one must instantiate `SparkSession` with Hive support, including
+    // connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined
+    // functions. Users who do not have an existing Hive deployment can still enable Hive support.
+    // When not configured by the hive-site.xml, the context automatically creates `metastore_db`
+    // in the current directory and creates a directory configured by `spark.sql.warehouse.dir`,
+    // which defaults to the directory `spark-warehouse` in the current directory where the Spark
+    // application is started.
     val spark = SparkSession.builder
       .config(sparkConf)
       .enableHiveSupport()
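
As a quick sanity check (a sketch, not part of this commit; it assumes a Spark 2.0
`SparkSession` named `spark` as in the examples above), the effective warehouse
location can be read back at runtime:

    // Prints the resolved warehouse directory, e.g. file:/path/to/spark-warehouse
    println(spark.conf.get("spark.sql.warehouse.dir"))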


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
