[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-06-04 Thread dougb

Github user dougb commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-108910838
  
FYI, this change broke getting a DataFrame via
`sqlContext.table("database.table")`,
which worked from `1.3.0` up until this PR.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-06-01 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-107733461
  
@jeanlyn I think you are hitting 
https://issues.apache.org/jira/browse/SPARK-8020. 



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-06-01 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-107606325
  
@jeanlyn The warning message related to `get_functions` is expected if you are
using a Hive 0.13.1 metastore client to connect to a Hive 0.12.0 metastore
server, because `get_functions` was not in Hive 0.12.0. Regarding using the Hive
0.12 metastore client, can you post the full stack trace? Also, we can move our
discussion to the user mailing list, so that people who may have the same
question can see our thread.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-06-01 Thread jeanlyn

Github user jeanlyn commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-107459171
  
I can use the built-in classes (0.13.1) to connect to a `0.12.0` metastore
correctly, apart from some warnings and errors that do not affect execution:
```
15/06/01 21:20:09 WARN metastore.RetryingMetaStoreClient: MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.TApplicationException: Invalid method name: 'get_functions'
   at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
   at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_functions(ThriftHiveMetastore.java:2886)
   at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_functions(ThriftHiveMetastore.java:2872)
   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getFunctions(HiveMetaStoreClient.java:1727)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
   at $Proxy10.getFunctions(Unknown Source)
   at org.apache.hadoop.hive.ql.metadata.Hive.getFunctions(Hive.java:2670)
   at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionNames(FunctionRegistry.java:674)
   at org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionNames(FunctionRegistry.java:662)
   at org.apache.hadoop.hive.cli.CliDriver.getCommandCompletor(CliDriver.java:540)
   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:174)
   at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/06/01 21:20:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://172.19.154.28:9084
15/06/01 21:20:10 INFO hive.metastore: Connected to metastore.
15/06/01 21:20:10 ERROR exec.FunctionRegistry: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_functions'
spark-sql>
```
Thanks 



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-06-01 Thread jeanlyn

Github user jeanlyn commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-107415482
  
I set the metastore version to `0.12.0` with the following steps, but got a
`NoClassDefFoundError`:
* I changed `spark.sql.hive.metastore.version` in `spark-defaults.conf` to
`0.12.0`.
* I set `spark.sql.hive.metastore.jars` to `maven`, or to the classpath of
Hadoop and Hive, and got
```
java.lang.NoClassDefFoundError: com/google/common/base/Preconditions when creating Hive client using classpath
```
I am not sure whether I understand this correctly.
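
For reference, the same two settings could also be supplied programmatically rather than through `spark-defaults.conf`. The sketch below only mirrors the steps above and is not a fix for the error. It assumes a Spark 1.4-era build with Hive support on the classpath, and it assumes that `spark.sql.*` entries set on the `SparkConf` are picked up by the `HiveContext`; the object name and application name are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Placeholder application, intended to be launched with spark-submit.
object MetastoreVersionExample {
  def main(args: Array[String]): Unit = {
    // metastore.version: version of the Hive metastore to talk to.
    // metastore.jars: "builtin", "maven", or a classpath containing Hive and Hadoop.
    val conf = new SparkConf()
      .setAppName("metastore-version-example")
      .set("spark.sql.hive.metastore.version", "0.12.0")
      .set("spark.sql.hive.metastore.jars", "maven")

    // The master URL is expected to be provided by spark-submit.
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Any metastore-backed call exercises the isolated client.
    hiveContext.sql("SHOW TABLES").collect().foreach(println)
    sc.stop()
  }
}
```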



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-21 Thread coderfi

Github user coderfi commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-104511653
  
K. Filed as https://issues.apache.org/jira/browse/SPARK-7819

thank you for looking into this!

Fi



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-21 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-104504904
  
Yes, please file a JIRA.
On May 21, 2015 9:29 PM, "coderfi"  wrote:

> Has this been tested with mapr4?
>
> When running under python, I am running into an exception
>
> Caused by: java.lang.UnsatisfiedLinkError: Native Library /tmp/
> mapr-root-libMapRClient.4.0.2-mapr.so already loaded in another
> classloader
> at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1931)
> at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1890)
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1851)
> at java.lang.Runtime.load0(Runtime.java:795)
> at java.lang.System.load(System.java:1062)
> at com.mapr.fs.shim.LibraryLoader.load(LibraryLoader.java:29)
>
> This occurs under the following scenarios:
> create a spark context + hive context
> run a hive query
> stop the spark context
> create another spark context + hive context again
> run a hive query
> I get the Native Library exception
>
> From what I can tell, the ClassLoader is hitting a MapR class which
> is attempting to load the native library (I presume, as part of a static
> initializer).
> Unfortunately, the JVM prohibits this the second time around. :(
>
> I could file a ticket with a full stack trace if you'd like.
>
> thanks,
> Fi
>




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-21 Thread coderfi

Github user coderfi commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-104503508
  
Has this been tested with mapr4?

When running under python, I am running into an exception

Caused by: java.lang.UnsatisfiedLinkError: Native Library 
/tmp/mapr-root-libMapRClient.4.0.2-mapr.so already loaded in another classloader
at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1931)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1890)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1851)
at java.lang.Runtime.load0(Runtime.java:795)
at java.lang.System.load(System.java:1062)
at com.mapr.fs.shim.LibraryLoader.load(LibraryLoader.java:29)

This occurs under the following scenarios:
   create a spark context + hive context
   run a hive query
   stop the spark context
   create another spark context + hive context again
   run a hive query
   I get the Native Library exception

From what I can tell, the ClassLoader is hitting a MapR class which is
attempting to load the native library (I presume, as part of a static
initializer).
Unfortunately, the JVM prohibits this the second time around. :(
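
To make the failure mode concrete, here is a toy model of the rule behind the error message: a native library can be bound to only one class loader at a time, so a second, isolated loader that re-runs the same static initializer is rejected. This is a simulation in plain Scala, not the JVM's actual bookkeeping; `NativeLibraryModel`, the loader names, and the library path are all hypothetical.

```scala
import scala.collection.mutable

// Toy model only: simulates "one class loader per native library".
object NativeLibraryModel {
  // Maps a library path to the name of the loader that first loaded it.
  private val owners = mutable.Map.empty[String, String]

  def load(libraryPath: String, loaderName: String): Unit = synchronized {
    owners.get(libraryPath) match {
      case Some(owner) if owner != loaderName =>
        // Same library, different loader: rejected, as in the report above.
        throw new UnsatisfiedLinkError(
          s"Native Library $libraryPath already loaded in another classloader")
      case _ =>
        // First load, or a repeat load from the same loader.
        owners(libraryPath) = loaderName
    }
  }

  def main(args: Array[String]): Unit = {
    val lib = "/tmp/example-native-lib.so" // hypothetical path
    load(lib, "loader-for-first-context")  // first Hive context: succeeds
    try load(lib, "loader-for-second-context") // second, isolated loader: fails
    catch { case e: UnsatisfiedLinkError => println(e.getMessage) }
  }
}
```

In this model, stopping the first context does not release the library, which matches the reported sequence: the failure only appears once a second context is created.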

I could file a ticket with a full stack trace if you'd like.
   
thanks,
Fi




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5876



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100077855
  
LGTM. I will merge it to master and branch 1.4.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100077600
  
  [Test build #32167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32167/consoleFull) for PR 5876 at commit [`258d000`](https://github.com/apache/spark/commit/258d0002c1795c9c8e69c0a7393eb1a3c81ba411).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100077610
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32167/
Test PASSed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100077609
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100052901
  
  [Test build #32167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32167/consoleFull) for PR 5876 at commit [`258d000`](https://github.com/apache/spark/commit/258d0002c1795c9c8e69c0a7393eb1a3c81ba411).



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100052711
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100052724
  
Merged build started.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100052386
  
test this please



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100052137
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100052138
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32165/
Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100052136
  
  [Test build #32165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32165/consoleFull) for PR 5876 at commit [`258d000`](https://github.com/apache/spark/commit/258d0002c1795c9c8e69c0a7393eb1a3c81ba411).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100051873
  
  [Test build #32165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32165/consoleFull) for PR 5876 at commit [`258d000`](https://github.com/apache/spark/commit/258d0002c1795c9c8e69c0a7393eb1a3c81ba411).



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100051683
  
Merged build started.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100051672
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29907530
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +100,113 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
     getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+    getConf(HIVE_METASTORE_VERSION, hiveExecutionVersion)
+
+  /**
+   * The location of the jars that should be used to instantiate the HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a classpath in the standard format for both hive and hadoop.
+   *  - builtin - attempt to discover the jars that were used to load Spark SQL and use those. This
+   *  option is only valid when using the execution version of Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+    getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+  /**
+   * The copy of the hive client that is used for execution.  Currently this must always be
+   * Hive 13 as this is the version of Hive that is packaged with Spark SQL.  This copy of the
+   * client is used for execution related tasks like registering temporary functions or ensuring
+   * that the ThreadLocal SessionState is correctly populated.  This copy of Hive is *not* used
+   * for storing peristent metadata, and only point to a dummy metastore in a temporary directory.
+   */
+  @transient
+  protected[hive] lazy val executionHive: ClientWrapper = {
+    logInfo(s"Initilizing execution hive, version $hiveExecutionVersion")
+    new ClientWrapper(
+      version = IsolatedClientLoader.hiveVersion(hiveExecutionVersion),
+      config = newTemporaryConfiguration())
+  }
+  SessionState.setCurrentSessionState(executionHive.state)
+
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore.
+   * The version of the Hive client that is used here must match the metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+    val metaVersion = IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+    // We instantiate a HiveConf here to read in the hive-site.xml file and then pass the options
+    // into the isolated client loader
+    val metadataConf = new HiveConf()
+    // `configure` goes second to override other settings.
+    val allConfig = metadataConf.iterator.map(e => e.getKey -> e.getValue).toMap ++ configure
+
+    val isolatedLoader = if (hiveMetastoreJars == "builtin") {
+      if (hiveExecutionVersion != hiveMetastoreVersion) {
+        throw new IllegalArgumentException(
+          "Builtin jars can only be used when hive execution version == hive metastore version. " +
+          s"Execution: ${hiveExecutionVersion} != Metastore: ${hiveMetastoreVersion}. " +
+          "Specify a vaild path to the correct hive jars using $HIVE_METASTORE_JARS " +
+          s"or change $HIVE_METASTORE_VERSION to $hiveExecutionVersion.")
+      }
+      val jars = getClass.getClassLoader match {
+        case urlClassLoader: java.net.URLClassLoader => urlClassLoader.getURLs
+        case other =>
+          throw new IllegalArgumentException(
+            "Unable to locate hive jars to connect to metastore " +
+            s"using classloader ${other.getClass.getName}. " +
+            "Please set spark.sql.hive.metastore.jars")
+      }
+
+      logInfo(
+        s"Initializing HiveMetastoreConnection version $hiveMetastoreVersion using Spark classes.")
+      new IsolatedClientLoader(
+        version = metaVersion,
+        execJars = jars.toSeq,
+        config = allConfig,
+        isolationOn = true)
+    } else if (hiveMetastoreJars == "maven") {
+      // TODO: Support for loading the jars from an already downloaded location.
+      logInfo(
+        s"Initializing HiveMetastoreConnection version $hiveMetastoreVersion using maven.")
+      IsolatedClientLoader.forVersion(hiveMetastoreVersion, allConfig )
+    } else {
+      // Convert to files and expand any directories.
+      val ja
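
The three-way choice made by `metadataHive` in the diff above can be summarized with a small standalone model. This is a simplified sketch, not the code from the pull request: `MetastoreJarChoice`, `JarSource`, and `choose` are hypothetical names, and because the quoted diff is cut off before the classpath branch completes, the handling of that branch here (splitting on the platform path separator) is an assumption.

```scala
// Simplified model of how the jar source is picked; names are hypothetical.
object MetastoreJarChoice {
  sealed trait JarSource
  case object Builtin extends JarSource
  case object Maven extends JarSource
  final case class Classpath(entries: Seq[String]) extends JarSource

  def choose(
      metastoreJars: String,
      executionVersion: String,
      metastoreVersion: String): JarSource =
    metastoreJars match {
      case "builtin" =>
        // Spark's bundled Hive classes only fit a metastore of the same
        // version; require throws IllegalArgumentException otherwise.
        require(
          executionVersion == metastoreVersion,
          s"Builtin jars need matching versions: $executionVersion != $metastoreVersion")
        Builtin
      case "maven" =>
        Maven
      case classpath =>
        // Assumption: any other value is a classpath string, split on the
        // platform path separator. The diff is truncated at this point.
        Classpath(classpath.split(java.io.File.pathSeparator).toSeq)
    }
}
```

With the default `builtin` setting, changing only the metastore version is rejected in this model, which is why a different metastore version goes together with `maven` or an explicit classpath.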

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29907415
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +100,113 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, hiveExecutionVersion)
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a classpath in the standard format for both hive and hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+  /**
+   * The copy of the hive client that is used for execution.  Currently 
this must always be
+   * Hive 13 as this is the version of Hive that is packaged with Spark SQL.  This copy of the
+   * client is used for execution related tasks like registering temporary functions or ensuring
+   * that the ThreadLocal SessionState is correctly populated.  This copy of Hive is *not* used
+   * for storing peristent metadata, and only point to a dummy metastore in a temporary directory.
+   */
+  @transient
+  protected[hive] lazy val executionHive: ClientWrapper = {
+    logInfo(s"Initilizing execution hive, version $hiveExecutionVersion")
+    new ClientWrapper(
+      version = IsolatedClientLoader.hiveVersion(hiveExecutionVersion),
+      config = newTemporaryConfiguration())
+  }
+  SessionState.setCurrentSessionState(executionHive.state)
+
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore.
+   * The version of the Hive client that is used here must match the metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+    val metaVersion = IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+    // We instantiate a HiveConf here to read in the hive-site.xml file and then pass the options
+    // into the isolated client loader
+    val metadataConf = new HiveConf()
+    // `configure` goes second to override other settings.
+    val allConfig = metadataConf.iterator.map(e => e.getKey -> e.getValue).toMap ++ configure
+
+    val isolatedLoader = if (hiveMetastoreJars == "builtin") {
+      if (hiveExecutionVersion != hiveMetastoreVersion) {
+        throw new IllegalArgumentException(
+          "Builtin jars can only be used when hive execution version == hive metastore version. " +
+          s"Execution: ${hiveExecutionVersion} != Metastore: ${hiveMetastoreVersion}. " +
+          "Specify a vaild path to the correct hive jars using $HIVE_METASTORE_JARS " +
+          s"or change $HIVE_METASTORE_VERSION to $hiveExecutionVersion.")
+      }
+      val jars = getClass.getClassLoader match {
+        case urlClassLoader: java.net.URLClassLoader => urlClassLoader.getURLs
+        case other =>
+          throw new IllegalArgumentException(
+            "Unable to locate hive jars to connect to metastore " +
+            s"using classloader ${other.getClass.getName}. " +
+            "Please set spark.sql.hive.metastore.jars")
+      }
+
+      logInfo(
+        s"Initializing HiveMetastoreConnection version $hiveMetastoreVersion using Spark classes.")
+      new IsolatedClientLoader(
+        version = metaVersion,
+        execJars = jars.toSeq,
+        config = allConfig,
+        isolationOn = true)
+    } else if (hiveMetastoreJars == "maven") {
+      // TODO: Support for loading the jars from an already downloaded location.
+      logInfo(
+        s"Initializing HiveMetastoreConnection version $hiveMetastoreVersion using maven.")
+      IsolatedClientLoader.forVersion(hiveMetastoreVersion, allConfig )
+    } else {
+      // Convert to files and expand any directories.
+      val ja
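
The diff comment "`configure` goes second to override other settings" relies on how Scala's `Map` concatenation resolves duplicate keys: entries from the right-hand operand win. A minimal sketch of that behaviour follows; the object name, method name, and the keys used when calling it are hypothetical and are not taken from the pull request.

```scala
// Hypothetical helper illustrating override order; not code from the pull request.
object ConfigMergeSketch {
  /** Merges two settings maps; on a key collision the value from `overrides` is kept. */
  def merge(fromSite: Map[String, String], overrides: Map[String, String]): Map[String, String] =
    fromSite ++ overrides
}
```

Swapping the operands would let the values read from the site file silently win instead, which is presumably why the original comment calls the order out.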

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29907285
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +100,113 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
     getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+    getConf(HIVE_METASTORE_VERSION, hiveExecutionVersion)
+
+  /**
+   * The location of the jars that should be used to instantiate the HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a classpath in the standard format for both hive and hadoop.
+   *  - builtin - attempt to discover the jars that were used to load Spark SQL and use those. This
+   *              option is only valid when using the execution version of Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+    getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+  /**
+   * The copy of the hive client that is used for execution.  Currently this must always be
+   * Hive 13 as this is the version of Hive that is packaged with Spark SQL.  This copy of the
+   * client is used for execution related tasks like registering temporary functions or ensuring
+   * that the ThreadLocal SessionState is correctly populated.  This copy of Hive is *not* used
+   * for storing peristent metadata, and only point to a dummy metastore in a temporary directory.
+   */
+  @transient
+  protected[hive] lazy val executionHive: ClientWrapper = {
+    logInfo(s"Initilizing execution hive, version $hiveExecutionVersion")
+    new ClientWrapper(
+      version = IsolatedClientLoader.hiveVersion(hiveExecutionVersion),
+      config = newTemporaryConfiguration())
+  }
+  SessionState.setCurrentSessionState(executionHive.state)
+
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore.
+   * The version of the Hive client that is used here must match the metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+    val metaVersion = IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+    // We instantiate a HiveConf here to read in the hive-site.xml file and then pass the options
+    // into the isolated client loader
+    val metadataConf = new HiveConf()
+    // `configure` goes second to override other settings.
+    val allConfig = metadataConf.iterator.map(e => e.getKey -> e.getValue).toMap ++ configure
+
+    val isolatedLoader = if (hiveMetastoreJars == "builtin") {
+      if (hiveExecutionVersion != hiveMetastoreVersion) {
+        throw new IllegalArgumentException(
+          "Builtin jars can only be used when hive execution version == hive metastore version. " +
+          s"Execution: ${hiveExecutionVersion} != Metastore: ${hiveMetastoreVersion}. " +
+          "Specify a vaild path to the correct hive jars using $HIVE_METASTORE_JARS " +
+          s"or change $HIVE_METASTORE_VERSION to $hiveExecutionVersion.")
+      }
+      val jars = getClass.getClassLoader match {
+        case urlClassLoader: java.net.URLClassLoader => urlClassLoader.getURLs
+        case other =>
+          throw new IllegalArgumentException(
+            "Unable to locate hive jars to connect to metastore " +
+            s"using classloader ${other.getClass.getName}. " +
+            "Please set spark.sql.hive.metastore.jars")
+      }
+
+      logInfo(
+        s"Initializing HiveMetastoreConnection version $hiveMetastoreVersion using Spark classes.")
+      new IsolatedClientLoader(
+        version = metaVersion,
+        execJars = jars.toSeq,
+        config = allConfig,
+        isolationOn = true)
+    } else if (hiveMetastoreJars == "maven") {
+      // TODO: Support for loading the jars from an already downloaded location.
+      logInfo(
+        s"Initializing HiveMetastoreConnection version $hiveMetastoreVersion using maven.")
+      IsolatedClientLoader.forVersion(hiveMetastoreVersion, allConfig )
+    } else {
+      // Convert to files and expand any directories.
+      val jars
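
The Scaladoc in the quoted diff lists three accepted values for the metastore jars setting (`builtin`, `maven`, or a classpath), and the `builtin` branch is rejected when the execution and metastore versions differ. The sketch below restates that decision as a small standalone function. All type and method names here are hypothetical, and splitting the classpath on the platform path separator is an assumption, since the quoted diff is cut off before that branch.

```scala
// Hypothetical names throughout; this is a restatement of the dispatch, not the PR's code.
sealed trait JarSource
case object Builtin extends JarSource
case object Maven extends JarSource
final case class Classpath(entries: Seq[String]) extends JarSource

object JarSourceSketch {
  /** Maps the jars setting to a source, rejecting `builtin` when the versions differ. */
  def resolve(setting: String, executionVersion: String, metastoreVersion: String): JarSource =
    setting match {
      case "builtin" =>
        // `require` throws IllegalArgumentException, matching the exception type in the diff.
        require(
          executionVersion == metastoreVersion,
          s"Builtin jars need execution version == metastore version " +
            s"($executionVersion != $metastoreVersion)")
        Builtin
      case "maven" => Maven
      // Assumption: any other value is a classpath split on the platform path separator.
      case path => Classpath(path.split(java.io.File.pathSeparator).toSeq)
    }
}
```

Modelling the outcome as a sealed trait keeps the three cases explicit, so a later `match` on the result is checked for exhaustiveness by the compiler.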

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29907032
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +100,113 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100047386
  
Just a couple of comments left, otherwise the parts I commented on look 
good. Thanks!



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29906552
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +100,113 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100047039
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32161/
Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100047037
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29906395
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +100,113 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100045495
  
Merged build started.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100045487
  
Merged build triggered.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29905464
  
--- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
@@ -74,7 +74,17 @@ private[hive] object SparkSQLCLIDriver {
       System.exit(1)
     }
 
-    val sessionState = new CliSessionState(new HiveConf(classOf[SessionState]))
+    val localMetastore = {
--- End diff --

Actually, you already are, you just need to remove this block.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29905416
  
--- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
@@ -74,7 +74,17 @@ private[hive] object SparkSQLCLIDriver {
       System.exit(1)
     }
 
-    val sessionState = new CliSessionState(new HiveConf(classOf[SessionState]))
+    val localMetastore = {
--- End diff --

Use `newTemporaryConfiguation` [sic] here?



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29905330
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -439,7 +455,21 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
 }
 
 
-private object HiveContext {
+private[hive] object HiveContext {
+  /** The version of hive used internally by Spark SQL. */
+  val hiveExecutionVersion: String = "0.13.1"
+
+  val HIVE_METASTORE_VERSION: String = "spark.sql.hive.metastore.version"
+  val HIVE_METASTORE_JARS: String = "spark.sql.hive.metastore.jars"
+
+  /** Constructs a configuration for hive, where the metastore is located in a temp directory. */
+  def newTemporaryConfiguation(): Map[String, String] = {
--- End diff --

typo: Configuration



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29905368
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -439,7 +455,21 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
 }
 
 
-private object HiveContext {
+private[hive] object HiveContext {
+  /** The version of hive used internally by Spark SQL. */
+  val hiveExecutionVersion: String = "0.13.1"
+
+  val HIVE_METASTORE_VERSION: String = "spark.sql.hive.metastore.version"
+  val HIVE_METASTORE_JARS: String = "spark.sql.hive.metastore.jars"
+
+  /** Constructs a configuration for hive, where the metastore is located in a temp directory. */
+  def newTemporaryConfiguation(): Map[String, String] = {
+    val tempDir = Utils.createTempDir()
+    val localMetastore = new File(tempDir, "metastore")
--- End diff --

nit: `new File(tempDir, "metastore").getAbsolutePath`. Just in case.
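
A minimal sketch of what that suggestion could look like in a temporary-metastore configuration follows. It is self-contained and uses `java.nio.file.Files.createTempDirectory` in place of Spark's `Utils.createTempDir()`. The two setting keys and the Derby URL form are assumptions, since the quoted hunk ends before the returned map is shown, and the object name is hypothetical.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical standalone sketch; the settings returned by the PR's method are not shown above.
object TempMetastoreSketch {
  /** Builds Hive settings that point at a throwaway metastore under a fresh temp directory. */
  def newTemporaryConfiguration(): Map[String, String] = {
    val tempDir: File = Files.createTempDirectory("hive-sketch").toFile
    // Resolve to an absolute path so the URL does not depend on the working directory.
    val localMetastore: String = new File(tempDir, "metastore").getAbsolutePath
    Map(
      // Assumed keys: the standard JDO connection URL and Hive warehouse directory settings.
      "javax.jdo.option.ConnectionURL" ->
        s"jdbc:derby:;databaseName=$localMetastore;create=true",
      "hive.metastore.warehouse.dir" -> tempDir.getAbsolutePath)
  }
}
```

Interpolating a `File` into a string yields the path it was constructed with, which may be relative, so calling `getAbsolutePath` explicitly removes that ambiguity.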



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100042449
  
[Test build #32157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32157/consoleFull) for PR 5876 at commit [`81bb366`](https://github.com/apache/spark/commit/81bb36628f647433a9bd2203cb4f30bfbda02b9e).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100042454
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100042457
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32157/
Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100042338
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100042341
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32149/
Test PASSed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100042324
  
[Test build #32149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32149/consoleFull) for PR 5876 at commit [`5f3945e`](https://github.com/apache/spark/commit/5f3945eb3e0f8b12ba58e8d2f5715525b182fd53).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class StringArrayParam(parent: Params, name: String, doc: String, isValid: Array[String] => Boolean)`
  * `class VectorAssembler(JavaTransformer, HasInputCols, HasOutputCol):`
  * `class HasInputCols(Params):`
  * `class RegressionMetrics(JavaModelWrapper):`
  * `case class CreateTableAsSelect(`




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100041843
  
[Test build #32157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32157/consoleFull) for PR 5876 at commit [`81bb366`](https://github.com/apache/spark/commit/81bb36628f647433a9bd2203cb4f30bfbda02b9e).



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100041690
  
Thanks for the comments, @vanzin. I think I have addressed everything except
the ability to configure Hive through `spark.hadoop.*`.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100041602
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100041616
  
Merged build started.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29903763
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
     getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+    getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and hadoop.
+   *  - builtin - attempt to discover the jars that were used to load Spark SQL and use those. This
+   *  option is only valid when using the execution version of Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+    getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
+    val temp = Utils.createTempDir()
+    temp.delete()
+    temp
+  }
+
+  @transient
+  protected[hive] lazy val executionConf = new HiveConf()
+  executionConf.set(
+    "javax.jdo.option.ConnectionURL", s"jdbc:derby:;databaseName=$localMetastore;create=true")
+
+  /** The version of hive used internally by Spark SQL. */
+  lazy val hiveExecutionVersion: String = "0.13.1"
+
+  /**
+   * The copy of the hive client that is used for execution.  Currently this must always be
+   * Hive 13 as this is the version of Hive that is packaged with Spark SQL.  This copy of the
+   * client is used for execution related tasks like registering temporary functions or ensuring
+   * that the ThreadLocal SessionState is correctly populated.  This copy of Hive is *not* used
+   * for storing peristent metadata, and only point to a dummy metastore in a temporary directory.
+   */
+  @transient
+  protected[hive] lazy val executionHive: ClientWrapper = {
+    logInfo(s"Initilizing execution hive, version $hiveExecutionVersion")
+    new ClientWrapper(
+      version = IsolatedClientLoader.hiveVersion(hiveExecutionVersion),
+      config = Map(
+        "javax.jdo.option.ConnectionURL" ->
+        s"jdbc:derby:;databaseName=$localMetastore;create=true"))
+  }
+  SessionState.setCurrentSessionState(executionHive.state)
+
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore.
+   * The version of the Hive client that is used here must match the metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+    val metaVersion = IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+    // We instantiate a HiveConf here to read in the hive-site.xml file and then pass the options
+    // into the isolated client loader
+    val metadataConf = new HiveConf()
--- End diff --

Okay, that's reasonable. I'd like to defer that to a follow-up, though, as
we've never allowed overriding the Hive conf like this in the past.
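
For readers following the quoted diff, the three accepted forms of the metastore jars setting, and the rule that `builtin` is only valid when the metastore version equals the execution version, can be sketched roughly as below. This is a minimal illustration, not the PR's implementation: `MetastoreJarsSketch`, `JarSource`, `parse`, and `validate` are hypothetical names introduced here.

```scala
// Hypothetical sketch; none of these names come from the PR itself.
object MetastoreJarsSketch {
  sealed trait JarSource
  case object Builtin extends JarSource
  case object Maven extends JarSource
  final case class Classpath(entries: List[String]) extends JarSource

  /** Interpret the setting: "builtin", "maven", or a colon-separated path list. */
  def parse(setting: String): JarSource = setting.trim match {
    case "builtin" => Builtin
    case "maven"   => Maven
    case paths     => Classpath(paths.split(":").filter(_.nonEmpty).toList)
  }

  /** Builtin jars are only usable when metastore and execution versions agree. */
  def validate(
      source: JarSource,
      executionVersion: String,
      metastoreVersion: String): Either[String, JarSource] =
    source match {
      case Builtin if executionVersion != metastoreVersion =>
        Left("Builtin jars require matching versions. " +
          s"Execution: $executionVersion != Metastore: $metastoreVersion")
      case ok => Right(ok)
    }
}
```

Returning an `Either` instead of throwing keeps the sketch easy to test; the quoted diff throws an `IllegalArgumentException` in the mismatch case.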



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29901320
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore.
+   * The version of the Hive client that is used here must match the metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+    val metaVersion = IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+    // We instantiate a HiveConf here to read in the hive-site.xml file and then pass the options
+    // into the isolated client loader
+    val metadataConf = new HiveConf()
--- End diff --

It's more for consistency than anything else. It also allows you to easily 
override things in the `spark-submit` command line or in `spark-defaults.conf` 
if you need some Spark-specific configuration for Hive and don't want to change 
the global Hive configuration.
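
To make the suggestion concrete, here is a minimal sketch of how `spark.hadoop.*` entries could be layered over values read from `hive-site.xml`. It is an assumption-laden illustration only: `HadoopOverridesSketch`, `extract`, and `effective` are hypothetical names, plain `Map`s stand in for `SparkConf` and `HiveConf`, and the PR under review does not implement this.

```scala
// Hypothetical helper, not part of the PR: plain Maps stand in for
// SparkConf and HiveConf so the precedence rule is easy to see.
object HadoopOverridesSketch {
  private val Prefix = "spark.hadoop."

  /** Keep only `spark.hadoop.*` entries and strip the prefix from their keys. */
  def extract(sparkProps: Map[String, String]): Map[String, String] =
    sparkProps.collect {
      case (key, value) if key.startsWith(Prefix) => key.stripPrefix(Prefix) -> value
    }

  /** The right-hand operand of `++` wins, so Spark-side overrides take precedence. */
  def effective(
      hiveSite: Map[String, String],
      sparkProps: Map[String, String]): Map[String, String] =
    hiveSite ++ extract(sparkProps)
}
```

With this shape, a value passed as `--conf spark.hadoop.hive.metastore.uris=...` on the `spark-submit` command line would replace the global setting for that one application while leaving `hive-site.xml` untouched.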



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29900905
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from the Hive MetaStore.
+   * The version of the Hive client that is used here must match the metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+    val metaVersion = IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+    // We instantiate a HiveConf here to read in the hive-site.xml file and then pass the options
+    // into the isolated client loader
+    val metadataConf = new HiveConf()
--- End diff --

I'm less sure about this one. You can already override config the standard
way using `hive-site.xml`. What is the advantage of doing it this way? If we were
to do this, would we want it to be `spark.hive.*`? I'm not really opposed; I'd just
like to avoid having too many paths for configuration.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29900315
  
--- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ---
@@ -74,7 +74,16 @@ private[hive] object SparkSQLCLIDriver {
       System.exit(1)
     }
 
-    val sessionState = new CliSessionState(new HiveConf(classOf[SessionState]))
+    val localMetastore = {
+      val temp = Utils.createTempDir()
+      temp.delete()
--- End diff --

Ah yes, that is much nicer :)



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29900100
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from 
the Hive MetaStore.
+   * The version of the Hive client that is used here must match the 
metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+val metaVersion = 
IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+// We instantiate a HiveConf here to read in the hive-site.xml file 
and then pass the options
+// into the isolated client loader
+val metadataConf = new HiveConf()
+// `configure` goes second to override other settings.
+val allConfig = metadataConf.iterator.map(e => e.getKey -> 
e.getValue).toMap ++ configure
+
+val isolatedLoader = if (hiveMetastoreJars == "builtin") {
+  if (hiveExecutionVersion != hiveMetastoreVersion) {
+throw new IllegalArgumentException(
+  "Builtin jars can only be used when hive execution version == 
hive metastore version. " +
+  s"Execution: ${hiveExecutionVersion} != Metastore: 
${hiveMetastoreVersion}. " +
+  "Specify a vaild path to the correct hive jars using 
$HIVE_METASTORE_JARS " +
+  s"or change $HIVE_METASTORE_VERSION to $hiveExecutionVersion.")
+  }
+  val jars = getClass.getClassLoader match {
+case urlClassLoader: java.net.URLClassLoader => 
urlClassLoader.getURLs
+case other =>
+  throw new IllegalArgumentException(
+"Unable to locate hive jars to connect to metastore " +
+s"using classloader ${other.getClass.getName}. " +
+"Please set spark.sql.hive.metastore.jars")
+  }
+
+  logInfo(
+s"Initializing HiveMetastoreConnection v

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29899875
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from 
the Hive MetaStore.
+   * The version of the Hive client that is used here must match the 
metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+val metaVersion = 
IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+// We instantiate a HiveConf here to read in the hive-site.xml file 
and then pass the options
+// into the isolated client loader
+val metadataConf = new HiveConf()
+// `configure` goes second to override other settings.
+val allConfig = metadataConf.iterator.map(e => e.getKey -> 
e.getValue).toMap ++ configure
+
+val isolatedLoader = if (hiveMetastoreJars == "builtin") {
+  if (hiveExecutionVersion != hiveMetastoreVersion) {
+throw new IllegalArgumentException(
+  "Builtin jars can only be used when hive execution version == 
hive metastore version. " +
+  s"Execution: ${hiveExecutionVersion} != Metastore: 
${hiveMetastoreVersion}. " +
+  "Specify a vaild path to the correct hive jars using 
$HIVE_METASTORE_JARS " +
+  s"or change $HIVE_METASTORE_VERSION to $hiveExecutionVersion.")
+  }
+  val jars = getClass.getClassLoader match {
+case urlClassLoader: java.net.URLClassLoader => 
urlClassLoader.getURLs
+case other =>
+  throw new IllegalArgumentException(
+"Unable to locate hive jars to connect to metastore " +
+s"using classloader ${other.getClass.getName}. " +
+"Please set spark.sql.hive.metastore.jars")
+  }
+
+  logInfo(
+s"Initializing HiveMetastoreConnection

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29899815
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from 
the Hive MetaStore.
+   * The version of the Hive client that is used here must match the 
metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+val metaVersion = 
IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+// We instantiate a HiveConf here to read in the hive-site.xml file 
and then pass the options
+// into the isolated client loader
+val metadataConf = new HiveConf()
+// `configure` goes second to override other settings.
+val allConfig = metadataConf.iterator.map(e => e.getKey -> 
e.getValue).toMap ++ configure
+
+val isolatedLoader = if (hiveMetastoreJars == "builtin") {
+  if (hiveExecutionVersion != hiveMetastoreVersion) {
+throw new IllegalArgumentException(
+  "Builtin jars can only be used when hive execution version == 
hive metastore version. " +
+  s"Execution: ${hiveExecutionVersion} != Metastore: 
${hiveMetastoreVersion}. " +
+  "Specify a vaild path to the correct hive jars using 
$HIVE_METASTORE_JARS " +
+  s"or change $HIVE_METASTORE_VERSION to $hiveExecutionVersion.")
+  }
+  val jars = getClass.getClassLoader match {
+case urlClassLoader: java.net.URLClassLoader => 
urlClassLoader.getURLs
+case other =>
+  throw new IllegalArgumentException(
+"Unable to locate hive jars to connect to metastore " +
+s"using classloader ${other.getClass.getName}. " +
+"Please set spark.sql.hive.metastore.jars")
+  }
+
+  logInfo(
+s"Initializing HiveMetastoreConnection

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29899653
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /** The version of hive used internally by Spark SQL. */
+  lazy val hiveExecutionVersion: String = "0.13.1"
--- End diff --

It's lazy because it ends up getting invoked during the constructor before 
it can be initialized, but you make a good point.  I've moved it to the 
HiveContext singleton and replaced other usages.
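
For context, a minimal sketch of the initialization-order behaviour described above. The names (`EagerContext`, `LazyContext`, `HiveVersions`, `ConstContext`) are hypothetical and not part of the PR; only the `"0.13.1"` string comes from the diff.

```scala
// Hypothetical classes; they only illustrate Scala's initialization order.
class EagerContext {
  // Runs during construction, before `version` below has been assigned,
  // so it observes the field's default value (null).
  val banner: String = "version=" + version
  val version: String = "0.13.1"
}

class LazyContext {
  val banner: String = "version=" + version
  // Initialized on first access, so the constructor sees the real value.
  lazy val version: String = "0.13.1"
}

// A constant on a singleton object is initialized when the object is first
// used, independently of any instance's constructor.
object HiveVersions {
  val execution: String = "0.13.1"
}

class ConstContext {
  val banner: String = "version=" + HiveVersions.execution
}
```

The last variant also gives every other call site a single constant to reference instead of repeating the string.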



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100026910
  
I left a few comments, mostly around the configuration parts of the change. 
I'm not familiar enough with the rest to comment on it.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29898475
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -70,24 +71,22 @@ class TestHiveContext(sc: SparkContext) extends 
HiveContext(sc) {
   hiveconf.set("hive.plan.serialization.format", "javaXML")
 
   lazy val warehousePath = Utils.createTempDir()
-  lazy val metastorePath = Utils.createTempDir()
+  lazy val metastorePath = {
+val temp = Utils.createTempDir()
+temp.delete()
+temp
+  }
 
   /** Sets up the system initially or after a RESET command */
-  protected def configure(): Unit = {
-warehousePath.delete()
-metastorePath.delete()
-setConf("javax.jdo.option.ConnectionURL",
-  s"jdbc:derby:;databaseName=$metastorePath;create=true")
-setConf("hive.metastore.warehouse.dir", warehousePath.toString)
-  }
+  protected override def configure(): Map[String, String] = Map(
+  "javax.jdo.option.ConnectionURL" -> s"jdbc:derby:;databaseName=$metastorePath;create=true",
--- End diff --

4th instance... but who's counting. :-)



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29898324
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
 ---
@@ -39,17 +39,34 @@ import org.apache.spark.sql.hive.MetastoreRelation
  */
 private[hive]
 case class CreateTableAsSelect(
-database: String,
-tableName: String,
+tableDesc: HiveTable,
 query: LogicalPlan,
-allowExisting: Boolean,
-desc: Option[CreateTableDesc]) extends RunnableCommand {
+allowExisting: Boolean)
+  extends RunnableCommand {
+
+  def database: String = tableDesc.database
+  def tableName: String = tableDesc.name
 
   override def run(sqlContext: SQLContext): Seq[Row] = {
 val hiveContext = sqlContext.asInstanceOf[HiveContext]
 lazy val metastoreRelation: MetastoreRelation = {
-  // Create Hive Table
-  hiveContext.catalog.createTable(database, tableName, query.output, allowExisting, desc)
+  import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+  import org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
+  import org.apache.hadoop.io.Text
+  import org.apache.hadoop.mapred.TextInputFormat
+
+  val withSchema =
+tableDesc.copy(
+  schema =
+query.output.map(c =>
+  HiveColumn(c.name, HiveMetastoreTypes.toMetastoreType(c.dataType), null)),
+inputFormat =
--- End diff --

indentation looks off here; shouldn't this be aligned with `schema =`?



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29897419
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
+val temp = Utils.createTempDir()
+temp.delete()
+temp
+  }
+
+  @transient
+  protected[hive] lazy val executionConf = new HiveConf()
+  executionConf.set(
+"javax.jdo.option.ConnectionURL", 
s"jdbc:derby:;databaseName=$localMetastore;create=true")
+
+  /** The version of hive used internally by Spark SQL. */
+  lazy val hiveExecutionVersion: String = "0.13.1"
+
+  /**
+   * The copy of the hive client that is used for execution.  Currently 
this must always be
+   * Hive 13 as this is the version of Hive that is packaged with Spark 
SQL.  This copy of the
+   * client is used for execution related tasks like registering temporary 
functions or ensuring
+   * that the ThreadLocal SessionState is correctly populated.  This copy 
of Hive is *not* used
+   * for storing peristent metadata, and only point to a dummy metastore 
in a temporary directory.
+   */
+  @transient
+  protected[hive] lazy val executionHive: ClientWrapper = {
+logInfo(s"Initilizing execution hive, version $hiveExecutionVersion")
+new ClientWrapper(
+  version = IsolatedClientLoader.hiveVersion(hiveExecutionVersion),
+  config = Map(
+"javax.jdo.option.ConnectionURL" ->
+s"jdbc:derby:;databaseName=$localMetastore;create=true"))
+  }
+  SessionState.setCurrentSessionState(executionHive.state)
+
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from 
the Hive MetaStore.
+   * The version of the Hive client that is used here must match the 
metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+val metaVersion = 
IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+// We instantiate a HiveConf here to read in the hive-site.xml file 
and then pass the options
+// into the isolated client loader
+val metadataConf = new HiveConf()
+// `configure` goes second to override other settings.
+val allConfig = metadataConf.iterator.map(e => e.getKey -> 
e.getValue).toMap ++ configure
+
+val isolatedLoader = if (hiveMetastoreJars == "builtin") {
+  if (hiveExecutionVersion != hiveMetastoreVersion) {
+throw new IllegalArgumentException(
+  "Builtin jars can only be used when hive execution version == 
hive metastore version. " +
+  s"Execution: ${hiveExecutionVersion} != Metastore: 
${hiveMetastoreVersion}. " +
+  "Specify a vaild path to the correct hive jars using 
$HIVE_METASTORE_JARS " +
+  s"or change $HIVE_METASTORE_VERSION to $hiveExecutionVersion.")
+  }
+  val jars = getClass.getClassLoader match {
+case urlClassLoader: java.net.URLClassLoader => 
urlClassLoader.getURLs
+case other =>
+  throw new IllegalArgumentException(
+"Unable to locate hive jars to connect to metastore " +
+s"using classloader ${other.getClass.getName}. " +
+"Please set spark.sql.hive.metastore.jars")
+  }
+
+  logInfo(
+s"Initializing HiveMetastoreConnection v

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29897294
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
+val temp = Utils.createTempDir()
+temp.delete()
+temp
+  }
+
+  @transient
+  protected[hive] lazy val executionConf = new HiveConf()
+  executionConf.set(
+"javax.jdo.option.ConnectionURL", 
s"jdbc:derby:;databaseName=$localMetastore;create=true")
+
+  /** The version of hive used internally by Spark SQL. */
+  lazy val hiveExecutionVersion: String = "0.13.1"
+
+  /**
+   * The copy of the hive client that is used for execution.  Currently 
this must always be
+   * Hive 13 as this is the version of Hive that is packaged with Spark 
SQL.  This copy of the
+   * client is used for execution related tasks like registering temporary 
functions or ensuring
+   * that the ThreadLocal SessionState is correctly populated.  This copy 
of Hive is *not* used
+   * for storing peristent metadata, and only point to a dummy metastore 
in a temporary directory.
+   */
+  @transient
+  protected[hive] lazy val executionHive: ClientWrapper = {
+logInfo(s"Initilizing execution hive, version $hiveExecutionVersion")
+new ClientWrapper(
+  version = IsolatedClientLoader.hiveVersion(hiveExecutionVersion),
+  config = Map(
+"javax.jdo.option.ConnectionURL" ->
+s"jdbc:derby:;databaseName=$localMetastore;create=true"))
+  }
+  SessionState.setCurrentSessionState(executionHive.state)
+
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from 
the Hive MetaStore.
+   * The version of the Hive client that is used here must match the 
metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+val metaVersion = 
IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+// We instantiate a HiveConf here to read in the hive-site.xml file 
and then pass the options
+// into the isolated client loader
+val metadataConf = new HiveConf()
+// `configure` goes second to override other settings.
+val allConfig = metadataConf.iterator.map(e => e.getKey -> 
e.getValue).toMap ++ configure
+
+val isolatedLoader = if (hiveMetastoreJars == "builtin") {
+  if (hiveExecutionVersion != hiveMetastoreVersion) {
+throw new IllegalArgumentException(
+  "Builtin jars can only be used when hive execution version == 
hive metastore version. " +
+  s"Execution: ${hiveExecutionVersion} != Metastore: 
${hiveMetastoreVersion}. " +
+  "Specify a vaild path to the correct hive jars using 
$HIVE_METASTORE_JARS " +
+  s"or change $HIVE_METASTORE_VERSION to $hiveExecutionVersion.")
+  }
+  val jars = getClass.getClassLoader match {
+case urlClassLoader: java.net.URLClassLoader => 
urlClassLoader.getURLs
+case other =>
+  throw new IllegalArgumentException(
+"Unable to locate hive jars to connect to metastore " +
+s"using classloader ${other.getClass.getName}. " +
+"Please set spark.sql.hive.metastore.jars")
+  }
+
+  logInfo(
+s"Initializing HiveMetastoreConnection v

[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29897174
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
+val temp = Utils.createTempDir()
+temp.delete()
+temp
+  }
+
+  @transient
+  protected[hive] lazy val executionConf = new HiveConf()
+  executionConf.set(
+"javax.jdo.option.ConnectionURL", 
s"jdbc:derby:;databaseName=$localMetastore;create=true")
+
+  /** The version of hive used internally by Spark SQL. */
+  lazy val hiveExecutionVersion: String = "0.13.1"
+
+  /**
+   * The copy of the hive client that is used for execution.  Currently 
this must always be
+   * Hive 13 as this is the version of Hive that is packaged with Spark 
SQL.  This copy of the
+   * client is used for execution related tasks like registering temporary 
functions or ensuring
+   * that the ThreadLocal SessionState is correctly populated.  This copy 
of Hive is *not* used
+   * for storing peristent metadata, and only point to a dummy metastore 
in a temporary directory.
+   */
+  @transient
+  protected[hive] lazy val executionHive: ClientWrapper = {
+logInfo(s"Initilizing execution hive, version $hiveExecutionVersion")
+new ClientWrapper(
+  version = IsolatedClientLoader.hiveVersion(hiveExecutionVersion),
+  config = Map(
+"javax.jdo.option.ConnectionURL" ->
+s"jdbc:derby:;databaseName=$localMetastore;create=true"))
+  }
+  SessionState.setCurrentSessionState(executionHive.state)
+
+  /**
+   * The copy of the Hive client that is used to retrieve metadata from 
the Hive MetaStore.
+   * The version of the Hive client that is used here must match the 
metastore that is configured
+   * in the hive-site.xml file.
+   */
+  @transient
+  protected[hive] lazy val metadataHive: ClientInterface = {
+val metaVersion = 
IsolatedClientLoader.hiveVersion(hiveMetastoreVersion)
+
+// We instantiate a HiveConf here to read in the hive-site.xml file 
and then pass the options
+// into the isolated client loader
+val metadataConf = new HiveConf()
--- End diff --

I think this should be:

    new HiveConf(SparkHadoopUtil.get.newConfiguration(sparkConf), classOf[HiveConf])

That would allow users to override configs using `spark.hadoop.*`.
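
As a rough illustration of the `spark.hadoop.*` override mentioned above: Spark copies every `spark.hadoop.foo` entry from the Spark configuration into the Hadoop `Configuration` as `foo`. The sketch below imitates that with plain maps so it runs without Spark or Hive on the classpath. `hadoopOverrides` and `effectiveConf` are hypothetical helpers, not Spark API.

```scala
object HadoopOverrides {
  private val Prefix = "spark.hadoop."

  /** Keeps only the `spark.hadoop.*` entries and strips the prefix. */
  def hadoopOverrides(sparkConf: Map[String, String]): Map[String, String] =
    sparkConf.collect {
      case (key, value) if key.startsWith(Prefix) => key.stripPrefix(Prefix) -> value
    }

  /** hive-site.xml settings go first; user overrides go second so they win. */
  def effectiveConf(
      hiveSite: Map[String, String],
      sparkConf: Map[String, String]): Map[String, String] =
    hiveSite ++ hadoopOverrides(sparkConf)
}
```

The `++` ordering follows the same idea as the `// configure goes second to override other settings` line in the diff: whichever map comes last wins on key collisions.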



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29896863
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
+val temp = Utils.createTempDir()
+temp.delete()
+temp
+  }
+
+  @transient
+  protected[hive] lazy val executionConf = new HiveConf()
+  executionConf.set(
+"javax.jdo.option.ConnectionURL", 
s"jdbc:derby:;databaseName=$localMetastore;create=true")
+
+  /** The version of hive used internally by Spark SQL. */
+  lazy val hiveExecutionVersion: String = "0.13.1"
+
+  /**
+   * The copy of the hive client that is used for execution.  Currently 
this must always be
+   * Hive 13 as this is the version of Hive that is packaged with Spark 
SQL.  This copy of the
+   * client is used for execution related tasks like registering temporary 
functions or ensuring
+   * that the ThreadLocal SessionState is correctly populated.  This copy 
of Hive is *not* used
+   * for storing peristent metadata, and only point to a dummy metastore 
in a temporary directory.
+   */
+  @transient
+  protected[hive] lazy val executionHive: ClientWrapper = {
+logInfo(s"Initilizing execution hive, version $hiveExecutionVersion")
+new ClientWrapper(
+  version = IsolatedClientLoader.hiveVersion(hiveExecutionVersion),
+  config = Map(
+"javax.jdo.option.ConnectionURL" ->
+s"jdbc:derby:;databaseName=$localMetastore;create=true"))
--- End diff --

Similar code again... maybe have this map as a `lazy val` somewhere, 
and have the temp dir be created there too? Then reference that `val` from the 
(so far) 3 call sites.
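
One possible shape for that suggestion, sketched with hypothetical names (`LocalMetastore`, `dir`, `config`). It uses only the JDK so that it runs standalone, whereas the PR itself calls `Utils.createTempDir()`.

```scala
import java.io.File
import java.nio.file.Files

object LocalMetastore {
  /** Computed once, on first access. The directory itself is left for Derby to create. */
  lazy val dir: File =
    new File(Files.createTempDirectory("spark-hive-").toFile, "metastore")

  /** Single definition of the connection settings, shared by every call site. */
  lazy val config: Map[String, String] = Map(
    "javax.jdo.option.ConnectionURL" -> s"jdbc:derby:;databaseName=$dir;create=true")
}
```

Each call site would then read `LocalMetastore.config` instead of rebuilding the JDBC URL, so the temp directory and the URL format are defined in exactly one place.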



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29896733
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
 ---
@@ -51,6 +52,12 @@ private[hive] object SparkSQLEnv extends Logging {
   sparkContext.addSparkListener(new StatsReportListener())
   hiveContext = new HiveContext(sparkContext)
 
+  hiveContext.metadataHive.setOut(new PrintStream(System.out, true, "UTF-8"))
--- End diff --

See 
https://github.com/apache/hive/blob/release-0.13.1/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L650-653



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29896662
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
+val temp = Utils.createTempDir()
+temp.delete()
+temp
+  }
+
+  @transient
+  protected[hive] lazy val executionConf = new HiveConf()
+  executionConf.set(
+"javax.jdo.option.ConnectionURL", 
s"jdbc:derby:;databaseName=$localMetastore;create=true")
+
+  /** The version of hive used internally by Spark SQL. */
+  lazy val hiveExecutionVersion: String = "0.13.1"
--- End diff --

Why lazy? I'm also seeing this string in a bunch of places; it might be better 
to have a constant somewhere.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29896544
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
--- End diff --

Shouldn't this be just a classpath? Paths on Win32 are separated by 
semicolons, and colons might behave differently there (aside from not being 
"natural" for Win32 users).

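A small sketch of why the separator matters, using the JDK's `File.pathSeparator` (`;` on Windows, `:` elsewhere). `ClasspathSplit` is a hypothetical helper and the jar paths in the note below are made up.

```scala
import java.io.File
import java.util.regex.Pattern

object ClasspathSplit {
  /** Splits a classpath string on `separator`, dropping empty entries. */
  def split(classpath: String, separator: String = File.pathSeparator): List[String] =
    classpath.split(Pattern.quote(separator)).filter(_.nonEmpty).toList
}
```

Splitting `C:\hive\lib\a.jar;C:\hadoop\b.jar` on `;` yields the two jar paths, while splitting `C:\hive\lib\a.jar` on `:` yields `C` and `\hive\lib\a.jar`, because the drive letter is cut off from the rest of the path.
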


[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29896591
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
--- End diff --

Is there any way you can share this code with the CLI driver?



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29896241
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala
 ---
@@ -51,6 +52,12 @@ private[hive] object SparkSQLEnv extends Logging {
   sparkContext.addSparkListener(new StatsReportListener())
   hiveContext = new HiveContext(sparkContext)
 
+  hiveContext.metadataHive.setOut(new PrintStream(System.out, true, "UTF-8"))
--- End diff --

nit: is UTF-8 the right thing here? It's not the default encoding on all 
platforms. Maybe omit that argument (which probably means using the platform 
default, even though the javadocs don't explicitly mention that)?
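
To make the trade-off concrete, a small sketch of the two `java.io.PrintStream` constructors in question. `encode` is a hypothetical helper; it writes into a byte buffer instead of `System.out` so that the resulting bytes can be inspected.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object PrintStreamEncoding {
  /** Prints `text` through a PrintStream and returns the bytes it produced. */
  def encode(text: String, charsetName: Option[String]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = charsetName match {
      case Some(name) => new PrintStream(buffer, true, name) // explicit encoding
      case None       => new PrintStream(buffer, true)       // no encoding argument
    }
    out.print(text)
    out.flush()
    buffer.toByteArray
  }
}
```

With an explicit name the output is the same on every platform: the single character `\u00e9` becomes two bytes under `UTF-8` and one byte under `ISO-8859-1`. With `None` the byte count depends on whichever encoding the JVM falls back to, which is the uncertainty raised in the comment above.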



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29895935
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 ---
@@ -91,10 +100,14 @@ private[hive] object SparkSQLCLIDriver {
 
 // Set all properties specified via command line.
 val conf: HiveConf = sessionState.getConf
-sessionState.cmdProperties.entrySet().foreach { item: 
java.util.Map.Entry[Object, Object] =>
-  conf.set(item.getKey.asInstanceOf[String], 
item.getValue.asInstanceOf[String])
-  sessionState.getOverriddenConfigurations.put(
-item.getKey.asInstanceOf[String], 
item.getValue.asInstanceOf[String])
+sessionState.cmdProperties.entrySet().foreach { item =>
+  val key = item.getKey.asInstanceOf[String]
+  val value = item.getValue.asInstanceOf[String]
+  // We do not propogate metastore options to the execution copy of 
hive.
--- End diff --

nit: propagate



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29895833
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 ---
@@ -74,7 +74,16 @@ private[hive] object SparkSQLCLIDriver {
   System.exit(1)
 }
 
-val sessionState = new CliSessionState(new 
HiveConf(classOf[SessionState]))
+val localMetastore = {
+  val temp = Utils.createTempDir()
+  temp.delete()
--- End diff --

Instead of this I'd use "$temp/something" as the metastore directory, to 
avoid the "temp" directory being created by another process before Hive has a 
chance to do it.
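
A rough sketch of that suggestion, under the assumption that a plain 
`java.nio.file.Files.createTempDirectory` stands in for Spark's 
`Utils.createTempDir()` so the snippet is self-contained. The object name, the 
prefix, and the child name `metastore` are all hypothetical.

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical helper for illustration only; not code from the pull request.
object LocalMetastorePathSketch {
  /** Returns a path that does not exist yet, inside a freshly created temp directory. */
  def freshMetastorePath(): File = {
    // The parent directory is created and kept, so its name stays reserved.
    val parent = Files.createTempDirectory("spark-metastore-sketch").toFile
    // The child is never created here; Hive (or Derby) would create it later.
    new File(parent, "metastore")
  }
}
```

Because the parent directory stays in place, there is no window between 
`delete()` and Hive's own creation step in which another process could claim 
the same name.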



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29895762
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 ---
@@ -74,7 +74,16 @@ private[hive] object SparkSQLCLIDriver {
   System.exit(1)
 }
 
-val sessionState = new CliSessionState(new 
HiveConf(classOf[SessionState]))
+val localMetastore = {
+  val temp = Utils.createTempDir()
+  temp.delete()
+  temp
+}
+val cliConf = new HiveConf(classOf[SessionState])
+// Override the location of the metastore since this is only used for 
local execution.
--- End diff --

So, if I understand this correctly, you're creating just a "dummy" local 
metastore that won't actually be used; this is just to keep the Hive libraries 
happy, right?



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100010571
  
  [Test build #32149 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32149/consoleFull)
 for   PR 5876 at commit 
[`5f3945e`](https://github.com/apache/spark/commit/5f3945eb3e0f8b12ba58e8d2f5715525b182fd53).



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100010299
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-100010327
  
Merged build started.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29885759
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -1080,6 +1149,15 @@ 
https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C
   }
 
   protected val escapedIdentifier = "`([^`]+)`".r
+  protected val singleQuotedString = "\"([^\"]+)\"".r
+  protected val doubleQuotedString = "'([^']+)'".r
--- End diff --

typo?
```
protected val doubleQuotedString = "\"([^\"]+)\"".r
protected val singleQuotedString = "'([^']+)'".r
```
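
For what it's worth, a small sketch of how patterns with the corrected names 
could be used as extractors to strip the surrounding quotes. `unquote` and the 
enclosing object are hypothetical and are not taken from `HiveQl.scala`.

```scala
// Hypothetical helper for illustration only; not code from the pull request.
object QuotedStringPatternsSketch {
  val doubleQuotedString = "\"([^\"]+)\"".r
  val singleQuotedString = "'([^']+)'".r

  /** Strips one level of matching quotes; other input is returned unchanged. */
  def unquote(s: String): String = s match {
    // A Regex used as an extractor must match the whole string.
    case doubleQuotedString(inner) => inner
    case singleQuotedString(inner) => inner
    case other => other
  }
}
```

With the names swapped as in the diff, each pattern would still match, but the 
identifiers would describe the opposite quote style, which is easy to misread 
later.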



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29883765
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -144,39 +147,43 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
   options: Map[String, String],
   isExternal: Boolean): Unit = {
 val (dbName, tblName) = processDatabaseAndTableName("default", 
tableName)
-val tbl = new Table(dbName, tblName)
-
-tbl.setProperty("spark.sql.sources.provider", provider)
+val tableProperties = new scala.collection.mutable.HashMap[String, 
String]
+tableProperties.put("spark.sql.sources.provider", provider)
 if (userSpecifiedSchema.isDefined) {
   val threshold = hive.conf.schemaStringLengthThreshold
   val schemaJsonString = userSpecifiedSchema.get.json
   // Split the JSON string.
   val parts = schemaJsonString.grouped(threshold).toSeq
-  tbl.setProperty("spark.sql.sources.schema.numParts", 
parts.size.toString)
+  tableProperties.put("spark.sql.sources.schema.numParts", 
parts.size.toString)
   parts.zipWithIndex.foreach { case (part, index) =>
-tbl.setProperty(s"spark.sql.sources.schema.part.${index}", part)
+tableProperties.put(s"spark.sql.sources.schema.part.${index}", 
part)
   }
 }
-options.foreach { case (key, value) => tbl.setSerdeParam(key, value) }
 
-if (isExternal) {
-  tbl.setProperty("EXTERNAL", "TRUE")
-  tbl.setTableType(TableType.EXTERNAL_TABLE)
+val tableType = if (isExternal) {
+  tableProperties.put("EXTERNAL", "TRUE")
+  ExternalTable
 } else {
-  tbl.setProperty("EXTERNAL", "FALSE")
-  tbl.setTableType(TableType.MANAGED_TABLE)
-}
-
-// create the table
-synchronized {
-  client.createTable(tbl, false)
-}
+  tableProperties.put("EXTERNAL", "FALSE")
+  ManagedTable
+}
+
+client.createTable(
+  HiveTable(
+specifiedDatabase = Option(dbName),
+name = tblName,
+schema = Seq.empty,
+partitionColumns = Seq.empty,
+tableType = tableType,
+properties = tableProperties.toMap,
+serdeProperties = options))
   }
 
   def hiveDefaultTableFilePath(tableName: String): String = synchronized {
-val currentDatabase = 
client.getDatabase(hive.sessionState.getCurrentDatabase)
-
-hiveWarehouse.getTablePath(currentDatabase, tableName).toString
+// Code based on: hiveWarehouse.getTablePath(currentDatabase, 
tableName)
+new Path(
+  new Path(client.getDatabase(client.currentDatabase).location),
+  tableName.toLowerCase).toString
   }
 
   def tableExists(tableIdentifier: Seq[String]): Boolean = synchronized {
--- End diff --

Should we remove this explicit `synchronized`? There are a few places where 
we still have it.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29883543
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
+val temp = Utils.createTempDir()
+temp.delete()
+temp
+  }
+
+  @transient
+  protected[hive] lazy val executionConf = new HiveConf()
+  executionConf.set(
+"javax.jdo.option.ConnectionURL", 
s"jdbc:derby:;databaseName=$localMetastore;create=true")
--- End diff --

Correct, but some operations do instantiate the client, so we need to point 
it at a dummy database; otherwise, in many cases Derby will complain that more 
than one process is trying to open the database.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99973577
  
Seems we can remove the lock at 
https://github.com/marmbrus/spark/blob/useIsolatedClient/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L73.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/5876#discussion_r29871007
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -93,9 +99,129 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
+  /**
+   * The version of the hive client that will be used to communicate with 
the metastore.  Note that
+   * this does not necessarily need to be the same version of Hive that is 
used internally by
+   * Spark SQL for execution.
+   */
+  protected[hive] def hiveMetastoreVersion: String =
+getConf(HIVE_METASTORE_VERSION, "0.13.1")
+
+  /**
+   * The location of the jars that should be used to instantiate the 
HiveMetastoreClient.  This
+   * property can be one of three options:
+   *  - a colon-separated list of jar files or directories for hive and 
hadoop.
+   *  - builtin - attempt to discover the jars that were used to load 
Spark SQL and use those. This
+   *  option is only valid when using the execution version of 
Hive.
+   *  - maven - download the correct version of hive on demand from maven.
+   */
+  protected[hive] def hiveMetastoreJars: String =
+getConf(HIVE_METASTORE_JARS, "builtin")
+
   @transient
   protected[sql] lazy val substitutor = new VariableSubstitution()
 
+
+  /** A local instance of hive that is only used for execution. */
+  protected[hive] lazy val localMetastore = {
+val temp = Utils.createTempDir()
+temp.delete()
+temp
+  }
+
+  @transient
+  protected[hive] lazy val executionConf = new HiveConf()
+  executionConf.set(
+"javax.jdo.option.ConnectionURL", 
s"jdbc:derby:;databaseName=$localMetastore;create=true")
--- End diff --

We will not contact this metastore; its only purpose is to keep the Hive 
used for execution happy, right?



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99681253
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99681248
  
  [Test build #32053 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32053/consoleFull)
 for   PR 5876 at commit 
[`f5de7de`](https://github.com/apache/spark/commit/f5de7deb2b5638c7f2b13b5e2d133ccad2c78f97).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99681255
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32053/
Test PASSed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99668469
  
@marmbrus The unit tests passed. I will update the code to support SerDe 
once this is merged.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99668213
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99668214
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32041/
Test PASSed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99668195
  
  [Test build #32041 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32041/consoleFull)
 for   PR 5876 at commit 
[`7e8f010`](https://github.com/apache/spark/commit/7e8f010b4c4de9c29d0b3fbd491f2349fd106944).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99650040
  
  [Test build #32053 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32053/consoleFull)
 for   PR 5876 at commit 
[`f5de7de`](https://github.com/apache/spark/commit/f5de7deb2b5638c7f2b13b5e2d133ccad2c78f97).



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99649339
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99649466
  
Merged build started.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99643324
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32049/
Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99643320
  
  [Test build #32049 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32049/consoleFull)
 for   PR 5876 at commit 
[`11e9c72`](https://github.com/apache/spark/commit/11e9c72e2b100238ef732e030e8d3bc7ccf8da25).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99643323
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99643097
  
  [Test build #32049 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32049/consoleFull)
 for   PR 5876 at commit 
[`11e9c72`](https://github.com/apache/spark/commit/11e9c72e2b100238ef732e030e8d3bc7ccf8da25).



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99642855
  
Merged build started.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99642842
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99640561
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32030/
Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99640498
  
  [Test build #32030 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32030/consoleFull)
 for   PR 5876 at commit 
[`e7b3941`](https://github.com/apache/spark/commit/e7b3941669bfbf53ccb909f13d62eaf895b49d11).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`




[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99640557
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99635162
  
  [Test build #32041 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32041/consoleFull)
 for   PR 5876 at commit 
[`7e8f010`](https://github.com/apache/spark/commit/7e8f010b4c4de9c29d0b3fbd491f2349fd106944).



[GitHub] spark pull request: [SPARK-6908] [SQL] Use isolated Hive client

2015-05-06 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5876#issuecomment-99634922
  
Merged build started.


