[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100255729
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -213,6 +218,24 @@ class SparkSession private(
 new SparkSession(sparkContext, Some(sharedState))
   }
 
+  /**
+   * Start a new session, sharing the underlying `SparkContext` and cached 
data.
+   * If inheritSessionState is enabled, then SQL configurations, temporary 
tables,
+   * registered functions are copied over from parent `SparkSession`.
+   *
+   * @note Other than the `SparkContext`, all shared state is initialized 
lazily.
+   * This method will force the initialization of the shared state to 
ensure that parent
+   * and child sessions are set up with the same shared state. If the 
underlying catalog
+   * implementation is Hive, this will initialize the metastore, which may 
take some time.
+   */
--- End diff --

why not remove the boolean flag and just call this cloneSession?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100192972
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -38,16 +38,31 @@ import 
org.apache.spark.sql.util.ExecutionListenerManager
 
 /**
  * A class that holds all session-specific state in a given 
[[SparkSession]].
+ * If an `existingSessionState` is supplied, then its members will be 
copied over.
  */
-private[sql] class SessionState(sparkSession: SparkSession) {
+private[sql] class SessionState(
+sparkSession: SparkSession,
+existingSessionState: Option[SessionState]) {
--- End diff --

nit: `existingSessionState` -> `parentSessionState` to indicate we should 
copy its internal states.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100197297
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -213,6 +218,24 @@ class SparkSession private(
 new SparkSession(sparkContext, Some(sharedState))
--- End diff --

It's better to add `final` to avoid override this incorrectly. The method 
should be override is `newSession(inheritSessionState: Boolean)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100195293
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -213,6 +218,24 @@ class SparkSession private(
 new SparkSession(sparkContext, Some(sharedState))
   }
 
+  /**
+   * Start a new session, sharing the underlying `SparkContext` and cached 
data.
--- End diff --

nit: add `:: Experimental ::`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100190232
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala ---
@@ -46,4 +48,23 @@ class ExperimentalMethods private[sql]() {
 
   @volatile var extraOptimizations: Seq[Rule[LogicalPlan]] = Nil
 
+  /**
+   * Get an identical copy of this `ExperimentalMethods` instance.
+   * @note This is used when forking a `SparkSession`.
+   * `clone` is provided here instead of implementing equivalent 
functionality
+   * in `SparkSession.clone` since it needs to be updated
+   * as the class `ExperimentalMethods` is extended or  modified.
+   */
+  override def clone: ExperimentalMethods = {
+def cloneSeq[T](seq: Seq[T]): Seq[T] = {
+  val newSeq = new ListBuffer[T]
+  newSeq ++= seq
+  newSeq
+}
+
+val result = new ExperimentalMethods
+result.extraStrategies = cloneSeq(extraStrategies)
--- End diff --

You don't need to copy these two `Seq`s since they are not mutable `Seq`s.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100195235
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -213,6 +218,24 @@ class SparkSession private(
 new SparkSession(sparkContext, Some(sharedState))
   }
 
+  /**
+   * Start a new session, sharing the underlying `SparkContext` and cached 
data.
+   * If inheritSessionState is enabled, then SQL configurations, temporary 
tables,
+   * registered functions are copied over from parent `SparkSession`.
+   *
+   * @note Other than the `SparkContext`, all shared state is initialized 
lazily.
+   * This method will force the initialization of the shared state to 
ensure that parent
+   * and child sessions are set up with the same shared state. If the 
underlying catalog
+   * implementation is Hive, this will initialize the metastore, which may 
take some time.
+   */
--- End diff --

nit: please add `@Experimental` and `@InterfaceStability.Evolving`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100193350
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -38,16 +38,31 @@ import 
org.apache.spark.sql.util.ExecutionListenerManager
 
 /**
  * A class that holds all session-specific state in a given 
[[SparkSession]].
+ * If an `existingSessionState` is supplied, then its members will be 
copied over.
  */
-private[sql] class SessionState(sparkSession: SparkSession) {
+private[sql] class SessionState(
+sparkSession: SparkSession,
+existingSessionState: Option[SessionState]) {
+
+  private[sql] def this(sparkSession: SparkSession) = {
+this(sparkSession, None)
+  }
 
   // Note: These are all lazy vals because they depend on each other (e.g. 
conf) and we
   // want subclasses to override some of the fields. Otherwise, we would 
get a lot of NPEs.
 
   /**
* SQL-specific key-value configurations.
*/
-  lazy val conf: SQLConf = new SQLConf
+  lazy val conf: SQLConf = {
+val result = new SQLConf
+if (existingSessionState.nonEmpty) {
--- End diff --

nit:
```scala
existingSessionState.foreach(_.conf.getAllConfs.foreach {
  case (k, v) => if (v ne null) result.setConfString(k, v)
})
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100195109
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -213,6 +218,24 @@ class SparkSession private(
 new SparkSession(sparkContext, Some(sharedState))
--- End diff --

nit: use `newSession(inheritSessionState = false)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100195324
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -213,6 +218,24 @@ class SparkSession private(
 new SparkSession(sparkContext, Some(sharedState))
   }
 
+  /**
+   * Start a new session, sharing the underlying `SparkContext` and cached 
data.
+   * If inheritSessionState is enabled, then SQL configurations, temporary 
tables,
+   * registered functions are copied over from parent `SparkSession`.
+   *
+   * @note Other than the `SparkContext`, all shared state is initialized 
lazily.
+   * This method will force the initialization of the shared state to 
ensure that parent
+   * and child sessions are set up with the same shared state. If the 
underlying catalog
+   * implementation is Hive, this will initialize the metastore, which may 
take some time.
--- End diff --

nit: add `@since 2.1.1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100194991
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -151,7 +155,7 @@ private[hive] class TestHiveSparkSession(
 new TestHiveSessionState(self)
 
   override def newSession(): TestHiveSparkSession = {
--- End diff --

You can change it to override `def newSession(inheritSessionState: 
Boolean)` instead


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100198503
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala ---
@@ -123,4 +125,70 @@ class SparkSessionBuilderSuite extends SparkFunSuite {
   session.stop()
 }
   }
+
+  test("fork new session and inherit a copy of the session state") {
+val activeSession = 
SparkSession.builder().master("local").getOrCreate()
+val forkedSession = activeSession.newSession(inheritSessionState = 
true)
+
+assert(forkedSession ne activeSession)
+assert(forkedSession.sessionState ne activeSession.sessionState)
+
+forkedSession.stop()
+activeSession.stop()
+  }
+
+  test("fork new session and inherit sql config options") {
+val activeSession = SparkSession
+  .builder()
+  .master("local")
+  .config("spark-configb", "b")
--- End diff --

This is in the shared state. You should use `SparkSession.conf.set` instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100194351
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -65,12 +80,29 @@ private[sql] class SessionState(sparkSession: 
SparkSession) {
 hadoopConf
   }
 
-  lazy val experimentalMethods = new ExperimentalMethods
+  lazy val experimentalMethods: ExperimentalMethods = {
+existingSessionState
+  .map(_.experimentalMethods.clone)
+  .getOrElse(new ExperimentalMethods)
+  }
 
   /**
* Internal catalog for managing functions registered by the user.
*/
-  lazy val functionRegistry: FunctionRegistry = 
FunctionRegistry.builtin.copy()
+  lazy val functionRegistry: FunctionRegistry = {
--- End diff --

It's better to just add a `copy` method to `FunctionRegistry` to simplify 
these codes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/16826#discussion_r100194640
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -115,16 +113,22 @@ class TestHiveContext(
 private[hive] class TestHiveSparkSession(
 @transient private val sc: SparkContext,
 @transient private val existingSharedState: Option[SharedState],
+existingSessionState: Option[SessionState],
--- End diff --

Looks like you don't need to change this file?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-06 Thread kunalkhamar
GitHub user kunalkhamar opened a pull request:

https://github.com/apache/spark/pull/16826

Fork SparkSession with option to inherit a copy of the SessionState.

## What changes were proposed in this pull request?

Forking a newSession() from SparkSession currently makes a new SparkSession 
that does not retain SessionState (i.e. temporary tables, SQL config, 
registered functions etc.) This change adds a flag 
newSession(inheritSessionState: Boolean) which can create a new SparkSession 
with a copy of the parent's SessionState.

## How was this patch tested?

Unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunalkhamar/spark fork-sparksession

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16826.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16826


commit 18ce1b82403cfb1e2b4fcac2dd5875f289d61ed3
Author: Kunal Khamar 
Date:   2017-02-03T00:31:05Z

Add capability to inherit SessionState (SQL conf, temp tables, registered 
functions) when forking a new SparkSession.

commit 9beb78d82dbc17aed047d818fdd4fca1b0d6
Author: Kunal Khamar 
Date:   2017-02-06T22:55:56Z

Add tests for forking new session with inherit config enabled. Update 
overloaded functions for Java bytecode compatibility.

commit a343d8af9c577158042e4af9f8832f46aeecd509
Author: Kunal Khamar 
Date:   2017-02-06T23:24:02Z

Fix constructor default args for bytecode compatibility.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org