[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12248


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-11 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-208656977
  
Going to merge this in master. Thanks.






[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-208650713
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-208650715
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55545/





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-208650403
  
**[Test build #55545 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55545/consoleFull)**
 for PR 12248 at commit 
[`0e46c58`](https://github.com/apache/spark/commit/0e46c58685ac8a605d2d98630488083cbef95c2a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-208594890
  
**[Test build #55545 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55545/consoleFull)**
 for PR 12248 at commit 
[`0e46c58`](https://github.com/apache/spark/commit/0e46c58685ac8a605d2d98630488083cbef95c2a).





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-11 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-208594533
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-11 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-208594528
  
It seems like we agree that this API is easy-to-support in Spark and 
hard/impossible to implement as cleanly in client code. As a result, I think 
this is okay to merge, so I'm going to run this one more time and will merge a 
bit after Jenkins passes. If anyone thinks that we need more discussion before 
accepting this API, let me know.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-09 Thread ericl
Github user ericl commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207913175
  
Yes exactly, this is for implementing functionality such as tracing, where 
*any* modification of existing code may be too burdensome for users, e.g. 
because of too much plumbing or libraries they cannot modify.

It's the same argument for thread-locals, but in this case spanning driver 
-> worker interactions.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207588437
  
This can as easily be ...

```
Properties p = ...
p.put("traceId", "foo")
...
def handleRead(...) {
  log4j.MDC.put("traceId", p.get("traceId"))
  ...
}
```

I get that if `handleRead` is buried somewhere in a library function you 
have to plumb through access to Properties explicitly in the library. Going via 
static methods on a thread-local task context is a little less transparent, but 
it is more convenient. That's really the win, that anything in your code has 
direct magic access to context props; I don't think anything else actually gets 
simpler.

I think the fact it's already an API reduces the cost of a change like this 
in comparison, so I can see the argument for it.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread ericl
Github user ericl commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207581308
  
@srowen, suppose you have an existing service running Spark jobs that read 
from a custom datasource. You want to add log4j trace annotations in order to 
attribute datasource logs back to the original caller of the service. However, 
you want to avoid invasive changes to the existing code. This is a two-line 
change with the proposed API.

```
// in RPC server running as driver
def receive(request: RPC) {
  sc.setLocalProperty("traceId", request.traceId)  // add this line
  ...
}

// in datasource library running on executors
def handleRead(...) {
  log4j.MDC.put("traceId", TaskContext.getLocalProperty("traceId"))  // add this line
  ...
}
```

The alternative is to explicitly reference `traceId` in each of the tasks, 
but this would clutter application code with many references to diagnostics 
info, discouraging the use of diagnostic tools.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r59077967
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -42,20 +43,22 @@ import org.apache.spark.shuffle.ShuffleWriter
  * @param _initialAccums initial set of accumulators to be used in this 
task for tracking
  *   internal metrics. Other accumulators will be 
registered later when
  *   they are deserialized on the executors.
+ * @param localProperties copy of thread-local properties set by the user 
on the driver side.
  */
 private[spark] class ShuffleMapTask(
 stageId: Int,
 stageAttemptId: Int,
 taskBinary: Broadcast[Array[Byte]],
 partition: Partition,
 @transient private var locs: Seq[TaskLocation],
-_initialAccums: Seq[Accumulator[_]])
-  extends Task[MapStatus](stageId, stageAttemptId, partition.index, 
_initialAccums)
+_initialAccums: Seq[Accumulator[_]],
+localProperties: Properties)
+  extends Task[MapStatus](stageId, stageAttemptId, partition.index, 
_initialAccums, localProperties)
   with Logging {
 
   /** A constructor used only in test suites. This does not require 
passing in an RDD. */
   def this(partitionId: Int) {
-this(0, 0, null, new Partition { override def index: Int = 0 }, null, 
null)
+this(0, 0, null, new Partition { override def index: Int = 0 }, null, 
null, new Properties)
--- End diff --

Fair enough, I suppose allocating the empty map/properties object isn't 
that expensive.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r59076114
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -42,20 +43,22 @@ import org.apache.spark.shuffle.ShuffleWriter
  * @param _initialAccums initial set of accumulators to be used in this 
task for tracking
  *   internal metrics. Other accumulators will be 
registered later when
  *   they are deserialized on the executors.
+ * @param localProperties copy of thread-local properties set by the user 
on the driver side.
  */
 private[spark] class ShuffleMapTask(
 stageId: Int,
 stageAttemptId: Int,
 taskBinary: Broadcast[Array[Byte]],
 partition: Partition,
 @transient private var locs: Seq[TaskLocation],
-_initialAccums: Seq[Accumulator[_]])
-  extends Task[MapStatus](stageId, stageAttemptId, partition.index, 
_initialAccums)
+_initialAccums: Seq[Accumulator[_]],
+localProperties: Properties)
+  extends Task[MapStatus](stageId, stageAttemptId, partition.index, 
_initialAccums, localProperties)
   with Logging {
 
   /** A constructor used only in test suites. This does not require 
passing in an RDD. */
   def this(partitionId: Int) {
-this(0, 0, null, new Partition { override def index: Int = 0 }, null, 
null)
+this(0, 0, null, new Partition { override def index: Int = 0 }, null, 
null, new Properties)
--- End diff --

It seemed safer to make it required. I can change this to an option if you 
think creating a Properties each time is too much overhead.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r59076091
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -42,20 +43,22 @@ import org.apache.spark.shuffle.ShuffleWriter
  * @param _initialAccums initial set of accumulators to be used in this 
task for tracking
  *   internal metrics. Other accumulators will be 
registered later when
  *   they are deserialized on the executors.
+ * @param localProperties copy of thread-local properties set by the user 
on the driver side.
  */
 private[spark] class ShuffleMapTask(
 stageId: Int,
 stageAttemptId: Int,
 taskBinary: Broadcast[Array[Byte]],
 partition: Partition,
 @transient private var locs: Seq[TaskLocation],
-_initialAccums: Seq[Accumulator[_]])
-  extends Task[MapStatus](stageId, stageAttemptId, partition.index, 
_initialAccums)
+_initialAccums: Seq[Accumulator[_]],
+localProperties: Properties)
+  extends Task[MapStatus](stageId, stageAttemptId, partition.index, 
_initialAccums, localProperties)
   with Logging {
 
   /** A constructor used only in test suites. This does not require 
passing in an RDD. */
   def this(partitionId: Int) {
-this(0, 0, null, new Partition { override def index: Int = 0 }, null, 
null)
+this(0, 0, null, new Partition { override def index: Int = 0 }, null, 
null, new Properties)
--- End diff --

Properties objects are kind of analogous to `Map`s and I think that 
`Option[Map]` would be kind of a weird type, in the same sense that 
`Option[Set]` (or any other collection type) is usually kind of a weird code 
smell. So, this is fine with me as is.
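
A minimal sketch of the point about `Option[Map]`, assuming nothing beyond `java.util.Properties`. The object and method names below are hypothetical and do not come from the PR. It compares looking up a key in a required (possibly empty) `Properties` with looking it up in an `Option[Properties]`:

```scala
import java.util.Properties

// Hypothetical helpers for illustration; not part of Spark or of this PR.
object PropsLookup {
  // Required (possibly empty) Properties: there is only one case to handle.
  def lookup(props: Properties, key: String): Option[String] =
    Option(props.getProperty(key))

  // Option[Properties]: None and Some(empty) both mean "no properties",
  // so callers unwrap two layers that carry the same information.
  def lookupOpt(props: Option[Properties], key: String): Option[String] =
    props.flatMap(p => Option(p.getProperty(key)))
}
```

With the required form, an empty `Properties` already expresses "nothing was set", which is likely why the extra `Option` layer reads as redundant.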





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r59074967
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -42,20 +43,22 @@ import org.apache.spark.shuffle.ShuffleWriter
  * @param _initialAccums initial set of accumulators to be used in this 
task for tracking
  *   internal metrics. Other accumulators will be 
registered later when
  *   they are deserialized on the executors.
+ * @param localProperties copy of thread-local properties set by the user 
on the driver side.
  */
 private[spark] class ShuffleMapTask(
 stageId: Int,
 stageAttemptId: Int,
 taskBinary: Broadcast[Array[Byte]],
 partition: Partition,
 @transient private var locs: Seq[TaskLocation],
-_initialAccums: Seq[Accumulator[_]])
-  extends Task[MapStatus](stageId, stageAttemptId, partition.index, 
_initialAccums)
+_initialAccums: Seq[Accumulator[_]],
+localProperties: Properties)
+  extends Task[MapStatus](stageId, stageAttemptId, partition.index, 
_initialAccums, localProperties)
   with Logging {
 
   /** A constructor used only in test suites. This does not require 
passing in an RDD. */
   def this(partitionId: Int) {
-this(0, 0, null, new Partition { override def index: Int = 0 }, null, 
null)
+this(0, 0, null, new Partition { override def index: Int = 0 }, null, 
null, new Properties)
--- End diff --

I wonder if we can avoid making empty `Properties` all over ... an 
`Option[Properties]`? a setter that is called only where needed?





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207562982
  
I mean making a `Properties` object in the driver, and using a reference to 
it in a function that is executed on the executors. That's certainly in scope. 
For the example you give that seems equally simple and can be bottled up inside 
the library, still.

I understand Josh's use case more. There are certainly tasks and RDDs 
entirely internal to some Spark process. But those also won't know anything 
about what to do with some custom user properties. Maybe eventually they invoke 
a UDF that could use these properties. In many cases that UDF could still just 
refer to whatever config you like directly (right?) but I'm probably not 
thinking of some case where this fails to work.

I take the point about this already being an API for the caller anyway.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207559966
  
@srowen, I think that the main use-case for this feature is associating 
metadata with a Spark action / execution and making that metadata 
accessible in that action's tasks. 

For instance, let's say that I run a Spark SQL query and want to propagate 
some metadata related to that query execution from the driver to the executors 
for use in tracing / debugging / instrumentation. Maybe I want to propagate a 
label associated with all tasks launched from the job, such as a job group 
name, and read that label in a custom log appender so that my log messages from 
those tasks contain that metadata.

In this case, the actual RDD code isn't controlled by the user and they 
don't really have a place to interpose broadcast variables or other custom code 
for propagating this metadata.

Even if the user's library code were to use broadcast variables and define 
thread-local variables, etc., then they'd have to worry about some subtleties 
related to Spark's internal threading model: for example, thread-locals need to 
be handled carefully to make sure that they're correctly propagated across 
thread-boundaries in PythonRDD, RRDD, ScriptTransformation, PipedRDD, etc., and 
the set of places where you'd need to do that propagation corresponds exactly 
to the set of places where we already happen to be propagating the TaskContext 
thread-local.

Given that `localProperties` is already a stable public API, I think it 
makes sense to make those properties accessible in tasks, since it seems like a 
small and logical extension of an existing API.
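
The thread-boundary subtlety mentioned above can be sketched on the plain JVM, without Spark. This is an illustrative toy under stated assumptions: `ThreadLocalDemo`, `traceId`, and `onNewThread` are hypothetical names, and the helper thread merely stands in for the extra threads used by components such as PythonRDD or PipedRDD.

```scala
// Hypothetical demo; none of these names exist in Spark.
object ThreadLocalDemo {
  // Stand-in for a user-defined thread-local holding trace metadata.
  val traceId = new ThreadLocal[String]

  /** Runs `body` on a freshly created thread and returns its result. */
  def onNewThread[T](body: => T): T = {
    var result: Option[T] = None
    val t = new Thread(new Runnable {
      override def run(): Unit = { result = Some(body) }
    })
    t.start()
    t.join() // join() makes the write to `result` visible to the caller
    result.get
  }

  def main(args: Array[String]): Unit = {
    traceId.set("abc")
    // Not propagated: the new thread has its own (unset) slot.
    val seen = onNewThread(traceId.get())
    // Explicit propagation: capture on the parent, re-set on the child.
    val captured = traceId.get()
    val propagated = onNewThread { traceId.set(captured); traceId.get() }
    println(s"without propagation: $seen, with propagation: $propagated")
  }
}
```

Every place that hands work to another thread needs the capture-and-reset step, which is the bookkeeping user code would otherwise have to replicate.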





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread ericl
Github user ericl commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207557823
  
> Your change is about passing around a Properties, right? You can simply 
access such an object anywhere you need to and it will be sent around as 
needed. There is nothing to do actually, not even explicitly setting it as some 
context property.

That's not true: if you access a static `Properties` object within an 
executor node, it won't have the value you set in the driver, since closures 
only capture variables in lexical scope.

> Your example however seems to be about configuring some global 
per-function behavior, not sending props. In this example, why would the 
library not call setLogLevel internally, either in static initialization or as 
needed when any method is invoked -- why would the caller have to do it?

It's more about configuring behavior based on some property set by some 
upstream caller of the function. The idea is that the user wants to configure 
loglevel just for this job, without impacting any other jobs potentially 
running on the cluster.

> But, how is this helped by adding an additional Properties parameter?

Sorry, I should have made the example more explicit. setLogLevel would be 
implemented in the driver side as `sc.setLocalProperty("mylib.loglevel", 
level)`. On the executor side the library would query 
`TaskContext.getLocalProperty("mylib.loglevel")` to determine the verbosity of 
debug logs.

More generally, I think this adds a mechanism for passing values 
implicitly, without requiring the user (who is writing Spark code) to manually 
reference them in each of their closures. You are right that this can be 
achieved via other mechanisms, but those may not be convenient or practical for 
the use case, e.g. if you want to integrate with something like 
[X-trace](http://www.x-trace.net/wiki/doku.php) (which is out of the scope of 
this PR, but would be easy to add once we have the mechanism).
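
The implicit-passing mechanism described above can be modelled without Spark. The sketch below is a toy, not Spark's implementation: `ToyContext`, `runTask`, and `MyLib.verbosity` are hypothetical names, and a plain thread stands in for an executor. Properties set on the submitting thread are snapshotted at submission time and installed in a thread-local on the worker, so library code can read them without any parameter plumbing.

```scala
import java.util.Properties

// Toy model only: hypothetical names, not Spark's classes.
object ToyContext {
  // "Driver" side: properties local to the submitting thread.
  private val driverProps = new ThreadLocal[Properties]
  // "Executor" side: properties of the task running on the current thread.
  private val taskProps = new ThreadLocal[Properties]

  private def currentDriverProps(): Properties = {
    if (driverProps.get() == null) driverProps.set(new Properties())
    driverProps.get()
  }

  def setLocalProperty(key: String, value: String): Unit =
    currentDriverProps().setProperty(key, value)

  /** What library code calls inside a task; None outside of a task. */
  def getLocalProperty(key: String): Option[String] =
    Option(taskProps.get()).flatMap(p => Option(p.getProperty(key)))

  /** Copies the caller's properties and runs `body` as a "task" on another thread. */
  def runTask[T](body: () => T): T = {
    val source = currentDriverProps()
    val snapshot = new Properties()
    val names = source.stringPropertyNames().iterator()
    while (names.hasNext) {
      val k = names.next()
      snapshot.setProperty(k, source.getProperty(k))
    }
    var result: Option[T] = None
    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        taskProps.set(snapshot) // install the context before user code runs
        result = Some(body())
      }
    })
    worker.start()
    worker.join()
    result.get
  }
}

// A library that reads its verbosity from the ambient task context.
object MyLib {
  def verbosity(): String =
    ToyContext.getLocalProperty("mylib.loglevel").getOrElse("WARN")
}
```

In this model, calling `ToyContext.setLocalProperty("mylib.loglevel", "DEBUG")` and then `ToyContext.runTask(() => MyLib.verbosity())` lets the library see the caller's setting even though the closure never mentions it.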





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207543635
  
Your change is about passing around a `Properties`, right? You can simply 
access such an object anywhere you need to and it will be sent around as 
needed. There is nothing to do actually, not even explicitly setting it as some 
context property.

Your example however seems to be about configuring some global per-function 
behavior, not sending props. In this example, why would the library not call 
`setLogLevel` internally, either in static initialization or as needed when any 
method is invoked -- why would the caller have to do it? But, how is this 
helped by adding an additional `Properties` parameter?





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread ericl
Github user ericl commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207534620
  
It's not, though: if you want to propagate something new without manually 
passing it through all your closures, this cannot be done today.

For example, consider a Spark library that wants to implement a per-job 
`myLib.setLogLevel()` call. With context propagation, the library author can 
provide semantics like this:

```
myLib.setLogLevel(INFO)
sc.parallelize(...).map(myLib.f1).filter(myLib.f2).collect()
```

What you have to do now is something more like:
```
sc.parallelize(...).map { x =>
  myLib.setLogLevel(INFO)
  myLib.f1(x)
}.filter { y =>
  myLib.setLogLevel(INFO)
  myLib.f2(y)
}.collect()
```

which is more verbose and harder to maintain.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207507173
  
Propagating references from a closure is already transparent; you just 
reference whatever you want like a Properties object and it goes with the task. 
What's the use case for something more than this?





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread ericl
Github user ericl commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207506665
  
The main difference is that propagation is transparent to user code. For
example, this could be used to implement something like X-trace without
requiring manual instrumentation of closures.

On Fri, Apr 8, 2016, 2:40 AM Sean Owen  wrote:

> Backing up, how does this differ from using broadcast variables for data,
> or for simply sending small properties objects in a closure? does it need
> all this complexity of yet another mechanism?






[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207350278
  
Backing up, how does this differ from using broadcast variables for data, 
or from simply sending small properties objects in a closure? Does it need all 
this complexity of yet another mechanism?





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207275679
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207275684
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55320/





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207274639
  
**[Test build #55320 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55320/consoleFull)**
 for PR 12248 at commit 
[`0e46c58`](https://github.com/apache/spark/commit/0e46c58685ac8a605d2d98630488083cbef95c2a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207223641
  
**[Test build #55320 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55320/consoleFull)**
 for PR 12248 at commit 
[`0e46c58`](https://github.com/apache/spark/commit/0e46c58685ac8a605d2d98630488083cbef95c2a).





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207217913
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55291/
Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207217771
  
**[Test build #55291 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55291/consoleFull)**
 for PR 12248 at commit 
[`964ee4b`](https://github.com/apache/spark/commit/964ee4b00cb5928162ad893c4bc863ec934eb2e9).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207217912
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207160207
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207160179
  
**[Test build #55290 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55290/consoleFull)**
 for PR 12248 at commit 
[`2542d01`](https://github.com/apache/spark/commit/2542d01efde219937ab3a7773488bcd98740acd6).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207160209
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55290/
Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207155748
  
**[Test build #55291 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55291/consoleFull)**
 for PR 12248 at commit 
[`964ee4b`](https://github.com/apache/spark/commit/964ee4b00cb5928162ad893c4bc863ec934eb2e9).





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207153790
  
**[Test build #55290 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55290/consoleFull)**
 for PR 12248 at commit 
[`2542d01`](https://github.com/apache/spark/commit/2542d01efde219937ab3a7773488bcd98740acd6).





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207152902
  
**[Test build #55285 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55285/consoleFull)**
 for PR 12248 at commit 
[`37f269e`](https://github.com/apache/spark/commit/37f269e0dd08f0acbff552bbb9d27b163d85d86f).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207152922
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55285/
Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207152919
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207152833
  
**[Test build #55287 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55287/consoleFull)**
 for PR 12248 at commit 
[`82646a9`](https://github.com/apache/spark/commit/82646a9abd0f10e874292de649b2dbed0db2db57).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207152849
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55287/
Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207152847
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207150129
  
**[Test build #55287 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55287/consoleFull)**
 for PR 12248 at commit 
[`82646a9`](https://github.com/apache/spark/commit/82646a9abd0f10e874292de649b2dbed0db2db57).





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207149663
  
To fix MiMa, add the ignores to `MimaExcludes`; I'd follow the existing 
convention and create a new section at 
https://github.com/apache/spark/blob/49fb237081bbca0d811aa48aa06f4728fea62781/project/MimaExcludes.scala#L613
 and reference this JIRA.
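
As a rough sketch of what such an entry might look like: the filter below reuses the `ProblemFilters.exclude` line that MiMa itself suggested for this PR. The enclosing object name `Spark14475Excludes` is hypothetical and only there to make the fragment self-contained; in `MimaExcludes.scala` the filter would presumably sit in a version-specific `Seq` next to the existing sections.

```scala
import com.typesafe.tools.mima.core._

// Hypothetical wrapper object; only the filter line comes from the MiMa output.
object Spark14475Excludes {
  // [SPARK-14475] Propagate user-defined context from driver to executors
  val excludes = Seq(
    ProblemFilters.exclude[ReversedMissingMethodProblem](
      "org.apache.spark.TaskContext.getLocalProperty")
  )
}
```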





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207149426
  
**[Test build #55285 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55285/consoleFull)**
 for PR 12248 at commit 
[`37f269e`](https://github.com/apache/spark/commit/37f269e0dd08f0acbff552bbb9d27b163d85d86f).





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r58966987
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -206,6 +210,11 @@ private[spark] object Task {
   dataOut.writeLong(timestamp)
 }
 
+// Write the task properties separately so it is available before full task deserialization.
+val propBytes = Utils.serialize(task.localProperties)
--- End diff --

Hmm, good point. `Utils.serialize` is fine here, since it doesn't matter 
whether we use a custom serializer here and because today it's always going to 
be `JavaSerializer` anyways.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r58966811
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -206,6 +210,11 @@ private[spark] object Task {
   dataOut.writeLong(timestamp)
 }
 
+// Write the task properties separately so it is available before full task deserialization.
+val propBytes = Utils.serialize(task.localProperties)
+dataOut.writeInt(propBytes.length)
+dataOut.write(propBytes, 0, propBytes.length)
--- End diff --

Done





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r58966806
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -206,6 +210,11 @@ private[spark] object Task {
   dataOut.writeLong(timestamp)
 }
 
+// Write the task properties separately so it is available before full task deserialization.
+val propBytes = Utils.serialize(task.localProperties)
--- End diff --

Wasn't sure how to deserialize on the Executor side. Perhaps `env.serializer` 
there?





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r58966802
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -206,6 +210,11 @@ private[spark] object Task {
   dataOut.writeLong(timestamp)
 }
 
+// Write the task properties separately so it is available before full task deserialization.
--- End diff --

Done





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r58966470
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -206,6 +210,11 @@ private[spark] object Task {
   dataOut.writeLong(timestamp)
 }
 
+// Write the task properties separately so it is available before full task deserialization.
+val propBytes = Utils.serialize(task.localProperties)
+dataOut.writeInt(propBytes.length)
+dataOut.write(propBytes, 0, propBytes.length)
--- End diff --

Here, I think you can simply do `dataOut.write(propBytes)`.
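
For illustration, here is a minimal, self-contained sketch of the length-prefixed layout under discussion, using only JDK streams. `FramedProps`, `javaSerialize`, and the trailing `body` bytes are hypothetical stand-ins (not Spark's `Utils.serialize` or real task bytes). It is meant to show that `dataOut.write(propBytes)` writes the same bytes as the three-argument form, and that a reader can recover the properties without deserializing the rest of the payload.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Properties

object FramedProps {
  // Plain Java serialization, standing in for Utils.serialize (assumption).
  def javaSerialize(obj: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    bos.toByteArray
  }

  // Writes [length][property bytes][body]; body stands in for the serialized task.
  def write(props: Properties, body: Array[Byte]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val dataOut = new DataOutputStream(out)
    val propBytes = javaSerialize(props)
    dataOut.writeInt(propBytes.length)
    dataOut.write(propBytes) // equivalent to write(propBytes, 0, propBytes.length)
    dataOut.write(body)
    dataOut.flush()
    out.toByteArray
  }

  // Reads the properties first and returns the remaining bytes untouched.
  def read(framed: Array[Byte]): (Properties, Array[Byte]) = {
    val dataIn = new DataInputStream(new ByteArrayInputStream(framed))
    val propBytes = new Array[Byte](dataIn.readInt())
    dataIn.readFully(propBytes)
    val ois = new ObjectInputStream(new ByteArrayInputStream(propBytes))
    val props = try ois.readObject().asInstanceOf[Properties] finally ois.close()
    val rest = new Array[Byte](dataIn.available())
    dataIn.readFully(rest)
    (props, rest)
  }
}
```

The explicit length prefix is what lets the reader stop after the properties; the shorter `write(propBytes)` call only changes how the bytes are handed to the stream, not what ends up on the wire.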





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r58966411
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -206,6 +210,11 @@ private[spark] object Task {
   dataOut.writeLong(timestamp)
 }
 
+// Write the task properties separately so it is available before full task deserialization.
+val propBytes = Utils.serialize(task.localProperties)
--- End diff --

Just curious, why not `serializer.serialize(..)`? This is fine, but just 
wondering.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12248#discussion_r58966245
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -206,6 +210,11 @@ private[spark] object Task {
   dataOut.writeLong(timestamp)
 }
 
+// Write the task properties separately so it is available before full task deserialization.
--- End diff --

Since the properties aren't transient in `Task`, I guess this means that 
we'll write them out twice. If we want to avoid this, we can make 
`localProperties` into a `@transient` `var` which is `private[spark]`, then 
re-set the field after deserializing the task. Tasks are sent to executors 
using broadcast variables, so the extra space only makes a difference for the 
first task from a stage that's run on an executor.

As a result, if we think that these serialized properties will typically be 
small then the extra space savings probably aren't a huge deal, but if we want 
to heavily optimize then we can do the `var` trick.
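
A small sketch of how the `@transient` `var` approach could behave, assuming plain Java serialization. `DemoTask` and `TransientVarDemo` are hypothetical stand-ins written for this example and are not Spark's `Task` class or its serialization path; the `private[spark]` modifier is left out so the snippet compiles on its own.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Properties

// Hypothetical stand-in for a task; this is not Spark's Task class.
class DemoTask(val stageId: Int) extends Serializable {
  // @transient keeps the field out of the Java-serialized form of the task.
  @transient var localProperties: Properties = new Properties()
}

object TransientVarDemo {
  def serialize(obj: AnyRef): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    bos.toByteArray
  }

  def deserialize[T](bytes: Array[Byte]): T = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }

  // Serializes the properties once, on their own, then re-attaches them.
  def roundTrip(task: DemoTask): DemoTask = {
    val propBytes = serialize(task.localProperties)
    val restored = deserialize[DemoTask](serialize(task))
    // The transient field comes back as null, so it has to be re-set by hand.
    assert(restored.localProperties == null)
    restored.localProperties = deserialize[Properties](propBytes)
    restored
  }
}
```

The trade-off is the one described above: the properties are written only once, at the cost of a mutable field that must be restored after deserialization.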





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207146732
  
The MiMa failure is because `TaskContext` is a public abstract class, so 
adding a new method to it breaks binary compatibility for implementors of that 
class. However, `TaskContext` is not intended to actually be implemented by 
users; rather, it functions more as a public interface. Therefore, it should be 
safe to ignore this MiMa error:

```
[error]  * abstract method getLocalProperty(java.lang.String)java.lang.String in class org.apache.spark.TaskContext is present only in current version
[error]    filter with: ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.TaskContext.getLocalProperty")
```





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207143159
  
**[Test build #55276 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55276/consoleFull)**
 for PR 12248 at commit 
[`b2fc541`](https://github.com/apache/spark/commit/b2fc541368e30ffd455c068b709d68251d09abd5).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207143175
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207143176
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55276/
Test FAILed.





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12248#issuecomment-207141407
  
**[Test build #55276 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55276/consoleFull)**
 for PR 12248 at commit 
[`b2fc541`](https://github.com/apache/spark/commit/b2fc541368e30ffd455c068b709d68251d09abd5).





[GitHub] spark pull request: [SPARK-14475] Propagate user-defined context f...

2016-04-07 Thread ericl
GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/12248

[SPARK-14475] Propagate user-defined context from driver to executors

## What changes were proposed in this pull request?

This adds a new API call `TaskContext.getLocalProperty` for getting 
properties set in the driver from executors. These local properties are 
automatically propagated from the driver to executors. For streaming, the 
context for streaming tasks will be the initial driver context when ssc.start() 
is called.
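
A minimal usage sketch of the new call, assuming a Spark build that includes this change is on the classpath and that the job runs in local mode. The object name `LocalPropertyDemo`, the helper `readOnExecutors`, and the key and value strings are made up for illustration; `SparkContext.setLocalProperty` and `TaskContext.get()` are existing APIs, and `getLocalProperty` is the method this PR adds.

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object LocalPropertyDemo {
  // Sets a local property on the driver thread, then reads it inside each task.
  def readOnExecutors(sc: SparkContext, key: String, value: String): Array[String] = {
    sc.setLocalProperty(key, value)
    sc.parallelize(1 to 4, numSlices = 2)
      .map(_ => TaskContext.get().getLocalProperty(key))
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("local-property-demo")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical key and value, chosen only for this example.
      val seen = readOnExecutors(sc, "demo.requestId", "req-42")
      println(seen.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Local properties are set per driver thread, so the job has to be submitted from the same thread that called `setLocalProperty` for the value to be visible in its tasks.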

## How was this patch tested?

Unit tests.

cc @JoshRosen 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark sc-2813

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12248.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12248


commit a8b07f1163a92072c1bc71cf63f804df1f430060
Author: Eric Liang 
Date:   2016-04-07T22:30:39Z

Thu Apr  7 15:30:39 PDT 2016

commit 9104f0cd77c30dd0acf8e2a1c53e9b57e9241c02
Author: Eric Liang 
Date:   2016-04-07T23:33:56Z

tests

commit b2fc541368e30ffd455c068b709d68251d09abd5
Author: Eric Liang 
Date:   2016-04-07T23:38:01Z

Thu Apr  7 16:38:01 PDT 2016



