[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-10-16 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-148823119
  
Hey @coolfrood, would you mind closing this PR for now? You can always 
re-open this PR or submit a new one if you have time to work on this issue (or 
someone else can pick it up). Thanks!





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-10-16 Thread coolfrood
Github user coolfrood closed the pull request at:

https://github.com/apache/spark/pull/6361





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-10-16 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-148861222
  
Sorry, this kind of fell through the cracks. I had intended to do a PR 
against Kryo to have their fix backported.  I'll reopen this PR when I pick it 
up again.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-09-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-140296338
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-121072524
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-07-05 Thread koertkuipers
Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-118637079
  
that's fair enough. however, keep in mind that kryo is a transitive
dependency of spark, one that does not upgrade well and has not been
shaded, so you generally force people to use that version for other stuff
as well (within the same programs), including serializing data structures
in files.


On Mon, Jun 29, 2015 at 12:02 PM, Mark Hamstra notificati...@github.com
wrote:

 @koertkuipers https://github.com/koertkuipers While it may be
 convenient for you that Scalding and Spark have a common serialization
 format when using chill/kryo, that can't be considered anything but a 
lucky
 accident. We certainly haven't made guarantees that Spark's serialization
 will maintain compatibility with some external format. While that doesn't
 mean that we should gratuitously change Spark's serialization, it does 
mean
 that we shouldn't bind ourselves to a particular format if there are
 compelling reasons to do something other.

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/6361#issuecomment-116742527.







[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-29 Thread koertkuipers
Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-116739363
  
@JoshRosen one issue i see with publishing a modified chill package: we 
read files in spark that were written by scalding using chill/kryo for 
serialization of some objects, and i doubt that would work if spark and 
scalding use different kryo versions. i know of a few other people/companies 
that do the same thing.






[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-29 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-116730892
  
@koertkuipers, I have a few comments about Chill on the JIRA ticket. If 
necessary, we can consider publishing our own modified Chill package under the 
Spark namespace.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-29 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-116741038
  
I think the serialization format should be the same with a minor version 
bump.

How hard would it be to get Chill updated to use a newer version (2.24) of 
Kryo?





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-29 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-116742527
  
@koertkuipers While it may be convenient for you that Scalding and Spark 
have a common serialization format when using chill/kryo, that can't be 
considered anything but a lucky accident.  We certainly haven't made guarantees 
that Spark's serialization will maintain compatibility with some external 
format.  While that doesn't mean that we should gratuitously change Spark's 
serialization, it does mean that we shouldn't bind ourselves to a particular 
format if there are compelling reasons to do something other.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-28 Thread koertkuipers
Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-116299142
  
i am not so sure it is safe to bump the kryo version like that. chill 
0.5.0 doesn't compile against kryo 2.24.0, so what guarantees do you have that 
chill itself even behaves as expected?






[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-09 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-110419312
  
@JoshRosen I narrowed down the Kryo version problem a bit.  Here's a gist 
that demonstrates the problem:
https://gist.github.com/coolfrood/1ff8c65a2b92fe9292f4

See description at the end of the gist.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-09 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-110443903
  
Is there someone from Hive that we can ping to investigate?





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-109378001
  
To fix the SQL compiler assertion, try merging with `master` then doing a 
clean recompile; any compiler errors that you saw in SQL were likely due to 
incremental compilation issues in the quasiquotes compiler, but we're no longer 
using it in `master` as of a day or two ago.  If you want to run a hive test, 
make sure to pass the `-Phive` flag to enable the Hive profiles.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-05 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-109321223
  
This class is deserialized using Kryo, so it is likely that my trying out a 
different version of Kryo tickled some bug. I investigated this briefly but I 
could not even get Spark to compile; it was failing on a compiler assertion 
while trying to compile Spark SQL.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-04 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-109171615
  
Trying to dig into the stack trace a bit, here's an excerpt from 
GenericUDTFExplode.java in hive-exec-0.13.1a-sources.jar:

```java
  private transient final Object[] forwardListObj = new Object[1];
  private transient final Object[] forwardMapObj = new Object[2];

  @Override
  public void process(Object[] o) throws HiveException {
    switch (inputOI.getCategory()) {
    case LIST:
      ListObjectInspector listOI = (ListObjectInspector)inputOI;
      List<?> list = listOI.getList(o[0]);
      if (list == null) {
        return;
      }
      for (Object r : list) {
        forwardListObj[0] = r;  // <--- line 93 that throws NPE
        forward(forwardListObj);
      }
      break;
```

Based on the line number, I think that the NPE is being thrown because the 
transient field `forwardListObj` is `null`.  Since this field is `transient 
final`, I wonder if this class wasn't designed to have its `process` method 
called on deserialized instances.  If that's the case, though, I'm puzzled at 
why we wouldn't see similar behavior with Java serialization.  The problem must 
be a bit deeper and merits more investigation.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-03 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-108470313
  
The Hive tests fail in an unexplained manner, and I couldn't learn much by 
looking at the Hive source code for this failure:
```
10:03:48.669 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in 
stage 1673.0 failed 1 times; aborting job
[info] - logical.Project should not be resolved if it contains aggregates 
or generators *** FAILED *** (374 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 1673.0 failed 1 times, most recent failure: Lost task 0.0 in 
stage 1673.0 (TID 6462, localhost): java.lang.NullPointerException
[info]  at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDTFExplode.process(GenericUDTFExplode.java:93)
[info]  at 
org.apache.spark.sql.hive.HiveGenericUdtf.eval(hiveUdfs.scala:510)
[info]  at 
org.apache.spark.sql.execution.Generate$$anonfun$3$$anonfun$apply$6.apply(Generate.scala:85)
[info]  at 
org.apache.spark.sql.execution.Generate$$anonfun$3$$anonfun$apply$6.apply(Generate.scala:85)
[info]  at 
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[info]  at 
scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
[info]  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
[info]  at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
[info]  at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:101)
[info]  at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:83)
[info]  at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:83)
[info]  at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
[info]  at org.apache.spark.scheduler.Task.run(Task.scala:73)
[info]  at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
[info]  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[info]  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[info]  at java.lang.Thread.run(Thread.java:745)
```

Can we get some Hive expert to weigh in on this?  Somehow switching Kryo 
from `2.21` to `2.24.0` causes this problem.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107331992
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107331980
  
  [Test build #33879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33879/consoleFull)
 for   PR 6361 at commit 
[`08c05d1`](https://github.com/apache/spark/commit/08c05d19104f5dbc3200cf217d74deb27e74b525).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SerializableBuffer(@transient var buffer: ByteBuffer)`






[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107596088
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31438382
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoClosureSerializerSuite.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+import java.io.File
+
+import com.google.common.base.Charsets._
+import com.google.common.io.Files
+
+import org.apache.spark.{SharedSparkContext, SparkFunSuite}
+import org.apache.spark.util.Utils
+
+class KryoClosureSerializerSuite extends SparkFunSuite with 
SharedSparkContext {
+  conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
--- End diff --

`classOf[KryoSerializer].getName` is cleaner.
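
For illustration, the suggested form would look something like this (a minimal sketch, assuming the test suite's existing `conf`):

```scala
// Use the class reference rather than a hard-coded string literal.
conf.set("spark.serializer", classOf[org.apache.spark.serializer.KryoSerializer].getName)
```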





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31440654
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -126,6 +129,21 @@ private[spark] abstract class Task[T](val stageId: 
Int, var partitionId: Int) ex
   taskThread.interrupt()
 }
   }
+
+  /**
+   * Deserializes the task from the broadcast variable.
+   * If Kryo serialization is being sed, a copy of the buffer is made 
because Kryo deserialization
+   * is not thread-safe w.r.t. the deserialization buffer (see SPARK-7708)
+   */
+  protected[this] def deserialize[T: ClassTag](taskBinary: 
Broadcast[Array[Byte]]): T = {
+val ser = SparkEnv.get.closureSerializer.newInstance()
--- End diff --

I'm hesitant to introduce that sort of global state, so I'd prefer an 
approach that keeps the pool in an `Executor` instance so that we know that it 
gets cleaned up properly.

I think that we can create a pool of `SerializerInstance`s at the top of 
`Executor`, pass the pool into `TaskRunner` via its constructor, then borrow 
instances from the pool in `TaskRunner.run()`.  Specifically, I'm suggesting 
that we make a new class that acts as the pool and implements `borrow()` and 
`release()` methods, since I think that passing the pool interface into 
`TaskRunner` is cleaner / easier to reason about than passing `Executor` itself 
down to the runner.  This class will have to be thread-safe.  It should 
allocate new serializer instances on demand when the pool is empty and an 
instance is requested (this allocation should take place without holding a lock 
/ synchronization).

Maybe Guava has a good standard implementation of this.
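
As a rough illustration of the pool idea described above, here is a minimal sketch; the class name `SerializerInstancePool` and its wiring are hypothetical, not Spark's actual implementation:

```scala
import java.util.concurrent.ConcurrentLinkedQueue

import org.apache.spark.serializer.{Serializer, SerializerInstance}

// A thread-safe pool of SerializerInstances (would be private[spark] inside core).
// New instances are allocated on demand when the pool is empty, without holding a lock.
class SerializerInstancePool(serializer: Serializer) {
  private val pool = new ConcurrentLinkedQueue[SerializerInstance]()

  /** Take an instance from the pool, or create a fresh one if none is cached. */
  def borrow(): SerializerInstance = {
    val cached = pool.poll()
    if (cached != null) cached else serializer.newInstance()
  }

  /** Hand an instance back once the task is done with it. */
  def release(instance: SerializerInstance): Unit = pool.offer(instance)
}
```

The pool could then be created once in `Executor` and passed to each `TaskRunner` through its constructor, as suggested above.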





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31442435
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SerializableBuffer.scala ---
@@ -21,12 +21,16 @@ import java.io.{EOFException, IOException, 
ObjectInputStream, ObjectOutputStream
 import java.nio.ByteBuffer
 import java.nio.channels.Channels
 
+import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
+import com.esotericsoftware.kryo.io.{Input, Output}
+
 /**
  * A wrapper around a java.nio.ByteBuffer that is serializable through 
Java serialization, to make
  * it easier to pass ByteBuffers in case class messages.
  */
 private[spark]
-class SerializableBuffer(@transient var buffer: ByteBuffer) extends 
Serializable {
+class SerializableBuffer(@transient var buffer: ByteBuffer)
+  extends Serializable with KryoSerializable {
--- End diff --

`spark.kryo.registrationRequired` -- and it is important to get this right, 
since registered vs. unregistered can make a large difference in the size of 
users' serialized classes.
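
For reference, a minimal sketch of exercising this setting (the configuration keys and `registerKryoClasses` are Spark's; `MyRecord` is a made-up user class for illustration):

```scala
import org.apache.spark.SparkConf

// A hypothetical user class that needs explicit registration.
case class MyRecord(id: Long, payload: Array[Byte])

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Fail fast on unregistered classes instead of silently writing full class names.
  .set("spark.kryo.registrationRequired", "true")
  // Register user classes up front so serialized output stays compact.
  .registerKryoClasses(Array(classOf[MyRecord]))
```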





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107568810
  
Looks like a bunch of SQL tests failed.  Spark SQL uses Kryo serialization 
in some places, so I wonder if the Kryo changes here are causing those failures.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread coolfrood
Github user coolfrood commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31436871
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SerializableBuffer.scala ---
@@ -21,12 +21,16 @@ import java.io.{EOFException, IOException, 
ObjectInputStream, ObjectOutputStream
 import java.nio.ByteBuffer
 import java.nio.channels.Channels
 
+import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
+import com.esotericsoftware.kryo.io.{Input, Output}
+
 /**
  * A wrapper around a java.nio.ByteBuffer that is serializable through 
Java serialization, to make
  * it easier to pass ByteBuffers in case class messages.
  */
 private[spark]
-class SerializableBuffer(@transient var buffer: ByteBuffer) extends 
Serializable {
+class SerializableBuffer(@transient var buffer: ByteBuffer)
+  extends Serializable with KryoSerializable {
--- End diff --

This is not necessary because Kryo will use its own serialization by 
default.  However, registering with Kryo saves a few bytes by not requiring the 
full class name to be written out. I'll add it to the list of registered classes.
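
For context, a KryoSerializable implementation of such a wrapper could look roughly like the following sketch; this is illustrative only, not the PR's actual diff, and the copy-on-write handling of the buffer is an assumption:

```scala
import java.nio.ByteBuffer

import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
import com.esotericsoftware.kryo.io.{Input, Output}

// Wraps a ByteBuffer and writes its contents explicitly, since ByteBuffer
// itself is not serializable by default.
class SerializableBufferSketch(@transient var buffer: ByteBuffer)
  extends Serializable with KryoSerializable {

  override def write(kryo: Kryo, output: Output): Unit = {
    val bytes = new Array[Byte](buffer.remaining())
    buffer.duplicate().get(bytes)  // read the contents without moving the original position
    output.writeInt(bytes.length)
    output.writeBytes(bytes)
  }

  override def read(kryo: Kryo, input: Input): Unit = {
    val length = input.readInt()
    buffer = ByteBuffer.wrap(input.readBytes(length))
  }
}
```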





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread coolfrood
Github user coolfrood commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31436725
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -126,6 +129,21 @@ private[spark] abstract class Task[T](val stageId: 
Int, var partitionId: Int) ex
   taskThread.interrupt()
 }
   }
+
+  /**
+   * Deserializes the task from the broadcast variable.
+   * If Kryo serialization is being sed, a copy of the buffer is made 
because Kryo deserialization
+   * is not thread-safe w.r.t. the deserialization buffer (see SPARK-7708)
+   */
+  protected[this] def deserialize[T: ClassTag](taskBinary: 
Broadcast[Array[Byte]]): T = {
+val ser = SparkEnv.get.closureSerializer.newInstance()
--- End diff --

I agree.  I'll take a crack at it.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107596275
  
Since Jenkins tests are cheap, let's just try running it one more time...





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31438253
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SerializableBuffer.scala ---
@@ -21,12 +21,16 @@ import java.io.{EOFException, IOException, 
ObjectInputStream, ObjectOutputStream
 import java.nio.ByteBuffer
 import java.nio.channels.Channels
 
+import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
+import com.esotericsoftware.kryo.io.{Input, Output}
+
 /**
  * A wrapper around a java.nio.ByteBuffer that is serializable through 
Java serialization, to make
  * it easier to pass ByteBuffers in case class messages.
  */
 private[spark]
-class SerializableBuffer(@transient var buffer: ByteBuffer) extends 
Serializable {
+class SerializableBuffer(@transient var buffer: ByteBuffer)
+  extends Serializable with KryoSerializable {
--- End diff --

I think that one gotcha is that users can configure Kryo to require that 
classes are registered and throw an error if they're not.

It might be nice to try running some of the basic Kryo jobs with that 
setting enabled (I think it's called something like `registrationRequired`, if 
you want to grep for that).





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107598451
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107598988
  
  [Test build #33900 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33900/consoleFull)
 for   PR 6361 at commit 
[`08c05d1`](https://github.com/apache/spark/commit/08c05d19104f5dbc3200cf217d74deb27e74b525).





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107598549
  
Merged build started.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread coolfrood
Github user coolfrood commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31439444
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -126,6 +129,21 @@ private[spark] abstract class Task[T](val stageId: 
Int, var partitionId: Int) ex
   taskThread.interrupt()
 }
   }
+
+  /**
+   * Deserializes the task from the broadcast variable.
+   * If Kryo serialization is being sed, a copy of the buffer is made 
because Kryo deserialization
+   * is not thread-safe w.r.t. the deserialization buffer (see SPARK-7708)
+   */
+  protected[this] def deserialize[T: ClassTag](taskBinary: 
Broadcast[Array[Byte]]): T = {
+val ser = SparkEnv.get.closureSerializer.newInstance()
--- End diff --

How do you feel about implementing a pool in the `Task` object and 
accessing it from `Task`'s `run` method? This way, the serializer instance can 
be handed back into the pool after the task is done executing.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107584210
  
The test failures are all Hive-related, so it seems unlikely that this change 
caused them, but I'll run the tests again in my environment to see if they are related.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread coolfrood
Github user coolfrood commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31445014
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SerializableBuffer.scala ---
@@ -21,12 +21,16 @@ import java.io.{EOFException, IOException, 
ObjectInputStream, ObjectOutputStream
 import java.nio.ByteBuffer
 import java.nio.channels.Channels
 
+import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
+import com.esotericsoftware.kryo.io.{Input, Output}
+
 /**
  * A wrapper around a java.nio.ByteBuffer that is serializable through 
Java serialization, to make
  * it easier to pass ByteBuffers in case class messages.
  */
 private[spark]
-class SerializableBuffer(@transient var buffer: ByteBuffer) extends 
Serializable {
+class SerializableBuffer(@transient var buffer: ByteBuffer)
+  extends Serializable with KryoSerializable {
--- End diff --

I see.  So it is important to make sure that anything that Spark serializes 
with kryo internally is registered with it?





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107640735
  
  [Test build #33900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33900/consoleFull)
 for   PR 6361 at commit 
[`08c05d1`](https://github.com/apache/spark/commit/08c05d19104f5dbc3200cf217d74deb27e74b525).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SerializableBuffer(@transient var buffer: ByteBuffer)`






[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107640644
  
The Hive tests failed again; I guess this is due to the change of Kryo 
version. I'll dig into it.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107640778
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-06-01 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31449691
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SerializableBuffer.scala ---
@@ -21,12 +21,16 @@ import java.io.{EOFException, IOException, 
ObjectInputStream, ObjectOutputStream
 import java.nio.ByteBuffer
 import java.nio.channels.Channels
 
+import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
+import com.esotericsoftware.kryo.io.{Input, Output}
+
 /**
  * A wrapper around a java.nio.ByteBuffer that is serializable through 
Java serialization, to make
  * it easier to pass ByteBuffers in case class messages.
  */
 private[spark]
-class SerializableBuffer(@transient var buffer: ByteBuffer) extends 
Serializable {
+class SerializableBuffer(@transient var buffer: ByteBuffer)
+  extends Serializable with KryoSerializable {
--- End diff --

Yes, see e.g. the test for the registration of `HighlyCompressedMapStatus` in 
`KryoSerializerSuite`.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107315287
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107315344
  
Merged build started.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107315439
  
  [Test build #33877 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33877/consoleFull)
 for   PR 6361 at commit 
[`fbb31a5`](https://github.com/apache/spark/commit/fbb31a5d9212625a1f0f82cc595e8ec157a762ca).





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107315504
  
@JoshRosen I have rebased and updated the PR based on your suggestions and 
the discussion on SPARK-7708. I switched to Kryo version 2.24.0 by excluding it 
from `chill` and adding a direct dependency.

I also added a few test cases, including a suite for end-to-end jobs that 
use `KryoSerializer`. It is certainly not an exhaustive list of tests, and 
there may be things that still break, but I think this is a good start. On my 
end, I'll be running Kryo closure serialization for my use case and I'll be on 
the lookout for things that break.
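
For illustration, the dependency change described here would look roughly like this in sbt syntax (a sketch; the chill version shown is the 0.5.0 mentioned earlier in the thread, and the exact build wiring is an assumption):

```scala
libraryDependencies ++= Seq(
  // Pull in chill but exclude the Kryo it brings in transitively.
  "com.twitter" %% "chill" % "0.5.0" exclude("com.esotericsoftware.kryo", "kryo"),
  // Depend on Kryo 2.24.0 directly instead.
  "com.esotericsoftware.kryo" % "kryo" % "2.24.0"
)
```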





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107315613
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107315605
  
  [Test build #33877 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33877/consoleFull)
 for   PR 6361 at commit 
[`fbb31a5`](https://github.com/apache/spark/commit/fbb31a5d9212625a1f0f82cc595e8ec157a762ca).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31401661
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoClosureSerializerSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+import java.io.File
+
+import com.google.common.base.Charsets._
+import com.google.common.io.Files
+
+import org.scalatest.FunSuite
--- End diff --

You'll have to remove this unused import in order to pass the style checks.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31401724
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -126,6 +129,21 @@ private[spark] abstract class Task[T](val stageId: 
Int, var partitionId: Int) ex
   taskThread.interrupt()
 }
   }
+
+  /**
+   * Deserializes the task from the broadcast variable.
+   * If Kryo serialization is being sed, a copy of the buffer is made 
because Kryo deserialization
--- End diff --

sed -> used





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107317718
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread coolfrood
Github user coolfrood commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31401728
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoClosureSerializerSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+import java.io.File
+
+import com.google.common.base.Charsets._
+import com.google.common.io.Files
+
+import org.scalatest.FunSuite
--- End diff --

Ah yes, I just realized that. :-)





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107317726
  
Merged build started.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-107317880
  
  [Test build #33879 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33879/consoleFull)
 for   PR 6361 at commit 
[`08c05d1`](https://github.com/apache/spark/commit/08c05d19104f5dbc3200cf217d74deb27e74b525).





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31401808
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -126,6 +129,21 @@ private[spark] abstract class Task[T](val stageId: 
Int, var partitionId: Int) ex
   taskThread.interrupt()
 }
   }
+
+  /**
+   * Deserializes the task from the broadcast variable.
+   * If Kryo serialization is being used, a copy of the buffer is made because Kryo
+   * deserialization is not thread-safe w.r.t. the deserialization buffer (see SPARK-7708).
+   */
+  protected[this] def deserialize[T: ClassTag](taskBinary: Broadcast[Array[Byte]]): T = {
+    val ser = SparkEnv.get.closureSerializer.newInstance()
--- End diff --

It's actually pretty expensive to construct KryoSerializer instances, so 
this might become a significant bottleneck for short tasks (JavaSerializer 
instances are cheap, on the other hand).  We'll still need to ensure that the 
KryoSerializerInstance is only used in one thread at a time, though.  Maybe we 
can pool instances inside of `Executor` and loan them to tasks by passing them 
in via `runTask`.  Alternatively, we could expose the serializer instance via 
TaskContext.
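
For illustration, a minimal sketch of the pooling idea; the pool class and its borrow/release names are hypothetical, not existing Spark APIs:

    import java.util.concurrent.ConcurrentLinkedQueue

    import org.apache.spark.SparkEnv
    import org.apache.spark.serializer.SerializerInstance

    // Hypothetical pool owned by the executor: amortizes the cost of constructing
    // KryoSerializer instances while still handing each task its own instance,
    // so no instance is ever used by two threads at once.
    private class SerializerInstancePool {
      private val idle = new ConcurrentLinkedQueue[SerializerInstance]()

      // Reuse an idle instance if one exists; otherwise create a new one.
      def borrow(): SerializerInstance =
        Option(idle.poll()).getOrElse(SparkEnv.get.closureSerializer.newInstance())

      // Return the instance once the task is done with it.
      def release(instance: SerializerInstance): Unit = idle.offer(instance)
    }

In that sketch the executor would borrow() before running a task and release() afterwards, e.g. in a finally block.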





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31401833
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoClosureSerializerSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+import java.io.File
+
+import com.google.common.base.Charsets._
+import com.google.common.io.Files
+
+import org.scalatest.FunSuite
+
+import org.apache.spark.{SharedSparkContext, SparkFunSuite}
+import org.apache.spark.util.Utils
+
+
+class KryoClosureSerializerSuite extends SparkFunSuite with SharedSparkContext {
+  conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
--- End diff --

You can do `classOf[KryoSerializer]` instead.
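
i.e. something like the following (sketch; `.getName` because `SparkConf.set` takes string values):

    conf.set("spark.serializer", classOf[KryoSerializer].getName)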





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31401829
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SerializableBuffer.scala ---
@@ -21,12 +21,16 @@ import java.io.{EOFException, IOException, 
ObjectInputStream, ObjectOutputStream
 import java.nio.ByteBuffer
 import java.nio.channels.Channels
 
+import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
+import com.esotericsoftware.kryo.io.{Input, Output}
+
 /**
  * A wrapper around a java.nio.ByteBuffer that is serializable through 
Java serialization, to make
  * it easier to pass ByteBuffers in case class messages.
  */
 private[spark]
-class SerializableBuffer(@transient var buffer: ByteBuffer) extends Serializable {
+class SerializableBuffer(@transient var buffer: ByteBuffer)
+  extends Serializable with KryoSerializable {
--- End diff --

Do we need to register this with Kryo anywhere?
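
If explicit registration turns out to be needed, a sketch of what it could look like; the custom registrator shown here is illustrative only, and whether Spark already registers the class internally is exactly the open question:

    import com.esotericsoftware.kryo.Kryo

    import org.apache.spark.serializer.KryoRegistrator
    import org.apache.spark.util.SerializableBuffer

    // Illustrative registrator; SerializableBuffer is private[spark], so in practice
    // this registration would have to live inside Spark itself (e.g. where
    // KryoSerializer builds its Kryo instances).
    class SerializableBufferRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit = {
        kryo.register(classOf[SerializableBuffer])
      }
    }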





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-31 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r31401873
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala ---
@@ -328,6 +334,52 @@ class KryoSerializerSuite extends SparkFunSuite with 
SharedSparkContext {
 assert(!ser2.getAutoReset)
   }
 
+  test("output reuse with JavaSerializer") {
--- End diff --

Since this is a regression test for a pretty subtle bug, maybe we should 
add comments to provide more context for future readers.  Could you update this 
to link to the issue that I opened against Kryo 
(https://github.com/EsotericSoftware/kryo/issues/312)?
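
For instance, the test could carry that context directly (sketch):

    // Regression test for a subtle serializer buffer-reuse bug; for background see
    // https://github.com/EsotericSoftware/kryo/issues/312 and SPARK-7708.
    test("output reuse with JavaSerializer") {
      // unchanged test body
    }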





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r30929534
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala 
---
@@ -55,8 +56,17 @@ private[spark] class ResultTask[T, U](
 // Deserialize the RDD and the func using the broadcast variables.
 val deserializeStartTime = System.currentTimeMillis()
 val ser = SparkEnv.get.closureSerializer.newInstance()
+    // Kryo deserialization is not thread-safe w.r.t. underlying buffer
+    // create a copy of the buffer if Kryo is being used
+    val copy = if (SparkEnv.get.closureSerializer.isInstanceOf[KryoSerializer]) {
+      val arr = new Array[Byte](taskBinary.value.length)
--- End diff --

I think that using `Arrays.copyOf` here would be a bit more concise.
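
A sketch of the more concise form, assuming the copy is then wrapped for `ser.deserialize`:

    import java.nio.ByteBuffer
    import java.util.Arrays

    // One call replaces the manual array allocation plus element copy.
    val taskBytes = taskBinary.value
    val copy = ByteBuffer.wrap(Arrays.copyOf(taskBytes, taskBytes.length))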





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r30932134
  
--- Diff: 
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
@@ -188,14 +188,14 @@ private[spark] class KryoSerializerInstance(ks: 
KryoSerializer) extends Serializ
   }
 
   override def deserialize[T: ClassTag](bytes: ByteBuffer): T = {
-    input.setBuffer(bytes.array)
+    input.setBuffer(bytes.array, bytes.arrayOffset, bytes.limit + bytes.arrayOffset)
--- End diff --

Ah, nice catch.
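
For readers skimming later: the catch is that `array()` on a ByteBuffer returns the whole backing array, so a buffer that is only a view into a larger array must be read starting at `arrayOffset()`. A standalone sketch of the situation (not Spark code):

    import java.nio.ByteBuffer

    val backing = new Array[Byte](64)
    // A buffer that covers only bytes 16..47 of the backing array.
    val view = ByteBuffer.wrap(backing, 16, 32).slice()

    assert(view.array().length == 64)   // still the full backing array
    assert(view.arrayOffset() == 16)    // the payload starts here
    assert(view.limit() == 32)          // and spans this many bytes
    // Hence setBuffer must be told about arrayOffset rather than assuming offset 0.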





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r30931857
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala 
---
@@ -55,8 +56,17 @@ private[spark] class ResultTask[T, U](
 // Deserialize the RDD and the func using the broadcast variables.
 val deserializeStartTime = System.currentTimeMillis()
 val ser = SparkEnv.get.closureSerializer.newInstance()
+// Kryo deserialization is not thread-safe w.r.t. underlying buffer
--- End diff --

I think you had a nice mailing list reference for this.  Maybe put a 
comment with the link here?  I'm always a fan of overly-verbose comments.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r30931814
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala 
---
@@ -55,8 +56,17 @@ private[spark] class ResultTask[T, U](
 // Deserialize the RDD and the func using the broadcast variables.
 val deserializeStartTime = System.currentTimeMillis()
 val ser = SparkEnv.get.closureSerializer.newInstance()
+    // Kryo deserialization is not thread-safe w.r.t. underlying buffer
+    // create a copy of the buffer if Kryo is being used
+    val copy = if (SparkEnv.get.closureSerializer.isInstanceOf[KryoSerializer]) {
--- End diff --

Here, you might be able to check whether `ser` is an instance of 
`KryoSerializerInstance` instead.
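
i.e. roughly (sketch; `needsCopy` is an illustrative name):

    // Check the instance created just above, instead of reaching back into
    // SparkEnv for the serializer object itself.
    val needsCopy = ser.isInstanceOf[KryoSerializerInstance]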





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r30932486
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/SerializableBuffer.scala ---
@@ -20,13 +20,16 @@ package org.apache.spark.util
 import java.io.{EOFException, IOException, ObjectInputStream, 
ObjectOutputStream}
 import java.nio.ByteBuffer
 import java.nio.channels.Channels
+import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
--- End diff --

Super-minor nit, but there should be an extra newline before the `com.*` 
import group here.
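
i.e. the import block would read (sketch of the expected grouping):

    import java.io.{EOFException, IOException, ObjectInputStream, ObjectOutputStream}
    import java.nio.ByteBuffer
    import java.nio.channels.Channels

    import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
    import com.esotericsoftware.kryo.io.{Input, Output}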





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104784050
  
By the way, I'm very excited to play with Kryo closure serialization 
because I think that it could make a big difference for Spark SQL tasks which 
use code generation, since those tasks tend to contain some large Scala object 
graphs to carry around the information needed for the codegen.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread coolfrood
Github user coolfrood commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104786749
  
@JoshRosen, thanks for your comments.  I'll work on these.  Even without 
the tests, I have identified two places where Kryo closure serialization 
doesn't work right now:

1. `HadoopPartition` serialization when working with `sc.textFile`
2. Accumulator values don't get serialized correctly (or at all) on the 
executors.

There's clearly quite a bit more work required.  I think in most cases, 
it's due to custom serialization/deserialization code written for Java 
serialization, which doesn't get invoked when using Kryo.
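
As a sketch of the pattern being described (a hypothetical class, not the actual HadoopPartition or Accumulator code): a class whose wire format lives in Java-serialization hooks silently loses that logic under Kryo unless it also implements KryoSerializable or has a registered serializer.

    import java.io.{ObjectInputStream, ObjectOutputStream}

    import com.esotericsoftware.kryo.{Kryo, KryoSerializable}
    import com.esotericsoftware.kryo.io.{Input, Output}

    // Hypothetical example only.
    class LazySettings(@transient var settings: Map[String, String])
      extends Serializable with KryoSerializable {

      // Java serialization path: custom hooks carry the transient field.
      private def writeObject(out: ObjectOutputStream): Unit = {
        out.writeInt(settings.size)
        settings.foreach { case (k, v) => out.writeUTF(k); out.writeUTF(v) }
      }

      private def readObject(in: ObjectInputStream): Unit = {
        settings = (0 until in.readInt()).map(_ => in.readUTF() -> in.readUTF()).toMap
      }

      // Kryo never calls the two methods above; without this equivalent pair,
      // the transient `settings` field would simply come back null on the executor.
      override def write(kryo: Kryo, output: Output): Unit = {
        output.writeInt(settings.size)
        settings.foreach { case (k, v) => output.writeString(k); output.writeString(v) }
      }

      override def read(kryo: Kryo, input: Input): Unit = {
        settings = (0 until input.readInt())
          .map(_ => input.readString() -> input.readString()).toMap
      }
    }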





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104765939
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104765943
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33357/
Test PASSed.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104765897
  
  [Test build #33357 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33357/consoleFull)
 for   PR 6361 at commit 
[`4c39985`](https://github.com/apache/spark/commit/4c39985ad3cc49faea3d4384514d677c1cf82cd7).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r30932075
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -58,8 +59,18 @@ private[spark] class ShuffleMapTask(
 // Deserialize the RDD using the broadcast variable.
 val deserializeStartTime = System.currentTimeMillis()
 val ser = SparkEnv.get.closureSerializer.newInstance()
+    // Kryo deserialization is not thread-safe w.r.t. underlying buffer
+    // create a copy of the buffer if Kryo is being used
+    val copy = if (SparkEnv.get.closureSerializer.isInstanceOf[KryoSerializer]) {
--- End diff --

I suppose that there's some code duplication between ShuffleMapTask and 
ResultTask, but maybe that's not easily avoidable.  We might be able to factor 
the deserialization code into a method in `Task` and have it be parametrized by 
the type passed to `ser.deserialize`. Maybe you could try doing this to see if 
it's possible, since it would be way cleaner?
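
A sketch of that refactoring, essentially the shape the `Task.deserialize` helper quoted earlier in this archive ends up taking; the object name here is illustrative:

    import java.nio.ByteBuffer
    import java.util.Arrays

    import scala.reflect.ClassTag

    import org.apache.spark.SparkEnv
    import org.apache.spark.broadcast.Broadcast
    import org.apache.spark.serializer.KryoSerializerInstance

    // Shared helper that ResultTask and ShuffleMapTask could both call, parametrized
    // by the type handed to ser.deserialize. It would need to live inside Spark
    // (KryoSerializerInstance is private[spark]).
    private[spark] object TaskBinaryDeserializer {
      def deserialize[T: ClassTag](taskBinary: Broadcast[Array[Byte]]): T = {
        val ser = SparkEnv.get.closureSerializer.newInstance()
        val bytes =
          if (ser.isInstanceOf[KryoSerializerInstance]) {
            // Kryo deserialization is not thread-safe w.r.t. its buffer (SPARK-7708),
            // so give Kryo its own copy of the broadcast bytes.
            Arrays.copyOf(taskBinary.value, taskBinary.value.length)
          } else {
            taskBinary.value
          }
        ser.deserialize[T](ByteBuffer.wrap(bytes), Thread.currentThread.getContextClassLoader)
      }
    }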





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104782414
  
One high-level comment: is there any way to add tests for this?  Since none 
of the old tests failed before this patch, I'm concerned that we won't be able 
to keep this from breaking in the future.  Could you maybe add a new regression 
test or suite which tries to run an end-to-end job that uses Kryo as the 
closure serializer and verify that this test failed with the old code?
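
A sketch of the kind of end-to-end check being asked for; the suite and test names are illustrative, and this assumes `spark.closure.serializer` is still the config key that selects the closure serializer:

    import org.apache.spark.{SparkConf, SparkContext, SparkFunSuite}
    import org.apache.spark.serializer.KryoSerializer

    class KryoClosureRegressionSuite extends SparkFunSuite {
      test("end-to-end job with Kryo closure serialization") {
        val conf = new SparkConf()
          .setMaster("local[2]")
          .setAppName("kryo-closure-regression")
          .set("spark.serializer", classOf[KryoSerializer].getName)
          .set("spark.closure.serializer", classOf[KryoSerializer].getName)
        val sc = new SparkContext(conf)
        try {
          val offset = 10  // captured by the map closure below
          val result = sc.parallelize(1 to 100, 4).map(_ + offset).reduce(_ + _)
          assert(result === (1 to 100).sum + 100 * offset)
        } finally {
          sc.stop()
        }
      }
    }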





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/6361#discussion_r30931924
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala 
---
@@ -55,8 +56,17 @@ private[spark] class ResultTask[T, U](
 // Deserialize the RDD and the func using the broadcast variables.
 val deserializeStartTime = System.currentTimeMillis()
 val ser = SparkEnv.get.closureSerializer.newInstance()
+// Kryo deserialization is not thread-safe w.r.t. underlying buffer
--- End diff --

Also would be fine to put a JIRA reference, e.g. (see SPARK-7708)





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread coolfrood
GitHub user coolfrood opened a pull request:

https://github.com/apache/spark/pull/6361

[SPARK-7708] [Core] [WIP] Fixes for Kryo closure serialization

This PR partially fixes the use of Kryo serialization for closures. It is 
not complete, but I would like to discuss if this is the right approach.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/coolfrood/spark 
topic/kryo-closure-serialization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6361.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6361


commit 4c39985ad3cc49faea3d4384514d677c1cf82cd7
Author: Akshat Aranya aara...@quantcast.com
Date:   2015-05-22T18:35:50Z

Fixes for Kryo closure serialization







[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104743543
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104745496
  
ok to test





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104745906
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104746051
  
  [Test build #33357 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33357/consoleFull)
 for   PR 6361 at commit 
[`4c39985`](https://github.com/apache/spark/commit/4c39985ad3cc49faea3d4384514d677c1cf82cd7).





[GitHub] spark pull request: [SPARK-7708] [Core] [WIP] Fixes for Kryo closu...

2015-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6361#issuecomment-104745935
  
Merged build started.

