Github user ilganeli closed the pull request at:
https://github.com/apache/spark/pull/3518
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is en
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-71299062
Hey @pwendell - not a problem. The solutions are similar but Reynold's has
fewer moving parts. I appreciate the recognition.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70399407
BTW - my apologies for marking this as a starter task, it turned out to be
more complicated. We can credit you for having worked on the feature as well.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70399358
Hey, so it looks like while I was reviewing this patch, @rxin actually ran
into this and just wrote a fix himself (#4093). That fix is actually even
simpler than what I wa
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70394250
Hey @ilganeli - I took a slightly deeper look this time. I still don't
totally follow how this all hooks together, but I wonder if it's possible to
write a single utilit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70198318
[Test build #25624 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25624/consoleFull)
for PR 3518 at commit
[`5b93dc1`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70198323
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70192878
[Test build #25624 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25624/consoleFull)
for PR 3518 at commit
[`5b93dc1`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70188734
[Test build #25620 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25620/consoleFull)
for PR 3518 at commit
[`1d2d563`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70188737
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70188638
[Test build #25620 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25620/consoleFull)
for PR 3518 at commit
[`1d2d563`](https://githu
Github user ilganeli commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r23052373
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -789,6 +792,44 @@ class DAGScheduler(
}
}
+ /**
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70180151
Hi Patrick - thanks for the feedback. I would love to print out the names
of the fields but I wasn't able to figure out a way to do that - could you
suggest how?
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-70179454
Hey, just took a quick pass with some code style suggestions (more coming)
and usability suggestions. One thing, would it be possible to track the name of
the fields you
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r23051282
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -459,7 +459,23 @@ private[spark] class TaskSetManager(
}
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r23051221
--- Diff: core/src/main/scala/org/apache/spark/util/ObjectWalker.scala ---
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r23051146
--- Diff: core/src/main/scala/org/apache/spark/util/ObjectWalker.scala ---
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r23051021
--- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala ---
@@ -0,0 +1,308 @@
+/*
+ * Licensed to the Apache Software Foundatio
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r23050901
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -827,9 +868,21 @@ class DAGScheduler(
// might modify state of ob
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r23050776
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -789,6 +792,44 @@ class DAGScheduler(
}
}
+ /**
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-69604747
[Test build #25420 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25420/consoleFull)
for PR 3518 at commit
[`a32f0ac`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-69604755
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-69591920
Hi @JoshRosen, #3638 has now been merged and I've resolved the minor merge
conflicts and pushed the updates. If you could please review this at your
convenience, I'd lov
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-69591770
[Test build #25420 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25420/consoleFull)
for PR 3518 at commit
[`a32f0ac`](https://githu
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-68312213
Hi @JoshRosen - just checking in to make sure things are moving on #3638
since it's a blocker to this patch. Please let me know how that's going, looks
to be almost comp
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67560952
[Test build #24598 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24598/consoleFull)
for PR 3518 at commit
[`8e5f710`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67560956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67559909
[Test build #24597 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24597/consoleFull)
for PR 3518 at commit
[`07142ce`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67559915
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67549171
[Test build #24598 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24598/consoleFull)
for PR 3518 at commit
[`8e5f710`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67548390
[Test build #24597 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24597/consoleFull)
for PR 3518 at commit
[`07142ce`](https://githu
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67523669
Great - thanks, Josh. I'm working on doing a bit more code cleanup in the
meantime to minimize touch points within the existing Spark classes.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67520645
This looks pretty neat; I'll try to review this soon (a little busy right
now), but in the meantime you might be interested in #3638 which has some small
overlap in the
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67464113
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67464102
[Test build #24577 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24577/consoleFull)
for PR 3518 at commit
[`bb5f700`](https://gith
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67455654
Hi @JoshRosen, I have made the updates we've discussed. Example output is
shown below for two cases of unserializable RDDs. The first is when an
individual RDD is unseri
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67455620
[Test build #24577 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24577/consoleFull)
for PR 3518 at commit
[`bb5f700`](https://githu
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-67070557
Hi @JoshRosen, I think I've finally understood what you've been saying
from the beginning (apologies for being slow). I haven't been thinking
correctly about what is go
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21786759
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -458,6 +458,17 @@ private[spark] class TaskSetManager(
v
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21786754
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -873,6 +943,10 @@ class DAGScheduler(
// We've already seriali
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21786752
--- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundati
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21786744
--- Diff: docs/configuration.md ---
@@ -517,6 +517,14 @@ Apart from these, the following properties are also available, and may be useful
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-66868725
IMO, this patch still needs a lot of work before it will be ready to merge.
I'm not convinced that telling me which RDD referenced the unserializable
object, by itself
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-66658459
Hi @JoshRosen - with the updates I've made, is this OK to merge?
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-66158890
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-66158875
[Test build #24224 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24224/consoleFull)
for PR 3518 at commit
[`ef3dd39`](https://gith
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-66145560
Jenkins, this is ok to test.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-66145968
[Test build #24224 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24224/consoleFull)
for PR 3518 at commit
[`ef3dd39`](https://githu
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-66144913
Hi @JoshRosen - can I please get this run through Jenkins? Thanks!
GitHub user ilganeli reopened a pull request:
https://github.com/apache/spark/pull/3518
[SPARK-3694] RDD and Task serialization debugging output
Hi all - in addition to what was explicitly requested in the original JIRA,
I also added the ability to have a trace of the serialization
Github user ilganeli closed the pull request at:
https://github.com/apache/spark/pull/3518
Github user ilganeli commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21102635
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -458,6 +458,20 @@ private[spark] class TaskSetManager(
va
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-65008009
If the goal is to trace the object graph while it's being serialized, would
that require modification of the serializer itself? How else would we get into
the guts of wha
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-65001985
> Thus, the serialization trace tells us which specific RDD is
unserializable and this secondary output shows how it's related to the parent
RDD.
My concern wa
Github user ilganeli commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-65001161
Thanks for the quick review, Josh. I'll look into refactoring the search
step into a separate component. With regards to your first comment, I display
two types of debugg
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-64996599
Formatting nits aside, a couple of higher-level comments:
If I understand this patch correctly, it only tells me which RDD contains
the non-serializable object
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062580
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -244,6 +244,133 @@ class DAGSchedulerSuite extends TestKit(ActorSys
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062539
--- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundati
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062535
--- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundati
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062530
--- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundati
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062522
--- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundati
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062519
--- Diff: core/src/main/scala/org/apache/spark/util/SerializationHelper.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundati
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062491
--- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062475
--- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062458
--- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062466
--- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062448
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -458,6 +458,20 @@ private[spark] class TaskSetManager(
v
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062454
--- Diff: core/src/main/scala/org/apache/spark/util/RDDWalker.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062433
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -458,6 +458,20 @@ private[spark] class TaskSetManager(
v
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062428
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -826,9 +888,23 @@ class DAGScheduler(
// might modify state of o
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062420
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -826,9 +888,23 @@ class DAGScheduler(
// might modify state of o
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062416
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -826,9 +888,23 @@ class DAGScheduler(
// might modify state of o
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062406
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -788,6 +793,63 @@ class DAGScheduler(
}
}
+ /*
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062401
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -788,6 +793,63 @@ class DAGScheduler(
}
}
+ /*
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062393
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -788,6 +793,63 @@ class DAGScheduler(
}
}
+ /*
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062385
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -788,6 +793,63 @@ class DAGScheduler(
}
}
+ /*
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062377
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -788,6 +793,63 @@ class DAGScheduler(
}
}
+ /*
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062380
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -788,6 +793,63 @@ class DAGScheduler(
}
}
+ /*
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3518#discussion_r21062370
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -788,6 +793,63 @@ class DAGScheduler(
}
}
+ /*
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-64995074
I'm going to let Jenkins test this, but my hunch is that the first run is
going to fail due to Scalastyle warnings / errors. I'll comment on a couple of
these style po
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3518#issuecomment-64994713
Can one of the admins verify this patch?
GitHub user ilganeli opened a pull request:
https://github.com/apache/spark/pull/3518
[SPARK-3694] RDD and Task serialization debugging output
Hi all - in addition to what was explicitly requested in the original JIRA,
I also added the ability to have a trace of the serialization fo