Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@HyukjinKwon makes sense - I'll try to get a design doc put together as
soon as I can, but my "day job" is preventing me from working on it right now.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21322
Shall we leave this PR closed and start it from a design doc? Let me
suggest to close this for now while I am looking through old PRs.
@JeetKunDoug, please feel free to create a clone
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21322
I'd like to recommend
http://apache-spark-developers-list.1001551.n3.nabble.com/SPIP-SPARK-25728-Structured-Intermediate-Representation-Tungsten-IR-for-generating-Java-code-td25370.html
---
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
Sounds good - it may take me a while to get the time to work this up, but
I'll get something put together and attached to the underlying issue as soon as
I can. If possible, can you point me at
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21322
I think a design doc is better, to make sure we are on the same page before
the actual coding, which saves time.
---
-
To
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan yes, essentially... the only thing I'd add is that it's bound to
"whatever scope the driver sees fit to bind it to", if that makes sense. In my
particular case, it happens to be a
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21322
Ah i see. It's kind of a broadcast but has a much small scope and its
lifecycle is bound to a stage. I'm looking forward to a full design of it,
thanks!
---
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan The use-case we have in mind (and are currently using Broadcast
+ finalizers for) is the case where you have, for example, a connection pool
that is, for one reason or another,
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21322
Introducing a new concept which is similar to broadcast seems like an
overkill. We can just update broadcast, to allow it to be memory-only.
However, there might be simpler solutions to
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan (Also left on the JIRA ticket) - Sorry this has dropped off my
radar for so long - work + life took me away from it for a while. So looking at
the PR review comments and better
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Can one of the admins verify this patch?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Can one of the admins verify this patch?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan So I see what's been messing with me - it's the whole Broadcast
variable cache/week reference thing. I originally wrote this internally for
Spark 1.6.3 (still supporting some older
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21322
IIUC it's impossible to clean up broadcasted object in `MemoryStore`. The
life cycle of the broadcasted object is:
1. The first task that tries to get the broadcasted object will read bytes
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90824/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90824 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90824/testReport)**
for PR 21322 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90824 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90824/testReport)**
for PR 21322 at commit
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/21322
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90796/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90796 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90796/testReport)**
for PR 21322 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90796 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90796/testReport)**
for PR 21322 at commit
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan as to eviction, it ends up calling
`BlockManager.dropFromMemory`, which will in fact call `MemoryStore.remove` and
do the cleanup appropriately, so that should already be handled
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21322
according to `TorrentBroadcast.writeBlocks` and
`TorrentBroadcast.readBroadcastBlock`, broadcasted data are always written into
the block manager with `MEMORY_AND_DISK_SER` mode. When a task
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90652/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90652 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90652/testReport)**
for PR 21322 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90651/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90651 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90651/testReport)**
for PR 21322 at commit
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan I added the above-mentioned check for `isBroadcast` and only
release resources if it's a broadcast ID. This will affect the MapOutputTracker
as well, but I think it's a pretty
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90652 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90652/testReport)**
for PR 21322 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90651 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90651/testReport)**
for PR 21322 at commit
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan So your suggestion makes sense - it seems like the best path
forward is to check the `isBroadcast` flag on the BlockId passed in to
`MemoryStore.remove` and release resources only if
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/21322
@JeetKunDoug The same issue we discuss above. I think if there's a
deserialized version of the variable, it can be not in `MemoryStore` but only
serialized bytes in disk store.
The reason
Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@viirya it seems from my admittedly cursory look at where we use the
`cachedValues` reference map that we should be OK in this case - if there's a
deserialized version of the variable (the only
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/21322
One concern I have is, now we keep broadcasted variables in
`BroadcastManager.cachedValues` by using weak reference. So if a broadcasted
variable with `AutoCloseable` is released before we call
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21322
I think users are responsible to call `Broadcast#destroy`, which unpersist
broadcast blocks from block manager and run user-defined driver side cleanup.
It is a valid use case to allow
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90599/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21322
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21322
**[Test build #90599 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90599/testReport)**
for PR 21322 at commit
42 matches
Mail list logo