Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-101934663
Okay - I officially give in and will not comment further on this thread :P
On Tue, May 12, 2015 at 10:12 PM, Sandy Ryza
wrote:
> Is that ok with yo
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-101515251
Is that ok with you as well @pwendell ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project doe
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-100681944
I'm fine putting them in the same class too, provided that everything else
is marked as private (I think it is already, isn't it?)
---
If your project is set up for it, yo
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-100676458
@pwendell @rxin if we're putting it in the same package, I don't really
understand why we don't just have a single class `SizeEstimator` with a single
public method `estim
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99345602
@sryza do you mind doing the honors as an part 2 for this issue?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99345510
I'm okay to nest it under util, per Reynold's suggesting.
On Wed, May 6, 2015 at 7:33 AM, Reynold Xin
wrote:
> But we already have a bunch of stuff
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99342566
But we already have a bunch of stuff in util that's public ...
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.package
---
If your proje
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99342422
I think @pwendell 's sentiment was that `util` shouldn't have public
classes/objects -- what do you think Patrick? as an interim measure?
---
If your project is set up fo
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99341873
I think it's a good idea to do that. For now why don't we move
SizeEstimator back into util, rename the current util.SizeEstimator
util.SizeEstimator0?
For 1.5 we c
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99337748
@rxin yeah I hear you. It's not too late to move this. I struggle a bit
with where to put this then. So you mean move `util` to `util.internal` or
something, and move publ
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99228255
Actually I strongly oppose putting this in the top level package. We will
end up with a lot of random util objects or top level classes at the top.
One way to do thi
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/3913
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enab
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99048661
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99048583
[Test build #31864 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31864/consoleFull)
for PR 3913 at commit
[`8d9e082`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99048668
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/318
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99015052
[Test build #31864 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31864/consoleFull)
for PR 3913 at commit
[`8d9e082`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99014822
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99014803
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not ha
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99014396
LGTM. I think the test failure was unrelated; let's see.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-99014453
Jenkins retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have t
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98903952
[Test build #31806 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31806/consoleFull)
for PR 3913 at commit
[`8d9e082`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98903962
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/318
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98903961
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98903514
Updated with the suggested approach. Let me know what you think.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98894483
[Test build #31806 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31806/consoleFull)
for PR 3913 at commit
[`8d9e082`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98894338
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98894315
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not ha
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98409232
@sryza do you want to try to update this and get it in for 1.4? might still
technically be time
---
If your project is set up for it, you can reply to this email and have
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98071817
I'm good with the new idea you proposed. We keep `util.SizeEstimator` as
is, but we have a public class in the root namespace `SizeEstimator` that just
has a single meth
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98068701
Hm, my gut is that it's stranger to put this in the context when it has
nothing to do with the context. It's all private except for the one method.
Lots of accidents can h
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98039213
Hey @sryza and @srowen, sorry to vacillate here, but after looking at this
more I really do think it would be better to just make this a static method on
`SparkContext`.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98015549
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98015554
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/314
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98015505
[Test build #31476 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31476/consoleFull)
for PR 3913 at commit
[`35c974c`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98000635
[Test build #31476 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31476/consoleFull)
for PR 3913 at commit
[`35c974c`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98000517
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-98000468
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not ha
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-97679846
Pending an update LGTM
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-97676897
Looking good though this needs a rebase now.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your projec
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-97618302
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/313
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-97618293
[Test build #31334 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31334/consoleFull)
for PR 3913 at commit
[`8adde39`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-97618300
Build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does no
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-97594129
[Test build #31334 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31334/consoleFull)
for PR 3913 at commit
[`8adde39`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-97593664
Build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-97593702
Build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this fe
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-94886124
Sure, sounds fine to me. We can have a static method for it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as we
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93816860
OK how about just moving it up a level out of `util`, if that's generally
not public, and making `SizeEstimator` public but a developer API? You have to
commit to the meth
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93806915
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93806900
[Test build #30423 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30423/consoleFull)
for PR 3913 at commit
[`4aba898`](https://gith
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93786580
The reason I proposed to put in SparkContext is to avoid committing to the
current namespace/package of that object and just expose a narrower utility
function off of Sp
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93785353
@pwendell thought this would be preferable:
> One thing we could do that would be more isolated is have a function in
SparkContext called estimateSizeOf(object: Any
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93783833
OK, I think my remaining comment is just: why route through `SparkContext`?
can this method be exposed where it is? It doesn't seem like it has to be tied
to the context.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93782611
[Test build #30423 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30423/consoleFull)
for PR 3913 at commit
[`4aba898`](https://githu
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93487382
Again, one of the main uses is estimating the size of variables you're
considering broadcasting. Another is experimenting with different
representations - e.g. how much mo
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/3913#discussion_r28439199
--- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala ---
@@ -28,6 +28,7 @@ import java.util.concurrent.ConcurrentHashMap
import scala.co
Github user squito commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93486450
I think this is a good change. Yes, you could cache an RDD and see its
size, but think about what a pain that actually is if you wanted to do it
programmatically. You'd
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-93384261
I'd like to resolve this one way or the other. My hesitation is mostly
about tacking on another method to `SparkContext`, developer API or no. Would
it really be a better
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-73991556
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-73991553
[Test build #27305 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27305/consoleFull)
for PR 3913 at commit
[`62bac82`](https://gith
Github user shivaram commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-73981654
The size estimator should be pretty accurate for measuring the size of
small Java objects. You could add a note that we do some approximation for
large arrays (we sample
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-73980273
[Test build #27305 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27305/consoleFull)
for PR 3913 at commit
[`62bac82`](https://githu
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-73963012
Adding an `estimateSizeOf` method to `SparkContext` sounds reasonable to me.
I agree that there's not a great way to expose something like this for
Python. But I do
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-73857224
Should this just turn into a doc update PR then?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your pr
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-71553196
Just not sure how overall useful it would be. For RDD data, it might be
slightly misleading here because of things like serialization in-memory. For
broadcast objects in
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-68936032
Ooh OK I'll update the doc. That's still a little cumbersome though for
someone who just wants to see how much space an object takes up. Most of the
recommendations on th
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-68933612
[Test build #25109 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25109/consoleFull)
for PR 3913 at commit
[`29fa503`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-68933621
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-68933389
That doc is very outdated - you can actually just look in the UI after
caching some data, you don't need to visit the logs.
---
If your project is set up for it, you ca
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3913#issuecomment-68923630
[Test build #25109 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25109/consoleFull)
for PR 3913 at commit
[`29fa503`](https://githu
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/3913
SPARK-5112. Expose SizeEstimator as a developer api
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-5112
Alternatively yo
70 matches
Mail list logo