I hadn't turned on codegen. I enabled it and ran it again, and it is running
4-5 times faster now! :)
Since my log statements are no longer appearing, I presume the code path is
quite different from the earlier hashmap-related stuff in
Aggregates.scala?
Pramod
On Wed, May 20, 2015 at 9:18 PM,
Yup it is a different path. It runs GeneratedAggregate.
On Wed, May 20, 2015 at 11:43 PM, Pramod Biligiri pramodbilig...@gmail.com
wrote:
I hadn't turned on codegen. I enabled it and ran it again, and it is running
4-5 times faster now! :)
Since my log statements are no longer appearing, I
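For reference, a minimal sketch of switching codegen on in Spark 1.x, where
spark.sql.codegen is off by default (the table name "logs" is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("codegen-test"))
    val sqlContext = new SQLContext(sc)

    // With the flag on, aggregations run through GeneratedAggregate
    // instead of the hashmap-based path discussed above.
    sqlContext.setConf("spark.sql.codegen", "true")
    sqlContext.sql("SELECT ip, count(*) AS c FROM logs GROUP BY ip").collect()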
Honestly, given the length of my email, I didn't expect a reply. :-) Thanks
for reading and replying. However, I have a follow-up question:
I don't think I understand block replication completely. Are the
blocks replicated immediately after they are received by the receiver? Or
are they
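As I understand it, the receiving side controls this through the storage
level handed to the receiver; a sketch, assuming an existing SparkContext
named sc and a placeholder host/port:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(1))

    // The "_2" suffix asks the BlockManager for two replicas, so
    // replication is initiated as each received block is stored.
    val lines = ssc.socketTextStream("host", 9999, StorageLevel.MEMORY_AND_DISK_SER_2)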
Thanks Akhil, Ryan!
@Akhil: YARN can only tell me how many vcores my app has been granted, but
not actual CPU usage, right? Pulling mem/cpu usage from the OS means I need
to map JVM executor processes to the context they belong to, right?
@Ryan: what a great blog post -- this is super relevant
Looks like the FSInputDStream of Tachyon's FileSystem interface is somehow
reporting a file size of zero.
On Mon, May 11, 2015 at 4:38 AM, Dibyendu Bhattacharya
dibyendu.bhattach...@gmail.com wrote:
Just to follow up on this thread further:
I was doing some fault-tolerance testing
Some examples to illustrate my point. A couple of issues from the oldest
open issues in the SQL component:
[SQL] spark-sql exits while encountered an error
https://issues.apache.org/jira/browse/SPARK-4572
This is an incomplete report that nobody can take action on. It can be
resolved as
We have not started to prototype the vectorized one yet; it will be
evaluated in 1.5 and may be targeted for 1.6.
We'd be glad to hear feedback/suggestions/comments from your side!
On Thu, May 21, 2015 at 9:37 AM, Yijie Shen henry.yijies...@gmail.com wrote:
Hi all,
I’ve seen the blog post on Project
On Thu, May 21, 2015 at 9:06 PM, Santiago Mola sm...@stratio.com wrote:
Inactive - A feature or bug that has had no activity from users or
developers in a long time
Why is this needed? Every JIRA listing can be sorted by activity. That gets
the inactive ones out of your view quickly. I do not
On Thu, May 21, 2015 at 10:03 PM, Santiago Mola sm...@stratio.com wrote:
Sure. That is why I was talking about the Inactive resolution specifically.
The combination of Priority + other statuses is enough to solve these
issues. A minor/trivial issue that is incomplete is probably not going to
Thank you Ram and Joseph.
I am also hoping to contribute to MLlib once my Scala gets up to snuff;
this is the guidance I needed for how to proceed when ready.
Best wishes,
Trevor
On Wed, May 20, 2015 at 1:55 PM, Joseph Bradley jos...@databricks.com
wrote:
Hi Trevor,
I may be repeating what
Yes Peter, that's correct: you need to identify the processes, and with
that you can pull the actual usage metrics.
Thanks
Best Regards
On Thu, May 21, 2015 at 2:52 PM, Peter Prettenhofer
peter.prettenho...@gmail.com wrote:
Thanks Akhil, Ryan!
@Akhil: YARN can only tell me how many vcores my
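A rough sketch of that process-mapping step (Linux only; assumes ps is on
the PATH and that the executor command line carries the application id,
which varies by Spark version and deploy mode):

    import scala.sys.process._

    // Find executor JVM pids for one application by matching the backend
    // main class and the app id in the process table.
    def executorPids(appId: String): Seq[Int] =
      Seq("ps", "-eo", "pid,args").!!
        .split("\n")
        .filter(l => l.contains("CoarseGrainedExecutorBackend") && l.contains(appId))
        .map(_.trim.split("\\s+").head.toInt)
        .toSeq

With the pids in hand, /proc/<pid>/stat or a tool like pidstat gives
per-process CPU and memory.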
Hi Pramod,
Is your data compressed? I encountered a similar problem; however, after
turning codegen on, the GC time was still very long. The input to my map
task is about a 100M lzo file.
My query is: select ip, count(*) as c from stage_bitauto_adclick_d group by
ip sort by c limit 100
On Thu, May 21, 2015 at 5:22 AM Peter Prettenhofer
peter.prettenho...@gmail.com wrote:
Thanks Akhil, Ryan!
@Akhil: YARN can only tell me how many vcores my app has been granted, but
not actual CPU usage, right? Pulling mem/cpu usage from the OS means I need
to map JVM executor processes to
I’m trying to understand why sbt is configured to pull all libs under
lib_managed.
- it seems like unnecessary duplication (I will have those libraries
under ~/.m2, via Maven, anyway)
- every time I call make-distribution I lose lib_managed (via mvn clean
install) and have to wait to
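If the goal is just to stop the copying, the sbt setting that populates
lib_managed is retrieveManaged; a sketch of the override, to be verified
against project/SparkBuild.scala in your checkout:

    // sbt only mirrors managed dependencies into lib_managed when
    // retrieveManaged is true; when false they stay in the Ivy cache.
    retrieveManaged := false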
This is an excellent discussion. As mentioned in an earlier
email, we agree with a number of Chester's suggestions, but we
have yet other concerns. I've researched this further in the past
several days, and I've queried my team. This email attempts to
Hi Kevin,
I read through your e-mail and I see two main things you're talking about.
- You want a public YARN Client class and don't really care about
anything else.
In your message you already mention why that's not a good idea. It's much
better to have a standardized submission API. As you
Hi Zhang,
No, my data is not compressed. I'm trying to minimize the load on the CPU.
GC time went down for me after enabling codegen.
Pramod
On Thu, May 21, 2015 at 3:43 AM, zhangxiongfei zhangxiongfei0...@163.com
wrote:
Hi Pramod,
Is your data compressed? I encountered a similar problem; however,
In researching and discussing these issues with Cloudera and others, we've
been told that only one mechanism is supported for starting Spark jobs: the
*spark-submit* scripts.
Is this new? We've been submitting jobs directly from a programmatically
created Spark context (instead of through
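For concreteness, the kind of programmatic launch being described is
roughly this (master URL and app name are placeholders; as noted further
down, this does not cover cluster deploy mode):

    import org.apache.spark.{SparkConf, SparkContext}

    // Driver embedded in our own JVM instead of going through spark-submit.
    val conf = new SparkConf()
      .setMaster("yarn-client")
      .setAppName("embedded-driver")
    val sc = new SparkContext(conf)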
see discussions about Spark not really liking multiple contexts in the
same JVM
Speaking of this - is there a standard way of writing unit tests that
require a SparkContext?
We've ended up copying out the code of SharedSparkContext to our own
testing hierarchy, but it occurs to me someone
Hi Nathan,
On Thu, May 21, 2015 at 7:30 PM, Nathan Kronenfeld
nkronenfeld@uncharted.software wrote:
In researching and discussing these issues with Cloudera and others,
we've been told that only one mechanism is supported for starting Spark
jobs: the *spark-submit* scripts.
Is this new?
Thanks, Marcelo
Instantiating SparkContext directly works. Well, sorta: it has
limitations. For example, see discussions about Spark not really liking
multiple contexts in the same JVM. It also does not work in cluster
deploy mode.
That's fine - when one is doing something out of
we also launch jobs programmatically, both in standalone mode and
yarn-client mode. in standalone mode it always worked; in yarn-client mode
we ran into some issues and were forced to use spark-submit, but i still
have it on my todo list to move back to a normal java launch without
spark-submit at
Hi Tathagata,
Thanks for looking into this. Investigating further, I found that the issue
is that Tachyon does not support file append. When the streaming receiver
that writes to the WAL fails and is restarted, it is not able to append to
the same WAL file after the restart.
I raised this with the Tachyon user
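For anyone reproducing this, a sketch of the setup that exercises the WAL
path (the Tachyon URI is a placeholder; the WAL flag is off by default in
Spark 1.x):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("wal-test")
      // Turns on the receiver write-ahead log (off by default).
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(1))

    // WAL segments live under the checkpoint directory, so pointing it at
    // Tachyon is what triggers the append-after-restart failure.
    ssc.checkpoint("tachyon://host:19998/checkpoint")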
It is just 15 lines of code to copy, isn't it?
On Thu, May 21, 2015 at 7:46 PM, Nathan Kronenfeld
nkronenfeld@uncharted.software wrote:
see discussions about Spark not really liking multiple contexts in the
same JVM
Speaking of this - is there a standard way of writing unit tests that
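Along those lines, a sketch in the spirit of SharedSparkContext (not the
actual Spark test source; assumes ScalaTest):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, Suite}

    // One local context shared by every test in the suite, stopped at the end.
    trait SharedLocalSparkContext extends BeforeAndAfterAll { self: Suite =>
      @transient var sc: SparkContext = _

      override def beforeAll(): Unit = {
        super.beforeAll()
        sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName(suiteName))
      }

      override def afterAll(): Unit = {
        if (sc != null) sc.stop()
        super.afterAll()
      }
    }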
Hi all,
I’ve seen the blog post on Project Tungsten here, and it sounds awesome to me!
I’ve also noticed there is a plan to change the code generation from
record-at-a-time evaluation to a vectorized one, which interests me most.
What’s the status of vectorized evaluation? Is this an internal effort of
Hi,
Is there some way to customize the Akka configuration for Spark?
Specifically, I want to experiment with custom serialization for messages
that are sent between the driver and executors in standalone mode.
Thanks,
Akshat
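One avenue worth trying, as a sketch only (verify against AkkaUtils.scala
for your Spark version): SparkConf keys starting with "akka." appear to be
merged into the ActorSystem config in Spark 1.x, so a custom serializer
binding might be injected like this (the serializer and message classes are
placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("akka-conf-test")
      // Register a serializer and bind it to a message type (placeholders).
      .set("akka.actor.serializers.custom", "com.example.MySerializer")
      .set("akka.actor.serialization-bindings.\"com.example.MyMessage\"", "custom")
    val sc = new SparkContext(conf)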