Hi Patrick,
We left the details of the Spark configuration we used out of the
blog post for brevity, but we're happy to share them. We've done quite a
bit of tuning to find the settings that gave us the best query times and
let us run the most queries. I think there might still be
Hello all – I am working on https://issues.apache.org/jira/browse/SPARK-3694
and would like to understand the appropriate mechanism by which to check for a
debug flag before printing a graph traversal of dependencies of an RDD or Task.
I understand that I can use the logging utility and use
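A minimal, hypothetical sketch of the guard pattern being asked about: gate the expensive dependency-graph dump behind the logger's debug level, so the traversal is only paid for when debug logging is on. In Spark itself this would be `log.isDebugEnabled()` around something like `rdd.toDebugString()`; the names below are stand-ins, not real Spark API.

```java
import java.util.function.Supplier;

// Stand-in logger demonstrating the debug-flag guard. The Supplier defers
// building the (expensive) message until we know debug is actually enabled.
public class DebugDumpSketch {
    static boolean debugEnabled = false;   // stand-in for log.isDebugEnabled()

    static void logDebug(Supplier<String> msg) {
        if (debugEnabled) System.out.println(msg.get());  // msg built only here
    }

    static String dependencyDump() {       // imagine an RDD lineage dump here
        return "ShuffledRDD[2]\n +-(MapPartitionsRDD[1])\n    +-(ParallelCollectionRDD[0])";
    }

    public static void main(String[] args) {
        logDebug(DebugDumpSketch::dependencyDump);  // skipped: debug is off
        debugEnabled = true;
        logDebug(DebugDumpSketch::dependencyDump);  // printed: debug is on
    }
}
```

The by-need `Supplier` matters: an eager `logDebug(dependencyDump())` would still pay the traversal cost even when the flag is off.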
Yeah, the code looks for the file in the source location, not in the
packaged location. It's in the root of the examples jar; you can
extract it to src/main/resources/kv1.txt in the local directory
(creating the subdirs) and then you
can run the example.
Probably should be fixed though (bonus if
Hi all,
We are excited to announce that the benchmark entry has been reviewed by
the Sort Benchmark committee and Spark has officially won the Daytona
GraySort contest in sorting 100TB of data.
Our entry tied with a UCSD research team building high performance systems
and we jointly set a new
On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
I believe that benchmark has a pending certification on it. See
http://sortbenchmark.org under Process.
Regarding this comment, Reynold has just announced that this benchmark is
now certified.
-
Steve Nunez, I believe the information behind the links below should
address your concerns earlier about Databricks's submission to the Daytona
Gray benchmark.
On Wed, Nov 5, 2014 at 6:43 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
On Fri, Oct 31, 2014 at 3:45 PM, Nicholas Chammas
Congrats to everyone who helped make this happen. And if anyone has even more
machines they'd like us to run on next year, let us know :).
Matei
On Nov 5, 2014, at 3:11 PM, Reynold Xin r...@databricks.com wrote:
Hi all,
We are excited to announce that the benchmark entry has been
Steve,
I wouldn't say Hadoop MR is a 2001 Toyota Celica :) In either case, I
updated the blog post to actually include CPU / disk / network measures.
You should see that in any measure that matters to this benchmark, the old
2100 node cluster is vastly superior. The data even fit in memory!
On
Steve,
Your original comment was about the *reproducibility* of the benchmark,
which I was responding to. No one is suggesting you doubt the authenticity
or results of the benchmark.
For which no details or code have been released to allow others to
reproduce it. I would encourage anyone doing
Hi all,
I wanted to share a discussion we've been having on the PMC list, as well as
call for an official vote on it on a public list. Basically, as the Spark
project scales up, we need to define a model to make sure there is still great
oversight of key components (in particular internal
Hi Matei,
Definitely in favor of moving into this model for exactly the reasons
you mentioned.
From the module list though, the module that I'm mostly involved with
and is not listed is the Mesos integration piece.
I believe we also need a maintainer for Mesos, and I wonder if there
is someone
+1 (binding)
On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
BTW, my own vote is obviously +1 (binding).
Matei
On Nov 5, 2014, at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC
+1 (binding)
We are already doing this implicitly. In my experience, this can create
longer term personal commitment, which usually leads to better design
decisions if somebody knows they would need to look after something for a
while.
On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia
+1, with a question
Will these maintainers do a cleanup of the pending PRs once we start to
apply this model? There are some patches that have been sitting there without
being merged; some are periodically maintained (rebased, pinged, etc.), while
the others have just been phased out.
Best,
--
Hi Tim,
We can definitely add one for that if the component grows larger or becomes
harder to maintain. The main reason I didn't propose one is that the Mesos
integration is actually a lot simpler than YARN at the moment, partly because
we support several YARN versions that have incompatible
This seems like a good idea.
An area that wasn't listed, but that I think could strongly benefit from
maintainers, is the build. Having consistent oversight over Maven, SBT,
and dependencies would allow us to avoid subtle breakages.
Component maintainers have come up several times within the
I'm a +1 on this as well. I think it will be a useful model as we
scale the project in the future, and it recognizes some informal process
we have now.
To respond to Sandy's comment: for changes that fall in between the
component boundaries or are straightforward, my understanding of this
model is
+1
2014-11-05 18:08 GMT-08:00 Patrick Wendell pwend...@gmail.com:
I'm a +1 on this as well. I think it will be a useful model as we
scale the project in the future, and it recognizes some informal process
we have now.
To respond to Sandy's comment: for changes that fall in between the
+1, Sounds good.
Now I know whom to ping for what, even if I did not follow the whole
history of the project very carefully.
Prashant Sharma
On Thu, Nov 6, 2014 at 7:01 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC
Yup, the Hadoop nodes were from 2013, each with 64 GB RAM, 12 cores, 10 Gbps
Ethernet and 12 disks. For 100 TB of data, the intermediate data could fit in
memory on this cluster, which can make shuffle much faster than with
intermediate data on SSDs. You can find the specs in
As part of my work for SPARK-3821
https://issues.apache.org/jira/browse/SPARK-3821, I tried building an AMI
today using create_image.sh.
This line
https://github.com/mesos/spark-ec2/blob/f6773584dd71afc49f1225be48439653313c0341/create_image.sh#L68
appears to be broken now (it wasn’t a week or so
Have you seen this thread ?
http://search-hadoop.com/m/LgpTk2Pnw6O/andrew+apache+mirrorsubj=Re+All+mirrored+download+links+from+the+Apache+Hadoop+site+are+broken
Cheers
On Wed, Nov 5, 2014 at 7:36 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
As part of my work for SPARK-3821
+1 (binding)
On Wed, Nov 5, 2014 at 6:29 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
+1 on this proposal.
On Wed, Nov 5, 2014 at 8:55 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
Will these maintainers do a cleanup of the pending PRs once we start
to apply this model?
I
Hi everyone,
I'm running into a strange class loading issue when running a Spark job,
using Spark 1.0.2.
I'm running a process where some Java code is compiled dynamically into a
jar and added to the Spark context via addJar(). It is also added to the
class loader of the thread that created the
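A minimal sketch of the setup described above, with hypothetical paths and names: a dynamically built jar is added to the driver thread's context class loader, in addition to being shipped to executors via `sc.addJar()`. If only one of the two is done, the generated classes can resolve on one side but not the other.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Sketch only: the jar path is hypothetical, and the sc.addJar() half of
// the setup is noted in a comment since it needs a live SparkContext.
public class DynamicJarSketch {
    static ClassLoader registerJar(String jarPath) throws Exception {
        URL url = new File(jarPath).toURI().toURL();
        // chain onto the current context loader instead of replacing it,
        // so previously visible classes still resolve
        URLClassLoader loader = new URLClassLoader(new URL[]{url},
                Thread.currentThread().getContextClassLoader());
        Thread.currentThread().setContextClassLoader(loader);
        // with a live SparkContext, one would also call sc.addJar(jarPath)
        return loader;
    }

    public static void main(String[] args) throws Exception {
        ClassLoader cl = registerJar("/tmp/generated.jar");  // hypothetical path
        System.out.println(cl == Thread.currentThread().getContextClassLoader());
    }
}
```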
+1 (binding)
On Wed, Nov 5, 2014 at 7:52 PM, Mark Hamstra m...@clearstorydata.com wrote:
+1 (binding)
On Wed, Nov 5, 2014 at 6:29 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
+1 on this proposal.
On Wed, Nov 5, 2014 at 8:55 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
Will these
Nope, thanks for pointing me to it.
Doesn't look like there is a resolution to the issue. Also, the link you
pointed to appears to be broken now:
http://apache.mesi.com.ar/hadoop/common/
Nick
On Wed, Nov 5, 2014 at 10:43 PM, Ted Yu yuzhih...@gmail.com wrote:
Have you seen this thread ?
The artifacts are in archive:
http://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/
Cheers
On Nov 5, 2014, at 8:07 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:
Nope, thanks for pointing me to it.
Doesn't look like there is a resolution to the issue. Also, the link you
Yup, I just stumbled on that. I'll submit a PR to fix that link. Thanks Ted.
On Wed, Nov 5, 2014 at 11:13 PM, Ted Yu yuzhih...@gmail.com wrote:
The artifacts are in archive:
http://archive.apache.org/dist/hadoop/common/hadoop-2.4.1/
Cheers
On Nov 5, 2014, at 8:07 PM, Nicholas Chammas
+1
Sent from my iPhone
On Nov 5, 2014, at 20:06, Denny Lee denny.g@gmail.com wrote:
+1 great idea.
On Wed, Nov 5, 2014 at 20:04 Xiangrui Meng men...@gmail.com wrote:
+1 (binding)
On Wed, Nov 5, 2014 at 7:52 PM, Mark Hamstra m...@clearstorydata.com
wrote:
+1 (binding)
On Wed, Nov 5, 2014 at
+1 since this is already the de facto model we are using.
On Thu, Nov 6, 2014 at 12:40 PM, Wangfei (X) wangf...@huawei.com wrote:
+1
Sent from my iPhone
On Nov 5, 2014, at 20:06, Denny Lee denny.g@gmail.com wrote:
+1 great idea.
On Wed, Nov 5, 2014 at 20:04 Xiangrui Meng men...@gmail.com wrote:
Great idea! +1
— Jeremy
-
jeremyfreeman.net
@thefreemanlab
On Nov 5, 2014, at 11:48 PM, Timothy Chen tnac...@gmail.com wrote:
Matei that makes sense, +1 (non-binding)
Tim
On Wed, Nov 5, 2014 at 8:46 PM, Cheng Lian lian.cs@gmail.com wrote:
+1 since this is
+1, that will definitely speed up PR reviewing / merging.
-Original Message-
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Thursday, November 6, 2014 12:46 PM
To: dev
Subject: Re: [VOTE] Designating maintainers for some Spark components
+1 since this is already the de facto
+1 Great idea!
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Designating-maintainers-for-some-Spark-components-tp9115p9142.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
+1, It makes sense!
- Kousuke
(2014/11/05 17:31), Matei Zaharia wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC list, as well as
call for an official vote on it on a public list. Basically, as the Spark
project scales up, we need to define a model to make sure
+1, sounds good.
On Wed, Nov 5, 2014 at 9:19 PM, Kousuke Saruta saru...@oss.nttdata.co.jp
wrote:
+1, It makes sense!
- Kousuke
(2014/11/05 17:31), Matei Zaharia wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC list, as well
as call for an official vote on it
+1, it brings more focus and more consistency.
Yours, Xuefeng Wu 吴雪峰 敬上
On Nov 6, 2014, at 9:31 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC list, as well as
call for an official vote on it on a public list.
Several people asked about having maintainers review the PR queue for their
modules regularly, and I like that idea. We have a new tool now to help with
that in https://spark-prs.appspot.com.
In terms of the set of open PRs itself, it is large but note that there are
also 2800 *closed* PRs,
+1
Cheers!
Manoj.
On Thu, Nov 6, 2014 at 12:51 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Several people asked about having maintainers review the PR queue for
their modules regularly, and I like that idea. We have a new tool now to
help with that in https://spark-prs.appspot.com.
In
+1
Liquan
On Wed, Nov 5, 2014 at 11:32 PM, Manoj Babu manoj...@gmail.com wrote:
+1
Cheers!
Manoj.
On Thu, Nov 6, 2014 at 12:51 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Several people asked about having maintainers review the PR queue for
their modules regularly, and I like