Hey Andrew,
I think you are correct, and a follow-up to SPARK-2521 will end up
fixing this. The design of SPARK-2521 automatically broadcasts RDD
data in tasks, and the approach creates a new copy of the RDD and
associated data for each task. A natural follow-up to that patch is to
stop handling
Sounds good -- I added comments to the ticket.
Since SPARK-2521 is scheduled for the 1.1.0 release and we can work around
this with spark.speculation, I don't personally see a need for a 1.0.2 backport.
Thanks for looking through this issue!
On Thu, Jul 17, 2014 at 2:14 AM, Patrick Wendell
Are you setting -Pyarn-alpha? ./sbt/sbt -Pyarn-alpha, followed by
projects, shows it as a module. You should only build yarn-stable
*or* yarn-alpha at any given time.
I don't remember the modules changing in a while. 'yarn-alpha' is for
YARN before it stabilized, circa early Hadoop 2.0.x.
To add, we've made some effort to get yarn-alpha to work with the 2.0.x line,
but this was a time when YARN went through wild API changes. The only line
that the yarn-alpha profile is guaranteed to work against is the 0.23 line.
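For reference, a sketch of how the two profiles might be invoked; the Hadoop version numbers here are illustrative, not a recommendation:

```shell
# yarn-alpha: the pre-stabilization YARN API, guaranteed only for the 0.23 line
./sbt/sbt -Pyarn-alpha -Dhadoop.version=0.23.10 assembly

# yarn (stable): Hadoop 2.2 and later
./sbt/sbt -Pyarn -Dhadoop.version=2.2.0 assembly
```

Build only one of the two at any given time, as noted above.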
On Thu, Jul 17, 2014 at 12:40 AM, Sean Owen so...@cloudera.com wrote:
Please vote on releasing the following candidate as Apache Spark version 0.9.2!
The tag to be voted on is v0.9.2-rc1 (commit 4322c0ba):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4322c0ba7f411cf9a2483895091440011742246b
The release files, including signatures, digests, etc.
@Sean and @Sandy
Thanks for the reply. I used to be able to see the yarn-alpha and yarn
directories, which correspond to the modules.
I guess due to the recent SparkBuild.scala changes I did not see
yarn-alpha (by default), and I thought yarn-alpha had been renamed to yarn
and yarn-stable was the
I'm trying to compile the latest code, with the hadoop-version set for
2.0.0-mr1-cdh4.6.0.
I'm getting the following error, which I don't get when I don't set the
hadoop version:
[error]
This looks like a Jetty version problem actually. Are you bringing in
something that might be changing the version of Jetty used by Spark?
It depends a lot on how you are building things.
It would be good to specify exactly how you're building here.
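One way to see which Jetty artifacts actually end up on the classpath, assuming you can reproduce the problem with the Maven build (the hadoop.version value is the one from this thread):

```shell
# Print only the Jetty artifacts in the resolved dependency tree
mvn -Dhadoop.version=2.0.0-mr1-cdh4.6.0 dependency:tree \
    -Dincludes=org.eclipse.jetty,org.mortbay.jetty
```

If a second Jetty version shows up under the Hadoop dependencies, that would explain the conflict.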
On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld
Looks like a real problem. I see it too. I think the same workaround
found in ClientBase.scala needs to be used here. There, the fact that
this field can be a String or String[] is handled explicitly. In fact,
I think you can just call into ClientBase for this? PR it, I say.
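For illustration only, the shape of that workaround might look like the sketch below (hypothetical names, not the actual ClientBase code): branch on the runtime type, since depending on the YARN version the field holds a String or a String[].

```scala
import java.io.File

// Hypothetical sketch: normalize a classpath field that may be a String
// (one path-separated entry list) or an Array[String], depending on the
// YARN API version in use.
def classpathEntries(field: AnyRef): Seq[String] = field match {
  case s: String        => s.split(File.pathSeparator).toSeq
  case a: Array[String] => a.toSeq
  case _                => Seq.empty
}
```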
On Thu, Jul 17, 2014 at
My full build command is:
./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly
I've changed one line in RDD.scala, nothing else.
On Thu, Jul 17, 2014 at 10:56 AM, Sean Owen so...@cloudera.com wrote:
This looks like a Jetty version problem actually. Are you bringing in
something
er, that line being in toDebugString, where it really shouldn't affect
anything (no signature changes or the like)
On Thu, Jul 17, 2014 at 10:58 AM, Nathan Kronenfeld
nkronenf...@oculusinfo.com wrote:
My full build command is:
./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly
Thank you, TD !
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108
On Wed, Jul 16, 2014 at 6:53 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:
After every checkpointing interval, the latest state RDD is stored to HDFS
in its entirety. Along with that, the series of DStream
CC tmalaska since he touched the line in question. This is a fun one.
So, here's the line of code added last week:
val channelFactory = new NioServerSocketChannelFactory
(Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
Scala parses this as two statements, one invoking a
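To make the gotcha concrete, here is a self-contained sketch (Factory stands in for NioServerSocketChannelFactory, which likewise has a no-arg constructor):

```scala
object LineBreakGotcha {
  class Factory(a: AnyRef, b: AnyRef) {
    def this() = this(null, null) // no-arg constructor, as in Netty
    val gotArgs: Boolean = a != null && b != null
  }

  def main(args: Array[String]): Unit = {
    // The line break before the parenthesized list makes Scala infer a
    // semicolon: statement 1 is `new Factory` (no-arg constructor),
    // statement 2 is a tuple expression whose value is discarded.
    val factory = new Factory
    ("poolA", "poolB")

    println(factory.gotArgs) // false: the arguments were never passed
  }
}
```

Keeping the opening parenthesis on the same line as the constructor avoids the misparse.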
Don't make this change yet. I have a 1642 that needs to get through around
the same code.
I can make this change after 1642 is through.
On Thu, Jul 17, 2014 at 12:25 PM, Sean Owen so...@cloudera.com wrote:
CC tmalaska since he touched the line in question. This is a fun one.
So, here's the
OK I will create PR.
thanks
On Thu, Jul 17, 2014 at 7:58 AM, Sean Owen so...@cloudera.com wrote:
Looks like a real problem. I see it too. I think the same workaround
found in ClientBase.scala needs to be used here. There, the fact that
this field can be a String or String[] is handled
Should be an easy rebase for your PR, so I went ahead just to get this fixed up:
https://github.com/apache/spark/pull/1466
On Thu, Jul 17, 2014 at 5:32 PM, Ted Malaska ted.mala...@cloudera.com wrote:
Don't make this change yet. I have a 1642 that needs to get through around
the same code.
I
On Thu, Jul 17, 2014 at 1:23 AM, Stephen Haberman
stephen.haber...@gmail.com wrote:
I'd be ecstatic if more major changes were this well/succinctly
explained
Ditto on that. The summary of user impact was very nice. It would be good
to repeat that on the user list or release notes when this
I start the voting with a +1.
Ran tests on the release candidates and some basic operations in
spark-shell and pyspark (local and standalone).
-Xiangrui
On Thu, Jul 17, 2014 at 3:16 AM, Xiangrui Meng men...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark
Having looked at make-distribution.sh on trunk, the --with-hive and
--with-yarn flags are now deprecated.
Here is the way I have built it:
Added to pom.xml:
<profile>
  <id>cdh5</id>
  <activation>
    <activeByDefault>false</activeByDefault>
  </activation>
  <properties>
Hi all,
Cool discussion! I agree that a more standardized API for clustering, and
easy access to underlying routines, would be useful (we've also been
discussing this when trying to develop streaming clustering algorithms,
similar to https://github.com/apache/spark/pull/1361)
For divisive,
+1
Tested with my Ubuntu Linux.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Thu, Jul 17, 2014 at 6:36 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
+1
Tested on Mac, verified
Hi all,
What's the preferred environment for generating golden test outputs for new
Hive tests? In particular:
* what Hadoop version and Hive version should I be using,
* are there particular distributions people have run successfully, and
* are there any system properties or environment
+1
On Thursday, July 17, 2014, Matei Zaharia matei.zaha...@gmail.com wrote:
+1
Tested on Mac, verified CHANGES.txt is good, verified several of the bug
fixes.
Matei
On Jul 17, 2014, at 11:12 AM, Xiangrui Meng men...@gmail.com
javascript:; wrote:
I start the voting with a +1.
Ran
Hi Will,
These three environment variables are needed [1].
I have had success with Hive 0.12 and Hadoop 1.0.4. For Hive, getting
the source distribution seems to be required. Docs contribution will
be much appreciated!
[1]
Hey Stephen,
The only change to the build was that we now ask users to run -Phive and
-Pyarn instead of --with-hive and --with-yarn (which internally just set
-Phive and -Pyarn). I don't think this should affect the dependency
graph.
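In other words, the before/after might look like this (a sketch of the flags discussed in this thread, not the full option list):

```shell
# Before: deprecated wrapper flags
./make-distribution.sh --with-hive --with-yarn

# After: pass the Maven profiles directly (the wrapper flags just set these)
./make-distribution.sh -Phive -Pyarn
```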
Just to test this, what happens if you run *without* the CDH profile
and