Hi,
Any comments or thoughts on the implications of the newly released
centralized cache feature in Hadoop 2.3? How different is it from RDD?
Many thanks.
Cao
Hi Ryan,
It worked like a charm. Much appreciated.
Laeeq.
On Wednesday, May 7, 2014 1:30 AM, Ryan Compton compton.r...@gmail.com wrote:
I've been using this (you'll need maven 3).
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
I used to be able to get all tests to pass.
With Java 6 and sbt I get PermGen errors (no matter how high I make the
PermGen), so I have given up on that.
With Java 7 I see 1 error in a Bagel test and a few in streaming tests. Any
ideas? See the error in BagelSuite below.
[info] - large number
Got the same experience over here. 0.9.1 (not from github, from official
download page), running hadoop 2.2.
Well... the reason was an out-of-date version of Python (2.6.6) on the
machine where I ran the script. If anyone else experiences this issue -
just update your Python.
On Sun, May 4, 2014 at 7:51 PM, Aliaksei Litouka aliaksei.lito...@gmail.com
wrote:
I am using Spark 0.9.1. When I'm trying to
By looking at your config, I think there's something wrong with your setup.
One of the key elements of Mesos is that you are abstracted from where the
execution of your task takes place. The SPARK_EXECUTOR_URI tells Mesos
where to find the 'framework' (in Mesos jargon) required to execute a job.
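As a hedged illustration (not from the original thread), the equivalent can
also be set programmatically; the Mesos master URL and the HDFS path below are
placeholders:

val conf = new org.apache.spark.SparkConf()
  .setMaster("mesos://host:5050")                                    // placeholder master URL
  .set("spark.executor.uri", "hdfs://namenode/dist/spark-0.9.1.tgz") // placeholder distribution path
val sc = new org.apache.spark.SparkContext(conf)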
Laurent, the problem is that the reference.conf that is embedded in the akka
jars is being overridden by some other conf. This happens when multiple
files have the same name.
I am using Spark with maven. In order to build the fat jar I use the shade
plugin and it works pretty well. The trick here is to
Hey,
I am facing a weird issue.
My Spark workers keep dying every now and then, and in the master logs I keep
seeing the following messages:
14/05/14 10:09:24 WARN Master: Removing worker-20140514080546-x.x.x.x-50737
because we got no heartbeat in 60 seconds
14/05/14 14:18:41 WARN Master:
Hi,
I have some strange behaviour when using textFile to read some data from
HDFS in spark 0.9.1.
I get UnknownHost exceptions, where hadoop client tries to resolve the
dfs.nameservices and fails.
So far:
- this has been tested inside the shell
- the exact same code works with spark-0.8.1
-
I wanted to know how we can efficiently get the top 10 hashtags in the last 5
minutes' window. Currently I am using reduceByKeyAndWindow over a 5-minute
window and then sorting to get the top 10 hashtags, but it is taking a lot of
time. How can we do it efficiently?
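A minimal sketch of this kind of computation, using top() rather than a full
sort; the DStream name and the window/slide durations are placeholders:

import org.apache.spark.streaming.{Minutes, Seconds}
import org.apache.spark.streaming.StreamingContext._

// hashtags is assumed to be a DStream[String] of extracted hashtags
val counts = hashtags.map(tag => (tag, 1)).reduceByKeyAndWindow(_ + _, Minutes(5), Seconds(10))
counts.foreachRDD { rdd =>
  // top(10) keeps only 10 elements per partition and merges them on the driver,
  // avoiding a sort of the whole windowed RDD
  val top10 = rdd.top(10)(Ordering.by(_._2))
  println(top10.mkString("\n"))
}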
I've done some comparisons with my own implementation of TRON on Spark.
From a distributed computing perspective, it does 2x more local work per
iteration than LBFGS, so the parallel isoefficiency is improved slightly.
I think the truncated Newton solver holds some potential because there
have
Could you try `println(result.toDebugString())` right after `val
result = ...` and attach the result? -Xiangrui
On Fri, May 9, 2014 at 8:20 AM, phoenix bai mingzhi...@gmail.com wrote:
after a couple of tests, I find that, if I use:
val result = model.predict(prdctpairs)
result.map(x =
I’m trying to do a simple count() on a large number of GZipped files in S3.
My job is failing with the following message:
14/05/15 19:12:37 WARN scheduler.TaskSetManager: Loss was due to
java.io.IOException
java.io.IOException: incorrect header check
at
Although you need to compile it differently for different versions of
HDFS / Hadoop, as far as I know Spark continues to work with Hadoop
1.x (and probably older 0.20.x as a result -- your experience is an
existence proof.) And it works with the newest Hadoop 2.4.x, again
with the appropriate
Finally found a way out of the ClassLoader maze! It took me some time to
understand how it works; I think it's worth documenting in a separate
thread.
We're trying to add external utility.jar which contains CSVRecordParser,
and we added the jar to executors through sc.addJar APIs.
If the
Is there any Spark plugin/add-on that facilitates querying JSON
content?
Best,
Flavio
On Thu, May 15, 2014 at 6:53 PM, Michael Armbrust mich...@databricks.comwrote:
Here is a link with more info:
http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
On Wed, May
Doesn't DStream.foreach() suffice?
anyDStream.foreach { rdd =>
  // do something with rdd
}
On Wed, May 14, 2014 at 9:33 PM, Stephen Boesch java...@gmail.com wrote:
Looking further it appears the functionality I am seeking is in the
following *private[spark]* class ForEachDStream
(version
unsubscribe
If you check out the master branch, there are some examples that can
be used as templates under
examples/src/main/scala/org/apache/spark/examples/mllib
Best,
Xiangrui
On Wed, May 14, 2014 at 1:36 PM, yxzhao yxz...@ualr.edu wrote:
Hello,
I found the classification algorithms SVM and
Hi,
I'm running the Spark server with a single worker on a laptop using the
docker images. The Spark shell examples run fine with this setup. However,
when a standalone Java client tries to run wordcount on a local file (1 MB
in size), the execution fails with the following error on the stdout
unsubscribe
Hey folks,
I'm wondering what strategies other folks are using for maintaining and
monitoring the stability of stand-alone spark clusters.
Our master very regularly loses workers, and they (as expected) never
rejoin the cluster. This is the same behavior I've seen
using akka cluster (if that's
I'm using spark-ec2 to run some Spark code. When I set master to
local, then it runs fine. However, when I set master to $MASTER,
the workers immediately fail, with java.lang.NoClassDefFoundError for
the classes.
I've used sbt-assembly to make a jar with the classes, confirmed using
jar tvf
Hi Marco,
Hive itself is not working in the CDH5.0 VM (due to FNFE's on the third
party jars). While you did not mention using Shark, you may keep that in
mind. I will try out spark-only commands late today and report what I find.
2014-05-14 5:00 GMT-07:00 Marco Shaw marco.s...@gmail.com:
Serializing the main object isn't going to help here - it's SparkContext
it's complaining about.
The problem is that, according to the code you sent,
computeDwt has a signature of:
class DWTSample ... {
def computDWT (sc: SparkContext, data: ArrayBuffer[(Int, Double)]):
Hi nilmish,
One option for you is to consider moving to a different algorithm. The
SpaceSaver/StreamSummary method will get you approximate results in exchange
for smaller data structure size. It has an implementation in Twitter's
Algebird library, if you're using Scala:
Hello Mohit,
I don't think there's a direct way of bleeding elements across partitions.
But you could write it yourself relatively succinctly:
A) Sort the RDD
B) Look at the sorted RDD's partitions with the .mapPartitionsWithIndex( )
method. Map each partition to its partition ID, and its
Hi,
I am trying to understand Drill, which I see as one of the upcoming
interesting tools out there.
Can somebody clarify where Drill is going to be positioned in the Hadoop
ecosystem compared with Spark and Shark?
Is it going to be used as an alternative to any one of Spark/Shark or Storm?
Or Drill
What is a good way to pass config variables to workers?
I've tried setting them in environment variables via spark-env.sh, but, as
far as I can tell, the environment variables set there don't appear in
workers' environments. If I want to be able to configure all workers,
what's a good way to do
It would look ugly, as explicit datatypes need to be mentioned;
you are better off using parallelize instead.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Fri, May 16, 2014 at 6:11 PM, Eduardo Costa Alfaia
I tried on a few different machines, including a server, all with the same Ubuntu
and the same Java, and got the same errors. I also tried modifying the timeouts in
the unit tests and it did not help.
OK, I will try blowing away the local maven repo and doing a clean build.
On Thu, May 15, 2014 at 12:49 PM, Sean Owen
It is running k-means many times, independently, from different random
starting points in order to pick the best clustering. Convergence ends
one run, not all of them.
Yes epsilon should be the same as convergence threshold elsewhere.
You can set epsilon if you instantiate KMeans directly. Maybe
Hi Andrew,
Could you try varying the minPartitions parameter? For example:
val r = sc.textFile("/user/aa/myfile.bz2", 4).count
val r = sc.textFile("/user/aa/myfile.bz2", 8).count
Best,
Xiangrui
On Tue, May 13, 2014 at 9:08 AM, Xiangrui Meng men...@gmail.com wrote:
Which hadoop version did you
Hey Marco, I tried the CDH5 VM today and it works fine -- but note
that you need to start the Spark service after the VM boots. Just go
to CM and choose Start from the dropdown next to Spark. spark-shell
works fine then.
On Wed, May 14, 2014 at 1:00 PM, Marco Shaw marco.s...@gmail.com wrote:
Hi,
after removing all class parameters of class Path from my code, I tried
again. Different but related error when I set
spark.files.userClassPathFirst=true
Now I don't even use FileInputFormat directly. HadoopRDD does...
14/05/16 12:17:17 ERROR Executor: Exception in task ID 45
Hi
I see from the docs for 1.0.0 that the new spark-submit mechanism seems
to support specifying the jar with hdfs:// or http://
Does this support S3? (It doesn't seem to; I have tried it on EC2 and it
doesn't work):
./bin/spark-submit --master local[2] --class myclass
I want to use accumulators to keep counts of things like invalid lines
found and such, for reporting purposes. Similar to Hadoop counters. This
may seem simple, but my case is a bit more complicated. The code which is
creating an RDD from a transform is separated from the code which performs
the
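A hedged sketch of the basic accumulator pattern (the split between the
transform code and the reporting code is collapsed here; rawLines and
parseRecord are made-up names):

import org.apache.spark.SparkContext._

val invalidLines = sc.accumulator(0)      // counts bad input lines, like a Hadoop counter

val parsed = rawLines.flatMap { line =>
  parseRecord(line) match {               // parseRecord: hypothetical parser returning Option
    case Some(rec) => Iterator(rec)
    case None      => invalidLines += 1; Iterator.empty
  }
}

parsed.count()                            // accumulator values are reliable only after an action
println("invalid lines: " + invalidLines.value)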
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html
We do not currently cache blocks which are under construction, corrupt, or
otherwise incomplete.
Have you tried with a file with more than 1 block?
And
I don't think there's a direct way of bleeding elements across partitions.
But you could write it yourself relatively succinctly:
A) Sort the RDD
B) Look at the sorted RDD's partitions with the .mapPartitionsWithIndex( )
method. Map each partition to its partition ID, and its maximum element.
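A minimal sketch of steps A and B, assuming the already-sorted RDD is an
RDD[Double] called sorted:

val maxPerPartition = sorted
  .mapPartitionsWithIndex { (idx, iter) =>
    if (iter.hasNext) Iterator((idx, iter.max)) else Iterator.empty
  }
  .collect()    // one (partitionId, max) pair per partition, small enough for the driver
  .toMap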
On Wed, May 7, 2014 at 4:44 PM, Aaron Davidson ilike...@gmail.com wrote:
Spark can only run as many tasks as there are partitions, so if you don't
have enough partitions, your cluster will be underutilized.
This is a very important point.
kamatsuoka, how many partitions does your RDD have
I am working on some code which uses mapPartitions. It's working great, except
when I attempt to use a variable within the function passed to mapPartitions
which references something outside of the scope (for example, a variable
declared immediately before the mapPartitions call). When this
Hello
I have a few questions regarding Shark.
1) I have a table of 60 GB and I have total memory of 50 GB, but when I try
to cache the table it gets cached successfully. How does Shark cache the table
when there was not enough memory to fit it in memory? And how do the cache
eviction policies (FIFO and LRU)
UP, doesn't anyone know something about it? ^^
2014-05-06 12:05 GMT+02:00 Andrea Esposito and1...@gmail.com:
Hi there,
sorry if I'm posting a lot lately.
I'm trying to add the KryoSerializer but I receive this exception:
2014-05-06 11:45:23 WARN TaskSetManager: 62 - Loss was due to
Hi guys,
I think it may be a bug in Spark. I wrote some code to demonstrate the bug.
Example 1) This is how Spark adds jars. Basically, add jars to
customURLClassLoader.
https://github.com/dbtsai/classloader-experiement/blob/master/calling/src/main/java/Calling1.java
It doesn't work for two
In writing my own RDD I ran into a few issues with respect to stuff being
private in Spark.
In compute I would like to return an iterator that respects task killing
(as HadoopRDD does), but the mechanics for that are inside the private
InterruptibleIterator. Also, the exception I am supposed to
There are examples to run them in BinaryClassification.scala in
org.apache.spark.examples...
On Wed, May 14, 2014 at 1:36 PM, yxzhao yxz...@ualr.edu wrote:
Hello,
I found the classification algorithms SVM and LogisticRegression implemented
in the following directory. And how to run them?
What is a good way to pass config variables to workers?
I've tried setting them in environment variables via spark-env.sh, but, as
far as I can tell, the environment variables set there don't appear in
workers' environments. If I want to be able to configure all workers,
what's a good way to do
I've experienced the same bug, which I had to workaround manually. I
posted the details here:
http://stackoverflow.com/questions/23687081/spark-workers-unable-to-find-jar-on-ec2-cluster
On 5/15/14, DB Tsai dbt...@stanford.edu wrote:
Hi guys,
I think it may be a bug in Spark. I wrote some code
What is the difference between a Spark Worker and a Spark Slave?
Hi!
I understand the usual Task not serializable issue that arises when
accessing a field or a method that is out of scope of a closure.
To fix it, I usually define a local copy of these fields/methods, which
avoids the need to serialize the whole class:
class MyClass(val myField: Any) {
def
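A hedged sketch of that local-copy pattern (doWork and the RDD types are made
up for illustration):

import org.apache.spark.rdd.RDD

class MyClass(val myField: Any) {
  def doWork(rdd: RDD[String]): RDD[(Any, String)] = {
    val localField = myField              // local copy: the closure captures only this val,
    rdd.map(line => (localField, line))   // not the whole (non-serializable) MyClass instance
  }
}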
Stuti,
I'm answering your questions in order:
1. From MLLib
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L159
, you can see that clustering stops when we have reached *maxIterations* or
there are no more *activeRuns*.
KMeans is
Yeah sure, it is Ubuntu 12.04 with jdk1.7.0_40.
What else is relevant that I can provide?
On Thu, May 15, 2014 at 12:17 PM, Sean Owen so...@cloudera.com wrote:
FWIW I see no failures. Maybe you can say more about your environment, etc.
On Wed, May 7, 2014 at 10:01 PM, Koert Kuipers
I tried centralized cache step by step following the Apache Hadoop official
website, but it seems centralized cache doesn't work.
see :
http://stackoverflow.com/questions/22293358/centralized-cache-failed-in-hadoop-2-3
.
Can anyone succeed?
2014-05-15 5:30 GMT+08:00 William Kang
Well, I modified ChildExecutorURLClassLoader to also delegate to
parentClassloader if NoClassDefFoundError is thrown... now I get yet
another error. I am clearly missing something with these classloaders. Such
nasty stuff... giving up for now. Just going to have to not use
And I thought I sent it to the right list! Here you go again - Question below :
On May 14, 2014, at 3:06 PM, Vipul Pandey vipan...@gmail.com wrote:
So here's a followup question : What's the preferred mode?
We have a new cluster coming up with petabytes of data and we intend to take
Spark
http://spark-summit.org ?
Bertrand
On Thu, May 8, 2014 at 2:05 AM, Ian Ferreira ianferre...@hotmail.comwrote:
Folks,
I keep getting questioned on real world experience of Spark as in mission
critical production deployments. Does anyone have some war stories to share
or know of resources
The jars are actually there (and on the classpath), but you need to load
them through reflection. I have another thread giving the workaround.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Fri,
Thanks for the answers!
As a concrete example, here is what I did to test my (wrong :) ) hypothesis
before writing my email:
class SomethingNotSerializable {
  def process(a: Int): Int = 2 * a
}
object NonSerializableClosure extends App {
  val sc = new spark.SparkContext(
    "local",
Hi there,
I was wondering if someone could explain to me how the cache() function works
in Spark in these phases:
(1) If I have a huge file, say 1 TB, which cannot be entirely stored in
memory, what will happen if I try to create an RDD of this huge file and
cache it?
(2) If it works in Spark, it can
Please ignore. This was sent last week not sure why it arrived so late.
-Original Message-
From: amoc [mailto:amoc...@verticalscope.com]
Sent: May-09-14 10:13 AM
To: u...@spark.incubator.apache.org
Subject: Re: slf4j and log4j loop
Hi Patrick/Sean,
Sorry to resurrect this thread, but
Thanks for responding, Sandy.
YARN for sure is a more mature way of working on shared resources. I was not
sure about how stable Spark on YARN is and if anyone is using it in production.
I have been using Standalone mode in our dev cluster but multi-tenancy and
resource allocation wise it's
I guess what you are trying to do is get a columnar projection on your
data; Spark SQL may be a good option for you (especially if your data is
sparse, which is good for columnar projection).
If you are looking to work with simple key-value data then you are better off
using the HBase input reader in hadoopIO to get a
Here is a link with more info:
http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
On Wed, May 7, 2014 at 10:09 PM, Debasish Das debasish.da...@gmail.comwrote:
Hi,
For each line that we read as textLine from HDFS, we have a schema..if
there is an API that takes the
Not sure if this is feasible, but this literally does what I think you
are describing:
sc.parallelize(rdd1.first to rdd1.last)
On Tue, May 13, 2014 at 4:56 PM, Mohit Jaggi mohitja...@gmail.com wrote:
Hi,
I am trying to find a way to fill in missing values in an RDD. The RDD is a
sorted
Hi,
I have some complex behavior I'd like to be advised on as I'm really new to
Spark.
I'm reading some log files that contain various events. There are two types
of events: parents and children. A child event can only have one parent and
a parent can have multiple children.
Currently I'm
Paco, that's a great video reference, thanks.
To be fair to our friends at Yahoo, who have done a tremendous amount to
help advance the cause of the BDAS stack, it's not FUD coming from them,
certainly not in any organized or intentional manner.
In vacuo we prefer Mesos ourselves, but also can't
Pravesh,
Correct, the logistic regression engine is set up to perform classification
tasks that take feature vectors (arrays of real-valued numbers) that are
given a class label, and learns a linear combination of those features
that divides the classes. As the above commenters have mentioned,
We never saw your exception when reading bzip2 files with Spark.
But when we wrongly compiled Spark against an older version of Hadoop (the
default in Spark), we ended up with sequential reading of the bzip2 file,
not taking advantage of block splits to work in parallel.
Once we compiled spark with
How did you deal with this problem? I have run into it these days. God bless
me.
Best regards,
Hi Marcin,
On Wed, May 14, 2014 at 7:22 AM, Marcin Cylke
marcin.cy...@ext.allegro.pl wrote:
- This looks like some problems with HA - but I've checked namenodes during
the job was running, and there
was no switch between master and slave namenode.
14/05/14 15:25:44 ERROR
+1, at least with current code
just watch the log printed by DAGScheduler…
--
Nan Zhu
On Wednesday, May 14, 2014 at 1:58 PM, Mark Hamstra wrote:
serDe
Hi Xiangrui,
// FYI I'm getting your emails late due to the Apache mailing list outage
I'm using CDH4.4.0, which I think uses the MapReduce v2 API. The .jars are
named like this: hadoop-hdfs-2.0.0-cdh4.4.0.jar
I'm also glad you were able to reproduce! Please paste a link to the
Hadoop bug you
Hi Vipul,
Some advantages of using YARN:
* YARN allows you to dynamically share and centrally configure the same
pool of cluster resources between all frameworks that run on YARN. You can
throw your entire cluster at a MapReduce job, then use some of it on an
Impala query and the rest on Spark
Hi Andrew,
I verified that this is due to thread safety. I changed
SPARK_WORKER_CORES to 1 in spark-env.sh, so there is only 1 thread per
worker. Then I can load the file without any problem with different
values of minPartitions. I will submit a JIRA to both Spark and
Hadoop.
Best,
Xiangrui
On
(Sorry if you have already seen this message - it seems like there were
some issues delivering messages to the list yesterday)
We can create a standalone Spark application by simply adding
spark-core_2.x to build.sbt/pom.xml and connecting it to a Spark master.
We can also build a custom version of
Hi Xiangrui,
We're still using Spark 0.9 branch, and our job is submitted by
./bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar YOUR_APP_JAR_FILE \
--class APP_MAIN_CLASS \
--args APP_MAIN_ARGUMENTS \
--num-workers NUMBER_OF_WORKER_MACHINES \
--master-class
I found that the easiest way was to pass variables in the Spark configuration
object. The only catch is that all of your property keys must begin with
spark. in order for Spark to propagate the values. So, for example, in the
driver:
SparkConf conf = new SparkConf();
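And a Scala sketch of the same idea; the key spark.myapp.limit is a made-up
example, and reading it back through SparkEnv on the executors is one option:

import org.apache.spark.{SparkConf, SparkContext, SparkEnv}

val conf = new SparkConf()
  .setAppName("MyApp")
  .set("spark.myapp.limit", "42")   // keys starting with "spark." get shipped to executors
val sc = new SparkContext(conf)

val seen = sc.parallelize(1 to 4).map { _ =>
  SparkEnv.get.conf.get("spark.myapp.limit")   // read back on the executor side
}.collect()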
Andrew,
thanks for your response. When using the coarse mode, the jobs run fine.
My problem is the fine-grained mode. Here the parallel jobs nearly
always end in a deadlock. It seems to have something to do with
resource allocation, as Mesos shows neither used nor idle CPU resources
in this
Hi all,
I encountered OOM when streaming.
I send data to Spark Streaming through ZeroMQ at a speed of 600 records per
second, but Spark Streaming only handles 10 records per 5 seconds (set in
the streaming program).
My two workers have 4-core CPUs and 1 GB RAM.
These workers always run into Out
I have Spark code which runs beautifully when MASTER=local. When I
run it with MASTER set to a spark ec2 cluster, the workers seem to
run, but the results, which are supposed to be put to AWS S3, don't
appear on S3. I'm at a loss for how to debug this. I don't see any
S3 exceptions anywhere.
Install your custom spark jar to your local maven or ivy repo. Use this custom
jar in your pom/sbt file.
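For the sbt route, a minimal sketch; the version string is a placeholder for
whatever your custom build publishes (e.g. via sbt publishLocal or mvn install):

// build.sbt -- assumes the custom Spark build sits in the local maven/ivy repo
resolvers += Resolver.mavenLocal
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0-custom"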
On May 15, 2014, at 3:28 AM, Andrei faithlessfri...@gmail.com wrote:
(Sorry if you have already seen this message - it seems like there were some
issues delivering messages to the
Not a hack, this is documented here:
http://spark.apache.org/docs/0.9.1/configuration.html, and is in fact the
proper way of setting per-application Spark configurations.
Additionally, you can specify default Spark configurations so you don't
need to manually set it for all applications. If you
I do not think the current solution will work. I tried writing a version of
ChildExecutorURLClassLoader that does have a proper parent and has a
modified loadClass to reverse the order of parent and child in finding
classes, and that seems to work, but now classes like SparkEnv are loaded
by the
Did you check the executor stderr logs?
On 5/16/14, 2:37 PM, Robert James srobertja...@gmail.com wrote:
I have Spark code which runs beautifully when MASTER=local. When I
run it with MASTER set to a spark ec2 cluster, the workers seem to
run, but the results, which are supposed to be put to AWS
Same here. I've posted a bunch of questions in the last few days and they
don't show up here and I'm also not getting email to my (gmail.com) account.
I came here to post directly on the mailing list but saw this thread
instead. At least, I'm not alone.
A file is just a stream with a fixed length. Usually streams don't end, but in this
case it would.
On the other hand, if you read your file as a stream you may not be able to use the
entire data in the file for your analysis. Spark (given enough memory) can
process large amounts of data quickly.
On
Hi Stuti,
I think you're right. The epsilon parameter is indeed used as a threshold
for deciding when KMeans has converged. If you look at line 201 of mllib's
KMeans.scala:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L201
you
How did you deal with this problem in the end? I also ran into it.
Best regards,
Hi Andrew,
This is the JIRA I created:
https://issues.apache.org/jira/browse/MAPREDUCE-5893 . Hopefully
someone wants to work on it.
Best,
Xiangrui
On Fri, May 16, 2014 at 6:47 PM, Xiangrui Meng men...@gmail.com wrote:
Hi Andre,
I could reproduce the bug with Hadoop 2.2.0. Some older version
So you can use an input/output format to read it whichever way you write it...
You can additionally provide variables in the Hadoop configuration to
configure it.
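For instance, a hedged one-liner for setting a Hadoop configuration variable
from the driver (the key name is made up):

// values set on sc.hadoopConfiguration are used by the Hadoop input/output formats
sc.hadoopConfiguration.set("my.custom.key", "some-value")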
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Thu, May 8, 2014 at
Hi Andre,
I could reproduce the bug with Hadoop 2.2.0. Some older version of
Hadoop do not support splittable compression, so you ended up with
sequential reads. It is easy to reproduce the bug with the following
setup:
1) Workers are configured with multiple cores.
2) BZip2 files are big enough
Frankly, if you can give enough CPU performance to the VM it should be good...
but for development, setting up locally is better:
1. debuggable in an IDE
2. faster
3. samples like run-example etc.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi
They are different terminology for the same thing and should be
interchangeable.
On Fri, May 16, 2014 at 2:02 PM, Robert James srobertja...@gmail.comwrote:
What is the difference between a Spark Worker and a Spark Slave?
Sorry, yes, you are right, the documentation does indeed explain that setting
spark.* options is the way to pass Spark configuration options to workers.
Additionally, we've used the same mechanism to pass application-specific
configuration options to workers; the hack part is naming our
Hi,
I have data in a file. Can I read it as a stream in Spark? I know it seems odd to
read a file as a stream but it has practical applications in real life if I can
read it as a stream. Is there any other tool which can give this file as a stream
to Spark, or do I have to make batches manually, which is
Hi Sophia,
Unfortunately, Spark doesn't work against YARN in CDH4. The YARN APIs
changed quite a bit before finally being stabilized in Hadoop 2.2 and CDH5.
Spark on YARN supports Hadoop 0.23.* and Hadoop 2.2+ / CDH5.0+, but does
not support CDH4, which is somewhere in between.
-Sandy
On