Are you running in yarn-standalone mode or yarn-client mode? Also, what
YARN scheduler and what NodeManager heartbeat?
On Sun, Mar 2, 2014 at 9:41 PM, polkosity polkos...@gmail.com wrote:
Thanks for the advice Mayur.
I thought I'd report back on the performance difference... Spark
I think this is caused by not setting yarn.application.classpath in your
yarn-site.xml.
-Sandy
On Sat, Mar 8, 2014 at 2:24 AM, Venkata siva kamesh Bhallamudi
kam.iit...@gmail.com wrote:
Hi All,
I am new to Spark and running pi example on Yarn Cluster. I am getting
the following exception
There was an issue related to this fixed recently:
https://github.com/apache/spark/pull/103
On Sun, Mar 9, 2014 at 8:40 PM, Koert Kuipers ko...@tresata.com wrote:
I edit the last line of sbt/sbt, after which I run:
sbt/sbt test
On Sun, Mar 9, 2014 at 10:24 PM, Sean Owen so...@cloudera.com wrote:
Hi Aaron,
When you say "Java heap space is 1.5G per worker, 24 or 32 cores across 46
nodes. It seems like we should have more than enough to do this
comfortably.", how are you configuring this?
-Sandy
On Tue, Mar 11, 2014 at 10:11 AM, Aaron Olson aaron.ol...@shopify.com wrote:
Dear Sparkians,
export SPARK_JAVA_OPTS="-Dspark.ui.port=0 -Dspark.default.parallelism=1024 \
  -Dspark.cores.max=256 -Dspark.executor.memory=1500m \
  -Dspark.worker.timeout=500 -Dspark.akka.timeout=500"
Does that value seem low to you?
-Aaron
On Tue, Mar 11, 2014 at 3:08 PM, Sandy Ryza sandy.r
Hi Paul,
What do you mean by distributing the jars manually? If you register jars
that are local to the client with SparkContext.addJars, Spark should handle
distributing them to the workers. Are you taking advantage of this?
-Sandy
On Tue, Mar 11, 2014 at 3:09 PM, Paul Schooss
Hi Sung,
Are you using yarn-standalone mode? Have you specified the --addJars
option with your external jars?
-Sandy
On Wed, Mar 26, 2014 at 1:17 PM, Sung Hwan Chung
coded...@cs.stanford.edu wrote:
Hello, (this is Yarn related)
I'm able to load an external jar and use its classes within
added but can't reference
classes from it. Does this have anything to do with this bug?
http://stackoverflow.com/questions/22457645/when-to-use-spark-classpath-or-sparkcontext-addjar
On Thu, Mar 27, 2014 at 2:57 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
I just tried this in CDH (only
I don't think the YARN default of max 8GB container size is a good
justification for limiting memory per worker. This is a sort of arbitrary
number that came from an era where MapReduce was the main YARN application
and machines generally had less memory. I expect to see this get
configured
Hi Christophe,
Adding the jars to both SPARK_CLASSPATH and ADD_JARS is required. The
former makes them available to the spark-shell driver process, and the
latter tells Spark to make them available to the executor processes running
on the cluster.
-Sandy
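In shell form, that setup might look like the following (the jar path is
hypothetical):

```shell
# The same jar goes in both places: SPARK_CLASSPATH makes it visible to the
# spark-shell driver process, ADD_JARS tells Spark to ship it to the
# executor processes on the cluster.
export SPARK_CLASSPATH=/path/to/my-deps.jar
export ADD_JARS=/path/to/my-deps.jar
./bin/spark-shell
```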
On Wed, Apr 16, 2014 at 9:27 AM,
Hi Gordon,
We recently handled this in SPARK-1064. As of 1.0.0, you'll be able to
pass -Phadoop-provided to Maven and avoid including Hadoop and its
dependencies in the assembly jar.
-Sandy
On Tue, Apr 22, 2014 at 2:43 AM, Gordon Wang gw...@gopivotal.com wrote:
In this page
I currently don't have plans to work on that.
-Sandy
On Apr 22, 2014, at 8:06 PM, Gordon Wang gw...@gopivotal.com wrote:
Thanks I see. Do you guys have plan to port this to sbt?
On Wed, Apr 23, 2014 at 10:24 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
Right, it only works for Maven
in future versions of Spark? I personally always set it to
/dev/null when launching a spark-shell in yarn-client mode.
Thanks again for your time!
Christophe.
On 21/04/2014 19:16, Sandy Ryza wrote:
Hi Christophe,
Adding the jars to both SPARK_CLASSPATH and ADD_JARS is required
Hi Vipul,
Some advantages of using YARN:
* YARN allows you to dynamically share and centrally configure the same
pool of cluster resources between all frameworks that run on YARN. You can
throw your entire cluster at a MapReduce job, then use some of it on an
Impala query and the rest on Spark
Hi Sophia,
Unfortunately, Spark doesn't work against YARN in CDH4. The YARN APIs
changed quite a bit before finally being stabilized in Hadoop 2.2 and CDH5.
Spark on YARN supports Hadoop 0.23.* and Hadoop 2.2+ / CDH5.0+, but does
not support CDH4, which is somewhere in between.
-Sandy
On
Hi Eric,
Have you tried setting the SPARK_WORKER_INSTANCES env variable before
running spark-shell?
http://spark.apache.org/docs/0.9.0/running-on-yarn.html
-Sandy
On Mon, May 19, 2014 at 8:08 AM, Eric Friedman e...@spottedsnake.net wrote:
Hi
I am working with a Cloudera 5 cluster with 192
Hi Jan,
How much memory capacity is configured for each node?
If you go to the ResourceManager web UI, does it indicate any containers are
running?
-Sandy
On May 19, 2014, at 11:43 PM, Jan Holmberg jan.holmb...@perigeum.fi wrote:
Hi,
I'm new to Spark and trying to test my first Spark program.
Hi Ron,
What version are you using? For 0.9, you need to set it outside your code
with the SPARK_YARN_QUEUE environment variable.
-Sandy
On Mon, May 19, 2014 at 9:29 PM, Ron Gonzalez zlgonza...@yahoo.com wrote:
Hi,
How does one submit a spark job to yarn and specify a queue?
The code
Hi Xu,
As crazy as it might sound, this all makes sense.
There are a few different quantities at play here:
* the heap size of the executor (controlled by --executor-memory)
* the amount of memory spark requests from yarn (the heap size plus
384 MB to account for fixed memory costs outside of
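The arithmetic behind those quantities can be sketched as follows (the
helper name is illustrative; 384 MB is the default overhead mentioned
above):

```python
def yarn_request_mb(executor_memory_mb, overhead_mb=384):
    """Memory Spark asks YARN for: the executor heap size plus a fixed
    overhead covering JVM memory costs outside the heap."""
    return executor_memory_mb + overhead_mb

# e.g. a 1024 MB heap turns into a 1408 MB container request
assert yarn_request_mb(1024) == 1408
```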
plan to deploy a different Hadoop cluster(with YARN) only to run Spark.
Is it necessary to deploy YARN with security enabled? Or is it possible to
access data within a secured HDFS from non-security-enabled Spark on YARN?
On Wed, Jul 9, 2014 at 4:19 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Spark still supports the ability to submit jobs programmatically without
shell scripts.
Koert,
The main reason that the unification can't be a part of SparkContext is
that YARN and standalone support deploy modes where the driver runs in a
managed process on the cluster. In this case, the
To add to Ron's answer, this post explains what it means to run Spark
against a YARN cluster, the difference between yarn-client and yarn-cluster
mode, and the reason spark-shell only works in yarn-client mode.
, I just want
there to be that long lived service that I can utilize.
Thanks!
On Wed, Jul 9, 2014 at 11:14 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
To add to Ron's answer, this post explains what it means to run Spark
against a YARN cluster, the difference between yarn-client and yarn
Hi Matthias,
Answers inline.
-Sandy
On Wed, Jul 16, 2014 at 12:21 AM, Matthias Kricke
matthias.kri...@mgm-tp.com wrote:
Hello @ the mailing list,
We think of using spark in one of our projects in a Hadoop cluster. During
evaluation several questions remain which are stated below.
Andrew,
Are you running on a CM-managed cluster? I just checked, and there is a
bug here (fixed in 1.0), but it's avoided by having
yarn.application.classpath defined in your yarn-site.xml.
-Sandy
On Wed, Jul 16, 2014 at 10:02 AM, Sean Owen so...@cloudera.com wrote:
Somewhere in here, you
Hi Ron,
I just checked and this bug is fixed in recent releases of Spark.
-Sandy
On Sun, Jul 13, 2014 at 8:15 PM, Chester Chen ches...@alpinenow.com wrote:
Ron,
Which distribution and Version of Hadoop are you using ?
I just looked at CDH5 ( hadoop-mapreduce-client-core-
Hi Haopu,
Spark will ask HDFS for file block locations and try to assign tasks based
on these.
There is a snag. Spark schedules its tasks inside of executor processes
that stick around for the lifetime of a Spark application. Spark requests
executors before it runs any jobs, i.e. before it has
for your patience!
--
*From:* Sandy Ryza [mailto:sandy.r...@cloudera.com]
*Sent:* July 22, 2014 9:47
*To:* user@spark.apache.org
*Subject:* Re: data locality
This currently only works for YARN. The standalone default is to place an
executor on every node
I haven't had a chance to look at the details of this issue, but we have
seen Spark successfully read Parquet tables created by Impala.
On Tue, Jul 22, 2014 at 10:10 AM, Andre Schumacher andre.sc...@gmail.com
wrote:
Hi,
I don't think anybody has been testing importing of Impala tables
At Cloudera we recommend bundling your application separately from the
Spark libraries. The two biggest reasons are:
* No need to modify your application jar when upgrading or applying a patch.
* When running on YARN, the Spark jar can be cached as a YARN local
resource, meaning it doesn't need
+user list
bcc: dev list
It's definitely possible to implement credit fraud management using Spark.
A good start would be using some of the supervised learning algorithms
that Spark provides in MLLib (logistic regression or linear SVMs).
Spark doesn't have any HMM implementation right now.
Hi Avishek,
As of Spark 1.0, PySpark does in fact run on YARN.
-Sandy
On Fri, Aug 8, 2014 at 12:47 PM, Avishek Saha avishek.s...@gmail.com
wrote:
So I think I have a better idea of the problem now.
The environment is YARN client and IIRC PySpark doesn't run on YARN
cluster.
So my client
with --help for usage help or --verbose for debug output
On 8 August 2014 13:28, Avishek Saha avishek.s...@gmail.com wrote:
You mean YARN cluster, right?
Also, my jobs runs thru all their stages just fine. But the entire
code crashes when I do a saveAsTextFile.
On 8 August 2014 13:24, Sandy
We generally recommend setting yarn.scheduler.maximum-allocation-mb to the
maximum node capacity.
-Sandy
On Fri, Aug 15, 2014 at 11:41 AM, Soumya Simanta soumya.sima...@gmail.com
wrote:
I just checked the YARN config and looks like I need to change this value.
Should be upgraded to 48G (the
Hi,
Do you know what YARN scheduler you're using and what version of YARN? It
seems like this would be caused by YarnClient.getQueueInfo returning null,
though, from browsing the YARN code, I'm not sure how this could happen.
-Sandy
On Fri, Aug 15, 2014 at 11:23 AM, Andrew Or
On closer look, it seems like this can occur if the queue doesn't exist.
Filed https://issues.apache.org/jira/browse/SPARK-3082.
-Sandy
On Sat, Aug 16, 2014 at 12:49 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Hi,
Do you know what YARN scheduler you're using and what version of YARN
Hi Matt,
I checked in the YARN code and I don't see any references to
yarn.resourcemanager.address. Have you made sure that your YARN client
configuration on the node you're launching from contains the right configs?
-Sandy
On Mon, Aug 18, 2014 at 4:07 PM, Matt Narrell matt.narr...@gmail.com
Hi Calvin,
When you say "until all the memory in the cluster is allocated and the job
gets killed", do you know what's going on? Spark apps should never be
killed for requesting / using too many resources. Any associated error
message?
Unfortunately there are no tools currently for tweaking the
Hi Oleg. To run on YARN, simply set master to yarn. The YARN
configuration, located in a yarn-site.xml, determines where to look for the
YARN ResourceManager.
PROCESS_LOCAL is orthogonal to the choice of cluster resource manager. A
task is considered PROCESS_LOCAL when the executor it's running
Hi Praveen,
I believe you are correct. I noticed this a little while ago and had a fix
for it as part of SPARK-1714, but that's been delayed. I'll look into this
a little deeper and file a JIRA.
-Sandy
On Thu, Sep 11, 2014 at 11:44 PM, praveen seluka praveen.sel...@gmail.com
wrote:
Hi all
I'm actually surprised your memory is that high. Spark only allocates
spark.storage.memoryFraction for storing RDDs. This defaults to .6, so 32
GB * .6 * 10 executors should be a total of 192 GB.
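The estimate above can be checked directly:

```python
# spark.storage.memoryFraction defaults to 0.6; the other numbers come
# from the thread above (32 GB heaps, 10 executors).
executor_heap_gb = 32
storage_fraction = 0.6
num_executors = 10

total_cache_gb = executor_heap_gb * storage_fraction * num_executors
assert abs(total_cache_gb - 192) < 1e-9  # matches the 192 GB figure
```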
-Sandy
On Sat, Sep 20, 2014 at 8:21 AM, Soumya Simanta soumya.sima...@gmail.com
wrote:
There 128
Hi Oleg,
Those parameters control the number and size of Spark's daemons on the
cluster. If you're interested in how these daemons relate to each other
and interact with YARN, I wrote a post on this a little while ago -
Hi Raghuveer,
This might be a better question for the cdh-user list or the Hadoop user
list. The Hadoop web interfaces for both the NameNode and ResourceManager
are enabled by default. Is it possible you have a firewall blocking those
ports?
-Sandy
On Wed, Sep 24, 2014 at 9:00 PM, Raghuveer
We're running into an error (below) when trying to read spilled shuffle
data back in.
Has anybody encountered this before / is anybody familiar with what causes
these Kryo UnsupportedOperationExceptions?
any guidance appreciated,
Sandy
---
com.esotericsoftware.kryo.KryoException
(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
I'm guessing those are from after the executors have died their mysterious
death. I'm happy to send you the entire log if you'd like.
Thanks!
On Thu, Oct 2, 2014 at 2:02 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Hi
Hey Jon,
Since you're running on YARN, the Worker shouldn't be involved. Are you
able to go to the YARN ResourceManager web UI and click on "Nodes" in the
top left? Does that node show up in the list? If you click on it, what's
shown under "Total Pmem allocated for Container"?
It also might be
I filed https://issues.apache.org/jira/browse/SPARK-3884 to address this.
-Sandy
On Thu, Oct 9, 2014 at 7:05 AM, Greg Hill greg.h...@rackspace.com wrote:
$MASTER is 'yarn-cluster' in spark-env.sh
spark-submit --driver-memory 12424m --class
org.apache.spark.examples.SparkPi
I'm experiencing some strange behavior with closure serialization that is
totally mind-boggling to me. It appears that two arrays of equal size take
up vastly different amount of space inside closures if they're generated in
different ways.
The basic flow of my app is to run a bunch of tiny
), and an array of those will still have a
pointer to each one, so I'd expect many of them to be more than 80 MB
(which is very close to 1867*5*8).
Matei
On Nov 10, 2014, at 1:01 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
I'm experiencing some strange behavior with closure
if it can help.
2014-11-14 21:36 GMT+02:00 Sandy Ryza sandy.r...@cloudera.com:
Hi Egor,
Is it successful without dynamic allocation? From your log, it looks like
the job is unable to acquire resources from YARN, which could be because
other jobs are using up all the resources.
-Sandy
On Fri
Hey Alan,
Spark's application master will take up 1 core on one of the nodes on the
cluster. This means that that node will only have 31 cores remaining, not
enough to fit your third executor.
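A quick sketch of the accounting (the numbers are hypothetical, chosen to
match the scenario described: 32-core nodes and executors requesting all
32 cores):

```python
cores_per_node = 32
executor_cores = 32
am_cores = 1  # the Spark application master occupies 1 core on one node

# The node hosting the AM has only 31 cores left, so a full-size
# executor no longer fits there.
cores_left_on_am_node = cores_per_node - am_cores
assert cores_left_on_am_node == 31
assert cores_left_on_am_node < executor_cores
```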
-Sandy
On Tue, Nov 18, 2014 at 10:03 AM, Alan Prando a...@scanboo.com.br wrote:
Hi Folks!
I'm
Hi Pala,
Do you have access to your YARN NodeManager logs? Are you able to check
whether they report killing any containers for exceeding memory limits?
-Sandy
On Tue, Nov 18, 2014 at 1:54 PM, Pala M Muthaia mchett...@rocketfuelinc.com
wrote:
Hi,
I am using Spark 1.0.1 on Yarn 2.5, and
While the app is running, you can find logs from the YARN web UI by
navigating to containers through the Nodes link.
After the app has completed, you can use the YARN logs command:
yarn logs -applicationId <your application ID>
-Sandy
On Wed, Nov 19, 2014 at 6:01 PM, innowireless TaeYun Kim
somewhat inconvenient that I must use ‘yarn logs’ rather than using
YARN resource manager web UI after the app has completed (that is, it seems
that the history server is not usable for Spark jobs), but it's OK.
*From:* Sandy Ryza [mailto:sandy.r...@cloudera.com]
*Sent:* Thursday, November 20
Hi Brett,
Are you noticing executors dying? Are you able to check the YARN
NodeManager logs and see whether YARN is killing them for exceeding memory
limits?
-Sandy
On Fri, Nov 21, 2014 at 9:47 AM, Brett Meyer brett.me...@crowdstrike.com
wrote:
I’m running a Python script with spark-submit
Hi Tobias,
One way to find out the number of executors is through
SparkContext#getExecutorMemoryStatus. You can find out the cores per executor by
asking the SparkConf for the spark.executor.cores property, which, if not
set, means 1 for YARN.
-Sandy
On Fri, Nov 21, 2014 at 1:30 AM, Yanbo Liang
I think that actually would not work - yarn-cluster mode expects a specific
deployment path that uses SparkSubmit. Setting master as yarn-client should
work.
-Sandy
On Wed, Nov 26, 2014 at 8:32 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
How about?
- Create a SparkContext
- setMaster as
Hi Tobias,
What version are you using? In some recent versions, we had a couple of
large hardcoded sleeps on the Spark side.
-Sandy
On Fri, Dec 5, 2014 at 11:15 AM, Andrew Or and...@databricks.com wrote:
Hey Tobias,
As you suspect, the reason why it's slow is because the resource manager
this on standalone cluster mode the query finished
in 55s but on YARN, the query was still running 30min later. Would the hard
coded sleeps potentially be in play here?
On Fri, Dec 5, 2014 at 11:23 Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Tobias,
What version are you using? In some recent
is that they should be specified *before*
app jar and app args on spark-submit command line otherwise the app only
gets the default number of containers which is 2.
On Dec 5, 2014 12:22 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Denny,
Those sleeps were only at startup, so if jobs are taking
containers yet. Can you check the RM Web UI at port 8088 to see whether
your application is requesting more resources than the cluster has to offer?
2014-12-05 12:51 GMT-08:00 Sandy Ryza sandy.r...@cloudera.com:
Hey Arun,
The sleeps would only cause maximum like 5 second overhead. The idea
Hey Tobias,
Can you try using the YARN Fair Scheduler and set
yarn.scheduler.fair.continuous-scheduling-enabled to true?
-Sandy
On Sun, Dec 7, 2014 at 5:39 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Hi,
thanks for your responses!
On Sat, Dec 6, 2014 at 4:22 AM, Sandy Ryza sandy.r
Hi yuemeng,
Are you possibly running the Capacity Scheduler with the default resource
calculator?
-Sandy
On Sat, Dec 6, 2014 at 7:29 PM, yuemeng1 yueme...@huawei.com wrote:
Hi, all
When i running an app with this cmd: ./bin/spark-sql --master
yarn-client --num-executors 2
Another thing to be aware of is that YARN will round up containers to the
nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults
to 1024.
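That rounding behavior is easy to sketch (the function name is
illustrative; 1024 is the default mentioned above):

```python
def round_up_mb(request_mb, min_allocation_mb=1024):
    """YARN rounds a container request up to the nearest multiple of
    yarn.scheduler.minimum-allocation-mb (default 1024 MB)."""
    return -(-request_mb // min_allocation_mb) * min_allocation_mb

# a 1500 MB request actually consumes a 2048 MB container
assert round_up_mb(1500) == 2048
assert round_up_mb(2048) == 2048  # exact multiples are unchanged
```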
-Sandy
On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee denny.g@gmail.com wrote:
Got it - thanks!
On Sat, Dec 6, 2014 at 14:56 Arun Ahuja
Hi Tomer,
In yarn-cluster mode, the application has already been submitted to YARN by
the time the SparkContext is created, so it's too late to set the app name
there. I believe giving it with the --name property to spark-submit should
work.
-Sandy
On Thu, Dec 11, 2014 at 10:28 AM, Tomer
Hi Pala,
Spark executors only reserve spark.storage.memoryFraction (default 0.6) of
their spark.executor.memory for caching RDDs. The spark UI displays this
fraction.
spark.executor.memory controls the executor heap size.
spark.yarn.executor.memoryOverhead controls the extra that's tacked on
Hi Jon,
The fix for this is to increase spark.yarn.executor.memoryOverhead to
something greater than its default of 384.
This will increase the gap between the executor's heap size and what it
requests from YARN. It's required because JVMs take up some memory beyond
their heap size.
-Sandy
Do you hit the same errors? Is it now saying your containers are exceeding
~10 GB?
On Fri, Dec 19, 2014 at 11:16 AM, Jon Chase jon.ch...@gmail.com wrote:
I'm actually already running 1.1.1.
I also just tried --conf spark.yarn.executor.memoryOverhead=4096, but no
luck. Still getting
Hi Mukesh,
Based on your spark-submit command, it looks like you're only running with
2 executors on YARN. Also, how many cores does each machine have?
-Sandy
On Mon, Dec 29, 2014 at 4:36 AM, Mukesh Jha me.mukesh@gmail.com wrote:
Hello Experts,
I'm bench-marking Spark on YARN (
with ~15G of ram.
On Mon, Dec 29, 2014 at 11:23 PM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Hi Mukesh,
Based on your spark-submit command, it looks like you're only running
with 2 executors on YARN. Also, how many cores does each machine have?
-Sandy
On Mon, Dec 29, 2014 at 4:36 AM
*oops, I mean are you setting --executor-cores to 8
On Mon, Dec 29, 2014 at 10:15 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Are you setting --num-executors to 8?
On Mon, Dec 29, 2014 at 10:13 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
Sorry Sandy, The command is just for reference
--executor-cores 2 --class com.oracle.ci.CmsgK2H
/homext/lib/MJ-ci-k2h.jar vm.cloud.com:2181/kafka spark-yarn avro 1 5000
On Mon, Dec 29, 2014 at 11:45 PM, Sandy Ryza sandy.r...@cloudera.com
wrote:
*oops, I mean are you setting --executor-cores to 8
On Mon, Dec 29, 2014 at 10:15 AM, Sandy Ryza
Hi Antony,
Unfortunately, all executors for any single Spark application must have the
same amount of memory. It's possible to configure YARN with different
amounts of memory for each host (using
yarn.nodemanager.resource.memory-mb), so other apps might be able to take
advantage of the extra
Also, do you see any lines in the YARN NodeManager logs where it says that
it's killing a container?
-Sandy
On Wed, Feb 4, 2015 at 8:56 AM, Imran Rashid iras...@cloudera.com wrote:
Hi Michael,
judging from the logs, it seems that those tasks are just working a really
long time. If you have
you could oversubscribe a node in terms of CPU
cores if you have memory available.
YMMV
HTH
-Mike
On Jan 30, 2015, at 7:10 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
My answer was based off the specs that Antony mentioned: different amounts
of memory, but 10 cores on all the boxes
and then constantly joining I think will be too slow
for a streaming job.
On Thu, Feb 5, 2015 at 8:06 PM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Hi Jon,
You'll need to put the file on HDFS (or whatever distributed filesystem
you're running on) and load it from there.
-Sandy
On Thu, Feb 5, 2015
Hi Sachin,
In your YARN configuration, either yarn.nodemanager.resource.memory-mb is
1024 on your nodes or yarn.scheduler.maximum-allocation-mb is set to 1024.
If you have more than 1024 MB on each node, you should bump these
properties. Otherwise, you should request fewer resources by setting
https://issues.apache.org/jira/browse/SPARK-5493 currently tracks this.
-Sandy
On Mon, Feb 2, 2015 at 9:37 PM, Zhan Zhang zzh...@hortonworks.com wrote:
I think you can configure Hadoop/Hive to do impersonation. There is no
difference between a secure and an insecure Hadoop cluster when using kinit.
Hi Jon,
You'll need to put the file on HDFS (or whatever distributed filesystem
you're running on) and load it from there.
-Sandy
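For example, pushing the file into HDFS first (the destination path here is
hypothetical):

```shell
# copy the local lookup file into HDFS so every executor can read it
hdfs dfs -put badFullIPs.csv /user/me/badFullIPs.csv
```

The application then loads it with an `hdfs://` path instead of a local one.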
On Thu, Feb 5, 2015 at 3:18 PM, YaoPau jonrgr...@gmail.com wrote:
I have a file badFullIPs.csv of bad IP addresses used for filtering. In
yarn-client mode, I
transformations like ...
[0..3] an integer, [4...20] a String, [21..27] another String and so on.
It's just test code; I'd like to understand what is happening.
2015-02-04 18:57 GMT+01:00 Sandy Ryza sandy.r...@cloudera.com:
Hi Guillermo,
What exactly do you mean by each iteration
.bin
parameters
This is what I executed with different values in num-executors and
executor-memory.
Do you think there are too many executors for those HDDs? Could that
be the reason each executor takes more time?
2015-02-06 9:36 GMT+01:00 Sandy Ryza sandy.r...@cloudera.com
Hi Rishi,
If you look in the Spark UI, have any executors registered?
Are you able to collect a jstack of the driver process?
-Sandy
On Tue, Jan 20, 2015 at 9:07 PM, Rishi Yadav ri...@infoobjects.com wrote:
I am joining two tables as below, the program stalls at below log line
and never
Hi Anders,
I just tried this out and was able to successfully acquire executors. Any
strange log messages or additional color you can provide on your setup?
Does yarn-client mode work?
-Sandy
On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg arp...@spotify.com wrote:
Hi,
Compiled the latest
Hi Zsolt,
spark.executor.memory, spark.executor.cores, and spark.executor.instances
are only honored when launching through spark-submit. Marcelo is working
on a Spark launcher (SPARK-4924) that will enable using these
programmatically.
That's correct that the error comes up when
YarnClusterScheduler: Initial job has not accepted
any resources; check your cluster UI to ensure that workers are registered
and have sufficient memory
On Fri, Feb 6, 2015 at 3:24 PM, Sandy Ryza sandy.r...@cloudera.com
wrote:
You can call collect() to pull in the contents of an RDD into the driver
Hey All,
I've been playing around with the new DataFrame and ML pipelines APIs and
am having trouble accomplishing what seems like should be a fairly basic
task.
I have a DataFrame where each column is a Double. I'd like to turn this
into a DataFrame with a features column and a label column
Hi Koert,
You should be using -Phadoop-2.3 instead of -Phadoop2.3.
-Sandy
On Wed, Feb 18, 2015 at 10:51 AM, Koert Kuipers ko...@tresata.com wrote:
does anyone have the right maven invocation for cdh5 with yarn?
i tried:
$ mvn -Phadoop2.3 -Dhadoop.version=2.5.0-cdh5.2.3 -Pyarn -DskipTests
What version of Java are you using? Core NLP dropped support for Java 7 in
its 3.5.0 release.
Also, the correct command line option is --jars, not --addJars.
On Thu, Feb 12, 2015 at 12:03 PM, Deborah Siegel deborah.sie...@gmail.com
wrote:
Hi Abe,
I'm new to Spark as well, so someone else
)
at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:99)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
/Anders
On Thu, Feb 12, 2015 at 1:33 AM, Sandy Ryza sandy.r
Hi Antony,
If you look in the YARN NodeManager logs, do you see that it's killing the
executors? Or are they crashing for a different reason?
-Sandy
On Tue, Jan 27, 2015 at 12:43 PM, Antony Mayi antonym...@yahoo.com.invalid
wrote:
Hi,
I am using spark.yarn.executor.memoryOverhead=8192 yet
Hi Andrew,
Here's a note from the doc for sequenceFile:
* '''Note:''' Because Hadoop's RecordReader class re-uses the same
Writable object for each
* record, directly caching the returned RDD will create many references
to the same object.
* If you plan to directly cache Hadoop
record rather than holding many in memory at once). The documentation
should be updated.
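The pitfall is easy to reproduce outside Hadoop. A minimal pure-Python
sketch of a reader that re-uses one mutable record object (all names here
are illustrative):

```python
import copy

class Record:
    """Stand-in for a Hadoop Writable that the RecordReader re-uses."""
    def __init__(self):
        self.value = None

def read_records(raw, rec):
    for v in raw:
        rec.value = v  # the reader mutates the SAME object for every record
        yield rec

rec = Record()
# Caching the yielded references keeps many pointers to one object:
cached = list(read_records([1, 2, 3], rec))
assert [r.value for r in cached] == [3, 3, 3]

# Copying each record as it arrives (a map over the RDD, in Spark terms)
# preserves the distinct values:
cached_ok = [copy.copy(r) for r in read_records([1, 2, 3], rec)]
assert [r.value for r in cached_ok] == [1, 2, 3]
```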
On Fri, Jan 30, 2015 at 11:27 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Hi Andrew,
Here's a note from the doc for sequenceFile:
* '''Note:''' Because Hadoop's RecordReader class re-uses
Hi Sven,
What version of Spark are you running? Recent versions have a change that
allows PySpark to share a pool of processes instead of starting a new one
for each task.
-Sandy
On Fri, Jan 23, 2015 at 9:36 AM, Sven Krasser kras...@gmail.com wrote:
Hey all,
I am running into a problem
Hi Tomer,
Are you able to look in your NodeManager logs to see if the NodeManagers
are killing any executors for exceeding memory limits? If you observe
this, you can solve the problem by bumping up
spark.yarn.executor.memoryOverhead.
-Sandy
On Sun, Feb 1, 2015 at 5:28 AM, Tomer Benyamini
Hi Fanilo,
How many cores are you using per executor? Are you aware that you can
combat the "container is running beyond physical memory limits" error by
bumping the spark.yarn.executor.memoryOverhead property?
Also, are you caching the parsed version or the text?
-Sandy
On Wed, Jan 28, 2015 at
Hi Anders,
Have you checked your NodeManager logs to make sure YARN isn't killing
executors for exceeding memory limits?
-Sandy
On Tue, Jan 6, 2015 at 8:20 AM, Anders Arpteg arp...@spotify.com wrote:
Hey,
I have a job that keeps failing if too much data is processed, and I can't
see how to
StreamingContext(sparkConf, Seconds(bucketSecs))
val sc = new SparkContext()
On Tue, Feb 10, 2015 at 1:02 PM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Is the SparkContext you're using the same one that the StreamingContext
wraps? If not, I don't think using two is supported.
-Sandy
On Tue
Hi Arun,
The limit for the YARN user on the cluster nodes should be all that
matters. What version of Spark are you using? If you can turn on
sort-based shuffle it should solve this problem.
-Sandy
On Tue, Feb 10, 2015 at 1:16 PM, Arun Luthra arun.lut...@gmail.com wrote:
Hi,
I'm running
Hi Xuelin,
Spark 1.2 includes a dynamic allocation feature that allows Spark on YARN
to modulate its YARN resource consumption as the demands of the application
grow and shrink. This is somewhat coarser than what you call task-level
resource management. Elasticity comes through allocating and
Hi Mukesh,
Those line numbers in ConverterUtils in the stack trace don't appear to
line up with CDH 5.3:
https://github.com/cloudera/hadoop-common/blob/cdh5-2.5.0_5.3.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java
Is it possible