Are you caching a lot of RDDs? If so, maybe you should unpersist() the
ones that you're not using. Also, if you're on 0.9, make sure
spark.shuffle.spill is enabled (which it is by default). This allows your
application to spill in-memory content to disk if necessary.
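For example, a minimal sketch (the path and RDD name are illustrative):
  val lines = sc.textFile("hdfs://namenode:8020/data.txt").cache()
  lines.count()     // materializes the cache
  lines.unpersist() // frees the cached blocks once you no longer need them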
How much memory are you
, which
outputs information in a somewhat arbitrary format and will be deprecated
soon. If you find this feature useful, you can test it out by building the
master branch of Spark yourself, following the instructions in
https://github.com/apache/spark/pull/42.
Andrew
On Wed, Apr 2, 2014 at 3:39 PM
Yes, please do. :)
On Wed, Apr 2, 2014 at 7:36 PM, weida xu xwd0...@gmail.com wrote:
Hi,
Shall I send my questions to this email address?
Sorry for bothering, and thanks a lot!
Logging inside a map function shouldn't freeze things. The messages
should be logged on the worker logs, since the code is executed on the
executors. If you throw a SparkException, however, it'll be propagated to
the driver after it has failed 4 or more times (by default).
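A minimal sketch (rdd is assumed to exist; println stands in for a real logger):
  val doubled = rdd.map { x =>
    println("processing element " + x) // goes to the executor's stdout log, not the driver console
    x * 2
  }
  doubled.count() // forces evaluation so the messages actually get written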
On Fri, Apr 4, 2014 at
Setting spark.worker.timeout should not help you. What this value means is
that the master checks every 60 seconds whether the workers are still
alive, as the documentation describes. But this value also determines how
often the workers send HEARTBEAT messages to notify the master of their
, Dmitriy Lyubimov dlie...@gmail.comwrote:
On Thu, Apr 10, 2014 at 9:24 AM, Andrew Ash and...@andrewash.com wrote:
The biggest issue I've come across is that the cluster is somewhat
unstable under memory pressure, meaning that if you attempt to
persist an RDD that's too big for memory
, and the persisted RDD doesn't show
up on the UI because it is not the last RDD of this stage. I filed a JIRA
for this here: https://issues.apache.org/jira/browse/SPARK-1538.
Thanks again for reporting this. I will push out a fix shortly.
Andrew
On Tue, Apr 8, 2014 at 1:30 PM, Koert Kuipers ko
independently from an application.
On Sat, Apr 19, 2014 at 7:45 AM, Koert Kuipers ko...@tresata.com wrote:
Got it, makes sense. I am surprised it worked before...
On Apr 18, 2014 9:12 PM, Andrew Or and...@databricks.com wrote:
Hi Koert,
I've tracked down what the bug is. The caveat
Did you build it with SPARK_HIVE=true?
On Thu, Apr 24, 2014 at 7:00 AM, diplomatic Guru
diplomaticg...@gmail.comwrote:
Hi Matei,
I checked out the git repository and built it. However, I'm still getting
the error below. It couldn't find those SQL packages. Please advise.
package
This seems unrelated to not being able to load the native-hadoop library. Is it
failing to connect to ResourceManager? Have you verified that there is an
RM process listening on port 8032 at the specified IP?
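One quick way to check from code, if telnet isn't handy (the hostname is illustrative):
  import java.net.Socket
  new Socket("resourcemanager-host", 8032).close() // throws ConnectException if nothing is listening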
On Tue, May 6, 2014 at 6:25 PM, Sophia sln-1...@163.com wrote:
Hi, everyone,
Not a hack, this is documented here:
http://spark.apache.org/docs/0.9.1/configuration.html, and is in fact the
proper way of setting per-application Spark configurations.
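For example, a minimal sketch (the app name and memory value are illustrative):
  import org.apache.spark.{SparkConf, SparkContext}
  val conf = new SparkConf()
    .setAppName("MyApp")
    .set("spark.executor.memory", "4g")
  val sc = new SparkContext(conf)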
Additionally, you can specify default Spark configurations so you don't
need to manually set it for all applications. If you
executor map to yarn workers or how
the different memory settings interplay, SPARK_MEM vs YARN_WORKER_MEM?
Thanks,
Arun
On Tue, May 20, 2014 at 2:25 PM, Andrew Or and...@databricks.com wrote:
Hi Gaurav and Arun,
Your settings seem reasonable; as long as YARN_CONF_DIR or
HADOOP_CONF_DIR
Hi Sophia,
In yarn-client mode, the node that submits the application can either be
inside or outside of the cluster. This node also hosts the driver
(SparkContext) of the application. All the executors, however, will be
launched on nodes inside the YARN cluster.
Andrew
2014-05-21 18:17 GMT-07
this
behavior.
What are you doing in your application? Do you see any exceptions in the
logs? Have you looked at the worker logs? You can browse through these on
the worker web UI on http://worker-url:8081
Andrew
is deprecated in Spark 1.0. You should
use bin/spark-submit instead. You can find information about its usage on
the docs I linked to you, or simply through the --help option.
Cheers,
Andrew
2014-05-22 11:38 GMT-07:00 Jon Bender jonathan.ben...@gmail.com:
Hey all,
I'm working through the basic
Hi Ibrahim,
If your worker machines only have 8GB of memory, then launching executors
with all the memory will leave no room for system processes. There is no
guideline, but I usually leave around 1GB just to be safe, so
conf.set("spark.executor.memory", "7g")
Andrew
2014-05-22 7:23 GMT-07:00
...@gmail.com:
Andrew,
Brilliant! I built on Java 7 but was still running our cluster on Java 6.
Upgraded the cluster and it worked (with slight tweaks to the args, I
guess the app args come first, then yarn-standalone comes last):
SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0
Hi Roger,
This was due to a bug in the Spark shell code, and is fixed in the latest
master (and RC11). Here is the commit that fixed it:
https://github.com/apache/spark/commit/8edbee7d1b4afc192d97ba192a5526affc464205.
Try it now and it should work. :)
Andrew
2014-05-26 10:35 GMT+02:00 Perttu
, the steps outlined there
are quite useful.
Let me know if you get it working (or not).
Cheers,
Andrew
2014-06-02 17:24 GMT+02:00 Xu (Simon) Chen xche...@gmail.com:
Hi folks,
I have a weird problem when using pyspark with yarn. I started ipython as
follows:
IPYTHON=1 ./pyspark --master
:
https://www.mail-archive.com/reviews@spark.apache.org/msg08223.html
I've tested that zipped modules can at least be imported via
zipimport.
Any ideas?
-Simon
On Mon, Jun 2, 2014 at 11:50 AM, Andrew Or and...@databricks.com
wrote:
Hi Simon,
You shouldn't have
/201406.mbox/%3ccamjob8mr1+ias-sldz_rfrke_na2uubnmhrac4nukqyqnun...@mail.gmail.com%3e
As described in the link, the last resort is to try building your assembly
jar with JAVA_HOME set to Java 6. This usually fixes the problem (more
details in the link provided).
Cheers,
Andrew
2014-06-10 6:35 GMT
Can you try file:/root/spark_log?
2014-06-10 19:22 GMT-07:00 zhen z...@latrobe.edu.au:
I checked the permissions on root and they are the following:
drwxr-xr-x 20 root root 4096 Jun 11 01:05 root
So anyway, I changed to use /tmp/spark_log instead and this time I made
sure
that all
No, I meant pass the path to the history server start script.
2014-06-10 19:33 GMT-07:00 zhen z...@latrobe.edu.au:
Sure here it is:
drwxrwxrwx 2 1000 root 4096 Jun 11 01:05 spark_logs
Zhen
Hi Wang Hao,
This is not removed. We moved it here:
http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
If you're building with SBT, and you don't specify the
SPARK_HADOOP_VERSION, then it defaults to 1.0.4.
Andrew
2014-06-12 6:24 GMT-07:00 Hao Wang wh.s...@gmail.com
Not sure if this is what you're looking for, but have you looked at java's
ProcessBuilder? You can do something like
for (line <- lines) {
  val command = line.split(" ") // You may need to deal with quoted strings
  val process = new ProcessBuilder(command: _*)
    .inheritIO() // redirect output of process to main
    .start()
  process.waitFor()
}
Are you referring to accessing a SparkUI for an application that has
finished? First you need to enable event logging while the application is
still running. In Spark 1.0, you set this by adding a line to
$SPARK_HOME/conf/spark-defaults.conf:
spark.eventLog.enabled true
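A hedged sketch of the programmatic equivalent (conf is your SparkConf):
  val conf = new SparkConf().set("spark.eventLog.enabled", "true")
  // you will likely also want spark.eventLog.dir pointing at a location all nodes can read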
Other than that, the
Standalone-client mode is not officially supported at the moment.
Standalone-cluster and yarn-client modes, however, should work.
For both modes, are you running spark-submit from within the cluster, or
outside of it? If the latter, could you try running it from within the
cluster and
How long does it get stuck for? This is a common sign of the OS thrashing
due to out of memory exceptions. If you keep it running longer, does it
throw an error?
Depending on how large your other RDD is (and your join operation), memory
pressure may or may not be the problem at all. It could be
will be done through spark-submit, so you may miss out on relevant new
features or bug fixes.
Andrew
2014-06-19 7:41 GMT-07:00 Koert Kuipers ko...@tresata.com:
still struggling with SPARK_JAVA_OPTS being deprecated. I am using Spark
standalone.
For example, if I have an akka timeout setting that I
if that does the job.
Andrew
2014-06-19 6:04 GMT-07:00 Praveen Seluka psel...@qubole.com:
I am trying to run Spark on YARN. I have a hadoop 2.2 cluster (YARN +
HDFS) in EC2. Then, I compiled Spark using Maven with 2.2 hadoop profiles.
Now I am trying to run the example Spark job. (In yarn-cluster
(Also, an easier workaround is to simply submit the application from within
your
cluster, thus saving you all the manual labor of reconfiguring everything
to use
public hostnames. This may or may not be applicable to your use case.)
2014-06-19 14:04 GMT-07:00 Andrew Or and...@databricks.com
to the SparkContext (see
http://spark.apache.org/docs/latest/configuration.html#spark-properties).
Andrew
2014-06-18 22:21 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in:
Hi all,
I have a doubt regarding the options in spark-env.sh. I set the following
values in the file in master and 2 workers
Ah never mind. The 0.0.0.0 is for the UI, not for Master, which uses the
output of the hostname command. But yes, long answer short, go to the web
UI and use that URL.
2014-06-23 11:13 GMT-07:00 Andrew Or and...@databricks.com:
Hm, spark://localhost:7077 should work, because the standalone
https://issues.apache.org/jira/browse/SPARK-2260. Thanks for pointing this
out, and we will get to fixing these shortly.
Best,
Andrew
2014-06-20 6:06 GMT-07:00 Gino Bustelo lbust...@gmail.com:
I've found that the jar will be copied to the worker from hdfs fine, but
it is not added to the spark context
Hi Sophia, did you ever resolve this?
A common cause for not giving resources to the job is that the RM cannot
communicate with the workers.
This itself has many possible causes. Do you have a full stack trace from
the logs?
Andrew
2014-06-13 0:46 GMT-07:00 Sophia sln-1...@163.com
RDDs they are most interested
in, so it makes sense to give them control over caching behavior.
Best,
Andrew
2014-06-26 5:36 GMT-07:00 tomsheep...@gmail.com tomsheep...@gmail.com:
Hi all,
I have a newbie question about the StorageLevel of Spark. I came across
these sentences in the Spark documents
Hi Konstatin,
We use hadoop as a library in a few places in Spark. I wonder why the path
includes null though.
Could you provide the full stack trace?
Andrew
2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev
kudryavtsev.konstan...@gmail.com:
Hi all,
I'm trying to run some transformation
executor will also die of the same
problem.
Best,
Andrew
2014-07-02 6:22 GMT-07:00 Yana Kadiyska yana.kadiy...@gmail.com:
Can you elaborate why you need to configure spark.shuffle.spill to
true again in the config -- the default for spark.shuffle.spill is
set to true according to the
doc
your null keys before passing your key-value pairs to a combine operator
(e.g. groupByKey, reduceByKey). For instance,
rdd.map { case (k, v) => if (k == null) (SPECIAL_VALUE, v) else (k, v) }.
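A hedged sketch of the full flow (rdd is assumed to exist; SPECIAL_VALUE is a placeholder you define):
  val SPECIAL_VALUE = "__NULL_KEY__" // hypothetical sentinel that cannot collide with real keys
  val safe = rdd.map { case (k, v) => if (k == null) (SPECIAL_VALUE, v) else (k, v) }
  val combined = safe.reduceByKey(_ + _) // assuming numeric values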
Best,
Andrew
2014-07-02 10:22 GMT-07:00 Konstantin Kudryavtsev
kudryavtsev.konstan...@gmail.com:
Hi all
Hi Christophe, another Andrew speaking.
Your configuration looks fine to me. From the stack trace it seems that we
are in fact closing the file system prematurely elsewhere in the system,
such that when it tries to write the APPLICATION_COMPLETE file it throws
the exception you see. This does
Others have also asked for this on the mailing list, and hence there's a
related JIRA: https://issues.apache.org/jira/browse/SPARK-1762. Ankur
brings up a good point in that any current implementation of in-memory
shuffles will compete with application RDD blocks. I think we should
definitely add
the red text is because it appears only on the driver
containers, not the executor containers. This is because SparkUI belongs to
the SparkContext, which only exists on the driver.
Andrew
2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com:
Hi guys,
Not sure if you have similar issues. Did
the redirect error has
little to do with Spark itself, but more to do with how you set up the
cluster. I have actually run into this myself, but I haven't found a
workaround. Let me know if you find anything.
2014-07-07 12:07 GMT-07:00 Chester Chen ches...@alpinenow.com:
As Andrew explained, the port
Here's the most updated version of the same page:
http://spark.apache.org/docs/latest/job-scheduling
2014-07-08 12:44 GMT-07:00 Sujeet Varakhedi svarakh...@gopivotal.com:
This is a good start:
http://www.eecs.berkeley.edu/~tdas/spark_docs/job-scheduling.html
On Tue, Jul 8, 2014 at 9:11
It seems that your driver (which I'm assuming you launched on the master
node) can now connect to the Master, but your executors cannot. Did you
make sure that all nodes have the same conf/spark-defaults.conf,
conf/spark-env.sh, and conf/slaves? It would be good if you can post the
stderr of the
or the --master parameter to
spark-submit.
We will update the documentation shortly. Thanks for letting us know.
Andrew
2014-07-08 16:29 GMT-07:00 Mikhail Strebkov streb...@gmail.com:
Hi! I've been using Spark compiled from the 1.0 branch at some point (~2 months
ago). The setup is a standalone cluster with 4
I don't see why using SparkSubmit.scala as your entry point would be any
different, because all that does is invoke the main class of Client.scala
(e.g. for Yarn) after setting up all the class paths and configuration
options. (Though I haven't tried this myself)
2014-07-09 9:40 GMT-07:00 Ron
?
Andrew
2014-07-10 10:17 GMT-07:00 Aris Vlasakakis a...@vlasakakis.com:
Thank you very much Yana for replying!
So right now the set up is a single-node machine which is my cluster,
and YES you are right my submitting laptop has a different path to the
spark-1.0.0 installation than the cluster
Yes, there are a few bugs in the UI in the event of a node failure.
The duplicated stages in both the active and completed tables should be
fixed by this PR: https://github.com/apache/spark/pull/1262
The fact that the progress bar on the stages page displays an overflow
(e.g. 5/4) is still an
.mbox/%3cCAMJOb8mYTzxrHWcaDOnVoOTw1TFrd9kJjOyj1=nkgmsk5vs...@mail.gmail.com%3e
Andrew
2014-07-10 1:57 GMT-07:00 cjwang c...@cjwang.us:
Not sure that was what I want. I tried to run Spark Shell on a machine
other
than the master and got the same error. The 192 was supposed to be a
simple
-submit (or
spark-shell, which calls spark-submit) with the --verbose flag.
Let me know if this fixes it. I will get to fixing the root problem soon.
Andrew
2014-07-10 18:43 GMT-07:00 cjwang c...@cjwang.us:
Andrew,
Thanks for replying. I did the following and the result was still the
same.
1
Yes, the documentation is actually a little outdated. We will get around to
fixing it shortly. Please use --driver-cores or --executor-cores instead.
2014-07-14 19:10 GMT-07:00 cjwang c...@cjwang.us:
They don't work in the new 1.0.1 either.
to be some inconsistency or missing pieces in the logs you
posted. After an executor says driver disassociated, what happens in the
driver logs? Is an exception thrown or something?
It would be useful if you could also post your conf/spark-env.sh.
Andrew
2014-07-17 14:11 GMT-07:00 Marcelo Vanzin
thing to
check is whether the node from which you launch spark-submit can access the
internal address of the master (and port 7077). One quick way to verify
that is to attempt a telnet into it.
Let me know if you find anything.
Andrew
2014-07-17 15:57 GMT-07:00 ranjanp piyush_ran...@hotmail.com:
Hi
Hi Chen,
spark.executor.extraJavaOptions is introduced in Spark 1.0, not in Spark
0.9. You need to
export SPARK_JAVA_OPTS="-Dspark.config1=value1 -Dspark.config2=value2"
in conf/spark-env.sh.
Let me know if that works.
Andrew
2014-07-17 18:15 GMT-07:00 Tathagata Das tathagata.das1
HDFS. Try removing
all old jars from your .sparkStaging directory and try again?
Let me know if that does the job,
Andrew
2014-07-16 23:42 GMT-07:00 cmti95035 cmti95...@gmail.com:
They're all the same version. Actually even without the --jars parameter
it
got the same error. Looks like
still work (I just tried this on my own EC2 cluster). By the way,
SPARK_MASTER is actually deprecated. Instead, please use bin/spark-submit
--master [your master].
Andrew
2014-07-16 23:46 GMT-07:00 Akhil Das ak...@sigmoidanalytics.com:
You can try the following in the spark-shell:
1. Run
SPARK_JAVA_OPTS is deprecated as of 1.0)
2014-07-17 21:08 GMT-07:00 Chen Song chen.song...@gmail.com:
Thanks Andrew.
Say that I want to turn on CMS gc for each worker.
All I need to do is add the following line to conf/spark-env.sh on the node
where I submit the application.
-XX
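(A hedged sketch of one way to set executor JVM flags in Spark 1.0+, since SPARK_JAVA_OPTS is deprecated; the GC flag shown is illustrative:)
  val conf = new SparkConf()
    .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC")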
metrics are aggregated over the entire duration of the task (i.e. within
each task you can spill multiple times).
Andrew
2014-07-18 4:09 GMT-07:00 Sébastien Rainville sebastienrainvi...@gmail.com
:
Hi,
in the Spark UI, one of the metrics is shuffle spill (memory). What is
it exactly? Spilling
is deprecated)
- add --master yarn-cluster in your spark-submit command
Another worrying thing is the warning from your logs:
14/07/21 22:38:42 WARN spark.SparkConf: null jar passed to SparkContext
constructor
How are you creating your SparkContext?
Andrew
2014-07-21 7:47 GMT-07:00 Sam Liu
workaround for this issue, but you might try to reduce the
number of concurrently running tasks (partitions) to avoid emitting too
many events. The root cause of the listener queue taking too much time to
process events is recorded in SPARK-2316, which we also intend to fix by
Spark 1.1.
Andrew
.
Andrew
2014-07-21 8:37 GMT-07:00 mrm ma...@skimlinks.com:
Hi,
I am using pyspark and have persisted a list of RDDs within a function, but
I don't have a reference to them anymore. The RDDs are listed in the UI,
under the Storage tab, and they have names associated with them (e.g. 4
line.
Andrew
2014-07-21 10:01 GMT-07:00 Nick R. Katsipoulakis kat...@cs.pitt.edu:
Thank you Abel,
It seems that your advice worked. Even though I receive a message that it
is a deprecated way of defining Spark Memory (the system prompts that I
should set spark.driver.memory), the memory
, it seems that you set your log level to WARN. The cause is most
probably that the cache is not big enough, but setting the log level to
INFO will provide you with more information on the exact sizes that are
being used by the storage and the blocks.
Andrew
2014-07-19 13:01 GMT-07:00 rindra
Hi Earthson,
Is your problem resolved? The way you submit your application looks alright
to me; spark-submit should be able to parse the combination of --master and
--deploy-mode correctly. I suspect you might have hard-coded yarn-cluster
or something in your application.
Andrew
2014-07-22 1
/ephemeral-hdfs.
Andrew
2014-07-22 7:07 GMT-07:00 mrm ma...@skimlinks.com:
Hi,
Where can I find the version of Hadoop my cluster is using? I launched my
ec2 cluster using the spark-ec2 script with the --hadoop-major-version=2
option. However, the folder hadoop-native/lib in the master node only
driver.
Andrew
2014-07-23 10:40 GMT-07:00 didi did...@gmail.com:
Hi all
I guess the problem has something to do with the fact that I submit the job
to a remote location.
I submit from OracleVM running Ubuntu and suspect some NAT issues, maybe?
akka tcp tries this address as follows from the STDERR
Hi Eric,
Have you checked the executor logs? It is possible they died because of
some exception, and the message you see is just a side effect.
Andrew
2014-07-23 8:27 GMT-07:00 Eric Friedman eric.d.fried...@gmail.com:
I'm using spark 1.0.1 on a quite large cluster, with gobs of memory, etc
in spark
should not be done through any config or environment variable that
references java opts.
Andrew
2014-07-23 1:04 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in:
Hi all,
Sorry for bringing this topic up again; I am still confused about it.
I set SPARK_DAEMON_JAVA_OPTS=-XX:+UseCompressedOops
-submit. The equivalent also holds for executor memory (i.e.
--executor-memory). That way you don't have to wrangle with the millions
of overlapping configs / environment variables for all the deploy modes.
-Andrew
2014-07-23 4:18 GMT-07:00 mrm ma...@skimlinks.com:
Hi,
I figured out my problem
Hi Andrew,
It's definitely not bad practice to use spark-shell with HistoryServer. The
issue here is not with spark-shell, but the way we pass Spark configs to
the application. spark-defaults.conf does not currently support embedding
environment variables, but instead interprets everything
Yes, both of these are derived from the same source, and this source
includes the driver. In other words, if you submit a job with 10 executors
you will get back 11 for both statuses.
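A minimal sketch of what that looks like:
  val statuses = sc.getExecutorStorageStatus
  println("entries (executors + driver): " + statuses.length) // 11 for a job with 10 executors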
2014-07-28 15:40 GMT-07:00 Sung Hwan Chung coded...@cs.stanford.edu:
Do getExecutorStorageStatus and
They are found in the executors' logs (not the worker's). In general, all
code inside foreach or map etc. are executed on the executors. You can find
these either through the Master UI (under Running Applications) or manually
on the worker machines (under $SPARK_HOME/work).
-Andrew
2014-07-30
of UI fixes
since 1.0. Could you check if this is still a problem on the latest master:
https://github.com/apache/spark
Andrew
2014-08-04 12:10 GMT-07:00 anthonyjschu...@gmail.com
anthonyjschu...@gmail.com:
I am (not) seeing this also... No items in the storage UI page. Using 1.0
with HDFS
they can
still be kicked out by LRU, however.
-Andrew
2014-08-05 0:13 GMT-07:00 Akhil Das ak...@sigmoidanalytics.com:
You need to persist or cache those RDDs for them to appear under Storage.
Unless you do, those RDDs will be recomputed.
Thanks
Best Regards
On Tue, Aug 5, 2014 at 8:03 AM
in your conf won't actually do anything for you. Instead, you need to run
spark-submit as follows
bin/spark-submit --driver-memory 2g --class your.class.here app.jar
This will start the JVM with 2G instead of the default 512M.
-Andrew
2014-08-05 6:43 GMT-07:00 Grzegorz Białek grzegorz.bia
(Clarification: you'll need to pass in --driver-memory not just for local
mode, but for any application you're launching with client deploy mode)
2014-08-05 9:24 GMT-07:00 Andrew Or and...@databricks.com:
Hi Grzegorz,
For local mode you only have one executor, and this executor is your
-hdfs/conf. (Are you running HdfsWordCount by any chance?)
As Mayur mentioned, a good way to see whether or not there is any service
listening on port 9000 is telnet.
Andrew
2014-08-05 15:01 GMT-07:00 Mayur Rustagi mayur.rust...@gmail.com:
Then don't specify hdfs when you read the file.
Also
not using the EC2 scripts, you will have to rsync the directory manually
(copy-dir just calls rsync internally).
-Andrew
2014-08-06 2:39 GMT-07:00 Akhil Das ak...@sigmoidanalytics.com:
Looks like a netty conflict there; most likely you have multiple
versions of netty jars (e.g.:
netty-3.6.6
/09f7e4587bbdf74207d2629e8c1314f93d865999.
This will be available in Spark 1.1, but for now you will have to open all
ports among the nodes in your cluster.
-Andrew
2014-08-06 10:23 GMT-07:00 durin m...@simon-schaefer.net:
Update: I can get it to work by disabling iptables temporarily. I can
in Spark 1.1.
-Andrew
2014-08-06 8:29 GMT-07:00 Gary Malouf malouf.g...@gmail.com:
We have Spark 1.0.1 on Mesos deployed as a cluster in EC2. Our Devops
lead tells me that Spark jobs can not be submitted from local machines due
to the complexity of opening the right ports to the world etc
(there is a button), though this is not specific to
standalone mode.
There is currently a lot of trust between the standalone master and the
application. Maybe this is not always a good thing. :)
-Andrew
2014-08-06 12:23 GMT-07:00 Gary Malouf malouf.g...@gmail.com:
I have a few questions
The Spark UI isn't available through the same address; otherwise new
applications won't be able to bind to it. Once the old application
finishes, the standalone Master renders the after-the-fact application UI
and exposes it under a different URL. To see this, go to the Master UI
(master-url:8080)
To add to the pile of information we're asking you to provide, what version
of Spark are you running?
2014-08-13 11:11 GMT-07:00 Shivaram Venkataraman shiva...@eecs.berkeley.edu
:
If the JVM heap size is close to the memory limit the OS sometimes kills
the process under memory pressure. I've
mechanism, your executors will quickly run out of
memory with the default of 512m.
Let me know if setting this does the job. If so, you can even persist the
RDDs to memory as well to get better performance, though this depends on
your workload.
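A hedged sketch, using a storage level that spills to disk rather than recomputing:
  import org.apache.spark.storage.StorageLevel
  rdd.persist(StorageLevel.MEMORY_AND_DISK) // rdd is assumed to exist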
-Andrew
2014-08-13 11:38 GMT-07:00 rpandya r
of setting this is by adding the line spark.eventLog.enabled true to
$SPARK_HOME/conf/spark-defaults.conf. This will be picked up by Spark
submit and passed to your application.
Cheers,
Andrew
2014-08-14 15:45 GMT-07:00 durin m...@simon-schaefer.net:
If I'm not misunderstanding you, setting event
:7077). In other modes, you will need to use the
history server instead.
Does this make sense?
Andrew
2014-08-14 18:08 GMT-07:00 SK skrishna...@gmail.com:
More specifically, as indicated by Patrick above, in 1.0+, apps will have
persistent state so that the UI can be reloaded. Is there a way
/spark-defaults.conf for you automatically
so you don't have to specify it each time on the command line. Of course,
you can also do the same in YARN.
-Andrew
2014-08-15 10:45 GMT-07:00 Soumya Simanta soumya.sima...@gmail.com:
I've been using the standalone cluster all this time and it worked fine
Hi 齐忠,
Thanks for reporting this. You're correct that the default deploy mode is
client. However, this seems to be a bug in the YARN integration code; we
should not throw null pointer exception in any case. What version of Spark
are you using?
Andrew
2014-08-15 0:23 GMT-07:00 centerqi hu cente
to just the summary statistics under Completed Applications. I have
listed a few debugging steps in the paragraph above, so maybe they're also
applicable to you.
Let me know if that works,
Andrew
2014-08-15 11:07 GMT-07:00 SK skrishna...@gmail.com:
Hi,
Ok, I was specifying --master local. I
The --master should override any other ways of setting the Spark
master.
Ah yes, actually you can set spark.master directly in your application
through SparkConf. Thanks Marcelo.
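(A minimal sketch; the master URL is illustrative:)
  val conf = new SparkConf().setMaster("spark://master-host:7077").setAppName("MyApp")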
2014-08-19 14:47 GMT-07:00 Marcelo Vanzin van...@cloudera.com:
On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja
This should be fixed in the latest Spark. What branch are you running?
2014-08-25 1:32 GMT-07:00 Wang, Jensen jensen.w...@sap.com:
Hi, All
When I run Spark applications, I see from the web UI that some
stage descriptions are like “apply at Option.scala:120”.
Why Spark splits a
Hi Cheng,
You specify extra python files through --py-files. For example:
bin/spark-submit [your other options] --py-files helper.py main_app.py
-Andrew
2014-08-27 22:58 GMT-07:00 Chengi Liu chengi.liu...@gmail.com:
Hi,
I have two files..
main_app.py and helper.py
main_app.py calls
No, not currently.
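A hedged workaround sketch is to resolve the variable in application code instead (conf is an existing SparkConf; the config key is illustrative):
  conf.set("spark.local.dir", sys.props("user.home") + "/spark-local")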
2014-09-01 2:53 GMT-07:00 Zhanfeng Huo huozhanf...@gmail.com:
Hi, all:
Can value in spark-defaults.conf support system variables?
Such as mess = ${user.home}/${user.name}.
Best Regards
--
Zhanfeng Huo
somewhat puzzled as to how you ran into an OOM from this configuration,
however. Does this problem still occur if you set the correct master?
-Andrew
2014-09-02 2:42 GMT-07:00 Oleg Ruchovets oruchov...@gmail.com:
Hi,
I've installed pyspark on an HDP Hortonworks cluster.
Executing pi example
, it is unlikely to fully fit in memory anyway, so it's probably not a
bad idea to just write your results to a file in batches while the
application is still running.
-Andrew
2014-09-01 22:16 GMT-07:00 Hao Wang wh.s...@gmail.com:
Hi, all
I am wondering if I use Spark-shell to scan a large file
properties
$ export SPARK_YARN_USER_ENV="YARN_LOCAL_DIR=/mnt,/mnt2"
$ bin/spark-shell --master yarn --jars /local/path/to/my/jar1,/another/jar2
Best,
-Andrew
on the
submitter node.
Let me know if you have more questions,
-Andrew
2014-09-02 15:12 GMT-07:00 Dimension Data, LLC. subscripti...@didata.us:
Hello friends:
I have a follow-up to Andrew's well articulated answer below (thank you
for that).
(1) I've seen both of these invocations
Hi Greg,
For future references you can set spark.history.ui.port in
SPARK_HISTORY_OPTS. By default this should point to 18080. This information
is actually in the link that you provided :) (as well as the most updated
docs here: http://spark.apache.org/docs/latest/monitoring.html)
-Andrew
2014
...@mail.gmail.com%3e
Let me know if you can get it working,
-Andrew
2014-09-03 5:03 GMT-07:00 Oleg Ruchovets oruchov...@gmail.com:
Hi all.
I have been trying to run pyspark on yarn for a couple of days already:
http://hortonworks.com/kb/spark-1-0-1-technical-preview-hdp-2-1-3/
I posted exception