Re: GC overhead limit exceeded

2014-03-27 Thread Andrew Or
Are you caching a lot of RDDs? If so, maybe you should unpersist() the ones that you're not using. Also, if you're on 0.9, make sure spark.shuffle.spill is enabled (which it is by default). This allows your application to spill in-memory content to disk if necessary. How much memory are you
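A minimal sketch of the unpersist() pattern and the 0.9 spill setting described above, assuming an existing SparkContext sc (names and paths are illustrative):

    // Cache an RDD for repeated use, then free it once it is no longer needed
    val lookup = sc.textFile("hdfs:///data/lookup").cache()
    val total  = lookup.map(_.length).reduce(_ + _)
    lookup.unpersist()   // drops the cached blocks from executor memory

    // On 0.9, shuffle spilling is governed by this property (enabled by default)
    val conf = new org.apache.spark.SparkConf().set("spark.shuffle.spill", "true")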

Re: Is there a way to get the current progress of the job?

2014-04-02 Thread Andrew Or
, which outputs information in a somewhat arbitrary format and will be deprecated soon. If you find this feature useful, you can test it out by building the master branch of Spark yourself, following the instructions in https://github.com/apache/spark/pull/42. Andrew On Wed, Apr 2, 2014 at 3:39 PM

Re: How to ask questions on Spark usage?

2014-04-02 Thread Andrew Or
Yes, please do. :) On Wed, Apr 2, 2014 at 7:36 PM, weida xu xwd0...@gmail.com wrote: Hi, Shall I send my questions to this Email address? Sorry for bothering, and thanks a lot!

Re: How are exceptions in map functions handled in Spark?

2014-04-04 Thread Andrew Or
Logging inside a map function shouldn't freeze things. The messages should be logged on the worker logs, since the code is executed on the executors. If you throw a SparkException, however, it'll be propagated to the driver after it has failed 4 or more times (by default). On Fri, Apr 4, 2014 at

Re: Heartbeat exceeds

2014-04-05 Thread Andrew Or
Setting spark.worker.timeout should not help you. What this value means is that the master checks every 60 seconds whether the workers are still alive, as the documentation describes. But this value also determines how often the workers send HEARTBEAT messages to notify the master of their
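For reference, spark.worker.timeout is listed in the standalone-mode docs; a hedged example of raising it via the master's JVM options in conf/spark-env.sh (120 is an illustrative value; the default is 60 seconds):

    # conf/spark-env.sh on the standalone master
    export SPARK_MASTER_OPTS="-Dspark.worker.timeout=120"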

Re: Spark - ready for prime time?

2014-04-10 Thread Andrew Or
, Dmitriy Lyubimov dlie...@gmail.com wrote: On Thu, Apr 10, 2014 at 9:24 AM, Andrew Ash and...@andrewash.com wrote: The biggest issue I've come across is that the cluster is somewhat unstable when under memory pressure. Meaning that if you attempt to persist an RDD that's too big for memory

Re: ui broken in latest 1.0.0

2014-04-18 Thread Andrew Or
, and the persisted RDD doesn't show up on the UI because it is not the last RDD of this stage. I filed a JIRA for this here: https://issues.apache.org/jira/browse/SPARK-1538. Thanks again for reporting this. I will push out a fix shortly. Andrew On Tue, Apr 8, 2014 at 1:30 PM, Koert Kuipers ko

Re: ui broken in latest 1.0.0

2014-04-19 Thread Andrew Or
independently from an application. On Sat, Apr 19, 2014 at 7:45 AM, Koert Kuipers ko...@tresata.com wrote: got it. makes sense. i am surprised it worked before... On Apr 18, 2014 9:12 PM, Andrew Or and...@databricks.com wrote: Hi Koert, I've tracked down what the bug is. The caveat

Re: How do I access the SPARK SQL

2014-04-24 Thread Andrew Or
Did you build it with SPARK_HIVE=true? On Thu, Apr 24, 2014 at 7:00 AM, diplomatic Guru diplomaticg...@gmail.com wrote: Hi Matei, I checked out the git repository and built it. However, I'm still getting the below error. It couldn't find those SQL packages. Please advise. package

Re: Unable to load native-hadoop library problem

2014-05-15 Thread Andrew Or
This seems unrelated to not being able to load the native-hadoop library. Is it failing to connect to the ResourceManager? Have you verified that there is an RM process listening on port 8032 at the specified IP? On Tue, May 6, 2014 at 6:25 PM, Sophia sln-1...@163.com wrote: Hi, everyone,

Re: How to pass config variables to workers

2014-05-16 Thread Andrew Or
Not a hack, this is documented here: http://spark.apache.org/docs/0.9.1/configuration.html, and is in fact the proper way of setting per-application Spark configurations. Additionally, you can specify default Spark configurations so you don't need to manually set it for all applications. If you
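A sketch of a per-application configuration set through SparkConf, as the linked 0.9.1 docs describe (values are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp")
      .set("spark.executor.memory", "4g")   // per-application setting
    val sc = new SparkContext(conf)

In Spark 1.0+, the application-wide defaults mentioned above can instead go in conf/spark-defaults.conf, one property per line (e.g. spark.executor.memory 4g).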

Re: Yarn configuration file doesn't work when run with yarn-client mode

2014-05-20 Thread Andrew Or
executor map to yarn workers or how the different memory settings interplay, SPARK_MEM vs YARN_WORKER_MEM? Thanks, Arun On Tue, May 20, 2014 at 2:25 PM, Andrew Or and...@databricks.com wrote: Hi Gaurav and Arun, Your settings seem reasonable; as long as YARN_CONF_DIR or HADOOP_CONF_DIR

Re: yarn-client mode question

2014-05-21 Thread Andrew Or
Hi Sophia, In yarn-client mode, the node that submits the application can either be inside or outside of the cluster. This node also hosts the driver (SparkContext) of the application. All the executors, however, will be launched on nodes inside the YARN cluster. Andrew 2014-05-21 18:17 GMT-07

Re: SparkContext#stop

2014-05-22 Thread Andrew Or
this behavior. What are you doing in your application? Do you see any exceptions in the logs? Have you looked at the worker logs? You can browse through these on the worker web UI on http://worker-url:8081 Andrew

Re: Spark / YARN classpath issues

2014-05-22 Thread Andrew Or
is deprecated in Spark 1.0. You should use bin/spark-submit instead. You can find information about its usage on the docs I linked to you, or simply through the --help option. Cheers, Andrew 2014-05-22 11:38 GMT-07:00 Jon Bender jonathan.ben...@gmail.com: Hey all, I'm working through the basic

Re: spark setting maximum available memory

2014-05-22 Thread Andrew Or
Hi Ibrahim, If your worker machines only have 8GB of memory, then launching executors with all the memory will leave no room for system processes. There is no guideline, but I usually leave around 1GB just to be safe, so conf.set("spark.executor.memory", "7g") Andrew 2014-05-22 7:23 GMT-07:00

Re: Spark / YARN classpath issues

2014-05-22 Thread Andrew Or
...@gmail.com: Andrew, Brilliant! I built on Java 7 but was still running our cluster on Java 6. Upgraded the cluster and it worked (with slight tweaks to the args, I guess the app args come first then yarn-standalone comes last): SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0

Re: Running a spark-submit compatible app in spark-shell

2014-05-26 Thread Andrew Or
Hi Roger, This was due to a bug in the Spark shell code, and is fixed in the latest master (and RC11). Here is the commit that fixed it: https://github.com/apache/spark/commit/8edbee7d1b4afc192d97ba192a5526affc464205. Try it now and it should work. :) Andrew 2014-05-26 10:35 GMT+02:00 Perttu

Re: pyspark problems on yarn (job not parallelized, and Py4JJavaError)

2014-06-02 Thread Andrew Or
, the steps outlined there are quite useful. Let me know if you get it working (or not). Cheers, Andrew 2014-06-02 17:24 GMT+02:00 Xu (Simon) Chen xche...@gmail.com: Hi folks, I have a weird problem when using pyspark with yarn. I started ipython as follows: IPYTHON=1 ./pyspark --master

Re: pyspark problems on yarn (job not parallelized, and Py4JJavaError)

2014-06-03 Thread Andrew Or
: https://www.mail-archive.com/reviews@spark.apache.org/msg08223.html I've tested that zipped modules can at least be imported via zipimport. Any ideas? -Simon On Mon, Jun 2, 2014 at 11:50 AM, Andrew Or and...@databricks.com wrote: Hi Simon, You shouldn't have

Re: Can't find pyspark when using PySpark on YARN

2014-06-10 Thread Andrew Or
/201406.mbox/%3ccamjob8mr1+ias-sldz_rfrke_na2uubnmhrac4nukqyqnun...@mail.gmail.com%3e As described in the link, the last resort is to try building your assembly jar with JAVA_HOME set to Java 6. This usually fixes the problem (more details in the link provided). Cheers, Andrew 2014-06-10 6:35 GMT

Re: problem starting the history server on EC2

2014-06-10 Thread Andrew Or
Can you try file:/root/spark_log? 2014-06-10 19:22 GMT-07:00 zhen z...@latrobe.edu.au: I checked the permission on root and it is the following: drwxr-xr-x 20 root root 4096 Jun 11 01:05 root So anyway, I changed to use /tmp/spark_log instead and this time I made sure that all

Re: problem starting the history server on EC2

2014-06-10 Thread Andrew Or
No, I meant pass the path to the history server start script. 2014-06-10 19:33 GMT-07:00 zhen z...@latrobe.edu.au: Sure here it is: drwxrwxrwx 2 1000 root 4096 Jun 11 01:05 spark_logs Zhen -- View this message in context:

Re: Spark 1.0.0 Standalone AppClient cannot connect Master

2014-06-12 Thread Andrew Or
Hi Wang Hao, This is not removed. We moved it here: http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html If you're building with SBT, and you don't specify the SPARK_HADOOP_VERSION, then it defaults to 1.0.4. Andrew 2014-06-12 6:24 GMT-07:00 Hao Wang wh.s...@gmail.com

Re: use spark-shell in the source

2014-06-12 Thread Andrew Or
Not sure if this is what you're looking for, but have you looked at Java's ProcessBuilder? You can do something like for (line <- lines) { val command = line.split(" ") // You may need to deal with quoted strings val process = new ProcessBuilder(command) // redirect output of process to main
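A fuller, runnable version of that sketch, assuming the commands live one per line in a file (the file name is illustrative, and the naive split still won't handle quoted arguments):

    import scala.collection.JavaConverters._
    import scala.io.Source

    for (line <- Source.fromFile("commands.txt").getLines()) {
      val command = line.split(" ").toSeq     // naive tokenization
      val process = new ProcessBuilder(command.asJava)
        .inheritIO()                          // redirect child stdout/stderr to ours (Java 7+)
        .start()
      println(s"'$line' exited with ${process.waitFor()}")
    }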

Re: spark master UI does not keep detailed application history

2014-06-16 Thread Andrew Or
Are you referring to accessing a SparkUI for an application that has finished? First you need to enable event logging while the application is still running. In Spark 1.0, you set this by adding a line to $SPARK_HOME/conf/spark-defaults.conf: spark.eventLog.enabled true Other than that, the
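The corresponding lines would look like this (the directory is illustrative; if spark.eventLog.dir is unset, logs go to /tmp/spark-events):

    # $SPARK_HOME/conf/spark-defaults.conf
    spark.eventLog.enabled  true
    spark.eventLog.dir      hdfs:///spark-events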

Re: Yarn-client mode and standalone-client mode hang during job start

2014-06-17 Thread Andrew Or
Standalone-client mode is not officially supported at the moment. For standalone-cluster and yarn-client modes, however, they should work. For both modes, are you running spark-submit from within the cluster, or outside of it? If the latter, could you try running it from within the cluster and

Re: join operation is taking too much time

2014-06-17 Thread Andrew Or
How long does it get stuck for? This is a common sign of the OS thrashing from running out of memory. If you keep it running longer, does it throw an error? Depending on how large your other RDD is (and your join operation), memory pressure may or may not be the problem at all. It could be

Re: little confused about SPARK_JAVA_OPTS alternatives

2014-06-19 Thread Andrew Or
will be done through spark-submit, so you may miss out on relevant new features or bug fixes. Andrew 2014-06-19 7:41 GMT-07:00 Koert Kuipers ko...@tresata.com: still struggling with SPARK_JAVA_OPTS being deprecated. i am using spark standalone. for example if i have a akka timeout setting that i

Re: Getting started : Spark on YARN issue

2014-06-19 Thread Andrew Or
if that does the job. Andrew 2014-06-19 6:04 GMT-07:00 Praveen Seluka psel...@qubole.com: I am trying to run Spark on YARN. I have a hadoop 2.2 cluster (YARN + HDFS) in EC2. Then, I compiled Spark using Maven with 2.2 hadoop profiles. Now am trying to run the example Spark job . (In Yarn-cluster

Re: Getting started : Spark on YARN issue

2014-06-19 Thread Andrew Or
(Also, an easier workaround is to simply submit the application from within your cluster, thus saving you all the manual labor of reconfiguring everything to use public hostnames. This may or may not be applicable to your use case.) 2014-06-19 14:04 GMT-07:00 Andrew Or and...@databricks.com

Re: options set in spark-env.sh is not reflecting on actual execution

2014-06-20 Thread Andrew Or
to the SparkContext (see http://spark.apache.org/docs/latest/configuration.html#spark-properties). Andrew 2014-06-18 22:21 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in: Hi all, I have a doubt regarding the options in spark-env.sh. I set the following values in the file in master and 2 workers

Re: hi

2014-06-23 Thread Andrew Or
Ah never mind. The 0.0.0.0 is for the UI, not for Master, which uses the output of the hostname command. But yes, long answer short, go to the web UI and use that URL. 2014-06-23 11:13 GMT-07:00 Andrew Or and...@databricks.com: Hm, spark://localhost:7077 should work, because the standalone

Re: problem about cluster mode of spark 1.0.0

2014-06-24 Thread Andrew Or
://issues.apache.org/jira/browse/SPARK-2260. Thanks for pointing this out, and we will get to fixing these shortly. Best, Andrew 2014-06-20 6:06 GMT-07:00 Gino Bustelo lbust...@gmail.com: I've found that the jar will be copied to the worker from hdfs fine, but it is not added to the spark context

Re: Spark 1.0.0 on yarn cluster problem

2014-06-25 Thread Andrew Or
Hi Sophia, did you ever resolve this? A common cause for not giving resources to the job is that the RM cannot communicate with the workers. This itself has many possible causes. Do you have a full stack trace from the logs? Andrew 2014-06-13 0:46 GMT-07:00 Sophia sln-1...@163.com

Re: About StorageLevel

2014-06-26 Thread Andrew Or
RDDs they are most interested in, so it makes sense to give them control over caching behavior. Best, Andrew 2014-06-26 5:36 GMT-07:00 tomsheep...@gmail.com tomsheep...@gmail.com: Hi all, I have a newbie question about StorageLevel of spark. I came up with these sentences in spark documents

Re: Run spark unit test on Windows 7

2014-07-02 Thread Andrew Or
Hi Konstantin, We use Hadoop as a library in a few places in Spark. I wonder why the path includes null though. Could you provide the full stack trace? Andrew 2014-07-02 9:38 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all, I'm trying to run some transformation

Re: Help alleviating OOM errors

2014-07-02 Thread Andrew Or
executor will also die of the same problem. Best, Andrew 2014-07-02 6:22 GMT-07:00 Yana Kadiyska yana.kadiy...@gmail.com: Can you elaborate why You need to configure the spark.shuffle.spill true again in the config -- the default for spark.shuffle.spill is set to true according to the doc

Re: NullPointerException on ExternalAppendOnlyMap

2014-07-02 Thread Andrew Or
your null keys before passing your key-value pairs to a combine operator (e.g. groupByKey, reduceByKey). For instance, rdd.map { case (k, v) => if (k == null) (SPECIAL_VALUE, v) else (k, v) }. Best, Andrew 2014-07-02 10:22 GMT-07:00 Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com: Hi all
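A self-contained version of that workaround, assuming rdd is an RDD[(String, Int)] and SPECIAL_VALUE is a sentinel of your choosing:

    val SPECIAL_VALUE = "__NULL_KEY__"   // placeholder; pick something not in your data
    val safe = rdd.map { case (k, v) => if (k == null) (SPECIAL_VALUE, v) else (k, v) }
    val reduced = safe.reduceByKey(_ + _)   // the combine step now sees no null keys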

Re: write event logs with YARN

2014-07-03 Thread Andrew Or
Hi Christophe, another Andrew speaking. Your configuration looks fine to me. From the stack trace it seems that we are in fact closing the file system prematurely elsewhere in the system, such that when it tries to write the APPLICATION_COMPLETE file it throws the exception you see. This does

Re: tiers of caching

2014-07-07 Thread Andrew Or
Others have also asked for this on the mailing list, and hence there's a related JIRA: https://issues.apache.org/jira/browse/SPARK-1762. Ankur brings up a good point in that any current implementation of in-memory shuffles will compete with application RDD blocks. I think we should definitely add

Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Andrew Or
the red text is because it appears only on the driver containers, not the executor containers. This is because SparkUI belongs to the SparkContext, which only exists on the driver. Andrew 2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com: Hi guys, Not sure if you have similar issues. Did

Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Andrew Or
the redirect error has little to do with Spark itself, but more to do with how you set up the cluster. I have actually run into this myself, but I haven't found a workaround. Let me know if you find anything. 2014-07-07 12:07 GMT-07:00 Chester Chen ches...@alpinenow.com: As Andrew explained, the port

Re: Scheduling in spark

2014-07-08 Thread Andrew Or
Here's the most updated version of the same page: http://spark.apache.org/docs/latest/job-scheduling 2014-07-08 12:44 GMT-07:00 Sujeet Varakhedi svarakh...@gopivotal.com: This is a good start: http://www.eecs.berkeley.edu/~tdas/spark_docs/job-scheduling.html On Tue, Jul 8, 2014 at 9:11

Re: Spark: All masters are unresponsive!

2014-07-08 Thread Andrew Or
It seems that your driver (which I'm assuming you launched on the master node) can now connect to the Master, but your executors cannot. Did you make sure that all nodes have the same conf/spark-defaults.conf, conf/spark-env.sh, and conf/slaves? It would be good if you can post the stderr of the

Re: issues with ./bin/spark-shell for standalone mode

2014-07-08 Thread Andrew Or
or the --master parameter to spark-submit. We will update the documentation shortly. Thanks for letting us know. Andrew 2014-07-08 16:29 GMT-07:00 Mikhail Strebkov streb...@gmail.com: Hi! I've been using Spark compiled from 1.0 branch at some point (~2 month ago). The setup is a standalone cluster with 4

Re: Purpose of spark-submit?

2014-07-09 Thread Andrew Or
I don't see why using SparkSubmit.scala as your entry point would be any different, because all that does is invoke the main class of Client.scala (e.g. for Yarn) after setting up all the class paths and configuration options. (Though I haven't tried this myself) 2014-07-09 9:40 GMT-07:00 Ron

Re: Cannot submit to a Spark Application to a remote cluster Spark 1.0

2014-07-10 Thread Andrew Or
? Andrew 2014-07-10 10:17 GMT-07:00 Aris Vlasakakis a...@vlasakakis.com: Thank you very much Yana for replying! So right now the set up is a single-node machine which is my cluster, and YES you are right my submitting laptop has a different path to the spark-1.0.0 installation than the cluster

Re: All of the tasks have been completed but the Stage is still shown as Active?

2014-07-10 Thread Andrew Or
Yes, there are a few bugs in the UI in the event of a node failure. The duplicated stages in both the active and completed tables should be fixed by this PR: https://github.com/apache/spark/pull/1262 The fact that the progress bar on the stages page displays an overflow (e.g. 5/4) is still an

Re: executor failed, cannot find compute-classpath.sh

2014-07-10 Thread Andrew Or
.mbox/%3cCAMJOb8mYTzxrHWcaDOnVoOTw1TFrd9kJjOyj1=nkgmsk5vs...@mail.gmail.com%3e Andrew 2014-07-10 1:57 GMT-07:00 cjwang c...@cjwang.us: Not sure that was what I want. I tried to run Spark Shell on a machine other than the master and got the same error. The 192 was suppose to be a simple

Re: executor failed, cannot find compute-classpath.sh

2014-07-11 Thread Andrew Or
-submit (or spark-shell, which calls spark-submit) with the --verbose flag. Let me know if this fixes it. I will get to fixing the root problem soon. Andrew 2014-07-10 18:43 GMT-07:00 cjwang c...@cjwang.us: Andrew, Thanks for replying. I did the following and the result was still the same. 1

Re: ---cores option in spark-shell

2014-07-14 Thread Andrew Or
Yes, the documentation is actually a little outdated. We will get around to fixing it shortly. Please use --driver-cores or --executor-cores instead. 2014-07-14 19:10 GMT-07:00 cjwang c...@cjwang.us: Neither do they work in new 1.0.1 either -- View this message in context:

Re: can't get jobs to run on cluster (enough memory and cpus are available on worker)

2014-07-17 Thread Andrew Or
to be some inconsistency or missing pieces in the logs you posted. After an executor says driver disassociated, what happens in the driver logs? Is an exception thrown or something? It would be useful if you could also post your conf/spark-env.sh. Andrew 2014-07-17 14:11 GMT-07:00 Marcelo Vanzin

Re: Error with spark-submit (formatting corrected)

2014-07-17 Thread Andrew Or
thing to check is whether the node from which you launch spark submit can access the internal address of the master (and port 7077). One quick way to verify that is to attempt a telnet into it. Let me know if you find anything. Andrew 2014-07-17 15:57 GMT-07:00 ranjanp piyush_ran...@hotmail.com: Hi

Re: how to pass extra Java opts to workers for spark streaming jobs

2014-07-17 Thread Andrew Or
Hi Chen, spark.executor.extraJavaOptions was introduced in Spark 1.0, not in Spark 0.9. You need to export SPARK_JAVA_OPTS="-Dspark.config1=value1 -Dspark.config2=value2" in conf/spark-env.sh. Let me know if that works. Andrew 2014-07-17 18:15 GMT-07:00 Tathagata Das tathagata.das1
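Side by side, the 0.9 mechanism and its 1.0 replacement (the GC flag is an illustrative JVM option; Spark properties themselves belong in spark-defaults.conf or SparkConf rather than in extraJavaOptions):

    # Spark 0.9: conf/spark-env.sh
    export SPARK_JAVA_OPTS="-Dspark.config1=value1 -Dspark.config2=value2"

    # Spark 1.0+: conf/spark-defaults.conf (for JVM flags such as GC settings)
    spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC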

Re: jar changed on src filesystem

2014-07-17 Thread Andrew Or
HDFS. Try removing all old jars from your .sparkStaging directory and try again? Let me know if that does the job, Andrew 2014-07-16 23:42 GMT-07:00 cmti95035 cmti95...@gmail.com: They're all the same version. Actually even without the --jars parameter it got the same error. Looks like

Re: Errors accessing hdfs while in local mode

2014-07-17 Thread Andrew Or
still work (I just tried this on my own EC2 cluster). By the way, SPARK_MASTER is actually deprecated. Instead, please use bin/spark-submit --master [your master]. Andrew 2014-07-16 23:46 GMT-07:00 Akhil Das ak...@sigmoidanalytics.com: You can try the following in the spark-shell: 1. Run

Re: how to pass extra Java opts to workers for spark streaming jobs

2014-07-17 Thread Andrew Or
SPARK_JAVA_OPTS is deprecated as of 1.0) 2014-07-17 21:08 GMT-07:00 Chen Song chen.song...@gmail.com: Thanks Andrew. Say that I want to turn on CMS gc for each worker. All I need to do is add the following line to conf/spark-env.sh on node where I submit the application. -XX

Re: What is shuffle spill to memory?

2014-07-18 Thread Andrew Or
metrics are aggregated over the entire duration of the task (i.e. within each task you can spill multiple times). Andrew 2014-07-18 4:09 GMT-07:00 Sébastien Rainville sebastienrainvi...@gmail.com : Hi, in the Spark UI, one of the metrics is shuffle spill (memory). What is it exactly? Spilling

Re: Why spark-submit command hangs?

2014-07-21 Thread Andrew Or
is deprecated) - add --master yarn-cluster in your spark-submit command Another worrying thing is the warning from your logs: 14/07/21 22:38:42 WARN spark.SparkConf: null jar passed to SparkContext constructor How are you creating your SparkContext? Andrew 2014-07-21 7:47 GMT-07:00 Sam Liu
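An illustrative spark-submit invocation with that flag (class and jar names are placeholders):

    bin/spark-submit --master yarn-cluster \
      --class com.example.MyApp \
      my-app.jar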

Re: LiveListenerBus throws exception and weird web UI bug

2014-07-21 Thread Andrew Or
workaround for this issue, but you might try to reduce the number of concurrently running tasks (partitions) to avoid emitting too many events. The root cause of the listener queue taking too much time to process events is recorded in SPARK-2316, which we also intend to fix by Spark 1.1. Andrew

Re: gain access to persisted rdd

2014-07-21 Thread Andrew Or
. Andrew 2014-07-21 8:37 GMT-07:00 mrm ma...@skimlinks.com: Hi, I am using pyspark and have persisted a list of rdds within a function, but I don't have a reference to them anymore. The RDD's are listed in the UI, under the Storage tab, and they have names associated to them (e.g. 4
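In the Scala API there is a direct handle for this; a minimal sketch using SparkContext.getPersistentRDDs (Scala only, so it illustrates the idea rather than a pyspark API):

    // Map from RDD id (the number shown in the Storage tab) to the RDD itself
    val persisted = sc.getPersistentRDDs
    persisted.foreach { case (id, rdd) => println(s"$id -> ${rdd.name}") }
    persisted.get(4).foreach(_.unpersist())   // e.g. drop the RDD with id 4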

Re: Give more Java Heap Memory on Standalone mode

2014-07-21 Thread Andrew Or
line. Andrew 2014-07-21 10:01 GMT-07:00 Nick R. Katsipoulakis kat...@cs.pitt.edu: Thank you Abel, It seems that your advice worked. Even though I receive a message that it is a deprecated way of defining Spark Memory (the system prompts that I should set spark.driver.memory), the memory

Re: Caching issue with msg: RDD block could not be dropped from memory as it does not exist

2014-07-21 Thread Andrew Or
, it seems that you set your log level to WARN. The cause is most probably because the cache is not big enough, but setting the log level to INFO will provide you with more information on the exact sizes that are being used by the storage and the blocks). Andrew 2014-07-19 13:01 GMT-07:00 rindra

Re: Why spark-submit command hangs?

2014-07-22 Thread Andrew Or
Hi Earthson, Is your problem resolved? The way you submit your application looks alright to me; spark-submit should be able to parse the combination of --master and --deploy-mode correctly. I suspect you might have hard-coded yarn-cluster or something in your application. Andrew 2014-07-22 1

Re: hadoop version

2014-07-22 Thread Andrew Or
/ephemeral-hdfs. Andrew 2014-07-22 7:07 GMT-07:00 mrm ma...@skimlinks.com: Hi, Where can I find the version of Hadoop my cluster is using? I launched my ec2 cluster using the spark-ec2 script with the --hadoop-major-version=2 option. However, the folder hadoop-native/lib in the master node only

Re: spark-submit to remote master fails

2014-07-23 Thread Andrew Or
driver. Andrew 2014-07-23 10:40 GMT-07:00 didi did...@gmail.com: Hi all I guess the problem has something to do with the fact i submit the job to remote location I submit from OracleVM running ubuntu and suspect some NAT issues maybe? akka tcp tries this address as follows from the STDERR

Re: Lost executors

2014-07-23 Thread Andrew Or
Hi Eric, Have you checked the executor logs? It is possible they died because of some exception, and the message you see is just a side effect. Andrew 2014-07-23 8:27 GMT-07:00 Eric Friedman eric.d.fried...@gmail.com: I'm using spark 1.0.1 on a quite large cluster, with gobs of memory, etc

Re: Use of SPARK_DAEMON_JAVA_OPTS

2014-07-23 Thread Andrew Or
in spark should not be done through any config or environment variable that references java opts. Andrew 2014-07-23 1:04 GMT-07:00 MEETHU MATHEW meethu2...@yahoo.co.in: Hi all, Sorry for taking this topic again,still I am confused on this. I set SPARK_DAEMON_JAVA_OPTS=-XX:+UseCompressedOops

Re: driver memory

2014-07-23 Thread Andrew Or
-submit. The equivalent also holds for executor memory (i.e. --executor-memory). That way you don't have to wrangle with the millions of overlapping configs / environment variables for all the deploy modes. -Andrew 2014-07-23 4:18 GMT-07:00 mrm ma...@skimlinks.com: Hi, I figured out my problem
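For example (sizes are illustrative; class and jar names are placeholders):

    bin/spark-submit --driver-memory 4g --executor-memory 2g \
      --class com.example.MyApp my-app.jar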

Re: Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Or
Hi Andrew, It's definitely not bad practice to use spark-shell with HistoryServer. The issue here is not with spark-shell, but the way we pass Spark configs to the application. spark-defaults.conf does not currently support embedding environment variables, but instead interprets everything

Re: Getting the number of slaves

2014-07-28 Thread Andrew Or
Yes, both of these are derived from the same source, and this source includes the driver. In other words, if you submit a job with 10 executors you will get back 11 for both statuses. 2014-07-28 15:40 GMT-07:00 Sung Hwan Chung coded...@cs.stanford.edu: Do getExecutorStorageStatus and

Re: Worker logs

2014-07-30 Thread Andrew Or
They are found in the executors' logs (not the worker's). In general, all code inside foreach or map etc. are executed on the executors. You can find these either through the Master UI (under Running Applications) or manually on the worker machines (under $SPARK_HOME/work). -Andrew 2014-07-30
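A small sketch of the behavior described, assuming an existing SparkContext sc; the println runs on the executors, so its output appears in the executor logs rather than the driver's console:

    sc.parallelize(1 to 100).foreach { x =>
      // Executed on the executors, not the driver
      println(s"processing $x")
    }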

Re: Can't see any thing one the storage panel of application UI

2014-08-04 Thread Andrew Or
of UI fixes since 1.0. Could you check if this is still a problem on the latest master: https://github.com/apache/spark Andrew 2014-08-04 12:10 GMT-07:00 anthonyjschu...@gmail.com anthonyjschu...@gmail.com: I am (not) seeing this also... No items in the storage UI page. using 1.0 with HDFS

Re: Can't see any thing one the storage panel of application UI

2014-08-05 Thread Andrew Or
they can still be kicked out by LRU, however. -Andrew 2014-08-05 0:13 GMT-07:00 Akhil Das ak...@sigmoidanalytics.com: You need to use persist or cache those rdds to appear in the Storage. Unless you do it, those rdds will be computed again. Thanks Best Regards On Tue, Aug 5, 2014 at 8:03 AM

Re: Setting spark.executor.memory problem

2014-08-05 Thread Andrew Or
in your conf won't actually do anything for you. Instead, you need to run spark-submit as follows bin/spark-submit --driver-memory 2g --class your.class.here app.jar This will start the JVM with 2G instead of the default 512M. -Andrew 2014-08-05 6:43 GMT-07:00 Grzegorz Białek grzegorz.bia

Re: Setting spark.executor.memory problem

2014-08-05 Thread Andrew Or
(Clarification: you'll need to pass in --driver-memory not just for local mode, but for any application you're launching with client deploy mode) 2014-08-05 9:24 GMT-07:00 Andrew Or and...@databricks.com: Hi Grzegorz, For local mode you only have one executor, and this executor is your

Re: Configuration setup and Connection refused

2014-08-05 Thread Andrew Or
-hdfs/conf. (Are you running HdfsWordCount by any chance?) As Mayur mentioned, a good way to see whether or not there is any service listening on port 9000 is telnet. Andrew 2014-08-05 15:01 GMT-07:00 Mayur Rustagi mayur.rust...@gmail.com: Then dont specify hdfs when you read file. Also

Re: can't submit my application on standalone spark cluster

2014-08-06 Thread Andrew Or
not using the EC2 scripts, you will have to rsync the directory manually (copy-dir just calls rsync internally). -Andrew 2014-08-06 2:39 GMT-07:00 Akhil Das ak...@sigmoidanalytics.com: Looks like a netty conflict there, most likely you are having multiple versions of netty jars (eg: netty-3.6.6

Re: Spark Streaming fails - where is the problem?

2014-08-06 Thread Andrew Or
/09f7e4587bbdf74207d2629e8c1314f93d865999. This will be available in Spark 1.1, but for now you will have to open all ports among the nodes in your cluster. -Andrew 2014-08-06 10:23 GMT-07:00 durin m...@simon-schaefer.net: Update: I can get it to work by disabling iptables temporarily. I can

Re: Runnning a Spark Shell locally against EC2

2014-08-06 Thread Andrew Or
in Spark 1.1. -Andrew 2014-08-06 8:29 GMT-07:00 Gary Malouf malouf.g...@gmail.com: We have Spark 1.0.1 on Mesos deployed as a cluster in EC2. Our Devops lead tells me that Spark jobs can not be submitted from local machines due to the complexity of opening the right ports to the world etc

Re: Spark memory management

2014-08-06 Thread Andrew Or
(there is a button), though this is not specific to standalone mode. There is currently a lot of trust between the standalone master and the application. Maybe this is not always a good thing. :) -Andrew 2014-08-06 12:23 GMT-07:00 Gary Malouf malouf.g...@gmail.com: I have a few questions

Re: Viewing web UI after fact

2014-08-13 Thread Andrew Or
The Spark UI isn't available through the same address; otherwise new applications won't be able to bind to it. Once the old application finishes, the standalone Master renders the after-the-fact application UI and exposes it under a different URL. To see this, go to the Master UI (master-url:8080)

Re: Lost executors

2014-08-13 Thread Andrew Or
To add to the pile of information we're asking you to provide, what version of Spark are you running? 2014-08-13 11:11 GMT-07:00 Shivaram Venkataraman shiva...@eecs.berkeley.edu : If the JVM heap size is close to the memory limit the OS sometimes kills the process under memory pressure. I've

Re: Lost executors

2014-08-13 Thread Andrew Or
mechanism, your executors will quickly run out of memory with the default of 512m. Let me know if setting this does the job. If so, you can even persist the RDDs to memory as well to get better performance, though this depends on your workload. -Andrew 2014-08-13 11:38 GMT-07:00 rpandya r

Re: Spark webUI - application details page

2014-08-14 Thread Andrew Or
of setting this is by adding the line spark.eventLog.enabled true to $SPARK_HOME/conf/spark-defaults.conf. This will be picked up by Spark submit and passed to your application. Cheers, Andrew 2014-08-14 15:45 GMT-07:00 durin m...@simon-schaefer.net: If I don't understand you wrong, setting event

Re: Spark webUI - application details page

2014-08-14 Thread Andrew Or
:7077). In other modes, you will need to use the history server instead. Does this make sense? Andrew 2014-08-14 18:08 GMT-07:00 SK skrishna...@gmail.com: More specifically, as indicated by Patrick above, in 1.0+, apps will have persistent state so that the UI can be reloaded. Is there a way

Re: Running Spark shell on YARN

2014-08-15 Thread Andrew Or
/spark-defaults.conf for you automatically so you don't have to specify it each time on the command line. Of course, you can also do the same in YARN. -Andrew 2014-08-15 10:45 GMT-07:00 Soumya Simanta soumya.sima...@gmail.com: I've been using the standalone cluster all this time and it worked fine

Re: spark on yarn cluster can't launch

2014-08-15 Thread Andrew Or
Hi 齐忠, Thanks for reporting this. You're correct that the default deploy mode is client. However, this seems to be a bug in the YARN integration code; we should not throw null pointer exception in any case. What version of Spark are you using? Andrew 2014-08-15 0:23 GMT-07:00 centerqi hu cente

Re: Spark webUI - application details page

2014-08-15 Thread Andrew Or
to just the summary statistics under Completed Applications. I have listed a few debugging steps in the paragraph above, so maybe they're also applicable to you. Let me know if that works, Andrew 2014-08-15 11:07 GMT-07:00 SK skrishna...@gmail.com: Hi, Ok, I was specifying --master local. I

Re: spark-submit with Yarn

2014-08-19 Thread Andrew Or
The --master should override any other ways of setting the Spark master. Ah yes, actually you can set spark.master directly in your application through SparkConf. Thanks Marcelo. 2014-08-19 14:47 GMT-07:00 Marcelo Vanzin van...@cloudera.com: On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja

Re: apply at Option.scala:120

2014-08-25 Thread Andrew Or
This should be fixed in the latest Spark. What branch are you running? 2014-08-25 1:32 GMT-07:00 Wang, Jensen jensen.w...@sap.com: Hi, All When I run spark applications, I see from the web-ui that some stage description are like “apply at Option.scala:120”. Why spark splits a

Re: Submitting multiple files pyspark

2014-08-28 Thread Andrew Or
Hi Cheng, You specify extra python files through --py-files. For example: bin/spark-submit [your other options] --py-files helper.py main_app.py -Andrew 2014-08-27 22:58 GMT-07:00 Chengi Liu chengi.liu...@gmail.com: Hi, I have two files.. main_app.py and helper.py main_app.py calls

Re: Can value in spark-defaults.conf support system variables?

2014-09-01 Thread Andrew Or
No, not currently. 2014-09-01 2:53 GMT-07:00 Zhanfeng Huo huozhanf...@gmail.com: Hi,all: Can value in spark-defaults.conf support system variables? Such as mess = ${user.home}/${user.name}. Best Regards -- Zhanfeng Huo

Re: pyspark yarn got exception

2014-09-02 Thread Andrew Or
somewhat puzzled as to how you ran into an OOM from this configuration, however. Does this problem still occur if you set the correct master? -Andrew 2014-09-02 2:42 GMT-07:00 Oleg Ruchovets oruchov...@gmail.com: Hi , I've installed pyspark on hpd hortonworks cluster. Executing pi example

Re: Spark-shell return results when the job is executing?

2014-09-02 Thread Andrew Or
, it is unlikely to fully fit in memory anyway, so it's probably not a bad idea to just write your results to a file in batches while the application is still running. -Andrew 2014-09-01 22:16 GMT-07:00 Hao Wang wh.s...@gmail.com: Hi, all I am wondering if I use Spark-shell to scan a large file

Re: Spark on YARN question

2014-09-02 Thread Andrew Or
properties $ export SPARK_YARN_USER_ENV=YARN_LOCAL_DIR=/mnt,/mnt2 $ bin/spark-shell --master yarn --jars /local/path/to/my/jar1,/another/jar2 Best, -Andrew

Re: Spark on YARN question

2014-09-02 Thread Andrew Or
on the submitter node. Let me know if you have more questions, -Andrew 2014-09-02 15:12 GMT-07:00 Dimension Data, LLC. subscripti...@didata.us: Hello friends: I have a follow-up to Andrew's well articulated answer below (thank you for that). (1) I've seen both of these invocations

Re: spark history server trying to hit port 8021

2014-09-03 Thread Andrew Or
Hi Greg, For future references you can set spark.history.ui.port in SPARK_HISTORY_OPTS. By default this should point to 18080. This information is actually in the link that you provided :) (as well as the most updated docs here: http://spark.apache.org/docs/latest/monitoring.html) -Andrew 2014
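For reference, that would look like the following in conf/spark-env.sh (18080 shown explicitly, though it is already the default):

    export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080"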

Re: pyspark on yarn hdp hortonworks

2014-09-03 Thread Andrew Or
...@mail.gmail.com%3e Let me know if you can get it working, -Andrew 2014-09-03 5:03 GMT-07:00 Oleg Ruchovets oruchov...@gmail.com: Hi all. I am trying to run pyspark on yarn already couple of days: http://hortonworks.com/kb/spark-1-0-1-technical-preview-hdp-2-1-3/ I posted exception
