Fwd: Container preempted by scheduler - Spark job error
Hi Ted,

We use Hadoop 2.6 & Spark 1.3.1. I have also attached the error file to this mail; please have a look at it.

Thanks

On Thu, Jun 2, 2016 at 11:51 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you show the error in a bit more detail?
>
> Which release of Hadoop / Spark are you using?
>
> Is the CapacityScheduler being used?
>
> Thanks
>
> On Thu, Jun 2, 2016 at 1:32 AM, Prabeesh K. <prabsma...@gmail.com> wrote:
>
>> Hi, I am using the command below to run a Spark job, and I get an error
>> like "Container preempted by scheduler".
>>
>> I am not sure if it is related to incorrect memory settings:
>>
>> nohup ~/spark1.3/bin/spark-submit \
>>   --num-executors 50 \
>>   --master yarn \
>>   --deploy-mode cluster \
>>   --queue adhoc \
>>   --driver-memory 18G \
>>   --executor-memory 12G \
>>   --class main.ru..bigdata.externalchurn.Main \
>>   --conf "spark.task.maxFailures=100" \
>>   --conf "spark.yarn.max.executor.failures=1" \
>>   --conf "spark.executor.cores=1" \
>>   --conf "spark.akka.frameSize=50" \
>>   --conf "spark.storage.memoryFraction=0.5" \
>>   --conf "spark.driver.maxResultSize=10G" \
>>   ~/external-flow/externalChurn-1.0-SNAPSHOT-shaded.jar \
>>   prepareTraining=true \
>>   prepareTrainingMNP=true \
>>   prepareMap=false \
>>   bouldozerMode=true \
>>   &> ~/external-flow/run.log &
>> echo "STARTED"
>> tail -f ~/external-flow/run.log
>>
>> Thanks,
Container preempted by scheduler - Spark job error
Hi,

I am using the command below to run a Spark job, and I get an error like "Container preempted by scheduler". I am not sure if it is related to incorrect memory settings:

nohup ~/spark1.3/bin/spark-submit \
  --num-executors 50 \
  --master yarn \
  --deploy-mode cluster \
  --queue adhoc \
  --driver-memory 18G \
  --executor-memory 12G \
  --class main.ru..bigdata.externalchurn.Main \
  --conf "spark.task.maxFailures=100" \
  --conf "spark.yarn.max.executor.failures=1" \
  --conf "spark.executor.cores=1" \
  --conf "spark.akka.frameSize=50" \
  --conf "spark.storage.memoryFraction=0.5" \
  --conf "spark.driver.maxResultSize=10G" \
  ~/external-flow/externalChurn-1.0-SNAPSHOT-shaded.jar \
  prepareTraining=true \
  prepareTrainingMNP=true \
  prepareMap=false \
  bouldozerMode=true \
  &> ~/external-flow/run.log &
echo "STARTED"
tail -f ~/external-flow/run.log

Thanks,
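One detail worth noting in the command above: in the Spark 1.3-era YARN backend, a container killed by scheduler preemption counts as an executor failure, so "spark.yarn.max.executor.failures=1" makes the whole application give up after the first preemption. A hedged variation of the same command that merely tolerates preemption rather than preventing it (only the failure threshold changes; the value 200 is an arbitrary illustration, not a tuned recommendation):

    nohup ~/spark1.3/bin/spark-submit \
      --num-executors 50 \
      --master yarn \
      --deploy-mode cluster \
      --queue adhoc \
      --driver-memory 18G \
      --executor-memory 12G \
      --class main.ru..bigdata.externalchurn.Main \
      --conf "spark.task.maxFailures=100" \
      --conf "spark.yarn.max.executor.failures=200" \
      --conf "spark.executor.cores=1" \
      --conf "spark.akka.frameSize=50" \
      --conf "spark.storage.memoryFraction=0.5" \
      --conf "spark.driver.maxResultSize=10G" \
      ~/external-flow/externalChurn-1.0-SNAPSHOT-shaded.jar \
      prepareTraining=true \
      prepareTrainingMNP=true \
      prepareMap=false \
      bouldozerMode=true \
      &> ~/external-flow/run.log &

Whether preemption fires at all is decided by the queue configuration (the guaranteed versus maximum capacity of the "adhoc" queue), which is why the scheduler in use matters here.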
Re: Spark + Jupyter (IPython Notebook)
Refer to this post on Spark + Jupyter + Docker: http://blog.prabeeshk.com/blog/2015/06/19/pyspark-notebook-with-docker/

On 18 August 2015 at 21:29, Jerry Lam <chiling...@gmail.com> wrote:

Hi Guru,

Thanks! Great to hear that someone tried it in production. How do you like it so far?

Best Regards,
Jerry

On Tue, Aug 18, 2015 at 11:38 AM, Guru Medasani <gdm...@gmail.com> wrote:

Hi Jerry,

Yes. I've seen customers using this in production for data science work, and I'm currently using this for one of my projects on a cluster as well. Also, here is a blog that describes how to configure it:
http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/

Guru Medasani
gdm...@gmail.com

On Aug 18, 2015, at 8:35 AM, Jerry Lam <chiling...@gmail.com> wrote:

Hi Spark users and developers,

Has anyone deployed IPython Notebook (Jupyter) in production using Spark as the computational engine? I know Databricks Cloud provides similar features with deeper Spark integration, but Databricks Cloud has to be hosted by Databricks, so we cannot use it. Other solutions (e.g. Zeppelin) seem to reinvent the wheel that IPython offered years ago. It would be great if someone could educate me on the reasoning behind this.

Best Regards,
Jerry
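For anyone who wants the notebook front end without Docker, the stock pyspark launcher can be pointed at IPython/Jupyter through two environment variables. A minimal sketch (variable names as read by the Spark 1.x bin/pyspark script; the port and Spark path are placeholders):

    # Launch a notebook server whose notebooks get a ready-made SparkContext (sc)
    export PYSPARK_DRIVER_PYTHON=ipython
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --port=8888"
    ~/spark/bin/pyspark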
Re: Packaging Java + Python library
Refer to this post: http://blog.prabeeshk.com/blog/2015/04/07/self-contained-pyspark-application/

On 13 April 2015 at 17:41, Punya Biswal <pbis...@palantir.com> wrote:

Dear Spark users,

My team is working on a small library that builds on PySpark and is organized like PySpark as well -- it has a JVM component (that runs in the Spark driver and executors) and a Python component (that runs in the PySpark driver and executor processes). What's a good approach for packaging such a library? Some ideas we've considered:

- Package the JVM component as a jar and the Python component as a binary egg. This is reasonable, but it means there are two separate artifacts that people have to manage and keep in sync.
- Include the Python files in the jar and add it to the PYTHONPATH. This follows the example of the Spark assembly jar, but deviates from the Python community's standards.

We'd really appreciate hearing from other people who have built libraries on top of PySpark.

Punya
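Whichever of the two approaches is taken, spark-submit can ship both halves of the library at submit time. A hedged sketch (the artifact and script names are hypothetical; --jars and --py-files are standard spark-submit flags):

    # JVM half as a jar on the driver/executor classpath,
    # Python half as an egg added to the PYTHONPATH of driver and executors
    spark-submit \
      --jars mylib-jvm-component.jar \
      --py-files mylib_python_component.egg \
      my_driver_script.py

This doesn't remove the two-artifacts problem, but it keeps the pairing explicit at the point of use.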
Re: How to learn Spark?
You can also refer to this blog: http://blog.prabeeshk.com/blog/archives/

On 2 April 2015 at 12:19, Star Guo <st...@ceph.me> wrote:

Hi all,

I am new here. Could you give me some suggestions on how to learn Spark? Thanks.

Best Regards,
Star Guo
Re: Beginner in Spark
Refer to this blog for a step-by-step installation of Spark on Ubuntu: http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/

On 7 February 2015 at 03:12, Matei Zaharia <matei.zaha...@gmail.com> wrote:

You don't need HDFS or virtual machines to run Spark. You can just download it, unzip it and run it on your laptop. See http://spark.apache.org/docs/latest/index.html.

Matei

On Feb 6, 2015, at 2:58 PM, David Fallside <falls...@us.ibm.com> wrote:

King, consider trying the Spark Kernel (https://github.com/ibm-et/spark-kernel), which will install Spark etc. and provide you with a Spark/Scala notebook in which you can develop your algorithm. The Vagrant installation described in https://github.com/ibm-et/spark-kernel/wiki/Vagrant-Development-Environment will have you quickly up and running on a single machine without having to manage the details of the system installations. There is a Docker version, https://github.com/ibm-et/spark-kernel/wiki/Using-the-Docker-Container-for-the-Spark-Kernel, if you prefer Docker.

Regards, David

King sami <kgsam...@gmail.com> wrote on 02/06/2015 08:09:39 AM:

From: King sami <kgsam...@gmail.com>
To: user@spark.apache.org
Date: 02/06/2015 08:11 AM
Subject: Beginner in Spark

Hi,

I'm new to Spark and would like to install it with Scala. The aim is to build a data-processing system for door events. The first step is to install Spark, Scala, HDFS and the other required tools; the second is to build, in Scala, the program that processes a file of my data logs (events). Could you please help me install the required tools (Spark, Scala, HDFS) and tell me how to execute my program on the input file?

Best regards,
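Matei's "just download and run" suggestion amounts to only a few commands. A minimal sketch (the version, package name, and download URL are illustrative; pick a current pre-built tarball from http://spark.apache.org/downloads.html):

    # Download a pre-built Spark, unpack it, and start an interactive shell
    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop2.4.tgz
    tar xzf spark-1.2.0-bin-hadoop2.4.tgz
    cd spark-1.2.0-bin-hadoop2.4
    ./bin/spark-shell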
Re: Kestrel and Spark Stream
You can refer to the following link: https://github.com/prabeesh/Spark-Kestrel

On Tue, Nov 18, 2014 at 3:51 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

You can implement a custom receiver (http://spark.apache.org/docs/latest/streaming-custom-receivers.html) to connect to Kestrel and use it. I think someone has already tried it, though I'm not sure whether it is working. Here's the link:
https://github.com/prabeesh/Spark-Kestrel/blob/master/streaming/src/main/scala/spark/streaming/dstream/KestrelInputDStream.scala

Thanks
Best Regards

On Tue, Nov 18, 2014 at 4:23 PM, Eduardo Alfaia <e.costaalf...@unibs.it> wrote:

Hi guys,

Has anyone already tried doing this work?

Thanks

Informativa sulla Privacy: http://www.unibs.it/node/8155
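For reference, the custom-receiver route from the linked documentation is only a small amount of code. Below is a minimal sketch modeled on the docs' socket example, not a working Kestrel client: Kestrel actually speaks a memcache-style protocol, so the raw line reading here would need to be replaced with a real client library.

    import java.io.{BufferedReader, InputStreamReader}
    import java.net.{ConnectException, Socket}
    import java.nio.charset.StandardCharsets

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class KestrelReceiver(host: String, port: Int)
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        // Receive on a background thread so onStart() returns immediately
        new Thread("Kestrel Receiver") {
          override def run(): Unit = receive()
        }.start()
      }

      def onStop(): Unit = {
        // Nothing to do: receive() checks isStopped() and exits on its own
      }

      private def receive(): Unit = {
        try {
          val socket = new Socket(host, port)
          val reader = new BufferedReader(
            new InputStreamReader(socket.getInputStream, StandardCharsets.UTF_8))
          var line = reader.readLine()
          while (!isStopped() && line != null) {
            store(line) // hand each record to Spark for storage and processing
            line = reader.readLine()
          }
          reader.close()
          socket.close()
          restart("Connection closed, reconnecting")
        } catch {
          case e: ConnectException => restart(s"Could not connect to $host:$port", e)
          case t: Throwable        => restart("Error receiving data", t)
        }
      }
    }

It would then be plugged in with ssc.receiverStream(new KestrelReceiver("localhost", 22133)), 22133 being Kestrel's default memcache port.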
Re: Unable to run a Standalone job
Try the `sbt clean` command before building the app, or delete the .ivy2 and .sbt folders (not a good method). Then try to rebuild the project.

On Thu, Jun 5, 2014 at 11:45 AM, Sean Owen <so...@cloudera.com> wrote:

I think this is SPARK-1949 again: https://github.com/apache/spark/pull/906
I think this change fixed this issue for a few people using the SBT build. Worth committing?

On Thu, Jun 5, 2014 at 6:40 AM, Shrikar archak <shrika...@gmail.com> wrote:

Hi All,

Now that Spark 1.0.0 is released, there should not be any problem with the local jars.

Shrikars-MacBook-Pro:SimpleJob shrikar$ cat simple.sbt

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0",
  "org.apache.spark" %% "spark-streaming" % "1.0.0")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I am still having this issue:

[error] (run-main) java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
    at org.apache.spark.HttpServer.start(HttpServer.scala:54)
    at org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
    at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
    at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
    at org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
    at org.apache.spark.broadcast.BroadcastManager.<init>(BroadcastManager.scala:35)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)

Any help would be greatly appreciated.

Thanks,
Shrikar

On Fri, May 23, 2014 at 3:58 PM, Shrikar archak <shrika...@gmail.com> wrote:

Still the same error, no change.

Thanks,
Shrikar

On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski <ja...@japila.pl> wrote:

Hi Shrikar,

How did you build Spark 1.0.0-SNAPSHOT on your machine? My understanding is that `sbt publishLocal` is not enough and you really need `sbt assembly` instead. Give it a try and report back.

As to your build.sbt, upgrade Scala to 2.10.4 and depend on "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT" only; that will pull down spark-core as a transitive dep. The resolver for the Akka Repository is not needed. Your build.sbt should really look as follows:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT"

Jacek

On Thu, May 22, 2014 at 11:27 PM, Shrikar archak <shrika...@gmail.com> wrote:

Hi All,

I am trying to run the network count example as a separate standalone job and am running into some issues.

Environment:
1) Mac Mavericks
2) Latest Spark repo from GitHub.

I have a structure like this:

Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/NetworkWordCount.scala
./src/main/scala/SimpleApp.scala.bk

simple.sbt:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
  "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

I am able to run the SimpleApp which is mentioned in the doc, but when I try to run the NetworkWordCount app I get an error like the one below. Am I missing something?

[info] Running com.shrikar.sparkapps.NetworkWordCount
14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to: shrikar
14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(shrikar)
14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/22 14:26:48 INFO Remoting: Starting remoting
14/05/22 14:26:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.10.88:49963]
14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.10.88:49963]
14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local directory at /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with capacity 911.6 MB.
14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to port 49964 with id = ConnectionManagerId(192.168.10.88,49964)
14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register BlockManager
Re: mismatched hdfs protocol
For building Spark against a particular version of Hadoop, refer to http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html

On Thu, Jun 5, 2014 at 8:14 AM, Koert Kuipers <ko...@tresata.com> wrote:

You have to build Spark against the version of Hadoop you are using.

On Wed, Jun 4, 2014 at 10:25 PM, bluejoe2008 <bluejoe2...@gmail.com> wrote:

Hi all,

When my Spark program accessed HDFS files, an error happened:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

It seems the client was trying to connect to Hadoop 2 via an old Hadoop protocol. So my question is: how do I specify the version of Hadoop on connection?

Thank you!

bluejoe
2014-06-05
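The linked page boils down to passing the Hadoop version to the build. A hedged sketch (flag and variable names as in the Spark 1.x build documentation; substitute your cluster's Hadoop version):

    # Maven build against a specific Hadoop version
    mvn -Dhadoop.version=2.2.0 -DskipTests clean package

    # or, with the SBT build
    SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly

The "Server IPC version 9 cannot communicate with client version 4" message typically indicates a Hadoop 1.x client talking to a Hadoop 2.x cluster, which matches Koert's diagnosis.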
Unable to execute saveAsTextFile on multi-node Mesos
Hi,

Scenario: read data from HDFS, apply a Hive query to it, and write the result back to HDFS.

Schema creation, querying, and saveAsTextFile are working fine in the following modes:
- local mode
- Mesos cluster with a single node
- Spark cluster with multiple nodes

Schema creation and querying also work fine with a multi-node Mesos cluster, but while trying to write back to HDFS using saveAsTextFile, the following error occurs:

14/05/30 10:16:35 INFO DAGScheduler: The failed fetch was from Stage 4 (mapPartitionsWithIndex at Operator.scala:333); marking it for resubmission
14/05/30 10:16:35 INFO DAGScheduler: Executor lost: 201405291518-3644595722-5050-17933-1 (epoch 148)

Let me know your thoughts regarding this.

Regards,
prabeesh
Re: Announcing Spark 1.0.0
Please update the http://spark.apache.org/docs/latest/ link.

On Fri, May 30, 2014 at 4:03 PM, Margusja <mar...@roo.ee> wrote:

Is it possible to download a pre-built package?
http://mirror.symnds.com/software/Apache/incubator/spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz gives me a 404.

Best regards,
Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)

On 30/05/14 13:18, Christopher Nguyen wrote:

Awesome work, Pat et al.!

--
Christopher T. Nguyen
Co-founder & CEO, Adatao
http://adatao.com
linkedin.com/in/ctnguyen

On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell <pwend...@gmail.com> wrote:

I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces.

Spark 1.0.0 is Spark's largest release ever, with contributions from 117 developers. I'd like to thank everyone involved in this release - it was truly a community effort with fixes, features, and optimizations contributed from dozens of organizations.

This release expands Spark's standard libraries, introducing a new SQL package (Spark SQL) which lets users integrate SQL queries into existing Spark workflows. MLlib, Spark's machine learning library, is expanded with sparse vector support and several new algorithms. The GraphX and Streaming libraries also introduce new features and optimizations. Spark's core engine adds support for secured YARN clusters, a unified tool for submitting Spark applications, and several performance and stability improvements. Finally, Spark adds support for Java 8 lambda syntax and improves coverage of the Java and Python APIs.

Those features only scratch the surface - check out the release notes here:
http://spark.apache.org/releases/spark-release-1-0-0.html

Note that since release artifacts were posted recently, certain mirrors may not have working downloads for a few hours.

- Patrick
java.lang.OutOfMemoryError while running Shark on Mesos
Hi,

I am trying to apply an inner join in Shark using 64MB and 27MB files. I am able to run the following queries on Mesos:

SELECT * FROM geoLocation1
SELECT * FROM geoLocation1 WHERE country = 'US'

But while trying an inner join such as

SELECT * FROM geoLocation1 g1 INNER JOIN geoBlocks1 g2 ON (g1.locId = g2.locId)

I get the following error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 1.0:7 failed 4 times (most recent failure: Exception failure: java.lang.OutOfMemoryError: Java heap space)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Please help me to resolve this.

Thanks in advance,

Regards,
prabeesh
Re: Better option to use Querying in Spark
Thank you for your prompt reply.

Regards,
prabeesh

On Tue, May 6, 2014 at 11:44 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

All three have different use cases. If you are looking for more of a warehouse, you are better off with Shark. Spark SQL is a way to query regular data in SQL-like syntax, leveraging a columnar store. BlinkDB is an experiment, meant to integrate with Shark in the long term; it is not meant for production use directly.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi (https://twitter.com/mayur_rustagi)

On Tue, May 6, 2014 at 11:22 AM, prabeesh k <prabsma...@gmail.com> wrote:

Hi,

I have seen three different ways to query data from Spark:

1. Default SQL support (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/sql/examples/HiveFromSpark.scala)
2. Shark
3. BlinkDB

I would like to know which one is more efficient.

Regards,
prabeesh
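For reference, the "SQL-like syntax over regular data" path looked roughly like the sketch below at the time (Spark 1.0-era API; the people.txt file and the Person schema are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Hypothetical schema for a "name,age" text file
    case class Person(name: String, age: Int)

    object SqlSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("sql-sketch"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.createSchemaRDD // implicit RDD -> SchemaRDD conversion

        // Parse the text file into Person records and expose them as a table
        val people = sc.textFile("people.txt")
          .map(_.split(","))
          .map(p => Person(p(0), p(1).trim.toInt))
        people.registerAsTable("people")

        // Run a SQL query over the RDD and print the result
        sqlContext.sql("SELECT name FROM people WHERE age >= 21")
          .collect()
          .foreach(println)
      }
    }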
Re: Compile SimpleApp.scala encountered error, please can any one help?
Ensure there is only one SimpleApp object in your project, and check whether there is another copy of SimpleApp.scala lying around. Normally the file SimpleApp.scala lives in src/main/scala or in the project root folder.

On Sat, Apr 12, 2014 at 11:07 AM, jni2000 <james...@federatedwireless.com> wrote:

Hi,

I am a new Spark user trying to test-run it from scratch. I followed the documentation and was able to build the Spark package and run the Spark shell. However, when I move on to building the standalone sample SimpleApp.scala, I see the following errors:

Loading /usr/share/sbt/bin/sbt-launch-lib.bash
[info] Set current project to Simple Project (in build file:/home/james/workplace/framework/3rd-party/spark-0.9.0/test-project/)
[info] Compiling 1 Scala source and 1 Java source to /home/james/workplace/framework/3rd-party/spark-0.9.0/test-project/target/scala-2.10/classes...
[error] /home/james/workplace/framework/3rd-party/spark-0.9.0/test-project/src/main/scala/SimpleApp.scala:5: SimpleApp is already defined as object SimpleApp
[error] object SimpleApp {
[error]        ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 2 s, completed Apr 12, 2014 1:12:43 AM

Can someone help me understand what could be wrong?

Thanks a lot.

James
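A quick way to locate the clashing definition, as a one-line sketch (plain grep over the project; the compiler output above already points at src/main/scala/SimpleApp.scala:5):

    # List every file in the project that defines an object named SimpleApp
    grep -rn "object SimpleApp" .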
[BLOG] For Beginners
Hi all,

Here I am sharing two blog posts for beginners, about creating a standalone Spark Streaming application and bundling the app as a single runnable jar. Take a look and drop your comments on the blog pages.

http://prabstechblog.blogspot.in/2014/04/a-standalone-spark-application-in-scala.html
http://prabstechblog.blogspot.in/2014/04/creating-single-jar-for-spark-project.html

prabeesh