Running spark-submit from a remote machine using a YARN application
We are trying to submit a Spark application from a Tomcat application that runs our business logic. The Tomcat app lives in a separate, non-Hadoop cluster. We first did this by using the spark-yarn package to call Client#runApp() directly, but found that the API we were using is being made private in future Spark releases. Our current solution is a very simple YARN application that executes as its command:

spark-submit --master yarn-cluster s3n://application/jar.jar

This seemed simple and elegant, but it has a strange issue: we get NoClassDefFoundErrors. When we ssh to the box and run the same spark-submit command, it works; running it through YARN produces the NoClassDefFoundErrors mentioned. Comparing the environment and Java system properties between the working and broken runs, we also found that they have different Java classpaths. So weird... Has anyone had this problem or know of a solution? We would be happy to post our very simple code for creating the YARN application. Thanks!

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-submit-from-a-remote-machine-using-a-YARN-application-tp20642.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
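Since the working (ssh) and broken (YARN container) runs differ in their Java classpath, one way to narrow this down is a tiny diagnostic that dumps the classpath and environment in each context, so the two outputs can be diffed. A minimal sketch (the class name is hypothetical, not from the thread):

```java
import java.util.Map;
import java.util.TreeMap;

// Dump the JVM classpath and the process environment in a stable order,
// so output captured in the ssh session can be diffed against output
// captured inside the YARN container.
public class EnvDump {
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        sb.append("java.class.path=")
          .append(System.getProperty("java.class.path"))
          .append('\n');
        // TreeMap sorts the environment by key so diffs are meaningful.
        for (Map.Entry<String, String> e : new TreeMap<>(System.getenv()).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Running this both from the interactive shell and as the YARN application's command, then diffing the two outputs, should show exactly which classpath entries the container is missing.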
Re: Calling spark from a java web application.
If you are able to use YARN in your Hadoop cluster, then the following technique is pretty straightforward: http://blog.sequenceiq.com/blog/2014/08/22/spark-submit-in-java/ We use this in our system, and it is very easy to execute from our Tomcat application.

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Calling-spark-from-a-java-web-application-tp20007p20145.html
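The linked post drives the YARN client API from Java; a simpler (if less integrated) variant of the same idea is to shell out to spark-submit from the web application. A minimal sketch, assuming spark-submit is on the PATH of the Tomcat host — class and method names here are hypothetical:

```java
import java.io.IOException;

// Launch an external command from a servlet/Tomcat thread and wait for it.
public class SparkLauncher {
    static int run(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO(); // stream the child's stdout/stderr into Tomcat's logs
        return pb.start().waitFor();
    }

    // Hypothetical wrapper: submit an application jar via spark-submit.
    public static int submit(String master, String appJar)
            throws IOException, InterruptedException {
        return run("spark-submit", "--master", master, appJar);
    }
}
```

Note that depending on the deploy mode, the exit code of spark-submit may or may not reflect the application's final status, so the job should still be checked through YARN or the Spark UI.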
Multiple Applications(Spark Contexts) Concurrently Fail With Broadcast Error
We are unable to run more than one application at a time using Spark 1.0.0 on CDH5. We submit two applications, using two different SparkContexts, against the same Spark Master. The Spark Master is running in standalone mode and was started with the following command and parameters:

/usr/java/jdk1.7.0_55-cloudera/bin/java -XX:MaxPermSize=128m -Djava.net.preferIPv4Stack=true -Dspark.akka.logLifecycleEvents=true -Xms8589934592 -Xmx8589934592 org.apache.spark.deploy.master.Master --ip ip-10-186-155-45.ec2.internal

When we submit this application by itself, it finishes and all of the data comes out fine. The problem occurs when trying to run another application while an existing application is still processing: we get an error stating that the Spark contexts were shut down prematurely. The errors can be viewed in the pastebins below. All IP addresses have been changed to 1.1.1.1 for security reasons. Note that at the top of the logs we have printed out the Spark configuration for reference.

The working logs: http://pastebin.com/CnitnMhy
The broken logs: http://pastebin.com/VGs87bBZ

We have also included the worker logs. For the second app, we see seven additional directories in the work/app/ directory: `0/ 1/ 2/ 3/ 4/ 5/ 6/`. These produce two different groups of errors: the first three directories form one group, and the other four form the second.

Worker log for broken app, group 1: http://pastebin.com/7VwZ1Gwu
Worker log for broken app, group 2: http://pastebin.com/shs4d8T4
Worker log for the working app: available upon request.

The two different errors are the last lines of each group:

Received LaunchTask command but executor was null
Slave registration failed: Duplicate executor ID: 4

tl;dr: We are unable to run more than one application on the same Spark master using different Spark contexts. The only errors we see are broadcast errors.
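One standalone-mode constraint worth checking here (not confirmed by the logs above, so treat it as an assumption): by default, the standalone scheduler gives a new application every core available on the cluster, so a second SparkContext can be left with no executors until the first application finishes. Capping the cores each application may claim lets two contexts coexist. A configuration sketch — the master URL, app name, and core count are hypothetical, and this fragment assumes Spark on the classpath:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CappedApp {
    public static void main(String[] args) {
        // Cap the cores this application may claim so that a second
        // SparkContext on the same standalone master still gets executors.
        SparkConf conf = new SparkConf()
            .setMaster("spark://master:7077")   // hypothetical master URL
            .setAppName("app-one")
            .set("spark.cores.max", "4");       // leave cores free for other apps
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job logic ...
        sc.stop();
    }
}
```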
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Multiple-Applications-Spark-Contexts-Concurrently-Fail-With-Broadcast-Error-tp18374.html
Re: application as a service
You can also look into using Ooyala's job server: https://github.com/ooyala/spark-jobserver It already has a Spray REST server built in that lets you do what has been explained above. It sounds like it should solve your problem. Enjoy!

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/application-as-a-service-tp12253p12267.html
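For reference, spark-jobserver is driven over plain HTTP: the application jar is uploaded once (POST to /jars/&lt;appName&gt;), and jobs are then started with a POST to /jobs. A minimal sketch of starting a job from Java — the host, port, and names are hypothetical, and the endpoint paths follow the project's README:

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of the spark-jobserver REST flow: the application jar is assumed
// to have already been uploaded with a POST to /jars/<appName>.
public class JobServerClient {
    // Build the job-start URL for a previously uploaded app and job class.
    static String jobsUrl(String host, int port, String appName, String classPath) {
        return "http://" + host + ":" + port
                + "/jobs?appName=" + appName + "&classPath=" + classPath;
    }

    // POST an empty body to start the job; job parameters could go in the body.
    public static int startJob(String host, int port, String appName, String classPath)
            throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(jobsUrl(host, port, appName, classPath)).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.getOutputStream().close();
        return conn.getResponseCode(); // HTTP status reported by the job server
    }
}
```

The response body (not shown here) carries a job ID that can be polled at /jobs/&lt;jobId&gt; for status and results.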