Remove COMPLETED applications and shuffle data
Hi all,

Please help me with the two issues below.

*Environment:* I am running my Spark cluster in standalone mode and initializing the SparkContext from inside my Tomcat server. I set the following properties in environment.sh in the $SPARK_HOME/conf directory:

SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=1 -Dspark.deploy.retainedDrivers=1"
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=600 -Dspark.worker.cleanup.appDataTtl=600"
SPARK_LOCAL_DIRS=$user.home/tmp

*Issue 1:* Application folders under $SPARK_HOME/work still keep accumulating every time I restart Tomcat. I also tried stopping the SparkContext (sc.stop()) in Tomcat's contextDestroyed listener, but the stale application folders are still not removed.

*Issue 2:* The 'tmp' folder is filling up with shuffle data and eating my entire hard disk. Is there any setting to remove the shuffle data of FINISHED applications?

Thanks in advance,
Sayantini
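A minimal sketch of the worker-cleanup settings, assuming the conventional spark-env.sh file name (the post says environment.sh; Spark's standard config file in $SPARK_HOME/conf is spark-env.sh, and the worker only reads these options from the environment it is started with). Two details matter here: the worker cleanup sweep deletes only the work/ directories of *stopped* applications, and multi-flag OPTS values must be quoted or the shell drops everything after the first space.

```shell
# Sketch of $SPARK_HOME/conf/spark-env.sh (conventional file name; adjust if
# your setup sources a different file). Quotes are required: without them the
# shell splits the value at the space and the later -D flags are lost.

# Keep only the most recent application/driver entry in the master UI.
SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=1 -Dspark.deploy.retainedDrivers=1"

# Worker-side cleanup of $SPARK_HOME/work: only directories of applications
# that have already stopped are removed. interval = how often the sweep runs
# (seconds); appDataTtl = minimum age of an app directory before deletion.
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
 -Dspark.worker.cleanup.interval=600 \
 -Dspark.worker.cleanup.appDataTtl=600"

# $user.home is Java system-property syntax; in a shell script it expands as
# "$user" + the literal ".home". Use a real shell variable instead.
SPARK_LOCAL_DIRS="$HOME/tmp"
```

Note that the cleanup sweep covers only the work/ directories; shuffle files live under SPARK_LOCAL_DIRS and are normally deleted when the application's context shuts down cleanly, so a Tomcat kill that skips `sc.stop()` can leave them behind.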
Decrease In Performance due to Auto Increase of Partitions in Spark
In our application we load our historical data into RDDs with 40 partitions (number of available cores x 2), and we have not implemented any custom partitioner. After applying transformations on these RDDs, intermediate RDDs are created with more than 40 partitions, sometimes going up to 300.

1. Is Spark intelligent enough to manage the partitions of an RDD on its own? Why is there an increase in the number of partitions?
2. We suspect that the increase in the number of partitions is causing the decrease in performance.
3. If we create a custom Partitioner, will it improve our performance?

Thanks,
Sayantini
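A sketch of where the extra partitions typically come from and how to pin the count, using the Scala RDD API. The RDD names here are illustrative, not from the original post; they stand in for any pair of key-value RDDs.

```scala
// Illustrative sketch: "a" and "b" stand for two RDD[(K, V)] datasets,
// each with 40 partitions.
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

def sketch(a: RDD[(String, Int)], b: RDD[(String, Int)]): Unit = {
  // union concatenates the parents' partitions (40 + 40 = 80) unless both
  // sides already share the same partitioner -- one common way counts grow.
  val u = a.union(b)

  // Shuffle transformations accept an explicit partition count...
  val joined = a.join(b, 40)

  // ...or a Partitioner; pre-partitioning by key also lets later joins on
  // the same key reuse the layout and skip a reshuffle.
  val part = new HashPartitioner(40)
  val byKey = a.partitionBy(part).cache()

  // After the fact: coalesce shrinks without a full shuffle,
  // repartition always reshuffles.
  val narrowed = u.coalesce(40)
}
```

So the growth is usually a mechanical consequence of how union/join derive their partition counts from their parents, not Spark "deciding" a better number; passing numPartitions or a shared Partitioner at the shuffle boundary is the usual way to keep it at cores x 2.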
Integration of Spark1.2.0 cdh4 with Jetty 9.2.10
Hi all,

We are using spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar in our application. When we try to deploy the application on Jetty (jetty-distribution-9.2.10.v20150310), we get the exceptions below at server startup.

Initially we were getting:

Caused by: java.lang.IllegalArgumentException: The servletContext ServletContext@o.e.j.s.ServletContextHandler{/static,null} org.eclipse.jetty.servlet.ServletContextHandler$Context is not org.eclipse.jetty.server.handler.ContextHandler$Context
    at org.eclipse.jetty.servlet.DefaultServlet.initContextHandler(DefaultServlet.java:310)
    at org.eclipse.jetty.servlet.DefaultServlet.init(DefaultServlet.java:175)
    at javax.servlet.GenericServlet.init(GenericServlet.java:242)
    at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:532)

Then we replaced the jetty-server and jetty-util jars, and now we are getting:

Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.server.bio.SocketConnector
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at org.eclipse.jetty.webapp.WebAppClassLoader.findClass(WebAppClassLoader.java:510)
    at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:441)
    at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:403)

Please suggest a solution.

Regards,
Sayantini
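For context on the second error: org.eclipse.jetty.server.bio.SocketConnector is a Jetty 8 class that no longer exists in Jetty 9, so swapping the container's jetty-server/jetty-util jars cannot resolve this. The spark-assembly 1.2.0 jar bundles Jetty 8 classes unshaded, and they collide with the Jetty 9 classes of the container. One commonly suggested approach, sketched below as an assumption rather than a verified fix for this exact setup, is to relocate the bundled Jetty packages in your own build with the Maven Shade Plugin (plugin version and the shaded package name are illustrative):

```xml
<!-- Sketch (pom.xml fragment): relocate the Jetty 8 classes pulled in via the
     Spark assembly so they cannot clash with the container's Jetty 9.
     Version and shadedPattern are illustrative assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.eclipse.jetty</pattern>
            <shadedPattern>org.spark-project.jetty</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The same relocation idea is what later Spark releases themselves adopted for their bundled Jetty, which is why this direction is worth trying before any further jar swapping.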