Remove COMPLETED applications and shuffle data

2015-05-26 Thread sayantini
Hi All,



Please help me with the two issues below:



*Environment:*



I am running my Spark cluster in standalone mode.



I am initializing the SparkContext from inside my Tomcat server.



I am setting the properties below in environment.sh in the $SPARK_HOME/conf directory:



SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=1 -Dspark.deploy.retainedDrivers=1"

SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=600 -Dspark.worker.cleanup.appDataTtl=600"



SPARK_LOCAL_DIRS=$user.home/tmp



*Issue 1:*



Still, the application folders in my $SPARK_HOME/work directory continue to grow every time I restart Tomcat.



I also tried stopping the SparkContext (sc.stop()) in Tomcat’s contextDestroyed listener, but I am still not able to remove the unwanted application folders.
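
For illustration, a minimal sketch of such a listener (the class name and master URL are placeholders here, not my actual code):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkLifecycleListener implements ServletContextListener {

    // Hypothetical holder; the real application keeps its own reference.
    private static JavaSparkContext sc;

    @Override
    public void contextInitialized(ServletContextEvent event) {
        // Created once when the web application starts inside Tomcat.
        sc = new JavaSparkContext("spark://master:7077", "tomcat-app");
    }

    @Override
    public void contextDestroyed(ServletContextEvent event) {
        // Stops the application on the master; the per-application folders under
        // $SPARK_HOME/work on the workers are still governed by the
        // spark.worker.cleanup.* settings shown above.
        if (sc != null) {
            sc.stop();
        }
    }
}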



*Issue 2:*

The ‘tmp’ folder is getting filled up with shuffle data and is eating my entire hard disk. Is there any setting to remove the shuffle data of ‘FINISHED’ applications?
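
For context, the directory that fills up is whatever SPARK_LOCAL_DIRS (or its per-application counterpart spark.local.dir) points to. A minimal sketch of setting it on the SparkConf, with placeholder paths and master URL; on standalone workers SPARK_LOCAL_DIRS still takes precedence:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalDirSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://master:7077")   // placeholder master URL
                .setAppName("shuffle-dir-demo")
                // Per-application counterpart of SPARK_LOCAL_DIRS: shuffle and
                // spill files for this application are written under this path.
                .set("spark.local.dir", "/data/spark-tmp");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run jobs ...
        sc.stop();  // stop the context when the application is done
    }
}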



Thanks in advance,

Sayantini


Decrease In Performance due to Auto Increase of Partitions in Spark

2015-03-27 Thread sayantini
In our application we load our historical data into RDDs with 40 partitions (number of available cores x 2), and we have not implemented any custom partitioner.

After applying transformations to these RDDs, intermediate RDDs are created with more than 40 partitions, sometimes going up to 300.

1. Is Spark intelligent enough to manage the partitions of an RDD on its own? Please suggest why the number of partitions is increasing.

2. We suspect that the increase in the number of partitions is causing the decrease in performance.

3. If we create a custom Partitioner, will it improve our performance? (A sketch follows after these questions.)
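
For illustration only, this sketch shows the kind of explicit control we are asking about: passing a Partitioner (or a numPartitions argument) to a wide transformation pins its partition count instead of letting Spark derive one from the parent RDDs or spark.default.parallelism. The class name and data are made up and this is not our actual code:

import java.util.Arrays;
import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class PartitionControlSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[4]", "partition-demo");

        // Toy data spread across 40 partitions, mirroring the 40-partition load above.
        JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(
                Arrays.asList(new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("a", 3)),
                40);

        // Passing a Partitioner keeps the result at 40 partitions instead of
        // letting the shuffle choose its own count.
        JavaPairRDD<String, Integer> reduced =
                pairs.reduceByKey(new HashPartitioner(40), (a, b) -> a + b);

        System.out.println(reduced.partitions().size());  // prints 40
        sc.stop();
    }
}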



Thanks,

Sayantini


Integration of Spark1.2.0 cdh4 with Jetty 9.2.10

2015-03-18 Thread sayantini
Hi all,


We are using spark-assembly-1.2.0-hadoop2.0.0-mr1-cdh4.2.0.jar in our application. When we try to deploy the application on Jetty (jetty-distribution-9.2.10.v20150310), we get the exceptions below at server startup.



Initially, we were getting this exception:


Caused by: java.lang.IllegalArgumentException: The servletContext ServletContext@o.e.j.s.ServletContextHandler{/static,null} org.eclipse.jetty.servlet.ServletContextHandler$Context is not org.eclipse.jetty.server.handler.ContextHandler$Context
        at org.eclipse.jetty.servlet.DefaultServlet.initContextHandler(DefaultServlet.java:310)
        at org.eclipse.jetty.servlet.DefaultServlet.init(DefaultServlet.java:175)
        at javax.servlet.GenericServlet.init(GenericServlet.java:242)
        at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:532)





Then we tweaked the jetty-server and jetty-util jars, and now we are getting this exception:



Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.server.bio.SocketConnector
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.net.URLClassLoader$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(Unknown Source)
        at org.eclipse.jetty.webapp.WebAppClassLoader.findClass(WebAppClassLoader.java:510)
        at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:441)
        at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:403)



Could you please suggest a solution to this?



Regards,

Sayantini