I built Apache Spark on Ubuntu 14.04 LTS with the following command:
mvn -Pspark-ganglia-lgpl -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package
The build was successful. Then the following modifications were made.
1. Added "SPARK_LOCAL_IP=127.0.0.1" to $SPARK_HOME/conf/spark-env.sh to avoid the following warnings:
16/04/27 17:45:54 WARN Utils: Your hostname, ganglia resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
16/04/27 17:45:54 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
After this change, Spark started without these warnings.
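For reference, the spark-env.sh edit above can be scripted; the sketch below uses a scratch directory in place of a real install so it is safe to run anywhere (substitute your actual $SPARK_HOME):

```shell
# Demo of the change; SPARK_HOME here is a scratch stand-in,
# not a real Spark install.
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/conf"

# Append the local-IP override, as described in step 1.
echo 'SPARK_LOCAL_IP=127.0.0.1' >> "$SPARK_HOME/conf/spark-env.sh"
cat "$SPARK_HOME/conf/spark-env.sh"
```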
2. To enable Ganglia metrics, the following lines were added to $SPARK_HOME/conf/metrics.properties:
# Enable GangliaSink for all instances
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name=SparkCluster
# host is the real Ganglia IP, masked here
*.sink.ganglia.host=XYZ.XYZ.XYZ.XYZ
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.ttl=1
*.sink.ganglia.mode=multicast
The following errors were displayed, but Spark still started:
16/04/27 17:45:59 ERROR MetricsSystem: Sink class org.apache.spark.metrics.sink.GangliaSink cannot be instantiated
16/04/27 17:45:59 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.GangliaSink
    at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    .......
"GangliaSink" class can be found at:
$SPARK_HOME/external/spark-ganglia-lgpl/target/classes/org/apache/spark/metrics/sink/GangliaSink.class
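Since the .class file exists in the module's target directory but the JVM still throws ClassNotFoundException, a reasonable suspicion is that no jar on the runtime classpath actually contains the class. The sketch below shows one way to scan jars for it; the scratch jar is built purely for illustration, so point `root` at the actual $SPARK_HOME when checking a real build:

```shell
# Scratch stand-in for a Spark build tree; replace 'root' with the real
# $SPARK_HOME to run this check against the actual build.
root=$(mktemp -d)

# Create a demo jar containing the sink class (illustration only).
python3 -c '
import sys, zipfile
with zipfile.ZipFile(sys.argv[1], "w") as z:
    z.writestr("org/apache/spark/metrics/sink/GangliaSink.class", b"")
' "$root/ganglia-demo.jar"

# Scan every jar under root and report those containing GangliaSink.
hits=$(find "$root" -name '*.jar' | while read -r jar; do
  python3 -m zipfile -l "$jar" | grep -q 'GangliaSink.class' && echo "$jar"
done)
echo "jars containing GangliaSink: $hits"
```

With a real install, `jar tf "$jar" | grep GangliaSink` does the same listing; if no jar on the launch classpath contains the class, that would explain the error above.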
I have seen previous threads about the same problem, but none of them contain a working solution. Any ideas?