Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/108#discussion_r10417262

    --- Diff: docs/monitoring.md ---
    @@ -48,11 +48,22 @@ Each instance can report to zero or more _sinks_. Sinks are contained in the

     * `ConsoleSink`: Logs metrics information to the console.
     * `CSVSink`: Exports metrics data to CSV files at regular intervals.
    -* `GangliaSink`: Sends metrics to a Ganglia node or multicast group.
     * `JmxSink`: Registers metrics for viewing in a JMX console.
     * `MetricsServlet`: Adds a servlet within the existing Spark UI to serve metrics data as JSON data.
     * `GraphiteSink`: Sends metrics to a Graphite node.

    +Spark also supports a Ganglia sink which is not included in the default build due to
    +licensing restrictions:
    +
    +* `GangliaSink`: Sends metrics to a Ganglia node or multicast group.
    +
    +To install the `GangliaSink` you'll need to perform a custom build of Spark. _**Note that
    +by embedding this library you will include [LGPL](http://www.gnu.org/copyleft/lesser.html)-licensed
    +code in your Spark package**_. For sbt users, set the
    +`SPARK_GANGLIA_LGPL` environment variable before building. For Maven users, enable
    +the `-Pspark-ganglia-lgpl` profile. For users linking applications against Spark,
    +include the `spark-ganglia-lgpl` artifact as a dependency.
    --- End diff --

    This is kind of confusing because it's not clear that you should *both* build a custom Spark *and* have applications link against spark-ganglia-lgpl. As written, it sounds like you do only one of the three: run the Maven command, run the SBT one, or add that dependency. In fact you need to deploy the special build to the cluster *and* also link your app against this artifact.
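To illustrate the two-step setup the comment is asking the docs to spell out, here is a hedged sketch of the build side (command forms are assumptions based on the Spark build of that era; exact flags and artifact coordinates may differ):

```shell
# Step 1: build a Spark distribution that bundles the LGPL Ganglia sink,
# then deploy this build to every node in the cluster.

# sbt users: set the environment variable before building
SPARK_GANGLIA_LGPL=true sbt/sbt assembly

# Maven users: enable the spark-ganglia-lgpl profile instead
mvn -Pspark-ganglia-lgpl -DskipTests clean package

# Step 2 (separate from the build): the application itself must ALSO
# declare a dependency on the spark-ganglia-lgpl artifact in its own
# build file; doing only step 1 or only step 2 is not sufficient.
```

The point of the review comment is that these are conjunctive steps, not alternatives, which the proposed doc wording obscures.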