Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/108#discussion_r10417447

    --- Diff: docs/monitoring.md ---
    @@ -48,11 +48,22 @@ Each instance can report to zero or more _sinks_. Sinks are contained in the
     * `ConsoleSink`: Logs metrics information to the console.
     * `CSVSink`: Exports metrics data to CSV files at regular intervals.
    -* `GangliaSink`: Sends metrics to a Ganglia node or multicast group.
     * `JmxSink`: Registers metrics for viewing in a JXM console.
     * `MetricsServlet`: Adds a servlet within the existing Spark UI to serve metrics data as JSON data.
     * `GraphiteSink`: Sends metrics to a Graphite node.
    +Spark also supports a Ganglia sink which is not included in the default build due to
    +licensing restrictions:
    +
    +* `GangliaSink`: Sends metrics to a Ganglia node or multicast group.
    +
    +To install the `GangliaSink` you'll need to perform a custom build of Spark. _**Note that
    +by embedding this library you will include [LGPL](http://www.gnu.org/copyleft/lesser.html)-licensed
    +code in your Spark package**_. For sbt users, set the
    +`SPARK_GANGLIA_LGPL` environment varaible before building. For Maven users, enable
    +the `-Pspark-ganglia-lgpl` profile. For users linking applications against Spark, link
    +include the `spark-ganglia-lgpl` artifact as a dependency.
    --- End diff --

Makes sense, I'll update. Though it depends what you are doing. If you run locally you can just link against it - if you run on a cluster and mark Spark as provided you need to do it in both places. The second is going to be way more likely for this so I'll just mention to do both things.
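For the cluster case described in the comment, a rough sketch of what the application side might look like in sbt is below. It assumes that "both places" means a Spark build with the Ganglia option enabled on the cluster plus an explicit dependency in the application. The project name, Scala version, and Spark version are placeholders rather than values from the pull request, and the `org.apache.spark` group ID is assumed to be the same one used by the other Spark artifacts.

```scala
// build.sbt for a hypothetical application; name and versions are placeholders
name := "my-spark-app"

scalaVersion := "2.10.4"

// Assumed version: use whichever release matches your custom Spark build
val sparkVersion = "1.0.0"

libraryDependencies ++= Seq(
  // Spark itself is supplied by the cluster, so it stays out of the application jar
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  // The LGPL-licensed Ganglia sink is not in the default build, so link it explicitly
  "org.apache.spark" %% "spark-ganglia-lgpl" % sparkVersion
)
```

When running locally without the `provided` scope, the second dependency alone should be enough, which matches the distinction drawn in the comment.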