Hi Marcelo,
Thanks for bringing this up here, as this has been a topic of debate
recently. Some thoughts below.
> ... all of them suffer from the fact that the log message needs to be
> built even though it might not be used.
>
This is not true of the current implementation (and this is actually why
Spark has a logging trait instead of using a logger directly).
If you look at the original function signatures:

protected def logDebug(msg: => String) ...

The => means that msg is passed by name rather than by value.
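Roughly, the trait does something like this under the covers (a
simplified sketch; the real trait supports more log levels and caches
the logger):

import org.slf4j.{Logger, LoggerFactory}

trait Logging {
  // An SLF4J logger named after the concrete class that mixes this in.
  private lazy val log: Logger = LoggerFactory.getLogger(getClass)

  // msg is by-name, so the string (and any expensive computation inside
  // it) is only evaluated when debug logging is actually enabled.
  protected def logDebug(msg: => String): Unit =
    if (log.isDebugEnabled) log.debug(msg)
}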
Under the covers, Scala creates a closure that can be used to compute
the log message only if it is actually required. This does result in a
significant performance improvement, but it still requires allocating an
object for the closure. The bytecode is really something like this:
val logMessage = new Function0[String] {
  def apply() = "Log message" + someExpensiveComputation()
}
log.debug(logMessage)
In Catalyst and Spark SQL we are using the scala-logging package, which
uses macros to automatically rewrite all of your log statements.
You write:

logger.debug(s"Log message ${someExpensiveComputation()}")

You get:

if (logger.isDebugEnabled) {
  val logMsg = "Log message " + someExpensiveComputation()
  logger.debug(logMsg)
}
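Concretely, usage looks something like this (a minimal sketch; the class
and the expensive method are made up, and I'm assuming the LazyLogging
trait from a recent scala-logging release):

import com.typesafe.scalalogging.LazyLogging

class QueryPlanner extends LazyLogging {
  // Hypothetical stand-in for work we only want to do when debug is on.
  private def someExpensiveComputation(): String =
    (1 to 1000).map(_.toString).mkString(",")

  def plan(): Unit = {
    // The macro wraps this call in an enabled check, so the interpolated
    // string is never built when debug logging is disabled.
    logger.debug(s"Log message ${someExpensiveComputation()}")
  }
}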
IMHO, this is the cleanest option (and is supported by Typesafe). Based on
a micro-benchmark, it is also the fastest:
std logging:   19885.48 ms
spark logging:   914.408 ms
scala logging:   729.779 ms
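For the curious, the benchmark was roughly shaped like this (a
reconstruction, not the actual harness; the iteration count, logger
setup, and the expensive() helper are all made up for illustration):

import org.slf4j.LoggerFactory
import com.typesafe.scalalogging.{Logger => ScalaLogger}

object LoggingBench {
  // Debug is assumed to be disabled for this logger, so we measure pure
  // overhead: eager message construction, closure allocation, or neither.
  private val slf4j = LoggerFactory.getLogger("bench")
  private val scalaLogger = ScalaLogger(slf4j)

  // Spark-style by-name wrapper: allocates a closure per call.
  private def logDebug(msg: => String): Unit =
    if (slf4j.isDebugEnabled) slf4j.debug(msg)

  private def expensive(): String = (1 to 100).mkString(",")

  private def time(label: String)(body: => Unit): Unit = {
    val start = System.nanoTime()
    body
    println(f"$label: ${(System.nanoTime() - start) / 1e6}%.3f ms")
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000
    time("std logging") {
      var i = 0
      while (i < n) { slf4j.debug("msg " + expensive()); i += 1 } // eager build
    }
    time("spark logging") {
      var i = 0
      while (i < n) { logDebug("msg " + expensive()); i += 1 } // closure only
    }
    time("scala logging") {
      var i = 0
      while (i < n) { scalaLogger.debug(s"msg ${expensive()}"); i += 1 } // guarded
    }
  }
}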
Once the dust settles from the 1.0 release, I'd be in favor of
standardizing on scala-logging.
Michael