Re: Using Log4j for logging messages inside lambda functions

2015-05-26 Thread Spico Florin
Hello!
  Thank you all for your answers. Akhil's proposed solution works fine.
Thanks.
 Florin

On Tue, May 26, 2015 at 3:08 AM, Wesley Miao wesley.mi...@gmail.com wrote:

 The reason it didn't work for you is that the function you registered with
 someRdd.map will be running on the worker/executor side, not in your
 driver program. So you need to be careful not to accidentally close over
 objects instantiated in your driver program, like the log object in your
 sample code above. You can look for more information online on the concept
 of closures to understand fully why it didn't work for you in the first
 place.

 The usual solution to this type of problem is to instantiate the objects
 you want to use in your map functions from within your map functions. You
 can define a factory object from which you can create your log object.

 On Mon, May 25, 2015 at 11:05 PM, Spico Florin spicoflo...@gmail.com
 wrote:

 Hello!
   I would like to use the logging mechanism provided by log4j, but I'm
 getting the following exception:
 Exception in thread "main" org.apache.spark.SparkException: Task not
 serializable - Caused by: java.io.NotSerializableException:
 org.apache.log4j.Logger

 The code (and the problem) that I'm using resembles the one used here:
 http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala,
 meaning:

 val log = Logger.getLogger(getClass.getName)

 def doTest() {
   val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
   val spark = new SparkContext(conf)

   val someRdd = spark.parallelize(List(1, 2, 3))
   someRdd.map { element =>
     log.info(s"$element will be processed")
     element + 1
   }
 }
 I'm posting the same problem here because the one on Stack Overflow didn't
 get any answer.
 In this case, can you please tell us what is the best way to use logging?
 Is there any solution that does not use rdd.foreachPartition?

 I look forward to your answers.
 Regards,
 Florin










Re: Using Log4j for logging messages inside lambda functions

2015-05-25 Thread Akhil Das
Try this way:

object Holder extends Serializable {
  // @transient keeps the logger out of the serialized closure;
  // lazy val re-creates it on each executor when it is first accessed.
  @transient lazy val log = Logger.getLogger(getClass.getName)
}

val someRdd = spark.parallelize(List(1, 2, 3))
someRdd.map { element =>
  Holder.log.info(s"$element will be processed")
  element + 1
}



Thanks
Best Regards

On Mon, May 25, 2015 at 8:35 PM, Spico Florin spicoflo...@gmail.com wrote:

 Hello!
   I would like to use the logging mechanism provided by log4j, but I'm
 getting the following exception:
 Exception in thread "main" org.apache.spark.SparkException: Task not
 serializable - Caused by: java.io.NotSerializableException:
 org.apache.log4j.Logger

 The code (and the problem) that I'm using resembles the one used here:
 http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala,
 meaning:

 val log = Logger.getLogger(getClass.getName)

 def doTest() {
   val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
   val spark = new SparkContext(conf)

   val someRdd = spark.parallelize(List(1, 2, 3))
   someRdd.map { element =>
     log.info(s"$element will be processed")
     element + 1
   }
 }
 I'm posting the same problem here because the one on Stack Overflow didn't
 get any answer.
 In this case, can you please tell us what is the best way to use logging?
 Is there any solution that does not use rdd.foreachPartition?

 I look forward to your answers.
 Regards,
 Florin








Using Log4j for logging messages inside lambda functions

2015-05-25 Thread Spico Florin
Hello!
  I would like to use the logging mechanism provided by log4j, but I'm
getting the following exception:
Exception in thread "main" org.apache.spark.SparkException: Task not
serializable - Caused by: java.io.NotSerializableException:
org.apache.log4j.Logger

The code (and the problem) that I'm using resembles the one used here:
http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala,
meaning:

import org.apache.log4j.Logger
import org.apache.spark.{SparkConf, SparkContext}

val log = Logger.getLogger(getClass.getName)

def doTest() {
  val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
  val spark = new SparkContext(conf)

  val someRdd = spark.parallelize(List(1, 2, 3))
  someRdd.map { element =>
    log.info(s"$element will be processed")
    element + 1
  }
}
I'm posting the same problem here because the one on Stack Overflow didn't
get any answer.
In this case, can you please tell us what is the best way to use logging?
Is there any solution that does not use rdd.foreachPartition? (A rough
sketch of what I mean by that is shown below.)
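
For reference, here is roughly what I mean by the foreachPartition-based
approach (an untested sketch; the logger name is just illustrative):

someRdd.foreachPartition { partition =>
  // The logger is created on the executor, once per partition, so nothing
  // has to be serialized from the driver.
  val log = Logger.getLogger("LogTest")
  partition.foreach { element =>
    log.info(s"$element will be processed")
  }
}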

I look forward to your answers.
Regards,
Florin


Re: Using Log4j for logging messages inside lambda functions

2015-05-25 Thread Wesley Miao
The reason it didn't work for you is that the function you registered with
someRdd.map will be running on the worker/executor side, not in your
driver program. So you need to be careful not to accidentally close over
objects instantiated in your driver program, like the log object in your
sample code above. You can look for more information online on the concept
of closures to understand fully why it didn't work for you in the first
place.

The usual solution to this type of problem is to instantiate the objects
you want to use in your map functions from within your map functions. You
can define a factory object from which you can create your log object.
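
For example, a minimal sketch of what I mean (not tested; the object and
logger names are just illustrative):

object LogHolder extends Serializable {
  // @transient keeps the Logger out of the serialized closure; lazy val
  // creates it on the executor the first time it is used there.
  @transient lazy val log = Logger.getLogger("LogTest")
}

someRdd.map { element =>
  LogHolder.log.info(s"$element will be processed")
  element + 1
}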

On Mon, May 25, 2015 at 11:05 PM, Spico Florin spicoflo...@gmail.com
wrote:

 Hello!
   I would like to use the logging mechanism provided by log4j, but I'm
 getting the following exception:
 Exception in thread "main" org.apache.spark.SparkException: Task not
 serializable - Caused by: java.io.NotSerializableException:
 org.apache.log4j.Logger

 The code (and the problem) that I'm using resembles the one used here:
 http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala,
 meaning:

 val log = Logger.getLogger(getClass.getName)

 def doTest() {
   val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
   val spark = new SparkContext(conf)

   val someRdd = spark.parallelize(List(1, 2, 3))
   someRdd.map { element =>
     log.info(s"$element will be processed")
     element + 1
   }
 }
 I'm posting the same problem here because the one on Stack Overflow didn't
 get any answer.
 In this case, can you please tell us what is the best way to use logging?
 Is there any solution that does not use rdd.foreachPartition?

 I look forward to your answers.
 Regards,
 Florin