Re: logging in pyspark

2014-05-22 Thread Shivani Rao
I am having trouble adding logging to the class that does serialization and deserialization. Where is the code for org.apache.spark.Logging located? And is it serializable?
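
One common way around a non-serializable logger in a class that gets shipped to executors is to build the logger lazily instead of holding it as instance state. A minimal sketch, assuming the standard library logging module and a hypothetical RecordSerializer class (not Spark's own Logging trait):

    import logging

    class RecordSerializer(object):
        # Hypothetical class whose instances get pickled and shipped to
        # executors, so it must not carry a Logger as instance state.

        @property
        def logger(self):
            # logging.getLogger() returns a per-name singleton, so
            # rebuilding the reference on each access is cheap, and
            # nothing unpicklable ever lives on self.
            return logging.getLogger(self.__class__.__name__)

        def serialize(self, record):
            self.logger.debug("serializing %r", record)
            return repr(record).encode("utf-8")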

Re: logging in pyspark

2014-05-14 Thread Diana Carroll
foreach vs. map isn't the issue. Both require serializing the called function, so the pickle error would still apply, yes? And at the moment, I'm just testing. Definitely wouldn't want to log something for each element, but may want to detect something and log for SOME elements. So my question
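
One way to sidestep the pickle error while still logging for only some elements is to create the logger inside the mapped function, so only the function's code is serialized, never a Logger instance captured from the driver. A minimal sketch, assuming an existing SparkContext named sc and a hypothetical logger name:

    import logging

    def check_element(x):
        # Built on the executor at call time; no unpicklable Logger is
        # captured in the closure.
        log = logging.getLogger("myapp.tasks")
        if x < 0:  # log only the elements of interest, not every one
            log.warning("negative value seen: %r", x)
        return x * x

    squared = sc.parallelize([1, -2, 3]).map(check_element)
    print(squared.collect())

Note that these messages end up in the executors' worker logs, not in the driver's output.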

Re: logging in pyspark

2014-05-12 Thread Nicholas Chammas
Ah, yes, that is correct. You need a serializable object one way or the other. An alternate suggestion would be to use a combination of RDD.sample() (http://spark.apache.org/docs/latest/api/pyspark/pyspark.rdd.RDD-class.html#sample) and collect() to take a look at some small amount of data and just
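
A minimal sketch of that suggestion, assuming an existing RDD named rdd: sample a small fraction, collect it back to the driver, and log it there, where the normal logging setup applies.

    import logging

    log = logging.getLogger("myapp.driver")
    # sample() returns an RDD; collect() brings the sampled rows back to
    # the driver, so logging here never crosses the pickle boundary.
    sampled = rdd.sample(withReplacement=False, fraction=0.01, seed=42)
    for record in sampled.collect():
        log.info("sampled record: %r", record)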

logging in pyspark

2014-05-06 Thread Diana Carroll
What should I do if I want to log something as part of a task? This is what I tried. To set up a logger, I followed the advice here: http://py4j.sourceforge.net/faq.html#how-to-turn-logging-on-off

    logger = logging.getLogger("py4j")
    logger.setLevel(logging.INFO)
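
Reconstructed in full, that py4j FAQ setup looks roughly like the following; without a handler attached, the records have nowhere to go (or trigger a "no handlers could be found" warning on older Pythons). This is driver-side setup only:

    import logging

    # Driver-side setup along the lines of the py4j FAQ: name the
    # logger, set a level, and attach a handler so records are emitted.
    logger = logging.getLogger("py4j")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.StreamHandler())

    logger.info("visible on the driver")

The catch, as the rest of the thread works out, is that this logger lives on the driver; referencing it from inside a task closure means pickling it along with the function.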

Re: logging in pyspark

2014-05-06 Thread Nicholas Chammas
I think you're looking for RDD.foreach() (http://spark.apache.org/docs/latest/api/pyspark/pyspark.rdd.RDD-class.html#foreach). According to the programming guide (http://spark.apache.org/docs/latest/scala-programming-guide.html): Run a function func on each element of the dataset. This is usually
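
A minimal sketch of the foreach() suggestion, again assuming an existing SparkContext named sc; the function runs once per element on the executors:

    import logging

    def log_element(x):
        # Runs on an executor; the output lands in that worker's logs,
        # not on the driver console.
        logging.getLogger("myapp.foreach").info("saw element %r", x)

    sc.parallelize(range(5)).foreach(log_element)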