I am having trouble adding logging to the class that does serialization and
deserialization. Where is the code for org.apache.spark.Logging located?
And is it serializable?
On Mon, May 12, 2014 at 10:02 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
foreach vs. map isn't the issue. Both require serializing the called
function, so the pickle error would still apply, yes?
And at the moment, I'm just testing. Definitely wouldn't want to log
something for each element, but may want to detect something and log for
SOME elements.
So my question stands.
Ah, yes, that is correct. You need a serializable object one way or the
other.
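For instance, here is a minimal sketch of that workaround (my own illustration, not from the thread): construct the logger inside the function that Spark ships to the executors, so the unpicklable logger object is never captured in the closure. The predicate is_interesting() is hypothetical, standing in for "detect something and log for SOME elements":

import logging

def is_interesting(x):
    # Hypothetical predicate; stands in for "detect something" above.
    return x < 0

def process(x):
    # The logger is created on the executor at call time, so the
    # (unpicklable) logger itself is never serialized with the closure.
    log = logging.getLogger("py4j")
    if is_interesting(x):
        log.info("flagged element: %s" % x)

rdd.foreach(process)  # assumes an existing RDD named rdd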
An alternate suggestion would be to use a combination of RDD.sample()
(http://spark.apache.org/docs/latest/api/pyspark/pyspark.rdd.RDD-class.html#sample)
and collect() to take a look at some small amount of data and just inspect
or log it on the driver.
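As a rough sketch of that approach (assuming an existing RDD named rdd):

sampled = rdd.sample(False, 0.01, 42)  # no replacement, ~1% of elements, fixed seed
for record in sampled.collect():       # bring only the sample back to the driver
    print(record)                      # plain print/logging is fine on the driver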
What should I do if I want to log something as part of a task?
This is what I tried. To set up a logger, I followed the advice here:
http://py4j.sourceforge.net/faq.html#how-to-turn-logging-on-off
import logging

logger = logging.getLogger("py4j")
logger.setLevel(logging.INFO)
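Passing that logger into a task is what then fails: referencing it from a
function shipped to the executors forces PySpark to pickle it, and logger
objects (which hold thread locks and handlers) can't be pickled. A sketch of
the failing pattern, assuming an existing RDD named rdd:

def log_element(x):
    logger.info("saw element: %s" % x)  # module-level 'logger' is captured by the closure

rdd.foreach(log_element)  # raises a pickling error when PySpark serializes the closure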
I think you're looking for RDD.foreach()
(http://spark.apache.org/docs/latest/api/pyspark/pyspark.rdd.RDD-class.html#foreach).
According to the programming guide
(http://spark.apache.org/docs/latest/scala-programming-guide.html):
Run a function func on each element of the dataset. This is usually done for
side effects such as updating an accumulator variable or interacting with
external storage systems.
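For example, a minimal foreach() sketch along those lines (assuming an
existing SparkContext sc):

accum = sc.accumulator(0)                                     # driver-side accumulator
sc.parallelize([1, 2, 3, 4]).foreach(lambda x: accum.add(x))  # side effect per element
print(accum.value)                                            # 10, read back on the driver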