Ok, so those line numbers in our DAG don't refer to our code. Is there any
way to display (or calculate) line numbers that refer to code we actually
wrote, or is that only possible in Scala Spark?
On Thu, Jul 21, 2016 at 12:24 PM, Jacek Laskowski wrote:
> Hi,
>
> My little
That -1 is coming from here:
PythonRDD.writeIteratorToStream(inputIterator, dataOut)
dataOut.writeInt(SpecialLengths.END_OF_DATA_SECTION)  // val END_OF_DATA_SECTION = -1
dataOut.writeInt(SpecialLengths.END_OF_STREAM)
dataOut.flush()
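The -1 above is a sentinel in the length-prefixed stream the JVM writes to the Python worker. It is a negative "special length" that marks the end of the data section instead of announcing a record. Below is a minimal Python sketch of that kind of framing, for illustration only. It assumes each record is written as a 4-byte big-endian length (what Java's DataOutputStream.writeInt produces) followed by the record's bytes. write_records and read_records are hypothetical helpers, not PySpark APIs. The END_OF_STREAM value of -4 is my assumption about Spark's SpecialLengths and is not stated in this thread.

```python
import io
import struct

# From the snippet above: SpecialLengths.END_OF_DATA_SECTION = -1.
END_OF_DATA_SECTION = -1
# Assumption: Spark's SpecialLengths.END_OF_STREAM is -4; check your Spark version.
END_OF_STREAM = -4


def write_int(out, value):
    # Java's DataOutputStream.writeInt emits a 4-byte big-endian signed int.
    out.write(struct.pack(">i", value))


def read_int(inp):
    data = inp.read(4)
    if len(data) < 4:
        raise EOFError("stream ended before a full int was read")
    return struct.unpack(">i", data)[0]


def write_records(out, records):
    """Writer side (hypothetical): length-prefixed records, then the sentinels."""
    for rec in records:
        write_int(out, len(rec))
        out.write(rec)
    write_int(out, END_OF_DATA_SECTION)  # "no more records"
    write_int(out, END_OF_STREAM)        # "no more anything"
    out.flush()


def read_records(inp):
    """Reader side (hypothetical): collect records until the -1 sentinel."""
    records = []
    while True:
        length = read_int(inp)
        if length == END_OF_DATA_SECTION:
            break
        if length < 0:
            raise ValueError(f"unexpected special length {length}")
        records.append(inp.read(length))
    return records


buf = io.BytesIO()
write_records(buf, [b"hello", b"spark"])
buf.seek(0)
print(read_records(buf))               # [b'hello', b'spark']
print(read_int(buf) == END_OF_STREAM)  # True
```

Negative values can never be valid record lengths, which is why they are free to serve as in-band control markers.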
> On Jul 21, 2016, at 12:24 PM, Jacek Laskowski
Hi,
My limited understanding of the Python-Spark bridge is that, at some point,
the Python code communicates over the wire with Spark's backbone, which
includes PythonRDD [1].
When the CallSite can't be computed, it's shown as null:-1 to denote that
nothing could be referred to.
[1]
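The null:-1 convention can be illustrated with a minimal Python sketch. It derives a short call site ("function at file:line") from the current stack and falls back to null:-1 when no frame qualifies. short_call_site, its is_internal predicate and parse_ctr_logs are hypothetical names. This is not Spark's code, and PySpark's own Python-side call-site logic (in pyspark/traceback_utils.py, as far as I know) may differ in its details.

```python
import inspect
import os

# Placeholder used when no call site can be computed, as described above.
UNKNOWN_CALL_SITE = "null:-1"


def short_call_site(is_internal=lambda filename: False):
    """Return "function at file.py:line" for the nearest caller frame that is
    not internal, or "null:-1" if every frame on the stack is filtered out.

    Hypothetical helper for illustration; not part of Spark or PySpark.
    """
    # inspect.stack()[0] is this function itself, so start with its caller.
    for frame_info in inspect.stack()[1:]:
        if is_internal(frame_info.filename):
            continue  # skip library frames, e.g. anything under pyspark/
        name = os.path.basename(frame_info.filename)
        return f"{frame_info.function} at {name}:{frame_info.lineno}"
    return UNKNOWN_CALL_SITE


def parse_ctr_logs():  # hypothetical user function
    return short_call_site()


print(parse_ctr_logs())                 # function name, file name, and line inside parse_ctr_logs
print(short_call_site(lambda f: True))  # null:-1
```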
>
> It's called a CallSite, and it shows where the line comes from. You can see
> the code yourself given the Python file and the line number.
>
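Given a call site such as "count at job.py:2" (a made-up example), you can look up the file and line mechanically. Below is a minimal sketch. It assumes the "label at file:line" shape and assumes the file lives in search_dir. source_for_call_site is a hypothetical helper, not part of Spark or PySpark.

```python
import linecache
import os
import re

# Assumed call-site shape: "<label> at <file>:<line>", e.g. "count at job.py:2".
CALL_SITE_RE = re.compile(r"^(?P<label>.+?) at (?P<file>.+):(?P<line>-?\d+)$")


def source_for_call_site(call_site, search_dir="."):
    """Return the source line a call site points to, or None if unresolvable.

    Hypothetical helper for illustration; not part of Spark or PySpark.
    """
    match = CALL_SITE_RE.match(call_site)
    if match is None:
        return None
    filename, lineno = match.group("file"), int(match.group("line"))
    if filename == "null" or lineno < 1:
        return None  # e.g. "PythonRDD at null:-1": there is nothing to look up
    # linecache.getline returns "" for missing files or out-of-range lines.
    line = linecache.getline(os.path.join(search_dir, filename), lineno)
    return line.strip() or None
```

Call sites that point into Spark's own Scala sources name a .scala file that isn't in your project directory, so they also come back as None.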
But that's what I don't understand. Which Python file? We spark-submit one
file called ctr_parsing.py, but it only has 150 lines. So what is
MapPartitions
On Thu, Jul 21, 2016 at 2:56 AM, C. Josephson wrote:
> I just started looking at the DAG for a Spark Streaming job, and had a
> couple of questions about it (image inline).
>
> 1.) What do the numbers in brackets mean, e.g. PythonRDD[805]?
>
Every RDD has its identifier (as id