Re: Understanding Spark UI DAGs

2016-07-21 Thread C. Josephson
Ok, so those line numbers in our DAG don't refer to our code. Is there any way to display (or calculate) line numbers that refer to code we actually wrote, or is that only possible in Scala Spark?

On Thu, Jul 21, 2016 at 12:24 PM, Jacek Laskowski wrote:
> Hi,
>
> My little

Re: Understanding Spark UI DAGs

2016-07-21 Thread RK Aduri
That -1 is coming from here:

    PythonRDD.writeIteratorToStream(inputIterator, dataOut)
    dataOut.writeInt(SpecialLengths.END_OF_DATA_SECTION)  // val END_OF_DATA_SECTION = -1
    dataOut.writeInt(SpecialLengths.END_OF_STREAM)
    dataOut.flush()

> On Jul 21, 2016, at 12:24 PM, Jacek Laskowski
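For readers unfamiliar with this kind of framing, here is a rough Python sketch of what the snippet above appears to do: each record is written as a 4-byte length followed by its bytes, and negative "lengths" act as markers. Only END_OF_DATA_SECTION = -1 comes from this thread; the END_OF_STREAM value and the helper names (write_records, read_records) are assumptions for illustration, not Spark's actual code.

```python
import io
import struct

# END_OF_DATA_SECTION = -1 is quoted in the thread.
# The END_OF_STREAM value is an assumption made for this sketch.
END_OF_DATA_SECTION = -1
END_OF_STREAM = -4


def write_int(value, out):
    # Big-endian signed 32-bit int, the same layout Java's writeInt produces.
    out.write(struct.pack("!i", value))


def read_int(stream):
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError("stream ended before a full 4-byte int was read")
    return struct.unpack("!i", data)[0]


def write_records(records, out):
    # Hypothetical helper: length-prefix each record, then write the markers.
    for payload in records:
        write_int(len(payload), out)
        out.write(payload)
    write_int(END_OF_DATA_SECTION, out)
    write_int(END_OF_STREAM, out)


def read_records(stream):
    # Hypothetical helper: read records until the -1 marker shows up.
    records = []
    while True:
        length = read_int(stream)
        if length == END_OF_DATA_SECTION:
            break
        records.append(stream.read(length))
    return records


if __name__ == "__main__":
    buf = io.BytesIO()
    write_records([b"a", b"bc"], buf)
    buf.seek(0)
    print(read_records(buf))
```

A real length can never be negative, so the reader can tell a marker from a record without any extra framing byte.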

Re: Understanding Spark UI DAGs

2016-07-21 Thread Jacek Laskowski
Hi,

My limited understanding of the Python-Spark bridge is that at some point the Python code communicates over the wire with Spark's backbone, which includes PythonRDD [1]. When the CallSite can't be computed, it's shown as null:-1 to denote "nothing could be referred to".

[1]
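To make the idea of a call site with a null:-1 fallback concrete, here is a small Python sketch. It is not how Spark computes its CallSite; it only illustrates the general approach of walking a stack from the innermost frame outwards and reporting the first frame that lies outside a library directory. The function names, the /opt/lib/ path, and the line numbers are hypothetical.

```python
import os
import traceback


def call_site(frames, library_dir):
    """Return "file:line" for the innermost frame outside library_dir.

    frames is a list of (filename, lineno) pairs, outermost call first.
    Falls back to "null:-1" when no such frame exists.
    """
    for filename, lineno in reversed(frames):
        if not filename.startswith(library_dir):
            return "{}:{}".format(os.path.basename(filename), lineno)
    return "null:-1"


def current_frames():
    # Snapshot of the live Python stack as (filename, lineno) pairs.
    return [(f.filename, f.lineno) for f in traceback.extract_stack()]


if __name__ == "__main__":
    # Hypothetical stack: user code calling into a library.
    frames = [("/app/ctr_parsing.py", 42), ("/opt/lib/rdd.py", 310)]
    print(call_site(frames, "/opt/lib/"))
    # Only library frames are visible, so nothing can be referred to.
    print(call_site([("/opt/lib/rdd.py", 310)], "/opt/lib/"))
```

When every visible frame belongs to the library, there is no user line to point at, which is the situation the null:-1 label describes.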

Re: Understanding Spark UI DAGs

2016-07-21 Thread C. Josephson
> It's called a CallSite that shows where the line comes from. You can see
> the code yourself given the python file and the line number.

But that's what I don't understand. Which Python file? We spark-submit one file called ctr_parsing.py, but it only has 150 lines. So what is MapPartitions

Re: Understanding Spark UI DAGs

2016-07-21 Thread Jacek Laskowski
On Thu, Jul 21, 2016 at 2:56 AM, C. Josephson wrote:
> I just started looking at the DAG for a Spark Streaming job, and had a
> couple of questions about it (image inline).
>
> 1.) What do the numbers in brackets mean, e.g. PythonRDD[805]?

Every RDD has its identifier (as id