Re: Understanding Spark UI DAGs
Ok, so those line numbers in our DAG don't refer to our code. Is there any way to display (or calculate) line numbers that refer to code we actually wrote, or is that only possible in Scala Spark?

On Thu, Jul 21, 2016 at 12:24 PM, Jacek Laskowski wrote:
> Hi,
>
> My little understanding of Python-Spark bridge is that at some point
> the python code communicates over the wire with Spark's backbone that
> includes PythonRDD [1].
>
> When the CallSite can't be computed, it's null:-1 to denote "nothing
> could be referred to".
>
> [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala

--
Colleen Josephson
Engineering Researcher
Uhana, Inc.
Re: Understanding Spark UI DAGs
That -1 is coming from here:

    PythonRDD.writeIteratorToStream(inputIterator, dataOut)
    dataOut.writeInt(SpecialLengths.END_OF_DATA_SECTION)  // val END_OF_DATA_SECTION = -1
    dataOut.writeInt(SpecialLengths.END_OF_STREAM)
    dataOut.flush()
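The Scala lines quoted in this message write an integer marker after the data. A minimal sketch of that kind of framing in plain Python follows, assuming each record is sent as a 4-byte length followed by its bytes, with a negative value closing the section. Only the name END_OF_DATA_SECTION and its value -1 come from the message; write_records and read_records are hypothetical helpers, and this models the idea rather than Spark's actual wire protocol. It also does not settle whether this -1 is the same one shown as null:-1 in the UI.

```python
import io
import struct

# Value quoted in the message; the name mirrors SpecialLengths in PythonRDD.scala.
END_OF_DATA_SECTION = -1


def write_int(stream, value):
    """Write a 4-byte big-endian signed int (the layout Java's DataOutputStream.writeInt uses)."""
    stream.write(struct.pack(">i", value))


def read_int(stream):
    """Read back one 4-byte big-endian signed int."""
    return struct.unpack(">i", stream.read(4))[0]


def write_records(stream, records):
    """Hypothetical writer: length-prefix each record, then close the section with the sentinel."""
    for record in records:
        write_int(stream, len(record))
        stream.write(record)
    write_int(stream, END_OF_DATA_SECTION)


def read_records(stream):
    """Hypothetical reader: a real length is never negative, so -1 can only mean 'no more data'."""
    records = []
    while True:
        length = read_int(stream)
        if length == END_OF_DATA_SECTION:
            return records
        records.append(stream.read(length))


buffer = io.BytesIO()
write_records(buffer, [b"spark", b"", b"dag"])
buffer.seek(0)
decoded = read_records(buffer)
print(decoded)
```

Using a negative number as the marker works because an empty record still has length 0, so the reader can tell "empty record" and "end of section" apart.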
Re: Understanding Spark UI DAGs
Hi,

My little understanding of the Python-Spark bridge is that at some point the Python code communicates over the wire with Spark's backbone, which includes PythonRDD [1].

When the CallSite can't be computed, it's null:-1 to denote "nothing could be referred to".

[1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala

Regards,
Jacek Laskowski

On Thu, Jul 21, 2016 at 8:36 PM, C. Josephson wrote:
> But that's what I don't understand. Which python file? We spark-submit one
> file called ctr_parsing.py, but it only has 150 lines. So what is
> MapPartitions at PythonRDD.scala:374 referring to? ctr_parsing.py imports a
> number of support functions we wrote, but how do we know which python file
> to look at?
>
> Furthermore, what on earth is null:-1 referring to?
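For anyone wanting labels that point at their own Python files, a call site can be computed on the Python side with the standard library alone. The sketch below is a rough illustration, not Spark's CallSite code: user_call_site, its is_library predicate, and parse_clicks are all hypothetical names. It walks up the call stack, returns file:line for the first frame that is not treated as library code, and falls back to the null:-1 placeholder described in this thread when no frame qualifies.

```python
import inspect
import os


def user_call_site(is_library=lambda path: False):
    """Return 'file:line' for the nearest calling frame that is not library code.

    is_library is a hypothetical predicate on a file path. When every frame is
    rejected, fall back to 'null:-1', the placeholder mentioned in the thread.
    """
    frame = inspect.currentframe()
    # Step out of this helper's own frame before searching.
    frame = frame.f_back if frame is not None else None
    while frame is not None:
        path = frame.f_code.co_filename
        if not is_library(path):
            return "%s:%d" % (os.path.basename(path), frame.f_lineno)
        frame = frame.f_back
    return "null:-1"


def parse_clicks():
    """Hypothetical stand-in for a support function imported by ctr_parsing.py."""
    # With PySpark, the label could be attached with rdd.setName(label).
    # Assumption: whether it then appears in the DAG view depends on the Spark version.
    return user_call_site()


label = parse_clicks()
no_site = user_call_site(is_library=lambda path: True)
print(label, no_site)
```

Here label names the file and line inside parse_clicks where the helper was called, while no_site shows the fallback when every frame is filtered out.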
Re: Understanding Spark UI DAGs
> It's called a CallSite that shows where the line comes from. You can see
> the code yourself given the python file and the line number.

But that's what I don't understand. Which python file? We spark-submit one file called ctr_parsing.py, but it only has 150 lines. So what is MapPartitions at PythonRDD.scala:374 referring to? ctr_parsing.py imports a number of support functions we wrote, but how do we know which python file to look at?

Furthermore, what on earth is null:-1 referring to?
Re: Understanding Spark UI DAGs
On Thu, Jul 21, 2016 at 2:56 AM, C. Josephson wrote:
> I just started looking at the DAG for a Spark Streaming job, and had a
> couple of questions about it (image inline).
>
> 1.) What do the numbers in brackets mean, e.g. PythonRDD[805]?

Every RDD has an identifier (its id attribute) that is unique within a SparkContext, which is the broadest scope an RDD can belong to. In this case, it means you've already created 806 RDDs (counting from 0).

> 2.) What code is "RDD at PythonRDD.scala:43" referring to? Is there any
> way to tie this back to lines of code we've written in pyspark?

It's called a CallSite, and it shows where the line comes from. You can see the code yourself given the python file and the line number.

Jacek
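The id arithmetic in the first answer can be checked with a toy model in plain Python. ToyContext and ToyRDD are hypothetical stand-ins, not Spark classes; the only behavior taken from the reply is that ids are handed out per context, counting from 0.

```python
import itertools


class ToyRDD:
    """Hypothetical stand-in for an RDD: just a kind and an id."""

    def __init__(self, kind, rdd_id):
        self.kind = kind
        self.id = rdd_id

    def __str__(self):
        # Same "Name[id]" shape as the labels in the DAG view.
        return "%s[%d]" % (self.kind, self.id)


class ToyContext:
    """Hypothetical stand-in for a SparkContext: hands out RDD ids starting at 0."""

    def __init__(self):
        self._next_id = itertools.count()

    def new_rdd(self, kind):
        # Every RDD created through this context takes the next id, whatever its kind.
        return ToyRDD(kind, next(self._next_id))


context = ToyContext()
rdds = [context.new_rdd("PythonRDD") for _ in range(806)]
last = rdds[-1]
print(last)  # PythonRDD[805]
```

In a real job the ids are shared by every kind of RDD in the context, so PythonRDD[805] means 806 RDDs of any type, not 806 PythonRDDs.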