Sorry, it wasn't the count, it was the reduce method that retrieves information from the RDD. It has to go through all the RDD values to return the result.
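To make the lazy-evaluation point concrete, here is a minimal plain-Scala analogy (no Spark required; the object and variable names are illustrative, not from this thread). The body of a function passed to a lazy map does not run when map is called, only when a terminal operation forces it, which mirrors why a breakpoint inside a Spark closure is only hit when an action such as reduce or count executes:

```scala
object LazyDemo {
  def main(args: Array[String]): Unit = {
    var timesBodyRan = 0

    // Like an RDD transformation: .view makes map lazy, so the function
    // body below does not execute yet. A breakpoint inside it would not
    // be hit at this point.
    val mapped = (1 to 10).view.map { i =>
      timesBodyRan += 1
      i * 2
    }
    println(s"after map: body ran $timesBodyRan times")   // 0 times

    // Like an action (reduce/count/collect): sum forces the evaluation,
    // and only now would a breakpoint inside the function stop.
    val total = mapped.sum
    println(s"after sum: body ran $timesBodyRan times, total = $total")
  }
}
```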
2016-09-16 11:18 GMT-03:00 chen yong <cy...@hotmail.com>:

> Dear Dirceu,
>
> I am totally confused. In your reply you mentioned "...the count does
> that, ...". However, in the code snippet shown in the attached file
> FelixProblem.png of your previous mail, I cannot find any 'count' ACTION
> being called. Would you please clearly show me the line which triggers
> the evaluation?
>
> Thank you very much
> ------------------------------
> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
> *Sent:* September 16, 2016, 21:07
> *To:* chen yong
> *Cc:* user@spark.apache.org
> *Subject:* Re: Re: Re: Re: Re: t it does not stop at breakpoints which is
> in an anonymous function
>
> Hello Felix,
> No, this line isn't the one that is triggering the execution of the
> function; the count does that, unless your count val is a lazy val.
> The count method is the one that retrieves the information of the RDD; it
> has to go through all of its data to determine how many records the RDD
> has.
>
> Regards,
>
> 2016-09-15 22:23 GMT-03:00 chen yong <cy...@hotmail.com>:
>
>> Dear Dirceu,
>>
>> Thanks for your kind help.
>> I cannot see any code line corresponding to "... retrieve the data from
>> your DataFrame/RDDs ...", which you suggested in the previous replies.
>>
>> Later, I guess the line
>>
>> val test = count
>>
>> is the key point. Without it, it would not stop at breakpoint-1, right?
>>
>> ------------------------------
>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>> *Sent:* September 16, 2016, 0:39
>> *To:* chen yong
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Re: Re: Re: t it does not stop at breakpoints which is in
>> an anonymous function
>>
>> Hi Felix,
>> Are you sure your n is greater than 0?
>> Here it stops first at breakpoint-1 (image attached).
>> Have you got the count to see if it's also greater than 0?
>>
>> 2016-09-15 11:41 GMT-03:00 chen yong <cy...@hotmail.com>:
>>
>>> Dear Dirceu,
>>>
>>> Thank you for your help.
>>>
>>> Actually, I use IntelliJ IDEA to debug the Spark code.
>>>
>>> Let me use the following code snippet to illustrate my problem. In the
>>> code lines below, I've set two breakpoints, breakpoint-1 and
>>> breakpoint-2. When I debugged the code, it did not stop at breakpoint-1;
>>> it seems that the map function was skipped and it directly reached and
>>> stopped at breakpoint-2.
>>>
>>> Additionally, I found the following two posts:
>>> (1) http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala
>>> (2) https://www.mail-archive.com/user@spark.apache.org/msg29010.html
>>>
>>> I am wondering whether logging is an alternative approach to debugging
>>> Spark anonymous functions.
>>>
>>> val count = spark.parallelize(1 to n, slices).map { i =>
>>>   val x = random * 2 - 1
>>>   val y = random * 2 - 1    // (breakpoint-1 set in this line)
>>>   if (x*x + y*y < 1) 1 else 0
>>> }.reduce(_ + _)
>>> val test = x                // (breakpoint-2 set in this line)
>>>
>>> ------------------------------
>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>> *Sent:* September 14, 2016, 23:32
>>> *To:* chen yong
>>> *Subject:* Re: Re: Re: t it does not stop at breakpoints which is in an
>>> anonymous function
>>>
>>> I don't know which IDE you use. I use IntelliJ, and there is an
>>> Evaluate Expression dialog where I can execute code whenever it has
>>> stopped at a breakpoint.
>>> In Eclipse you have Watch and Inspect, where you can do the same.
>>> Probably you are not seeing the debugger stop in your functions because
>>> you never retrieve the data from your DataFrame/RDDs.
>>> What are you doing with this function? Are you getting the result of
>>> this RDD/DataFrame at some place?
>>> You can add a count after the function that you want to debug, just for
>>> debugging, but don't forget to remove it after testing.
>>>
>>> 2016-09-14 12:20 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>
>>>> Dear Dirceu,
>>>>
>>>> Thank you again.
>>>>
>>>> Actually, I never saw it stop at the breakpoints, no matter how long I
>>>> waited. It just skipped the whole anonymous function to directly reach
>>>> the first breakpoint immediately after the anonymous function body. Is
>>>> that normal? I suspect something is wrong in my debugging operations or
>>>> settings. I am very new to Spark and Scala.
>>>>
>>>> Additionally, please give me some detailed instructions about "...Some
>>>> IDEs provide you a place where you can execute the code to see its
>>>> results...". Where is that PLACE?
>>>>
>>>> Your help is badly needed!
>>>>
>>>> ------------------------------
>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>> *Sent:* September 14, 2016, 23:07
>>>> *To:* chen yong
>>>> *Subject:* Re: Re: t it does not stop at breakpoints which is in an
>>>> anonymous function
>>>>
>>>> You can call a count in the IDE just to debug, or you can wait until it
>>>> reaches the code, so you can debug.
>>>> Some IDEs provide a place where you can execute code to see its
>>>> results.
>>>> Be careful not to add these operations to your production code, because
>>>> they can slow down its execution.
>>>>
>>>> 2016-09-14 11:43 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> You mean I have to insert some code, such as xxxxx.count or
>>>>> xxxxx.collect, between the original Spark code lines to invoke some
>>>>> operations, right?
>>>>> But where are the right places to put my code lines?
>>>>>
>>>>> Felix
>>>>>
>>>>> ------------------------------
>>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>>> *Sent:* September 14, 2016, 22:33
>>>>> *To:* chen yong
>>>>> *Cc:* user@spark.apache.org
>>>>> *Subject:* Re: t it does not stop at breakpoints which is in an
>>>>> anonymous function
>>>>>
>>>>> Hello Felix,
>>>>> Spark functions run lazily, and that's why it doesn't stop at those
>>>>> breakpoints.
>>>>> They will be executed only when you call some method of your
>>>>> DataFrame/RDD, like count, collect, ...
>>>>>
>>>>> Regards,
>>>>> Dirceu
>>>>>
>>>>> 2016-09-14 11:26 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am a newbie to Spark. I am learning Spark by debugging the Spark
>>>>>> code. It is strange to me that it does not stop at breakpoints which
>>>>>> are in an anonymous function; it is normal in an ordinary function,
>>>>>> though. Is that normal? How can I observe variables in an anonymous
>>>>>> function?
>>>>>>
>>>>>> Please help me. Thanks in advance!
>>>>>>
>>>>>> Felix
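Regarding the logging alternative raised in the thread: below is a hedged sketch of the Pi-estimation snippet rewritten with plain Scala collections (the object name, the sample count n, and the "first few samples" threshold are illustrative, not from the thread), printing values from inside the anonymous function instead of relying on a breakpoint. Because the print sits inside the function body, it only runs when reduce forces the evaluation. Note that in a real Spark job, a println inside a closure runs on the executors, so its output appears in the executor stdout logs rather than on the driver console.

```scala
import scala.util.Random

object PiLoggingDemo {
  def main(args: Array[String]): Unit = {
    val n = 1000
    val count = (1 to n).map { i =>
      val x = Random.nextDouble() * 2 - 1
      val y = Random.nextDouble() * 2 - 1
      // Log a few samples instead of setting a breakpoint; this line
      // executes only once reduce triggers the computation.
      if (i <= 3) println(f"sample $i: x=$x%.3f y=$y%.3f")
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / n}")
  }
}
```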