Sorry, it wasn't the count; it was the reduce method that retrieves
information from the RDD.
It has to go through all of the RDD's values to return the result.
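
For illustration, here is a minimal sketch of that distinction (assuming a
SparkContext named sc, which is not part of the original thread):

    // map is a lazy transformation: nothing executes yet, so a breakpoint
    // inside this closure is not hit when this line runs on the driver.
    val doubled = sc.parallelize(1 to 100).map(_ * 2)

    // reduce is an action: it submits a job and traverses every element of
    // the RDD, and only then does the closure above actually execute.
    val sum = doubled.reduce(_ + _)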


2016-09-16 11:18 GMT-03:00 chen yong <cy...@hotmail.com>:

> Dear Dirceu,
>
>
> I am totally confused. In your reply you mentioned ".....the count does
> that, ...". However, in the code snippet shown in the attachment file
> FelixProblem.png of your previous mail, I cannot find any 'count' ACTION
> being called. Would you please clearly show me the line that triggers the
> evaluation.
>
> Thank you very much
> ------------------------------
> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
> *Sent:* September 16, 2016, 21:07
> *To:* chen yong
> *Cc:* user@spark.apache.org
> *Subject:* Re: Re: Re: Re: Re: t it does not stop at breakpoints which is
> in an anonymous function
>
> Hello Felix,
> No, this line isn't the one that is triggering the execution of the
> function; the count does that, unless your count val is a lazy val.
> The count method is the one that retrieves the information of the RDD; it
> has to go through all of its data to determine how many records the RDD
> has.
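> For example, a minimal sketch (assuming a SparkContext named sc; this is
> not from the original code):
>
>     val records = sc.parallelize(Seq("a", "b", "c")).map(_.toUpperCase)
>     // count is an action: it scans every partition to tally the records,
>     // so breakpoints inside the map closure are only hit at this point.
>     val howMany = records.count() // returns 3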
>
> Regards,
>
> 2016-09-15 22:23 GMT-03:00 chen yong <cy...@hotmail.com>:
>
>>
>> Dear Dirceu,
>>
>> Thanks for your kind help.
>> I cannot see any code line corresponding to "..... retrieve the data from
>> your DataFrame/RDDs....", which you suggested in the previous replies.
>>
>> Later, I guess the line
>>
>> val test = count
>>
>> is the key point. Without it, it would not stop at breakpoint-1, right?
>>
>>
>>
>> ------------------------------
>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>> *Sent:* September 16, 2016, 0:39
>> *To:* chen yong
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Re: Re: Re: t it does not stop at breakpoints which is in
>> an anonymous function
>>
>> Hi Felix,
>> Are you sure your n is greater than 0?
>> Here it stops first at breakpoint-1; image attached.
>> Have you checked the count to see if it's also greater than 0?
>>
>> 2016-09-15 11:41 GMT-03:00 chen yong <cy...@hotmail.com>:
>>
>>> Dear Dirceu
>>>
>>>
>>> Thank you for your help.
>>>
>>>
>>> Actually, I use IntelliJ IDEA to debug the Spark code.
>>>
>>>
>>> Let me use the following code snippet to illustrate my problem. In the
>>> code lines below, I've set two breakpoints, breakpoint-1 and breakpoint-2.
>>> When I debugged the code, it did not stop at breakpoint-1; it seems that
>>> the map function was skipped and the execution directly reached and
>>> stopped at breakpoint-2.
>>>
>>> Additionally, I found the following two posts:
>>> (1) http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala
>>> (2) https://www.mail-archive.com/user@spark.apache.org/msg29010.html
>>>
>>> I am wondering whether logging is an alternative approach to debugging
>>> Spark anonymous functions.
>>>
>>>
>>> import scala.math.random
>>>
>>> // Monte Carlo estimate of Pi; `spark` here is the SparkContext,
>>> // as in the bundled SparkPi example.
>>> val count = spark.parallelize(1 to n, slices).map { i =>
>>>   val x = random * 2 - 1
>>>   val y = random * 2 - 1 // breakpoint-1 set on this line
>>>   if (x * x + y * y < 1) 1 else 0
>>> }.reduce(_ + _)
>>> val test = count // breakpoint-2 set on this line
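>>>
>>> Regarding the logging alternative mentioned above, a minimal sketch
>>> (assuming log4j 1.x, which Spark bundles; the logger name is purely
>>> illustrative):
>>>
>>>     import org.apache.log4j.Logger
>>>
>>>     val logged = spark.parallelize(1 to n, slices).map { i =>
>>>       // Obtain the logger inside the closure: loggers are not
>>>       // serializable, and these messages appear in the executor logs,
>>>       // not on the driver.
>>>       val log = Logger.getLogger("closure-debug")
>>>       log.info("processing element " + i)
>>>       i * 2
>>>     }.reduce(_ + _)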
>>>
>>>
>>>
>>> ------------------------------
>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>> *Sent:* September 14, 2016, 23:32
>>> *To:* chen yong
>>> *Subject:* Re: Re: Re: t it does not stop at breakpoints which is in an
>>> anonymous function
>>>
>>> I don't know which IDE you use. I use IntelliJ, and there is an
>>> Evaluate Expression dialog where I can execute code whenever it has
>>> stopped at a breakpoint.
>>> In Eclipse you have Watch and Inspect, where you can do the same.
>>> You are probably not seeing the debugger stop in your functions because
>>> you never retrieve the data from your DataFrame/RDDs.
>>> What are you doing with this function? Are you getting the result of
>>> this RDD/DataFrame at some place?
>>> You can add a count after the function that you want to debug, just for
>>> debugging, as in the sketch below, but don't forget to remove it after
>>> testing.
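>>>
>>> For example, a sketch (rdd stands for whatever RDD you built earlier):
>>>
>>>     val mapped = rdd.map { v =>
>>>       val doubled = v * 2 // a breakpoint here is now reachable
>>>       doubled
>>>     }
>>>     // Temporary action, added only so the debugger enters the closure;
>>>     // remove it once you are done debugging.
>>>     mapped.count()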
>>>
>>>
>>>
>>> 2016-09-14 12:20 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>
>>>> Dear Dirceu,
>>>>
>>>>
>>>> Thank you again.
>>>>
>>>>
>>>> Actually, I never saw it stop at the breakpoints no matter how long I
>>>> waited. It just skipped the whole anonymous function to directly reach
>>>> the first breakpoint immediately after the anonymous function body. Is
>>>> that normal? I suspect something is wrong in my debugging operations or
>>>> settings. I am very new to Spark and Scala.
>>>>
>>>>
>>>> Additionally, please give me some detailed instructions about "....Some
>>>> IDEs provide a place where you can execute code and see its
>>>> results....". Where is this PLACE?
>>>>
>>>>
>>>> Your help is badly needed!
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>> *Sent:* September 14, 2016, 23:07
>>>> *To:* chen yong
>>>> *Subject:* Re: Re: t it does not stop at breakpoints which is in an
>>>> anonymous function
>>>>
>>>> You can call a count in the IDE just to debug, or you can wait until
>>>> the execution reaches the code, so you can debug.
>>>> Some IDEs provide a place where you can execute code and see its
>>>> results.
>>>> Be careful not to leave these operations in your production code,
>>>> because they can slow down its execution.
>>>>
>>>>
>>>>
>>>> 2016-09-14 11:43 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>
>>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> You mean I have to insert some code, such as xxxxx.count or
>>>>> xxxxx.collect, between the original Spark code lines to invoke some
>>>>> operations, right?
>>>>> But where are the right places to put my code lines?
>>>>>
>>>>> Felix
>>>>>
>>>>> ------------------------------
>>>>> *From:* Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
>>>>> *Sent:* September 14, 2016, 22:33
>>>>> *To:* chen yong
>>>>> *Cc:* user@spark.apache.org
>>>>> *Subject:* Re: t it does not stop at breakpoints which is in an anonymous
>>>>> function
>>>>>
>>>>> Hello Felix,
>>>>> Spark functions run lazily, and that's why it doesn't stop at those
>>>>> breakpoints.
>>>>> They will be executed only when you call certain methods of your
>>>>> DataFrame/RDD, like count, collect, ... (see the sketch below).
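>>>>> A minimal sketch of the difference (assuming a SparkContext named sc):
>>>>>
>>>>>     val data = sc.parallelize(1 to 10)
>>>>>     val mapped = data.map(_ + 1) // lazy: only records the lineage
>>>>>     mapped.count()               // action: the closure runs now
>>>>>     mapped.collect()             // action: materializes every element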
>>>>>
>>>>> Regards,
>>>>> Dirceu
>>>>>
>>>>> 2016-09-14 11:26 GMT-03:00 chen yong <cy...@hotmail.com>:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>>
>>>>>>
>>>>>> I am a newbie to Spark. I am learning Spark by debugging the Spark
>>>>>> code. It is strange to me that it does not stop at breakpoints that
>>>>>> are in an anonymous function; it is normal in an ordinary function,
>>>>>> though. Is that normal? How can I observe variables in an anonymous
>>>>>> function?
>>>>>>
>>>>>>
>>>>>> Please help me. Thanks in advance!
>>>>>>
>>>>>>
>>>>>> Felix
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
