Hi Harsh,

You probably need to maintain some state for your values, since you are updating some of the keys in each batch and then checking a global state of your equation. Can you look at the mapWithState API of DStream?
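A rough local sketch of what mapWithState gives you (plain Python, no Spark; the function names and the running-sum logic are illustrative): the framework keeps a per-key state across micro-batches, and your update function folds each new value into that state and emits an output record. In Spark you would wrap such a function with StateSpec.function and call mapWithState on a keyed DStream.

```python
# Local sketch of mapWithState semantics: per-key state threaded across
# micro-batches. In Spark the framework stores the state for you; here a
# plain dict plays that role. All names are illustrative.

def update_running_sum(state, key, value):
    """Analogue of the (key, value, state) update function: fold the new
    value into the key's running sum and emit an output record."""
    new_sum = state.get(key, 0.0) + value
    state = {**state, key: new_sum}  # build a new state, leave the old one untouched
    return state, (key, new_sum)

def run_batch(state, batch):
    """Apply one micro-batch of (key, value) pairs, threading state through."""
    out = []
    for key, value in batch:
        state, record = update_running_sum(state, key, value)
        out.append(record)
    return state, out

if __name__ == "__main__":
    state, out1 = run_batch({}, [("x1", 2.0), ("x2", 3.0)])
    state, out2 = run_batch(state, [("x1", 1.0)])
    print(out1)  # [('x1', 2.0), ('x2', 3.0)]
    print(out2)  # [('x1', 3.0)] -- x1's state survived the batch boundary
```

The point of the sketch is the last line: the second batch sees the state left behind by the first, which is what lets you accumulate a global quantity (like your sum) across updates.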
Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra

On Mon, May 9, 2016 at 8:40 PM, HARSH TAKKAR <takkarha...@gmail.com> wrote:

> Hi
>
> Please help.
>
> On Sat, 7 May 2016, 11:43 p.m. HARSH TAKKAR, <takkarha...@gmail.com> wrote:
>
>> Hi Ted
>>
>> Following is my use case.
>>
>> I have a prediction algorithm where I need to update some records to
>> predict the target.
>>
>> For example, I have the equation Y = mX + c. I need to change the value
>> of Xi for some records and calculate sum(Yi); if the prediction is not
>> close to the target value, the process is repeated.
>>
>> In each iteration a different set of values is updated, but the result
>> is checked when we sum up the values.
>>
>> On Sat, 7 May 2016, 8:58 a.m. Ted Yu, <yuzhih...@gmail.com> wrote:
>>
>>> Using RDDs requires some 'low level' optimization techniques, while
>>> using DataFrames / Spark SQL allows you to leverage existing code.
>>>
>>> If you can share some more of your use case, that would help other
>>> people provide suggestions.
>>>
>>> Thanks
>>>
>>> On May 6, 2016, at 6:57 PM, HARSH TAKKAR <takkarha...@gmail.com> wrote:
>>>
>>> Hi Ted
>>>
>>> I am aware that RDDs are immutable, but in my use case I need to
>>> update the same data set after each iteration.
>>>
>>> Following are the options I was exploring:
>>>
>>> 1. Generating a new RDD in each iteration (it might use a lot of
>>> memory).
>>>
>>> 2. Using Hive tables and updating the same table after each iteration.
>>>
>>> Please suggest which of the methods listed above would be good to use,
>>> or whether there is a better way to accomplish this.
>>>
>>> On Fri, 6 May 2016, 7:09 p.m. Ted Yu, <yuzhih...@gmail.com> wrote:
>>>
>>>> Please see the doc at the beginning of the RDD class:
>>>>
>>>> * A Resilient Distributed Dataset (RDD), the basic abstraction in
>>>> * Spark. Represents an immutable, partitioned collection of elements
>>>> * that can be operated on in parallel. This class contains the basic
>>>> * operations available on all RDDs, such as `map`, `filter`, and
>>>> * `persist`.
>>>>
>>>> On Fri, May 6, 2016 at 5:25 AM, HARSH TAKKAR <takkarha...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Is there a way I can modify an RDD in a for-each loop?
>>>>>
>>>>> Basically, I have a use case in which I need to perform multiple
>>>>> iterations over the data and modify a few values in each iteration.
>>>>>
>>>>> Please help.
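The iterative use case in the thread (Y = mX + c, adjust some Xi, re-check sum(Yi)) can be sketched as a loop that derives a new immutable data set on each pass, which is the shape the "generate a new RDD each iteration" option takes in Spark (there, step() would be an rdd.map(...) returning a new RDD, ideally persisted and periodically checkpointed to keep the lineage short). Everything below — the record layout, the fixed-step adjustment rule, the step size — is illustrative, written in plain Python so the control flow is visible without a cluster.

```python
# Sketch of the iterative pattern from the thread: the data set is
# immutable, so each pass builds a NEW collection from the old one,
# then a global sum decides whether to keep iterating.

def predict(x, m, c):
    return m * x + c  # y = m*x + c

def step(records, delta):
    """Derive a new data set, nudging only the adjustable records' x."""
    return [(x + delta, adj) if adj else (x, adj) for x, adj in records]

def refine(records, m, c, target, tol=0.25, max_iter=100):
    """Repeat until sum(y_i) is within tol of target, or give up."""
    total = sum(predict(x, m, c) for x, _ in records)
    it = 0
    while abs(total - target) > tol and it < max_iter:
        delta = 0.1 if total < target else -0.1  # crude fixed-step rule
        records = step(records, delta)          # new data set, old one unchanged
        total = sum(predict(x, m, c) for x, _ in records)
        it += 1
    return records, total, it

if __name__ == "__main__":
    data = [(1.0, True), (2.0, False)]  # (x, adjustable?)
    _, total, iters = refine(data, m=2.0, c=0.5, target=10.0)
    print(total, iters)
```

Note that `records` is rebound, never mutated in place — exactly the discipline immutable RDDs force on you. The memory concern raised in the thread is about keeping many such derived data sets alive at once; in Spark, unpersisting the previous RDD after materializing the new one keeps only two generations around.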