Hi

Please help.

On Sat, 7 May 2016, 11:43 p.m. HARSH TAKKAR, <takkarha...@gmail.com> wrote:

> Hi Ted
>
> Following is my use case.
>
> I have a prediction algorithm where I need to update some records to
> predict the target.
>
> For example:
> I have an equation Y = mX + c.
> I need to change the value of Xi for some records and calculate sum(Yi). If
> the prediction is not close to the target value, the process is repeated.
>
> In each iteration a different set of values is updated, but the result is
> checked after summing up the values.
>
> On Sat, 7 May 2016, 8:58 a.m. Ted Yu, <yuzhih...@gmail.com> wrote:
>
>> Using RDDs requires some 'low level' optimization techniques,
>> while using DataFrames / Spark SQL allows you to leverage existing code.
>>
>> If you can share some more of your use case, that would help other people
>> provide suggestions.
>>
>> Thanks
>>
>> On May 6, 2016, at 6:57 PM, HARSH TAKKAR <takkarha...@gmail.com> wrote:
>>
>> Hi Ted
>>
>> I am aware that RDDs are immutable, but in my use case I need to update
>> the same data set after each iteration.
>>
>> These are the options I have been exploring:
>>
>> 1. Generating a new RDD in each iteration (it might use a lot of memory).
>>
>> 2. Using Hive tables and updating the same table after each iteration.
>>
>> Please suggest which of the methods listed above would be better to use,
>> or whether there is a better way to accomplish this.
>>
>> On Fri, 6 May 2016, 7:09 p.m. Ted Yu, <yuzhih...@gmail.com> wrote:
>>
>>> Please see the doc at the beginning of RDD class:
>>>
>>>  * A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable,
>>>  * partitioned collection of elements that can be operated on in parallel. This class contains the
>>>  * basic operations available on all RDDs, such as `map`, `filter`, and `persist`. In addition,
>>>
>>> On Fri, May 6, 2016 at 5:25 AM, HARSH TAKKAR <takkarha...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> Is there a way I can modify an RDD in a for-each loop?
>>>>
>>>> Basically, I have a use case in which I need to perform multiple
>>>> iterations over the data and modify a few values in each iteration.
>>>>
>>>>
>>>> Please help.
>>>>
>>>
>>>
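
The loop described in the thread (change Xi for some records, recompute
sum(Yi), compare with the target, repeat) can be sketched without mutating
anything: each iteration derives a new collection from the previous one. The
sketch below is plain Python rather than Spark code, so it runs anywhere. The
thread does not say how records are selected or how Xi is changed, so the
selection rule (alternating even and odd positions), the damped update step,
and the names `predict_sum`, `iterate_to_target`, `tol`, `damping` and
`max_iter` are all hypothetical.

```python
from typing import List, Tuple


def predict_sum(xs: List[float], m: float, c: float) -> float:
    """Return sum(Yi) for Yi = m * Xi + c."""
    return sum(m * x + c for x in xs)


def iterate_to_target(
    xs: List[float],
    m: float,
    c: float,
    target: float,
    tol: float = 1e-6,
    damping: float = 0.5,
    max_iter: int = 100,
) -> Tuple[List[float], float, int]:
    """Repeat "update some Xi, then check sum(Yi)" until close to target.

    The input list is never modified; every iteration builds a new list,
    which mirrors deriving a new immutable dataset from the previous one.
    Assumes m != 0. Returns (final xs, final sum, iterations used).
    """
    current = list(xs)
    for iteration in range(max_iter):
        total = predict_sum(current, m, c)
        error = target - total
        if abs(error) <= tol:
            return current, total, iteration

        # Hypothetical selection rule: a different subset in each iteration
        # (even positions on even iterations, odd positions on odd ones).
        chosen = {i for i in range(len(current)) if i % 2 == iteration % 2}
        if not chosen:
            continue  # nothing selected this round, try the next subset

        # Hypothetical update rule: shift the chosen Xi so that sum(Yi)
        # closes a fraction (damping) of the remaining gap to the target.
        step = damping * error / (m * len(chosen))
        current = [x + step if i in chosen else x
                   for i, x in enumerate(current)]

    return current, predict_sum(current, m, c), max_iter


if __name__ == "__main__":
    new_xs, total, used = iterate_to_target([1.0, 2.0, 3.0, 4.0],
                                            m=2.0, c=1.0, target=30.0)
    print(new_xs, total, used)
```

In Spark terms, the list comprehension would roughly correspond to a `map`
over the previous RDD and `predict_sum` to a map followed by a sum. Whether
regenerating the data set per iteration is affordable for a given data size
is a separate question that this sketch does not answer.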
