Hi, please help.
On Sat, 7 May 2016, 11:43 p.m. HARSH TAKKAR, <takkarha...@gmail.com> wrote:

> Hi Ted
>
> Following is my use case.
>
> I have a prediction algorithm where I need to update some records to
> predict the target.
>
> For example:
> I have an equation Y = mX + c.
> I need to change the value of Xi for some records and calculate sum(Yi).
> If the predicted value is not close to the target value, the process is
> repeated.
>
> In each iteration a different set of values is updated, but the result is
> checked only when we sum up the values.
>
> On Sat, 7 May 2016, 8:58 a.m. Ted Yu, <yuzhih...@gmail.com> wrote:
>
>> Using RDDs requires some 'low level' optimization techniques.
>> While using dataframes / Spark SQL allows you to leverage existing code.
>>
>> If you can share some more of your use case, that would help other people
>> provide suggestions.
>>
>> Thanks
>>
>> On May 6, 2016, at 6:57 PM, HARSH TAKKAR <takkarha...@gmail.com> wrote:
>>
>> Hi Ted
>>
>> I am aware that RDDs are immutable, but in my use case I need to update
>> the same data set after each iteration.
>>
>> These are the options I was exploring:
>>
>> 1. Generating a new RDD in each iteration (it might use a lot of memory).
>>
>> 2. Using Hive tables and updating the same table after each iteration.
>>
>> Please suggest which one of the methods listed above would be good to
>> use, or whether there is a better way to accomplish it.
>>
>> On Fri, 6 May 2016, 7:09 p.m. Ted Yu, <yuzhih...@gmail.com> wrote:
>>
>>> Please see the doc at the beginning of RDD class:
>>>
>>> * A Resilient Distributed Dataset (RDD), the basic abstraction in
>>> Spark. Represents an immutable,
>>> * partitioned collection of elements that can be operated on in
>>> parallel. This class contains the
>>> * basic operations available on all RDDs, such as `map`, `filter`, and
>>> `persist`. In addition,
>>>
>>> On Fri, May 6, 2016 at 5:25 AM, HARSH TAKKAR <takkarha...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> Is there a way I can modify an RDD in a for-each loop?
>>>>
>>>> Basically, I have a use case in which I need to perform multiple
>>>> iterations over the data and modify a few values in each iteration.
>>>>
>>>>
>>>> Please help.
>>>>
>>>
>>>
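
For what it is worth, the loop described in the thread (change some Xi, recompute sum(Yi) for Y = mX + c, stop when the sum is close to the target) can be sketched without mutating anything: each iteration derives a new collection from the previous one. The sketch below is plain Python rather than Spark code, and the names `predict_sum`, `shift_even_indices`, and `iterate_until_close` are hypothetical. The update rule in particular is an assumption made up for illustration, since the thread never says how the records to change are chosen. In Spark the same shape would presumably be a `map` that produces a new RDD per iteration, followed by a sum.

```python
from typing import Callable, Sequence, Tuple

Records = Tuple[float, ...]
UpdateFn = Callable[[Records, float, float], Records]


def predict_sum(xs: Sequence[float], m: float, c: float) -> float:
    """Return sum(Yi) where Yi = m * Xi + c."""
    return sum(m * x + c for x in xs)


def shift_even_indices(m: float) -> UpdateFn:
    """Hypothetical update rule (not from the thread): shift only the
    even-indexed records so the gap to the target is spread evenly
    across them. Assumes m != 0 and at least one record."""
    def update(xs: Records, total: float, target: float) -> Records:
        selected = range(0, len(xs), 2)
        step = (target - total) / (m * len(selected))
        # Build a new tuple; the input tuple is left untouched.
        return tuple(x + step if i % 2 == 0 else x for i, x in enumerate(xs))
    return update


def iterate_until_close(
    xs: Sequence[float],
    m: float,
    c: float,
    target: float,
    update: UpdateFn,
    tol: float = 1e-9,
    max_iter: int = 100,
) -> Tuple[Records, float, int]:
    """Repeat: sum the predictions, stop if close to target, otherwise
    derive a new record set. Returns (records, sum, updates applied)."""
    current: Records = tuple(xs)
    for rounds in range(max_iter):
        total = predict_sum(current, m, c)
        if abs(total - target) <= tol:
            return current, total, rounds
        current = update(current, total, target)
    return current, predict_sum(current, m, c), max_iter


original = (1.0, 2.0, 3.0, 4.0)
final, total, rounds = iterate_until_close(
    original, m=2.0, c=1.0, target=32.0, update=shift_even_indices(2.0)
)
print(final, total, rounds)
```

Only the latest record set is needed from one iteration to the next, so the memory concern raised about generating a new data set per iteration applies to what is kept around, not to the number of iterations as such.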