Hi Harsh,

You probably need to maintain some state for your values, since you are updating some of the keys in each batch and then checking a global state of your equation. Can you look at the mapWithState API of DStream?
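A rough local sketch of what mapWithState gives you (plain Python, no Spark; the function names and the running-sum logic are illustrative): the framework keeps a per-key state across micro-batches, and your update function folds each new value into that state and emits an output record. In Spark you would wrap such a function with StateSpec.function and call mapWithState on a keyed DStream.

```python
# Local sketch of mapWithState semantics: per-key state threaded across
# micro-batches. In Spark the framework stores the state for you; here a
# plain dict plays that role. All names are illustrative.

def update_running_sum(state, key, value):
    """Analogue of the (key, value, state) update function: fold the new
    value into the key's running sum and emit an output record."""
    new_sum = state.get(key, 0.0) + value
    state = {**state, key: new_sum}  # build a new state, leave the old one untouched
    return state, (key, new_sum)

def run_batch(state, batch):
    """Apply one micro-batch of (key, value) pairs, threading state through."""
    out = []
    for key, value in batch:
        state, record = update_running_sum(state, key, value)
        out.append(record)
    return state, out

if __name__ == "__main__":
    state, out1 = run_batch({}, [("x1", 2.0), ("x2", 3.0)])
    state, out2 = run_batch(state, [("x1", 1.0)])
    print(out1)  # [('x1', 2.0), ('x2', 3.0)]
    print(out2)  # [('x1', 3.0)] -- x1's state survived the batch boundary
```

The point of the sketch is the last line: the second batch sees the state left behind by the first, which is what lets you accumulate a global quantity (like your sum) across updates.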
Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra

On Mon, May 9, 2016 at 8:40 PM, HARSH TAKKAR <takkarha...@gmail.com> wrote:

> Hi
>
> Please help.
>
> On Sat, 7 May 2016, 11:43 p.m. HARSH TAKKAR, <takkarha...@gmail.com> wrote:
>
>> Hi Ted
>>
>> Following is my use case.
>>
>> I have a prediction algorithm where I need to update some records to
>> predict the target.
>>
>> For example, I have the equation Y = mX + c. I need to change the value
>> of Xi for some records and calculate sum(Yi); if the prediction is not
>> close to the target value, the process is repeated.
>>
>> In each iteration a different set of values is updated, but the result
>> is checked when we sum up the values.
>>
>> On Sat, 7 May 2016, 8:58 a.m. Ted Yu, <yuzhih...@gmail.com> wrote:
>>
>>> Using RDDs requires some 'low level' optimization techniques, while
>>> using DataFrames / Spark SQL allows you to leverage existing code.
>>>
>>> If you can share some more of your use case, that would help other
>>> people provide suggestions.
>>>
>>> Thanks
>>>
>>> On May 6, 2016, at 6:57 PM, HARSH TAKKAR <takkarha...@gmail.com> wrote:
>>>
>>> Hi Ted
>>>
>>> I am aware that RDDs are immutable, but in my use case I need to
>>> update the same data set after each iteration.
>>>
>>> Following are the options I was exploring:
>>>
>>> 1. Generating a new RDD in each iteration (it might use a lot of
>>> memory).
>>>
>>> 2. Using Hive tables and updating the same table after each iteration.
>>>
>>> Please suggest which of the methods listed above would be good to use,
>>> or whether there is a better way to accomplish this.
>>>
>>> On Fri, 6 May 2016, 7:09 p.m. Ted Yu, <yuzhih...@gmail.com> wrote:
>>>
>>>> Please see the doc at the beginning of the RDD class:
>>>>
>>>> * A Resilient Distributed Dataset (RDD), the basic abstraction in
>>>> * Spark. Represents an immutable, partitioned collection of elements
>>>> * that can be operated on in parallel. This class contains the basic
>>>> * operations available on all RDDs, such as `map`, `filter`, and
>>>> * `persist`.
>>>>
>>>> On Fri, May 6, 2016 at 5:25 AM, HARSH TAKKAR <takkarha...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Is there a way I can modify an RDD in a for-each loop?
>>>>>
>>>>> Basically, I have a use case in which I need to perform multiple
>>>>> iterations over the data and modify a few values in each iteration.
>>>>>
>>>>> Please help.
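The iterative use case in the thread (Y = mX + c, adjust some Xi, re-check sum(Yi)) can be sketched as a loop that derives a new immutable data set on each pass, which is the shape the "generate a new RDD each iteration" option takes in Spark (there, step() would be an rdd.map(...) returning a new RDD, ideally persisted and periodically checkpointed to keep the lineage short). Everything below — the record layout, the fixed-step adjustment rule, the step size — is illustrative, written in plain Python so the control flow is visible without a cluster.

```python
# Sketch of the iterative pattern from the thread: the data set is
# immutable, so each pass builds a NEW collection from the old one,
# then a global sum decides whether to keep iterating.

def predict(x, m, c):
    return m * x + c  # y = m*x + c

def step(records, delta):
    """Derive a new data set, nudging only the adjustable records' x."""
    return [(x + delta, adj) if adj else (x, adj) for x, adj in records]

def refine(records, m, c, target, tol=0.25, max_iter=100):
    """Repeat until sum(y_i) is within tol of target, or give up."""
    total = sum(predict(x, m, c) for x, _ in records)
    it = 0
    while abs(total - target) > tol and it < max_iter:
        delta = 0.1 if total < target else -0.1  # crude fixed-step rule
        records = step(records, delta)          # new data set, old one unchanged
        total = sum(predict(x, m, c) for x, _ in records)
        it += 1
    return records, total, it

if __name__ == "__main__":
    data = [(1.0, True), (2.0, False)]  # (x, adjustable?)
    _, total, iters = refine(data, m=2.0, c=0.5, target=10.0)
    print(total, iters)
```

Note that `records` is rebound, never mutated in place — exactly the discipline immutable RDDs force on you. The memory concern raised in the thread is about keeping many such derived data sets alive at once; in Spark, unpersisting the previous RDD after materializing the new one keeps only two generations around.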