Re: Updating Values Inside Foreach Rdd loop

2016-05-10 Thread Rishi Mishra
Hi Harsh, You probably need to maintain some state for your values, since you are updating some of the keys in each batch and checking a global state of your equation. Can you check the mapWithState API of DStream? Regards, Rishitesh Mishra, SnappyData (http://www.snappydata.io/)
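
The idea of keeping per-key state across batches and checking a global value after each batch can be illustrated with a small plain-Python simulation. This is not the Spark mapWithState API; the function name `apply_batch`, the sample keys, and the values of m and c are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def apply_batch(
    state: Dict[str, float],
    batch: Iterable[Tuple[str, float]],
    m: float,
    c: float,
) -> float:
    """Overwrite the stored X for each key in the batch, then return sum(Yi).

    `state` plays the role of the per-key state that survives between batches.
    Yi is computed as m * Xi + c over every key seen so far.
    """
    for key, x in batch:
        state[key] = x  # only keys present in this batch are updated
    return sum(m * x + c for x in state.values())


# Hypothetical input: three small batches of (key, X) updates.
state: Dict[str, float] = {}
batches: List[List[Tuple[str, float]]] = [
    [("a", 1.0), ("b", 2.0)],
    [("a", 3.0)],
    [("c", 4.0)],
]

# Global sum(Yi) after each batch, with m = 2 and c = 1.
totals = [apply_batch(state, batch, m=2.0, c=1.0) for batch in batches]
print(totals)
```

In a streaming job the state would live inside Spark rather than in a local dict, but the shape is the same: each batch touches a few keys, and the global check runs against everything accumulated so far.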

Re: Updating Values Inside Foreach Rdd loop

2016-05-09 Thread HARSH TAKKAR
Hi, Please help.
On Sat, 7 May 2016, 11:43 p.m., HARSH TAKKAR wrote:
> Hi Ted,
>
> Following is my use case.
>
> I have a prediction algorithm where I need to update some records to
> predict the target.
>
> For example,
> I have an equation Y = mX + c.
> I need to change the value of Xi

Re: Updating Values Inside Foreach Rdd loop

2016-05-07 Thread HARSH TAKKAR
Hi Ted, Following is my use case. I have a prediction algorithm where I need to update some records to predict the target. For example, I have an equation Y = mX + c. I need to change the value of Xi for some records and calculate sum(Yi); if the prediction is not close to the target value, then repeat the
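
The loop described in this message can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: the update rule (a damped shift of the selected Xi toward the target), the tolerance, and the names `predict_sum`, `adjust`, and `fit_to_target` are hypothetical and do not come from the original post. Each pass builds a new list instead of mutating the previous one, which is the same shape a `map` over an immutable RDD would take.

```python
from typing import List, Set, Tuple


def predict_sum(xs: List[float], m: float, c: float) -> float:
    """Return sum(Yi), where Yi = m * Xi + c."""
    return sum(m * x + c for x in xs)


def adjust(xs: List[float], editable: Set[int], delta: float) -> List[float]:
    """Return a NEW list in which only the editable positions are shifted."""
    return [x + delta if i in editable else x for i, x in enumerate(xs)]


def fit_to_target(
    xs: List[float],
    editable: Set[int],
    m: float,
    c: float,
    target: float,
    tol: float = 1e-6,
    rate: float = 0.5,      # assumed damping factor, not from the post
    max_iter: int = 100,
) -> Tuple[List[float], int]:
    """Repeat: compute sum(Yi), stop if close to target, else update some Xi."""
    if m == 0 or not editable:
        raise ValueError("need a non-zero slope and at least one editable record")
    current = list(xs)
    for iteration in range(max_iter):
        error = target - predict_sum(current, m, c)
        if abs(error) <= tol:
            return current, iteration
        # Shifting each editable Xi by delta changes sum(Yi) by m * len(editable) * delta.
        delta = rate * error / (m * len(editable))
        current = adjust(current, editable, delta)  # new data set each iteration
    return current, max_iter


# Hypothetical data: four records, of which records 0 and 2 may be changed.
original = [1.0, 2.0, 3.0, 4.0]
result, iterations = fit_to_target(original, {0, 2}, m=2.0, c=1.0, target=30.0)
print(result, iterations, predict_sum(result, 2.0, 1.0))
```

Because every iteration derives a fresh list, the original records stay untouched, and only the most recent generation needs to be kept around.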

Re: Updating Values Inside Foreach Rdd loop

2016-05-06 Thread Ted Yu
Using RDDs requires some 'low-level' optimization techniques, while using DataFrames / Spark SQL allows you to leverage existing code. If you can share some more of your use case, that would help other people provide suggestions. Thanks > On May 6, 2016, at 6:57 PM, HARSH TAKKAR

Re: Updating Values Inside Foreach Rdd loop

2016-05-06 Thread HARSH TAKKAR
Hi Ted, I am aware that RDDs are immutable, but in my use case I need to update the same data set after each iteration. Following are the points I was exploring: 1. Generating a new RDD in each iteration (it might use a lot of memory). 2. Using Hive tables and updating the same table after each

Re: Updating Values Inside Foreach Rdd loop

2016-05-06 Thread Ted Yu
Please see the doc at the beginning of the RDD class: A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel. This class contains the basic operations available on all RDDs, such

Updating Values Inside Foreach Rdd loop

2016-05-06 Thread HARSH TAKKAR
Hi, Is there a way I can modify an RDD in a for-each loop? Basically, I have a use case in which I need to perform multiple iterations over the data and modify a few values in each iteration. Please help.