Hi Ravneet,

Have you taken a look at fork-join queues?
http://www.google.com/events/io/2010/sessions/high-throughput-data-pipelines-appengine.html

Or high-concurrency counters without sharding?
http://blog.notdot.net/2010/04/High-concurrency-counters-without-sharding
I think they may do what you need, and they are proven solutions.

Thanks,
Rob

On Jun 15, 9:38 pm, thecheatah <thechea...@gmail.com> wrote:
> This is actually a pretty good implementation. The only issue is the
> size of the processed-task list. Instead of having two tasks, I am
> thinking that the one task will clean up the processed-task list
> before it begins its work: basically, check that the processed inserts
> have indeed been deleted.
>
> So the processed list records all the inserts processed in the
> previous run. The task first deletes any of those inserts that remain,
> then goes on to process new tasks.
>
> Thanks,
>
> Ravneet
>
> On Jun 15, 11:48 am, Ravi Sharma <ping2r...@gmail.com> wrote:
> > In that scenario you can do something extra.
> >
> > Keep a list of keys in your highly-updated object, and whenever you
> > process one insert and apply it to the main object, make sure you put
> > the insert's key into that list property. That way your main object
> > knows whether it has already absorbed the content of a given insert.
> >
> > Then, when you later see the same insert again (because marking it as
> > processed failed last time), check whether its key exists in the list.
> > If it does, mark the insert object as processed and also remove its
> > key from the list property.
> >
> > You also need another job to clean the list property on the updated
> > object: read the list, fetch the insert object for each key, and if it
> > is marked as processed, remove the key from the list.
> >
> > This will increase your datastore puts, but you will not have to
> > worry about inconsistency.
> > So your code will look like this. The highly-updated object will have
> > a property like this:
> >
> >     List<Key> processedInserts; (in Java JDO)
> >
> > TASK 1
> > 1) Get the next insert object, say i1; assume its key is k1.
> >    (At this stage, say processedInserts is empty.)
> > 2) Check whether k1 exists in processedInserts. If not, go to step 3;
> >    otherwise go to step 4.
> > 3) Update the highly-updated object with the content of insert i1,
> >    and also add k1 to processedInserts.
> >    (At this stage processedInserts contains k1.)
> > 4) Mark i1 as processed.
> >
> > After this, processedInserts will keep growing with no upper bound.
> > To keep it down, you need another job that runs once in a while, or a
> > task submitted from step 2 when processedInserts.size() exceeds some
> > number, say 500.
> >
> > TASK 2
> > 1) Get the highly-updated object.
> > 2) Loop through processedInserts.
> > 3) For each key, get the insert object; if it is marked as processed,
> >    delete that key from processedInserts.
> >
> > Just make sure only one of TASK 1 and TASK 2 runs at a time. You can
> > even run TASK 2 as part of TASK 1 after step 4; it's up to you,
> > wherever you find it safe and with fewer if-then-elses :)
> >
> > On Wed, Jun 15, 2011 at 4:20 PM, thecheatah <thechea...@gmail.com> wrote:
> > > Ravi,
> > >
> > > Thanks for the feedback. I was thinking exactly along the lines of
> > > what you have said. The only problem I see is that I plan on
> > > processing multiple inserts in one batch job. The inserts and the
> > > highly-updated object will not be updatable in a single transaction.
> > > Thus, there might be situations where an insert was processed but
> > > the flag was not set or the row was not deleted. To overcome this, I
> > > am going to either make sure that processing inserts multiple times
> > > does not affect the output, or accept a small percentage of
> > > failures.
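Ravi's TASK-1/TASK-2 scheme quoted above can be sketched in plain Java. This is only an in-memory illustration: the `Counter` and `Insert` classes and all method names are made up, and a real version would read and write datastore entities instead of a `Map`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// In-memory sketch of the TASK-1/TASK-2 idempotent-apply scheme.
// All names here are hypothetical stand-ins for datastore entities.
public class IdempotentBatch {

    static class Insert {
        final String key;
        final long delta;
        boolean processed;          // the "processed" flag on the insert row
        Insert(String key, long delta) { this.key = key; this.delta = delta; }
    }

    static class Counter {
        long value;
        // corresponds to List<Key> processedInserts in the JDO description
        final List<String> processedInserts = new ArrayList<>();
    }

    // TASK 1: apply one insert, skipping it if its key is already recorded.
    static void applyInsert(Counter c, Insert i) {
        if (!c.processedInserts.contains(i.key)) {   // step 2
            c.value += i.delta;                      // step 3: apply content...
            c.processedInserts.add(i.key);           // ...and record the key
        }
        i.processed = true;                          // step 4 (may fail; retried)
    }

    // TASK 2: drop keys whose insert row is confirmed processed.
    static void cleanup(Counter c, Map<String, Insert> store) {
        Iterator<String> it = c.processedInserts.iterator();
        while (it.hasNext()) {
            Insert i = store.get(it.next());
            if (i == null || i.processed) it.remove();
        }
    }
}
```

The point of the key list is visible here: if step 4 fails and the same insert is delivered again, `applyInsert` is safe to call a second time because the key check prevents double-counting.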
> > > Ravneet
> > >
> > > On Jun 15, 5:18 am, Ravi Sharma <ping2r...@gmail.com> wrote:
> > > > If A, B, and C are not dependent on each other and ordering
> > > > doesn't matter to you (e.g. processing C, A, B is also fine), then
> > > > you can add another column to this insert table, say "processed".
> > > > When inserting, set it to "N" (if a string) or false (if a
> > > > boolean), and query the entity on this column. Whenever you
> > > > process one row, set the value to "Y" or true, and carry on with
> > > > the next insert.
> > > >
> > > > Or you can simply delete these rows once you have processed them;
> > > > then you will not need the extra column.
> > > >
> > > > Note: I am assuming that for one update you will process all of
> > > > its inserts in one task or job (no multiprocessing).
> > > >
> > > > On Wed, Jun 15, 2011 at 4:20 AM, thecheatah <thechea...@gmail.com> wrote:
> > > > > I am trying to implement a system for an object that will be
> > > > > updated a lot. My idea is to turn the updates into inserts, then
> > > > > have a batch job that executes the inserts in batches to update
> > > > > the highly-written object. The inserts can be ordered either by
> > > > > time or by some sort of incremented identifier. This identifier
> > > > > or timestamp can be stored on the highly-written object so that
> > > > > the next time the job runs it knows where to start the next
> > > > > batch.
> > > > >
> > > > > Using a timestamp, I run into a problem with eventual
> > > > > consistency. When I query for inserts to execute, some inserts
> > > > > might not make it into the results because they have not been
> > > > > applied to the index yet. So suppose we have inserts A, B, and
> > > > > C. If only A and C make it into the batch job, it will mark all
> > > > > work up to C completed, and B will never be executed.
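For reference, Ravi's simpler processed-flag approach amounts to something like this. Again an in-memory sketch with hypothetical names, not real datastore code; a real version would query on the flag column instead of scanning a list.

```java
import java.util.List;

// Sketch of the "processed column" approach: visit unprocessed insert
// rows, apply each, flip the flag. Names are made up for illustration.
public class FlagBatch {

    static class Insert {
        final long delta;
        boolean processed = false;   // the extra column, initially false
        Insert(long delta) { this.delta = delta; }
    }

    // One batch run: process every row whose flag is still false.
    static long runBatch(List<Insert> table, long current) {
        for (Insert i : table) {
            if (!i.processed) {
                current += i.delta;  // apply the insert to the updated object
                i.processed = true;  // mark the row so it is not re-applied
            }
        }
        return current;
    }
}
```

As Ravneet notes in his reply, in practice the apply and the flag flip are two separate datastore writes, so a crash between them can replay an insert; that is the gap the key-list refinement elsewhere in the thread is meant to close.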
> > > > > Using incremented identifiers seems like it would solve the
> > > > > problem, but implementing such an identifier is itself not
> > > > > straightforward. To explain why it would solve the original
> > > > > problem: we would be able to detect that we went from A to C,
> > > > > because the difference between the identifiers would be greater
> > > > > than 1. The sharded counter is great for counting, but is not
> > > > > good to use as a unique identifier given eventual consistency.
> > > > >
> > > > > I could use the memcache increment function, but the counter
> > > > > might be flushed out of memory at any time. I believe the
> > > > > memcache update speed would be enough for what I want to do.
> > > > >
> > > > > If I had an upper-bound time limit on the eventual consistency,
> > > > > I could make my system process only inserts older than that
> > > > > time limit.
> > > > >
> > > > > Anyway, those are my thoughts, and any feedback is appreciated.
> > > > >
> > > > > BTW: the inserts processed in batches are assumed to be
> > > > > independent of each other.
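The "upper bound on eventual consistency" idea above could look roughly like this, assuming you trust some window length. The 10-second window and all names here are arbitrary placeholders, not measured values.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the consistency-window cutoff: only pick up inserts older
// than now - WINDOW, so an insert still missing from the index cannot
// be skipped over by a batch that already advanced past its timestamp.
public class CutoffBatch {

    static final long WINDOW_MS = 10_000; // assumed consistency window

    static class Insert {
        final long timestampMs;
        Insert(long timestampMs) { this.timestampMs = timestampMs; }
    }

    // Return only the inserts safely outside the consistency window.
    static List<Insert> eligible(List<Insert> visible, long nowMs) {
        long cutoff = nowMs - WINDOW_MS;
        return visible.stream()
                .filter(i -> i.timestampMs <= cutoff)
                .collect(Collectors.toList());
    }
}
```

The caveat Ravneet raises still applies: this only works if the window really is an upper bound, and App Engine does not guarantee one.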
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.