Hi Ravneet,

Have you taken a look at fork join queues?
http://www.google.com/events/io/2010/sessions/high-throughput-data-pipelines-appengine.html
or
High concurrency counters without sharding?
http://blog.notdot.net/2010/04/High-concurrency-counters-without-sharding

I think they may do what you need and are proven solutions.

Thanks
Rob

On Jun 15, 9:38 pm, thecheatah <thechea...@gmail.com> wrote:
> This is actually a pretty good implementation. The only issue is the
> size of the processed task list. Instead of having two tasks, I am
> thinking that a single task will clean up the processed task list
> before it begins its work. Basically, it checks that the previously
> processed inserts have indeed been deleted.
>
> So the processed list records all the inserts processed in the
> previous run. It first deletes all those inserts if needed, then it
> goes on to process new tasks.
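>
> A rough sketch of that single-task flow, assuming the low-level
> datastore API; the helper methods at the bottom are hypothetical
> stand-ins for your own queries, not a definitive implementation:
>
> import java.util.ArrayList;
> import java.util.List;
> import com.google.appengine.api.datastore.*;
>
> // Sketch only: getHighlyUpdatedObject, deleteIfStillPresent and
> // processNewInserts are hypothetical helpers for your own code.
> public class CleanupFirstTask {
>   @SuppressWarnings("unchecked")
>   void runBatch() {
>     Entity main = getHighlyUpdatedObject();
>     List<Key> lastRun = (List<Key>) main.getProperty("processedInserts");
>     if (lastRun != null) {
>       for (Key k : lastRun) {
>         deleteIfStillPresent(k);  // verify last run's inserts are gone
>       }
>       main.setProperty("processedInserts", new ArrayList<Key>());
>     }
>     processNewInserts(main);      // then process this run's new inserts
>   }
>
>   // hypothetical helpers, declared so the sketch compiles
>   Entity getHighlyUpdatedObject() { throw new UnsupportedOperationException(); }
>   void deleteIfStillPresent(Key k) {}
>   void processNewInserts(Entity main) {}
> }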
>
> Thanks,
>
> Ravneet
>
> On Jun 15, 11:48 am, Ravi Sharma <ping2r...@gmail.com> wrote:
>
> > In that scenario you can do something extra.
>
> > Keep a list of keys in your highly updated object, and whenever you
> > process one insert and apply it to the main object, make sure you put
> > the insert's key in this list property. That way the main object knows
> > whether it has already absorbed the content of a given insert.
>
> > Then, if marking or deleting the insert fails after it was applied, the
> > next time you fetch that same insert you check whether its key exists
> > in the list. If it does, mark the insert object as processed and also
> > remove its key from the list property.
>
> > You also need another job that cleans the list property on the updated
> > object: read the list, fetch the insert object for each key, and if it
> > is marked as processed, remove the key from the list.
>
> > This will increase your datastore puts, but you will not have to worry
> > about inconsistency.
>
> > So your code will look like this. The highly updated object will have
> > a property like this:
> > List<Key> processedInserts; (in Java JDO)
>
> > TASK-1
> > 1) Get the next insert object, say i1; assume its key is k1.
> > // at this stage, say, processedInserts is empty
> > 2) Check whether k1 exists in processedInserts. If not, go to step 3;
> > otherwise go to step 4.
> > 3) Update the highly updated object with the content of insert i1, and
> > add k1 to processedInserts.
> > // at this stage processedInserts contains k1
> > 4) Mark i1 as processed.
>
> > After this we will have a growing processedInserts list, and entity
> > size puts an upper bound on it. To keep it small, you need another job
> > running once in a while, or you can submit a task from step 2 whenever
> > processedInserts.size() exceeds some number, say 500.
> > TASK-2
> > In this task:
> > 1) Get the highly updated object.
> > 2) Loop through processedInserts.
> > 3) Fetch the insert object for each key; if it is marked as processed,
> > delete that key from processedInserts.
>
> > Just make sure only one of TASK-1 and TASK-2 is running at a time. You
> > can even run TASK-2 as part of TASK-1 after step 4; it's up to you
> > where you see it as safe and with fewer if/then/else branches :)
>
> > On Wed, Jun 15, 2011 at 4:20 PM, thecheatah <thechea...@gmail.com> wrote:
> > > Ravi,
>
> > > Thanks for the feedback. I was thinking exactly along the lines of
> > > what you have said. The only problem that I see is that I plan on
> > > processing multiple inserts in one batch job. The inserts and the
> > > highly updated object will not be updatable in a single transaction.
> > > Thus, there might be situations where an insert was processed but the
> > > flag was not set or the row was not deleted. To overcome this issue, I
> > > am going to either make sure that processing inserts multiple times
> > > does not affect the output, or accept a small percentage of failures.
>
> > > Ravneet
>
> > > On Jun 15, 5:18 am, Ravi Sharma <ping2r...@gmail.com> wrote:
> > > > If A, B, and C are not dependent on each other and ordering doesn't
> > > > matter for you (e.g. processing them as C, A, B is also fine), then
> > > > you can put another column in this insert table, say "processed".
> > > > When inserting, set it to N (if a string) or false (if a boolean),
>
> > > > and query that entity based on this column. Whenever you process one
> > > > row, set the value to Y or true, and carry on with the next insert.
>
> > > > Or you can even delete these rows once you have processed them; then
> > > > you will not need the extra column.
>
> > > > Note: I am assuming that for one update you will be processing all of
> > > > its inserts in one task or job, with no multiprocessing.
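>
> > > > A minimal sketch of this flag-based loop with the low-level datastore
> > > > API; the kind "Insert" and the property names are assumptions, and
> > > > applyToMainObject stands in for your own aggregation logic:
>
> > > > import com.google.appengine.api.datastore.*;
> > > >
> > > > // Sketch of the flag-based loop. Kind and property names are
> > > > // assumptions; applyToMainObject is a hypothetical helper.
> > > > public class FlagBasedBatch {
> > > >   private final DatastoreService ds =
> > > >       DatastoreServiceFactory.getDatastoreService();
> > > >
> > > >   void processPending() {
> > > >     Query q = new Query("Insert")
> > > >         .addFilter("processed", Query.FilterOperator.EQUAL, false);
> > > >     for (Entity insert :
> > > >          ds.prepare(q).asIterable(FetchOptions.Builder.withLimit(100))) {
> > > >       applyToMainObject(insert);             // your aggregation logic
> > > >       insert.setProperty("processed", true); // or delete the row instead
> > > >       ds.put(insert);
> > > >     }
> > > >   }
> > > >
> > > >   void applyToMainObject(Entity insert) {}   // hypothetical helper
> > > > }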
>
> > > > On Wed, Jun 15, 2011 at 4:20 AM, thecheatah <thechea...@gmail.com> wrote:
> > > > > I am trying to implement a system for an object that will be updated a
> > > > > lot. The way I was thinking was to turn the updates into inserts then
> > > > > have a batch job that executes the inserts in batches to update the
> > > > > highly writable object. The inserts can either be sorted by time or by
> > > > > some sort of an incremented identifier. This identifier or timestamp
> > > > > can be stored on the highly writable object so the next time the job
> > > > > runs it knows where to start executing the next batch.
>
> > > > > Using timestamps, I am running into a problem with eventual consistency.
> > > > > When I search for inserts to execute some inserts might not make it
> > > > > into the query because they were not inserted into the index yet. So
> > > > > suppose we have insert A, B and C. If A and C make it into the batch
> > > > > job, it will mark all work up to C completed and B will never be
> > > > > executed.
>
> > > > > Using incremented identifiers seems like it will solve the problem but
> > > > > implementing such an identifier itself is not clear. To explain why it
> > > > > would solve the original problem, we would be able to detect when we
> > > > > went from A to C as the difference in the identifiers would be greater
> > > > > than 1. The sharded counter is great for counting, but is not good to
> > > > > use as a unique identifier given eventual consistency.
>
> > > > > I can use the memcached increment function but the counter might be
> > > > > flushed out of memory at any time. I believe the memcache update speed
> > > > > should be enough for what I want to do.
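>
> > > > > For reference, that memcache increment looks like this in the Java
> > > > > API (the key name is my assumption):
>
> > > > > import com.google.appengine.api.memcache.MemcacheService;
> > > > > import com.google.appengine.api.memcache.MemcacheServiceFactory;
> > > > >
> > > > > // Sketch: atomic counter in memcache. "insertSeq" is an assumed key
> > > > > // name; if the entry is evicted, the sequence restarts from 0, which
> > > > > // is exactly the flush risk described above.
> > > > > public class SequenceFromMemcache {
> > > > >   static Long nextId() {
> > > > >     MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
> > > > >     return cache.increment("insertSeq", 1L, 0L); // seeds 0 when absent
> > > > >   }
> > > > > }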
>
> > > > > If I had an upper bound time limit on the eventual consistency, I
> > > > > could make my system so that it only processes inserts older than the
> > > > > time limit.
>
> > > > > Anyway, those are my thoughts; any feedback is appreciated.
>
> > > > > BTW: The inserts processed in batches are assumed to be not dependent
> > > > > on each other.