Hi Steve,
  Some follow-up responses are inline.

On Wed, Nov 10, 2010 at 13:34, stevep <prosse...@gmail.com> wrote:
> Thanks for the response.
>
> We are not very write intensive, so new records occur infrequently. As
> I read the allocate_ids documentation, it appears more oriented toward
> batch processes. Also, Google's recommended shard process for
> generating unique, sequential integers is a very nice bit of flexible,
> fast kit that we like.

allocate_ids can be used to generate a single id, just like doing
SomeKind().put() will generate an id automatically.  I am not aware of
any recommended way to use sharded counters to generate a unique
sequence of sequential numbers.  The whole point of sharding is to let
you partially mitigate transaction and contention issues; but without
transactions you could easily generate duplicate id values.  I would
suggest using the Google-provided infrastructure to generate unique
ids; why reinvent the wheel?
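
To make the semantics concrete, here is a minimal plain-Python stand-in for what allocate_ids does: it hands out non-overlapping id ranges, and a "range" of one is how you get a single id. On App Engine the real call is db.allocate_ids; the IdAllocator class below is purely illustrative, not an App Engine API.

```python
class IdAllocator:
    """Hands out non-overlapping id ranges, mimicking allocate_ids."""

    def __init__(self):
        self._next = 1

    def allocate(self, count=1):
        # Returns (first, last) inclusive, mirroring the (start, end)
        # tuple the real datastore API returns for a reserved range.
        first = self._next
        self._next += count
        return first, self._next - 1


allocator = IdAllocator()

# A single id for one new record -- no sharded counter needed.
single, _ = allocator.allocate(1)

# A batch of 10 ids; none of them can collide with the id above.
start, end = allocator.allocate(10)
```

Because every range is reserved before use, two concurrent callers can never be handed the same id, which is exactly the guarantee a non-transactional sharded counter cannot give you.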

Eli brought up a good point regarding this issue.  I assume the reason
for this 'complicated' process is to return an id to the client as
quickly as possible, so that if the client re-submits the request you
do not get duplicated data?  I use a very similar processing sequence
for error checking: once I get a save request I generate a new key,
then process the data for errors.  If the errors are 'fatal' I return
the key and the error information for the client to fix, without
saving the entity.  If the errors are 'acceptable' (i.e. more like
warnings) I save the entity before returning to the client.  In either
case I always return a key for the new entity.


>
> We use a client async URL call for the new record post which allows
> us to define a time limit on the server response. If we do not get a
> response within a few seconds, the call itself generates an error
> which we handle same as a server error response. If the response comes
> after this, it will be ignored.

I was more curious about the case where you make a request and then
the internet connection fails (or the user hits 'submit' twice really
fast).  The server will still successfully complete the write, but the
client will not know.  So how do you prevent the client from
re-submitting the request, which might result in a duplicate record?


>
> If we do end up having to save the user data, the id will be saved
> with it. Logic for the resend will check first to see if the record
> did eventually get posted by the Task Queue. If not, it will send it.
> The client will always check the local store for un-posted records
> each time the user opens the web page. However, we certainly cannot
> expect every user presented with a "try again later" message will
> return, so holes in the key number sequence are certain.

I totally agree with this approach; I think it is very similar to my
process.  I do not use the task queue to write the initial record,
though; it is put during the same request in which I return the key.
Your approach should be safe because you return the key in the first
request, but there could be cases that result in lots of unneeded
re-submits, for instance if the task queue is backed up.


>
> This "too late" task queue may be a more common error than a total
> failure, so it may make sense to add something like email follow-up.
> However, the overall risk (based on recent GAE performance) is a
> situation where an app suddenly starts throwing Deadline Exceeded
> errors due to GAE infrastructure issues rather than developer-
> controlled code issues. In that case, the ability to post the record
> for email follow-up will likely fail also.
>
> Down the road, I've thought to run an AWS / MySql server for backup.
> If a specific "high need" GAE post fails, then it would be relatively
> simple to use a redirect to AWS. It's a good bit of redundancy work
> at this point though (and only works for a specific type of record),
> so we will put that off until we start to get out into the real world
> see some volume**. Odds of both clouds sources having internal
> infrastructure issues at the same time hopefully will be very low.

Again, totally agree with this.  I've been working on using TyphoonAE
to keep a 'hot copy' running that I can fail over to.  This lets me
run the same code in both places; I just need a little sync logic.  My
app has good service points, and the core business data comes from a
single kind, so I am able to simply fire tasks with 'update' requests
to the fail-over app.  Everything stays in sync and gives me emergency
fail-over ability.
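
A toy version of that sync pattern: every write to the primary also enqueues an 'update' task that replays the write against the fail-over copy. Both stores and the queue are plain Python objects here; on App Engine the queue would be the task queue and the replay an HTTP request to the other app. All names are illustrative.

```python
primary = {}
failover = {}
task_queue = []


def put(key, value):
    """Write to the primary and fire a sync task for the fail-over copy."""
    primary[key] = value
    task_queue.append(('update', key, value))   # fire-and-forget sync task


def drain_queue():
    """Stand-in for the task-queue worker replaying updates."""
    while task_queue:
        op, key, value = task_queue.pop(0)
        if op == 'update':
            failover[key] = value


put('order:1', {'total': 10})
put('order:2', {'total': 25})
drain_queue()
```

Because the tasks carry the full new value, replays are idempotent: re-delivering an update task just overwrites the fail-over record with the same data.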



>
> Again, thanks for your response.
> stevep
>
> **  GAE clearly wins against AWS for low-volume apps because there is
> no hourly charge. However, as an app increases its use, the constant
> kill/start of instances after n transactions (~10K right now) appears
> to me as achieving an hourly charge based on CPU cycles "overhead".
> However, need a lot more data before understanding this.
>
> On Nov 9, 11:39 pm, Robert Kluin <robert.kl...@gmail.com> wrote:
>> Why not use allocate_ids to generate the ids?  That might simplify the
>> process a bit.
>>  http://code.google.com/appengine/docs/python/datastore/functions.html...
>>
>> I've been using a similar process for batch updates for quite some
>> time.  Works well for my case, but in my case there is not a user
>> involved.  It is an automated sync process to another system's
>> database, so I have a unique id to use for lookups to avoid
>> duplicates.
>>
>> What happens if the client does not get the response in step 4.
>>
>> Also, I assume if you get a failure, and resend the entity you'll use
>> the previous id?
>>
>> Robert
>>
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Google App Engine" group.
> To post to this group, send email to google-appeng...@googlegroups.com.
> To unsubscribe from this group, send email to 
> google-appengine+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/google-appengine?hl=en.
>
>
