[google-appengine] Architecture approaches for puts

2010-11-09 Thread stevep
I would like some feedback about the pluses / minuses of my approach for
handling new records. Currently I need to optimize how the client
request handler processes new entity put()s. Several custom indices for
the model are used, so puts run too close to the 1,000 ms limit (we were
running over the limit prior to the Nov. 6th maintenance – thanks,
Google).


The entities are written with unique integer key values. The integers are
generated using Google’s recommended sharded process. The client currently
POSTs a new record to the GAE handler. If the handler does not send back a
successful response, the client retries the POST “n” times (at least twice,
but possibly more). After “n” continued failures, the client informs the
user that the record could not be created, saves the data locally, and
asks the user to try again later.


Planned new process will use the Task Queue.
1) Client POSTs new entity data to the handler. At this point, the user
sees a dialog box saying the record is being written.
2) Handler uses the shards to generate the next integer value for the
key.
3) Handler enqueues a task with the new key value and record data, and
responds back to the client with the key value (see the sketch after
this list).
4) Client receives the key value back from the handler, and changes to
inform the user that the record write is being confirmed on the server
(or, as before, retries the entire POST if the response is an error code).
5) Client waits a second or two (for the task to finish), then issues
a GET to the handler to read the new record using the key value.
6) Handler does a simple by-key read of the new record, and responds
back to the client with either a found or not-found status.
7) If the client gets a found response, we are done. If the response is
not-found or an error, the client waits a few seconds and issues another
GET.
8) If after “n” tries no GET yields a successful read, the client
informs the user that the record could not be written and to “please try
again in a few minutes” (saving the new record data locally).
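
To make this concrete, here is a rough sketch of what the handler side of
steps 2, 3, and 6 might look like with webapp and the Task Queue API.
next_shard_id() is just a placeholder for our sharded allocator, and the
Record model and URL paths are made up for illustration; the key=
constructor argument needs SDK 1.3.5+.

# Sketch only: next_shard_id(), the Record model, and the URLs are assumed.
# (On SDKs before 1.4 the import is google.appengine.api.labs.taskqueue.)
from google.appengine.api import taskqueue
from google.appengine.ext import db, webapp

class Record(db.Model):
    data = db.TextProperty()

class CreateHandler(webapp.RequestHandler):
    def post(self):
        key_id = next_shard_id()              # step 2: sharded integer key
        taskqueue.add(                        # step 3: defer the slow put()
            url='/tasks/put_record',
            params={'key_id': key_id, 'data': self.request.get('data')})
        self.response.out.write(str(key_id))  # client polls with this id

class PutRecordTask(webapp.RequestHandler):
    def post(self):
        # The queue retries failed tasks; building the entity with an
        # explicit key keeps those retries idempotent.
        key = db.Key.from_path('Record', int(self.request.get('key_id')))
        Record(key=key, data=self.request.get('data')).put()

class ConfirmHandler(webapp.RequestHandler):
    def get(self):
        # step 6: cheap get-by-key, no query and no index scan
        record = Record.get_by_id(int(self.request.get('key_id')))
        if record:
            self.response.out.write('found')
        else:
            self.error(404)    # client treats this as "not written yet"

application = webapp.WSGIApplication([
    ('/records', CreateHandler),
    ('/tasks/put_record', PutRecordTask),
    ('/records/confirm', ConfirmHandler),
])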


I know this is not ideal, but given GAE’s limitations I believe it is a
valid approach to minimizing lost writes. Would very much appreciate
feedback. I should note that imposing a few seconds of delay while
writing the record should not be an issue, given it is a single
transaction at the end of a creative process that has engaged the user
for several minutes. Also, none of our logic requires the model's
integer key values to be gap-free (missing values are fine).
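
For completeness, a rough sketch of the client side of steps 5 through 8
(assuming a Python client purely for illustration; the appspot URL and
parameter name just match the hypothetical handler sketch above):

import time
import urllib2

def confirm_record(key_id, tries=3, delay=2.0):
    # Poll the confirm handler until the record is visible or we give up.
    for _ in range(tries):
        time.sleep(delay)          # step 5: give the task time to run
        try:
            urllib2.urlopen(
                'http://myapp.appspot.com/records/confirm?key_id=%d' % key_id)
            return True            # 2xx response: the record is there
        except urllib2.HTTPError:  # 404 from the handler: not written yet
            continue
        except urllib2.URLError:   # network trouble: also worth a retry
            continue
    return False                   # step 8: save locally, ask user to retry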


TIA,
stevep




Re: [google-appengine] Architecture approaches for puts

2010-11-09 Thread Robert Kluin
Why not use allocate_ids to generate the ids?  That might simplify the
process a bit.
  
http://code.google.com/appengine/docs/python/datastore/functions.html#allocate_ids
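
For reference, a minimal sketch of the call from that page (the Record
model name is an assumption):

from google.appengine.ext import db

# Reserve one id from Record's id space; the datastore will never
# auto-assign an id from a range returned by allocate_ids.
start, end = db.allocate_ids(db.Key.from_path('Record', 1), 1)
new_key = db.Key.from_path('Record', start)

One caveat: ids allocated this way are unique but not guaranteed to be
contiguous or strictly increasing across calls, which sounds fine since
you say gaps are acceptable.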

I've been using a similar process for batch updates for quite some
time. It works well, but in my case there is no user involved: it is an
automated sync process to another system's database, so I have a unique
id to use for lookups to avoid duplicates.

What happens if the client does not get the response in step 4?

Also, I assume that if you get a failure and resend the entity, you'll
use the previous id?



Robert






Re: [google-appengine] Architecture approaches for puts

2010-11-10 Thread Eli Jones
How big is the average entity for this Model that you are putting to? (Are
you just putting one entity at a time?)

If you create a separate Model with the same properties but no indexes, how
long and how much CPU does it use up on a put (in comparison to your fully
indexed model)?  Also, what do you mean by "puts run too close to the 1,000
ms limit"?  Do you just mean that your app uses up 1,000 CPU MS or 1,000
API_CPU MS?
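
For example, a stripped-down comparison model might look like this
(property names are placeholders; indexed=False skips the per-property
index writes on put()):

from google.appengine.ext import db

class RecordNoIndex(db.Model):
    title = db.StringProperty(indexed=False)
    created = db.DateTimeProperty(indexed=False)
    body = db.TextProperty()  # Text/Blob properties are never indexed anyway

Note that any composite indexes listed in index.yaml add write cost too,
so the comparison model must not appear in any of those entries.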

Why are you generating a custom integer id instead of using the one that
the datastore would create? (I am not saying you should not do this; I am
just wondering what requirement makes you need to do it.)

Also, you mention that you are not very write-intensive and new records
occur infrequently... so what is the main reasoning for this complicated
put process (does the processing leading up to the put place you near the
30-second limit)?

Depending on what your restrictions are, there are different
recommendations that could be made.




Re: [google-appengine] Architecture approaches for puts

2010-11-12 Thread Stephen Johnson
I apologize if this has been asked and answered in this thread; I tried
looking through it, but it is very long. From what I can tell, you haven't
said what your latency is for these requests, and I believe that is very
important before you do all the work outlined below. From what I
understand, throttling is affected by the latency of the requests, not
by CPU or API CPU usage. For instance, you could have a request which
takes 2,000 MS of API CPU while its latency is 300 MS. The first number
in the logs, before the CPU MS and API CPU MS, indicates the latency of
the request. You say that you have a lot of custom indexes for these
entities. My assumption would be that these custom indexes are updated by
the datastore in parallel, so you could have very high API CPU usage
while your latency stays very low. For example, looking at my log I have
a request with numbers like this:

238ms 678cpu_ms 305api_cpu_ms

The first number, the one after the 200 return code, is the latency. It
is roughly a third of the cpu_ms total and falls (I am presuming) safely
within the recommended latency times. At least this is the way I
understand it. Also, if this has been asked/answered below, I apologize
for repeating.

Steve
