[google-appengine] Architecture approaches for puts
I would like some feedback about the pluses and minuses of an approach for handling new records. Currently I need to optimize how the client request handler processes new entity put()s. Several custom indices are used for the model, so puts run too close to the 1,000 ms limit (they were running over the limit prior to the Nov. 6th maintenance; thanks, Google).

The entities are written with unique integer key values. The integers are generated using Google's recommended sharded-counter process. The client currently POSTs a new record to the GAE handler. If the handler does not send back a successful response, the client retries the POST "n" times (at least twice, but possibly more). Continued failure past "n" prompts the user that the record could not be created, saves the data locally, and asks the user to try again later.

The planned new process will use the Task Queue:
1) Client POSTs new entity data to the handler. At this point, the user sees a dialog box saying the record is being written.
2) Handler uses the shards to generate the next integer value for the key.
3) Handler enqueues a task with the new key value and the record data, and responds to the client with the key value.
4) Client receives the key value from the handler and changes the dialog to say the record write is being confirmed on the server (or, as before, retries the entire POST if the response is an error code).
5) Client waits a second or two (for the task queue to finish), then issues a GET to the handler to read the new record by its key value.
6) Handler does a simple key-value read of the new record and responds to the client with either a found or not-found status.
7) If the client gets a found response, we are done. If not found, or an error response, the client waits a few seconds and issues another GET.
8) If after "n" tries no GET yields a successful read, the client informs the user that the record could not be written ("please try again in a few minutes") and saves the new record data locally.
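The steps above can be sketched as a minimal, runnable simulation, with the datastore, task queue, and sharded counter replaced by in-memory stand-ins (every name here is hypothetical; on App Engine, step 3's enqueue would go through the real taskqueue API and step 2 through the sharded counter):

```python
import itertools

datastore = {}                  # stand-in for the datastore: key -> record
task_queue = []                 # stand-in for the task queue of pending writes
_counter = itertools.count(1)   # stand-in for the sharded integer counter

def handle_post(record):
    # Steps 2-3: allocate the next integer key, enqueue the actual
    # write as a task, and return the key to the client immediately.
    key = next(_counter)
    task_queue.append((key, record))
    return key

def run_tasks():
    # Stand-in for the task-queue worker that performs the real put().
    while task_queue:
        key, record = task_queue.pop(0)
        datastore[key] = record

def handle_get(key):
    # Step 6: simple fetch by key; None means "not found".
    return datastore.get(key)

def client_create(record, max_tries=3):
    # Steps 1, 4, 5, 7, 8: POST, then poll by key until the write is visible.
    key = handle_post(record)
    for _ in range(max_tries):
        run_tasks()             # in production: wait 1-2 s for the queue instead
        if handle_get(key) is not None:
            return key          # step 7: write confirmed
    return None                 # step 8: give up, save the data locally
```

The point of the sketch is the shape of the protocol: the handler's response only promises a key, and the client treats "found via GET" as the real acknowledgement.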
I know this is not ideal, but given GAE's limitations, I believe it is a valid approach to minimize lost writes. I would very much appreciate feedback. I should note that imposing a few seconds of delay while writing the record should not be an issue, given it is a single transaction at the end of a creative process that has engaged the user for several minutes. Also, our logic can handle gaps (missing integer values) in the model's key values. TIA, stevep -- You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to google-appeng...@googlegroups.com. To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
Re: [google-appengine] Architecture approaches for puts
Why not use allocate_ids to generate the ids? That might simplify the process a bit: http://code.google.com/appengine/docs/python/datastore/functions.html#allocate_ids

I've been using a similar process for batch updates for quite some time. It works well for my case, but in my case there is no user involved. It is an automated sync process to another system's database, so I have a unique id to use for lookups to avoid duplicates.

What happens if the client does not get the response in step 4? Also, I assume that if you get a failure and resend the entity, you'll use the previous id?

Robert

On Tue, Nov 9, 2010 at 15:07, stevep wrote:
> [original post quoted in full; snipped]
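To illustrate the shape of Robert's two suggestions: allocate_ids reserves a contiguous range of ids up front (so no sharded counter is needed), and reusing the previously allocated id makes a resend overwrite rather than duplicate. A toy sketch, assuming in-memory stand-ins (the class and function names are hypothetical, not the real API):

```python
class IdAllocator:
    """Toy stand-in for datastore allocate_ids(): returns the (start, end)
    of a contiguous range of ids that will never be handed out again."""
    def __init__(self):
        self._next = 1

    def allocate(self, count):
        start = self._next
        self._next += count
        return (start, start + count - 1)

def idempotent_put(store, key, record):
    # Resending the same record under a previously allocated key
    # overwrites the earlier attempt instead of creating a duplicate.
    store[key] = record

allocator = IdAllocator()
start, end = allocator.allocate(10)   # reserve ids for upcoming records
```

The design point: because the id is fixed before the first write attempt, "retry the POST" and "retry the task" both become safe, which answers the step-4 failure case.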
Re: [google-appengine] Architecture approaches for puts
How big is the average entity for this model that you are putting to? (Are you putting just one entity at a time?) If you create a separate model with the same properties but no indexes, how long does a put take, and how much CPU does it use, compared to your fully indexed model?

Also, what do you mean by "puts run too close to the 1,000 ms limit"? Do you mean that your app uses up 1,000 CPU ms, or 1,000 API CPU ms?

Why are you generating a custom integer id instead of using the one the datastore would create? (I am not saying you should not do this, but I am wondering what requirement makes you need to do it.) Also, you mention that you are not very write-intensive and that new records occur infrequently, so what is the main reasoning for this complicated put process? (Does the processing leading up to the put place you near the 30-second limit?) Depending on what your restrictions are, there are different recommendations that could be made.

On Tue, Nov 9, 2010 at 3:07 PM, stevep wrote:
> [original post quoted in full; snipped]
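The indexed-vs-unindexed comparison asked about here can be answered empirically with a small timing harness: push the same records through the fully indexed model and through an index-free clone, and compare the averages. A generic sketch (put_fn stands in for whichever model's put you are measuring; all names are hypothetical):

```python
import time

def average_put_ms(put_fn, records, runs=3):
    """Average wall-clock milliseconds per put of each record in `records`,
    repeated `runs` times. Call it once with the indexed model's put and
    once with the unindexed clone's put, then compare the two numbers."""
    total = 0.0
    for _ in range(runs):
        start = time.time()
        for record in records:
            put_fn(record)
        total += (time.time() - start) * 1000.0
    return total / (runs * len(records))
```

On App Engine you would also want to watch cpu_ms and api_cpu_ms in the logs alongside this, since wall-clock latency alone can hide work the datastore does in parallel.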
Re: [google-appengine] Architecture approaches for puts
I apologize if this has been asked and answered in this thread; I tried looking through it, but it is very long. From what I can tell, you haven't said what your latency is for these requests, and I believe that is very important to know before you do all the work outlined below.

From what I understand, throttling is affected by the latency of the requests, not by CPU or API CPU usage. For instance, you could have a request that takes 2,000 ms of API CPU but whose latency is 300 ms. The first number in the logs, before the CPU ms and API CPU ms, is the latency of the request.

You say that you have a lot of custom indexes for these entities. My assumption would be that these custom indexes are updated by the datastore in parallel, so you could have very high API CPU usage while your latency stays very low. For example, looking at my log I have a request with numbers like this:

238ms 678cpu_ms 305api_cpu_ms

The first number (the one after the 200 return code) is the latency. It is only about a third of the total CPU ms, and falls (I am presuming) safely within the recommended latency times. At least this is the way I understand it. Also, if this has been asked/answered below, I apologize for repeating.

Steve

On Tue, Nov 9, 2010 at 1:07 PM, stevep wrote:
> [original post quoted in full; snipped]
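The three log numbers described above can be pulled apart mechanically, which makes the latency-vs-CPU comparison explicit. A small parser sketch (the regex assumes the "238ms 678cpu_ms 305api_cpu_ms" formatting shown above; the function name is made up):

```python
import re

_LOG_RE = re.compile(r"(\d+)ms\s+(\d+)cpu_ms\s+(\d+)api_cpu_ms")

def parse_request_stats(line):
    """Return (latency_ms, cpu_ms, api_cpu_ms) from an App Engine
    request log line, or None if the pattern is not present."""
    m = _LOG_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

stats = parse_request_stats("200 238ms 678cpu_ms 305api_cpu_ms")
# For this sample line, latency (238 ms) is well under the CPU total
# (678 ms): high CPU with low latency, consistent with the index
# writes happening in parallel on the datastore side.
```

A quick scan of your logs with this would tell you whether the expensive puts are actually slow from the client's point of view, or merely CPU-hungry.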