On Nov 10, 11:28 am, Eli Jones <eli.jo...@gmail.com> wrote:

> How big is the average entity for this Model that you are putting to? (Are
> you just putting one entity at a time?)

The data will vary based on user input. For the model in question, the
entity size is generally 3k or less. However, we have a large number of
custom indices to support user searches on multiple combinations of
properties. I'm pretty sure the index updates are the real resource
cost for a new record put().
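To illustrate why the index updates dominate, here is a rough sketch of how datastore write operations scale with indexing. The formula is an illustrative assumption, not official App Engine accounting: it assumes each indexed property value adds two index rows (ascending and descending) and each custom composite-index entry adds one more write on top of the entity write itself.

```python
# Illustrative sketch (assumed cost model, not official App Engine billing):
# index maintenance, not the entity write, dominates put() cost when a
# model carries many custom indices.

def estimated_write_ops(indexed_property_values, composite_index_entries):
    """Rough count of datastore write operations for a new-entity put()."""
    entity_write = 1                                # the entity itself
    per_property = 2 * indexed_property_values      # asc + desc index rows (assumed)
    per_composite = composite_index_entries         # one row per custom-index entry (assumed)
    return entity_write + per_property + per_composite
```

Under these assumptions, an entity with 10 indexed property values and 8 composite-index entries costs 29 write operations versus 1 for an unindexed entity, which is why the side-by-side test with an index-free model would be telling.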

>
> If you create a separate Model with the same properties but no indexes, how
> long and how much CPU does it use up on a put (in comparison to your fully
> indexed model)?  Also, what do you mean by "puts run too close to the 1,000
> ms limit"?  Do you just mean that your app uses up 1,000 CPU MS or 1,000
> API_CPU MS?

If I look at the appstats detail for my POST function, the api-cpu
cost for the most recent put is approximately 585 ms. There are some
other DB accesses in the full function call, so I segmented out this
data; the 585 ms is the bulk of the total api-cpu cost for the full
function. Sorry, but right now I do not have anything set up to
test writing a similar model entity without any indices.

To be honest, I am not sure what the "throttling due to 1,000 cpu ms
response" really means relative to CPU or API_CPU ms, or whether it
involves any smoothing across multiple handler functions.

Just looking at the 585 ms api-cpu cost of the put() indicates we would
likely have throttling issues should the infrastructure slow down due
to overall load. So, we're moving the put()s to the task queue. This
is in accordance with what I believe Google engineers have said in this
forum about the 1,000 ms throttling effect (some posts perhaps
suggesting 800 ms is a more practical limit vs. the 1,000 ms
"theoretical limit" -- the latter being my term).

The TQ approach also allows us to break up some of the overhead in the
current single function. Overall, we'll be much better off re:
throttling risk and task separation, but it does impose the overhead
of not being able to confirm to the client, in response to its single
POST request, that everything related to the function has completed.
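The flow I have in mind -- a fast POST handler that enqueues the work, a worker that performs the expensive put(), and a confirmation GET -- can be sketched framework-agnostically. In App Engine this would use taskqueue.add() and a worker URL handler; here a plain in-memory queue and dict stand in so the pattern is self-contained:

```python
# Framework-agnostic sketch of the POST -> task-queue -> GET-confirm flow.
# On App Engine, task_queue.put() below corresponds to taskqueue.add(),
# run_worker() to the task's worker handler, and datastore to the real
# datastore; the stand-ins just make the pattern runnable as-is.
import queue

task_queue = queue.Queue()   # stand-in for the App Engine task queue
datastore = {}               # stand-in for the datastore

def handle_post(key, payload):
    """Fast handler: stage the expensive put() as a task and return at once."""
    task_queue.put((key, payload))
    return {"status": "queued", "key": key}   # client polls this key later

def run_worker():
    """Task-queue worker: performs the (heavily indexed) put() off-request."""
    while not task_queue.empty():
        key, payload = task_queue.get()
        datastore[key] = payload              # the actual put()

def handle_get(key):
    """Confirmation GET: has the queued put() landed yet?"""
    return {"complete": key in datastore}
```

The point of the sketch is the gap it makes visible: between handle_post returning and run_worker executing, a GET for the key reports incomplete, which is exactly the client-confirmation problem discussed below.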

>
> Why are you generating a custom integer id instead of using the one that the
> datastore would create (I am not saying you should not do this, but I am
> wondering what the requirement is that makes you need to do it.)?

In retrospect, using our own generated integer key value may not have
been necessary while the put() was done by the handler function
invoked by the client's POST. However, as we move to the task queue
for executing the put, it will prove beneficial: it allows us to
quickly stage the task queue update and tell the client which key
value to request when it confirms the task queue work using GETs
subsequent to the initial POST.
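The key requirement is just that the id exists before the task runs, so the fast handler can hand it back immediately. A minimal stand-in for that allocation step (App Engine's db.allocate_ids() could fill the same role server-side; the counter here is purely illustrative):

```python
# Hypothetical sketch: allocate the integer key up front in the POST
# handler, before the task-queue put() runs, so the client knows which
# key to GET later. itertools.count is a stand-in for a real id
# allocator such as App Engine's db.allocate_ids().
import itertools

_id_counter = itertools.count(1)

def allocate_id():
    """Monotonically increasing integer ids (illustrative stand-in)."""
    return next(_id_counter)
```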

>
> Also, you mention that you are not very write intensive and new records
> occur infrequently.. so what is the main reasoning for this complicated put
> process (does the processing leading up to the put place you near the 30
> second limit)?

The current process is pretty simple. If we did not face the issue of
our handler being throttled because we have a transaction or two
running over 1,000 ms, I'd leave it as is.

However (again, according to my reading of their comments), Google's
engineers advocate moving logic that can take over 1,000 ms to the
task queue. I'm OK with that, quite honestly.

As I think through the TQ step we will need, I am not sure how a
client with a single POST to the online handler can be assured that
the new record's TQ put() has completed. That leads me to the
complicated process of the initial client POST being followed by a GET
to ensure completion.

If I am missing something, and the client can simply send the POST
data to the Task Queue and assume 100% of the TQ tasks sent will be
completed, then please advise. Just having the client send the current
single POST call is so much easier.
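For the POST-then-GET approach, the client-side confirmation would amount to a small polling loop. A sketch, assuming a check_fn that wraps the GET request against the pre-agreed key (the function name and backoff parameters are my own, purely illustrative):

```python
# Hedged sketch of client-side confirmation after the initial POST:
# the task queue will retry failed tasks, but the client cannot know
# when the put() lands, so it polls the GET endpoint with exponential
# backoff until the record appears or it gives up.
import time

def confirm_put(check_fn, key, attempts=5, delay=0.01):
    """Poll check_fn(key) until it reports completion or attempts run out."""
    for attempt in range(attempts):
        if check_fn(key):
            return True
        time.sleep(delay * (2 ** attempt))   # exponential backoff between polls
    return False
```

The task queue's retry behavior means the put() should eventually succeed, but "eventually" is the problem: the loop above is the price of not getting completion status in the POST response itself.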

We are nowhere near 30 seconds for anything, even with a cold start.
No client URL calls currently take more than a few seconds when
combined with a cold start. Prior to the Nov 6th update, all the new
record puts for this model were between 1-2 seconds, but they are now
coming in under 1 sec.

Again, this is all to avoid being throttled on the new record put()s
should: 1) the current gains from the Nov 6th maintenance not hold, or
2) variability in infrastructure load, even with the Nov 6th gains,
push our 585 ms api-cpu over 1,000. (Getting throttled when the
infrastructure overall is running slow is a double whammy for which
users might not wait!)
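The headroom arithmetic behind that worry is simple enough to write down explicitly:

```python
# Rough headroom arithmetic: how much infrastructure slowdown the
# measured 585 ms api-cpu put() can absorb before crossing the
# 1,000 ms throttling line (or the 800 ms "practical" limit some
# posts suggest).

def max_slowdown(measured_ms, limit_ms):
    """Factor by which the measured cost may grow before hitting the limit."""
    return limit_ms / measured_ms
```

With 585 ms measured, a roughly 1.7x slowdown exhausts the 1,000 ms limit, and only about 1.37x exhausts the 800 ms practical limit -- not much margin if the infrastructure is having a bad day.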

>
> Depending on what your restrictions are.. there are different
> recommendations that can/could be made.

I'll happily try to clarify the restrictions, but am not sure what is
being asked here.

Again, many thanks for your help.
stevep

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.