Thanks for the well-thought-out response, numbers, and reality check, Stephen! That makes a lot of sense when you consider parallel deletes and datastore CPU time.
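
For anyone who wants to rerun the estimate with their own numbers, here is a rough Python sketch of the arithmetic in Stephen's message quoted below. The function names are my own, and the inputs (300,000 entities, 15 minutes of wall-clock time, 11 billed hours, 25 to 100 ms per delete) are just the figures mentioned in this thread, not measurements.

```python
# Back-of-the-envelope check of the serial-delete estimate.
# All inputs are figures quoted in this thread, not measurements.
ENTITIES = 300_000      # entities deleted
WALL_MINUTES = 15       # wall-clock duration of the delete job
BILLED_HOURS = 11       # CPU time billed


def serial_hours(latency_ms, entities=ENTITIES):
    """Wall-clock hours if every delete ran one after another."""
    return entities * latency_ms / 1000 / 3600


def implied_latency_ms(wall_minutes=WALL_MINUTES, entities=ENTITIES):
    """Per-delete latency a strictly serial job would need to finish in time."""
    return wall_minutes * 60 * 1000 / entities


def billed_to_wall_ratio(billed_hours=BILLED_HOURS, wall_minutes=WALL_MINUTES):
    """How many times larger the billed CPU time is than the wall-clock time."""
    return billed_hours * 60 / wall_minutes


if __name__ == "__main__":
    for latency_ms in (25, 50, 75, 100):
        print(f"{latency_ms} ms per delete -> {serial_hours(latency_ms):.2f} h serial")
    print(f"latency needed for a serial 15-minute run: {implied_latency_ms():.1f} ms")
    print(f"billed / wall-clock ratio: {billed_to_wall_ratio():.0f}x")
```

With the thread's figures, a strictly serial run at 50 ms per delete comes out at a little over four hours, so a 15-minute job only makes sense if the deletes were batched or otherwise overlapped.
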
On Nov 14, 9:37 pm, Stephen Johnson <onepagewo...@gmail.com> wrote:
> Thank you for sharing your numbers with us. I think it's a good way for all
> of us to get an idea of how much things cost on the cloud, so here are my
> thoughts.
>
> Even though you had one shard executing, the shard should be doing batch
> deletes and not one delete at a time. According to the documentation, batch
> deletes can do up to 500 entities in one call and would execute in parallel
> (perhaps not 500 all at once, but with parallelism nonetheless). I would
> assume the shard would probably do about 100 or so at a time (maybe more,
> maybe less).
>
> Anyway, a good way to prove some parallelism must be occurring would be to
> do a proof by contradiction. So, let's assume that in fact the shard is doing
> one delete at a time. Looking at the System Status, the latency of a single
> delete on an entity (probably a very simple entity with no composite indexes,
> which would add additional overhead) is approximately 50ms to 100ms or so.
> If we assume 50ms per delete for latency, we end up with the following
> (assuming no overhead for mapreduce/shard maintenance, spawning additional
> tasks, etc., which would add even more time):
>
>     300000 entities * .05 seconds per entity = 15000 seconds
>     15000 seconds / 60 seconds per minute = 250 minutes or 4 hours 10 minutes
>
> Additionally, if a delete takes approximately 100 milliseconds then 300000
> entities would take 8 hours 20 minutes to complete.
> Even an unrealistic 25ms per delete is still over two hours.
>
> Now remember this is latency (real time) and not CPU time. So even if
> something has a latency of 50ms it could still eat up 100ms of API CPU
> time. For example, 50ms to delete the entity and 50ms to update the indexes
> (done in parallel). So if latency time is 4 hours 10 minutes and we just
> double latency time to approximate API CPU time, we get over 8 hours of CPU
> time. If average delete time for your job was 75ms then latency time is
> approximately 6 hours and CPU time 12 hours. Your total was 11 hours billed
> time, so if my logic is sound it seems reasonable that the amount you were
> billed could be correct.
>
> Furthermore, if we look at this from another angle, we find that
> if your delete job took 15 minutes to complete then:
>
>     300000 entities / 15 minutes = 20000 entities per minute
>     20000 entities per minute / 60 seconds per minute = 333.33 entities per second
>
> So, if 333.33 entities are being deleted per second serially then the
> average latency would be 3ms per delete, which seems rather unlikely.
>
> My thoughts. Hope it helps (and I hope my math is right),
> Steve
>
> On Sun, Nov 14, 2010 at 2:57 PM, Erik <erik.e.wil...@gmail.com> wrote:
>
> > On Nov 14, 1:32 pm, Stephen Johnson <onepagewo...@gmail.com> wrote:
> > > Why do you say that's silly? If your map reduce task does bulk deletes and
> > > let's say they do 100 at a time, then those 100 deletes are done in
> > > parallel. So that's 100x. So for each second of delete real time you're
> > > getting 100 seconds of CPU time. You should be pleased that instead of your
> > > task taking 11 hours to delete all your data it took only 15 minutes. Isn't
> > > that scalability? Isn't that what you're looking for? How many entities did
> > > you delete? How many indexes did you have (composite and single property)?
> >
> > This was using only 1 shard per kind that was being deleted, so
> > effectively there should be no parallelism occurring, unless there is
> > something I am missing?
> > Deleted ~300k entities, each with a single indexed collection.
> >
> > > On Sun, Nov 14, 2010 at 10:29 AM, Erik <erik.e.wil...@gmail.com> wrote:
> > >
> > > > If you check in the datastore viewer you might be able to find and
> > > > delete your jobs from one of the tables. You may also need to go into
> > > > your task queues and purge the default.
> > > >
> > > > On this topic, why does deleting data have such a large difference
> > > > between actual time spent and billed time?
> > > >
> > > > For instance, I had two mapreduce shards running to delete data, which
> > > > took a combined total of 15 minutes, but I was actually charged for
> > > > 11(!) hours. I know there isn't a 1:1 correlation but a >40x
> > > > difference is a little silly!
> > > >
> > > > On Nov 14, 4:25 am, Justin <justin.worr...@gmail.com> wrote:
> > > > > I've been trying to bulk delete data from my application as described
> > > > > here
> > > > > http://code.google.com/appengine/docs/python/datastore/creatinggettin...
> > > > >
> > > > > This seems to have kicked off a series of mapreduce workers, whose
> > > > > execution is killing my CPU - approximately 5 mins later I have
> > > > > reached 100% CPU time and am locked out for the rest of the day.
> > > > >
> > > > > I figure I'll just delete by hand; create some appropriate :delete
> > > > > controllers and wait till the next day.
> > > > >
> > > > > Unfortunately the mapreduce process still seems to be running - 10
> > > > > past midnight and my CPU has reached 100% again.
> > > > >
> > > > > Is there some way to kill these processes and get back control of my
> > > > > app?
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google Groups
> > > > "Google App Engine" group.
> > > > To post to this group, send email to google-appengine@googlegroups.com.
> > > > To unsubscribe from this group, send email to
> > > > google-appengine+unsubscr...@googlegroups.com.
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/google-appengine?hl=en.