Thank you for sharing your numbers with us. I think it's a good way for all
of us to get an idea of how much things cost on the cloud, so here are my
thoughts.

Even though you had only one shard executing, the shard should be doing batch
deletes and not one delete at a time. According to the documentation, a batch
delete can handle up to 500 entities in one call, and those deletes execute in
parallel (perhaps not all 500 at once, but with parallelism nonetheless). I
would assume the shard does about 100 or so at a time (maybe more, maybe less).
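
To make the batching idea concrete, here is a minimal sketch of splitting a list of keys into batches of at most 500. The `chunked` helper and the integer stand-in keys are my own illustration, not what the mapreduce library actually does; in a real handler each batch would be passed to a single datastore delete call.

```python
def chunked(keys, batch_size=500):
    """Yield successive lists of at most batch_size keys (hypothetical helper)."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


# Stand-in for 300000 datastore keys (assumption: plain integers for illustration).
keys = list(range(300000))
batches = list(chunked(keys))

# One datastore call per batch instead of one call per entity.
print("delete calls needed:", len(batches))
```

With the 500-entity limit that is 600 calls rather than 300000, which is where the parallelism inside each call would come from.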

Anyway, a good way to show that some parallelism must be occurring is a
proof by contradiction. So, let's assume that the shard is in fact doing one
delete at a time. Looking at the System Status, the latency of a single
delete on an entity (probably a very simple entity with no composite indexes,
which would add overhead) is approximately 50ms to 100ms. If we assume 50ms
of latency per delete, and no overhead for mapreduce/shard maintenance,
spawning additional tasks, etc. (which would add even more time), we end up
with:

    300000 entities * .05 seconds per entity = 15000 seconds
    15000 seconds / 60 seconds per minute = 250 minutes or 4 hours 10 minutes

Additionally if a delete takes approximately 100 milliseconds then 300000
entities would take 8 hours 20 minutes to complete.
Even an unrealistic 25ms per delete is still over two hours.
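
If anyone wants to check the arithmetic, here is a small sketch that recomputes the serial delete time for the three latencies above. The function name is just something I made up for illustration, and it assumes strictly one delete at a time with no other overhead.

```python
def serial_delete_time(entities, latency_ms):
    """Return (hours, minutes) to delete `entities` one at a time."""
    total_seconds = entities * latency_ms // 1000  # integer math, whole seconds
    hours, remainder = divmod(total_seconds, 3600)
    return hours, remainder // 60


for latency_ms in (25, 50, 100):
    hours, minutes = serial_delete_time(300000, latency_ms)
    print("%3dms per delete: %d hours %d minutes" % (latency_ms, hours, minutes))
```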

Now remember this is latency (real time) and not CPU time. Even if a delete
has a latency of 50ms, it could still eat up 100ms of API CPU time: for
example, 50ms to delete the entity and 50ms to update the indexes (done in
parallel). So if the latency time is 4 hours 10 minutes and we simply double
it to approximate API CPU time, we get over 8 hours of CPU time. If the
average delete time for your job was 75ms, then latency time is approximately
6 hours and CPU time 12 hours. Your total was 11 hours of billed time, so if
my logic is sound, the amount you were billed could well be correct.
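
Here is the same estimate as a quick sketch. The factor of 2 between latency and API CPU time is purely my guess from the paragraph above (entity delete plus index update), not a documented billing rule, and the function name is made up.

```python
def estimated_cpu_hours(entities, latency_ms, cpu_factor=2.0):
    """Return (latency_hours, cpu_hours) under an assumed CPU-to-latency factor."""
    latency_hours = entities * latency_ms / 1000.0 / 3600.0
    return latency_hours, latency_hours * cpu_factor


for latency_ms in (50, 75):
    latency_hours, cpu_hours = estimated_cpu_hours(300000, latency_ms)
    print("%dms: %.2f hours latency, %.2f hours CPU" % (latency_ms, latency_hours, cpu_hours))
```

The billed 11 hours falls between the 50ms and 75ms estimates, which is why the bill looks plausible to me.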

Furthermore, looking at this from another angle, if your delete job took
15 minutes to complete then:

    300000 entities / 15 minutes = 20000 entities per minute
    20000 entities per minute / 60 seconds per minute = 333.33 entities per second

So, if 333.33 entities were being deleted per second serially, the average
latency would be 3ms per delete, which seems rather unlikely.
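
Turning that around, here is a rough sketch of how many deletes would have to be in flight at once to finish in 15 minutes. The 50ms per-delete latency is my assumption from the System Status numbers, and the variable names are only for illustration.

```python
entities = 300000
job_seconds = 15 * 60  # the reported 15 minute wall-clock time

# Latency each delete would need if the job ran strictly one at a time.
implied_serial_latency_ms = job_seconds * 1000.0 / entities

# Assumed real per-delete latency (taken from the System Status estimate).
assumed_latency_ms = 50.0
implied_concurrency = assumed_latency_ms / implied_serial_latency_ms

print("implied serial latency: %.1f ms" % implied_serial_latency_ms)
print("implied deletes in flight at 50ms each: %.1f" % implied_concurrency)
```

So under that assumption something like 16 or 17 deletes would need to be running concurrently on average, which fits with batch deletes being used.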

My thoughts. Hope it helps (and I hope my math is right),
Steve


On Sun, Nov 14, 2010 at 2:57 PM, Erik <erik.e.wil...@gmail.com> wrote:

>
>
> On Nov 14, 1:32 pm, Stephen Johnson <onepagewo...@gmail.com> wrote:
> > Why do you say that's silly? If your map reduce task does bulk deletes and
> > let's say they do 100 at a time, then those 100 deletes are done in
> > parallel. So that's 100x. So for each second of delete real time you're
> > getting 100 seconds of CPU time.  You should be pleased that instead of your
> > task taking 11 hours to delete all your data it took only 15 minutes. Isn't
> > that scalability? Isn't that what you're looking for? How many entities did
> > you delete? How many indexes did you have (composite and single property)?
>
> This was using only 1 shard per kind that was being deleted, so
> effectively there should be no parallelism occurring, unless there is
> something I am missing?
> Deleted about ~300k entities, each with a single indexed collection.
>
> > On Sun, Nov 14, 2010 at 10:29 AM, Erik <erik.e.wil...@gmail.com> wrote:
> >
> > > If you check in the datastore viewer you might be able to find and
> > > delete your jobs from one of the tables.  You may also need to go into
> > > your task queues and purge the default.
> >
> > > On this topic, why does deleting data have such a large difference
> > > between actual time spent and billed time?
> >
> > > For instance, I had two mapreduce shards running to delete data, which
> > > took a combined total of 15 minutes, but I was actually charged for
> > > 11(!) hours.  I know there isn't a 1:1 correlation but a >40x
> > > difference is a little silly!
> >
> > > On Nov 14, 4:25 am, Justin <justin.worr...@gmail.com> wrote:
> > > > I've been trying to bulk delete data from my application as described
> > > > here
> >
> > > >
> http://code.google.com/appengine/docs/python/datastore/creatinggettin...
> >
> > > > This seems to have kicked off a series of mapreduce workers, whose
> > > > execution is killing my CPU - approximately 5 mins later I have
> > > > reached 100% CPU time and am locked out for the rest of the day.
> >
> > > > I figure I'll just delete by hand; create some appropriate :delete
> > > > controllers and wait till the next day.
> >
> > > > Unfortunately the mapreduce process still seems to be running - 10
> > > > past midnight and my CPU has reached 100% again.
> >
> > > > Is there some way to kill these processes and get back control of my
> > > > app?
> >
> > > --
> > > You received this message because you are subscribed to the Google Groups
> > > "Google App Engine" group.
> > > To post to this group, send email to google-appengine@googlegroups.com.
> > > To unsubscribe from this group, send email to
> > > google-appengine+unsubscr...@googlegroups.com.
> > > For more options, visit this group at
> > > http://groups.google.com/group/google-appengine?hl=en.
