I'm in the same boat and glad to hear I'm not alone!  It's way too
expensive to delete things right now; it makes me afraid to add any
more data to GAE.  :-/

On Jan 5, 11:57 pm, Yohan <yohan.lau...@gmail.com> wrote:
> Hi,
>
> I feel your pain. It cost me a few thousand dollars to delete my
> millions of entities from the datastore after a migration job (Ikai never
> replied to my post though...) and I'm still paying since the deletion is
> not complete yet (spending $100-300 a day for the past 2 weeks
> now!!). Not doing much, just running the "delete all" mapreduce job
> from the admin panel.
>
> There is something totally wrong with the way datastore writes are
> priced, and Google should seriously do something about it before they
> lose their big customers (i.e. the ones affected by this problem).
>
> It is simply too costly to go through your data to change an index,
> update stuff, or delete your data. And in your case (like mine), even
> if you want to take your data out to externalize your custom search
> and storage, it will cost you $X,000+ to take it out and another
> $XX,000 to clean up behind you (you seem to have a lot of indexed
> properties in your dataset).
>
> Please keep me posted on how things go with you, as I'm still hoping I
> can get some credit/refund/assistance from Google at this stage,
> although I haven't heard from them.
>
> On Jan 6, 7:24 am, "Corey [Firespotter]" <co...@firespotter.com>
> wrote:
>
> > I work with Petey on this and can help clarify some of the details.
>
> > The Entities:
> > We have a lot of entities (~14 million), each of which has a
> > StringListProperty called "geoboxes".  Like so:
> >     class Place(search.SearchableModel):
> >       name = db.StringProperty()
> >       ...
> >       # Location specific fields.
> >       coordinates = db.GeoPtProperty(default=None)
> >       geohash = db.StringProperty()
> >       geoboxes = db.StringListProperty()
>
> > Background (details on geoboxing at bottom):
> > We're running a mapreduce to change the geobox sizes/precision for a
> > large number of entities.  These entities currently have a 'geoboxes'
> > StringListProperty with ~20 strings.  For example:
> > geoboxes = [u'37.341|-121.894|37.339|-121.892', u'37.341|-121.892|
> > 37.339|-121.891', ...]
> > We are changing those 20 strings to 20 new strings.  Example:
> > geoboxes = [u'37.3411|-121.8940|37.3395|-121.8926',
> > u'37.3411|-121.8929|37.3395|-121.8916', ...]
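[Editor's sketch] The geobox strings above look like "north lat | west lng | south lat | east lng" at a fixed box size. A minimal illustrative regeneration of such a string at a given precision might look like this (the snapping scheme and parameter names are assumptions, not Firespotter's actual code):

```python
def compute_geobox(lat, lng, resolution, digits=3):
    """Snap a point to its containing box and format a geobox string.

    resolution is the box edge length in degrees; digits is the number
    of decimal places in the key. Both are illustrative assumptions.
    """
    # Snap down to the box's south-west corner.
    lat_lo = (lat // resolution) * resolution
    lng_lo = (lng // resolution) * resolution
    lat_hi = lat_lo + resolution
    lng_hi = lng_lo + resolution
    fmt = "%%.%df" % digits
    # Order matches the examples above: N lat | W lng | S lat | E lng.
    return "|".join(fmt % v for v in (lat_hi, lng_lo, lat_lo, lng_hi))
```

Changing the resolution (and digits) changes every key, which is why every string in the list has to be rewritten.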
>
> > The Cost:
> > We ran almost this same mapreduce when we first added the geoboxes
> > back in July.  In that case we were populating the list for the first
> > time, so we can assume half as many operations were required (no
> > removal of old values).  Total cost in July was ~$160 for the CPU
> > time.
>
> > When we ran the mapreduce again this week to change the box sizes, the
> > cost was $18 for Frontend Instance Hours, $15 for Datastore Reads
> > (21 million) and $2,500 for Datastore Writes (2.5 billion).  This was
> > not a complete run of the mapreduce.  We aborted it after 5.4 million
> > (38%) of the entities were updated.  Hence Petey's estimate that the
> > full update would cost $6,500.
>
> > The Operations:
> > Each entity update removes ~20 existing strings from the geoboxes
> > StringList and adds 20 more.  The geobox property is indexed (and
> > has to be) and is involved in 3 composite indexes, so as best I
> > understand it, each string change results in 10 writes (4 +
> > 2 * 3).  So for every entity whose geoboxes we update, we perform 401
> > write operations (1 + 10 * 40).
>
> > This agrees pretty well with the charges: 2,500,000,000 ops /
> > 5,424,000 entities = ~460 ops per entity.
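[Editor's sketch] The write-count arithmetic above can be checked directly. The per-value formula follows the billing model as Corey describes it: 4 single-property index writes per changed value, plus 2 per composite index the property appears in.

```python
# Writes per changed list value: 4 single-property index writes
# (delete asc/desc, insert asc/desc) plus 2 per composite index.
composite_indexes = 3
writes_per_value = 4 + 2 * composite_indexes         # = 10

# Each entity swaps ~20 old geobox strings for ~20 new ones (40 changed
# values), plus 1 write for the entity itself.
changed_values = 40
writes_per_entity = 1 + writes_per_value * changed_values  # = 401

# What the bill actually implied:
implied_ops_per_entity = 2500000000 // 5424000       # ~460
```

The estimate (401) and the billed figure (~460) are close enough that indexed list-value churn plausibly explains almost all of the cost.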
>
> > That's a lot of writes and likely the core of the surprising cost.
> > However, I'm not sure how we could avoid that with App Engine (open to
> > ideas!), and since we could pay for dedicated servers for that amount,
> > I think the pricing is probably off as well.
>
> > Even if we treat the geobox update as a one-time cost, we have other
> > properties like scores, labels, etc that require occasional tweaking.
> > Updating even a single indexed property across all these entities
> > costs us $60-$100 and typically many times that in practice because
> > these interesting fields tend to be used in composite indexes.
>
> > -Corey
>
> > Geoboxing Details
> > Geoboxing is a technique used to search for entities near a point on
> > the earth in a database that can only perform equality queries (like
> > App Engine).  In short, you break up the world into boxes and record
> > which box each entity belongs to as well as any nearby boxes.  Then
> > you break up the world into larger boxes and repeat until you have a
> > good range of sizes covered.
> > There's a good article on the logic of the algorithm
> > here: http://code.google.com/appengine/articles/geosearch.html
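[Editor's sketch] The boxes-at-several-resolutions idea above can be illustrated in a few lines. The resolutions and key format here are made up for the example; the linked article uses its own parameters (and also records neighbouring boxes, omitted here):

```python
def geoboxes_for_point(lat, lng, resolutions):
    """Return one box key per resolution, for equality-only queries.

    Each key names the south-west corner of the box containing the
    point at that resolution. To find nearby entities, compute the same
    key for the query point and filter on geoboxes == key.
    """
    keys = []
    for res in resolutions:
        lat_lo = (lat // res) * res   # snap to box corner
        lng_lo = (lng // res) * res
        keys.append("%.4f|%.4f@%s" % (lat_lo, lng_lo, res))
    return keys
```

Storing ~20 such keys per entity in an indexed StringListProperty is exactly what makes the writes so expensive when the resolutions change.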
>
> > On Jan 5, 11:58 am, "Ikai Lan (Google)" <ika...@google.com> wrote:
>
> > > Brian (apologies if that is not your name),
>
> > > How much of the costs are instance hours versus datastore writes? There's
> > > probably something going on here. The largest costs are to update indexes,
> > > not entities. Assuming $6500 is the cost of datastore writes alone, that
> > > breaks down to:
>
> > > ~$0.00046 per entity put
>
> > > Pricing is $0.10 per 100k operations, so that means using this equation:
>
> > > (6500.00 / 14000000) / (0.10 / 100000)
>
> > > You're doing about 464 write operations per put, which roughly translates
> > > to 6.5 billion writes.
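[Editor's sketch] Ikai's back-of-envelope calculation, spelled out (assuming, as he does, that the whole $6,500 is datastore writes):

```python
total_cost = 6500.00           # dollars, datastore writes only (assumed)
entities = 14000000
price_per_op = 0.10 / 100000   # $0.10 per 100k operations

ops_per_put = (total_cost / entities) / price_per_op
total_ops = ops_per_put * entities   # ~6.5 billion write operations
```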
>
> > > I'm trying to extrapolate what you are doing, and it sounds like you are
> > > doing full text indexing or something similar ... and having to update all
> > > the indexes. When you update a property, it takes a certain amount of
> > > writes. Assuming you are changing String properties, each property you
> > > update takes this many writes:
>
> > > - 2 index entries deleted (ascending and descending, for the old value)
> > > - 2 index entries written (ascending and descending, for the new value)
>
> > > So if those writes were only going to single-property indexes, that
> > > would mean you are updating on the order of 100 list values per
> > > entity (464 / 4 ≈ 116).
>
> > > Given that this is a regular thing you need to do, perhaps there is an
> > > engineering solution for what you are trying to do that will be more cost
> > > effective. Can you describe why you're running this job? What features
> > > does this support in your product?
>
> > > --
> > > Ikai Lan
> > > Developer Programs Engineer, Google App Engine
> > > plus.ikailan.com | twitter.com/ikai
>
> > > On Thu, Jan 5, 2012 at 10:08 AM, Petey <brianpeter...@gmail.com> wrote:
> > > > In this one case we had to change all of the items in the
> > > > listproperty. In our most common case we might have to add and delete
> > > > a couple of items in the list property every once in a while. That
> > > > would still cost us well over $1,000 each time.
>
> > > > Most of the reasons for this type of data in our product is to
> > > > compensate for the fact that there isn't full text search yet. I know
> > > > they are beta testing full text, but I'm still worried that that also
> > > > might be too expensive per write.
>
> > > > On Jan 5, 6:54 am, Richard Watson <richard.wat...@gmail.com> wrote:
> > > > > A couple thoughts.
>
> > > > > Maybe the GAE team should borrow the idea of spot prices from Amazon.
> > > > > That's a great way to have lower-priority jobs that can run when
> > > > > there are instances available. We set the price we're willing to
> > > > > pay; if the spot cost drops below that, we get the resources. It
> > > > > creates a market where more urgent jobs get done sooner and Google
> > > > > makes better use of quiet periods.
>
> > > > > On your issue:
> > > > > Do you need to update every entity when you do this? How many items
> > > > > on the listproperty need to be changed? Could you tell us a bit more
> > > > > of what the data looks like?
>
> > > > > I'm thinking that 14 million entities x 18 items each is the amount
> > > > > of entries you really have, each distributed across at least 3
> > > > > servers and then indexed. That seems like a lot of writes if you're
> > > > > re-writing everything.  It's likely a bad idea to rely on an
> > > > > infrastructure change to fix this (recurring) issue, but there is
> > > > > hopefully a way to reduce the amount of writes you have to do.
>
> > > > > Also, could you maybe run your mapreduce on smaller sets of the
> > > > > data to spread it out over multiple days and avoid adding too many
> > > > > instances?  Has anyone done anything like this?
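[Editor's sketch] One way to do what Richard suggests is to process a bounded slice per day and persist a cursor between runs, instead of one monolithic mapreduce. A self-contained sketch (the list and integer cursor stand in for a datastore query and query cursor; names are hypothetical):

```python
processed = []

def update(entity):
    # Stand-in for re-putting the entity with its new geoboxes.
    processed.append(entity)

def run_daily_slice(data, cursor, limit):
    """Process up to `limit` entities starting at `cursor`.

    Returns the new cursor to persist for the next day's run, so the
    total write volume (and instance count) per day stays capped.
    """
    batch = data[cursor:cursor + limit]
    for entity in batch:
        update(entity)
    return cursor + len(batch)
```

This spreads the cost over time but doesn't reduce the total number of write operations; only fewer indexed values per entity would do that.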
>
> > > > --
> > > > You received this message because you are subscribed to the Google 
> > > > Groups
> > > > "Google App Engine" group.
> > > > To post to this group, send email to google-appengine@googlegroups.com.
> > > > To unsubscribe from this group, send email to
> > > > google-appengine+unsubscr...@googlegroups.com.
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/google-appengine?hl=en.
