I'm in the same boat and glad to hear I'm not alone! It's way too expensive to delete things right now; it makes me afraid to add any more data to GAE. :-/
On Jan 5, 11:57 pm, Yohan <yohan.lau...@gmail.com> wrote:
> Hi,
>
> I feel your pain. It cost me a few thousand dollars to delete my
> millions of entities from the datastore after a migration job (Ikai never
> replied to my post, though...) and I'm still paying, since the deletion is
> not complete yet (spending $100-300 a day for the past 2 weeks
> now!!). Not doing much, just running the "delete all" mapreduce job
> from the admin panel.
>
> There is something seriously wrong with the way datastore writes are
> priced, and Google should do something about it before they lose their
> big customers (i.e. the ones affected by this problem).
>
> It is simply too costly to go through your data to change an index,
> update stuff, or delete your data. And in your case (like mine), even if
> you want to take your data out to externalize your custom search and
> storage, it will cost you $X,000+ to take it out and another $XX,000 to
> clean up behind you (you seem to have a lot of indexed properties in
> your dataset).
>
> Please keep me posted on how things go with you, as I'm still hoping I
> can get some credit/refund/assistance from Google at this stage,
> although I haven't heard from them.
>
> On Jan 6, 7:24 am, "Corey [Firespotter]" <co...@firespotter.com>
> wrote:
> > I work with Petey on this and can help clarify some of the details.
> >
> > The Entities:
> > We have a lot of entities (~14 million), each of which has a
> > StringListProperty called "geoboxes". Like so:
> >
> >     class Place(search.SearchableModel):
> >         name = db.StringProperty()
> >         ...
> >         # Location-specific fields.
> >         coordinates = db.GeoPtProperty(default=None)
> >         geohash = db.StringProperty()
> >         geoboxes = db.StringListProperty()
> >
> > Background (details on geoboxing at the bottom):
> > We're running a mapreduce to change the geobox sizes/precision for a
> > large number of entities. These entities currently have a 'geoboxes'
> > StringListProperty with ~20 strings.
> > For example:
> >
> >     geoboxes = [u'37.341|-121.894|37.339|-121.892',
> >                 u'37.341|-121.892|37.339|-121.891', ...]
> >
> > We are changing those 20 strings to 20 new strings. Example:
> >
> >     geoboxes = [u'37.3411|-121.8940|37.3395|-121.8926',
> >                 u'37.3411|-121.8929|37.3395|-121.8916', ...]
> >
> > The Cost:
> > We did almost this same mapreduce when we first added the geoboxes
> > back in July. In that case we were populating the list for the first
> > time, so we can assume half as many operations were required (no
> > removing of old values). Total cost in July was ~$160 for the CPU
> > time.
> >
> > When we ran the mapreduce again this week to change the box sizes, the
> > cost was $18 for Frontend Instance Hours, $15 for Datastore Reads
> > (21 million), and $2,500 for Datastore Writes (2,500 million). This was
> > not a complete run of the mapreduce. We aborted it after 5.4 million
> > (38%) of the entities were updated. Hence Petey's estimate that the
> > full update would cost $6,500.
> >
> > The Operations:
> > Each entity update removes ~20 existing strings from the geoboxes
> > StringList and adds 20 more. The geobox property is indexed (and has
> > to be) and is involved in 3 composite indexes, so as best I understand
> > it, each string change results in 10 writes (4 + 2 * 3). So on every
> > entity whose geoboxes we update, we perform 401 write operations
> > (1 + 10 * 40).
> >
> > This agrees pretty well with the charges: (2,500,000,000 ops /
> > 5,424,000 entities) = 460 ops per entity.
> >
> > That's a lot of writes, and likely the core of the surprising cost.
> > However, I'm not sure how we could avoid that with App Engine (open to
> > ideas!), and since we could pay for dedicated servers for that amount,
> > I think the pricing is probably off as well.
> >
> > Even if we treat the geobox update as a one-time cost, we have other
> > properties like scores, labels, etc. that require occasional tweaking.
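Corey's per-entity arithmetic can be sanity-checked with a few lines of Python. This is just a back-of-envelope sketch assuming his own figures (4 write ops per indexed value change, 2 extra per composite index, 3 composite indexes, 20 old + 20 new strings per entity) and the $0.10-per-100k-ops price Ikai quotes further down the thread:

```python
# Sanity check of the write-op math in Corey's post (all counts and the
# price are taken from the thread, not from official billing docs).
OPS_PER_VALUE = 4 + 2 * 3          # = 10 write ops per changed string
VALUES_CHANGED = 20 + 20           # 20 strings removed, 20 added
OPS_PER_ENTITY = 1 + OPS_PER_VALUE * VALUES_CHANGED  # +1 for the entity write
ENTITIES = 14_000_000
PRICE_PER_OP = 0.10 / 100_000      # $0.10 per 100k write ops

total_cost = OPS_PER_ENTITY * ENTITIES * PRICE_PER_OP
print(OPS_PER_ENTITY)        # 401
print(round(total_cost))     # 5614 (dollars for a full run)
```

That lands in the same ballpark as the observed 460 ops per entity and the $6,500 estimate for a complete run.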
> > Updating even a single indexed property across all these entities
> > costs us $60-$100, and typically many times that in practice, because
> > these interesting fields tend to be used in composite indexes.
> >
> > -Corey
> >
> > Geoboxing Details
> > Geoboxing is a technique used to search for entities near a point on
> > the earth in a database that can only perform equality queries (like
> > App Engine). In short, you break up the world into boxes and record
> > which box each entity belongs to, as well as any nearby boxes. Then
> > you break up the world into larger boxes and repeat until you have a
> > good range of sizes covered.
> > There's a good article on the logic of the algorithm here:
> > http://code.google.com/appengine/articles/geosearch.html
> >
> > On Jan 5, 11:58 am, "Ikai Lan (Google)" <ika...@google.com> wrote:
> > > Brian (apologies if that is not your name),
> > >
> > > How much of the costs are instance hours versus datastore writes?
> > > There's probably something going on here. The largest costs are to
> > > update indexes, not entities. Assuming $6500 is the cost of datastore
> > > writes alone, that breaks down to:
> > >
> > >     ~$0.0004 a write
> > >
> > > Pricing is $0.10 per 100k operations, so using this equation:
> > >
> > >     (6500.00 / 14000000) / (0.10 / 100000)
> > >
> > > you're doing about 464 write operations per put, which roughly
> > > translates to 6.5 billion writes.
> > >
> > > I'm trying to extrapolate what you are doing, and it sounds like you
> > > are doing full text indexing or something similar ... and having to
> > > update all the indexes. When you update a property, it takes a certain
> > > number of writes. Assuming you are changing String properties, each
> > > property you update takes this many writes:
> > >
> > > - 2 indexes deleted (ascending and descending)
> > > - 2 indexes updated (ascending and descending)
> > >
> > > So if you were only updating all the list properties, that means you
> > > are updating 100 list properties.
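The geobox strings Corey shows (and the "Geoboxing Details" he describes above) can be generated along these lines. This is a hypothetical sketch of the scheme, not his actual code: the box size and decimal precision are illustrative parameters, and the `geobox`/`geoboxes` names are made up here.

```python
import math

def geobox(lat, lng, box_size, precision):
    """Return the 'north|west|south|east' box string containing (lat, lng).

    box_size is the box edge in degrees; precision is the number of
    decimals used in the string (both illustrative, per the lead-in).
    """
    # Snap the point down to the box grid to get the south-west corner.
    south = math.floor(lat / box_size) * box_size
    west = math.floor(lng / box_size) * box_size
    fmt = '%.{0}f'.format(precision)
    return '|'.join(fmt % v for v in
                    (south + box_size, west, south, west + box_size))

def geoboxes(lat, lng, sizes, precision):
    # One string per box size: repeating with progressively larger boxes
    # covers both narrow and wide proximity queries, as described above.
    return [geobox(lat, lng, s, precision) for s in sizes]
```

A proximity query then reduces to an equality filter on the precomputed strings, e.g. `Place.all().filter('geoboxes =', geobox(lat, lng, size, prec))` for a box size matching the desired search radius; see the geosearch article linked above for the full algorithm.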
> > > Given that this is a regular thing you need to do, perhaps there is
> > > an engineering solution for what you are trying to do that will be
> > > more cost effective. Can you describe why you're running this job?
> > > What features does this support in your product?
> > >
> > > --
> > > Ikai Lan
> > > Developer Programs Engineer, Google App Engine
> > > plus.ikailan.com | twitter.com/ikai
> > >
> > > On Thu, Jan 5, 2012 at 10:08 AM, Petey <brianpeter...@gmail.com> wrote:
> > > > In this one case we had to change all of the items in the
> > > > listproperty. In our most common case we might have to add and
> > > > delete a couple of items in the list property every once in a while.
> > > > That would still cost us well over $1,000 each time.
> > > >
> > > > Most of the reason for this type of data in our product is to
> > > > compensate for the fact that there isn't full text search yet. I
> > > > know they are beta testing full text, but I'm still worried that
> > > > that also might be too expensive per write.
> > > >
> > > > On Jan 5, 6:54 am, Richard Watson <richard.wat...@gmail.com> wrote:
> > > > > A couple of thoughts.
> > > > >
> > > > > Maybe the GAE team should borrow the idea of spot prices from
> > > > > Amazon. That's a great way to have lower-priority jobs that can
> > > > > run when there are instances available. We set the price we're
> > > > > willing to pay; if the spot cost drops below that, we get the
> > > > > resources. It creates a market where more urgent jobs get done
> > > > > sooner and Google makes better use of quiet periods.
> > > > >
> > > > > On your issue:
> > > > > Do you need to update every entity when you do this? How many
> > > > > items on the listproperty need to be changed? Could you tell us a
> > > > > bit more of what the data looks like?
> > > > > I'm thinking that 14 million entities x 18 items each is the
> > > > > amount of entries you really have, each distributed across at
> > > > > least 3 servers and then indexed. That seems like a lot of writes
> > > > > if you're re-writing everything. It's likely a bad idea to rely on
> > > > > an infrastructure change to fix this (recurring) issue, but there
> > > > > is hopefully a way to reduce the amount of writes you have to do.
> > > > >
> > > > > Also, could you maybe run your mapreduce on smaller sets of the
> > > > > data to spread it out over multiple days and avoid adding too many
> > > > > instances? Has anyone done anything like this?

--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.