Corey,

Did you guys consider something along the lines of SimpleGeo to
outsource your spatial stuff?

Is there a political or philosophical reason to keep everything inside
of GAE?

-- George



On Jan 5, 3:24 pm, "Corey [Firespotter]" <co...@firespotter.com>
wrote:
> I work with Petey on this and can help clarify some of the details.
>
> The Entities:
> We have a lot of entities (~14 million), each of which has a
> StringListProperty called "geoboxes".  Like so:
>     from google.appengine.ext import db
>     from google.appengine.ext import search
>
>     class Place(search.SearchableModel):
>       name = db.StringProperty()
>       ...
>       # Location specific fields.
>       coordinates = db.GeoPtProperty(default=None)
>       geohash = db.StringProperty()
>       geoboxes = db.StringListProperty()
>
> Background (details on geoboxing at bottom):
> We're running a mapreduce to change the geobox sizes/precision for a
> large number of entities.  These entities currently have a 'geoboxes'
> StringListProperty with ~20 strings.  For example:
> geoboxes = [u'37.341|-121.894|37.339|-121.892',
> u'37.341|-121.892|37.339|-121.891', ...]
> We are changing those 20 strings to 20 new strings.  Example:
> geoboxes = [u'37.3411|-121.8940|37.3395|-121.8926',
> u'37.3411|-121.8929|37.3395|-121.8916', ...]
>
> The Cost:
> We did almost this same mapreduce when we first added the geoboxes
> back in July.  In that case we were populating the list for the first
> time so we can assume half as many operations were required (no
> removing of old values).  Total cost in July was ~$160 for the CPU
> time.
>
> When we ran the mapreduce again this week to change the box sizes the
> cost was $18 for Frontend Instance Hours, $15 for Datastore Reads
> (21 million) and $2,500 for Datastore Writes (2.5 billion).  This was
> not a complete run of the mapreduce.  We aborted it after 5.4 million
> (38%) of the entities were updated.  Hence Petey's estimate that the
> full update would cost $6,500.
>
> The Operations:
> Each entity update removes ~20 existing strings from the geoboxes
> StringList and adds 20 more.  The geoboxes property is indexed (and
> has to be) and is involved in 3 composite indexes, so as best I
> understand it each string change results in 10 writes (4 + 2 * 3).
> So every entity whose geoboxes we update costs 401 write operations
> (1 + 10 * 40).
>
> This agrees pretty well with the charges: 2,500,000,000 ops /
> 5,424,000 entities = ~461 ops per entity.
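
A quick way to sanity-check this arithmetic is sketched below. It only uses figures quoted in this thread (the 4 + 2 * 3 writes per changed string, the $0.10 per 100k write operations, the 14 million entities); the function names are made up for illustration, and the per-value write count is Corey's estimate rather than a documented formula.

```python
# Back-of-the-envelope check of the write counts and charges quoted above.
# All figures come from this thread; the function names are illustrative only.

PRICE_PER_100K_WRITES = 0.10  # USD, the datastore write price quoted in this thread


def writes_per_value(composite_indexes):
    # Estimate used in the thread: 4 writes for the property's own index
    # entries, plus 2 for each composite index the property takes part in.
    return 4 + 2 * composite_indexes


def writes_per_entity(values_changed, composite_indexes):
    # 1 write for the entity itself, plus the index writes per changed value.
    return 1 + writes_per_value(composite_indexes) * values_changed


def write_cost_usd(write_ops):
    # Convert a number of write operations into dollars at the quoted price.
    return write_ops / 100000.0 * PRICE_PER_100K_WRITES


# 20 strings removed + 20 strings added, 3 composite indexes.
estimated = writes_per_entity(values_changed=40, composite_indexes=3)

# Billed write ops divided by entities updated before the run was aborted.
observed = 2500000000 / 5424000.0

print("estimated writes per entity: %d" % estimated)
print("observed writes per entity:  %.0f" % observed)
print("projected cost for 14M entities: $%.0f"
      % write_cost_usd(observed * 14000000))
```

The gap between the estimated and observed per-entity counts is presumably down to other indexed properties or rounding in the billing figures; the thread does not say.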
>
> That's a lot of writes and likely the core of the surprising cost.
> However, I'm not sure how we could avoid that with App Engine (open to
> ideas!), and since we could pay for dedicated servers for that amount,
> I think the pricing is probably off as well.
>
> Even if we treat the geobox update as a one-time cost, we have other
> properties like scores, labels, etc that require occasional tweaking.
> Updating even a single indexed property across all these entities
> costs us $60-$100 and typically many times that in practice because
> these interesting fields tend to be used in composite indexes.
>
> -Corey
>
> Geoboxing Details
> Geoboxing is a technique used to search for entities near a point on
> the earth in a database that can only perform equality queries (like
> App Engine).  In short, you break up the world into boxes and record
> which box each entity belongs to as well as any nearby boxes.  Then
> you break up the world into larger boxes and repeat until you have a
> good range of sizes covered.
> There's a good article on the logic of the algorithm here:
> http://code.google.com/appengine/articles/geosearch.html
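
For anyone following along, here is a minimal sketch of the geoboxing idea described above. It is not the code from the linked article or from Corey's app: the function names, the `resolution`/`slice_size` parameters and the 3x3 neighbourhood are assumptions made for illustration, and it ignores edge cases at the poles and the antimeridian. The output uses the same 'north|west|south|east' string layout as the geoboxes examples earlier in the thread.

```python
from decimal import Decimal, ROUND_FLOOR


def geobox(lat, lon, resolution, slice_size):
    """Return 'north|west|south|east' for the box containing (lat, lon).

    resolution: number of decimal places kept in the box edges.
    slice_size: box edge length, in units of 10**-resolution degrees.
    (Both parameters are hypothetical names for this sketch.)
    """
    step = Decimal(slice_size).scaleb(-resolution)

    def floor_to_step(value):
        # Snap a coordinate down to the nearest multiple of `step`.
        quotient = Decimal(str(value)) / step
        return quotient.to_integral_value(rounding=ROUND_FLOOR) * step

    south = floor_to_step(lat)
    west = floor_to_step(lon)
    north = south + step
    east = west + step
    return "|".join("{:.{p}f}".format(v, p=resolution)
                    for v in (north, west, south, east))


def nearby_geoboxes(lat, lon, resolution, slice_size):
    """Return the containing box plus its 8 neighbours at one box size."""
    step = Decimal(slice_size).scaleb(-resolution)
    boxes = []
    for dlat in (-1, 0, 1):
        for dlon in (-1, 0, 1):
            boxes.append(geobox(Decimal(str(lat)) + dlat * step,
                                Decimal(str(lon)) + dlon * step,
                                resolution, slice_size))
    return boxes


# Boxes to store on an entity: repeat for several box sizes, fine to coarse.
# The (resolution, slice_size) pairs here are arbitrary examples.
stored = []
for resolution, slice_size in [(3, 2), (2, 1), (1, 5)]:
    stored.extend(nearby_geoboxes(37.3402, -121.8931, resolution, slice_size))
print(len(stored), stored[:2])
```

A search near a point then computes the single box containing that point at the chosen size and runs an equality filter on the stored list, which is why every string in the list has to be indexed.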
>
> On Jan 5, 11:58 am, "Ikai Lan (Google)" <ika...@google.com> wrote:
>
> > Brian (apologies if that is not your name),
>
> > How much of the costs are instance hours versus datastore writes? There's
> > probably something going on here. The largest costs are to update indexes,
> > not entities. Assuming $6500 is the cost of datastore writes alone, that
> > breaks down to:
>
> > ~$0.00046 per entity put
>
> > Pricing is $0.10 per 100k operations, so that means using this equation:
>
> > (6500.00 / 14000000) / (0.10 / 100000)
>
> > You're doing about 464 write operations per put, which roughly translates
> > to 6.5 billion writes.
>
> > I'm trying to extrapolate what you are doing, and it sounds like you are
> > doing full text indexing or something similar ... and having to update all
> > the indexes. When you update a property, it takes a certain amount of
> > writes. Assuming you are changing String properties, each property you
> > update takes this many writes:
>
> > - 2 indexes deleted (ascending and descending)
> > - 2 indexes updated (ascending and descending)
>
> > So if you were only updating all the list properties, that means you are
> > updating 100 list properties.
>
> > Given that this is a regular thing you need to do, perhaps there is an
> > engineering solution for what you are trying to do that will be more cost
> > effective. Can you describe why you're running this job? What features does
> > this support in your product?
>
> > --
> > Ikai Lan
> > Developer Programs Engineer, Google App Engine
> > plus.ikailan.com | twitter.com/ikai
>
> > On Thu, Jan 5, 2012 at 10:08 AM, Petey <brianpeter...@gmail.com> wrote:
> > > In this one case we had to change all of the items in the
> > > listproperty. In our most common case we might have to add and delete
> > > a couple items to the list property every once in a while. That would
> > > still cost us well over $1,000 each time.
>
> > > The main reason for this type of data in our product is to
> > > compensate for the fact that there isn't full text search yet. I know
> > > they are beta testing full text, but I'm still worried that that also
> > > might be too expensive per write.
>
> > > On Jan 5, 6:54 am, Richard Watson <richard.wat...@gmail.com> wrote:
> > > > A couple thoughts.
>
> > > > Maybe the GAE team should borrow the idea of spot prices from Amazon.
> > > > That's a great way to have lower-priority jobs that can run when there
> > > > are instances available. We set the price we're willing to pay, if the
> > > > spot cost drops below that, we get the resources. It creates a market
> > > > where more urgent jobs get done sooner and Google makes better use of
> > > > quiet periods.
>
> > > > On your issue:
> > > > Do you need to update every entity when you do this? How many items on
> > > > the listproperty need to be changed? Could you tell us a bit more of
> > > > what the data looks like?
>
> > > > I'm thinking that 14 million entities x 18 items each is the amount of
> > > > entries you really have, each distributed across at least 3 servers and
> > > > then indexed. That seems like a lot of writes if you're re-writing
> > > > everything.  It's likely a bad idea to rely on an infrastructure change
> > > > to fix this (recurring) issue, but there is hopefully a way to reduce
> > > > the amount of writes you have to do.
>
> > > > Also, could you maybe run your mapreduce on smaller sets of the data to
> > > > spread it out over multiple days and avoid adding too many instances?
> > > > Has anyone done anything like this?
>
> > > --
> > > You received this message because you are subscribed to the Google Groups
> > > "Google App Engine" group.
> > > To post to this group, send email to google-appengine@googlegroups.com.
> > > To unsubscribe from this group, send email to
> > > google-appengine+unsubscr...@googlegroups.com.
> > > For more options, visit this group at
> > > > http://groups.google.com/group/google-appengine?hl=en.
