The first question I would ask is: Is this really a problem? Have you looked at your bill and decided for certain that the savings would be worth changing your code and possibly cluttering your business logic?

That said, I would be tempted to see what optimization you can make with memcache. It sounds like your write load is light and your read load is heavy, so there's probably a lot of opportunity here without fundamentally changing your data architecture.
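For example, a read-through cache in front of the Day reads might look like the sketch below. I'm guessing at your setup here: the get_days() helper, the Day property, and the ISO-date-as-key_name scheme are all assumptions, so adjust to whatever you actually do.

import datetime

from google.appengine.api import memcache
from google.appengine.ext import db

class Day(db.Model):
    # Stand-in for the real model (6 floats, 2 bools, 1 int, 1 string,
    # all unindexed); the property name is invented.
    value = db.FloatProperty(indexed=False)

def get_days(start, count):
    # Read-through cache: try memcache first, then fall back to a
    # single batch get. Assumes Day entities use the ISO date string
    # as their key_name.
    cache_key = 'days:%s:%d' % (start.isoformat(), count)
    days = memcache.get(cache_key)
    if days is None:
        keys = [db.Key.from_path(
                    'Day', (start + datetime.timedelta(days=i)).isoformat())
                for i in range(count)]
        days = db.get(keys)  # one RPC, though still `count` entity reads
        memcache.set(cache_key, days, time=3600)  # expire after an hour
    return days

On the POST side, the simplest approach is a short expiry like the one above, or a version number in memcache that you bump on every write and fold into the cache key. With 80% of your traffic being GETs, most requests would then never touch the datastore at all.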
Jeff

On Thu, Sep 15, 2011 at 11:18 PM, Steve <unetright.thebas...@xoxy.net> wrote:
> I'm looking for some opinions on to what degree I should aggregate my
> now-small entities together into larger entities. Presently I have
> 50,000 "Day" entities, where each entity represents a day. They are
> relatively small, with 6 float, 2 bool, 1 int, and 1 string property.
> No property indexes. Datastore statistics says they average 161 bytes
> of data and 80 bytes of metadata (241b total).
>
> 80% of my user requests are GETs, in which I read:
> 70% of the time, 10 Day entities
> 25% of the time, 30 Day entities
> 05% of the time, 365 Day entities
>
> 20% of my user requests are POSTs, in which I read & write:
> 75% of the time, 1 Day entity
> 15% of the time, 7 Day entities
> 10% of the time, ~15 Day entities
>
> Since the new pricing is going to charge me per entity read and per
> entity write (and thankfully no property indexes here), I think I
> should look at reducing how many reads and writes are involved. I
> could very easily chunk these individual Day entities into groups of
> 10, or groups of 30. That would (by my rough guess on metadata
> savings) put the entity size around 2k or 6k respectively.
>
> I am wondering where the line is between retrieving fewer entities
> and each entity becoming too big because of the overhead of unwanted
> days. With a 10-day chunk, my most frequent GET request would usually
> need 2 entities (4k), where only half the data was in the needed
> range. With a 30-day chunk, usually 1 entity would suffice (6k), but
> 4k of that would be unwanted overhead.
>
> I'm having a hard time forming an internal model for what the impact
> of serializing & deserializing the overhead days would be. I wish
> appstats weren't just for RPCs. I'm guessing the extra time to
> transfer the larger entities to/from the datastore is relatively
> minimal with Google's network infrastructure. But now that CPU is
> throttled down to 600MHz, I don't know what kind of latencies I'd be
> adding with serialization.
>
> Right now my most common POST operation is to put 1 entity of 241b.
> With a 30-day chunk, that would still be a single entity put, but 6k
> in size.
>
> Any opinions, ideas, gut feelings, etc?
>
> Cheers,
> Steve
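On the chunking question above: if the bill does justify it, a 30-day bucket can be modeled as one entity holding a serialized list of day records. This is only a sketch under assumptions of my own. DayChunk, the month key_name, and the JSON encoding are illustrative, not a recommendation:

import json  # on the Python 2.5 runtime: from django.utils import simplejson as json

from google.appengine.ext import db

class DayChunk(db.Model):
    # Hypothetical 30-day bucket keyed by month (e.g. '2011-09'), so
    # the common 10- and 30-day GETs cost one or two entity reads.
    days_json = db.TextProperty()  # unindexed blob of 30 day records

def put_day(month_key, day_index, day_record):
    # Read-modify-write of one day inside its chunk: one entity read
    # plus one entity write per POST, in exchange for far fewer entity
    # reads per GET. `day_record` is assumed to be JSON-serializable.
    def txn():
        chunk = DayChunk.get_by_key_name(month_key)
        if chunk is None:
            chunk = DayChunk(key_name=month_key,
                             days_json=json.dumps([None] * 30))
        days = json.loads(chunk.days_json)
        days[day_index] = day_record
        chunk.days_json = json.dumps(days)
        chunk.put()
    db.run_in_transaction(txn)

As for the serialization worry: encoding or decoding ~6k of JSON should be on the order of a millisecond even on a throttled instance, so I'd expect it to be lost in the noise next to the datastore RPC itself. The real cost of chunking is on the write path, where every POST becomes a read-modify-write of the whole bucket.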