On Tue, Mar 23, 2010 at 1:57 PM, homunq <jameson.qu...@gmail.com> wrote:
> > > Watching my deletion process start to get trapped in molasses, as
> > > Eli Jones mentions above, I have to ask two things again:
> > >
> > > 1. Is there ANY ANY way to delete all indexes on a given property
> > > name? Without worrying about keeping indexes in order when I'm just
> > > paring them down to 0, I'd just be running through key names and
> > > deleting them. It seems that would be much faster. (If it's any
> > > help, I strongly suspect that most of my key names are globally
> > > unique across all of Google.)
> >
> > No - that would violate the constraint that indexes are always kept
> > in sync with the data they refer to.
>
> It seems to me that having no index at all is the same situation as if
> the property had been indexed=False from the beginning. If that's so,
> it can't be violating a hard constraint.

Internally, indexed fields are stored in the 'properties' list in the
Entity Protocol Buffer, while unindexed fields are stored in the
'unindexed_properties' list. The only way to change a property's
indexing is to fetch the entities and store them again (rough sketches
of this and of the cursor-based delete follow at the end of this
message).

> > > 2. What is the reason for the slowdown? If I understand his
> > > suggestion to delete every 10th record, Eli Jones seems to suspect
> > > that it's because there's some kind of resource conflict on
> > > specific sections of storage, so the solution is to spread your
> > > load across machines. I don't see why that would cause a gradual
> > > slowdown. My best theory is that write-then-delete leaves the
> > > index somehow a little messier (for instance, maybe the index
> > > doesn't fully recover the unused space because it expects you to
> > > fill it again), and that doing it on a massive scale produces
> > > massively messy and slow indexes. Thus, again, I suspect this
> > > question reduces to question 1, although if my theory is right, a
> > > compress/garbage-collect/degunk call for the indexes would be
> > > (for me) second best after a way to nuke them.
> >
> > Deletes using the naive approach slow down because when a record is
> > deleted in Bigtable, it simply inserts a 'tombstone' record
> > indicating the original record is deleted - the record isn't
> > actually removed from the datastore until the tablet it's on does
> > its next compaction cycle. Until then, every subsequent query has
> > to skip over the tombstone records to find the live records.
> >
> > This is easy to avoid: use cursors to delete records sequentially.
> > That way, your queries won't be skipping the same tombstoned
> > records over and over again - O(n) instead of O(n^2)!
>
> Thanks for explaining. Can you say anything about how often the
> compaction cycles are? Just an order of magnitude - hours, days, or
> weeks?

They're based on the quantity of modifications to the data in a given
tablet. Doing many inserts, updates or deletes will, sooner or later,
cause a compaction.

-Nick Johnson

> Thanks,
> Jameson
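For concreteness, a minimal sketch of the fetch-and-store approach
described above, using the Python db API. The model and property names
(MyModel, tag) are hypothetical:

from google.appengine.ext import db

class MyModel(db.Model):
    # This was db.StringProperty() before - indexed=True is the
    # default. With indexed=False, the next put() of each entity moves
    # the value from the 'properties' list to the
    # 'unindexed_properties' list of its protocol buffer and drops the
    # property's index rows.
    tag = db.StringProperty(indexed=False)

def deindex_all(batch_size=100):
    # Walk the entities with a cursor, re-putting each batch unchanged;
    # the writes themselves perform the de-indexing.
    query = MyModel.all()
    while True:
        batch = query.fetch(batch_size)
        if not batch:
            break
        db.put(batch)
        query.with_cursor(query.cursor())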
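And a sketch of the cursor-based delete, under the same assumptions.
The point is that each query resumes where the previous batch ended,
so it never re-scans the tombstones left by earlier batches:

from google.appengine.ext import db

def delete_all(model_class, batch_size=500):
    # keys_only avoids fetching entity data we're about to delete.
    query = model_class.all(keys_only=True)
    while True:
        keys = query.fetch(batch_size)
        if not keys:
            break
        db.delete(keys)
        # Resume the next fetch after the last deleted key rather than
        # re-scanning the tombstoned head of the table - this is what
        # turns the naive O(n^2) loop into O(n).
        query.with_cursor(query.cursor())

Either loop would normally be driven from the task queue or remote_api
in small batches, since a single request won't run long enough to
cover a large dataset.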
--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
Number: 368047