>
> > Watching my deletion process start to get trapped in molasses, as Eli
> > Jones mentions above, I have to ask two things again:
>
> > 1. Is there ANY ANY way to delete all indexes on a given property
> > name? Without worrying about keeping indexes in order when I'm just
> > paring them down to 0, I'd just be running through key names and
> > deleting them. It seems that would be much faster. (If it's any help,
> > I strongly suspect that most of my key names are globally unique
> > across all of Google).
>
> No - that would violate the constraint that indexes are always kept in sync
> with the data they refer to.
>

It seems to me that having no index at all is the same situation as if
the property had been indexed=False from the beginning. If that's so,
deleting all of its index entries can't be violating a hard constraint.
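
Just to make concrete the end state I mean, this is what I'd declare up
front if I were starting over. A minimal sketch with the
google.appengine.ext.db API; the model and property names are
placeholders, not my real schema:

    from google.appengine.ext import db

    class MyModel(db.Model):
        # With indexed=False, the datastore never writes index rows for
        # this property - which is the same end state I'm trying to reach
        # by deleting the existing index entries after the fact.
        big_field = db.StringProperty(indexed=False)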

>
> > 2. What is the reason for the slowdown? If I understand his suggestion
> > to delete every 10th record, Eli Jones seems to suspect that it's
> > because there's some kind of resource conflict on specific sections of
> > storage, thus the solution is to attempt to spread your load across
> > machines. I don't see why that would cause a gradual slowdown. My best
> > theory is that write-then-delete leaves the index somehow a little
> > messier (for instance, maybe the index doesn't fully recover the
> > unused space because it expects you to fill it again) and that when
> > you do it on a massive scale you get massively messy and slow indexes.
> > Thus, again, I suspect this question reduces to question 1, although I
> > guess that if my theory is right a compress/garbage-collect/degunking
> > call for the indexes would be (for me) second best after a way to nuke
> > them.
>
> Deletes using the naive approach slow down because when a record is deleted
> in Bigtable, it simply inserts a 'tombstone' record indicating the original
> record is deleted - the record isn't actually removed entirely from the
> datastore until the tablet it's on does its next compaction cycle. Until
> then, every subsequent query has to skip over the tombstone records to find
> the live records.
>
> This is easy to avoid: Use cursors to delete records sequentially. That way,
> your queries won't be skipping the same tombstoned records over and over
> again - O(n) instead of O(n^2)!
>

Thanks for explaining. Can you say anything about how often the
compaction cycles run? Just an order of magnitude - hours, days, or
weeks?
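
In the meantime, here's the cursor-based delete loop I'm planning to
try, per your suggestion. Just a sketch against the
google.appengine.ext.db API; MyModel and the batch size are
placeholders, and in practice I'd drive it from a task queue so no
single request runs too long:

    from google.appengine.ext import db

    class MyModel(db.Model):
        pass  # placeholder for my real kind

    def delete_in_batches(batch_size=500):
        # Walk the kind with a query cursor so each batch resumes where
        # the last one stopped, instead of re-scanning the same
        # tombstoned rows from the beginning every time.
        cursor = None
        while True:
            query = MyModel.all(keys_only=True)
            if cursor:
                query.with_cursor(cursor)
            keys = query.fetch(batch_size)
            if not keys:
                break
            db.delete(keys)
            cursor = query.cursor()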

Thanks,
Jameson
