Hi,

On Tue, Mar 23, 2010 at 10:25 AM, homunq <jameson.qu...@gmail.com> wrote:

>
>
> On Mar 22, 3:48 pm, "Nick Johnson (Google)" <nick.john...@google.com>
> wrote:
> > On Mon, Mar 22, 2010 at 8:45 PM, homunq <jameson.qu...@gmail.com> wrote:
> > > OK, after hashing it out on IRC, I see that I have to erase my data
> > > and start again.
> >
> > Why is that? Wouldn't updating the data be a better option?
>
> Because everything about it is wrong for saving space - the key names,
> the field names, the indexes, and even in one case the fact of
> breaking a string out into a list (something I did for better
> searching in several cases, one of which is not worth it now that I
> realize 10X is easy to hit).
>
> And because the data import runs smoothly, and I have code for that
> already.
>
> ....
>
> Watching my deletion process start to get trapped in molasses, as Eli
> Jones mentions above, I have to ask two things again:
>
> 1. Is there ANY ANY way to delete all indexes on a given property
> name? Without worrying about keeping indexes in order when I'm just
> paring them down to 0, I'd just be running through key names and
> deleting them. It seems that would be much faster. (If it's any help,
> I strongly suspect that most of my key names are globally unique
> across all of Google).
>

No - that would violate the constraint that indexes are always kept in sync
with the data they refer to.


>
> 2. What is the reason for the slowdown? If I understand his suggestion
> to delete every 10th record, Eli Jones seems to suspect that it's
> because there's some kind of resource conflict on specific sections of
> storage, and thus the solution is to spread your load across
> machines. I don't see why that would cause a gradual slowdown. My best
> theory is that write-then-delete leaves the index somehow a little
> messier (for instance, maybe the index doesn't fully recover the
> unused space because it expects you to fill it again) and that when
> you do it on a massive scale you get massively messy and slow indexes.
> Thus, again, I suspect this question reduces to question 1, although I
> guess that if my theory is right, a compress/garbage-collect/degunking
> call for the indexes would be (for me) second best after a way to nuke
> them.
>

Deletes using the naive approach slow down because when a record is deleted
in Bigtable, a 'tombstone' record is simply inserted to mark the original
record as deleted - the record isn't actually removed from the datastore
until the tablet it's on does its next compaction cycle. Until then, every
subsequent query has to skip over the tombstone records to find the live
records.

This is easy to avoid: Use cursors to delete records sequentially. That way,
your queries won't be skipping the same tombstoned records over and over
again - O(n) instead of O(n^2)!
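For example, here's a rough sketch of a cursor-based delete loop using the
Python db API (the model name and batch size are placeholders - adapt them to
your own kinds, and drive successive batches from the task queue):

    from google.appengine.ext import db

    class MyModel(db.Model):
        pass  # placeholder - substitute your own model

    def delete_batch(cursor=None):
        # Keys-only query: we only need keys to delete, not full entities.
        q = MyModel.all(keys_only=True)
        if cursor:
            # Resume where the last batch ended, so we never re-scan the
            # tombstones left behind by records we already deleted.
            q.with_cursor(cursor)
        keys = q.fetch(500)
        if not keys:
            return None  # nothing left to delete
        db.delete(keys)
        # Hand back the cursor so the next call (e.g. the next task queue
        # task) starts just past the last deleted record.
        return q.cursor()

Each call deletes one batch and returns a cursor you can pass to the next
call; when it returns None, you're done.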

-Nick Johnson




-- 
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
368047

