Re: [google-appengine] Re: Deleting Data Really Expensive!
Changes come up often, so I integrated the migration into the daily processing: when an entity is requested, the pending changes are applied at the same time, and if the entity still lives in the other datastore it is fetched first and then saved (changed) in the new one.

2011/3/11 Jeff Knox lairdk...@gmail.com:
If you want to keep costs down but not leave the data spinning around and around why not delete a subset every day? It may take an extremely long time but at least it would be better than paying storage costs forever.

--
gr, Wim den Ouden
Custom Google App Engine http://code.google.com/intl/nl/appengine/ based webapps https://neighborshare.appspot.com/.
Free open source neighborshare framework http://code.google.com/p/relat/.
Gae tips http://code.google.com/p/relat/wiki/gaetips
Datastore (async) http://code.google.com/p/relat/wiki/gaetips?ts=1299673682updated=gaetips#Datastore_plus_(async)

--
You received this message because you are subscribed to the Google Groups Google App Engine group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
Re: [google-appengine] Re: Deleting Data Really Expensive!
It would be better if the admin tool used DatastoreKeyInputReader. I think it will, which would make it the fastest way of deleting a large number of entities.

It would also be more cost effective if every index we need had to be declared explicitly. Today, if you only need the ascending index on a single property, you get a descending index as well, as an extra penalty.

2011/3/10 David Mora dla.m...@gmail.com:
why would it be cheaper if at the end, the datastore admin creates a map reduce that iterates thru the model via splitting the index and loading each entity per key name ? Each map is a task queue so it is exactly the same :)
[google-appengine] Re: Deleting Data Really Expensive!
If you want to keep costs down but not leave the data spinning around and around why not delete a subset every day? It may take an extremely long time but at least it would be better than paying storage costs forever.
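A minimal sketch of the "delete a subset every day" idea, assuming a job that runs once a day and stops after a fixed number of deletions. The callbacks `fetch_keys` and `delete_keys`, the daily limit and the batch size are all hypothetical stand-ins for illustration; nothing here is taken from the App Engine API, and the in-memory list only simulates a datastore.

```python
def delete_daily_subset(fetch_keys, delete_keys, daily_limit, batch_size=500):
    """Delete at most daily_limit entities, batch_size keys at a time.

    fetch_keys(n) returns up to n keys that still exist and
    delete_keys(keys) removes them. Both are hypothetical callbacks
    standing in for real datastore calls.
    """
    deleted = 0
    while deleted < daily_limit:
        keys = fetch_keys(min(batch_size, daily_limit - deleted))
        if not keys:
            break  # nothing left to delete
        delete_keys(keys)
        deleted += len(keys)
    return deleted


# In-memory stand-in for the datastore, for illustration only.
remaining = list(range(2500))


def fetch_keys(limit):
    return remaining[:limit]


def delete_keys(keys):
    doomed = set(keys)
    remaining[:] = [k for k in remaining if k not in doomed]


# Simulate one run per day until everything is gone.
days = 0
while remaining:
    delete_daily_subset(fetch_keys, delete_keys, daily_limit=1000)
    days += 1
print(days)  # 3
```

The daily limit is the knob: a smaller limit keeps each day's bill lower but stretches the clean-up out, which is the trade-off the message describes.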
[google-appengine] Re: Deleting Data Really Expensive!
We are also finding that deletion is an expensive prospect. So much so that we find ourselves discussing the cost-tradeoff of simply leaving the dead data around because the storage cost is so much lower than what it would take to delete - even using the techniques mentioned in this post. It hurts me to think that we're leaving disks spinning with junk on them.

Google, any improvements on this front that would lead to less datastore CPU usage for deletions?

j

On Mar 10, 5:01 am, andreas schmid a.schmi...@gmail.com wrote:
at this point i would delete the app and create a new one at no cost!
[google-appengine] Re: Deleting Data Really Expensive!
I also think deleting data is quite a CPU-intensive and expensive process; the different techniques described above by Tim, Wesley, Ikai and others give only marginal benefits. There are some use cases where we deal with ephemeral data and all entities of a model become obsolete or irrelevant after some time. There must be a way to wipe out a model without having to delete every individual entity.

Happy coding ;-)

On Mar 10, 6:27 pm, Jason Collins jason.a.coll...@gmail.com wrote:
We are also finding that deletion is an expensive prospect. So much so that we find ourselves discussing the cost-tradeoff of simply leaving the dead data around because the storage cost is so much lower than what it would take to delete - even using the techniques mentioned in this post. It hurts me to think that we're leaving disks spinning with junk on them. Google, any improvements on this front that would lead to less datastore CPU usage for deletions? j
Re: [google-appengine] Re: Deleting Data Really Expensive!
I'm also interested in this. I have 300 million entities I no longer need. They cost $5 / day to store, but a short test with the Datastore Admin seemed to show that they would cost $1500 to delete. Not a nice thing to discover, since I'd like to delete them precisely because I need to save money.

Isn't there any cheaper way to delete all entities of a type?
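The trade-off in the message above can be worked out directly from the three figures it quotes. This is plain arithmetic on those numbers, assuming the $1500 extrapolation from the short test holds for the whole dataset; the function names are made up for the example.

```python
# Figures quoted in the message above.
STORAGE_COST_PER_DAY = 5.0      # dollars per day to keep the entities
ESTIMATED_DELETE_COST = 1500.0  # dollars, extrapolated from a short test
ENTITY_COUNT = 300 * 1000 * 1000


def break_even_days(delete_cost, storage_cost_per_day):
    """Days of storage that cost as much as deleting everything once."""
    return delete_cost / storage_cost_per_day


def delete_cost_per_million(delete_cost, entity_count):
    """Estimated deletion cost in dollars per million entities."""
    return delete_cost / (entity_count / 1e6)


print(break_even_days(ESTIMATED_DELETE_COST, STORAGE_COST_PER_DAY))  # 300.0
print(delete_cost_per_million(ESTIMATED_DELETE_COST, ENTITY_COUNT))  # 5.0
```

On these numbers, deleting only pays for itself after about 300 days of avoided storage, which is why leaving the data in place looks tempting in the short term.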
Re: [google-appengine] Re: Deleting Data Really Expensive!
Crossposting my reply from stackoverflow.

I got advice on #appengine in IRC that simply getting the keys of 2000 entities at a time and spawning tasks to delete them in pieces (can pass keys as strings to tasks) may be cheaper than using the Datastore Admin tool. I am trying this now. I will try to remember to report back tomorrow if this seems to be cheaper or not.

http://stackoverflow.com/questions/5252477/economically-deleting-data-from-app-engine
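The approach described above could be sketched roughly as follows. This is not Bemmu's actual code: `enqueue` and `delete_keys` are hypothetical callbacks standing in for adding a task to a queue and for a keys-only datastore delete, the piece size of 500 is an arbitrary choice, and keys are assumed to be strings without commas so they can be joined into one task payload.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def spawn_delete_tasks(keys, enqueue, batch_size=2000, keys_per_task=500):
    """Walk the keys 2000 at a time and hand them to tasks in pieces.

    Each task payload is a comma-joined string of keys, since the
    message notes that keys can be passed to tasks as strings.
    Returns the number of tasks created.
    """
    tasks = 0
    for batch in chunked(keys, batch_size):
        for piece in chunked(batch, keys_per_task):
            enqueue(",".join(str(key) for key in piece))
            tasks += 1
    return tasks


def handle_delete_task(payload, delete_keys):
    """Task body: decode the keys and delete by key, loading no entity."""
    delete_keys(payload.split(","))


# In-memory stand-ins for the datastore and the task queue.
store = set("key%d" % i for i in range(5000))
queue = []

task_count = spawn_delete_tasks(sorted(store), queue.append)
for payload in queue:
    handle_delete_task(payload, store.difference_update)

print(task_count, len(store))  # 10 0
```

The point of the design is that only keys ever travel through the queue, so no task has to fetch an entity before removing it.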
Re: [google-appengine] Re: Deleting Data Really Expensive!
Why would it be cheaper? In the end, the Datastore Admin creates a MapReduce job that iterates through the model by splitting the index and loading each entity by key name. Each map runs as a task queue task, so it is exactly the same :)

On 9 March 2011 18:22, Bemmu bemmu@gmail.com wrote:
Crossposting my reply from stackoverflow. I got advice on #appengine in IRC that simply getting the keys of 2000 entities at a time and spawning tasks to delete them in pieces (can pass keys as strings to tasks) may be cheaper than using the Datastore Admin tool. I am trying this now. I will try to remember to report back tomorrow if this seems to be cheaper or not. http://stackoverflow.com/questions/5252477/economically-deleting-data-from-app-engine

--
http://about.me/david.mora
Re: [google-appengine] Re: Deleting Data Really Expensive!
I've been given the impression from these forums that the datastore admin tool is so expensive for exactly the reason you've stated David - it loads each entity by key before deletion, whereas deleting purely on keys is much cheaper CPU-wise as you don't need to bother with the retrieval of the entire entity to carry out the delete.
Re: [google-appengine] Re: Deleting Data Really Expensive!
Hmmm, well, I think Wesley's option B was suggested merely because of the batch operations. One nice feature of MapReduce is the mutation pool, which handles the logic of batching operations while you yield through the iterations. I guess that with a big dataset like yours the difference (model retrieval vs. keys only) makes sense.

Anyway, interesting case - I'd love to see where it ends :)

On 9 March 2011 19:26, Simon Knott knott.si...@gmail.com wrote:
I've been given the impression from these forums that the datastore admin tool is so expensive for exactly the reason you've stated David - it loads each entity by key before deletion, whereas deleting purely on keys is much cheaper CPU-wise as you don't need to bother with the retrieval of the entire entity to carry out the delete.

--
http://about.me/david.mora
Re: [google-appengine] Re: Deleting Data Really Expensive!
I've tried using the datastore admin to delete some very large datasets as well. The solution Bemmu is using will be *significantly* faster and, from my experience, should be more cost effective.

I'm also eager to hear what Bemmu thinks after the delete has run for a while.

Robert

On Wed, Mar 9, 2011 at 21:20, David Mora dla.m...@gmail.com wrote:
hmmm, well Wesley's option B was merely because the batch operations i think. One nice feature about the map reduce is the mutation pool which handles the logic of batching an operation while you yield thru iterations. I guess in a big dataset like yours make sense (the model retrieval vs key only). Anyways, interested case - i'll love to see where it ends :)
Re: [google-appengine] Re: Deleting Data Really Expensive!
One more tip: only index the properties you need. When you delete an entity, we also have to delete all the associated indexes.

--
Ikai Lan
Developer Programs Engineer, Google App Engine
Blogger: http://googleappengine.blogspot.com
Reddit: http://www.reddit.com/r/appengine
Twitter: http://twitter.com/app_engine

On Tue, Dec 28, 2010 at 9:01 PM, Wesley C (Google) wesc+...@google.com wrote:

A. +1 on tim's suggestion. as an example (in Python), the following piece of code...

    for obj in Object.all():
        obj.delete()  # thousands of individual delete()s

... should run way slower than...

    from google.appengine.ext import db
    keys = Object.all(keys_only=True)
    db.delete(keys)  # one massive delete()

the second is faster in 2 ways:
1) keys-only means that it doesn't have to fetch the actual data
2) it calls google.appengine.ext.db.delete() once, passing in all the individual keys

B. another alternative is the new Datastore Admin where you can manually bulk delete entities:
http://code.google.com/appengine/docs/python/datastore/creatinggettinganddeletingdata.html#Deleting_Entities_in_Bulk
http://googleappengine.blogspot.com/2010/10/new-app-engine-sdk-138-includes-new.html

C. other alternatives
when a game finishes, *must* you do the deletes before returning back to the user? if it's not critical, then you can farm out this job to a task queue or use the new mapper API (half of the MapReduce solution). more info on both at:

Tasks Queue API
http://code.google.com/appengine/docs/python/taskqueue/overview.html

Mapper API
http://googleappengine.blogspot.com/2010/07/introducing-mapper-api.html
http://code.google.com/p/appengine-mapreduce/
http://code.google.com/appengine/articles/mr/mapper.html

if deletion is complex, you may also consider the new Pipeline API:
http://code.google.com/p/appengine-pipeline/
http://news.ycombinator.com/item?id=2013133

hope this helps!
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Core Python Programming, Prentice Hall, (c)2007,2001
Python Fundamentals, Prentice Hall, (c)2009
http://corepython.com

wesley.chun : wesc+api at google.com : @wescpy
developer relations :: google app engine
@app_engine :: googleappengine.blogspot.com
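To see why Ikai's tip matters, here is a rough counting sketch. It assumes two index rows per indexed single-valued property (ascending and descending, as mentioned elsewhere in this thread) plus one row for the entity itself. That model is an illustration only, not a billing formula: it ignores composite indexes and list properties, and the function names are made up for the example.

```python
def index_rows_for_entity(indexed_properties, rows_per_property=2):
    """Rough count of index rows attached to one entity.

    rows_per_property=2 assumes one ascending and one descending
    row per indexed single-valued property (an assumption for this
    sketch, not a documented figure).
    """
    return indexed_properties * rows_per_property


def rows_removed_on_delete(indexed_properties, rows_per_property=2):
    """The entity itself plus every index row that points at it."""
    return 1 + index_rows_for_entity(indexed_properties, rows_per_property)


all_indexed = rows_removed_on_delete(indexed_properties=10)
only_needed = rows_removed_on_delete(indexed_properties=2)
print(all_indexed, only_needed)  # 21 5
```

Under these assumptions, an entity with ten indexed properties takes roughly four times as many row removals to delete as the same entity with only two of them indexed. In the Python `db` API a property can be declared with `indexed=False` to leave it out of the built-in indexes.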
[google-appengine] Re: Deleting Data Really Expensive!
Make sure you are only getting the keys, rather than whole entities. That will be a lot less costly.

Rgds
Tim