[google-appengine] Re: Transactionally updating multiple entities over 1MB
Weird, I just hit a limit on the size of a transaction when committing: java.lang.IllegalArgumentException: datastore transaction or write too big. All (23) entities in the transaction were in the same entity group, not using batch put, and ~990k in size.

On 23 July, 12:30, Nick Johnson (Google) nick.john...@google.com wrote:
> Hi Juraj,
> No, there's no limit to the size of an entity group - only on the maximum rate at which you can update entities in a single entity group.
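The error above suggests that the combined writes buffered in a transaction are bounded too, not only each individual API call. A hedged sketch of a pre-commit sanity check, written against the Python db API used elsewhere in this thread (the report itself is from Java): the helper names, the 900k budget, and the serialization-based estimate are guesses prompted by the numbers reported here, not a documented threshold or accounting.

    from google.appengine.ext import db

    def estimated_commit_size(entities):
        # Sum of the serialized entity sizes. Ignores keys, index data and
        # per-entity overhead, so the real commit payload will be larger.
        return sum(len(db.model_to_protobuf(e).Encode()) for e in entities)

    def put_all_transactionally(entities):
        # All entities must belong to the same entity group. Each put()
        # below is its own API call; the guard only warns when the whole
        # transaction's writes look close to the ~1MB ceiling reported above.
        if estimated_commit_size(entities) > 900 * 1000:
            raise ValueError('transaction writes may be too big to commit')
        def txn():
            for entity in entities:
                entity.put()
        db.run_in_transaction(txn)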
[google-appengine] Re: Transactionally updating multiple entities over 1MB
Hi Juraj,

No, there's no limit to the size of an entity group - only on the maximum rate at which you can update entities in a single entity group.

-Nick Johnson

On Fri, Jul 17, 2009 at 4:03 PM, Juraj Vitko juraj.vi...@gmail.com wrote:
> Nick, just one clarification (I can't find in docs) - is there a limit on the total size of an entity group?
[google-appengine] Re: Transactionally updating multiple entities over 1MB
Nick, just one clarification (I can't find in docs) - is there a limit on the total size of an entity group?

On Jun 29, 12:28 pm, Nick Johnson (Google) nick.john...@google.com wrote:
> The 1MB limit is on the API call, rather than the entity itself per se, so index size doesn't count in the 1MB limit.
[google-appengine] Re: Transactionally updating multiple entities over 1MB
On Sat, Jun 27, 2009 at 4:14 PM, Andy Freeman ana...@earthlink.net wrote:
> > Does that mean that db.put((e1, e2, e3,)) where all of the entities are 500kb will fail?
> > Yes.
> Thanks. I'll take this opportunity to promote a couple of related feature requests.
> (1) We need a way to estimate entity sizes: http://code.google.com/p/googleappengine/issues/detail?id=1084

The 1MB limit is on the API call, rather than the entity itself per se, so index size doesn't count in the 1MB limit. You can always serialize the entity yourself and check its size, though that requires touching datastore-internal methods.

> (2) We need a way to help predict when datastore operations will fail: http://code.google.com/p/googleappengine/issues/detail?id=917
> I assume that db.get((k1, k2,)) can fail because of size reasons when db.get(k1) followed by db.get(k2) will succeed. Does db.get((k1, k2,)) return at least one entity in that case?

No, the operation will simply fail. Given that it's an invariant that the returned list has the same length as the passed list, there's no sensible way to return partial results without implying that certain entities didn't exist when they actually do.

-Nick Johnson
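Since an oversized batch get fails as a whole rather than returning partial results, one practical workaround is to issue several smaller get() calls yourself. A minimal sketch using the Python db API discussed above; get_in_chunks is an illustrative helper rather than anything in the SDK, and the chunk size of 10 is an arbitrary guess to be tuned to the expected entity sizes.

    from google.appengine.ext import db

    def get_in_chunks(keys, chunk_size=10):
        # Fetch the keys in several smaller db.get() calls so that no
        # single API call has to carry all of the (potentially large)
        # entities. Like db.get(), the result list matches the key list
        # in order, with None for keys that do not exist.
        results = []
        for start in range(0, len(keys), chunk_size):
            results.extend(db.get(keys[start:start + chunk_size]))
        return results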
[google-appengine] Re: Transactionally updating multiple entities over 1MB
> Does that mean that db.put((e1, e2, e3,)) where all of the entities are 500kb will fail?
> Yes.

Thanks. I'll take this opportunity to promote a couple of related feature requests.

(1) We need a way to estimate entity sizes: http://code.google.com/p/googleappengine/issues/detail?id=1084

(2) We need a way to help predict when datastore operations will fail: http://code.google.com/p/googleappengine/issues/detail?id=917

I assume that db.get((k1, k2,)) can fail because of size reasons when db.get(k1) followed by db.get(k2) will succeed. Does db.get((k1, k2,)) return at least one entity in that case?

On Jun 26, 9:36 am, Nick Johnson (Google) nick.john...@google.com wrote:
> You're right - we need to improve our documentation in that area. The 1MB limit applies to _all_ API calls.
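Until there is an official way to estimate entity sizes, one rough approach (the serialize-and-measure trick Nick mentions in his later reply) is to encode the entity's protocol buffer yourself. A sketch only, assuming the Python db API: the encoded size approximates what the 1MB per-call limit is measured against but ignores RPC overhead and index data, so treat it purely as an estimate.

    from google.appengine.ext import db

    def approximate_entity_size(entity):
        # Serialize the entity roughly the way the datastore API does and
        # measure the result. model_to_protobuf() exposes fairly low-level
        # machinery, and the number excludes per-request overhead.
        return len(db.model_to_protobuf(entity).Encode())

    # Illustrative use: check whether three large entities can share one
    # db.put() call without tripping the ~1MB per-call limit.
    # if sum(map(approximate_entity_size, (e1, e2, e3))) < 1000 * 1000:
    #     db.put([e1, e2, e3])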
[google-appengine] Re: Transactionally updating multiple entities over 1MB
Hi tav,

Batch puts aren't transactional unless all the entities are in the same entity group. Transactions, however, _are_ transactional, and the 1MB limit applies only to single API calls, so you can make multiple puts to the same entity group in a transaction.

-Nick Johnson

On Fri, Jun 26, 2009 at 8:53 AM, tav t...@espians.com wrote:
> Hey guys and girls,
>
> I've got a situation where I'd have to transactionally update multiple entities which would cumulatively be greater than the 1MB datastore API limit... is there a decent solution for this?
>
> For example, let's say that I start off with entities E1, E2, E3 which are all about 400kb each. All the entities are specific to a given User. I grab them all on a remote node and do some calculations on them to yield new computed entities E1', E2', and E3'. Any failure of the remote node or the datastore is recoverable except when the remote node tries to *update* the datastore... in that situation, it'd have to batch the update into 2 separate .put() calls to overcome the 1MB limit. And should the remote node die after the first put(), we have a messy situation =)
>
> My solution at the moment is to:
>
> 1. Create a UserRecord entity which has a 'version' attribute corresponding to the latest versions of the related entities for any given User.
> 2. Add a 'version' attribute to all the entities.
> 3. Whenever the remote node creates the computed new set of entities, it creates them all with a new version number -- applying the same version for all the entities in the same transaction.
> 4. These new entities are actually .put() as totally separate and new entities, i.e. they do not overwrite the old entities.
> 5. Once a remote node successfully writes new versions of all the entities relating to a User, it updates the UserRecord with the latest version number.
> 6. From the remote node, delete all Entities related to a User which don't have the latest version number.
> 7. Have a background thread check and do deletions of invalid versions in case a remote node had died whilst doing step 4, 5 or 6...
>
> I've skipped out the complications caused by multiple remote nodes working on data relating to the same User -- but, overall, the approach is pretty much the same.
>
> Now, the advantage of this approach (as far as I can see) is that data relating to a User is never *lost*. That is, data is never lost before there is valid data to replace it. However, the disadvantage is that for (unknown) periods of time, there would be duplicate data sets for a given User... All of which is caused by the fact that the datastore calls cannot exceed 1MB. =( So queries will yield duplicate data -- gah!!
>
> Is there a better approach to try at all?
>
> Thanks!
>
> --
> love, tav
> plex:espians/tav | t...@espians.com | +44 (0) 7809 569 369
> http://tav.espians.com | http://twitter.com/tav | skype:tavespian

--
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047
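A rough sketch of the approach Nick describes, using the Python db API: every entity shares the UserRecord's entity group via its parent key, each .put() is a separate API call that stays under the 1MB limit, and the surrounding transaction keeps the whole update atomic. The model and function names (ComputedPart, update_parts) and the one-blob-per-E1/E2/E3 layout are illustrative assumptions, not anything prescribed in the thread.

    from google.appengine.ext import db

    class UserRecord(db.Model):
        version = db.IntegerProperty(default=0)

    class ComputedPart(db.Model):
        # Stand-in for E1/E2/E3; each instance holds one large payload.
        payload = db.BlobProperty()

    def update_parts(user_record_key, new_payloads):
        """Atomically replace the large entities for one user.

        new_payloads maps a part name (e.g. 'E1') to its new bytes. All
        writes happen in user_record_key's entity group, so they can share
        one transaction; each put() is its own sub-1MB API call.
        """
        def txn():
            for name, data in new_payloads.items():
                ComputedPart(parent=user_record_key, key_name=name,
                             payload=db.Blob(data)).put()
            record = db.get(user_record_key)  # assumed to already exist
            record.version += 1
            record.put()
        db.run_in_transaction(txn)

Because the transaction commits atomically, readers see either all of the new parts with the bumped version or none of them, which avoids the window of duplicate data in the versioning scheme quoted above.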
[google-appengine] Re: Transactionally updating multiple entities over 1MB
> the 1MB limit applies only to single API calls

Does that mean that db.put((e1, e2, e3,)) where all of the entities are 500kb will fail?

Where are limits on the total size per call documented? http://code.google.com/appengine/docs/python/datastore/overview.html#Quotas_and_Limits only mentions a limit on the size of individual entities and the total number of entities for batch methods. The batch method documentation (http://code.google.com/appengine/docs/python/datastore/functions.html and http://code.google.com/appengine/docs/python/memcache/functions.html) does not mention any limits.

Is there a documented limit on the number of entities per memcache call?

BTW - there is a typo in http://code.google.com/appengine/docs/python/memcache/overview.html#Quotas_and_Limits. It says "In addition to quotas, the following limits apply to the use of the Mail service:" instead of "Memcache service".

On Jun 26, 7:28 am, Nick Johnson (Google) nick.john...@google.com wrote:
> Batch puts aren't transactional unless all the entities are in the same entity group. Transactions, however, _are_ transactional, and the 1MB limit applies only to single API calls, so you can make multiple puts to the same entity group in a transaction.
[google-appengine] Re: Transactionally updating multiple entities over 1MB
On Fri, Jun 26, 2009 at 4:42 PM, Andy Freeman ana...@earthlink.net wrote:
> > the 1MB limit applies only to single API calls
> Does that mean that db.put((e1, e2, e3,)) where all of the entities are 500kb will fail?

Yes.

> Where are limits on the total size per call documented? http://code.google.com/appengine/docs/python/datastore/overview.html#Quotas_and_Limits only mentions a limit on the size of individual entities and the total number of entities for batch methods. The batch method documentation (http://code.google.com/appengine/docs/python/datastore/functions.html and http://code.google.com/appengine/docs/python/memcache/functions.html) does not mention any limits.

You're right - we need to improve our documentation in that area. The 1MB limit applies to _all_ API calls.

> Is there a documented limit on the number of entities per memcache call?

No.

> BTW - there is a typo in http://code.google.com/appengine/docs/python/memcache/overview.html#Quotas_and_Limits. It says "In addition to quotas, the following limits apply to the use of the Mail service:" instead of "Memcache service".

Thanks for the heads-up.

-Nick Johnson

--
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047
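For the non-transactional batch case discussed here (three ~500kb entities in one db.put()), the call as a whole exceeds the per-call limit even though each entity is individually fine. One workaround is to split the batch by estimated size. This is a sketch only: put_in_batches and the deliberately conservative 900k budget are assumptions, since the exact accounting behind the limit isn't documented, and unlike a transaction the separate calls are not atomic.

    from google.appengine.ext import db

    BUDGET = 900 * 1000  # stay comfortably under the ~1MB per-call limit

    def put_in_batches(entities):
        # Group entities into consecutive db.put() calls whose combined
        # serialized size stays under BUDGET. A failure part-way through
        # leaves earlier batches written, so this is not a substitute for
        # a transaction when atomicity matters.
        batch, size = [], 0
        for entity in entities:
            entity_size = len(db.model_to_protobuf(entity).Encode())
            if batch and size + entity_size > BUDGET:
                db.put(batch)
                batch, size = [], 0
            batch.append(entity)
            size += entity_size
        if batch:
            db.put(batch)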