[google-appengine] Re: Transactionally updating multiple entities over 1MB

2009-08-16 Thread Stakka

Weird, I just hit a limit on the size of a transaction when committing:

java.lang.IllegalArgumentException: datastore transaction or write
too big.

All 23 entities in the transaction were in the same entity group,
were put individually (not with a batch put), and totaled ~990k in size.


On 23 July, 12:30, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi Juraj,

 No, there's no limit to the size of an entity group - only on the maximum
 rate at which you can update entities in a single entity group.

 -Nick Johnson



 On Fri, Jul 17, 2009 at 4:03 PM, Juraj Vitko juraj.vi...@gmail.com wrote:

  Nick, just one clarification (I can't find in docs) - is there a limit
  on the total size of an entity group?

  On Jun 29, 12:28 pm, Nick Johnson (Google) nick.john...@google.com
  wrote:
   On Sat, Jun 27, 2009 at 4:14 PM, Andy Freeman ana...@earthlink.net
  wrote:

 Does that mean that db.put((e1, e2, e3,)) where all of the entities
 are 500kb will fail?

Yes.

Thanks.

I'll take this opportunity to promote a couple of related feature
requests.

(1) We need a way to estimate entity sizes
   http://code.google.com/p/googleappengine/issues/detail?id=1084

   The 1MB limit is on the API call, rather than the entity itself,
   per se, so index size doesn't count toward the 1MB limit. You can always
   serialize the entity yourself and check its size, though that requires
   touching datastore-internal methods.

(2) We need a way to help predict when datastore operations will fail
   http://code.google.com/p/googleappengine/issues/detail?id=917

I assume that db.get((k1, k2,)) can fail because of size reasons when
db.get(k1) followed by db.get(k2) will succeed.  Does db.get((k1,
k2,)) return at least one entity in that case?

   No, the operation will simply fail. Given that it's an invariant that
   the returned list has the same length as the passed list, there's no
   sensible way to return partial results without implying that certain
   entities didn't exist when they actually do.

   -Nick Johnson

On Jun 26, 9:36 am, Nick Johnson (Google) nick.john...@google.com
wrote:
On Fri, Jun 26, 2009 at 4:42 PM, Andy Freeman ana...@earthlink.net
  wrote:

   the 1MB limit applies only to single API calls

 Does that mean that db.put((e1, e2, e3,)) where all of the entities
 are 500kb will fail?

Yes.

 Where are limits on the total size per call documented?

 http://code.google.com/appengine/docs/python/datastore/overview.html#...
 only mentions a limit on the size of individual entities and the
  total
 number of entities for batch methods.  The batch method
  documentation
 (http://code.google.com/appengine/docs/python/datastore/functions.html
 and http://code.google.com/appengine/docs/python/memcache/functions.html)
 does not mention any limits.

You're right - we need to improve our documentation in that area. The
  1MB
limit applies to _all_ API calls.

 Is there a documented limit on the number of entities per memcache
 call?

No.

 BTW - There is a typo in

 http://code.google.com/appengine/docs/python/memcache/overview.html#Q...
 .
 It says "In addition to quotas, the following limits apply to the use
 of the Mail service:" instead of "Memcache service".

Thanks for the heads-up.

-Nick Johnson

 On Jun 26, 7:28 am, Nick Johnson (Google) 
  nick.john...@google.com
 wrote:
  Hi tav,

  Batch puts aren't transactional unless all the entities are in the
  same entity group. Transactions, however, _are_ transactional, and
  the
  1MB limit applies only to single API calls, so you can make
  multiple
  puts to the same entity group in a transaction.

  -Nick Johnson

  On Fri, Jun 26, 2009 at 8:53 AM, tavt...@espians.com wrote:

   Hey guys and girls,

   I've got a situation where I'd have to transactionally update
   multiple entities which would cumulatively be greater than the
  1MB
   datastore API limit... is there a decent solution for this?

   For example, let's say that I start off with entities E1, E2, E3
  which
   are all about 400kb each. All the entities are specific to a
  given
   User. I grab them all on a remote node and do some
  calculations on
   them to yield new computed entities E1', E2', and E3'.

   Any failure of the remote node or the datastore is recoverable
  except
   when the remote node tries to *update* the datastore... in that
   situation, it'd have to batch the update into 2 separate .put()
  calls
   to overcome the 1MB limit. And should the remote node die after
  the
   first put(), we have a messy situation =)

   My solution at the moment is to:

   1. Create a UserRecord entity which has a 'version' attribute
   corresponding to the latest versions of the related entities
  for any
   given User.

   2. Add a 'version' attribute to all the entities.

 

[google-appengine] Re: Transactionally updating multiple entities over 1MB

2009-07-23 Thread Nick Johnson (Google)
Hi Juraj,

No, there's no limit to the size of an entity group - only on the maximum
rate at which you can update entities in a single entity group.

-Nick Johnson

On Fri, Jul 17, 2009 at 4:03 PM, Juraj Vitko juraj.vi...@gmail.com wrote:


 Nick, just one clarification (I can't find in docs) - is there a limit
 on the total size of an entity group?




 On Jun 29, 12:28 pm, Nick Johnson (Google) nick.john...@google.com
 wrote:
  On Sat, Jun 27, 2009 at 4:14 PM, Andy Freeman ana...@earthlink.net
 wrote:
 
Does that mean that db.put((e1, e2, e3,)) where all of the entities
are 500kb will fail?
 
   Yes.
 
   Thanks.
 
   I'll take this opportunity to promote a couple of related feature
   requests.
 
   (1) We need a way to estimate entity sizes
  http://code.google.com/p/googleappengine/issues/detail?id=1084
 
  The 1MB limit is on the API call, rather than the entity itself,
  per se, so index size doesn't count toward the 1MB limit. You can always
  serialize the entity yourself and check its size, though that requires
  touching datastore-internal methods.
 
 
 
   (2) We need a way to help predict when datastore operations will fail
  http://code.google.com/p/googleappengine/issues/detail?id=917
 
   I assume that db.get((k1, k2,)) can fail because of size reasons when
   db.get(k1) followed by db.get(k2) will succeed.  Does db.get((k1,
   k2,)) return at least one entity in that case?
 
  No, the operation will simply fail. Given that it's an invariant that
  the returned list has the same length as the passed list, there's no
  sensible way to return partial results without implying that certain
  entities didn't exist when they actually do.
 
  -Nick Johnson
 
 
 
 
 
   On Jun 26, 9:36 am, Nick Johnson (Google) nick.john...@google.com
   wrote:
   On Fri, Jun 26, 2009 at 4:42 PM, Andy Freeman ana...@earthlink.net
 wrote:
 
  the 1MB limit applies only to single API calls
 
Does that mean that db.put((e1, e2, e3,)) where all of the entities
are 500kb will fail?
 
   Yes.
 
Where are limits on the total size per call documented?
 
   
 http://code.google.com/appengine/docs/python/datastore/overview.html#...
only mentions a limit on the size of individual entities and the
 total
number of entities for batch methods.  The batch method
 documentation
(http://code.google.com/appengine/docs/python/datastore/functions.html
 and http://code.google.com/appengine/docs/python/memcache/functions.html)
does not mention any limits.
 
   You're right - we need to improve our documentation in that area. The
 1MB
   limit applies to _all_ API calls.
 
Is there a documented limit on the number of entities per memcache
call?
 
   No.
 
BTW - There is a typo in
   
 http://code.google.com/appengine/docs/python/memcache/overview.html#Q...
.
It says "In addition to quotas, the following limits apply to the use
of the Mail service:" instead of "Memcache service".
 
   Thanks for the heads-up.
 
   -Nick Johnson
 
On Jun 26, 7:28 am, Nick Johnson (Google) 
 nick.john...@google.com
wrote:
 Hi tav,
 
 Batch puts aren't transactional unless all the entities are in the
 same entity group. Transactions, however, _are_ transactional, and
 the
 1MB limit applies only to single API calls, so you can make
 multiple
 puts to the same entity group in a transaction.
 
 -Nick Johnson
 
 On Fri, Jun 26, 2009 at 8:53 AM, tavt...@espians.com wrote:
 
  Hey guys and girls,
 
  I've got a situation where I'd have to transactionally update
  multiple entities which would cumulatively be greater than the
 1MB
  datastore API limit... is there a decent solution for this?
 
  For example, let's say that I start off with entities E1, E2, E3
 which
  are all about 400kb each. All the entities are specific to a
 given
  User. I grab them all on a remote node and do some
 calculations on
  them to yield new computed entities E1', E2', and E3'.
 
  Any failure of the remote node or the datastore is recoverable
 except
  when the remote node tries to *update* the datastore... in that
  situation, it'd have to batch the update into 2 separate .put()
 calls
  to overcome the 1MB limit. And should the remote node die after
 the
  first put(), we have a messy situation =)
 
  My solution at the moment is to:
 
  1. Create a UserRecord entity which has a 'version' attribute
  corresponding to the latest versions of the related entities
 for any
  given User.
 
  2. Add a 'version' attribute to all the entities.
 
  3. Whenever the remote node creates the computed new set of
  entities, it creates them all with a new version number --
 applying
  the same version for all the entities in the same transaction.
 
  4. These new entities are actually .put() as totally separate
 and new
  entities, i.e. they do not overwrite the old entities.
 
  5. Once a remote node 

[google-appengine] Re: Transactionally updating multiple entities over 1MB

2009-07-17 Thread Juraj Vitko

Nick, just one clarification (I can't find in docs) - is there a limit
on the total size of an entity group?




On Jun 29, 12:28 pm, Nick Johnson (Google) nick.john...@google.com
wrote:
 On Sat, Jun 27, 2009 at 4:14 PM, Andy Freeman ana...@earthlink.net wrote:

   Does that mean that db.put((e1, e2, e3,)) where all of the entities
   are 500kb will fail?

  Yes.

  Thanks.

  I'll take this opportunity to promote a couple of related feature
  requests.

  (1) We need a way to estimate entity sizes
 http://code.google.com/p/googleappengine/issues/detail?id=1084

 The 1MB limit is on the API call, rather than the entity itself,
 per se, so index size doesn't count toward the 1MB limit. You can always
 serialize the entity yourself and check its size, though that requires
 touching datastore-internal methods.



  (2) We need a way to help predict when datastore operations will fail
 http://code.google.com/p/googleappengine/issues/detail?id=917

  I assume that db.get((k1, k2,)) can fail because of size reasons when
  db.get(k1) followed by db.get(k2) will succeed.  Does db.get((k1,
  k2,)) return at least one entity in that case?

 No, the operation will simply fail. Given that it's an invariant that
 the returned list has the same length as the passed list, there's no
 sensible way to return partial results without implying that certain
 entities didn't exist when they actually do.

 -Nick Johnson





  On Jun 26, 9:36 am, Nick Johnson (Google) nick.john...@google.com
  wrote:
  On Fri, Jun 26, 2009 at 4:42 PM, Andy Freeman ana...@earthlink.net wrote:

 the 1MB limit applies only to single API calls

   Does that mean that db.put((e1, e2, e3,)) where all of the entities
   are 500kb will fail?

  Yes.

   Where are limits on the total size per call documented?

  http://code.google.com/appengine/docs/python/datastore/overview.html#...
   only mentions a limit on the size of individual entities and the total
   number of entities for batch methods.  The batch method documentation
   (http://code.google.com/appengine/docs/python/datastore/functions.html
   and http://code.google.com/appengine/docs/python/memcache/functions.html)
   does not mention any limits.

  You're right - we need to improve our documentation in that area. The 1MB
  limit applies to _all_ API calls.

   Is there a documented limit on the number of entities per memcache
   call?

  No.

   BTW - There is a typo in
  http://code.google.com/appengine/docs/python/memcache/overview.html#Q...
   .
   It says "In addition to quotas, the following limits apply to the use
   of the Mail service:" instead of "Memcache service".

  Thanks for the heads-up.

  -Nick Johnson

   On Jun 26, 7:28 am, Nick Johnson (Google) nick.john...@google.com
   wrote:
Hi tav,

Batch puts aren't transactional unless all the entities are in the
same entity group. Transactions, however, _are_ transactional, and the
1MB limit applies only to single API calls, so you can make multiple
puts to the same entity group in a transaction.

-Nick Johnson

On Fri, Jun 26, 2009 at 8:53 AM, tavt...@espians.com wrote:

 Hey guys and girls,

 I've got a situation where I'd have to transactionally update
 multiple entities which would cumulatively be greater than the 1MB
 datastore API limit... is there a decent solution for this?

 For example, let's say that I start off with entities E1, E2, E3 
 which
 are all about 400kb each. All the entities are specific to a given
 User. I grab them all on a remote node and do some calculations on
 them to yield new computed entities E1', E2', and E3'.

 Any failure of the remote node or the datastore is recoverable except
 when the remote node tries to *update* the datastore... in that
 situation, it'd have to batch the update into 2 separate .put() calls
 to overcome the 1MB limit. And should the remote node die after the
 first put(), we have a messy situation =)

 My solution at the moment is to:

 1. Create a UserRecord entity which has a 'version' attribute
 corresponding to the latest versions of the related entities for 
 any
 given User.

 2. Add a 'version' attribute to all the entities.

 3. Whenever the remote node creates the computed new set of
 entities, it creates them all with a new version number -- applying
 the same version for all the entities in the same transaction.

 4. These new entities are actually .put() as totally separate and new
 entities, i.e. they do not overwrite the old entities.

 5. Once a remote node successfully writes new versions of all the
 entities relating to a User, it updates the UserRecord with the 
 latest
 version number.

 6. From the remote node, delete all Entities related to a User which
 don't have the latest version number.

 7. Have a background thread check and do deletions of invalid 
 versions
 in case a remote node had died 

[google-appengine] Re: Transactionally updating multiple entities over 1MB

2009-06-29 Thread Nick Johnson (Google)

On Sat, Jun 27, 2009 at 4:14 PM, Andy Freeman ana...@earthlink.net wrote:

  Does that mean that db.put((e1, e2, e3,)) where all of the entities
  are 500kb will fail?

 Yes.

 Thanks.

 I'll take this opportunity to promote a couple of related feature
 requests.

 (1) We need a way to estimate entity sizes
 http://code.google.com/p/googleappengine/issues/detail?id=1084

The 1MB limit is on the API call, rather than the entity itself,
per se, so index size doesn't count toward the 1MB limit. You can always
serialize the entity yourself and check its size, though that requires
touching datastore-internal methods.
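
A rough sketch of that serialize-it-yourself check, assuming the classic
Python SDK's db.model_to_protobuf() helper (the kind of datastore-internal
method referred to above); treat the result as an approximation of the
entity's share of an API call payload, not an exact figure:

    from google.appengine.ext import db

    def estimated_entity_size(entity):
        # Serialize the entity to its protocol buffer form and measure it.
        # Index entries don't count toward the 1MB call limit, so they are
        # deliberately not included here.
        return len(db.model_to_protobuf(entity).Encode())

    def put_respecting_call_limit(entities, max_call_bytes=1000 * 1000):
        # Batch-put only when the combined serialized size looks safe;
        # otherwise fall back to one put() call per entity.
        if sum(estimated_entity_size(e) for e in entities) < max_call_bytes:
            db.put(entities)
        else:
            for e in entities:
                db.put(e)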


 (2) We need a way to help predict when datastore operations will fail
 http://code.google.com/p/googleappengine/issues/detail?id=917

 I assume that db.get((k1, k2,)) can fail because of size reasons when
 db.get(k1) followed by db.get(k2) will succeed.  Does db.get((k1,
 k2,)) return at least one entity in that case?

No, the operation will simply fail. Given that it's an invariant that
the returned list has the same length as the passed list, there's no
sensible way to return partial results without implying that certain
entities didn't exist when they actually do.

-Nick Johnson
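
Since an oversized batch get simply fails rather than returning a partial
list, one workaround is to fetch keys in smaller batches. A minimal sketch
(the chunk size here is an arbitrary guess, not a documented figure):

    from google.appengine.ext import db

    def get_in_chunks(keys, chunk_size=20):
        # Issue several small batch gets so no single API call risks
        # exceeding the 1MB limit. Order is preserved, and missing
        # entities still come back as None, just as with db.get().
        results = []
        for i in range(0, len(keys), chunk_size):
            results.extend(db.get(keys[i:i + chunk_size]))
        return results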




 On Jun 26, 9:36 am, Nick Johnson (Google) nick.john...@google.com
 wrote:
 On Fri, Jun 26, 2009 at 4:42 PM, Andy Freeman ana...@earthlink.net wrote:

the 1MB limit applies only to single API calls

  Does that mean that db.put((e1, e2, e3,)) where all of the entities
  are 500kb will fail?

 Yes.



  Where are limits on the total size per call documented?

 http://code.google.com/appengine/docs/python/datastore/overview.html#...
  only mentions a limit on the size of individual entities and the total
  number of entities for batch methods.  The batch method documentation
  (http://code.google.com/appengine/docs/python/datastore/functions.html
  and http://code.google.com/appengine/docs/python/memcache/functions.html)
  does not mention any limits.

 You're right - we need to improve our documentation in that area. The 1MB
 limit applies to _all_ API calls.



  Is there a documented limit on the number of entities per memcache
  call?

 No.



  BTW - There is a typo in
 http://code.google.com/appengine/docs/python/memcache/overview.html#Q...
  .
  It says "In addition to quotas, the following limits apply to the use
  of the Mail service:" instead of "Memcache service".

 Thanks for the heads-up.

 -Nick Johnson







  On Jun 26, 7:28 am, Nick Johnson (Google) nick.john...@google.com
  wrote:
   Hi tav,

   Batch puts aren't transactional unless all the entities are in the
   same entity group. Transactions, however, _are_ transactional, and the
   1MB limit applies only to single API calls, so you can make multiple
   puts to the same entity group in a transaction.

   -Nick Johnson

   On Fri, Jun 26, 2009 at 8:53 AM, tavt...@espians.com wrote:

Hey guys and girls,

I've got a situation where I'd have to transactionally update
multiple entities which would cumulatively be greater than the 1MB
datastore API limit... is there a decent solution for this?

For example, let's say that I start off with entities E1, E2, E3 which
are all about 400kb each. All the entities are specific to a given
User. I grab them all on a remote node and do some calculations on
them to yield new computed entities E1', E2', and E3'.

Any failure of the remote node or the datastore is recoverable except
when the remote node tries to *update* the datastore... in that
situation, it'd have to batch the update into 2 separate .put() calls
to overcome the 1MB limit. And should the remote node die after the
first put(), we have a messy situation =)

My solution at the moment is to:

1. Create a UserRecord entity which has a 'version' attribute
corresponding to the latest versions of the related entities for any
given User.

2. Add a 'version' attribute to all the entities.

3. Whenever the remote node creates the computed new set of
entities, it creates them all with a new version number -- applying
the same version for all the entities in the same transaction.

4. These new entities are actually .put() as totally separate and new
entities, i.e. they do not overwrite the old entities.

5. Once a remote node successfully writes new versions of all the
entities relating to a User, it updates the UserRecord with the latest
version number.

6. From the remote node, delete all Entities related to a User which
don't have the latest version number.

7. Have a background thread check and do deletions of invalid versions
in case a remote node had died whilst doing step 4, 5 or 6...

I've skipped out the complications caused by multiple remote nodes
working on data relating to the same User -- but, overall, the
approach is pretty much the same.

Now, the advantage of this approach (as far as I can see) is that data

[google-appengine] Re: Transactionally updating multiple entities over 1MB

2009-06-27 Thread Andy Freeman

  Does that mean that db.put((e1, e2, e3,)) where all of the entities
  are 500kb will fail?

 Yes.

Thanks.

I'll take this opportunity to promote a couple of related feature
requests.

(1) We need a way to estimate entity sizes
http://code.google.com/p/googleappengine/issues/detail?id=1084

(2) We need a way to help predict when datastore operations will fail
http://code.google.com/p/googleappengine/issues/detail?id=917

I assume that db.get((k1, k2,)) can fail because of size reasons when
db.get(k1) followed by db.get(k2) will succeed.  Does db.get((k1,
k2,)) return at least one entity in that case?



On Jun 26, 9:36 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 On Fri, Jun 26, 2009 at 4:42 PM, Andy Freeman ana...@earthlink.net wrote:

    the 1MB limit applies only to single API calls

  Does that mean that db.put((e1, e2, e3,)) where all of the entities
  are 500kb will fail?

 Yes.



  Where are limits on the total size per call documented?

 http://code.google.com/appengine/docs/python/datastore/overview.html#...
  only mentions a limit on the size of individual entities and the total
  number of entities for batch methods.  The batch method documentation
  (http://code.google.com/appengine/docs/python/datastore/functions.html
  and http://code.google.com/appengine/docs/python/memcache/functions.html)
  does not mention any limits.

 You're right - we need to improve our documentation in that area. The 1MB
 limit applies to _all_ API calls.



  Is there a documented limit on the number of entities per memcache
  call?

 No.



  BTW - There is a typo in
 http://code.google.com/appengine/docs/python/memcache/overview.html#Q...
  .
  It says "In addition to quotas, the following limits apply to the use
  of the Mail service:" instead of "Memcache service".

 Thanks for the heads-up.

 -Nick Johnson







  On Jun 26, 7:28 am, Nick Johnson (Google) nick.john...@google.com
  wrote:
   Hi tav,

   Batch puts aren't transactional unless all the entities are in the
   same entity group. Transactions, however, _are_ transactional, and the
   1MB limit applies only to single API calls, so you can make multiple
   puts to the same entity group in a transaction.

   -Nick Johnson

   On Fri, Jun 26, 2009 at 8:53 AM, tavt...@espians.com wrote:

Hey guys and girls,

I've got a situation where I'd have to transactionally update
multiple entities which would cumulatively be greater than the 1MB
datastore API limit... is there a decent solution for this?

For example, let's say that I start off with entities E1, E2, E3 which
are all about 400kb each. All the entities are specific to a given
User. I grab them all on a remote node and do some calculations on
them to yield new computed entities E1', E2', and E3'.

Any failure of the remote node or the datastore is recoverable except
when the remote node tries to *update* the datastore... in that
situation, it'd have to batch the update into 2 separate .put() calls
to overcome the 1MB limit. And should the remote node die after the
first put(), we have a messy situation =)

My solution at the moment is to:

1. Create a UserRecord entity which has a 'version' attribute
corresponding to the latest versions of the related entities for any
given User.

2. Add a 'version' attribute to all the entities.

3. Whenever the remote node creates the computed new set of
entities, it creates them all with a new version number -- applying
the same version for all the entities in the same transaction.

4. These new entities are actually .put() as totally separate and new
entities, i.e. they do not overwrite the old entities.

5. Once a remote node successfully writes new versions of all the
entities relating to a User, it updates the UserRecord with the latest
version number.

6. From the remote node, delete all Entities related to a User which
don't have the latest version number.

7. Have a background thread check and do deletions of invalid versions
in case a remote node had died whilst doing step 4, 5 or 6...

I've skipped out the complications caused by multiple remote nodes
working on data relating to the same User -- but, overall, the
approach is pretty much the same.

Now, the advantage of this approach (as far as I can see) is that data
relating to a User is never *lost*. That is, data is never lost before
there is valid data to replace it.

However, the disadvantage is that for (unknown) periods of time, there
would be duplicate data sets for a given User... All of which is
caused by the fact that the datastore calls cannot exceed 1MB. =(

So queries will yield duplicate data -- gah!!

Is there a better approach to try at all? Thanks!

--
love, tav

plex:espians/tav | t...@espians.com | +44 (0) 7809 569 369
   

[google-appengine] Re: Transactionally updating multiple entities over 1MB

2009-06-26 Thread Nick Johnson (Google)

Hi tav,

Batch puts aren't transactional unless all the entities are in the
same entity group. Transactions, however, _are_ transactional, and the
1MB limit applies only to single API calls, so you can make multiple
puts to the same entity group in a transaction.

-Nick Johnson
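
In code, that looks roughly like the sketch below (assuming e1, e2 and e3
already share an entity group, e.g. a common parent). Each put() is its own
API call under the 1MB cap, though -- as Stakka reports elsewhere in this
thread -- a sufficiently large commit can still be rejected with "datastore
transaction or write too big":

    from google.appengine.ext import db

    def update_group(entities):
        # Each put() is a separate API call, so each stays under the
        # 1MB call limit; all of them commit (or roll back) together
        # because they run inside a single transaction. Requires every
        # entity to be in the same entity group.
        for entity in entities:
            db.put(entity)

    # e.g. db.run_in_transaction(update_group, [e1_new, e2_new, e3_new])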

On Fri, Jun 26, 2009 at 8:53 AM, tavt...@espians.com wrote:

 Hey guys and girls,

 I've got a situation where I'd have to transactionally update
 multiple entities which would cumulatively be greater than the 1MB
 datastore API limit... is there a decent solution for this?

 For example, let's say that I start off with entities E1, E2, E3 which
 are all about 400kb each. All the entities are specific to a given
 User. I grab them all on a remote node and do some calculations on
 them to yield new computed entities E1', E2', and E3'.

 Any failure of the remote node or the datastore is recoverable except
 when the remote node tries to *update* the datastore... in that
 situation, it'd have to batch the update into 2 separate .put() calls
 to overcome the 1MB limit. And should the remote node die after the
 first put(), we have a messy situation =)

 My solution at the moment is to:

 1. Create a UserRecord entity which has a 'version' attribute
 corresponding to the latest versions of the related entities for any
 given User.

 2. Add a 'version' attribute to all the entities.

 3. Whenever the remote node creates the computed new set of
 entities, it creates them all with a new version number -- applying
 the same version for all the entities in the same transaction.

 4. These new entities are actually .put() as totally separate and new
 entities, i.e. they do not overwrite the old entities.

 5. Once a remote node successfully writes new versions of all the
 entities relating to a User, it updates the UserRecord with the latest
 version number.

 6. From the remote node, delete all Entities related to a User which
 don't have the latest version number.

 7. Have a background thread check and do deletions of invalid versions
 in case a remote node had died whilst doing step 4, 5 or 6...

 I've skipped out the complications caused by multiple remote nodes
 working on data relating to the same User -- but, overall, the
 approach is pretty much the same.

 Now, the advantage of this approach (as far as I can see) is that data
 relating to a User is never *lost*. That is, data is never lost before
 there is valid data to replace it.

 However, the disadvantage is that for (unknown) periods of time, there
 would be duplicate data sets for a given User... All of which is
 caused by the fact that the datastore calls cannot exceed 1MB. =(

 So queries will yield duplicate data -- gah!!

 Is there a better approach to try at all? Thanks!

 --
 love, tav

 plex:espians/tav | t...@espians.com | +44 (0) 7809 569 369
 http://tav.espians.com | http://twitter.com/tav | skype:tavespian
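
A minimal sketch of the versioning scheme tav outlines above (steps 1-7).
The model and property names here (UserRecord.current_version,
ComputedEntity.version, and so on) are made up for illustration, and the
sketch glosses over coordinating multiple remote nodes:

    from google.appengine.ext import db

    class UserRecord(db.Model):
        # Step 1: one per User (keyed by user id); points at the data
        # set readers should treat as current.
        current_version = db.IntegerProperty(default=0)

    class ComputedEntity(db.Model):
        # Steps 2/4: the large computed entities carry a version and are
        # written as brand-new rows rather than overwriting old ones.
        user_id = db.StringProperty()
        version = db.IntegerProperty()
        payload = db.BlobProperty()

    def publish(user_id, payloads):
        record = UserRecord.get_or_insert(user_id)
        new_version = record.current_version + 1

        # Step 3: write every new entity under the same new version.
        # Each put() is its own API call, so the 1MB limit applies per
        # entity here, not to the whole set.
        for data in payloads:
            ComputedEntity(user_id=user_id, version=new_version,
                           payload=db.Blob(data)).put()

        # Step 5: flip the pointer in a transaction so readers switch
        # from the old data set to the new one atomically.
        def flip():
            rec = UserRecord.get_by_key_name(user_id)
            rec.current_version = new_version
            rec.put()
        db.run_in_transaction(flip)

        # Step 6: delete superseded versions (a background job would
        # redo this if the node died first -- step 7). Needs a composite
        # index on (user_id, version).
        stale = ComputedEntity.all(keys_only=True) \
            .filter('user_id =', user_id) \
            .filter('version <', new_version)
        db.delete(list(stale))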

 




-- 
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
Number: 368047




[google-appengine] Re: Transactionally updating multiple entities over 1MB

2009-06-26 Thread Andy Freeman

  the 1MB limit applies only to single API calls

Does that mean that db.put((e1, e2, e3,)) where all of the entities
are 500kb will fail?

Where are limits on the total size per call documented?
http://code.google.com/appengine/docs/python/datastore/overview.html#Quotas_and_Limits
only mentions a limit on the size of individual entities and the total
number of entities for batch methods.  The batch method documentation
(http://code.google.com/appengine/docs/python/datastore/functions.html
and http://code.google.com/appengine/docs/python/memcache/functions.html)
does not mention any limits.

Is there a documented limit on the number of entities per memcache
call?

BTW - There is a typo in 
http://code.google.com/appengine/docs/python/memcache/overview.html#Quotas_and_Limits.
It says "In addition to quotas, the following limits apply to the use
of the Mail service:" instead of "Memcache service".

On Jun 26, 7:28 am, Nick Johnson (Google) nick.john...@google.com
wrote:
 Hi tav,

 Batch puts aren't transactional unless all the entities are in the
 same entity group. Transactions, however, _are_ transactional, and the
 1MB limit applies only to single API calls, so you can make multiple
 puts to the same entity group in a transaction.

 -Nick Johnson





 On Fri, Jun 26, 2009 at 8:53 AM, tavt...@espians.com wrote:

  Hey guys and girls,

  I've got a situation where I'd have to transactionally update
  multiple entities which would cumulatively be greater than the 1MB
  datastore API limit... is there a decent solution for this?

  For example, let's say that I start off with entities E1, E2, E3 which
  are all about 400kb each. All the entities are specific to a given
  User. I grab them all on a remote node and do some calculations on
  them to yield new computed entities E1', E2', and E3'.

  Any failure of the remote node or the datastore is recoverable except
  when the remote node tries to *update* the datastore... in that
  situation, it'd have to batch the update into 2 separate .put() calls
  to overcome the 1MB limit. And should the remote node die after the
  first put(), we have a messy situation =)

  My solution at the moment is to:

  1. Create a UserRecord entity which has a 'version' attribute
  corresponding to the latest versions of the related entities for any
  given User.

  2. Add a 'version' attribute to all the entities.

  3. Whenever the remote node creates the computed new set of
  entities, it creates them all with a new version number -- applying
  the same version for all the entities in the same transaction.

  4. These new entities are actually .put() as totally separate and new
  entities, i.e. they do not overwrite the old entities.

  5. Once a remote node successfully writes new versions of all the
  entities relating to a User, it updates the UserRecord with the latest
  version number.

  6. From the remote node, delete all Entities related to a User which
  don't have the latest version number.

  7. Have a background thread check and do deletions of invalid versions
  in case a remote node had died whilst doing step 4, 5 or 6...

  I've skipped out the complications caused by multiple remote nodes
  working on data relating to the same User -- but, overall, the
  approach is pretty much the same.

  Now, the advantage of this approach (as far as I can see) is that data
  relating to a User is never *lost*. That is, data is never lost before
  there is valid data to replace it.

  However, the disadvantage is that for (unknown) periods of time, there
  would be duplicate data sets for a given User... All of which is
  caused by the fact that the datastore calls cannot exceed 1MB. =(

  So queries will yield duplicate data -- gah!!

  Is there a better approach to try at all? Thanks!

  --
  love, tav

  plex:espians/tav | t...@espians.com | +44 (0) 7809 569 369
 http://tav.espians.com | http://twitter.com/tav | skype:tavespian

 --
 Nick Johnson, App Engine Developer Programs Engineer
 Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
 Number: 368047



[google-appengine] Re: Transactionally updating multiple entities over 1MB

2009-06-26 Thread Nick Johnson (Google)
On Fri, Jun 26, 2009 at 4:42 PM, Andy Freeman ana...@earthlink.net wrote:


   the 1MB limit applies only to single API calls

 Does that mean that db.put((e1, e2, e3,)) where all of the entities
 are 500kb will fail?


Yes.




 Where are limits on the total size per call documented?

 http://code.google.com/appengine/docs/python/datastore/overview.html#Quotas_and_Limits
 only mentions a limit on the size of individual entities and the total
 number of entities for batch methods.  The batch method documentation
 (http://code.google.com/appengine/docs/python/datastore/functions.html
 and http://code.google.com/appengine/docs/python/memcache/functions.html)
 does not mention any limits.


You're right - we need to improve our documentation in that area. The 1MB
limit applies to _all_ API calls.



 Is there a documented limit on the number of entities per memcache
 call?


No.
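
There's no documented count limit, but since the 1MB cap applies to memcache
calls as well, very large batches may still need splitting. A rough sketch
(the chunk size is arbitrary):

    from google.appengine.api import memcache

    def set_multi_chunked(mapping, chunk_size=50, **kwargs):
        # Issue memcache.set_multi() in smaller batches so no single
        # call approaches the 1MB API limit. Returns the keys that
        # could not be set, mirroring set_multi() itself.
        failed = []
        items = list(mapping.items())
        for i in range(0, len(items), chunk_size):
            failed.extend(memcache.set_multi(dict(items[i:i + chunk_size]),
                                             **kwargs))
        return failed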




 BTW - There is a typo in
 http://code.google.com/appengine/docs/python/memcache/overview.html#Quotas_and_Limits
 .
 It says "In addition to quotas, the following limits apply to the use
 of the Mail service:" instead of "Memcache service".


Thanks for the heads-up.

-Nick Johnson




 On Jun 26, 7:28 am, Nick Johnson (Google) nick.john...@google.com
 wrote:
  Hi tav,
 
  Batch puts aren't transactional unless all the entities are in the
  same entity group. Transactions, however, _are_ transactional, and the
  1MB limit applies only to single API calls, so you can make multiple
  puts to the same entity group in a transaction.
 
  -Nick Johnson
 
 
 
 
 
  On Fri, Jun 26, 2009 at 8:53 AM, tavt...@espians.com wrote:
 
   Hey guys and girls,
 
   I've got a situation where I'd have to transactionally update
   multiple entities which would cumulatively be greater than the 1MB
   datastore API limit... is there a decent solution for this?
 
   For example, let's say that I start off with entities E1, E2, E3 which
   are all about 400kb each. All the entities are specific to a given
   User. I grab them all on a remote node and do some calculations on
   them to yield new computed entities E1', E2', and E3'.
 
   Any failure of the remote node or the datastore is recoverable except
   when the remote node tries to *update* the datastore... in that
   situation, it'd have to batch the update into 2 separate .put() calls
   to overcome the 1MB limit. And should the remote node die after the
   first put(), we have a messy situation =)
 
   My solution at the moment is to:
 
   1. Create a UserRecord entity which has a 'version' attribute
   corresponding to the latest versions of the related entities for any
   given User.
 
   2. Add a 'version' attribute to all the entities.
 
   3. Whenever the remote node creates the computed new set of
   entities, it creates them all with a new version number -- applying
   the same version for all the entities in the same transaction.
 
   4. These new entities are actually .put() as totally separate and new
   entities, i.e. they do not overwrite the old entities.
 
   5. Once a remote node successfully writes new versions of all the
   entities relating to a User, it updates the UserRecord with the latest
   version number.
 
   6. From the remote node, delete all Entities related to a User which
   don't have the latest version number.
 
   7. Have a background thread check and do deletions of invalid versions
   in case a remote node had died whilst doing step 4, 5 or 6...
 
   I've skipped out the complications caused by multiple remote nodes
   working on data relating to the same User -- but, overall, the
   approach is pretty much the same.
 
   Now, the advantage of this approach (as far as I can see) is that data
   relating to a User is never *lost*. That is, data is never lost before
   there is valid data to replace it.
 
   However, the disadvantage is that for (unknown) periods of time, there
   would be duplicate data sets for a given User... All of which is
   caused by the fact that the datastore calls cannot exceed 1MB. =(
 
   So queries will yield duplicate data -- gah!!
 
   Is there a better approach to try at all? Thanks!
 
   --
   love, tav
 
   plex:espians/tav | t...@espians.com | +44 (0) 7809 569 369
  http://tav.espians.com | http://twitter.com/tav | skype:tavespian
 
  --
  Nick Johnson, App Engine Developer Programs Engineer
  Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
  Number: 368047
 



-- 
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
368047
